Edward Capriolo, Dean Wampler, Jason Rutherglen - Programming Hive [2012, PDF, ENG]


kathleen1


Member for: 12 years 10 months

Posts: 173

kathleen1 · 26-Jan-13 09:35 (12 years 5 months ago, edited 12-Feb-13 09:15)

Programming Hive
Data Warehouse and Query Language for Hadoop
Year: 2012
Authors: Edward Capriolo, Dean Wampler, Jason Rutherglen
Publisher: O'Reilly Media
ISBN: 978-1-4493-1933-5
Language: English
Format: PDF
Quality: Born-digital (eBook)
Interactive table of contents: Yes
Number of pages: 352
Description: Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
• Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
• Customize data formats and storage options, from files to external databases
• Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
• Gain best practices for creating user-defined functions (UDFs)
• Learn Hive patterns you should use and anti-patterns you should avoid
• Integrate Hive with other data processing programs
• Use storage handlers for NoSQL databases and other datastores
• Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce
Table of contents
Chapter 1 Introduction
An Overview of Hadoop and MapReduce
Hive in the Hadoop Ecosystem
Java Versus Hive: The Word Count Algorithm
What’s Next
Chapter 2 Getting Started
Installing a Preconfigured Virtual Machine
Detailed Installation
What Is Inside Hive?
Starting Hive
Configuring Your Hadoop Environment
The Hive Command
The Command-Line Interface
Chapter 3 Data Types and File Formats
Primitive Data Types
Collection Data Types
Text File Encoding of Data Values
Schema on Read
Chapter 4 HiveQL: Data Definition
Databases in Hive
Alter Database
Creating Tables
Partitioned, Managed Tables
Dropping Tables
Alter Table
Chapter 5 HiveQL: Data Manipulation
Loading Data into Managed Tables
Inserting Data into Tables from Queries
Creating Tables and Loading Them in One Query
Exporting Data
Chapter 6 HiveQL: Queries
SELECT … FROM Clauses
WHERE Clauses
GROUP BY Clauses
JOIN Statements
ORDER BY and SORT BY
DISTRIBUTE BY with SORT BY
CLUSTER BY
Casting
Queries that Sample Data
UNION ALL
Chapter 7 HiveQL: Views
Views to Reduce Query Complexity
Views that Restrict Data Based on Conditions
Views and Map Type for Dynamic Tables
View Odds and Ends
Chapter 8 HiveQL: Indexes
Creating an Index
Rebuilding the Index
Showing an Index
Dropping an Index
Implementing a Custom Index Handler
Chapter 9 Schema Design
Table-by-Day
Over Partitioning
Unique Keys and Normalization
Making Multiple Passes over the Same Data
The Case for Partitioning Every Table
Bucketing Table Data Storage
Adding Columns to a Table
Using Columnar Tables
(Almost) Always Use Compression!
Chapter 10 Tuning
Using EXPLAIN
EXPLAIN EXTENDED
Limit Tuning
Optimized Joins
Local Mode
Parallel Execution
Strict Mode
Tuning the Number of Mappers and Reducers
JVM Reuse
Indexes
Dynamic Partition Tuning
Speculative Execution
Single MapReduce MultiGROUP BY
Virtual Columns
Chapter 11 Other File Formats and Compression
Determining Installed Codecs
Choosing a Compression Codec
Enabling Intermediate Compression
Final Output Compression
Sequence Files
Compression in Action
Archive Partition
Compression: Wrapping Up
Chapter 12 Developing
Changing Log4J Properties
Connecting a Java Debugger to Hive
Building Hive from Source
Setting Up Hive and Eclipse
Hive in a Maven Project
Unit Testing in Hive with hive_test
The New Plugin Developer Kit
Chapter 13 Functions
Discovering and Describing Functions
Calling Functions
Standard Functions
Aggregate Functions
Table Generating Functions
A UDF for Finding a Zodiac Sign from a Day
UDF Versus GenericUDF
Permanent Functions
User-Defined Aggregate Functions
User-Defined Table Generating Functions
Accessing the Distributed Cache from a UDF
Annotations for Use with Functions
Macros
Chapter 14 Streaming
Identity Transformation
Changing Types
Projecting Transformation
Manipulative Transformations
Using the Distributed Cache
Producing Multiple Rows from a Single Row
Calculating Aggregates with Streaming
CLUSTER BY, DISTRIBUTE BY, SORT BY
GenericMR Tools for Streaming to Java
Calculating Cogroups
Chapter 15 Customizing Hive File and Record Formats
File Versus Record Formats
Demystifying CREATE TABLE Statements
File Formats
Record Formats: SerDes
CSV and TSV SerDes
ObjectInspector
Think Big Hive Reflection ObjectInspector
XML UDF
XPath-Related Functions
JSON SerDe
Avro Hive SerDe
Binary Output
Chapter 16 Hive Thrift Service
Starting the Thrift Server
Setting Up Groovy to Connect to HiveService
Connecting to HiveServer
Getting Cluster Status
Result Set Schema
Fetching Results
Retrieving Query Plan
Metastore Methods
Administrating HiveServer
Hive ThriftMetastore
Chapter 17 Storage Handlers and NoSQL
Storage Handler Background
HiveStorageHandler
HBase
Cassandra
DynamoDB
Chapter 18 Security
Integration with Hadoop Security
Authentication with Hive
Authorization in Hive
Chapter 19 Locking
Locking Support in Hive with Zookeeper
Explicit, Exclusive Locks
Chapter 20 Hive Integration with Oozie
Oozie Actions
A Two-Query Workflow
Oozie Web Console
Variables in Workflows
Capturing Output
Capturing Output to Variables
Chapter 21 Hive and Amazon Web Services (AWS)
Why Elastic MapReduce?
Instances
Before You Start
Managing Your EMR Hive Cluster
Thrift Server on EMR Hive
Instance Groups on EMR
Configuring Your EMR Cluster
Persistence and the Metastore on EMR
HDFS and S3 on EMR Cluster
Putting Resources, Configs, and Bootstrap Scripts on S3
Logs on S3
Spot Instances
Security Groups
EMR Versus EC2 and Apache Hive
Wrapping Up
Chapter 22 HCatalog
Introduction
MapReduce
Command Line
Security Model
Architecture
Chapter 23 Case Studies
m6d.com (Media6Degrees)
Outbrain
NASA’s Jet Propulsion Laboratory
Photobucket
SimpleReach
Experiences and Needs from the Customer Trenches
Glossary
Appendix References
Colophon

Osco do Casco

VIP (Honored)

Member for: 16 years

Posts: 13524

Osco do Casco · 26-Jan-13 11:08 (1 hour 33 min later)

cat.td!
Please post books in the appropriate section!
If you do not understand the subject and have no idea what this book is about, it is better not to upload anything at all.