Edward Capriolo, Dean Wampler, Jason Rutherglen - Programming Hive [2012, PDF, ENG]

kathleen1 (member for 12 years 10 months, 173 posts) · 26-Jan-13 09:35 (12 years 5 months ago, edited 12-Feb-13 09:15)

Programming Hive
Data Warehouse and Query Language for Hadoop

Year: 2012
Author: Edward Capriolo, Dean Wampler, Jason Rutherglen
Publisher: O'Reilly Media
ISBN: 978-1-4493-1933-5
Language: English
Format: PDF
Quality: Originally digital (eBook)
Interactive table of contents: Yes
Number of pages: 352

Description: Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

• Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
• Customize data formats and storage options, from files to external databases
• Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
• Learn best practices for creating user-defined functions (UDFs)
• Learn Hive patterns you should use and anti-patterns you should avoid
• Integrate Hive with other data processing programs
• Use storage handlers for NoSQL databases and other datastores
• Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

Table of Contents

Chapter 1. Introduction
An Overview of Hadoop and MapReduce; Hive in the Hadoop Ecosystem; Java Versus Hive: The Word Count Algorithm; What’s Next

Chapter 2. Getting Started
Installing a Preconfigured Virtual Machine; Detailed Installation; What Is Inside Hive?; Starting Hive; Configuring Your Hadoop Environment; The Hive Command; The Command-Line Interface

Chapter 3. Data Types and File Formats
Primitive Data Types; Collection Data Types; Text File Encoding of Data Values; Schema on Read

Chapter 4. HiveQL: Data Definition
Databases in Hive; Alter Database; Creating Tables; Partitioned, Managed Tables; Dropping Tables; Alter Table

Chapter 5. HiveQL: Data Manipulation
Loading Data into Managed Tables; Inserting Data into Tables from Queries; Creating Tables and Loading Them in One Query; Exporting Data

Chapter 6. HiveQL: Queries
SELECT … FROM Clauses; WHERE Clauses; GROUP BY Clauses; JOIN Statements; ORDER BY and SORT BY; DISTRIBUTE BY with SORT BY; CLUSTER BY; Casting; Queries that Sample Data; UNION ALL

Chapter 7. HiveQL: Views
Views to Reduce Query Complexity; Views that Restrict Data Based on Conditions; Views and Map Type for Dynamic Tables; View Odds and Ends

Chapter 8. HiveQL: Indexes
Creating an Index; Rebuilding the Index; Showing an Index; Dropping an Index; Implementing a Custom Index Handler

Chapter 9. Schema Design
Table-by-Day; Over Partitioning; Unique Keys and Normalization; Making Multiple Passes over the Same Data; The Case for Partitioning Every Table; Bucketing Table Data Storage; Adding Columns to a Table; Using Columnar Tables; (Almost) Always Use Compression!

Chapter 10. Tuning
Using EXPLAIN; EXPLAIN EXTENDED; Limit Tuning; Optimized Joins; Local Mode; Parallel Execution; Strict Mode; Tuning the Number of Mappers and Reducers; JVM Reuse; Indexes; Dynamic Partition Tuning; Speculative Execution; Single MapReduce MultiGROUP BY; Virtual Columns

Chapter 11. Other File Formats and Compression
Determining Installed Codecs; Choosing a Compression Codec; Enabling Intermediate Compression; Final Output Compression; Sequence Files; Compression in Action; Archive Partition; Compression: Wrapping Up

Chapter 12. Developing
Changing Log4J Properties; Connecting a Java Debugger to Hive; Building Hive from Source; Setting Up Hive and Eclipse; Hive in a Maven Project; Unit Testing in Hive with hive_test; The New Plugin Developer Kit

Chapter 13. Functions
Discovering and Describing Functions; Calling Functions; Standard Functions; Aggregate Functions; Table Generating Functions; A UDF for Finding a Zodiac Sign from a Day; UDF Versus GenericUDF; Permanent Functions; User-Defined Aggregate Functions; User-Defined Table Generating Functions; Accessing the Distributed Cache from a UDF; Annotations for Use with Functions; Macros

Chapter 14. Streaming
Identity Transformation; Changing Types; Projecting Transformation; Manipulative Transformations; Using the Distributed Cache; Producing Multiple Rows from a Single Row; Calculating Aggregates with Streaming; CLUSTER BY, DISTRIBUTE BY, SORT BY; GenericMR Tools for Streaming to Java; Calculating Cogroups

Chapter 15. Customizing Hive File and Record Formats
File Versus Record Formats; Demystifying CREATE TABLE Statements; File Formats; Record Formats: SerDes; CSV and TSV SerDes; ObjectInspector; Think Big Hive Reflection ObjectInspector; XML UDF; XPath-Related Functions; JSON SerDe; Avro Hive SerDe; Binary Output

Chapter 16. Hive Thrift Service
Starting the Thrift Server; Setting Up Groovy to Connect to HiveService; Connecting to HiveServer; Getting Cluster Status; Result Set Schema; Fetching Results; Retrieving Query Plan; Metastore Methods; Administrating HiveServer; Hive ThriftMetastore

Chapter 17. Storage Handlers and NoSQL
Storage Handler Background; HiveStorageHandler; HBase; Cassandra; DynamoDB

Chapter 18. Security
Integration with Hadoop Security; Authentication with Hive; Authorization in Hive

Chapter 19. Locking
Locking Support in Hive with Zookeeper; Explicit, Exclusive Locks

Chapter 20. Hive Integration with Oozie
Oozie Actions; A Two-Query Workflow; Oozie Web Console; Variables in Workflows; Capturing Output; Capturing Output to Variables

Chapter 21. Hive and Amazon Web Services (AWS)
Why Elastic MapReduce?; Instances; Before You Start; Managing Your EMR Hive Cluster; Thrift Server on EMR Hive; Instance Groups on EMR; Configuring Your EMR Cluster; Persistence and the Metastore on EMR; HDFS and S3 on EMR Cluster; Putting Resources, Configs, and Bootstrap Scripts on S3; Logs on S3; Spot Instances; Security Groups; EMR Versus EC2 and Apache Hive; Wrapping Up

Chapter 22. HCatalog
Introduction; MapReduce; Command Line; Security Model; Architecture

Chapter 23. Case Studies
m6d.com (Media6Degrees); Outbrain; NASA’s Jet Propulsion Laboratory; Photobucket; SimpleReach; Experiences and Needs from the Customer Trenches

Glossary
Appendix: References
Colophon

Size: 5.3 MB

Osco do Casco (member for 16 years, 13524 posts) · 26-Jan-13 11:08 (1 hour 33 min. later)
cat.td! Please post books in the correct section! If you don't understand the subject and have no idea what a book is about, it is better not to post it at all.