Learning Apache Mahout
Year: 2015
Author: Chandramani Tiwary
Publisher: Packt Publishing
ISBN: 978-1-78355-522-2
Language: English
Format: PDF/EPUB
Quality: Born-digital (eBook)
Interactive table of contents: Yes
Number of pages: 250
Description:
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and for people with the right skills to extract the information they need from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.
Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout. You will learn about Mahout's building blocks, covering feature extraction, feature reduction, and the curse of dimensionality, before delving into classification use cases with the random forest and Naive Bayes classifiers and into item-based and user-based recommendation. You will then work with clustering in Mahout using the k-means algorithm and use Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and text analytics to gain real-life, practical know-how of analytics projects.
Table of Contents
Preface
Chapter 1: Introduction to Mahout
Why Mahout
When Mahout
How Mahout
Chapter 2: Core Concepts in Machine Learning
Supervised learning
Unsupervised learning
Recommender system
Model efficacy
Chapter 3: Feature Engineering
Feature engineering
Chapter 4: Classification with Mahout
Classification
Logistic regression
Adaptive regression model
Code example with logistic regression
Random forest
Naïve Bayes classifier
Chapter 5: Frequent Pattern Mining and Topic Modeling
Frequent pattern mining
Importing the Mahout source code into Eclipse
Frequent pattern mining with Mahout
Chapter 6: Recommendation with Mahout
Collaborative filtering
Chapter 7: Clustering with Mahout
k-means
Canopy clustering
Fuzzy k-means
A Mahout command-line example
A Mahout Java example
Chapter 8: New Paradigm in Mahout
Moving beyond MapReduce
Apache Spark
In-core types
Spark Mahout basics
Linear regression with Mahout Spark
Chapter 9: Case Study – Churn Analytics and Customer Segmentation
Churn analytics
Chapter 10: Case Study – Text Analytics
Text analytics
Clustering text
Categorizing text
Index