Apache Hadoop CookBook: Open-Source Software Framework Written in Java

Apache Hadoop CookBook Free Download

Below is the book index. The book has 199 pages in total.

  • “HelloWorld” Example
  • Introduction
  • Hadoop Word-Count Example
  • Setup
  • Mapper Code
  • Reducer Code
  • Putting it all together, The Driver Class
  • Running the example
  • Download the complete source code

1. How to Install Apache Hadoop on Ubuntu

  • Introduction
  • Prerequisites
  • Installing Java
  • Creating a Dedicated User
  • Disable ipv
  • Installing SSH and Setting up certificate
  • Installing Apache Hadoop
  • Download Apache Hadoop
  • Updating bash
  • Configuring Hadoop
  • Formatting the Hadoop Filesystem
  • Starting Apache Hadoop
  • Testing MapReduce Job
  • Stopping Apache Hadoop
  • Conclusion

Apache Hadoop Cookbook

2. FS Commands Example

  • Introduction
  • Common Commands
  • Create a directory
  • List the content of the directory
  • Upload a file in HDFS
  • Download a file from HDFS
  • View the file content
  • Copying a file
  • Moving file from source to destination
  • Removing the file or the directory from HDFS
  • Displaying the tail of a file
  • Displaying the aggregate length of a particular file
  • Count the directories and files
  • Details of space in the file system
  • Conclusion

3. Cluster Setup Example

  • Introduction
  • Requirements
  • Preparing Virtual Machine
  • Creating VM and Installing Guest OS
  • Installing Guest Additions
  • Creating Cluster of Virtual Machines
  • VM Network settings
  • Cloning the Virtual Machine
  • Testing the network IPs assigned to VMs
  • Converting to Static IPs for VMs
  • Hadoop prerequisite settings
  • Creating User
  • Disable ipv
  • Connecting the machines (SSH Access)
  • Hadoop Setup
  • Download Hadoop
  • Update bashrc
  • Configuring Hadoop
  • Formatting the Namenode
  • Start the Distributed Format System
  • Testing MapReduce Job
  • Stopping the Distributed Format System
  • Conclusion
  • Download configuration files

4. Distcp Example

  • Introduction
  • Syntax and Examples
  • Basic
  • Multiple Sources
  • Update and Overwrite Flag
  • Ignore Failures Flag
  • Maximum Map Tasks
  • Final Notes

5. Distributed File System Explained

  • Introduction
  • HDFS Design
  • System failures
  • Can handle large amount of data
  • Coherency Model
  • Portability
  • HDFS Nodes
  • NameNode
  • DataNode
  • HDFS Architecture
  • Working of NameNode and DataNode
  • HDFS Namespace
  • Data Replication
  • Failures
  • Data Accessibility
  • Configuring HDFS
  • Configuring HDFS
  • Formatting NameNode
  • Starting the HDFS
  • Interacting with HDFS using Shell
  • Creating a directory
  • List the content of the directory
  • Upload a file in HDFS
  • Download a file from HDFS
  • Interacting with HDFS using MapReduce
  • Conclusion
  • Download the code

6. Distributed Cache Example

  • Introduction
  • Working
  • Implementation
  • The Driver Class
  • Map Class
  • Reduce Class
  • Executing the Hadoop Job
  • Conclusion
  • Download the Eclipse Project

7. Wordcount Example

  • Introduction
  • MapReduce
  • Word-Count Example
  • Setup
  • Mapper Code
  • Reducer Code
  • The Driver Class
  • Code Execution
  • In Eclipse IDE
  • On Hadoop Cluster
  • Conclusion
  • Download the Eclipse Project

8. Streaming Example

  • Introduction
  • Prerequisites and Assumptions
  • Hadoop Streaming Workflow
  • MapReduce Code in Python
  • Wordcount Example
  • Mapper
  • Reducer
  • Testing the Python code
  • Submitting and Executing the Job on Hadoop cluster
  • Input Data
  • Transferring input data to HDFS
  • Submitting the MapReduce Job
  • Understanding the Console Log
  • MapReduce Job Output
  • Conclusion
  • Download the Source Code

9. Zookeeper Example

  • Introduction
  • How Zookeeper Works?
  • Zookeeper Setup
  • System Requirements
  • Install Java
  • Download Zookeeper
  • Data Directory
  • Configuration File
  • Starting The Server
  • Zookeeper Server Basic Interaction
  • Starting The CLI
  • Creating The First Znode
  • Retrieving Data From The First Znode
  • Modifying Data in Znode
  • Creating A Subnode
  • Removing A Node
  • Conclusion

Download Free Hadoop Book Now

Hadoop is an Apache Software Foundation project. It is an open-source implementation inspired by Google's MapReduce and the Google File System.

It is designed for distributed processing of large data sets across a cluster of systems, often running on standard commodity hardware.

Hadoop is designed on the assumption that all hardware fails sooner or later, so the system must be robust enough to handle hardware failures automatically.

Apache Hadoop consists of two core components:

• A distributed file system called the Hadoop Distributed File System, or HDFS for short.
• A framework and API for MapReduce jobs.

In this example, we demonstrate the second component of the Hadoop framework, MapReduce, using the Word Count example (the "Hello World" program of the Hadoop ecosystem). First, though, we need to understand what MapReduce actually is.

MapReduce is a software framework and programming model that enables users to write programs that process data in parallel across multiple systems in a cluster. MapReduce consists of two parts: Map and Reduce.

• Map: The map task is performed by a map() function that filters and sorts the data. It processes one or more chunks of data and produces output that is generally referred to as intermediate results. Map tasks generally run in parallel, provided the mapping operations are independent of each other.

• Reduce: The reduce task is performed by a reduce() function and carries out a summary operation. It consolidates the results produced by each of the map tasks.
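
The two phases described above can be illustrated with a small in-memory sketch of the Word Count example. This is not Hadoop code and not the book's implementation: it uses only the Python standard library, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration. The shuffle step stands in for the grouping of intermediate results by key that happens between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key before they reach the reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: consolidate all the counts emitted for one word."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently of the others, which is what
    # would allow these calls to run in parallel on a cluster.
    intermediate = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(intermediate)
    # Reduce each key in sorted order so the result is deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    chunks = ["Hello Hadoop", "hello world"]
    print(word_count(chunks))  # {'hadoop': 1, 'hello': 2, 'world': 1}
```

Here everything runs sequentially in one process. The point is only the division of labour: the map step looks at one chunk at a time, and the reduce step sees all the values for a single key.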
