• The Motivation For Hadoop
· Problems with traditional large-scale systems
· Requirements for a new approach
• Hadoop Basic Concepts
· An Overview of Hadoop
· The Hadoop Distributed File System
· How MapReduce Works
· Anatomy of a Hadoop Cluster
· Other Hadoop Ecosystem Components
• Writing a MapReduce Program
· The MapReduce Flow
· Examining a Sample MapReduce Program
· Basic MapReduce API Concepts
· The Driver Code
· The Mapper
· The Reducer
· Hadoop’s Streaming API
· Using Eclipse for Rapid Development
• Integrating Hadoop Into The Workflow
· Relational Database Management Systems
· Storage Systems
· Creating workflows with Oozie
· Importing Data from RDBMSs With Sqoop
· Importing Real-Time Data with Flume
· Accessing HDFS Using FuseDFS and Hoop
• Delving Deeper Into The Hadoop API
· Using Combiners
· Using LocalJobRunner Mode for Faster Development
· Reducing Intermediate Data with Combiners
· The configure and close methods for MapReduce Setup and Teardown
· Writing Partitioners for Better Load Balancing
· Directly Accessing HDFS
· Using The Distributed Cache
• Using Hive and Pig
· Hive Basics
· Pig Basics
• Common MapReduce Algorithms
· Sorting and Searching
· Indexing
· Machine Learning with Mahout
· Term Frequency - Inverse Document Frequency
· Word Co-Occurrence
• Practical Development Tips and Techniques
· Testing with MRUnit
· Debugging MapReduce Code
· Using LocalJobRunner Mode for Easier Debugging
· Eclipse development techniques
· Retrieving Job Information with Counters
· Logging
· Splittable File Formats
· Determining the Optimal Number of Reducers
· Map-Only MapReduce Jobs
· Implementing Multiple Mappers using ChainMapper
• More Advanced MapReduce Programming
· Custom Writables and WritableComparables
· Saving Binary Data using SequenceFiles and Avro Files
· Creating InputFormats and OutputFormats
• Joining Data Sets in MapReduce Jobs
· Map-Side Joins
· The Secondary Sort
· Reduce-Side Joins
• Graph Manipulation in Hadoop
· Introduction to graph techniques
· Representing Graphs in Hadoop
· Implementing a sample algorithm: Single Source Shortest Path
• Creating Workflows with Oozie
· The Motivation for Oozie
· Oozie’s Workflow Definition Format