BIG DATA & HADOOP TRAINING

                             BIG DATA & HADOOP TRAINING CURRICULUM

  • Module 1
  • Hadoop Architecture

Topics – What is Big Data, Hadoop Architecture, Hadoop ecosystem components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode, Anatomy of File Write and Read.

  • Module 2
  • Hadoop Cluster Configuration and Data Loading

Topics – Hadoop Cluster Architecture, Hadoop Cluster Configuration files, Hadoop Cluster Modes, Multi-Node Hadoop Cluster, A Typical Production Hadoop Cluster, MapReduce Job execution, Common Hadoop Shell commands, Data Loading Techniques: FLUME, SQOOP, Hadoop Copy Commands, Hadoop Project: Data Loading.

  • Module 3
  • Hadoop MapReduce framework

Topics – Hadoop Data Types, Hadoop MapReduce paradigm, Map and Reduce tasks, MapReduce Execution Framework, Partitioners and Combiners, Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs), Output Formats (TextOutput, BinaryOutPut, Multiple Output), Hadoop Project: MapReduce Programming.

  • Module 4
  • Pig and Pig Latin

Topics – Installing and Running Pig, Grunt, Pig’s Data Model, Pig Latin, Developing & Testing Pig Latin Scripts, Writing Evaluation, Filter, Load & Store Functions, Hadoop Project: Pig Scripting.

Module 5

Hive and HiveQL

Topics – Hive Architecture and Installation, Comparison with Traditional Database, HiveQL: Data Types, Operators and Functions, Hive Tables(Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables), Querying Data (Sorting And Aggregating, Map Reduce Scripts, Joins & Subqueries, Views, Map and Reduce side Joins to optimize Query).

  • Module 6
  • Advance Hive, NoSQL Databases and HBase

Topics – Hive: Data manipulation with Hive, User Defined Functions, Appending Data into existing Hive Table, Custom Map/Reduce in Hive, Hadoop Project: Hive Scripting, HBase: Introduction to HBase, Client API’s and their features, Available Client, HBase Architecture, MapReduce Integration.

  • Module 7
  • Advance HBase and ZooKeeper

Topics – HBase: Advanced Usage, Schema Design, Advance Indexing, Coprocessors, Hadoop Project: HBase tables The ZooKeeper Service: Data Model, Operations, Implementation, Consistency, Sessions, States

  • Module 8
  • Hadoop 2.0, MRv2, and YARN

Topics – Schedulers: Fair and Capacity, Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN, Upgrade your existing MRv1 code to MRv2, Programming in YARN framework.

X