Data Analytics - Big Data - Hadoop Training

Our new Big Data - Hadoop course is designed for complete beginners with no experience in data analytics. It is well suited to anyone who wants to start a career in data analytics.

The training course can be tailored to your organisation's or individual needs. Please contact us for details and prices of private in-house training services.

Scheduled start date: Contact us for dates

Duration: 60 to 80 hours

Introduction to Big Data

  • Rise of Big Data
  • Compare Hadoop vs traditional systems
  • Hadoop Master-Slave Architecture
  • Understanding HDFS Architecture
  • NameNode, DataNode, Secondary NameNode
  • Learn about JobTracker, TaskTracker

Hadoop Configuration and Daemon Logs

  • Hadoop Configuration and Daemon Logs
  • Hadoop Daemons and Roles

Hadoop Cluster Setup and Working

  • Cluster Setup and Working
  • Getting Virtualization Software and Linux Disk Images
  • Adding Machines to your VM Box
  • Installing Linux into Machines
  • Preparing your Linux Machines to install Hadoop
  • Cluster Management Solution
  • Setting up an Apache Hadoop Cluster
  • Writing Data to Cluster and Checking Replication Status
  • Setting up Linux machines in AWS EC2 for a Cloudera Cluster
  • Setting up a Cloudera Cluster on your machines in AWS EC2

Hadoop Cluster Maintenance and Administration

  • Commissioning and Decommissioning DataNodes in a Cloudera Cluster
  • Commissioning and Decommissioning Nodes in an Apache Hadoop Cluster
  • Balancing a Cluster
  • Managing Services
  • Managing Software Packages with Apache Hadoop
  • Managing Role Instances
  • Improvements in Hadoop Version 2

HDFS & MapReduce Architecture

  • Core components of Hadoop
  • Understanding Hadoop Master-Slave Architecture
  • Learn about NameNode, DataNode, Secondary NameNode
  • Understanding HDFS Architecture
  • Anatomy of Reading and Writing Data on HDFS
  • MapReduce Architecture Flow

Hadoop Configuration

  • Hadoop Modes
  • Hadoop Terminal Commands
  • Cluster Configuration
  • Web Ports
  • Hadoop Configuration Files
  • Reporting, Recovery

Understanding Hadoop MapReduce Framework

  • Overview of the MapReduce Framework
  • Use cases of MapReduce
  • MapReduce Architecture
  • Anatomy of MapReduce Program
  • Mapper/Reducer Class, Driver code
  • Understand Combiner and Partitioner

Advanced MapReduce - Part 1

  • Write your own Partitioner
  • Writing Map and Reduce in Python
  • Map side/Reduce side Join
  • Distributed Join
  • Distributed Cache
  • Counters
  • Joining Multiple datasets in MapReduce

Advanced MapReduce - Part 2

  • MapReduce internals
  • Understanding Input Format
  • Custom Input Format
  • Using Writable and Comparable
  • Understanding Output Format
  • Sequence Files
  • JUnit and MRUnit Testing Frameworks

Apache PIG

  • PIG vs MapReduce
  • PIG Architecture & Data types
  • PIG Latin Relational Operators
  • PIG Latin Join and CoGroup
  • PIG Latin Group and Union
  • Describe, Explain, Illustrate

Apache Hive and HiveQL

  • What is Hive
  • Hive DDL - Create/Show Database
  • Hive DDL - Create/Show/Drop Tables
  • Hive DML - Load Files & Insert Data
  • Hive SQL - Select, Filter, Join, Group By
  • Hive Architecture & Components
  • Difference between Hive and RDBMS

Advanced HiveQL

  • Multi-Table Inserts
  • Joins
  • Grouping Sets, Cubes, Rollups
  • Custom Map and Reduce scripts
  • Hive SerDe
  • Hive UDF
  • Hive UDAF

Apache Kafka

  • Kafka - How Kafka works
  • Kafka Architecture
  • Apache Kafka and real world use cases
  • What are the various components of Apache Kafka
  • Kafka Cluster configuration
  • Kafka Broker, Producer and Consumer configuration
  • Practice lab exercises using Apache Kafka

Apache Flume

  • Flume - How it works
  • Flume Architecture
  • Flume Complex Flow - Multiplexing
  • Apache Flume and real world use cases
  • What are the various components of Apache Flume
  • Flume agent configuration
  • Practice lab exercises using Apache Flume

Apache Sqoop, Oozie

  • Sqoop - How Sqoop works
  • Sqoop Architecture
  • Oozie - Simple/Complex Flow
  • Oozie Service/Scheduler
  • Use Cases - Time and Data triggers

NoSQL Database

  • CAP theorem
  • RDBMS vs NoSQL
  • Key Value stores: Memcached, Riak
  • Key Value stores: Redis, DynamoDB
  • Column Family: Cassandra, HBase

Apache HBase

  • When/Why to use HBase
  • HBase Architecture/Storage
  • HBase Data Model
  • HBase Families/Column Families
  • HBase Master
  • HBase vs RDBMS
  • Access HBase Data

Apache ZooKeeper

  • What is ZooKeeper
  • ZooKeeper Data Model
  • ZNode Types
  • Sequential ZNodes
  • Installing and Configuring
  • Running ZooKeeper
  • ZooKeeper use cases

Hadoop 2.0, YARN, MRv2

  • Hadoop 1.0 Limitations
  • MapReduce Limitations
  • HDFS 2: Architecture
  • HDFS 2: High availability
  • HDFS 2: Federation
  • YARN Architecture
  • Classic MapReduce vs YARN
  • YARN multitenancy
 

Please enroll for the course by submitting your details to: afshan_alurkar@testtriangle.com
