Call: 416-623-9493 or 905-487-4500

Toll Free: 1-866-955-4526 
E-mail: info@globalerp.ca

Big Data Training
  DURATION   39 hrs
  Course Fee   $2100 + HST
  DELIVERY METHOD   Classroom or Online Training

Introduction to Hadoop:

  • RDBMS vs Hadoop
  • Ecosystem tour (9 products)
  • Vendor comparison (Cloudera, Hortonworks, MapR, Amazon EMR)
  • Hardware Recommendations

HDFS: File System details

  • NameNode and DataNode architecture
  • Write pipeline
  • Read pipeline
  • Heartbeats
  • Rack awareness
  • Block scanner
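
The block and rack-awareness topics above can be pictured with a small sketch. It assumes a 128 MB block size and three replicas, and it models the commonly described default placement (first replica on the writer's node, second on a different rack, third on another node of that second rack). The function names and the topology dictionary are hypothetical, and the node choice is deterministic here only to keep the example simple.

```python
def split_into_blocks(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Return the sizes of the blocks a file would occupy (assumed 128 MB blocks)."""
    full, rest = divmod(file_size_bytes, block_size_bytes)
    return [block_size_bytes] * full + ([rest] if rest else [])


def place_replicas(writer_node, topology):
    """Pick three nodes in a rack-aware pattern (simplified illustration).

    topology is a hypothetical mapping: rack name -> list of DataNode names.
    A real cluster chooses among eligible nodes; this sketch takes the first ones.
    """
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)
    remote_rack = next(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]


if __name__ == "__main__":
    mb = 1024 * 1024
    print([size // mb for size in split_into_blocks(300 * mb)])
    racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas("dn1", racks))
```

Losing one whole rack in this layout still leaves at least one replica of the block reachable, which is the point of spreading replicas across racks.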


MapReduce:

  • JobTracker/TaskTracker architecture
  • Shuffle: Sort + Partitioning
  • Speculative Execution
  • Input/output formats
  • Distributed cache
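
The shuffle step listed above (sort plus partitioning) can be sketched in plain Python without a cluster. This is only an in-memory imitation of the idea: the function names are hypothetical, and the byte-sum partitioner is a deterministic stand-in for a hash partitioner, not Hadoop's own implementation.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) for every word, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def partition(key, num_reducers):
    """Assign a key to a reducer (stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group values by key inside each partition, then sort the keys."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(part.items()) for part in partitions]


def reduce_phase(partitions):
    """Sum the grouped values for each key, one output list per reducer."""
    return [[(key, sum(values)) for key, values in part] for part in partitions]


if __name__ == "__main__":
    lines = ["big data big", "data hadoop"]
    for reducer_id, output in enumerate(reduce_phase(shuffle(map_phase(lines), 2))):
        print(reducer_id, output)
```

Every occurrence of a key lands in the same partition, so each reducer sees all values for the keys it owns, already in sorted key order.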

Hands-on on a Hadoop machine

  • Introduction to Hadoop FS and Processing Environment's UIs
  • How to read and write files
  • Basic Unix commands for Hadoop
  • Hadoop FS shell
  • Hadoop releases practical
  • Hadoop daemons practical

ETL Tool (Pig) Introduction Level-1 (Basics)

  • Pig Introduction
  • Why Pig if Map Reduce is there?
  • How Pig is different from programming languages
  • Pig Data flow Introduction
  • How Schema is optional in Pig
  • Pig Data types
  • Pig Commands: Load, Store, Describe, Dump
  • Map Reduce job started by Pig Commands
  • Execution plan

ETL Tool (Pig) Level-2 (Complex)

  • Pig UDFs
  • Pig Use cases
  • Pig Assignment
  • Complex Use cases on Pig
  • XML Data Processing in Pig
  • Structured Data processing in Pig
  • Semi-structured data processing in Pig
  • Pig Advanced Assignment
  • Real time scenarios on Pig
  • When we should use Pig
  • When we shouldn't use Pig
  • Live examples of Pig Use cases

Hive Warehouse (Introduction to Hive Warehouse and Differentiation between SQL based Datawarehouse and Hive) Level-1 (Basics)

  • Hive Introduction
  • Meta storage and meta store
  • Introduction to Derby Database
  • Hive Data types
  • HQL
  • DDL, DML and sub languages of Hive
  • Internal, external and temp tables in Hive
  • Differentiation between SQL based Datawarehouse and Hive

Hive Level-2 (Complex)

  • Hive releases
  • Why Hive is not best solution for OLTP
  • OLAP in Hive
  • Partitioning
  • Bucketing
  • Hive Architecture
  • Thrift Server
  • Hue Interface for Hive
  • How to analyze data using Hive script
  • Differentiation between Hive and Impala
  • UDFs in Hive
  • Complex Use cases in Hive
  • Hive Advanced Assignment
  • Real time scenarios of Hive
  • POC on Pig and Hive, with real time data sets and problem statements
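
The partitioning and bucketing topics above can be illustrated with a small sketch. Partitioning places rows in a directory named after the partition column value, and bucketing spreads rows across a fixed number of buckets by hashing a column. The sketch assumes a non-negative integer bucketing column, for which the bucket is simply the value modulo the bucket count. The table path, column names, and function names are hypothetical.

```python
from collections import defaultdict


def partition_dir(table_path, partition_col, value):
    """Directory for one partition, in the column=value naming style."""
    return f"{table_path}/{partition_col}={value}"


def bucket_for(key, num_buckets):
    """Bucket index for a non-negative integer key (assumed: value mod buckets)."""
    return key % num_buckets


def layout(rows, table_path, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket index)."""
    files = defaultdict(list)
    for row in rows:
        directory = partition_dir(table_path, partition_col, row[partition_col])
        files[(directory, bucket_for(row[bucket_col], num_buckets))].append(row)
    return dict(files)


if __name__ == "__main__":
    rows = [
        {"user_id": 10, "country": "CA"},
        {"user_id": 11, "country": "CA"},
        {"user_id": 12, "country": "US"},
    ]
    # "/warehouse/sales" is a made-up table location for illustration.
    for location, members in layout(rows, "/warehouse/sales", "country", "user_id", 2).items():
        print(location, [r["user_id"] for r in members])
```

A query that filters on the partition column only needs to read the matching directory, while bucketing keeps rows with the same key in the same bucket, which helps sampling and joins on that key.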

Map Reduce Level-1 (Basics)

  • How Map Reduce works as Processing Framework
  • End to End execution flow of Map Reduce job
  • Different tasks in Map Reduce job
  • Why Reducer is optional while Mapper is mandatory?
  • Introduction to Combiner
  • Introduction to Partitioner
  • Programming languages for Map Reduce
  • Why Java is preferred for Map Reduce programming
  • POC based on Pig, Hive, HDFS, MR
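
The role of the Combiner mentioned above can be shown with a short in-memory sketch. A combiner pre-aggregates each mapper's output locally, so fewer records cross the network during the shuffle while the final result stays the same. The function names and the input splits are hypothetical, and the code imitates the idea rather than using the Hadoop API.

```python
from collections import Counter


def mapper(line):
    """Emit (word, 1) for each word in a line."""
    return [(word, 1) for word in line.lower().split()]


def combiner(pairs):
    """Locally sum counts per word for one mapper's output."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return sorted(counts.items())


def reducer(pairs):
    """Sum counts per word across all mappers."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


def run(splits, use_combiner):
    """Return (final counts, number of records sent through the shuffle)."""
    shuffled = []
    for split in splits:  # one split per simulated map task
        pairs = [pair for line in split for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        shuffled.extend(pairs)
    return reducer(shuffled), len(shuffled)


if __name__ == "__main__":
    splits = [["a a b", "a"], ["b b"]]
    print(run(splits, use_combiner=False))
    print(run(splits, use_combiner=True))
```

This only works because summing is associative and commutative; an operation such as a plain average cannot be used as its own combiner without carrying extra state.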

NOSQL Databases and Introduction to HBase Level-1 (Basics)

  • Introduction to NOSQL
  • Why NOSQL if SQL has been in the market for many years
  • Databases in the market based on NOSQL
  • CAP Theorem
  • ACID Vs. CAP
  • OLTP Solutions with different capabilities
  • Which NOSQL based solution is capable of handling specific requirements
  • Examples of companies like Google, Facebook, Amazon, and other clients who are using NOSQL based databases
  • HBase Architecture of column families

Map Reduce Advanced and HBase Level-2 (Complex)

  • How to work on Map Reduce in real time
  • Map Reduce complex scenarios
  • Introduction to HBase
  • Introduction to other NOSQL based data models
  • Drawbacks of Hadoop
  • Why Hadoop can't work for real time processing
  • How HBase or other NOSQL based tools make real time processing possible on top of Hadoop
  • HBase table and column family structure
  • HBase versioning concept
  • HBase flexible schema
  • HBase Advanced
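
The table, column family, versioning, and flexible schema topics above can be pictured with a toy in-memory model. Column families are fixed when the table is created, column qualifiers can be added freely per row, and each cell keeps several timestamped versions. The class name TinyTable and the max_versions value are hypothetical, and this is not the HBase client API.

```python
class TinyTable:
    """Toy model: row key -> "family:qualifier" -> [(timestamp, value), ...]."""

    def __init__(self, column_families, max_versions=3):
        self.column_families = set(column_families)
        self.max_versions = max_versions  # illustrative limit per cell
        self.rows = {}

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell; the family must already exist."""
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row_key, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(reverse=True)        # newest version first
        del cells[self.max_versions:]   # drop versions beyond the limit

    def get(self, row_key, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        cells = self.rows.get(row_key, {}).get(column, [])
        return [value for _, value in cells[:versions]]


if __name__ == "__main__":
    table = TinyTable(["info"])
    table.put("user1", "info:city", "Toronto", timestamp=1)
    table.put("user1", "info:city", "Brampton", timestamp=2)
    table.put("user1", "info:phone", "555-0100", timestamp=2)  # new qualifier, no schema change
    print(table.get("user1", "info:city"))
    print(table.get("user1", "info:city", versions=2))
```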

Zookeeper and SQOOP

  • Introduction to Zookeeper
  • How Zookeeper helps in Hadoop Ecosystem
  • How to load data from Relational storage in Hadoop
  • Sqoop basics
  • Sqoop practical implementation
  • Sqoop alternative
  • Sqoop connector
  • Quick revision of previous classes to fill gaps in your understanding and correct misunderstandings

Flume, Oozie and YARN

  • How to load data in Hadoop that is coming from web server or other storage without fixed schema
  • How to load unstructured and semi structured data in Hadoop
  • Introduction to Flume
  • Hands-on on Flume
  • How to load Twitter data in HDFS using Hadoop
  • Introduction to Oozie
  • How to schedule jobs using Oozie
  • What kind of jobs can be scheduled using Oozie
  • How to schedule jobs which are time based
  • Hadoop releases
  • Where to get Hadoop and other components to install
  • Introduction to YARN
  • Significance of YARN

Hue, Hadoop Releases comparison, Hadoop Real time scenarios Level-2 (Complex)

  • Introduction to Hue
  • How Hue is used in real time
  • Hue Use cases
  • Real time Hadoop usage
  • Real time cluster introduction
  • Hadoop Release 1 vs Hadoop Release 2 in real time
  • Hadoop real time project
  • Major POC based on combination of several tools of Hadoop Ecosystem
  • Comparison between Pig and Hive real time scenarios
  • Real time problems and frequently faced errors with solution

SPARK and Scala Level-1 (Basics)

  • Introduction to Spark
  • Introduction to Scala
  • Basic features of Spark and Scala available in Hue
  • Why demand for Spark is increasing in the market
  • How we can use Spark with the Hadoop Ecosystem
  • Datasets for practice purpose

SPARK and Scala Level-2 (Complex)

  • Spark use cases with real time scenarios
  • Spark Practical with advanced concepts
  • Scala platform with complex use cases
  • Real time project use cases examples based on Spark and Scala
  • How we can reduce?

Mississauga: Weekend Batch Start Date 06-May from 10am-2pm.
Mississauga: Weekdays Batch Start Date 08-May from 10am-2pm.

Brampton: 199 Advance Blvd, Suite 201
          Brampton, ON L6T 4N2

Toronto: 5635 Yonge St. Unit 210
         North York (Finch Subway), ON M2M 3S9

Mississauga: 1065 Canadian Place, Suite 201
             Mississauga, ON L4W 0C2

Phone: 416-623-9493 or 905-487-4500
E-mail: training@globalerp.ca
