An industry-recognized certification lets you add this credential to your resume upon completion of all courses.

Sumit Pal, Instructor - Big Data Analyst

The instructor for this course has more than 22 years of experience in roles spanning companies from startups to enterprises. He has worked for Microsoft (SQL Server development team), Oracle (OLAP development team) and Verizon (as Director of Big Data Architecture). He currently consults for multiple clients, advising them on their data architectures and big data solutions, and does hands-on coding with Spark, Scala, Java, and Python. He is the author of the recently published book SQL on Big Data and has extensive experience building scalable systems across the stack, from the middle tier and data tier to visualization for analytics applications, using Big Data and NoSQL databases. He has deep expertise in database internals, data warehouses, dimensional modeling, and data science with Java, Python, and SQL.

Master the skills necessary to build a career in Big Data.

  • Get up to speed with big data technologies and start doing analyst work on massive data sets
  • Instructor experience: Microsoft SQL Server team (1996-1997), Oracle development team (1997-2004) and the Big Data team at Verizon Labs (2013-2015)
  • This big data online training prepares you for Cloudera's Business Analyst Certification

Duration: 10h 41m

Course Description

This Big Data online training gives you the background necessary to start doing analyst work on Big Data. It covers Big Data basics, Hadoop basics, and tools such as Hive and Pig, which let you load large data sets onto Hadoop, run SQL-like queries over them with Hive, and do analysis and data wrangling with Pig. The course also teaches Machine Learning basics and Data Science using R, and briefly covers Mahout, a recommendation and clustering engine for large data sets. It includes hands-on exercises with Hadoop, Hive, Pig and R, with some examples of using R for Machine Learning and Data Science work.

What am I going to get from this course?

  • Students will get a good idea of the Big Data landscape and learn the basics of Big Data, Hadoop and HDFS.
  • Students will learn to use tools such as Hive and Pig, both in theory and hands-on.
  • Students will learn some R and SparkR (an R interface to Spark, a big data processing framework).
  • Students will learn about Mahout, and about Data Science and where it is used.
  • Students will learn the basics of some Data Science algorithms, such as Decision Trees, Naive Bayes and clustering algorithms, and do hands-on work with them.
  • Students will learn about R on Hadoop: tools and solutions.
  • Students will learn how to use Hadoop virtual machines on their laptops.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

  • An interest in data, some SQL knowledge and general aptitude

Who should take this course? Who should not?

  • The course is open to anyone who would like to learn about Big Data tools and technologies, or who is interested in Data Science, its algorithms and where they are used.
  • It will be useful for Business Analysts, Managers and anyone interested in working with big data.


Module 1: Big Data Analytics Overview

Lecture 1 Introduction

Introduction to the course and contents

Lecture 2 How Big Data Affects Our Daily Life

Lecture 3 Big Data Analytics Overview

Discuss the state of practice in analytics, the disruption under way, and how Big Data is displacing traditional analytics

Lecture 4 Big Data Analytics Across Verticals

Discuss the use of Big Data in different verticals and in the newly evolving fields of IoT and cybersecurity, and why Big Data is so essential for them

Module 2: Big Data Analytics with Hadoop

Lecture 5 What is Hadoop?

Motivation for Hadoop and Distributed Data Processing, new Architectures and History of Hadoop

Lecture 6 Hadoop - Key Platform Components and Architecture

Here we cover how Hadoop evolved into what it is today, its main components, and the reason Hadoop exists

Lecture 7 Hadoop Cluster

This lecture covers the details of a Hadoop cluster and why data splitting and data compression are so essential for Hadoop

Lecture 8 HDFS and Map Reduce Architecture

This section covers in detail the 2 major components that make up Hadoop, HDFS and MapReduce, and their internals
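
To give a flavor of the MapReduce model this lecture describes, here is a minimal word-count sketch in plain Python. It is an illustration only, not course material and not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data on hadoop", "hadoop stores big files"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on different nodes, with input read from HDFS blocks; the sketch only shows how the three stages fit together.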

Lecture 9 Hadoop Ecosystem

In this section we cover the Hadoop ecosystem, deployment architectures and major Hadoop vendors, as well as when, where and how to use Hadoop deployments

Lecture 10 Installation Hands-on and Resources Download

Module 3: Hive

Lecture 11 Hive Overview

In this section we discuss how Hive fits into the overall Hadoop architecture, what Hive is and what it is not

Lecture 12 Hive Architecture

In this section we discuss the Hive architecture as well as basic Hive commands: how to create tables, data types and support for complex data types

Lecture 13 How to connect Tableau to Hive

In this section we see a basic demo of how to set up Tableau to connect to a Hive installation on your laptop or VM

Lecture 14 Hive Tables, Partitions and Data Formats

In this section we discuss Hive Tables, Data Formats and how to do data partitioning in Hive for better performance and scalability
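
As a rough illustration of why partitioning helps, the sketch below mimics in plain Python how Hive lays out a partitioned table as one `column=value` directory per partition, so that a query filtering on the partition column reads only the matching directory. It is not Hive code and not course material; the helper names (`partition_rows`, `scan`) and the sample `country` column are assumptions made for the example.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows under 'key=value' labels, mimicking Hive's partition directories."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{key}={row[key]}"].append(row)
    return dict(partitions)


def scan(partitions, key, value):
    """Read only the partition that matches the filter; return its rows and how many rows were read."""
    rows = partitions.get(f"{key}={value}", [])
    return rows, len(rows)


if __name__ == "__main__":
    # Hypothetical sample data: a tiny sales table partitioned by country.
    sales = [
        {"country": "US", "amount": 10},
        {"country": "IN", "amount": 7},
        {"country": "US", "amount": 3},
    ]
    parts = partition_rows(sales, "country")
    rows, rows_read = scan(parts, "country", "US")
    print(sorted(parts), sum(r["amount"] for r in rows), rows_read)
```

Without partitioning, the same filter would have to read every row in the table; with it, the rows for other countries are never touched.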

Lecture 15 Hive deeper details

In this section we cover more capabilities of Hive: functions and joins, other Hive queries, building UDFs, and importing and exporting data

Lecture 16 Hive Hands On Video

Hands-on video showing how to work with Hive, with a walkthrough of a sample example

Module 4: Pig

Lecture 17 Pig Overview

In this section we discuss how Pig fits into the Hadoop ecosystem and give an introduction to Pig: how Pig works, what Pig is and what Pig is not

Lecture 18 Pig Data Types and Operators

In this section we cover more operators and commands available in Pig and their usage with examples. This is the meat of Pig

Lecture 19 Pig Hands On

In this section we show a video of how to start using Pig, with some sample examples

Lecture 20 Deeper Into Pig - Some Advanced Things on Pig

More operators and advanced concepts in Pig

Module 5: Introduction to R

Lecture 21 What is R?

In this section we learn about the basics of the R Programming Language and the Data Exploration Capabilities of R

Lecture 22 Data Ingestion and Manipulation with R

In this section we learn the capabilities of R for doing basic Data Ingestion / Reading and Manipulation

Lecture 23 Data Visualization with R

Here we learn how to do some basic data visualization with R

Module 6: R with Big Data ( Hadoop and Spark )

Lecture 24 R with Big Data - 1

Here we cover how R and Big Data technologies have evolved and adapted to process large data sets using R language constructs, with MapReduce and Spark as the underlying engines that run the R code

Lecture 25 R with Big Data - 2

Here we cover how R and Big Data technologies have evolved and adapted to process large data sets using R language constructs, with MapReduce and Spark as the underlying engines that run the R code

Lecture 26 R with SparkR

See working examples of using R on Spark

Module 7: Fundamentals of Machine Learning

Lecture 27 Basics of Machine Learning

What Machine Learning and Data Science are, and where they are used

Lecture 28 Road to Data Science

In this section we discuss the skills and capabilities needed to become a data scientist, the life cycle of Data Science projects, and everyday uses of Data Science algorithms

Lecture 29 Basic Concepts and Terminology and their meaning

In this section we discuss basic Data Science concepts, such as bias and variance, and why they matter for going further in this field

Lecture 30 Basic Concepts and Terminology and their meaning

This lecture continues the previous one. We discuss more of the fundamental concepts and terminology of Machine Learning and Data Science and how they help us build the right algorithms

Lecture 31 Classification and Regression

In this section we look at the basics of Classification and Regression Algorithms

Lecture 32 Naive Bayes and Decision Trees

In this section we look at 2 of the most commonly used algorithms in the field of Data Science - Naive Bayes and Decision Trees

Module 8: Installation and Hands On Exercise

Lecture 33 Installation (RECAP)

In this section we will install and set up the VM. The zip file contains the following files:

  • Install.txt - Start here and follow the instructions (tested on Windows 7 and Windows 10 laptops)
  • Vagrant_README.md - Install.txt will also tell you to refer to this file and follow its steps for installation and setup
  • VagrantNotes.txt - Explains how to copy files from your laptop to the VM

These 2 files are to be used when setting up connectivity to Hive from Tableau:

  • TableauConnectToHive.png
  • HortonworksHiveODBC64.msi

Lecture 34 Hands On working session with Hadoop and HDFS

Try this only after the VM has been installed, is working, and you are comfortable using it. The zip file has some very basic Hadoop and HDFS commands for you to use to get used to Hadoop. It also includes a sample dataset (a text file) to use with your Hadoop commands

Lecture 35 Hands on Working session and Exercises with Hive (RECAP)

This lecture contains the resources for working with the Hive examples. The zip file has all the examples and code for you to try out Hive and learn its commands and querying capabilities. The HivePigData.zip file has all the data you need to do the exercises

Lecture 36 Connecting Tableau to Hive (RECAP)

In this lecture we see a demo of how to connect Tableau to Hive. It covers only the connectivity; visualizing data in Tableau is not part of the course. The zip file has the ODBC driver for connecting to Hive from Tableau and a screen image of the setup. Watch the video with this lecture to do the setup.

  • ODBC driver: HortonworksHiveODBC64.msi
  • Tableau to Hive connection setup: TableauConnectToHive.png

Lecture 37 Hands on Exercise with Pig (RECAP)

In this lecture we do some sample exercises to learn Pig more deeply. The zip file has the code and exercises for Pig. The HivePigData.zip file has the datasets we will be using. Watch the video for a sample example of how to learn Pig

Resource 1 Code and Data Sets

This is not a lecture per se but the code (R) and data sets for Module 10, where we learn cluster analysis, decision trees, descriptive statistics and a little probability

Resource 2 DataSets to Download for Module 5
Resource 3 DataSets to Download for Module 5 - SparkR

Module 9: Apache Mahout Introduction

Lecture 38 Mahout Basics

This section discusses the basics of Mahout: where it started, where it is going, and the capabilities and algorithms it has built in for large-scale data science.

Lecture 39 Mahout Demo for Recommendation

This section shows how to run Mahout's recommendation engine out of the box, along with 1 configurable example developed by the instructor

Module 10: Data Analysis and Statistical Methods

Lecture 40 Cluster Analysis - Part 1

This section walks through the different ways of doing Clustering of data using out of the box algorithms in R

Lecture 41 Cluster Analysis - Part 2

This section walks through the different ways of doing Clustering of data using out of the box algorithms in R

Lecture 42 Statistical Method - Part 1

This section covers the descriptive statistics part of data analysis using R

Lecture 43 Statistical Method - Part 2

This section covers the basics of probability theory as part of data analysis using R

Lecture 44 Statistical Method - Part 3

This section covers the inferential statistics part of data analysis using R

Lecture 45 Decision Tree - Part 1

This section covers building decision trees using R, with sample examples and demos

Lecture 46 Decision Tree - Part 2

This section covers decision trees using the Random Forest algorithm in R


10 Reviews

Ben J

January, 2017

Excellent course with right contents in terms of coverage and right amount of depth to get started, up and running. The trainer had done lot of hard work in building the right slides and right content which is appropriate for this extensive subject area

Martin R

January, 2017

Good bang for the buck - gets the trainee up to speed with the Big Data Analyst Skills in a short but comprehensive course. The course has a good balance of hands on and theoretical content

Sanjay M

February, 2017

One of the best courses on Big Data. I have been searching for something like this for a while. I have taken many other courses before but the way Sumit takes you to the journey of Big Data is quite unique. He starts off with a big picture and explains each and every aspect of Hadoop with hands on exercises. Highly recommended. Must for anyone to learn hadoop the right way.

Donald S

May, 2017

The course also helped me to learn big data processing framework and algorithms broadly. Particularly the use of Hadoop virtual machine was useful as I could easily do it on my laptop. Sure, the course can prepare us for Cloudera's business analyst certification.

Victor L

May, 2017

Before I enrolled for the course, I was not a confident big data analyst. However, the course has changed me a lot and made me more confident with a broad spectrum of big data technologies learning. I could gain excellent knowledge in the basics of big data, Hadoop, Pig, and Hive, apart from Machine learning and SQL. The exercises were so good that I gained practical knowledge in the work of data science. In addition, one can learn much more on the usage of big data in different verticals and newly evolving field of IOT and Cybersecurity.

Sue B

May, 2017

Overall it is a very good course even for the experienced ones as they can brush up their knowledge with the developments in big data.

Pavithra S

July, 2017

I was uncomfortable with big data analytics. After enrolling in this course of study, I was a changed person with more confidence studying big data methodologies.

Peter B

July, 2017

The tests were so valuable that I had sound wisdom in the practice of data technique. Notably, the Hadoop virtual machine was helpful as I could comfortably deal with it on my computer.

Nishad D

July, 2017

I picked up great understanding in the rudiments of Hadoop, Machine learning, Hive and Pig, besides big data and SQL. The curriculum also pushed me to get up to speed on big data processing scheme and algorithms broadly.

Thomas B

July, 2017

The program prepares you for Cloudera's business analyst certification. Additionally, you get a deeper understanding of the management of big data in several verticals and newly expanding area of IOT and security. All in all, it is a rather useful course for skilled people as they can brush up on their familiarity with improvements in big data.