Course Description
This Big Data online training gives you the background needed to start doing analyst work on Big Data. It covers Big Data basics, Hadoop basics, and tools such as Hive and Pig, which let you load large data sets onto Hadoop, run SQL-like queries over them with Hive, and do analysis and data wrangling with Pig.
This online Big Data training also teaches Machine Learning basics and Data Science using R, and briefly covers Mahout, a recommendation and clustering engine for large data sets.
The course includes hands-on exercises with Hadoop, Hive, Pig and R, with some examples of using R for Machine Learning and Data Science work.
What am I going to get from this course?
- Students will get a good idea of the Big Data landscape and learn the basics of Big Data, Hadoop and HDFS.
- Students will also learn to use tools like Hive and Pig, both in theory and hands-on.
- Students will learn some R and SparkR (an R interface to Spark, a big data processing framework).
- Students will learn about Mahout, and about Data Science and where it is used.
- Students will learn the basics of some Data Science algorithms, such as Decision Trees, Naive Bayes and clustering algorithms, and do hands-on work with them.
- Students will learn about tools and solutions for running R on Hadoop.
- Students will also learn how to use Hadoop virtual machines on their laptops.
Curriculum
Module 1: Big Data Analytics Overview
01:02:40
Lecture 1
Introduction to the course and contents
Lecture 2
How Big Data Affects Our Daily Life
22:39
Lecture 3
Big Data Analytics Overview
16:43
Discusses the state of practice in analytics and the disruption under way
How Big Data is displacing traditional analytics
Lecture 4
Big Data Analytics Across Verticals
15:04
Discusses the use of Big Data in different verticals and in the newly evolving fields of IoT and cybersecurity, and why Big Data is essential to them
Module 2: Big Data Analytics with Hadoop
01:40:06
Lecture 5
What is Hadoop?
18:49
Motivation for Hadoop and Distributed Data Processing, new Architectures and History of Hadoop
Lecture 6
Hadoop - Key Platform Components and Architecture
15:21
Here we cover how Hadoop evolved into what it is today, its main components, and why Hadoop exists
Lecture 7
Hadoop Cluster
23:33
This section covers the details of a Hadoop cluster and why data splitting and data compression are essential for Hadoop
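To make the role of data splitting and compression concrete, here is a rough Python sketch (not taken from the course material). It assumes a 128 MB HDFS block size, which is configurable, and it treats "one map task per block" as an approximation only. The function names `block_count` and `estimate_map_tasks` are hypothetical and are not part of any Hadoop API.

```python
# Assumed block size of 128 MB; real clusters may configure a different value.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Number of blocks needed to store a file (integer ceiling division)."""
    if file_size_bytes <= 0:
        return 0
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def estimate_map_tasks(file_size_bytes, splittable=True,
                       block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Rough estimate of parallel map tasks for one input file.

    A file in a non-splittable compressed format (gzip is a common example)
    has to be read from start to finish by a single task, however many
    blocks it occupies.
    """
    blocks = block_count(file_size_bytes, block_size_bytes)
    if blocks == 0:
        return 0
    return blocks if splittable else 1


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print("blocks:", block_count(one_gb))
    print("tasks, splittable:", estimate_map_tasks(one_gb, splittable=True))
    print("tasks, not splittable:", estimate_map_tasks(one_gb, splittable=False))
```

The point of the sketch is the trade-off: compression saves storage and I/O, but a format that cannot be split limits how much of the cluster can work on one file at a time.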
Lecture 8
HDFS and MapReduce Architecture
15:58
This section covers in detail the 2 major components that make up Hadoop, HDFS and MapReduce, and their internals
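As an illustration of the MapReduce programming model, the sketch below runs the classic word count in plain Python on a single machine. It is an assumption-laden teaching aid, not course material and not Hadoop code: the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop stores big files"]))
```

Each map call only sees one line and each reduce call only sees one key, which is what lets a framework spread the work over a cluster.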
Lecture 9
Hadoop Ecosystem
12:41
In this section we cover the Hadoop ecosystem, deployment architectures, and the major Hadoop vendors, as well as when, where and how to use Hadoop deployments
Lecture 10
Installation Hands-on and Resources Download
13:44
Lecture 11
Hive Overview
15:14
In this section we discuss what Hive is, what it is not, and how it fits into the overall Hadoop architecture
Lecture 12
Hive Architecture
15:55
In this section we discuss the Hive architecture and basic Hive commands: how to create tables, the data types, and support for complex data types
Lecture 13
How to connect Tableau to Hive
14:42
In this section we see a basic demo of how to set up Tableau to connect to a Hive installation on your laptop or VM
Lecture 14
Hive Tables, Partitions and Data Formats
15:19
In this section we discuss Hive Tables, Data Formats and how to do data partitioning in Hive for better performance and scalability
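The idea behind partitioning can be sketched in a few lines of Python (an illustration only, not Hive itself and not course material). Rows are grouped under a directory-style name such as `country=US`, so a query that filters on the partition column only has to read the matching group. The function names and the sample column names are hypothetical.

```python
from collections import defaultdict


def write_partitioned(rows, partition_key):
    """Group rows under 'key=value' partition names.

    The partition column is kept in the partition name rather than in the
    stored rows, mirroring how a partitioned table keeps it in the directory
    name.
    """
    partitions = defaultdict(list)
    for row in rows:
        name = f"{partition_key}={row[partition_key]}"
        stored = {k: v for k, v in row.items() if k != partition_key}
        partitions[name].append(stored)
    return dict(partitions)


def read_partition(partitions, partition_key, value):
    """Read only the partition that matches the filter, skipping all others."""
    return partitions.get(f"{partition_key}={value}", [])


if __name__ == "__main__":
    sales = [
        {"country": "US", "amount": 10},
        {"country": "IN", "amount": 5},
        {"country": "US", "amount": 7},
    ]
    table = write_partitioned(sales, "country")
    print(sorted(table))
    print(read_partition(table, "country", "US"))
```

Choosing a partition column that queries commonly filter on is what yields the performance benefit; a filter on any other column still scans every partition.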
Lecture 15
Hive deeper details
12:36
In this section we cover more capabilities of Hive: functions and joins, other Hive queries, building UDFs, and importing and exporting data
Lecture 16
Hive Hands On Video
13:31
Hands-on video showing how to work with Hive, with a walkthrough of a sample example
Lecture 17
Pig Overview
14:16
In this section we discuss how Pig fits into the Hadoop Ecosystem and an introduction to Pig
How Pig works
What is Pig
What Pig is Not
Lecture 18
Pig Data Types and Operators
15:13
In this section we cover more operators and commands available in Pig and their usage with examples. This is the meat of Pig
Lecture 19
Pig Hands On
07:00
In this section we show a video of how to start using Pig, with some sample examples
Lecture 20
Deeper Into Pig - Some Advanced Things on Pig
20:21
More operators and advanced concepts in Pig
Module 5: Introduction to R
50:13
Lecture 21
What is R?
18:42
In this section we learn about the basics of the R Programming Language and the Data Exploration Capabilities of R
Lecture 22
Data Ingestion and Manipulation with R
13:01
In this section we learn the capabilities of R for doing basic Data Ingestion / Reading and Manipulation
Lecture 23
Data Visualization with R
18:30
Here we learn how to do some basic data visualization with R
Module 6: R with Big Data ( Hadoop and Spark )
46:45
Lecture 24
R with Big Data - 1
15:11
Here we cover how R and Big Data technologies have evolved and adapted to process large data sets using R language constructs, with MapReduce and Spark as the underlying engines that run the R code
Lecture 25
R with Big Data - 2
20:01
Here we cover how R and Big Data technologies have evolved and adapted to process large data sets using R language constructs, with MapReduce and Spark as the underlying engines that run the R code
Lecture 26
R with SparkR
11:33
See working examples of using R on Spark
Module 7: Fundamentals of Machine Learning
01:21:16
Lecture 27
Basics of Machine Learning
13:33
What Machine Learning and Data Science are, and where they are used
Lecture 28
Road to Data Science
13:37
In this section we discuss the kinds of skills and capabilities needed to become a data scientist
We discuss the Life cycle of Data Science projects
Everyday usage of Data Science based algorithms
Lecture 29
Basic Concepts and Terminology and their meaning
15:18
In this section we discuss basic Data Science concepts such as bias and variance, and why they are important for going further into this field
Lecture 30
Basic Concepts and Terminology and their meaning
10:52
This is an additional section to the previous one, where we discuss more of the fundamental concepts and terminology of Machine Learning and Data Science and how they help us build the right algorithms
Lecture 31
Classification and Regression
09:42
In this section we look at the basics of Classification and Regression Algorithms
Lecture 32
Naive Bayes and Decision Trees
18:14
In this section we look at 2 of the most commonly used algorithms in the field of Data Science - Naive Bayes and Decision Trees
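For a feel of how Naive Bayes works, here is a minimal Python sketch for categorical features (the course itself uses R; this is an illustration, not the course's code). It assumes add-one (Laplace) smoothing, and the function names and the tiny weather-style data set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count labels and per-label feature values.

    samples is a list of (features_dict, label) pairs.
    """
    label_counts = Counter(label for _, label in samples)
    # value_counts[label][feature][value] = number of matching training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for features, label in samples:
        for name, value in features.items():
            value_counts[label][name][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values


def predict(model, features):
    """Return the label with the highest log-probability score."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Start from the prior P(label), then add one term per feature,
        # treating features as independent given the label.
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts[label][name][value]
            # Add-one smoothing keeps unseen values from zeroing the score.
            score += math.log((seen + 1) / (count + len(feature_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    data = [
        ({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "sunny", "windy": "no"}, "play"),
        ({"outlook": "rainy", "windy": "yes"}, "stay"),
        ({"outlook": "rainy", "windy": "no"}, "stay"),
        ({"outlook": "sunny", "windy": "yes"}, "play"),
    ]
    model = train(data)
    print(predict(model, {"outlook": "sunny", "windy": "no"}))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on data sets with many features.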
Module 8: Installation and Hands On Exercise
58:31
Lecture 33
Installation (RECAP)
13:44
In this section we will install and set up the VM
The zip file contains the following files
- Install.txt - Start here and follow the instructions (this has been tested on Windows 7 and Windows 10 laptops)
- Vagrant_README.md - Install.txt will tell you to refer to this file and follow its steps for installation and setup
- VagrantNotes.txt - This file tells you how to copy files from your laptop to the VM
These 2 files are to be used when setting up connectivity to Hive from Tableau
- TableauConnectToHive.png
- HortonworksHiveODBC64.msi
Lecture 34
Hands On working session with Hadoop and HDFS
09:34
Try this only after the VM has been installed and is working, and you are comfortable working with it
See the zip file, which has some very basic Hadoop and HDFS commands to help you get used to Hadoop. The zip file also contains a sample dataset (a text file) to use with your Hadoop commands
Lecture 35
Hands on Working session and Exercises with Hive (RECAP)
13:31
This lecture contains the resources to work with Hive Examples.
The zip file has all the examples and code for you to try out Hive and learn its commands and querying capabilities
The HivePigData.zip file has all the data you need to do the exercises
Lecture 36
Connecting Tableau to Hive (RECAP)
14:42
In this lecture we will see a demo of how to connect Tableau to Hive (just the connectivity part); visualizing data in Tableau is not part of the course
The zip file has the ODBC driver for connecting to Hive from Tableau and a screenshot of the setup; watch the video with this lecture to do the setup
ODBC Driver - HortonworksHiveODBC64.msi
Tableau to Hive Connection Setup - TableauConnectToHive.png
Lecture 37
Hands on Exercise with Pig (RECAP)
07:00
In this lecture we will do some sample exercises to learn Pig more deeply.
The zip file has the code and exercises to do with Pig
The HivePigData.zip file has the datasets we will be using.
Watch the video for a sample example of working with Pig
Resource 1
Code and Data Sets
This is not a lecture per se, but the R code and data sets for Module 10, where we learn Cluster Analysis, Decision Trees, Descriptive Statistics and a little bit of probability
Resource 2
DataSets to Download for Module 5
Resource 3
DataSets to Download for Module 5 - SparkR
Module 9: Apache Mahout Introduction
27:01
Lecture 38
Mahout Basics
11:18
This section discusses the basics of Mahout: where it started, where it is going, and the capabilities and algorithms it has built in for large-scale data science.
Lecture 39
Mahout Demo for Recommendation
15:43
This section shows how to run Mahout's recommendation engine out of the box, plus 1 configurable example developed by the instructor
Module 10: Data Analysis and Statistical Methods
01:10:20
Lecture 40
Cluster Analysis - Part 1
10:13
This section walks through the different ways of clustering data using out-of-the-box algorithms in R
Lecture 41
Cluster Analysis - Part 2
12:28
This section walks through the different ways of clustering data using out-of-the-box algorithms in R
Lecture 42
Statistical Method - Part 1
08:31
This section covers the descriptive statistics part of data analysis using R
Lecture 43
Statistical Method - Part 2
08:15
This section covers the basics of probability theory as part of data analysis using R
Lecture 44
Statistical Method - Part 3
07:26
This section covers the inferential statistics part of data analysis using R
Lecture 45
Decision Tree - Part 1
13:54
This section covers building decision trees using R, with sample examples and demos
Lecture 46
Decision Tree - Part 2
09:33
This section covers decision trees with the Random Forest algorithm in R