Course Description
This course teaches you how to write Apache Storm programs that take streaming data from tools like Kafka and Twitter in real time, process it in Storm, and save the results to tables in Cassandra or files in Hadoop HDFS. You will be able to develop distributed stream processing applications that process streaming data in parallel and handle failures. You will be able to implement data transformations like maps and filters in Apache Storm, as well as stateful stream processing and exactly-once processing. The course also covers administrative aspects such as setting up an Apache Storm cluster, scheduling, monitoring, and metrics reporting.
This is a hands-on course: you will develop many Apache Storm programs using the Eclipse IDE and Java. Theory is intermixed with practice so that you implement what you learn as a developer. You will write more than thirty programs during this course.
The only way to learn a new tool quickly is to practice by writing programs. This course provides the right mix of theory and practice, along with real-life industry uses of Apache Storm. By enrolling in this course, you will be on a journey to becoming a big data developer using Apache Storm.
What am I going to get from this course?
Implement Apache Storm programs that take real-time streaming data from tools like Kafka and Twitter, process it in Storm, and save the results to tables in Cassandra or files in Hadoop HDFS. You will be able to develop distributed stream processing applications that process streaming data in parallel and handle failures. You will be able to implement stateful stream processing, data transformations like maps and filters, and exactly-once processing.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
- Experience developing software projects
- Some programming experience in Java
- Familiarity with a Java IDE such as Eclipse or IntelliJ
Who should take this course? Who should not?
Real-time big data processing tools have become mainstream, and many organizations have started processing big data in real time. Apache Storm is one of the most popular tools for processing big data in real time. If you are familiar with Java, you can easily learn Apache Storm programming to process streaming data in your organization. Through this course, I aim to give you a working knowledge of Apache Storm so that you can write distributed programs to process streaming data.
Curriculum
Lecture 2
Course Prerequisites
Lecture 3
Course Structure
Lecture 4
Data Sizes in Big Data
Lecture 5
Big Data Problem
Lecture 6
Traditional Solution
Lecture 7
Big Data Solution
Lecture 8
Demo and practice activity: Install Eclipse
Download, install and start Eclipse
Lecture 9
Download the training programs
Download the training program zip file. Create a directory C:\storm in Windows and unzip the training program zip file into that directory. This creates three directories, input, output and training, and copies the files into them. The output folder starts empty.
Lecture 10
Demo and practice activity: Create a maven project in Eclipse
Create a maven project and set build path
Lecture 11
Demo and practice activity: Add Apache Storm programs to Eclipse project
Add the training programs provided to the created Eclipse project
Lecture 12
Demo and practice activity: Compile the storm program in Eclipse
Correct any mistakes, adjust the build path, and create a run configuration to run the program
Lecture 13
Demo and practice activity: Run the Apache Storm program from Eclipse
Using the run configuration, run the Storm program in the local cluster and see the results.
Module 2: Introduction to Apache Storm
Lecture 16
Storm Features
Lecture 18
Storm Architecture
Lecture 19
Storm Data Model
Lecture 20
Storm Topology
Lecture 21
Storm Topology Simple Example
Lecture 22
Demo and practice activity: Create a simple Apache Storm program
In this demo, practice creating a simple Storm program using the sample program provided, and run it in the local cluster to see the results.
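A classic first topology is word count: a spout emits sentences, a split bolt turns each sentence into word tuples, and a count bolt keeps running totals per word. As a hedged sketch, here is that per-tuple logic in plain Java, with the Storm API calls noted in comments; the class and method names are my own, not the course's:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names: sketches the per-tuple work of a word-count topology.
public class WordCountLogic {
    private final Map<String, Integer> counts = new HashMap<>();

    // What a split bolt's execute() would do with each sentence tuple:
    // produce one word per whitespace-separated token.
    public static String[] split(String sentence) {
        return sentence.toLowerCase().trim().split("\\s+");
    }

    // What a count bolt's execute() would do with each word tuple:
    // update and report the running count for that word.
    public int count(String word) {
        return counts.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        WordCountLogic logic = new WordCountLogic();
        for (String sentence : new String[] {"the quick fox", "the lazy dog"}) {
            for (String word : split(sentence)) {
                System.out.println(word + " -> " + logic.count(word));
            }
        }
    }
}
```

In the real topology, each step would instead live in a bolt's `execute()` method and emit tuples to the downstream collector.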
Lecture 23
Storm Topology: Case Study 1
Lecture 24
Demo and practice activity: Implement the case study as Apache Storm program
Implement the case study 1 program in Eclipse and run the program to see the results
Lecture 25
Storm Topology: Case Study 2
Lecture 26
Demo and practice activity: Implement the case study 2 program
Implement the case study 2 program in Eclipse, run and see the results.
Lecture 28
Demo and practice activity: Implement the periodic processing in Storm with tick tuples
Use tick tuples to implement the Apache Storm program for periodic processing. Run and see the results.
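Tick tuples arrive on Storm's system stream at a frequency set by the `topology.tick.tuple.freq.secs` configuration, and a bolt usually checks for them at the top of `execute()`. A minimal sketch of that check, with Storm's system component id ("__system") and tick stream id ("__tick") inlined as string literals so the snippet is self-contained:

```java
// Self-contained sketch of the standard tick-tuple check a bolt performs.
public class TickTupleCheck {
    // Mirror of Constants.SYSTEM_COMPONENT_ID and SYSTEM_TICK_STREAM_ID in Storm.
    static final String SYSTEM_COMPONENT = "__system";
    static final String TICK_STREAM = "__tick";

    // In a real bolt: isTick(tuple.getSourceComponent(), tuple.getSourceStreamId())
    public static boolean isTick(String sourceComponent, String sourceStreamId) {
        return SYSTEM_COMPONENT.equals(sourceComponent)
            && TICK_STREAM.equals(sourceStreamId);
    }
}
```

When the check returns true, the bolt runs its periodic work (flush, aggregate, report) instead of normal tuple processing.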
Lecture 30
Practice activity: Write the Storm programs for the five assignments and run them with the data provided
Five assignments are described in the document. Modify the programs in this section to complete the assignment programs and run them. Sample programs are provided to help with a few of the assignments; download them from the download section. Look at them only if you have trouble completing the assignments. Make sure to run them and check the results before you move on to the next section.
Module 3: Storm Installation & Configuration
Lecture 32
Storm Environment Setup
Lecture 33
Install Zookeeper
Lecture 34
Storm Download
Lecture 35
Starting Storm Servers
Lecture 37
Demo and practice activity: Create a thin jar in Eclipse
Use this demo to create a thin jar that you can use to run your Apache Storm programs. The Maven build in Eclipse can be used to build the thin jar.
Lecture 38
Demo and practice activity: Create a fat jar in Eclipse
Create a fat jar for the Storm program so that it includes the dependent libraries.
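A common way to build a fat jar with Maven is the maven-shade-plugin; a minimal pom.xml fragment might look like the following (the plugin version is illustrative, so check for the current release). Note that the Storm library itself is usually declared with `provided` scope so the cluster's own Storm jars are not bundled in:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Running `mvn package` then produces a jar containing your classes plus all compile-scope dependencies.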
Lecture 39
Submitting a Job to Storm
Lecture 41
Storm Topology
Lecture 43
Using Eclipse for Storm Programs
Lecture 44
Setup a Storm Cluster
Lecture 46
Practice activity: Perform the five activities specified
Practice what you have learned in this section by completing the practice activities. A sample program is provided for one of the activities in the download section.
Module 4: Storm Classes & Groupings
Lecture 48
Bolt Parallelism
Lecture 49
Stream Grouping
Lecture 50
The Fields Class
Lecture 51
Storm Classes and Interfaces
Lecture 52
IRichSpout Interface
Lecture 53
NextTuple Method
Lecture 54
IRichBolt Interface
Lecture 55
Building a Topology
Lecture 56
Declarer Interfaces
Lecture 57
Demo and practice activity: Shuffle grouping with multiple tasks
Lecture 58
Demo and practice activity: Fields grouping with multiple tasks
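Fields grouping routes each tuple by hashing the values of the chosen fields, so every tuple with the same value of, say, a word field reaches the same task. A hedged illustration of the idea (Storm's actual partitioning code differs in detail; this names and simplifies the scheme):

```java
// Sketch of the fields-grouping idea: same field value -> same task index.
public class FieldsGroupingSketch {
    public static int taskFor(String fieldValue, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }
}
```

This determinism is what lets a count bolt with multiple tasks keep correct per-word totals: all tuples for one word land on one task.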
Lecture 59
Normal Tuple Processing in Storm
Lecture 60
Demo and practice activity: Implement reliable processing in Storm
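Reliable processing hinges on the spout: it emits each tuple with a message id, remembers it until Storm calls `ack`, and replays it if Storm calls `fail`. A hedged, self-contained sketch of that bookkeeping (class and method shapes are my own; a real spout would call `collector.emit(values, messageId)` and pull from a real source):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a reliable spout's replay bookkeeping.
public class ReliableSpoutSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    public ReliableSpoutSketch(List<String> messages) { queue.addAll(messages); }

    // Like nextTuple(): take the next message, tag it with a fresh message id,
    // and keep it pending until acked. Returns null when the queue is empty.
    public Long nextTuple() {
        String msg = queue.poll();
        if (msg == null) return null;
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    public String messageFor(long id) { return pending.get(id); }

    // ack: the tuple tree completed, so forget the message.
    public void ack(long id) { pending.remove(id); }

    // fail: the tuple tree failed, so requeue the message for replay.
    public void fail(long id) { queue.addFirst(pending.remove(id)); }
}
```

Bolts complete the picture by anchoring their emits to the input tuple and acking it, so a failure anywhere in the tuple tree triggers the spout's replay.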
Lecture 62
Practice activity: Write the programs for the nine activities listed and run to check the output
The nine activities listed provide good practice for this section. Sample programs are provided for some of the activities in the download section. Look at them only after attempting the activities. Always run the programs, correct any mistakes, and check the output.
Module 5: Trident
Lecture 65
Trident Operations
Lecture 66
Case Study: Trident Operations
Lecture 67
Demo and practice activity: Implement Trident stream transformations
The previous case study is illustrated with the actual program in Eclipse. You are encouraged to create this program in Eclipse using the training program files provided, then run it and check the results.
Lecture 69
Partition Aggregate
Lecture 70
General Aggregator
Lecture 71
Repartitioning Operations
Lecture 72
Aggregate Operations
Lecture 73
Operations on Grouped Streams
Lecture 75
Trident Exactly Once Processing
Lecture 76
Case Study: Trident State Updates
Lecture 77
Demo and practice activity: Trident state implementation part 1 : Spout implementation
Trident state processing and exactly-once processing are quite complex to implement, so the implementation is illustrated step by step in multiple parts. I start by showing the spout that produces the batches of tuples.
Lecture 78
Demo and practice activity: Trident state implementation part 2: IBackingMap implementation
I continue here with the implementation of the IBackingMap interface in Trident. One part of this class, the multiGet method, is illustrated here.
Lecture 79
Demo and practice activity: Trident state implementation part 3: IBackingMap and StateFactory implementation
Here I cover the multiPut method of the IBackingMap implementation and continue with a simple implementation of StateFactory.
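IBackingMap boils down to two operations: multiGet fetches the current values for a batch of keys, and multiPut writes the updated values back once per batch, which is what makes exactly-once state updates possible. As a hedged, in-memory stand-in for the store-backed version built in the demos (Trident's real interface is generic over the state type and the class name here is my own):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for an IBackingMap<Long>: multi-field keys, count values.
public class InMemoryBackingMap {
    private final Map<List<Object>, Long> store = new HashMap<>();

    // multiGet: return the stored value (or null if absent) for each key, in order.
    public List<Long> multiGet(List<List<Object>> keys) {
        List<Long> out = new ArrayList<>();
        for (List<Object> key : keys) out.add(store.get(key));
        return out;
    }

    // multiPut: write each key's new value; Trident calls this once per batch.
    public void multiPut(List<List<Object>> keys, List<Long> vals) {
        for (int i = 0; i < keys.size(); i++) store.put(keys.get(i), vals.get(i));
    }
}
```

A production implementation would issue the same reads and writes against Cassandra or another store instead of a HashMap.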
Lecture 80
Demo and practice activity: Trident state implementation part 4: The main method implementation
Now that all the pieces are in place, it is time to connect them in the main method by creating the Trident topology and adding the spout and state processing to it.
Lecture 81
Demo and practice activity: Trident state implementation part 5: Run the Trident state processing program
It is finally time to see the fruits of our labor. Here I run the created program in the local cluster and see the results. Make sure you also follow the demo and run the program on your machine to check the results.
Lecture 83
Practice activity: Write the programs for the six activities listed and check the output
These six activities help you apply the Trident interface to stream processing in Apache Storm. Sample programs are provided for some of the activities; you can download them from the download section.
Module 6: Storm Scheduling
Lecture 85
Storm User Interface
Lecture 86
Storm Schedulers
Lecture 87
Isolation Scheduler
Lecture 88
Resource Aware Scheduler
Lecture 89
Resource Aware Scheduler: Example
Lecture 90
Default Configurations
Lecture 91
Metrics Reporting
Lecture 92
Configuration for Ganglia
Lecture 94
Practice activity: Perform the two activities listed in this section
Perform the two activities listed in this section by modifying the existing programs. A sample modified file is provided; you can download it from the download section.
Lecture 95
Demo: Monitor multiple topologies using Storm User Interface
Look at multiple topologies, including a reliable topology and a Trident topology, in the Storm UI.
Module 7: Storm Interfaces
Lecture 98
Storm Kafka Spout Example
Lecture 99
Compiling for Kafka
Lecture 100
Demo and practice activity: Setup and start Zookeeper and Kafka servers
To illustrate the Storm interface to Kafka, let us first set up Zookeeper and Kafka and start the servers.
Lecture 101
Demo and practice activity: Create a new topic in Kafka
Create a topic in Kafka so that Storm can receive messages from this topic.
Lecture 102
Demo and practice activity: Start Kafka producer
Start the Kafka console producer process, which takes the typed messages and sends them to the Kafka topic that Storm will read from.
Lecture 103
Demo and practice activity: Storm Program for interfacing with Kafka
Here you can look at the Apache Storm program that uses a Kafka client spout to connect to a Kafka topic, receive the messages, and print them out.
Lecture 104
Demo and practice activity: Run the program and see flow of messages from Kafka to Storm
Here you will start the Storm program that interfaces with Kafka. Messages entered into the Kafka producer will appear in the Storm output.
Lecture 106
Setting Properties for Cassandra
Lecture 107
Writing to Cassandra Table
Lecture 108
Real Time Data Analytics Platform
Lecture 109
Demo and practice activity: Setup and start Cassandra server
Install Cassandra and start the Cassandra server
Lecture 110
Demo and practice activity: Create a keyspace and table in Cassandra
Create a keyspace in Cassandra and create a table in it to receive the data from Storm.
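The exact names depend on the course files, but the CQL shape is as follows (keyspace and table names here are hypothetical; a single-node development setup typically uses SimpleStrategy with a replication factor of 1):

```sql
-- Hypothetical keyspace and table for messages arriving from Storm.
CREATE KEYSPACE IF NOT EXISTS storm_demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS storm_demo.messages (
  id   uuid PRIMARY KEY,
  body text
);
```

These statements are run in the cqlsh shell before starting the Storm topology that writes to the table.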
Lecture 111
Demo and practice activity: Look at the Storm program that takes messages from Kafka and stores to table in Cassandra
Here I illustrate the real-time data analytics platform with an Apache Storm program that takes messages from a topic in Kafka and stores them as rows in a Cassandra table in real time.
Lecture 112
Demo and practice activity: Run the Kafka-Storm-Cassandra interface program to see the flow of data from Kafka to Cassandra table
Finally, the real-time data analytics platform is illustrated by running the Storm interface program. You can enter messages for the Kafka topic in one console window and see the data updated in the Cassandra table in another.
Lecture 113
Example Writing to HDFS
Lecture 114
Demo and practice activity: Create the program to store data into Hadoop HDFS from Kafka
The program illustrates reading data from a Kafka topic and writing it to files in an HDFS directory.
Lecture 115
Interfacing with Twitter
Lecture 116
Setting Authorization
Lecture 117
Demo and practice activity: Create the program for getting tweets from Twitter in Apache Storm
This program illustrates using Twitter4J to get data from Twitter and process the tweets in Storm. The link for creating a Twitter developer account and getting Twitter credentials is provided in the download section.
Lecture 118
Demo and practice activity: Run the twitter interface program and look at the live tweets
The program filters the tweets in real time for certain keywords and displays them. You can also run the program by providing your Twitter credentials, which can be obtained from the link provided.
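The filtering step itself is simple string matching; as a hedged sketch, the per-tweet check a filter bolt could apply looks like this (the class and method names are my own, not from the course program):

```java
import java.util.List;

// Hypothetical sketch of a keyword filter a tweet-processing bolt might apply.
public class TweetFilter {
    // True if the tweet text contains any tracked keyword, case-insensitively.
    public static boolean matches(String tweetText, List<String> keywords) {
        String lower = tweetText.toLowerCase();
        for (String k : keywords) {
            if (lower.contains(k.toLowerCase())) return true;
        }
        return false;
    }
}
```

In the demo's topology, tuples failing this check would simply not be emitted downstream.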
Lecture 120
Practice activity: Write the programs for the seven activities listed in this section and check the results
The seven activities in this section let you practice the Storm interfaces. Go through the demos multiple times to practice the Kafka and Cassandra commands as well. Sample programs are provided for some of the activities; you can download them from the download section.
Lecture 121
Course Conclusion
Course summary and next steps
Quiz 1
Big Data - Storm Intro - Installation
Quiz 2
Installation - Classes & Groupings
Quiz 3
Scheduling & Monitoring - Interfaces - Trident