Course Description
The bootcamp will cover different aspects of Data Science with hands-on exercises and industrial use-cases. Major modules include Python for Data Science, Data Analysis, Practical Machine Learning, Large Scale Machine Learning, Deep Learning, and Business Perspectives of Data Science. This well-structured course is offered by the Data Science Initiative (DSI), a team of industry and academic experts who have run 10 hands-on workshops in the past with an excellent rating of 4.5/5.0.
What am I going to get from this course?
- Perform effective data analysis, which accounts for 50% to 80% of Data Science work
- Investigate and present your work in an effective visual form
- Understand the most popular machine learning algorithms and their uses
- Implement end-to-end machine learning pipelines for given use-cases
- Implement large-scale machine learning algorithms on cloud and through APIs
- Employ deep learning for unstructured data (Images and Text)
- Build a small blockchain
- Differentiate real data science from the buzz around it
- Understand the Data Science business ecosystem and identify the right use-cases
Prerequisites and Target Audience
What will students need to know or do before starting this course?
Participants may want to brush up on some basic concepts of probability, matrices, and programming (any language).
Who should take this course? Who should not?
You should join this Bootcamp if you are among:
- Business or IT professionals who wish to transition into a data scientist role
- Early-stage data scientists and data analysts who wish to learn end-to-end data science from a team of industry experts
- Students and fresh graduates who want to pursue a career in Data Science, Machine Learning, and Artificial Intelligence
- Researchers from any field who work with data and would like to employ Machine Learning in their research
- Non-Python Data Scientists who would like to go through a quick hands-on transition to Python
Curriculum
Module 1: Data Science Landscape
Lecture 1
Data Science Landscape - Overview
This sub-module will enable participants to differentiate real data science from the buzz around it and develop a solution-oriented mindset. It is scheduled at the beginning of the workshop.
• The four building blocks of Data Science
• Common mistakes and best practices for data scientists
• Relation of Data Science with AI, Machine Learning, Deep Learning, and other fields
• Technology stack and choosing the best technology
• An overview of industrial applications and their requirements
Lecture 2
Data Science from Business Perspective
Scheduled in the middle of the workshop, this module will introduce participants to the practical requirements of business use-cases and to best practices when following CRISP-DM or the garage approach.
• Identifying and prioritizing data science use-cases within an organization
• Translating business problems into ML problems
• Analytics-roadmap for organizations
• Mapping of Design Thinking to Data Science
• Introduction to and requirements for ML project P2, which follows this session
Lecture 3
Data Science Roles and Methods
At the end of the workshop, when the participants have gone through the end-to-end data science journey, this module will help them understand which role fits them best, the best practices for that role, and how to enter and excel in it.
• Data Science value perspective
• Data Science teams and organizations
• Data Science roles (e.g. Data Engineer, Analyst, Data Scientist, ML Engineer, etc.)
• Methods such as CRISP-DM, garage, scrum
• Communication with non-data scientists
• The non-technical skills needed for excellence in Data Science
Module 2: Exploratory Data Analysis with Python
Lecture 4
Python Programming for Data Science
Python is an easy-to-understand scripting language, yet its compact programming style and vast number of libraries make it a challenge for learners to focus on what matters most. With a carefully designed content trajectory, this hands-on module will give participants a good basis in Python for the rest of the workshop, which then covers the most relevant libraries and methods.
• Why Python is the most popular language for Data Scientists
• Introduction to Python as a language
• Python native data structures including lists, sets, dictionaries, and tuples
• NumPy and SciPy
• Control Structures
• Functions and Classes
• Hands-on Exercises in Python
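As a small illustration of the native data structures and functions listed above, the sketch below works on a made-up list of temperature readings. All names and values are hypothetical and are not taken from the course material.

```python
# Hypothetical sample data, invented for illustration only.
readings = [("berlin", 21.5), ("paris", 19.0), ("berlin", 23.5), ("rome", 27.0)]

# List: ordered and mutable, built here with a list comprehension.
temperatures = [temp for _, temp in readings]

# Set: unordered collection of unique items.
cities = {city for city, _ in readings}

# Dictionary: maps each city to the list of its readings.
by_city = {}
for city, temp in readings:
    by_city.setdefault(city, []).append(temp)

# Tuple: fixed-size, immutable record holding (minimum, maximum).
temp_range = (min(temperatures), max(temperatures))


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Dictionary comprehension combining the pieces above.
city_means = {city: mean(temps) for city, temps in by_city.items()}
```

Lists keep order and duplicates, sets drop duplicates, dictionaries give keyed lookup, and tuples suit small fixed records; choosing among them is mostly a question of which of those properties the task needs.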
Lecture 5
Exploratory Data Analysis with Python
EDA is what consumes most of a Data Scientist's time (50% to 80%). This module will equip participants with the best techniques for efficient and effective EDA.
• Introduction to EDA, common mistakes, best practices
• Introduction to Pandas library
• DataFrame and Series data structures in pandas
• Reading data from different sources (CSV, web, Excel, JSON, SQL, TXT, etc.)
• DataFrame operations e.g. filtering, filling, merging, conditioning, aggregation
• Summarization, outlier detection, and bird's-eye-view reporting
• Map-reduce for efficient operations on DataFrames
• EDA hands-on exercises
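The DataFrame operations named above (filling, filtering, merging, aggregation) can be sketched on two tiny tables. The tables, column names, and the threshold of 50 are assumptions made up for this example; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 3],
    "amount": [120.0, None, 80.0, 40.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "retail", "corporate"],
})

# Filling: replace the missing amount with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Filtering: keep orders with an amount of at least 50.
large_orders = orders[orders["amount"] >= 50]

# Merging: attach the customer segment to each remaining order.
merged = large_orders.merge(customers, on="customer_id", how="left")

# Aggregation: total and mean order amount per segment.
summary = merged.groupby("segment")["amount"].agg(["sum", "mean"])
```

Filling with the median is only one possible choice; whether to fill, drop, or flag missing values depends on the use-case.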
Lecture 6
Data Visualization
Data visualization not only helps communicate results and findings to others but is equally important for data scientists themselves in order to understand the data. This module will enable participants to develop quick and attractive visualizations using Python libraries.
• Good and bad types of visualizations
• Practical working with Matplotlib – making any visualization
• Practical working with Seaborn – making attractive statistical visualizations
• Practical working with Plotly – making and deploying interactive and pretty visualizations
Lecture 7
Data Analysis Project
This module will allow participants to test and improve the skills developed so far, i.e. EDA with pandas and the visualization libraries in Python. A business use-case will be discussed as a reference case for the exercises.
• Use-case and problem statements
• Data loading and merging
• Data analysis
• Data cleaning
• Data exploration
• Data visualization
Module 3: Practical Machine Learning
Lecture 8
Basics of Machine Learning
This module will provide a theoretical understanding of machine learning algorithms, how they work, and their advantages and limitations, thereby demystifying them for participants. The eventual hope is that participants will be able to design their own ML solutions if needed.
• Types of Machine Learning algorithms and application scenarios
• Classification algorithms (e.g. Naïve Bayes, Decision Trees, KNN, ANN, Support Vector Machines)
• Regression algorithms (e.g. Linear/Ridge/Lasso Regression and the regression cousins of the classification algorithms above)
• Ensemble methods (e.g. Random Forests, Gradient Boosted Trees)
• Outlier detection (e.g. One-Class SVM, auto-encoders)
• Clustering algorithms (e.g. K-Means, DBSCAN, Hierarchical clustering)
• Feature Selection and Dimensionality reduction (e.g. PCA, LDA, RFE and other techniques)
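To make the working of one of the listed classification algorithms concrete, here is a minimal from-scratch sketch of K-Nearest Neighbours (KNN) using only the standard library. The function name, the toy points, and the choice of k=3 with Euclidean distance are assumptions for illustration, not the course's own implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(points, labels, (2.0, 2.0))
prediction_b = knn_predict(points, labels, (7.5, 8.0))
```

KNN has no training step beyond storing the data, which makes it easy to reason about but slow at prediction time on large datasets; that trade-off is typical of the advantages and limitations this module discusses.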
Lecture 9
Practical Machine Learning with Python
Using Scikit-learn as the main library, this module will enable participants to apply the machine learning process through model selection, model building, parameter optimization, and evaluation.
• Introduction to Scikit-learn
• Exploration of built-in datasets
• Building an ML model using Scikit-learn
• Splitting data into training, validation, and testing
• Cross-validation techniques
• Hyperparameter search using GridSearchCV
• ML exercises and assignment
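The steps listed above (built-in dataset, train/test split, cross-validation, hyperparameter search with GridSearchCV) might look roughly like the following. The choice of the iris dataset, the KNN classifier, and the grid values are assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Built-in dataset shipped with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as a final test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=5)

# Hyperparameter search; the grid values are arbitrary illustrative choices.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
```

Here the cross-validation folds inside GridSearchCV play the role of the validation data, so the test set is touched only once, after the hyperparameters are fixed.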
Lecture 10
ML Project – Mobility Prediction in Metropolis
This is the first of two projects in Machine Learning. The objective of this project is to enable participants to apply the knowledge gained so far of the end-to-end data science process (EDA + ML) to a real-world scenario.
• Quick introduction to the use-case
• Loading data from different sources
• EDA and Data cleaning
• Building machine learning models
• Validation and testing of models
• Debugging machine learning models with respect to overfitting and underfitting
Lecture 11
Advanced ML with Scikit-Learn
The primary objective of this module is to enable participants to use Pipelines as an end-to-end machine learning construct. The module will also cover feature selection and dimensionality reduction in more detail, with exercises.
• Curse of Dimensionality
• Feature selection methods (Univariate and multivariate methods)
• Dimensionality reduction exercises (PCA, LDA, RFE, etc.)
• Concept of transformations and operations in Python
• Introduction to Machine Learning Pipelines in Scikit-learn
• Building 2-step, 3-step, and k-step ML Pipelines (e.g. Feature Selection + Feature Engineering + Classification)
• Pipelines and GridSearchCV
• Exercises on all of the above
• The untold truth of Machine Learning
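A 3-step Pipeline combined with GridSearchCV, as outlined above, can be sketched as follows. The specific steps chosen here (scaling, PCA, logistic regression), the breast cancer dataset, and the grid values are assumptions for illustration; the module's own exercises may use different steps.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three-step pipeline: scaling -> dimensionality reduction -> classification.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Step parameters are addressed as "<step name>__<parameter name>".
param_grid = {
    "reduce__n_components": [2, 5, 10],
    "classify__C": [0.1, 1.0, 10.0],
}

# Every fold refits the whole pipeline, so scaling and PCA never see validation data.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
```

Searching over the pipeline rather than the bare classifier is what keeps preprocessing inside the cross-validation loop and avoids leaking information from validation folds into the fitted transformers.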
Lecture 12
ML Project – Customer Churn Prediction
This is the second ML project, with the objective of exercising a full CRISP-DM cycle, i.e. domain understanding + data understanding + data analysis + model building + model optimization + model deployment (as a pipeline).
• Domain understanding from a problem statement
• Exploratory Data Analysis
• Feature Engineering and Feature Selection
• Building machine learning models
• Validation and testing of models
• Doing it all within a Machine Learning Pipeline
• Applying CRISP-DM
Module 4: Data Science at Scale
Lecture 13
Machine Learning for Large Scale Applications
• Building ML Apps with REST APIs using Flask
• Introduction to Apache Spark
• Machine Learning on Spark – A Hands-on session
• Introduction to Cloud-based Artificial Intelligence
• Architecture of large-scale AI Applications
Lecture 14
Building Decentralized Applications: Blockchain Network
This module will enable participants to understand and experiment with the concepts of decentralized applications.
• Understanding Blockchain
• Distributed Ledgers
• Cryptocurrencies
• Blockchain Potential: Use cases
• Building a Blockchain application with Python (hands-on)
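The core idea of a blockchain, blocks linked by hashes so that tampering becomes detectable, fits in a few lines of standard-library Python. This is a deliberately reduced sketch: the block fields and function names are assumptions, and it leaves out proof-of-work, networking, and consensus, which a real decentralized application would need.

```python
import hashlib
import json


def hash_block(block):
    """Return the SHA-256 hex digest of a block's contents (excluding its own hash)."""
    payload = {key: block[key] for key in ("index", "data", "previous_hash")}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def add_block(chain, data):
    """Append a new block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "previous_hash": previous_hash}
    block["hash"] = hash_block(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every stored hash and every link to the preceding block."""
    for position, block in enumerate(chain):
        if block["hash"] != hash_block(block):
            return False
        expected_previous = chain[position - 1]["hash"] if position else "0" * 64
        if block["previous_hash"] != expected_previous:
            return False
    return True


# Hypothetical ledger entries, invented for illustration only.
chain = []
add_block(chain, "genesis")
add_block(chain, {"from": "alice", "to": "bob", "amount": 5})
add_block(chain, {"from": "bob", "to": "carol", "amount": 2})
```

Because each block stores the hash of its predecessor, changing the data in any earlier block makes `is_valid` return `False` unless every later block is recomputed as well.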
Deep Learning is probably the most talked-about area in Data Science. This module will equip participants with the understanding and practical experience of building a deep neural network using Keras/TensorFlow.
• Types and applications of Neural Networks
• Multi-layer Backpropagation Networks
• Activation functions (Sigmoid, Tanh, ReLU, etc.)
• Introduction to Convolutional Neural Network
• Introduction to Keras
• Development of Image classification using Deep Neural Network
• Development of Image classification using Convolutional Neural Network
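The activation functions named above, and the way a single fully connected layer applies them, can be sketched in plain Python. This uses the standard library rather than Keras, and the layer weights are made-up values for illustration; it is not numerically hardened for very large inputs.

```python
import math


def sigmoid(x):
    """Squash x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash x into the open interval (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def dense_forward(inputs, weights, biases, activation):
    """Forward pass of one fully connected layer: activation(w . x + b) per unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hypothetical layer with 2 inputs and 2 units.
hidden = dense_forward(
    inputs=[1.0, -2.0],
    weights=[[0.5, -0.25], [-1.0, 1.0]],
    biases=[0.0, 0.5],
    activation=relu,
)
```

With these values the first unit computes 0.5 + 0.5 + 0.0 = 1.0 and the second computes -1.0 - 2.0 + 0.5 = -2.5, which ReLU clamps to 0.0. Stacking such layers and learning the weights by backpropagation is what a framework like Keras automates.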