Course Description
Predictive Modeling is about predicting an outcome (the outcome or response variable) from a set of predictors. In this course, we work through an example in which we predict a customer's Loan_Status from historical data (the predictors).
Predictive Modeling is an attractive option because it can bring considerable value to an organization. However, it is also on the higher end of the complexity spectrum compared with other business analytics options.
The complexity of predictive modeling shows up mainly in the number of steps involved and the expertise that some of those steps demand. This course is designed to reduce or eliminate those obstacles.
What am I going to get from this course?
1. Understand the challenges of analytics.
2. Gain an intuitive understanding of Logistic Regression.
3. Perform a data audit.
4. Perform a Uni-variate and Bi-variate analysis of features/variables.
5. Deal with Missing Data (Multiple Imputation).
6. Make Adjustments for Oversampling.
7. Apply Dimension Reduction to your Categorical Inputs.
8. Perform Variable Clustering.
9. Perform Subset Selection (utilizing automatic techniques for variable/feature selection).
10. Analyze Multicollinearity in your Features.
11. Analyze Classifier Performance (ROC curves, Optimal cutoffs, K-S).
12. Partition your Data, Train a Model, and Make Predictions on Unseen Data.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
If you intend to follow along, you need access to software that lets you program in the SAS programming language. SAS Studio can be used for learning purposes. SAS Enterprise is available in the business community. I use WPS, which comes in several versions, including one for students (with a school email address).
Who should take this course? Who should not?
1. Companies that are interested in training their workforce in the different aspects of predictive modeling, including ultimately training a model and making predictions on unseen data (banking, insurance, etc.).
2. Companies that want their workforce to become familiar with the power of SAS programming.
3. If you have no experience with Base SAS programming or beginner statistics, I suggest gaining that knowledge first.
Curriculum
Lecture 1
Introduction (the objectives of the course)
I discuss the three main objectives of the course.
Lecture 2
Business Applications
How does Predictive Modeling get applied in the business world?
Lecture 3
Analytics Challenges
If predictive modeling is so great, why doesn't every company adopt it?
Lecture 4
The Major Steps in Predictive Modeling
In this video, I discuss the three major components of predictive modeling.
Lecture 5
Understanding Logistic Regression
I explain the intuition behind Logistic Regression.
Module 2: Understanding the Problem, Hypotheses, Analysis
Lecture 6
Problem Statement/Hypothesis Generation
This is the starting point for any predictive modeling project. Learn what a problem statement and hypothesis generation are, and how to go about them properly.
Lecture 7
Data Audit
I show you how to perform a simple data audit.
Lecture 8
Uni-variate Analysis
This is the simplest form of analysis, but it is necessary and useful.
Lecture 9
Bi-variate Analysis
Let's do a bi-variate analysis.
Lecture 10
Important Housekeeping
I show you how to recode the response variable from Yes and No to a dummy variable (1 and 0), and how to remove non-numeric characters from the predictor variable Dependents.
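The course does this housekeeping in SAS. As a language-neutral illustration of the same idea, here is a minimal Python sketch. The function names and the sample value "3+" are assumptions made for the example, not taken from the course data.

```python
def recode_response(value):
    """Map a Yes/No response to a 1/0 dummy variable (None if unrecognized)."""
    if value is None:
        return None
    mapping = {"yes": 1, "y": 1, "no": 0, "n": 0}
    return mapping.get(value.strip().lower())


def clean_dependents(value):
    """Keep only the digits of a Dependents value, e.g. '3+' becomes 3."""
    if value is None:
        return None
    digits = "".join(ch for ch in value if ch.isdigit())
    return int(digits) if digits else None
```

Unrecognized or missing values come back as None, so they can be handled later by the missing-data steps rather than being silently coerced.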
Lecture 11
Some important code for you to add
Module 3: Preparing the Input Variables
Lecture 12
Sources, Patterns and Mechanisms of Missing Data
Before you can address missing data (values), you must understand the different sources, patterns, and mechanisms of missing data.
Lecture 13
Evaluating Missing Patterns with SAS Code
How do you figure out which missing pattern you have using SAS code?
Lecture 14
3-Phase Multiple Imputation Process in Detail
I go over the three-phase multiple imputation process and explain the code in detail.
Lecture 15
Considering the Output from MI
We look at some of the output from the multiple imputation (MI) process.
Lecture 16
Mean, Median and Mode Imputation (SAS Code)
If you have very little missing data (roughly 3-5% or less per variable), then mean, median, or mode imputation is usually sufficient. I provide the SAS code to make that happen.
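The lecture provides SAS code. To show the idea independently of any one tool, here is a minimal Python sketch of single-value imputation. The function name `impute` and the use of None to mark a missing value are assumptions for the example.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]
```

Mean or median suits numeric variables, while mode is the usual choice for categorical ones, for example `impute(["a", None, "b", "a"], "mode")`.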
Lecture 17
Oversampling and Adjusting for Oversampling
Sometimes events are rare. For example, fraud is uncommon, and this brings with it particular problems. One way to address this is to oversample the rare event, but the oversampled data no longer reflects the true event rate, so the model's predicted probabilities have to be adjusted.
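One common adjustment rescales each predicted probability by the ratio of the population event rate to the sample event rate. The Python sketch below illustrates that correction only; it is not necessarily the method used in the lecture, and the function and parameter names are assumptions.

```python
def adjust_probability(p_sample, sample_rate, population_rate):
    """Correct a probability from a model trained on oversampled data.

    p_sample        -- predicted event probability from the oversampled model
    sample_rate     -- proportion of events in the oversampled training data
    population_rate -- proportion of events in the real population
    """
    event = p_sample * population_rate / sample_rate
    nonevent = (1 - p_sample) * (1 - population_rate) / (1 - sample_rate)
    return event / (event + nonevent)
```

For instance, with a 50/50 training sample and a 2% population event rate, a predicted 0.5 is adjusted to 0.02, and when the two rates are equal the probability is left unchanged.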
Lecture 18
Categorical Inputs
We consider how we can reduce the dimensions of our categorical inputs.
Lecture 19
Variable Clustering
A SAS procedure groups your numeric variables into clusters, which makes it easy to select some numeric variables over others.
Lecture 20
Multicollinearity
Diagnosing multicollinearity is important, because failing to address it makes it hard to separate the distinct effects of individual variables.
Lecture 21
Subset Selection
The last step in choosing our variables is to use the automatic selection procedures available in PROC LOGISTIC to arrive at the final set for the model.
Lecture 22
Parameter Estimates
In this video, I discuss the importance of understanding parameter estimates.
Module 4: Evaluation Metrics
Lecture 23
Discrimination vs. Calibration (Article lesson)
Two important evaluation metrics are discussed.
Lecture 24
ROC Curve
The ROC curve plots sensitivity on the vertical axis against 1 - specificity on the horizontal axis. It is a discrimination-related metric.
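To make the two axes concrete, here is a minimal Python sketch that builds the ROC points from labels (1 for the event, 0 for the non-event) and model scores, then computes the area under the curve. The function names are assumptions, and the sketch assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the cutoff step by step; a case is predicted positive if score >= cutoff.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 1.0 means the scores separate the two classes perfectly, while 0.5 is what random scoring would give.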
Lecture 25
Scoring Validation Data Set Using SAS Code
Lecture 26
Decile Calibration Plot
We discuss what calibration measures and how we can improve calibration.
Lecture 27
Kolmogorov–Smirnov (article)
The KS statistic is the maximum distance between the cumulative score distributions of the positive and negative outcomes. A higher KS means more separation between positives and negatives, i.e. better discrimination.
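As a small illustration, the following Python sketch computes the KS statistic directly from its definition: the largest gap between the two cumulative score distributions. The function name is an assumption, labels are taken to be 1 for the positive outcome and 0 for the negative one, and both classes must be present.

```python
def ks_statistic(labels, scores):
    """Maximum gap between the cumulative score distributions of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for cutoff in sorted(set(scores)):
        # Share of each class scoring at or below this cutoff.
        cdf_pos = sum(1 for s in pos if s <= cutoff) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= cutoff) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

The result ranges from 0 (the two distributions coincide) to 1 (the scores separate the classes completely).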
Lecture 28
Feature Engineering
Creating new variables from the ones you already have is one way that you can improve your model.