Course Description

Predictive modeling is about predicting an outcome (the outcome or response variable) from a set of predictors. In this course, we work through an example in which we predict a customer's Loan_Status from historical data (the predictors). Predictive modeling is attractive because it can bring substantial value to an organization. However, it sits at the higher end of the complexity spectrum compared with other business analytics options. That complexity shows up mainly in the number of steps required and the expertise some of those steps demand. This course is designed to reduce or eliminate these obstacles.

What am I going to get from this course?

1. Understand the challenges of analytics.
2. Gain an intuitive understanding of Logistic Regression.
3. Perform a data audit.
4. Perform a Uni-variate and Bi-variate analysis of features/variables.
5. Deal with missing data (multiple imputation).
6. Adjust for oversampling.
7. Reduce the dimensions of your categorical inputs.
8. Perform variable clustering.
9. Perform subset selection (using automatic techniques for variable/feature selection).
10. Analyze multicollinearity in your features.
11. Analyze classifier performance (ROC curves, optimal cutoffs, K-S).
12. Partition your data, train a model, and make predictions on unseen data.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

If you intend to follow along, you need access to software that lets you write code in the SAS programming language. SAS Studio can be used for learning purposes. SAS Enterprise is available in the business community. I use WPS, which comes in several editions, including one for students (with a school email address).

Who should take this course? Who should not?

1. Companies that want to train their workforce in the different aspects of predictive modeling, up to and including training a model and making predictions on unseen data (banking, insurance, etc.).

2. Companies that want their workforce to become familiar with the power of SAS programming.

3. If you have no experience with Base SAS programming or introductory statistics, I suggest gaining that knowledge first.

Curriculum

Module 1: Introduction

Lecture 1 Introduction (the objectives of the course)

I discuss the three main objectives of the course.

Lecture 2 Business Applications

How does Predictive Modeling get applied in the business world?

Lecture 3 Analytics Challenges

If predictive modeling is so great, why doesn't every company adopt it?

Lecture 4 The Major Steps in Predictive Modeling

In this video, I discuss the three major components of predictive modeling.

Lecture 5 Understanding Logistic Regression

I explain the intuition behind Logistic Regression.

Module 2: Understanding the Problem, Hypotheses, Analysis

Lecture 6 Problem Statement/Hypothesis Generation

This is the starting point for any predictive modeling project. Learn what problem statements and hypothesis generation are and how to approach them properly.

Lecture 7 Data Audit

I show you how to do a simple data audit.

Lecture 8 Uni-variate Analysis

This is the simplest form of analysis, but it is necessary and useful.

Lecture 9 Bi-variate Analysis

Let's do a bi-variate analysis.

Lecture 10 Important Housekeeping

I show you how to recode the response variable from Yes and No to a dummy variable (1 and 0), and how to strip non-numeric characters from the predictor variable Dependents.
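As a rough illustration of the two housekeeping steps described above, here is a minimal Python sketch. It is not the SAS code from the lecture. The sample rows, the helper names `recode_loan_status` and `clean_dependents`, and the assumption that Dependents holds values such as "3+" are all hypothetical.

```python
def recode_loan_status(value):
    """Map 'Yes' -> 1 and 'No' -> 0; any other value becomes None (missing)."""
    mapping = {"yes": 1, "no": 0}
    return mapping.get(str(value).strip().lower())


def clean_dependents(value):
    """Keep only the digits of a value such as '3+'; return None if none remain."""
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return int(digits) if digits else None


# Hypothetical rows, made up for illustration only.
rows = [
    {"Loan_Status": "Yes", "Dependents": "0"},
    {"Loan_Status": "No", "Dependents": "3+"},
    {"Loan_Status": "Yes", "Dependents": ""},
]

cleaned = [
    {
        "Loan_Status": recode_loan_status(row["Loan_Status"]),
        "Dependents": clean_dependents(row["Dependents"]),
    }
    for row in rows
]
```

Unrecognized or empty values are deliberately left as missing rather than guessed, so they can be handled later together with the rest of the missing data.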

Lecture 11 Some important code for you to add

Some important code for you to add

Module 3: Preparing the Input Variables

Lecture 12 Sources, Patterns and Mechanisms of Missing Data

Before you can address missing data (values), you must understand the different sources, patterns, and mechanisms of missing data.

Lecture 13 Evaluating Missing Patterns with SAS Code

How do you figure out which missing pattern you have using SAS code?

Lecture 14 3 Phase Multiple Imputation Process in Detail

I go over the three-phase multiple imputation process and explain the code in detail.

Lecture 15 Considering the Output from MI

We look at some of the output from the multiple imputation (MI) process.

Lecture 16 Mean, Median and Mode Imputation (SAS Code)

If you have very little missing data (roughly 3-5% or less per variable), then mean, median, or mode imputation is usually sufficient. I provide the SAS code to make that happen.
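To make the idea of single-value imputation concrete, here is a small Python sketch using only the standard library. It is an illustration, not the SAS code provided in the lecture. The function name `impute` and the convention that missing values are stored as `None` are assumptions.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    summary = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = summary(observed)
    return [fill if v is None else v for v in values]


# Numeric variables typically use the mean or median.
incomes = impute([1, None, 3], strategy="mean")
# Categorical variables use the mode (most frequent category).
genders = impute(["a", None, "b", "a"], strategy="mode")
```

The median is often the safer choice for skewed numeric variables, because a few extreme values can pull the mean away from the bulk of the data.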

Lecture 17 Oversampling and Adjusting for Oversampling

Sometimes events are rare. For example, fraud is uncommon, and this brings particular problems with it. One way to address them is to over-sample the rare event, but once we over-sample, the model's predictions have to be adjusted back to the true event rate.
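One common way to make that adjustment is to rescale the predicted probabilities using the event rate in the over-sampled training data and the event rate in the population. The Python sketch below illustrates this general prior-correction idea. It is not necessarily the method or code used in the lecture, and the function names and the example rates are assumptions.

```python
import math


def adjust_probability(p_sample, sample_rate, population_rate):
    """Convert a probability from an over-sampled model to the population scale.

    sample_rate is the event proportion in the training sample and
    population_rate is the event proportion in the population.
    """
    numerator = p_sample * (1 - sample_rate) * population_rate
    denominator = (1 - p_sample) * sample_rate * (1 - population_rate) + numerator
    return numerator / denominator


def intercept_offset(sample_rate, population_rate):
    """Amount to subtract from the fitted intercept to undo the over-sampling."""
    return math.log(
        (sample_rate * (1 - population_rate))
        / ((1 - sample_rate) * population_rate)
    )


# Hypothetical rates: events are 50% of the training sample but 2% of the population.
adjusted = adjust_probability(0.5, sample_rate=0.5, population_rate=0.02)
```

With these made-up rates, a score equal to the sample base rate (0.5) maps back to the population base rate (0.02), which is a quick sanity check on the formula.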

Lecture 18 Categorical Inputs

We consider how we can reduce the dimensions of our categorical inputs.

Lecture 19 Variable Clustering

A SAS procedure clusters your numeric variables, which makes it easier to select some numeric variables over others.

Lecture 20 Multicollinearity

Diagnosing multicollinearity is important, because failing to address it can obscure the distinct impact of individual variables.

Lecture 21 Subset Selection

The last step in choosing our variables is to use the automatic selection procedures available in PROC LOGISTIC to pick the final variables for our model.

Lecture 22 Parameter Estimates

In this video, I discuss the importance of understanding parameter estimates.

Module 4: Evaluation Metrics

Lecture 23 Discrimination vs. Calibration (Article lesson)

Two important evaluation metrics are discussed.

Lecture 24 ROC Curve

The ROC curve plots sensitivity on the vertical axis against 1 - specificity on the horizontal axis. It is a discrimination-related metric.
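The construction of the curve can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's SAS code. Labels are assumed to be coded 1 (event) and 0 (non-event), and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct score cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Process tied scores together so each cutoff yields a single point.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels and predicted probabilities.
curve = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
area = auc(curve)
```

Each point corresponds to classifying every observation with a score at or above the cutoff as an event. An area of 0.5 indicates no discrimination and 1.0 indicates perfect separation.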

Lecture 25 Scoring Validation Data Set Using SAS Code

Scoring Validation Data Set Using SAS Code

Lecture 26 Decile Calibration Plot

We discuss what calibration measures and how we can improve calibration.

Lecture 27 Kolmogorov–Smirnov (article)

KS refers to the maximum distance between the cumulative score distributions of the positive and negative outcomes. A higher KS means more separation between positives and negatives, i.e., better discrimination.
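A minimal Python sketch of this statistic follows. It assumes labels coded 1 (positive) and 0 (negative) and uses a hypothetical function name `ks_statistic`. It is meant only to illustrate the definition, not to reproduce the article's SAS output.

```python
def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    neg_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative outcome")
    largest_gap = 0.0
    for cutoff in sorted(set(scores)):
        # Share of each class with a score at or below the cutoff.
        cdf_pos = sum(s <= cutoff for s in pos_scores) / len(pos_scores)
        cdf_neg = sum(s <= cutoff for s in neg_scores) / len(neg_scores)
        largest_gap = max(largest_gap, abs(cdf_pos - cdf_neg))
    return largest_gap


# Hypothetical scores: the two classes do not overlap at all.
separated = ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
# Hypothetical scores: both classes have identical score distributions.
overlapping = ks_statistic([1, 0, 1, 0], [0.2, 0.2, 0.7, 0.7])
```

The statistic ranges from 0 (the score distributions coincide) to 1 (the scores separate the classes completely).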

Lecture 28 Feature Engineering

Creating new variables from the ones you already have is one way that you can improve your model.

Predictive Modeling Using Logistic Regression (With SAS)

Use logistic regression to predict an individual's behavior using SAS

Instructor

Ermin Dedic

Learn how to use SAS logistic regression for your predictive modeling needs.
