Course Description
This course begins with a basic introduction, and describes why decision trees are useful tools and how they differ from more traditional analytical tools like linear regression. We’ll also cover some basics of R, as the examples in this course will use the R programming language to analyze data.
The second module then dives into so-called “regression trees”, or decision trees for continuous variables (i.e. variables that take on numeric values, like sales amounts or number of purchases). It provides a theoretical basis for these models as well as practical examples and use cases.
The third module is very similar to the second, except that it treats categorical target variables (e.g. the product type of the next purchase) instead of continuous ones.
In the fourth module, we’ll talk about random forests and the idea of combining many individual classification or regression trees to make one final, improved prediction.
Module 5 builds on the idea of random forests, but presents a slightly different framework with boosted trees. You’ll learn about an implementation of boosted trees, XGBoost, which is one of the most popular tree algorithms and has been used extensively for machine learning problems.
What am I going to get from this course?
Learn and understand decision trees, random forests, and boosted tree models, and interpret their results to drive business decisions.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
We will be using R for this course, but little prior knowledge of R is required (as long as you're willing to learn a bit along the way). Some basic understanding of mathematical functions and algorithms is also important.
Who should take this course? Who should not?
You should take this course if you want to learn how to get started with tree-based models.
You should take this course if you want to implement tree-based models in your daily work.
You should take this course if you are curious about the theoretical ideas behind machine learning models.
You should not take this course if you have successfully implemented and used random forests and/or XGBoost on datasets in the past, unless you didn’t understand what you were doing.
Curriculum
Lecture 1
R Installation and Basics
10:11
Since we'll be using R for this course, this lecture will give a quick overview of how to install R and RStudio. We'll also look at some basics of R, but we won't go too in depth.
Lecture 2
Interpreting Linear Models
03:24
In this lecture, we'll examine how linear models handle interactions and, in particular, the challenges that come with trying to interpret them. In later lectures, we'll compare these models to tree-based models.
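As a quick, hedged illustration (using R's built-in mtcars data rather than the course dataset), a linear model with an interaction term looks like this:

fit <- lm(mpg ~ wt * factor(cyl), data = mtcars)
summary(fit)   # interaction coefficients must be read together with the main effects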
Lecture 3
Basics of Decision Trees
06:40
We'll discuss some basic terms and definitions of decision trees, and we'll look at a few very simple examples to see what tree models can look like.
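A minimal sketch of fitting and inspecting a decision tree in R, assuming the rpart package (a common choice, though not necessarily the one used in the lectures) and the built-in mtcars data:

library(rpart)
tree <- rpart(mpg ~ wt + hp, data = mtcars)
print(tree)              # text listing of the splits
plot(tree); text(tree)   # quick base-graphics plot of the tree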
Lecture 4
Modeling Non-Linearity
04:23
We'll discuss how trees can handle input features that have highly non-linear relationships with the target variable, and we'll compare this to linear regression, where all forms of non-linearity must be specified via transformations of the input features.
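A hedged sketch of the contrast, using simulated data (my own example, not the lecture's): the tree picks up the non-linear pattern on its own, while lm() only captures it if the transformation is written into the formula.

library(rpart)
set.seed(1)
d <- data.frame(x = runif(500, 0, 10))
d$y <- sin(d$x) + rnorm(500, sd = 0.2)
tree_fit <- rpart(y ~ x, data = d)     # no transformation specified
lm_fit   <- lm(y ~ sin(x), data = d)   # transformation supplied by hand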
Module 2: Regression Trees
01:36:32
Lecture 5
Dataset Introduction
17:54
We'll explore a sample dataset containing sales of 11 different types of orange juice products at 83 stores over the course of 120 weeks. This dataset will be used extensively throughout the rest of the course.
Lecture 6
Boolean Features
06:05
We'll discuss Boolean features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single Boolean feature.
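An illustrative sketch with simulated data (not the orange juice dataset, and assuming the rpart package): a regression tree built from a single Boolean (0/1) feature produces a single split.

library(rpart)
set.seed(2)
d <- data.frame(on_promo = rbinom(200, 1, 0.5))   # 1 = on promotion, 0 = not
d$sales <- ifelse(d$on_promo == 1, 100, 60) + rnorm(200, sd = 10)
rpart(sales ~ on_promo, data = d)                 # one split: on promotion vs not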
Lecture 7
Categorical Features
03:59
We'll discuss categorical features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single categorical feature.
Lecture 8
Continuous Features
05:49
We'll discuss continuous features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single continuous feature.
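An illustrative sketch with simulated data (again assuming rpart): with a continuous feature, the tree has to choose a cut point along the feature's range.

library(rpart)
set.seed(3)
d <- data.frame(price = runif(300, 1, 4))
d$sales <- 120 - 25 * d$price + rnorm(300, sd = 8)
rpart(sales ~ price, data = d)   # splits show up as rules like price < 2.5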
Lecture 9
Split Finding
10:52
Previously, we've seen examples of regression trees using various types of features to construct models. In this lecture, we'll explore the rationale behind how these splits are chosen, and explore this theory with an example of one particular split.
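For the regression case, here is a hedged sketch of the usual criterion (my own illustration of the standard idea, not code from the lecture): evaluate every candidate cut point and keep the one that most reduces the sum of squared errors (SSE).

sse <- function(y) sum((y - mean(y))^2)
best_split <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2   # candidate cut points between observed values
  gains <- sapply(cuts, function(cp) sse(y) - (sse(y[x < cp]) + sse(y[x >= cp])))
  cuts[which.max(gains)]                          # cut with the largest SSE reduction
}
set.seed(4)
x <- runif(100); y <- ifelse(x < 0.5, 1, 3) + rnorm(100, sd = 0.3)
best_split(x, y)   # should land near 0.5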
Lecture 10
Advantages and Limitations
10:19
We've discussed several different properties of decision trees already, but in this lecture, we'll dive a bit deeper into some of the strengths and weaknesses of regression trees, particularly as compared to linear regression.
Lecture 11
Full Model
09:01
In this lecture, we'll examine a more realistic regression tree model using many different features.
Lecture 12
Visual Interpretation
12:26
When fitting a tree-based model, we usually first look at a plot of the decision tree in terms of the splits and the rules used to create them. However, it's also sometimes interesting to explore the relationships between the features and the target, and in this lecture, we'll examine ways of understanding the relationship between one or two features (at a time) and the target.
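One hedged way to do this by hand (not necessarily the lecture's approach): predict over a grid of one feature while holding the other features at typical values.

library(rpart)
fit <- rpart(mpg ~ wt + hp, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
                   hp = median(mtcars$hp))
plot(grid$wt, predict(fit, grid), type = "s",
     xlab = "wt", ylab = "predicted mpg")   # step-shaped, as tree predictions are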
Lecture 13
Complexity Parameters
09:49
Complexity parameters are used to control how complex or simple a regression tree is. We'll see examples with various complexity parameter values, and learn how they control the complexity of the tree.
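A hedged sketch using rpart's cp argument (assuming rpart is the package in use): a larger cp demands a bigger improvement in fit before a split is kept, so it yields a smaller tree.

library(rpart)
big   <- rpart(mpg ~ ., data = mtcars, control = rpart.control(cp = 0.001))
small <- rpart(mpg ~ ., data = mtcars, control = rpart.control(cp = 0.1))
c(nodes_big = nrow(big$frame), nodes_small = nrow(small$frame))   # node counts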
Lecture 14
Optimizing Parameters
10:18
In order to create a good decision tree, we must find a good value for the complexity parameter. We'll discuss how we can pick good parameters for the model via a method known as cross-validation.
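As a hedged sketch (again assuming rpart): the package cross-validates over cp internally, printcp() reports the cross-validated error for each candidate value, and prune() cuts the tree back to the chosen one.

library(rpart)
fit <- rpart(mpg ~ ., data = mtcars, control = rpart.control(cp = 0.001))
printcp(fit)                                               # the xerror column is the CV error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)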
Module 2 Mini-Project
Estimate a model for logmove vs price_over_min, price_under_max, price_per_oz_over_min, price_per_oz_under_max, and classification. Estimate a second model for logmove using exp(price_over_min), exp(price_under_max), exp(price_per_oz_over_min), exp(price_per_oz_under_max), and classification. Estimate a third model using exp(logmove) and the same features as the first model. Compare the performance of the three models, and describe which one you think is best.
Module 3: Classification Trees
45:30
Lecture 15
Comparison with Regression Trees
07:29
We'll begin this module by discussing a new problem: how can we estimate a target that takes on one of many different categories? We'll look at why the techniques we used for regression trees can't be directly applied here, but we'll start to investigate how such models could be constructed.
Lecture 16
Goodness of Fit
14:04
We saw in the previous lecture why RMSE can't be applied to classification trees. So, we'll explore alternative error metrics and seek to understand how we can measure how good a particular split of a classification tree is.
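One widely used split-quality metric for classification is the Gini impurity; the lecture may use this or a related measure. A hedged sketch:

gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)                     # 0 = perfectly pure node, larger = more mixed
}
gini(c("A", "A", "A", "A"))        # 0
gini(c("A", "A", "B", "B"))        # 0.5, the maximum for two classes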
Lecture 17
Boolean Targets
09:37
The simplest type of categorical variable is one taking on two values. This lecture will present an example of a classification tree with a Boolean (i.e. taking on only two values) target.
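An illustrative sketch with simulated data (assuming rpart): a classification tree for a two-valued target, fit with method = "class".

library(rpart)
set.seed(5)
d <- data.frame(price = runif(300, 1, 4))
d$bought <- factor(ifelse(d$price + rnorm(300, sd = 0.5) < 2.5, "yes", "no"))
rpart(bought ~ price, data = d, method = "class")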
Lecture 18
Targets with 3+ Categories
14:20
We'll generalize from the previous lecture of Boolean targets to targets which have multiple categories. We'll look at an example of these types of classification trees within the Orange Juice dataset.
Quiz 5
Module 3 Mini-Project
Module 4: Random Forests
38:44
Lecture 19
Introduction
05:59
We'll discuss the main idea behind random forests and develop some intuition around the basics of how they work with a simple example.
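A minimal sketch assuming the randomForest package and the built-in mtcars data (the lectures use the orange juice data instead):

library(randomForest)
set.seed(6)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500)
rf   # printed summary includes the out-of-bag (OOB) error estimate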
Lecture 20
Theory of Random Forests
08:22
We'll discuss the basic theory of random forests: why they work and why randomness is important. We'll also introduce the concept of out-of-bag observations.
Lecture 21
Tuning the Number of Trees
10:00
Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of trees to fit in a random forest model.
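A hedged sketch (assuming the randomForest package): the fitted object stores the OOB error after each additional tree, so plotting it shows where adding more trees stops helping.

library(randomForest)
set.seed(7)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)
plot(rf$mse, type = "l", xlab = "number of trees", ylab = "OOB mean squared error")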
Lecture 22
Tuning mtry
05:38
Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of features to try at every split in a random forest model.
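A hedged sketch of one simple approach (again with the randomForest package and mtcars): fit the forest at a few mtry values and compare the final OOB error.

library(randomForest)
set.seed(8)
oob <- sapply(1:5, function(m) {
  rf <- randomForest(mpg ~ ., data = mtcars, mtry = m, ntree = 500)
  tail(rf$mse, 1)               # OOB error after all trees
})
oob   # compare OOB error across mtry values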
Lecture 23
Interpretation/Importance
08:45
One advantage of using tree-based models is their interpretability, and we lose that interpretability when we average 500 trees. However, there are still some measures we can look at to understand why a random forest makes the predictions it does, and we'll explore those in this lecture.
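A hedged sketch of the variable-importance measures provided by the randomForest package:

library(randomForest)
set.seed(9)
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
importance(rf)    # permutation-based and node-impurity importance per feature
varImpPlot(rf)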
Module 5: Gradient Boosting
01:03:12
Lecture 24
Motivation
05:15
We'll look at an example of tree-based models with updated weights to motivate the idea of gradient boosting algorithms.
Lecture 25
Optimizing Loss Functions
09:42
We'll first look at the algorithm for AdaBoost, one of the earliest implementations of the boosting idea. We'll then discuss the idea of a loss function, and use this to understand how gradient boosting works.
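As a hedged illustration of the general boosting idea (using squared-error loss rather than AdaBoost's exponential loss, and simulated data): each new small tree is fit to the residuals of the current model, and only a small fraction of its prediction (the learning rate) is added.

library(rpart)
set.seed(10)
d <- data.frame(x = runif(500, 0, 10))
d$y <- sin(d$x) + rnorm(500, sd = 0.2)
pred <- rep(mean(d$y), nrow(d)); eta <- 0.1
for (i in 1:100) {
  d$resid <- d$y - pred                                        # residuals of the current model
  tree <- rpart(resid ~ x, data = d, control = rpart.control(maxdepth = 2))
  pred <- pred + eta * predict(tree, d)                        # take a small corrective step
}
mean((d$y - pred)^2)   # training error of the boosted ensemble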
Lecture 26
Tuning Gradient Boosting
06:28
We'll discuss all (or at least many of) the tuning parameters available for a gradient boosting model. In Lecture 28, we'll look at how to optimize these tuning parameters while fitting a model.
Lecture 27
Simple Example
08:23
In this lecture, we'll see how to use the XGBoost algorithm (a nice implementation of gradient boosting) within R.
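A minimal sketch with the xgboost package (using mtcars for illustration; argument names reflect recent package versions and may vary):

library(xgboost)
X <- as.matrix(mtcars[, -1])     # xgboost expects a numeric matrix of features
y <- mtcars$mpg
fit <- xgboost(data = X, label = y, nrounds = 50,
               eta = 0.1, max_depth = 3,
               objective = "reg:squarederror", verbose = 0)
head(predict(fit, X))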
Lecture 28
Finding the Optimal Model
20:26
We'll dive into the details of the XGBoost model. We'll learn about how to tune the various parameters in an efficient way in order to get an optimized final model.
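A hedged sketch of one common workflow, using xgb.cv() to cross-validate a given parameter setting so different settings can be compared by their CV error (exact arguments may differ across xgboost versions):

library(xgboost)
X <- as.matrix(mtcars[, -1]); y <- mtcars$mpg
cv <- xgb.cv(data = X, label = y, nrounds = 200, nfold = 5,
             params = list(eta = 0.1, max_depth = 3,
                           objective = "reg:squarederror"),
             early_stopping_rounds = 10, verbose = 0)
cv$best_iteration   # number of rounds with the lowest cross-validated error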
Lecture 29
Categorical Features in XGBoost
12:58
We've seen that the random forest and decision tree implementations handle categorical features directly, but the XGBoost implementation does not accept them. We'll look at two ways of converting categorical features into numerical ones.
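A hedged sketch of one such conversion, one-hot encoding with model.matrix() (illustrative column names only):

d <- data.frame(brand = factor(c("A", "B", "C", "A")),
                price = c(2.1, 2.5, 1.9, 2.2))
X <- model.matrix(~ brand + price - 1, data = d)   # one 0/1 indicator column per brand
X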
Quiz 8
Module 5 Mini-Project