An industry-recognized certification lets you add this credential to your resume upon completion of all courses.

Josh Browning, Instructor - Classification & Regression Trees / Random Forests

Joshua Browning currently works as a senior data scientist for an e-commerce company. He has 10 years of experience in statistics and data science, working in varied industries from aerospace engineering to international organizations. He has several years of experience in education as a mathematics instructor and tutor, and is passionate about helping people understand complex subjects in meaningful and applicable ways.

Instructor: Josh Browning

Using Non-Linear Models to Understand Data

  • Learn to understand decision trees, random forests, and boosted tree models, and to interpret their results to drive business decisions.
  • The instructor has 10 years of experience in statistics and data science, working in varied industries from aerospace engineering to international organizations.

Duration: 4h 29m

Course Description

This course begins with a basic introduction that describes why decision trees are useful tools and how they differ from more traditional analytical tools like linear regression. We’ll also cover some basics of R, as the examples in this course use the R programming language to analyze data. The second module then dives into so-called “regression trees”, or decision trees for continuous variables (i.e., variables that take on numeric values, like sales amounts or number of purchases). It provides a theoretical basis for these models as well as practical examples and use cases. The third module is very similar to the second, except that it treats categorical variables (e.g., the product type of a customer's next purchase) instead of continuous variables. In the fourth module, we’ll talk about random forests and the idea of combining many individual classification or regression trees to make one final, improved prediction. Module 5 builds on the idea of random forests, but presents a slightly different framework with boosted trees. You’ll learn about an implementation of boosted trees, XGBoost, which is one of the most popular tree algorithms and has been used extensively for machine learning problems.

What am I going to get from this course?

Learn to understand decision trees, random forests, and boosted tree models, and to interpret their results to drive business decisions.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

We will be using R for this course, but little prior knowledge is required (as long as you’re willing to learn a bit along the way). Some basic understanding of mathematical functions and algorithms is also important.

Who should take this course? Who should not?

You should take this course if you want to learn how to get started with tree-based models.
You should take this course if you want to implement tree-based models in your daily work.
You should take this course if you are curious about the theoretical ideas behind machine learning models.
You should not take this course if you have successfully implemented and used random forests and/or XGBoost on datasets in the past, unless you didn’t understand what you were doing.


Module 1: Module 1

Lecture 1 R Installation and Basics

Since we'll be using R for this course, this lecture will give a quick overview of how to install R and RStudio. We'll also look at some basics of R, but we won't go into much depth.

Lecture 2 Interpreting Linear Models

In this lecture, we'll examine how linear models handle interactions and, in particular, the challenges that come with trying to interpret linear models. In later lectures, we'll compare these models to tree-based models.

Lecture 3 Basics of Decision Trees

We'll discuss some basic terms and definitions of decision trees, and we'll look at a few very simple examples to see what tree models can look like.

Lecture 4 Modeling Non-Linearity

We discuss how trees can handle input features that have highly non-linear relationships with the target variable, and we compare this to linear regression, where all forms of non-linearity must be specified via transformations of the input features.

Quiz 1 Module 1 Quiz

Module 2: Regression Trees

Lecture 5 Dataset Introduction

We'll explore a sample dataset containing sales of 11 different types of orange juice products at 83 stores over the course of 120 weeks. This dataset will be used extensively in the rest of the course.

Lecture 6 Boolean Features

We'll discuss Boolean features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single Boolean feature.

Lecture 7 Categorical Features

We'll discuss categorical features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single categorical feature.

Lecture 8 Continuous Features

We'll discuss continuous features and how they are used in regression trees. We'll also see an example of how to construct a regression tree using a single continuous feature.

Lecture 9 Split Finding

Previously, we've seen examples of regression trees using various types of features to construct models. In this lecture, we'll explore the rationale behind how these splits are chosen, and explore this theory with an example of one particular split.
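As a rough illustration of how a split on a continuous feature can be chosen, here is a minimal sketch that tries every midpoint between adjacent feature values and keeps the one with the lowest total squared error. The course itself works in R; this sketch is in Python, and the function names and toy data are hypothetical, not taken from the course materials.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best single split x <= threshold.

    Returns (None, sse(ys)) when no split improves on leaving the data whole.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: a price feature and a log-sales target.
prices = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
log_sales = [10.0, 10.2, 9.8, 6.1, 5.9, 6.0]
threshold, error = best_split(prices, log_sales)
print(threshold, round(error, 3))
```

On this toy data the search separates the three cheap, high-selling rows from the three expensive, low-selling ones. A real tree repeats the same search over every feature and then recurses into each side.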

Lecture 10 Advantages and Limitations

We've discussed several different properties of decision trees already, but in this lecture, we'll dive a bit deeper into some of the strengths and weaknesses of regression trees, particularly as compared to linear regression.

Lecture 11 Full Model

In this lecture, we'll examine a more realistic regression tree model using many different features.

Lecture 12 Visual Interpretation

When fitting a tree-based model, we usually first look at a plot of the decision tree in terms of the splits and the rules used to create them. However, it's also sometimes interesting to explore the different relationships between the features and the target, and in this lecture, we'll examine ways of understanding relationships between one or two features (at a time) and the target.

Lecture 13 Complexity Parameters

Complexity parameters are used to control how complex or simple a regression tree is. We'll see some examples of various complexity parameters, and learn how they control the complexity of the tree.

Lecture 14 Optimizing Parameters

In order to create a good decision tree, we must find a good value for the complexity parameter. We'll discuss how we can pick good parameters for the model via a method known as cross-validation.
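The general shape of k-fold cross-validation can be sketched as follows. This is an illustrative Python outline rather than the course's R code: `fit_and_score` is a hypothetical caller-supplied function that fits a tree with one candidate complexity value on the training rows and returns its error on the held-out rows.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        # The first (n_rows % k) folds get one extra row.
        size = n_rows // k + (1 if i < n_rows % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_rows, k, fit_and_score):
    """Average the held-out score over k folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    fits a model on train_idx and returns its score on test_idx.
    """
    folds = k_fold_indices(n_rows, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k


folds = k_fold_indices(10, 3)
print(folds)
```

To pick a complexity parameter, one would run `cross_validate` once per candidate value and keep the value with the best average held-out score. In practice the rows are usually shuffled before being split into folds; that step is omitted here for brevity.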

Quiz 2 Module 2 Quiz
Quiz 3 Mini-Project

Estimate a model for logmove vs price_over_min, price_under_max, price_per_oz_over_min, price_per_oz_under_max, and classification. Estimate a new model for logmove using exp(price_over_min), exp(price_under_max), exp(price_per_oz_over_min), exp(price_per_oz_under_max), and classification. Estimate a third model using exp(logmove) and the same features of the first model. Compare the performance of the three models, and describe which one you think is best.

Module 3: Classification Trees

Lecture 15 Comparison with Regression Trees

We'll begin this module by discussing a new problem: how can we estimate a target that takes on one of many different categories? We'll look at why the techniques we used for regression trees can't be directly applied here, but we'll start to investigate how such models could be constructed.

Lecture 16 Goodness of Fit

We saw in the previous lecture how RMSE can't be applied to classification trees. So, we'll explore alternative error metrics and seek to understand how we can measure how good a particular split of a classification tree is.
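The description does not say which metrics the lecture uses. Two common choices for classification trees are Gini impurity and entropy, sketched below in Python (the course itself uses R). The function names and labels are hypothetical, and the functions assume non-empty label lists.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, measure):
    """Impurity of a split: each child's impurity weighted by its share of rows."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


parent = ["A", "A", "B", "B"]  # hypothetical class labels
clean_split = weighted_impurity(["A", "A"], ["B", "B"], gini)
mixed_split = weighted_impurity(["A", "B"], ["A", "B"], gini)
print(gini(parent), clean_split, mixed_split)
```

A split is better the more it lowers the weighted impurity relative to the parent node. Here the clean split removes all impurity, while the mixed split removes none.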

Lecture 17 Boolean Targets

The simplest type of categorical variable is one taking on two values. This lecture will present an example of a classification tree with a Boolean (i.e. taking on only two values) target.

Lecture 18 Targets with 3+ Categories

We'll generalize from the previous lecture on Boolean targets to targets that have multiple categories. We'll look at an example of these types of classification trees within the Orange Juice dataset.

Quiz 4 Module 3 Quiz
Quiz 5 Module 3 Mini-Project

Module 4: Random Forests

Lecture 19 Introduction

We'll discuss the main idea behind random forests and develop some intuition around the basics of how they work with a simple example.

Lecture 20 Theory of Random Forests

We'll discuss the basic theory of random forests: why they work and why randomness is important. We'll also introduce the concept of out-of-bag observations.
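To make the out-of-bag idea concrete: each tree in a random forest is typically fit on a bootstrap sample (rows drawn with replacement), and the rows that were never drawn for that tree are its out-of-bag observations. The following Python sketch is illustrative only; the function name is hypothetical and the course itself works in R.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns (in_bag, out_of_bag): the drawn indices (with repeats) and the
    sorted indices that were never drawn.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_indices(10, rng)
print(in_bag, out_of_bag)
```

Because the out-of-bag rows were not used to fit that tree, they can serve as a built-in validation set. For large datasets, roughly 37% of rows (about 1/e) are out of bag for any given tree.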

Lecture 21 Tuning the Number of Trees

Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of trees to fit in a random forest model.

Lecture 22 Tuning mtry

Determining the appropriate tuning parameters is an important part of any machine learning model. In this lecture, we'll learn how to pick a good value for the number of features to try at every split in a random forest model.

Lecture 23 Interpretation/Importance

One advantage of using tree-based models is their interpretability, and we lose that interpretability when we average 500 trees. However, there are still some measures we can look at to understand why a random forest is predicting how it is, and we'll explore those in this lecture.

Quiz 6 Module 4 Quiz

Module 5: Gradient Boosting

Lecture 24 Motivation

We'll look at an example of tree-based models with updated weights to motivate the idea of gradient boosting algorithms.

Lecture 25 Optimizing Loss Functions

We'll first look at the algorithm for AdaBoost, one of the first implementations of the boosting idea. We'll then discuss the idea of a loss function, and use this to understand how gradient boosting works.

Lecture 26 Tuning Gradient Boosting

We'll discuss all (or at least many of) the tuning parameters available for tuning a gradient boosting model. In Lecture 28, we'll look at how to optimize these tuning parameters while fitting a model.

Lecture 27 Simple Example

In this lecture, we'll see how to use the XGBoost algorithm (a nice implementation of gradient boosting) within R.

Lecture 28 Finding the Optimal Model

We'll dive into the details of the XGBoost model. We'll learn about how to tune the various parameters in an efficient way in order to get an optimized final model.

Lecture 29 Categorical Features in XGBoost

We've seen that the implementations of random forests and decision trees handle categorical features, but the XGBoost implementation does not accept such categorical features. We'll look at two ways of converting categorical features into numerical ones.
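The description does not name the two conversions covered in the lecture. Two widely used options are integer (label) encoding and one-hot encoding, sketched below in Python under that assumption (the course itself uses R). The function names and category values are hypothetical.

```python
def label_encode(values):
    """Map each category to an integer code, ordered alphabetically."""
    levels = sorted(set(values))
    codes = {level: i for i, level in enumerate(levels)}
    return [codes[v] for v in values], levels


def one_hot_encode(values):
    """Map each category to a 0/1 indicator row with one column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return rows, levels


brands = ["brand_c", "brand_b", "brand_a", "brand_c"]  # hypothetical values
codes, levels = label_encode(brands)
indicators, _ = one_hot_encode(brands)
print(levels, codes, indicators)
```

Integer codes are compact but impose an arbitrary ordering on the categories, which tree splits may then exploit. One-hot columns avoid that ordering at the cost of one extra column per level.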

Quiz 7 Module 5 Quiz
Quiz 8 Module 5 Mini-Project


7 Reviews

Chris B

May, 2017

An excellent course to learn decision trees, random forests, boosted tree models, and interpret results for driving the business decisions. The introduction of the course generated much interest to build up the tempo for learning further.

Matthew D

May, 2017

All five modules of the course are very useful and can be kept handy as reference material. The exercise for making a final, improved prediction was excellent. It was interesting to learn about boosted trees and XGBoost implementations. The practical exploration of a data set of 11 products of different types of orange juice at 83 stores over 120 weeks was the highlight of learning.

Ryan B

May, 2017

Excellent I must say. It was the best learning with the practical approach of the course. A very useful course indeed.

Dave M

July, 2017

The program explained a variety of model-based and algorithmic machine learning methods including classification trees, regression, random forests, and Naive Bayes. Nice that it included the comprehensive method of bringing about prediction functions involving feature creation, data collection, evaluation, and algorithms.

Paul G

July, 2017

The lecture transcript covers random forests, which we can think of as an extension to bagging for classification and regression trees. The basic idea is very well explained which is very similar to bagging in the sense that we bootstrap samples, so we resample of our observed data, and our training data set. Good course.

Winnie C

July, 2017

The most frequent tasks performed by data scientists and analysts are prediction and machine learning. This course finely covers the fundamental elements of forming and using prediction functions with a focus on functional applications.

Kristen M

July, 2017

The study has presented essential teaching in theories such as tests and training sets, error rates, and overfitting.