Industry-recognized certification enables you to add this credential to your resume upon completion of all courses

David Sanchez, Instructor - Model Assessment in Machine Learning

David N. Sanchez has 14 years of experience in industry software development and statistics. He has worked in many industries including health care and health insurance (Sierra Health/Health Plan of Nevada), higher learning (California State University, Fullerton), real estate and finance (First American Title Software), and communications (Flash of Genius/UpdatePromise). He currently works with various clients developing machine learning applications and consulting on data infrastructure. This includes hands-on work with Python, R, cloud technologies such as AWS Redshift, NoSQL technologies such as MongoDB and Elasticsearch, and visualization technologies such as d3.js. David has an M.S. in Computer Science from California State University, Fullerton, with an emphasis in bioinformatics and machine learning.

Instructor: David Sanchez

Methods for Evaluating Supervised and Unsupervised Learning Models

  • Learn how to choose the right supervised or unsupervised model for your problem.
  • Instructor has 14 years of experience in industry software development and statistics, spanning health care and health insurance, higher education, real estate and finance, and communications.

Duration: 3h 01m

Course Description

This course covers model assessment for both supervised and unsupervised learning models. You will start with the basics and learn the terminology of model assessment. Model validation and assessment during training are covered, as are methods for assessing and evaluating unsupervised learning models. Finally, you will learn the different metrics for determining whether you have a winning supervised learning model. All of the methods and metrics discussed in this course are accompanied by demonstrations using Python 3 and easy-to-run Jupyter notebooks.

What am I going to get from this course?

You will be able to choose the right supervised or unsupervised model for your problem using the metrics and methods taught in the course.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

Students should be familiar with Python 3 and the scikit-learn, NumPy, and pandas packages. This course also uses Jupyter notebooks, so you should be able to load and run notebooks.

Who should take this course? Who should not?

Take this course if you want skills that go beyond simply fitting a model and reading off the score reported by a machine learning package. You will learn how to calculate those scores yourself, as well as additional metrics for evaluation. These additional skills will give you better insight when evaluating and selecting a model.

If you do not need to compare different types of models, or different regression models trained with different parameters, this course may not be for you. Likewise, if you do not need insight into the training of your supervised learning models, you can pass on this course.


Module 1: Introduction to Model Assessment

Lecture 1 Welcome

An introduction to the instructor and an overview of the course.

Lecture 2 Overview of Model Assessment

Overview of what to expect from Module 1. You will learn why model assessment is important.

Lecture 3 Supervised Learning vs. Unsupervised Learning

Although you may already know the difference between supervised and unsupervised learning, this lecture will refresh your memory by discussing types of models used for each learning methodology.

Lecture 4 Bias

Learn how to recognize bias in your machine learning models.

Lecture 5 Variance

Learn the signs of high variance and what it means for your model to be overfitting.
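
As an illustrative sketch (not taken from the course notebooks), one classic symptom of high variance is a large gap between training and test scores; assuming scikit-learn is available:

```python
# Hypothetical example: an unconstrained decision tree overfits,
# so its training score far exceeds its test score.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_score = tree.score(X_train, y_train)  # near-perfect: the tree memorizes
test_score = tree.score(X_test, y_test)     # noticeably lower on unseen data
gap = train_score - test_score              # a large gap signals high variance
```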

Quiz 1 Model Assessment Overview Quiz

Quiz yourself on the model assessment terms learned in this module.

Module 2: Validation

Lecture 6 Reason for Validation

Understand the reasons for validating your model. Also, you will be introduced to some common methods of validating your model.

Lecture 7 Leave One Out Cross Validation

Learn this basic type of cross-validation and the benefits of holding out a single observation at a time for testing while training on all the rest.
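
A minimal sketch of leave-one-out cross-validation with scikit-learn (illustrative only; the course's own notebooks may differ):

```python
# Leave-one-out: fit n models, each tested on exactly one held-out sample.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()  # one fold per sample
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)
mean_acc = scores.mean()  # each score is 0 or 1, so the mean is overall accuracy
```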

Lecture 8 K-Fold Cross Validation

Learn how to validate your training using k-fold cross-validation. The demonstration will use 10-fold cross-validation, a common choice of k.
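
As a hedged sketch (not the course's own demonstration), 10-fold cross-validation in scikit-learn can look like this:

```python
# 10-fold CV: the data is split into 10 folds; each fold serves once as
# the test set while the other 9 are used for training.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(SVC(), X, y, cv=cv)  # one accuracy score per fold
```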

Lecture 9 5x2 Cross Validation

Another method of validation to add to your tool belt: five repeated rounds of 2-fold cross-validation. You will learn how to perform this type of validation using Python.
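
One way to sketch 5x2 cross-validation in scikit-learn (an assumption on my part, not necessarily the course's approach) is `RepeatedKFold` with 2 splits and 5 repeats:

```python
# 5x2 CV: 5 repetitions of a 2-fold split, yielding 10 scores in total.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
```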

Lecture 10 Bootstrap

Sometimes you may not have enough data to create a proper model. Bootstrapping is one method you can utilize to attack this problem.
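
A small illustrative bootstrap sketch (toy data, not the course's): sample with replacement to the original size, and treat the left-out points as a test set.

```python
import numpy as np
from sklearn.utils import resample

data = np.arange(10)  # hypothetical tiny data set
# Sample with replacement up to the original size; on average about 63.2%
# of the original points appear in each bootstrap sample.
boot = resample(data, replace=True, n_samples=len(data), random_state=0)
out_of_bag = np.setdiff1d(data, boot)  # left-out points can serve as a test set
```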

Quiz 2 Validation Quiz

Answer different questions pertaining to model validation.

Resource 1 Validation Lecture Demonstrations

The Jupyter notebooks used in the validation demonstrations.

Quiz 3 Validation Homework

Perform 10-Fold cross validation on the HR data set and Leave One Out Cross Validation for the HR data set already imported. Which type of validation provides better performance during training?

Module 3: Unsupervised Learning

Lecture 11 Unsupervised Learning Assessment Introduction

You will be introduced to the metrics and methods used to evaluate unsupervised learning models.

Lecture 12 Gap Analysis

Learn how the distance properties of cluster nodes can indicate how well you are clustering, given the right hyperparameters.

Lecture 13 Silhouette Coefficient

Learn how to calculate, in Python, this additional metric for determining how well you are separating your data into distinct clusters.
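
As a hedged sketch (illustrative, not the course's notebook), the silhouette score can be compared across candidate cluster counts to pick k:

```python
# Higher silhouette (in [-1, 1]) means better-separated clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)  # the k with the highest silhouette
```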

Lecture 14 Perplexity

Using this measure you can evaluate your unsupervised learning model in a manner similar to validation of supervised learning models. This lecture uses a Latent Dirichlet Allocation (LDA) model for the demonstration.
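
A minimal perplexity sketch with scikit-learn's LDA, using a made-up toy corpus rather than the course's data:

```python
# Lower perplexity on held-out text generally indicates a better topic model.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # hypothetical four-document corpus
    "machine learning models need assessment",
    "clustering groups similar data points",
    "topic models describe document collections",
    "validation protects against overfitting",
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
perp = lda.perplexity(X)  # evaluate the fitted topic model
```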

Quiz 4 Unsupervised Learning Model Assessment

Test your knowledge on assessing unsupervised learning models.

Resource 2 Unsupervised Learning Assessment Demonstrations

The Jupyter notebooks with the code for the unsupervised learning assessment demonstrations.

Quiz 5 Unsupervised Learning Homework

For the diabetes data found in the Jupyter notebook, calculate the silhouette scores. What value for k would you recommend?

Module 4: Supervised Learning

Lecture 15 Supervised Learning Overview

These are the assessment methods that will be discussed for supervised learning in this module.

Lecture 16 Assessing Regression Models and the Akaike Information Criterion (AIC)

You will get an overview of the metrics you can use to evaluate regression models. In this lecture we begin with a demonstration of the Akaike Information Criterion (AIC).

Lecture 17 Bayesian Information Criterion (BIC)

Although the BIC is similar to the AIC, you will learn the difference between the two. You will also learn how to calculate this metric using Python.
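
A hedged sketch of computing AIC and BIC by hand for a regression fit. This uses one common closed form for Gaussian-error regression (constants dropped) on made-up data; the course's demonstrations may use a different parameterization:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical regression data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
resid = y - model.predict(X)
n, k = len(y), X.shape[1] + 1  # parameter count includes the intercept
rss = float(resid @ resid)

aic = n * np.log(rss / n) + 2 * k            # AIC: penalty of 2 per parameter
bic = n * np.log(rss / n) + k * np.log(n)    # BIC: penalty of log(n) per parameter
# BIC penalizes extra parameters more heavily than AIC whenever log(n) > 2.
```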

Lecture 18 Minimum Description Length (MDL)

Find the least complex regression model by translating your model complexity into a length.

Lecture 19 Confusion Matrix

Create a visual, tabular representation of your model's performance results.
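
A minimal sketch with made-up labels (not the course's data), showing how the four cells of a binary confusion matrix are read:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predictions
# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # true negatives, false positives, false negatives, true positives
```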

Lecture 20 Prediction Accuracy (ACC)

Probably the most common method of determining how well your supervised learning model performs. Learn to calculate this by hand in Python using the confusion matrix data.
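
Computing accuracy by hand from confusion-matrix counts is a one-liner; a sketch with hypothetical counts:

```python
# Hypothetical confusion-matrix counts for a binary classifier
tn, fp, fn, tp = 50, 10, 5, 35
# Accuracy: the fraction of all predictions that are correct
acc = (tp + tn) / (tp + tn + fp + fn)
```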

Lecture 21 Sensitivity

Also known as the true positive rate. Learn how to calculate this metric, which gives you insight into how well your supervised learning model predicts the expected true values.

Lecture 22 Specificity

Also known as the true negative rate. Learn how to calculate this metric, which gives you insight into how well your supervised learning model predicts the "False" class.

Lecture 23 Precision

We demonstrate how to calculate the positive predictive value, a measure that pertains to the values your model predicts as true.

Lecture 24 F1 Score

Also known as the harmonic mean of precision and recall. In addition to the prediction accuracy, this will indicate how well your model performs. In this lecture you will calculate the metric from the confusion matrix values.
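
The four metrics of Lectures 21 through 24 can all be derived from the same confusion-matrix counts; a sketch with hypothetical counts:

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, tn, fp, fn = 40, 45, 5, 10

sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)     # positive predictive value
# F1: harmonic mean of precision and sensitivity
f1 = 2 * precision * sensitivity / (precision + sensitivity)
```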

Lecture 25 Receiver Operating Characteristic (ROC) Curve

You will learn how to create this powerful visualization of your classifier performance. In conjunction with the confusion matrix this tool will be a necessity in evaluating your classifier's performance. Plotting the curve is one of the tasks you will be performing in this lecture.
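
A hedged sketch (not the course's notebook) of computing the points of an ROC curve and its area with scikit-learn; the plotting step itself is omitted here:

```python
# roc_curve returns the (FPR, TPR) points to plot; roc_auc_score summarizes them.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]        # scores for the positive class
fpr, tpr, thresholds = roc_curve(y_te, probs)
auc = roc_auc_score(y_te, probs)             # area under the ROC curve
```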

Lecture 26 Cohen's Kappa Statistic

This additional metric will help you evaluate how well your classifier agrees with the observed/labelled data. In this demonstration you will calculate the metric by hand.
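
A by-hand sketch of Cohen's kappa on made-up labels: observed agreement corrected for the agreement expected by chance from the marginal class frequencies.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])  # hypothetical labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])  # hypothetical predictions

p_o = np.mean(y_true == y_pred)                  # observed agreement
# Chance agreement from the marginal frequencies of each class
p_e = (np.mean(y_true == 1) * np.mean(y_pred == 1)
       + np.mean(y_true == 0) * np.mean(y_pred == 0))
kappa = (p_o - p_e) / (1 - p_e)                  # 1 = perfect, 0 = chance-level
```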

Lecture 27 Matthews Correlation Coefficient

Learn how skewed data can make for a deceiving prediction accuracy. This metric will take imbalanced data sets into consideration. You will be able to calculate this metric using Python in this lecture.
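
An illustrative sketch (toy data, not the course's) of how accuracy can deceive on imbalanced data while the Matthews correlation coefficient does not:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Imbalanced toy data: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)
y_naive = np.zeros(100, dtype=int)       # always predicts the majority class

acc = np.mean(y_true == y_naive)         # 0.90 accuracy looks good...
mcc = matthews_corrcoef(y_true, y_naive) # ...but MCC is 0: no predictive skill
```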

Quiz 6 Supervised Learning Quiz

Test your skills on the metrics and tools used for the evaluation of regression and classifier models.

Resource 3 Supervised Learning Assessment Demonstrations

The supervised learning Jupyter notebook demonstrations.

Quiz 7 Supervised Learning Homework

Create a confusion matrix for the Diabetes data in the attached homework for a Random Forest Classifier. In addition, calculate the Matthews Correlation Coefficient and Prediction Accuracy. Calculate the Prediction Accuracy for the included SVM. Compare the metrics between the SVM and the Random Forest Classifier. Using your judgment, which model performs better?