Course Description
This course covers the assessment of unsupervised and supervised learning models. You will start with the basics and learn the terms pertaining to model assessment. Model validation and assessment during training will be covered, and methods for assessing and evaluating unsupervised learning models will be taught. Finally, you will learn the different metrics for determining whether you have a winning supervised learning model.
Furthermore, all the methods and metrics discussed in this course are accompanied by demonstrations using Python 3 and easy-to-execute Jupyter notebooks.
What am I going to get from this course?
Choose the right supervised or unsupervised model for your problem using the metrics and methods taught in the course.
Prerequisites and Target Audience
What will students need to know or do before starting this course?
Students should be familiar with Python 3 and the scikit-learn, NumPy, and pandas packages. This course also uses Jupyter notebooks, so you should be able to load and run notebooks.
Who should take this course? Who should not?
Take this course if you want skills that go beyond simply fitting a model and viewing its score with the built-in fit and score methods of machine learning packages. You will learn how to calculate those scores yourself, as well as additional metrics for evaluation. Those additional skills will give you better insight when evaluating and selecting a model.
If you do not need to compare different types of models, or different regression models using different parameters, this course may not be for you. Likewise, if you do not need insight into your supervised learning models during training, you can pass on this course.
Curriculum
Module 1: Introduction to Model Assessment
An introduction to the instructor and an overview of the course.
Lecture 2
Overview of Model Assessment
An overview of what to expect from Module 1. You will learn why model assessment is important.
Lecture 3
Supervised Learning vs. Unsupervised Learning
Although you may already know the difference between supervised and unsupervised learning, this lecture will refresh your memory by discussing types of models used for each learning methodology.
Learn how to recognize bias in your machine learning models.
Learn the signs of variance and what it means to have overfitting in your model.
Quiz 1
Model Assessment Overview Quiz
Quiz yourself on the model assessment terms learned in this module.
Module 2: Model Validation
Lecture 6
Reason for Validation
Understand the reasons for validating your model, and get an introduction to some common methods of validating your model.
Lecture 7
Leave One Out Cross Validation
Learn the most basic type of validation and the benefits of splitting your data in two.
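As a rough preview of what Leave One Out cross validation looks like in scikit-learn (the iris data and logistic regression model below are stand-ins, not the course's own demonstration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each iteration trains on all rows but one and tests on the single held-out row.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy: {scores.mean():.3f} over {len(scores)} splits")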
Lecture 8
K-Fold Cross Validation
Learn how to validate your training using K-Fold cross validation. The demonstration uses 10-fold cross validation, a common choice for K-Fold cross validation.
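A minimal sketch of 10-fold cross validation with scikit-learn; the breast cancer data set and random forest below are illustrative choices, not the lecture's actual example:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Split the data into 10 folds; each fold serves as the held-out test set once.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=kfold)
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")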
Lecture 9
5x2 Cross Validation
Another method of validation you can add to your tool belt. You will learn how to achieve this type of validation using Python.
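One way to run 5x2 cross validation (two folds, repeated five times) in scikit-learn is with RepeatedKFold; the model and data below are placeholders only:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# 2 folds repeated 5 times with different shuffles gives 10 scores in total.
cv = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"5x2 CV accuracy: {scores.mean():.3f} over {len(scores)} fits")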
Sometimes you may not have enough data to create a proper model. Bootstrapping is one method you can utilize to attack this problem.
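A hedged sketch of one bootstrap-style evaluation: resample the rows with replacement, train on the sample, and score on the rows left out each round (the data set and decision tree are illustrative, not the course's example):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)
scores = []
for i in range(100):
    idx = resample(np.arange(len(X)), replace=True, random_state=i)   # sampled rows
    oob = np.setdiff1d(np.arange(len(X)), idx)                        # rows left out this round
    model = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    scores.append(model.score(X[oob], y[oob]))
print(f"Mean out-of-sample accuracy over 100 bootstraps: {np.mean(scores):.3f}")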
Answer different questions pertaining to model validation.
Resource 1
Validation Lecture Demonstrations
The Jupyter notebooks used in the validation lecture demonstrations.
Quiz 3
Validation Homework
Perform 10-fold cross validation and Leave One Out cross validation on the HR data set already imported. Which type of validation provides better performance during training?
Module 3: Unsupervised Learning
Lecture 11
Unsupervised Learning Assessment Introduction
You will be introduced to the metrics and methods used to evaluate unsupervised learning models.
Learn how the distance properties of cluster nodes can be a sign of how well you are clustering given the correct hyperparameters.
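A minimal sketch, assuming the distance property in question is the within-cluster sum of squared distances (inertia) reported by KMeans; the synthetic blob data is illustrative only:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Smaller inertia means points sit closer to their cluster centers; watching how it
# drops as k grows is one common check on the clustering hyperparameter.
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.1f}")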
Lecture 13
Silhouette Coefficient
Learn how to calculate this metric in Python. It can additionally be used to determine how well you are separating your data into different clusters.
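A minimal sketch of the silhouette coefficient with scikit-learn; the KMeans model and synthetic blobs stand in for the lecture's example:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Scores near +1 indicate well-separated clusters; scores near 0 indicate overlap.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")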
Using this measure you can evaluate your unsupervised learning model in a manner similar to validation of supervised learning models. This lecture uses a Latent Dirichlet Allocation (LDA) model for the demonstration.
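A hedged sketch, assuming the measure demonstrated is held-out perplexity of an LDA topic model (the tiny corpus below is purely illustrative):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = ["cats purr and sleep", "dogs bark and run",
        "stocks rise on earnings", "markets fall on rate fears"] * 25
counts = CountVectorizer().fit_transform(docs)
train, test = train_test_split(counts, test_size=0.3, random_state=0)

# Lower perplexity on unseen documents suggests a better topic model,
# much like a validation score for a supervised model.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(train)
print(f"Held-out perplexity: {lda.perplexity(test):.1f}")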
Quiz 4
Unsupervised Learning Model Assessment
Test your knowledge on assessing unsupervised learning models.
Resource 2
Unsupervised Learning Assessment Demonstrations
The Jupyter notebooks with the code for the unsupervised learning assessment demonstrations.
Quiz 5
Unsupervised Learning Homework
For the diabetes data found in the Jupyter notebook, calculate the silhouette scores. What value for k would you recommend?
Module 4: Supervised Learning
Lecture 15
Supervised Learning Overview
An overview of the assessment methods that will be discussed for supervised learning models in this module.
Lecture 16
Assessing Regression Models and the Akaike Information Criterion (AIC)
You will get an overview of the metrics that you can use to evaluate regression models. In this lecture we begin with a demonstration of the Akaike Information Criterion (AIC).
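A minimal sketch of computing AIC by hand for a linear regression, using the Gaussian-likelihood form AIC = n*ln(RSS/n) + 2k (conventions differ by an additive constant); the synthetic data is a placeholder for the lecture's data set:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

n = len(y)
k = X.shape[1] + 1                           # estimated parameters, including the intercept
rss = np.sum((y - model.predict(X)) ** 2)    # residual sum of squares
aic = n * np.log(rss / n) + 2 * k
print(f"AIC: {aic:.1f}")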
Lecture 17
Bayesian Information Criterion (BIC)
Although the BIC is similar to the AIC, you will learn the difference between the two. You will also learn how to calculate this metric using Python.
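A companion sketch under the same assumptions as the AIC example, using BIC = n*ln(RSS/n) + k*ln(n); since ln(n) exceeds 2 once n is larger than about 7, extra parameters are penalized more heavily than under AIC:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

n = len(y)
k = X.shape[1] + 1                           # estimated parameters, including the intercept
rss = np.sum((y - model.predict(X)) ** 2)
bic = n * np.log(rss / n) + k * np.log(n)    # heavier penalty on k than AIC for large n
print(f"BIC: {bic:.1f}")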
Lecture 18
Minimum Description Length (MDL)
Find the least complex regression model by translating your model's complexity into a description length.
Lecture 19
Confusion Matrix
Create a visual representation of your model's performance results using tabular data.
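A minimal sketch of building a confusion matrix with scikit-learn; the breast cancer data set and random forest stand in for the lecture's example:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# Rows are the true classes and columns are the predicted classes.
print(confusion_matrix(y_te, y_pred))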
Lecture 20
Prediction Accuracy (ACC)
Probably the most common method of determining how well your supervised learning model performs. Learn to calculate this by hand in Python using the confusion matrix data.
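A by-hand sketch of prediction accuracy from confusion matrix counts; the numbers below are made up for illustration, not course data:

import numpy as np

cm = np.array([[50, 10],    # [[TN, FP],
               [ 5, 85]])   #  [FN, TP]]
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions over all predictions
print(f"ACC: {accuracy:.3f}")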
Also known as the true positive rate. Learn how to calculate the metric that will give you some insight into how well your supervised learning model predicts the expected true values.
Also known as the true negative rate. Learn how to calculate the metric that will give you some insight into how well your supervised learning model predicts the "False" class.
We demonstrate how to calculate the positive predictive value. This will give you a measure of how many of your predicted positive values are actually positive.
Known as the harmonic mean of precision and recall. In addition to the prediction accuracy, this will indicate how well your model performs. In this lecture you will be able to calculate the metric with the confusion matrix values.
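As a quick reference for the four metrics described above, here is a by-hand sketch using the same made-up confusion matrix counts (not course data):

import numpy as np

cm = np.array([[50, 10],    # [[TN, FP],
               [ 5, 85]])   #  [FN, TP]]
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)            # true positive rate (recall)
specificity = tn / (tn + fp)            # true negative rate
precision = tp / (tp + fp)              # positive predictive value
f1 = 2 * precision * sensitivity / (precision + sensitivity)   # harmonic mean of the two

print(f"TPR={sensitivity:.3f}  TNR={specificity:.3f}  PPV={precision:.3f}  F1={f1:.3f}")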
Lecture 25
Receiver Operating Characteristic (ROC) Curve
You will learn how to create this powerful visualization of your classifier performance. In conjunction with the confusion matrix this tool will be a necessity in evaluating your classifier's performance. Plotting the curve is one of the tasks you will be performing in this lecture.
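A minimal sketch of plotting an ROC curve with scikit-learn and matplotlib; the data set and logistic regression classifier are placeholders for the lecture's demonstration:

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]        # probability of the positive class

fpr, tpr, _ = roc_curve(y_te, probs)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")       # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()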
Lecture 26
Cohen's Kappa Statistic
This additional metric will help you evaluate how well your classifier agrees with the observed/labelled data. In this demonstration you will be able to calculate this metric by hand.
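A by-hand sketch of Cohen's kappa on a small made-up label set, cross-checked against scikit-learn:

import numpy as np
from sklearn.metrics import cohen_kappa_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

p_o = np.mean(y_true == y_pred)    # observed agreement
# Chance agreement: probability both say 1 plus probability both say 0.
p_e = (np.mean(y_true == 1) * np.mean(y_pred == 1)
       + np.mean(y_true == 0) * np.mean(y_pred == 0))
kappa = (p_o - p_e) / (1 - p_e)

print(f"Kappa by hand:  {kappa:.3f}")
print(f"scikit-learn:   {cohen_kappa_score(y_true, y_pred):.3f}")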
Lecture 27
Matthews Correlation Coefficient
Learn how skewed data can make for a deceptive prediction accuracy. This metric takes imbalanced data sets into consideration. You will be able to calculate this metric using Python in this lecture.
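A by-hand sketch of the Matthews Correlation Coefficient on the same style of made-up labels, checked against scikit-learn:

import numpy as np
from sklearn.metrics import matthews_corrcoef

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

# MCC stays informative even when the classes are heavily imbalanced.
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(f"MCC by hand:  {mcc:.3f}")
print(f"scikit-learn: {matthews_corrcoef(y_true, y_pred):.3f}")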
Quiz 6
Supervised Learning Quiz
Test your skills on the metrics and tools used for the evaluation of regression and classifier models.
Resource 3
Supervised Learning Assessment Demonstrations
The supervised learning Jupyter notebook demonstrations.
Quiz 7
Supervised Learning Homework
Create a confusion matrix for the Diabetes data in the attached homework for a Random Forest Classifier. In addition, calculate the Matthews Correlation Coefficient and Prediction Accuracy. Calculate the Prediction Accuracy for the included SVM. Compare the metrics between the SVM and the Random Forest Classifier. Using your judgment, which model performs better?