$129.00
Certification

Industry-recognized certification enables you to add this credential to your resume upon completion of all courses.

Instructor

Dr. Rich Huebner

He has 25 years of experience with data design, data architecture, and analytics. He has worked for large multinational organizations in healthcare, medical devices, and education, and he is also an adjunct faculty member at multiple universities. He holds two graduate degrees in Information Systems & Management with a Ph.D. in IT.


Tidy Your Data Before Using It in Machine Learning Algorithms

    • Understand what data preprocessing is and why it is needed as part of an overall data science and machine learning methodology.
    • Be able to summarize your data using descriptive statistics and data visualization.
    • The instructor has 25 years of experience with data design, data architecture, and analytics, and holds two graduate degrees in Information Systems & Management with a Ph.D. in IT.

Duration: 2h 04m

Course Description

Data is often messy and comes in a variety of forms. As part of the overall data mining and machine learning process, we must take the time to preprocess our data: ensuring that it is structured and cleansed, and addressing any problems it may have. Preprocessing also includes gaining a better understanding of the data through descriptive statistics and data visualization techniques, and handling missing data and outliers appropriately.
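As a rough illustration of the structuring and cleansing step, the sketch below parses a small, made-up CSV sample with Python's standard `csv` module and converts blank or "n/a" fields to `None` so they can be treated as missing. The sample data, the `to_float` and `load_records` helpers, and the set of missing-value markers are all hypothetical; the course does not say which functions it uses, and libraries such as pandas offer equivalents (for example, `pandas.read_csv`).

```python
import csv
import io

# Hypothetical raw data: a blank age, an "n/a" income, and stray spaces.
RAW = """name,age,income
Ana, 34 ,52000
Ben,,61000
Cy,29,n/a
"""

# Assumed set of strings that mean "no value" in this particular file.
MISSING_MARKERS = {"", "na", "n/a", "null"}


def to_float(text):
    """Convert a raw field to float; return None for blanks or unparseable values."""
    text = text.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # treat garbage as missing rather than crashing


def load_records(raw):
    """Parse CSV text into a list of dicts with numeric columns converted."""
    reader = csv.DictReader(io.StringIO(raw))
    records = []
    for row in reader:
        records.append({
            "name": row["name"].strip(),
            "age": to_float(row["age"]),
            "income": to_float(row["income"]),
        })
    return records


records = load_records(RAW)
for record in records:
    print(record)
```

Representing every kind of missing marker as `None` up front means later steps (summaries, imputation, outlier checks) only have to handle one case.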

What am I going to get from this course?

  • Understand what data preprocessing is and why it is needed as part of an overall data science and machine learning methodology
  • Review and understand data quality issues and how to address them
  • Apply specific Python functions to assist in cleansing and transforming your data
  • Be able to summarize your data using descriptive statistics and data visualization.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

Programming knowledge in Python
  • Lists, variables, loops, etc.
Basic statistics knowledge
  • Inferential and descriptive statistics
Python installed on your computer
  • I use the Spyder IDE and the Anaconda distribution.
  • I use Python 3.6.1, so any version 3.6 or later should work.

Who should take this course? Who should not?

Individuals with basic Python & statistics knowledge can take this course.

Curriculum

Module 1: Introduction to Data Preprocessing

Lecture 1 What is data preprocessing?
Lecture 2 What is dirty data?
Lecture 3 Structuring Data
Lecture 4 Overview of Data Cleansing

Module 2: Data Quality

Lecture 5 Data Quality
Lecture 6 Data Quality Challenges
Lecture 7 Raw Files and File Formats
Lecture 8 Structured Data
Lecture 9 Finding Data Sets
Lecture 10 Loading Data into Python
Lecture 11 Loading Data into Python Part 2

Module 3: Summarizing Data with Statistics

Lecture 12 Review of Basic Statistics
Lecture 13 Summarizing Data with Python
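The kind of numeric summary this module covers can be sketched with Python's built-in `statistics` module. The `summarize` helper and the sample values are hypothetical, missing entries are assumed to be stored as `None`, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    # Quartiles via the default "exclusive" method; q2 equals the median.
    q1, _q2, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "q1": q1,
        "q3": q3,
        "max": max(present),
    }


# Hypothetical column with one missing value.
values = [2, 4, 4, 4, 5, 5, 7, 9, None]
summary = summarize(values)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Reporting the count of missing values alongside the statistics makes it obvious how much of the column the other numbers actually describe.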

Module 4: Data Visualization

Lecture 14 Introduction to Data Visualization
Lecture 15 EDA and CDA
Lecture 16 Creating a Histogram
Lecture 17 Box Plots
Lecture 18 Bar Graphs
Lecture 19 Other Graphs
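A histogram groups values into equal-width bins and shows how many values fall into each bin. Plotting libraries such as matplotlib draw this directly; the hypothetical `histogram_counts` and `print_text_histogram` helpers below compute just the bin counts and print a text version, so the idea can be seen with the standard library alone.

```python
def histogram_counts(values, bins, low, high):
    """Count values in `bins` equal-width bins spanning [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v is None or v < low or v > high:
            continue  # skip missing or out-of-range values
        index = int((v - low) / width)
        # A value equal to `high` would land one past the end, so it is
        # assigned to the last bin (the last bin is closed on the right).
        counts[min(index, bins - 1)] += 1
    return counts


def print_text_histogram(counts, low, width):
    """Print one row of '#' characters per bin."""
    for i, count in enumerate(counts):
        start = low + i * width
        print(f"{start:g}-{start + width:g} | {'#' * count}")


# Hypothetical data with one large value that stands out in the last bin.
data = [1, 2, 2, 3, 3, 3, 4, 10]
counts = histogram_counts(data, bins=3, low=1, high=10)
print_text_histogram(counts, low=1, width=3)
```

The number of bins is a choice, not a property of the data: too few bins hide structure, and too many make the shape noisy.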

Module 5: Data Cleansing

Lecture 20 Missing Data Part 1
Lecture 21 Missing Data Part 2
Lecture 22 Outlier Detection Part 1
Lecture 23 High-Dimensional Data
Lecture 24 Outlier Detection Part 2
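One common way to handle the problems this module covers is to fill missing values with the median of the observed values and to flag outliers with Tukey's fences (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR). This is an assumed approach, not necessarily the one taught in the lectures; the helper names and the sample ages are hypothetical, and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics


def impute_median(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages: two missing entries and one suspicious value.
ages = [23, 25, None, 31, 29, 27, None, 94, 26]
filled = impute_median(ages)
print(filled)                # → [23, 25, 27, 31, 29, 27, 27, 94, 26]
print(iqr_outliers(filled))  # → [94]
```

The median is used for filling because, unlike the mean, it is barely affected by the outlier. Note that here the imputed values are included when the quartiles are computed; computing the fences on the observed values only is an equally reasonable choice.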

Module 6: Feature Scaling

Lecture 25 Introduction to Feature Scaling
Lecture 26 Final Thoughts
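Two widely used feature-scaling transforms are min-max scaling, which maps a feature onto the range [0, 1], and z-score standardization, which gives it mean 0 and standard deviation 1. The sketch below implements both with the standard library; the helper names and income values are hypothetical, and scikit-learn's `MinMaxScaler` and `StandardScaler` provide the same transforms for real projects.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: avoid dividing by zero
    return [(v - mu) / sigma for v in values]


# Hypothetical feature on a much larger scale than, say, age.
incomes = [30000, 45000, 60000, 90000]
print(min_max_scale(incomes))  # → [0.0, 0.25, 0.5, 1.0]
print(z_score(incomes))
```

In a machine learning workflow, the minimum, maximum, mean, and standard deviation should be computed on the training data only and then reused to scale the test data, so that no information from the test set leaks into training.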