Course Description
This is an introductory course and assumes no background in natural language processing. Math is kept to a minimum and introduced only where it reinforces the learning. The course is loaded with Python code snippets and exercises and ends with an exciting final project on natural language generation.
The instructor holds a Ph.D. in Electrical and Computer Engineering, with a focus on machine learning and data analysis.
What am I going to get from this course?
- Preprocess and clean up text data
- Extract features from text data to be used in downstream tasks
- Perform some of the most common NLP tasks, such as text classification and topic modeling
- Make a machine generate text that sounds like your favourite celebrity or book
Prerequisites and Target Audience
What will students need to know or do before starting this course?
- Familiarity with the Python programming language is required
- Students will benefit from prior exposure to machine learning and probability
Who should take this course? Who should not?
Industry professionals and college students who are interested in a broad overview of how machines understand human language.
Curriculum
Lecture 1
Definition of NLP, sample NLP applications, and associated challenges
Lecture 2
Setting up your development environment and introduction to NLTK
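A minimal setup sketch along these lines, assuming Python 3 and that NLTK was installed with pip install nltk; the particular corpora and models downloaded here are an assumption, chosen to match topics that appear later in the curriculum:

```python
import nltk

# Tokenizer models, stop word lists, a POS tagger, and the Gutenberg corpus,
# all used in later modules (an assumed selection, not an official list)
for resource in ["punkt", "stopwords", "averaged_perceptron_tagger", "gutenberg"]:
    nltk.download(resource)

print(nltk.__version__)  # confirm the installation works
```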
Lecture 3
Introduction to Project Gutenberg, which provides a repository of copyright-free books that we will use throughout the course
Lecture 4
Preliminary analyses, such as counting occurrences of words in a document, generating frequency distributions of words, and creating word clouds
Quiz 1
Finding the most common word in Hamlet
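As a taste of the kind of analysis this module covers, here is a hedged sketch that counts word occurrences in Hamlet using NLTK's Gutenberg corpus; lowercasing the tokens first is an illustrative choice, not the graded quiz solution:

```python
from nltk.corpus import gutenberg
from nltk import FreqDist

# Tokens of Hamlet from the Gutenberg corpus (punctuation included)
words = gutenberg.words("shakespeare-hamlet.txt")

# Frequency distribution over lowercased tokens
fdist = FreqDist(w.lower() for w in words)

print(fdist.most_common(10))  # raw counts; punctuation and stop words still dominate
```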
Module 3: Data Preprocessing
Lecture 5
Preprocessing I
Case normalization, punctuation removal, and tokenization
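A minimal sketch of these steps with NLTK's word tokenizer; the sample sentence is made up for illustration:

```python
from nltk.tokenize import word_tokenize

text = "NLP is fun, isn't it?"

tokens = word_tokenize(text)                    # split into word and punctuation tokens
lowered = [t.lower() for t in tokens]           # case normalization
no_punct = [t for t in lowered if t.isalpha()]  # drop tokens that are pure punctuation

print(no_punct)
```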
Lecture 6
Preprocessing II
Lexicon normalization and stop word removal
Quiz 2
Finding the most common word in Moby Dick, ignoring stop words and punctuation
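In the spirit of Quiz 2, the sketch below drops punctuation and stop words before counting, and shows lemmatization as one form of lexicon normalization; the exact pipeline is an illustrative assumption, not the graded solution:

```python
from nltk.corpus import gutenberg, stopwords
from nltk.stem import WordNetLemmatizer
from nltk import FreqDist

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()  # requires the 'wordnet' resource

# Keep alphabetic tokens only, lowercase them, then remove stop words and lemmatize
words = [w.lower() for w in gutenberg.words("melville-moby_dick.txt") if w.isalpha()]
words = [lemmatizer.lemmatize(w) for w in words if w not in stop_words]

print(FreqDist(words).most_common(5))
```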
Module 4: Feature Engineering
Lecture 7
Feature Engineering I
Part-of-speech tagging and named entity recognition
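A hedged sketch of both tasks with NLTK; the sentence is invented, and the chunker resources (maxent_ne_chunker, words) are assumed to be downloaded in addition to the tagger:

```python
from nltk import word_tokenize, pos_tag, ne_chunk

sentence = "Herman Melville wrote Moby Dick in New York."

tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)   # list of (token, POS tag) pairs
tree = ne_chunk(tagged)    # tree with PERSON / GPE / ... chunks

print(tagged)
print(tree)
```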
Lecture 8
Feature Engineering II
Lecture 9
Feature Engineering III
Quiz 3
Determining POS tags
Quiz 5
Word arithmetic with embeddings
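The classic "king - man + woman is close to queen" example, sketched with gensim's downloadable GloVe vectors; the course's exact embedding tooling is not specified here, so both the library and the model name are assumptions for illustration:

```python
import gensim.downloader as api

# Pretrained 100-dimensional GloVe vectors (downloaded on first use)
vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```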
Module 5: Topic Modeling
Lecture 10
Topic Modeling I
Latent semantic analysis (LSA)
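A minimal LSA sketch: TF-IDF vectors reduced with truncated SVD. scikit-learn and the toy corpus are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stock markets fell sharply today"]

tfidf = TfidfVectorizer().fit_transform(docs)  # documents x terms matrix
lsa = TruncatedSVD(n_components=2).fit(tfidf)  # 2 latent "topics"

print(lsa.components_.shape)  # (2, vocabulary size): term loadings per topic
```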
Lecture 11
Topic Modeling II
Latent Dirichlet allocation (LDA)
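A hedged LDA sketch with scikit-learn (the course may well use a different library such as gensim); the toy corpus and the choice of two topics are illustrative assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are popular pets",
        "the market rallied as stocks rose",
        "dogs chase cats around the yard"]

# LDA works on raw counts rather than TF-IDF weights
counts = CountVectorizer(stop_words="english").fit(docs)
dtm = counts.transform(docs)  # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Print the top terms for each of the two topics
for topic in lda.components_:
    top = topic.argsort()[-3:][::-1]
    print([counts.get_feature_names_out()[i] for i in top])
```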
Quiz 6
Latent Dirichlet allocation
Quiz 7
Number of topics extracted
Module 6: Text Classification
Lecture 12
Text Classification I
Preparing a dataset for text classification
Lecture 13
Text Classification II
Feature extraction for text classification
Lecture 14
Text Classification III
Training a machine learning model for text classification
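A compact sketch tying the three lectures together: split a small labeled dataset, extract TF-IDF features, and train a classifier. scikit-learn, the toy reviews, and the choice of logistic regression are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["loved this movie", "what a waste of time", "brilliant acting",
         "terrible plot", "great fun", "boring and slow"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# Hold out part of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0)

# TF-IDF feature extraction followed by a linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # accuracy on the held-out test set
```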
Quiz 8
Sentiment analysis on tweets
Quiz 9
Word embeddings vs word count vectors vs TF-IDF vectors
Quiz 10
Train and test sets
Module 7: Language Modeling
Lecture 15
Language Modeling
Markov models for language modeling and final project description
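A minimal sketch of the idea behind the final project: a bigram Markov model that records which word follows which and samples new text from those transitions. The training sentence and the start word are made up for illustration:

```python
import random
from collections import defaultdict

text = "to be or not to be that is the question".split()

# Map each word to the list of words that follow it in the training text
transitions = defaultdict(list)
for current, nxt in zip(text, text[1:]):
    transitions[current].append(nxt)

# Generate by repeatedly sampling a successor of the current word
word, output = "to", ["to"]
for _ in range(8):
    choices = transitions.get(word)
    if not choices:  # dead end: the word was never followed by anything
        break
    word = random.choice(choices)
    output.append(word)

print(" ".join(output))
```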