Machine Learning and Prediction (Causal ML)
Professor: Jeff Thurk // jeff.thurk@uga.edu
Meetings: TTh 9:35 to 10:50, 11:10 to 12:25 (321 Correll Hall)
Office hours: W 3:00-5:00 (office); F 11:00-12:00 (Zoom, see syllabus)
Schedule
The following outlines the topics we will cover this semester. Yes, this is a lot of material. Buckle up.
Readings: “MG” refers to the book “Introduction to Machine Learning with Python” by Andreas Müller and Sarah Guido. “Chollet” refers to the book “Deep Learning with Python” by François Chollet.
File availability: Please download all data (on eLC) and Jupyter notebooks (.ipynb files) before class.
Lecture 1: Class Vision and Objectives
Topics: Discuss Syllabus, OLS Review, and the Bias-Variance Trade-Off
Reading: Syllabus, MG: Chapter 2.2, Chollet: Chapters 4.2 and 4.4
Lecture Notebook: Bias-Variance [MY SOLUTIONS]
At-home Notebook: Effective Visualizations
Lecture 2: Forecasting [MY SOLUTIONS]
Topics: Trends, Stationarity, AR, ARIMA, Forecasting
Reading: MG: Chapter 3.4.1
At-home Notebook: API, Working with Time-Series Data
Lecture 3: Supervised Learning: Linear Penalization [MY SOLUTIONS]
Topics: LASSO, Ridge, Elastic Net. Model Tuning via Cross-Validation
Reading: MG: Chapter 2.3.3
At-home Notebook: None.
Lecture 4: Supervised Learning: Classification [MY SOLUTIONS]
Topics: Logit, Naïve Bayes
Reading: MG: Chapters 2.3 and 2.4
At-home Notebook: None.
Lecture 5: Supervised Learning: Decision Trees [MY SOLUTIONS]
Topics: Decision Trees
Reading: MG: Chapter 2.3.5
At-home Notebook: None.
Lecture 6: Supervised Learning: Random Forests [MY SOLUTIONS]
Topics: Random Forests, Support Vector Machines (SVM)
Reading: MG: Chapters 2.3.6, 2.3.7, and 2.4
At-home Notebook: Support Vector Machines.
Lecture 7: Causality: Synthetic Controls [MY SOLUTIONS]
Topics: IV and Synthetic Controls
Reading: None.
At-home Notebook: None.
Lecture 8: Causality: Double ML [MY SOLUTIONS]
Topics: Pretesting bias, Regularization bias, Post-double selection, Double/de-biased ML
Reading: None.
At-home Notebook: None.
Lecture 9: Unsupervised Learning: Clustering [MY SOLUTIONS]
Topics: k-means and Hierarchical Clustering
Reading: MG: Chapter 3.5
At-home Notebook: Histogram of Oriented Gradients (HOG)
Lecture 10: Unsupervised Learning: Principal Component Analysis [MY SOLUTIONS]
Topics: Principal Component Analysis (PCA)
Reading: MG: Chapter 7
At-home Notebook: None.
Lecture 11: Text as Data: Natural Language Processing [MY SOLUTIONS]
Topics: Natural Language Processing (NLP)
Reading: Chollet: Chapter 2; MG: Chapter 7
At-home Notebook: None.
Lecture 12: Text as Data: Sentiment Analysis [MY SOLUTIONS]
Topics: Sentiment Analysis
Reading: Chollet: Chapter 2; MG: Chapter 7
At-home Notebook: None.
Lecture 13: Deep Learning: Long Short-Term Memory [MY SOLUTIONS]
Topics: Recurrent Neural Nets (RNN), Long Short-Term Memory (LSTM)
Reading: Chollet: Chapter 3
At-home Notebook: Deep Learning Mathematical Building Blocks, Tuning Neural Networks
Final Exam: Kaggle Contest
Details: The final exam will take the form of a Kaggle contest in which I will announce a business problem for the teams to solve. Teams will be given training and testing data sets to use in developing their own predictive model for the business problem. I will evaluate each team’s submission on three criteria: how “elegant” the model is (i.e., how well the team incorporated what we have done in class), the quality and clarity of the business deliverable describing the model, and the model’s ability to accurately predict outcomes in the withheld data.
Competition Details: Exam
Release Date: December 1, 2022.
Due Date: December 13, 2022.
Results: To be Published