Machine Learning and Prediction (Causal ML)

Professor: Jeff Thurk // jeff.thurk@uga.edu

Meetings: TTh 9:35 to 10:50, 11:10 to 12:25 (321 Correll Hall)

Office hours: W 3:00-5:00 (office); F 11:00-12:00 (Zoom, see syllabus)

Schedule

The following outlines the topics we will cover this semester. Yes, this is a lot of material. Buckle up.

Readings: “MG” refers to the book “Introduction to Machine Learning with Python” by Andreas Muller and Sarah Guido. “Chollet” refers to the book “Deep Learning with Python” by Francois Chollet.

File availability: Please download all data (on eLC) and Jupyter notebooks (.ipynb files) before class.


Lecture 1: Class Vision and Objectives

Topics: Syllabus Discussion, OLS Review, and the Bias-Variance Trade-Off

Reading: Syllabus, MG: Chapter 2.2, Chollet: Chapters 4.2 and 4.4

Lecture Notebook: Bias-Variance [MY SOLUTIONS]

At-home Notebook: Effective Visualizations


Lecture 2: Forecasting [MY SOLUTIONS]

Topics: Trends, Stationarity, AR, ARIMA, Forecasting

Reading: MG: Chapter 3.4.1

At-home Notebook: API, Working with Time-Series Data


Lecture 3: Supervised Learning: Linear Penalization [MY SOLUTIONS]

Topics: LASSO, Ridge, Elastic Net, and Model Tuning via Cross-Validation

Reading: MG: Chapter 2.3.3

At-home Notebook: None.


Lecture 4: Supervised Learning: Classification [MY SOLUTIONS]

Topics: Logit, Naïve Bayes

Reading: MG: Chapters 2.3 and 2.4

At-home Notebook: None.


Lecture 5: Supervised Learning: Decision Trees [MY SOLUTIONS]

Topics: Decision Trees

Reading: MG: Chapter 2.3.5

At-home Notebook: None.


Lecture 6: Supervised Learning: Random Forests [MY SOLUTIONS]

Topics: Random Forests, Support Vector Machines (SVM)

Reading: MG: Chapters 2.3.6, 2.3.7, and 2.4

At-home Notebook: Support Vector Machines.


Lecture 7: Causality: Synthetic Controls [MY SOLUTIONS]

Topics: IV and Synthetic Controls

Reading: None.

At-home Notebook: None.


Lecture 8: Causality: Double ML [MY SOLUTIONS]

Topics: Pretesting Bias, Regularization Bias, Post-Double Selection, Double/Debiased ML

Reading: None.

At-home Notebook: None.


Lecture 9: Unsupervised Learning: Clustering [MY SOLUTIONS]

Topics: k-means and Hierarchical Clustering

Reading: MG: Chapter 3.5

At-home Notebook: Histogram of Oriented Gradients (HOG)


Lecture 10: Unsupervised Learning: Principal Component Analysis [MY SOLUTIONS]

Topics: Principal Component Analysis (PCA)

Reading: MG: Chapter 7

At-home Notebook: None.


Lecture 11: Text as Data: Natural Language Processing [MY SOLUTIONS]

Topics: Natural Language Processing (NLP)

Reading: Chollet: Chapter 2; MG: Chapter 7

At-home Notebook: None.


Lecture 12: Text as Data: Sentiment Analysis [MY SOLUTIONS]

Topics: Sentiment Analysis

Reading: Chollet: Chapter 2; MG: Chapter 7

At-home Notebook: None.


Lecture 13: Deep Learning: Long Short-Term Memory [MY SOLUTIONS]

Topics: Recurrent Neural Nets (RNN), Long Short-Term Memory (LSTM)

Reading: Chollet: Chapter 3

At-home Notebook: Deep Learning Mathematical Building Blocks, Tuning Neural Networks


Lecture 14: Deep Learning: Embeddings

Topics: Embeddings

Reading: None.

At-home Notebook: None.


Final Exam: Kaggle Contest

Details: The final exam will take the form of a Kaggle contest in which I will announce a business problem for the teams to solve. Each team will receive training and testing data sets, which it will use to develop its own predictive model to address the business problem. I will evaluate each team’s submission on three criteria: how “elegant” the model is (i.e., how well the team incorporated what we’ve done in class), the quality and clarity of the business deliverable that describes the model, and the model’s ability to accurately predict outcomes in the withheld data.

Competition Details: Exam

Release Date: December 1, 2022.

Due Date: December 13, 2022.

Results: To be Published