Data Science for Business and Economics

Professor: Jeff Thurk // jeff.thurk@uga.edu

Meetings: Tuesdays and Thursdays (8:00 to 9:15 EST) in C006 Benson Hall

Office hours: Fridays 9:00 to 11:00 Eastern via Zoom. The link is in the syllabus.

Schedule

The following outlines the topics we will cover this semester. Yes, this is a lot of material. Buckle up.

Readings: 

“McKinney” refers to the book Python for Data Analysis (Second edition) by Wes McKinney.

“MG” refers to the book Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido.

File availability: Please download Jupyter notebooks (.ipynb) prior to each class. All data files are located on eLC.


Class 1: Class Vision and Objectives

Topics: Using data to communicate ideas // Class Overview // Install Python via Anaconda

Reading: Syllabus // Data Visualization Examples: Gapminder


Class 2: Jupyter Notebooks and Markdown

Details: Jupyter notebooks, pip, executing and debugging Python code

Reading: Markdown Cheat sheet // McKinney Ch. 2.2 parts: "Running the Jupyter Notebook," "Tab Completion," "Introspection"


Class 3: Data Types

Details: Common object types: floats, integers, and strings.

New Python Packages: numpy

Reading: McKinney Ch. 2.3 up to "Control Flow." Skip "Duck Typing" and "Bytes and Unicode"


Class 4: Conditionals and Loops [MY SOLUTIONS]

Details: Booleans, if-then statements, and loops.

New Python Packages: None.

Reading: McKinney Ch 3.1


Class 5: Lists

Details: Lists, list comprehensions, tuples, dictionaries (dicts).

New Python Packages: None.

Reading: McKinney Ch 3.1


Class 6: Slicing and user-defined functions [MY SOLUTIONS]

Details: Slicing and user-defined functions.

New Python Packages: None.

Reading: McKinney Ch 3.2


Class 7: Panel Data [MY SOLUTIONS]

Details: DataFrames and calculations.

New Python Packages: Pandas

Reading: McKinney Ch 5 & 6.1


Class 8: Input / Output [MY SOLUTIONS]

Details: Loading and saving data.

New Python Packages: os

Reading: McKinney Ch 9.1


Class 9: Data Visualization [MY SOLUTIONS]

Details: Figures and axes, line plots, scatter plots, and histograms. Creating effective visualizations.

New Python Packages: Matplotlib, Seaborn

Reading: McKinney Ch 9.1


Class 10: Time-Series [MY SOLUTIONS]

Details: Datetime types. Resampling and plotting with time-stamped data.

New Python Packages: datetime

Reading: McKinney Ch 11


Class 11: Reshaping Data [MY SOLUTIONS]

Details: MultiIndex, stack, unstack, pivot, and melt.

New Python Packages: None.

Reading: McKinney Ch 8.1, 8.3


Class 12: Collapsing Data [MY SOLUTIONS]

Details: Groupby and transform

New Python Packages: None.

Reading: McKinney Ch 10.1 & 10.2


Class 13: Merging Data Sets [MY SOLUTIONS]

Details: Merging DataFrames.

New Python Packages: None.

Reading: McKinney Ch 8.2


Class 14: Cleaning Data [MY SOLUTIONS]

Details: Replace, map, apply, applymap, unique, strip, cut, and qcut.

New Python Packages: None.

Reading: McKinney Ch 7


Class 15: Regular Expressions [MY SOLUTIONS]

Details: Searching and matching strings.

New Python Packages: re

Reading: McKinney Ch 7.3


Class 16: Maps [MY SOLUTIONS]

Details: Plotting geospatial data.

New Python Packages: Geopandas.

Reading: None.


Class 17: Choropleths [MY SOLUTIONS]

Details: Plotting geospatial data.

New Python Packages: None.

Reading: None.


Class 18: Web Scraping [MY SOLUTIONS]

Details: Extracting data from web pages.

New Python Packages: Requests, BeautifulSoup

Reading: None.


Class 19: Application Programming Interfaces (APIs) [MY SOLUTIONS]

Details: Retrieving data through APIs.

New Python Packages: Pandas Datareader

Reading: McKinney Ch 6.3


Class 20: Econometrics: Linear Regression [MY SOLUTIONS]

Details: Ordinary Least Squares (OLS)

New Python Packages: Patsy, Statsmodels

Reading: McKinney Ch 13.3


Class 21: Econometrics: Discrete Regression [MY SOLUTIONS]

Details: Linear probability model, probit, and logit

New Python Packages: None.

Reading: McKinney Ch 13.3


Class 22: Machine Learning: Model Selection [MY SOLUTIONS]

Details: Bias versus variance: understanding the trade-off between simple and complex models.

New Python Packages: None.

Reading: MG Ch 2.2


Class 23: Supervised Machine Learning: Linear Penalization [MY SOLUTIONS]

Details: Regularization: Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic-Net regression.

New Python Packages: Scikit-learn.

Reading: MG Ch 2.2


Class 24: Supervised Machine Learning: Classification [MY SOLUTIONS]

Details: Logit

New Python Packages: None.

Reading: MG Ch 2.3 & 2.4


Class 25: Unsupervised Machine Learning: Clustering and Dimension Reduction [MY SOLUTIONS]

Details: K-means, Principal Component Analysis (PCA)

New Python Packages: None.

Reading: MG Ch 3.4.1 & 3.5.1


Class 26: Machine Learning with Text Data

Details: Natural Language Processing, Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF)

New Python Packages: nltk, wordcloud, autocorrect

Reading: MG Ch 7