Data Science for Business and Economics
Professor: Jeff Thurk // jeff.thurk@uga.edu
Meetings: Tuesdays and Thursdays (8:00 to 9:15 EST) in C006 Benson Hall
Office hours: Fridays 9:00 to 11:00 Eastern via Zoom. The link is in the syllabus.
Schedule
The following outlines the topics we will cover this semester. Yes, this is a lot of material. Buckle up.
Readings:
“McKinney” refers to the book Python for Data Analysis (Second edition) by Wes McKinney.
“MG” refers to the book Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido.
File availability: Please download Jupyter notebooks (.ipynb) prior to each class. All data files are located on eLC.
Class 2: Jupyter Notebooks and Markdown
Details: Jupyter notebooks, pip, executing and debugging Python code
Reading: Markdown Cheat sheet // McKinney Ch. 2.2 parts: "Running the Jupyter Notebook," "Tab Completion," "Introspection"
Class 3: Data Types
Details: Common object types: floats, integers, and strings.
New Python Packages: numpy
Reading: McKinney Ch. 2.3 up to "Control Flow." Skip "Duck Typing" and "Bytes and Unicode"
Class 4: Conditionals and Loops [MY SOLUTIONS]
Details: Booleans, if-then statements, and loops.
New Python Packages: None.
Reading: McKinney Ch 3.1
Class 5: Lists
Details: Lists, list comprehensions, tuples, dictionaries (dicts).
New Python Packages: None.
Reading: McKinney Ch 3.1
Class 6: Slicing and user-generated functions [MY SOLUTIONS]
Details: Slicing and user-generated functions.
New Python Packages: None.
Reading: McKinney Ch 3.2
Class 7: Panel Data [MY SOLUTIONS]
Details: DataFrames and calculations.
New Python Packages: Pandas
Reading: McKinney Ch 5 & 6.1
Class 8: Input / Output [MY SOLUTIONS]
Details: Loading and saving data.
New Python Packages: os
Reading: McKinney Ch 9.1
Class 9: Data Visualization [MY SOLUTIONS]
Details: Figures and axes, line plots, scatter plots, and histograms. Creating effective visualizations.
New Python Packages: Matplotlib, Seaborn
Reading: McKinney Ch 9.1
Class 10: Time-Series [MY SOLUTIONS]
Details: Datetime types. Resampling and plotting with time-stamped data.
New Python Packages: datetime
Reading: McKinney Ch 11
Class 11: Reshaping Data [MY SOLUTIONS]
Details: MultiIndex, stack, unstack, pivot, and melt.
New Python Packages: None.
Reading: McKinney Ch 8.1, 8.3
Class 12: Collapsing Data [MY SOLUTIONS]
Details: Groupby and transform
New Python Packages: None.
Reading: McKinney Ch 10.1 & 10.2
Class 13: Merging Data Sets [MY SOLUTIONS]
Details: Merging DataFrames.
New Python Packages: None.
Reading: McKinney Ch 8.2
Class 14: Cleaning Data [MY SOLUTIONS]
Details: Replace, map, apply, applymap, unique, strip, cut, and qcut.
New Python Packages: None.
Reading: McKinney Ch 7
Class 15: Regular Expressions [MY SOLUTIONS]
Details: Searching and matching strings.
New Python Packages: re
Reading: McKinney Ch 7.3
Class 16: Maps [MY SOLUTIONS]
Details: Plotting geospatial data.
New Python Packages: Geopandas.
Reading: None.
Class 17: Choropleths [MY SOLUTIONS]
Details: Plotting geospatial data.
New Python Packages: None.
Reading: None.
Class 18: Web Scraping [MY SOLUTIONS]
Details: Extracting data from web pages.
New Python Packages: Requests, BeautifulSoup
Reading: None.
Class 19: Application Programming Interfaces (APIs) [MY SOLUTIONS]
Details: APIs
New Python Packages: Pandas Datareader
Reading: McKinney Ch 6.3
Class 20: Econometrics: Linear Regression [MY SOLUTIONS]
Details: Ordinary Least Squares (OLS)
New Python Packages: Patsy, Statsmodels
Reading: McKinney Ch 13.3
Class 21: Econometrics: Discrete Regression [MY SOLUTIONS]
Details: Linear probability, probit, logit
New Python Packages: None.
Reading: McKinney Ch 13.3
Class 22: Machine Learning: Model Selection [MY SOLUTIONS]
Details: Bias versus variance: understanding the trade-off between simple and complex models.
New Python Packages: None.
Reading: MG Ch 2.2
Class 23: Supervised Machine Learning: Linear Penalization [MY SOLUTIONS]
Details: Regularization: Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic-Net regression.
New Python Packages: Scikit-learn.
Reading: MG Ch 2.2
Class 24: Supervised Machine Learning: Classification [MY SOLUTIONS]
Details: Logit
New Python Packages: None.
Reading: MG Ch 2.3 & 2.4
Class 25: Unsupervised Machine Learning: Clustering and Dimension Reduction [MY SOLUTIONS]
Details: K-means, Principal Component Analysis (PCA)
New Python Packages: None.
Reading: MG Ch 3.4.1 & 3.5.1
Class 26: Machine Learning with Text Data
Details: Natural Language Processing, Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF)
New Python Packages: nltk, wordcloud, autocorrect
Reading: MG Ch 7