Lecture 9: Visualizing Data with Matplotlib [SUGGESTED SOLUTIONS]
We have a handle on Python now: we understand the data structures and enough about working with them to move on to stuff more directly relevant to data analysis. We know how to get data into pandas from files, how to manipulate DataFrames, and how to do basic statistics.
Let's get started on making figures, arguably the best way to convey information about our data.
Today, we will cover:
Class Announcements
PS2 (due 9/18) is on eLC.
Matplotlib is a very popular package that bundles tools for creating visualizations. The documentation is here. We will look at some specific plot types in class, but you can learn about many other types in the thumbnail gallery. [Warning: not all the figures in the thumbnail gallery are good figures.]
We start in the usual way by loading packages.
import pandas as pd #load the pandas package and call it pd
import matplotlib.pyplot as plt # load the pyplot set of tools from the package matplotlib. Name it plt for short.
And now let's go back to our principles of macroeconomics days and look at some national income account data.
gdp = pd.read_csv('./Data/gdp_components_simple.csv', index_col=0) # load data from file, make date the index
print(gdp.head(2)) # print the first and last two rows to make sure all is well
print('\n', gdp.tail(2))
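| DATE | GDPA | GPDIA | GCEA | EXPGSA | IMPGSA |
|---|---|---|---|---|---|
| 1929 | 104.556 | 17.170 | 9.622 | 5.939 | 5.556 |
| 1930 | 92.160 | 11.428 | 10.273 | 4.444 | 4.121 |

| DATE | GDPA | GPDIA | GCEA | EXPGSA | IMPGSA |
|---|---|---|---|---|---|
| 2016 | 18707.189 | 3169.887 | 3290.979 | 2217.576 | 2738.146 |
| 2017 | 19485.394 | 3367.965 | 3374.444 | 2350.175 | 2928.596 |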
I don't like these variable names.
gdp.rename(columns = {'GDPA':'gdp', 'GPDIA':'inv', 'GCEA':'gov', 'EXPGSA':'ex', 'IMPGSA':'im' }, inplace=True)
Let's get plotting. matplotlib graphics are based around two new object types.
- The figure object: think of this as the canvas we will draw figures onto
- The axes object: think of this as the plot itself and all of its components (lines, labels, ticks, and so on)
To create a new figure, we call the subplots() function of plt. Notice the use of multiple assignment.
fig, ax = plt.subplots() # passing no arguments gets us one fig object and one axes object
plt.show() # tells jupyter to show the figure
print(type(fig))
print(type(ax))
<class 'matplotlib.figure.Figure'>
<class 'matplotlib.axes._subplots.AxesSubplot'>
We apply methods to the axes to actually plot the data. Here is a line plot. [Try typing ax. and hitting TAB...]
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp']) # line plot of gdp vs. time
plt.show() # tells jupyter to show the figure
First, note that the plot is a Line2D object. This is absolutely not important for us, but when you see jupyter print out <matplotlib.lines.Line2D at ...>, that is what it is telling us. Everything in python is an object.
Second, a line plot (like a scatter plot) needs two columns of data, one for the x-coordinate and one for the y-coordinate. I am using gdp for the y-coordinate and the years for the x-coordinate. I set years as the index variable, so to retrieve them I used the .index attribute.
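To make the x/y pairing concrete, here is a minimal sketch on a tiny made-up DataFrame (the numbers are illustrative, not the BEA data). The years sit in the index, so `df.index` supplies the x-coordinates and a column supplies the y-coordinates. It also shows that `ax.plot` hands back a list with one `Line2D` per line drawn, which is the object jupyter echoes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data: years in the index, one value column
df = pd.DataFrame({'gdp': [100.0, 110.0, 121.0]}, index=[2000, 2001, 2002])

fig, ax = plt.subplots()
lines = ax.plot(df.index, df['gdp'])   # x = index (years), y = column values

line = lines[0]                        # ax.plot returns a list of Line2D objects
print(type(line).__name__)             # the class name of the drawn line
print([int(x) for x in line.get_xdata()])  # the years we passed as x
```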
Third, this plot needs some work. I do not like this line color. More importantly, I am missing labels and a title. These are extremely important.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red' # set the line color to red
)
ax.set_ylabel('billions of dollars') # add the y-axis label
ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product')
plt.show() # tells jupyter to show the figure
This is looking pretty good. While I am a fanatic when it comes to labeling things, I probably wouldn't label the x-axis. You have to have some faith in the reader.
I also do not like 'boxing' my plots. There is a philosophy about visualizations that says: Every mark on your figure should convey information. If it does not, then it is clutter and should be removed. I am not sure who developed this philosophy (Marie Kondo?) but I think it is a useful benchmark.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red' # set the line color to red
)
ax.set_ylabel('billions of dollars') # add the y-axis label
# ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
plt.show() # tells jupyter to show the figure
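If you hide the same spines on every figure, a short loop saves some typing. This is a small sketch on toy data; it uses only `ax.spines[...]` and `set_visible()`, the same calls as above. (Recent matplotlib releases also accept a list, as in `ax.spines[['top', 'right']].set_visible(False)`, but check that your version supports it.)

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])   # made-up data, just so there is something to draw

# Hide the top and right spines in one loop instead of two separate lines
for side in ['top', 'right']:
    ax.spines[side].set_visible(False)

print({side: ax.spines[side].get_visible() for side in ['top', 'right', 'bottom', 'left']})
```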
Practice: Line Plots
Take a few minutes and try the following. Feel free to chat with those around if you get stuck.
- Copy the code from the last plot and add a second line that plots 'gov'. To do this, just add a new line of code to the existing code.
ax.plot(gdp.index, gdp['gov'])
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red' # set the line color to red
)
ax.plot(gdp.index, gdp['gov'], # line plot of gov vs. time
color='blue', # set the line color to blue
alpha = 0.5,
linestyle = ':'
)
ax.set_ylabel('billions of dollars') # add the y-axis label
# ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product and Government Spending')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
plt.show() # tells jupyter to show the figure
- Modify your code to give the figure a better title
- Modify your code to make government consumption blue
- Modify your code to add the argument alpha=0.5 to the plot method for gov. What does it change? If you want to learn more, try searching for 'alpha compositing'.
- Modify your code to make the gov line dashed. Try the argument linestyle='--'. What do the linestyles '-.' and ':' look like?
A few more options to get us started
We have two lines on our figure. Which one is which? Not labeling our lines is malpractice. Two approaches:
- Add a legend
- Add text to the figure
Both are good options. I prefer the second for simple plots.
# The first option. Add labels to your plot commands, then call ax.legend.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red', # set the line color to red
label = 'GDP'
)
ax.plot(gdp.index, gdp['gov'], # line plot of gov vs. time
color='blue', # set the line color to blue
alpha = 0.5,
linestyle = ':',
label = 'Gov. Spending'
)
ax.set_ylabel('billions of dollars') # add the y-axis label
# ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product and Government Spending')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.legend(frameon=False) # Show the legend. frameon=False kills the box around the legend
plt.show() # tells jupyter to show the figure
Ah, I feel much better now that I know which line is which. Here is the second approach.
# The second option. Add text using the text method. Note that I can leave the labels in the plot commands.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red', # set the line color to red
label = 'GDP'
)
ax.plot(gdp.index, gdp['gov'], # line plot of gov vs. time
color='blue', # set the line color to blue
alpha = 0.5,
linestyle = ':',
label = 'Gov. Spending'
)
ax.set_ylabel('billions of dollars') # add the y-axis label
# ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product and Government Spending')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.text(1989, 8500, 'GDP') # text(x, y, string)
ax.text(1999, 4500, 'Gov. Spending') # text(x, y, string)
plt.show() # tells jupyter to show the figure
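The text() calls above place a string at a data coordinate. The annotate() method can do the same and also draw an arrow from the label to the point it describes, which helps when there is no empty space right next to the line. A minimal sketch on made-up data; the label 'Peak' and the coordinates are illustrative.

```python
import matplotlib.pyplot as plt

years = [2000, 2001, 2002, 2003]
values = [10.0, 12.0, 11.0, 14.0]   # made-up series for illustration

fig, ax = plt.subplots()
ax.plot(years, values, color='red')

# annotate(text, xy=point being labeled, xytext=where the text sits), both in data coordinates
note = ax.annotate('Peak',
                   xy=(2003, 14.0),                  # the data point the arrow points at
                   xytext=(2001, 13.5),              # position of the label
                   arrowprops=dict(arrowstyle='->'))  # draw a simple arrow from label to point
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
```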
Getting plots out of your notebook
While I love jupyter notebooks, my research output is usually an article distributed as a pdf, so I need to save my figures as files.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp'], # line plot of gdp vs. time
color='red', # set the line color to red
label = 'GDP'
)
ax.plot(gdp.index, gdp['gov'], # line plot of gov vs. time
color='blue', # set the line color to blue
alpha = 0.5,
linestyle = ':',
label = 'Gov. Spending'
)
ax.set_ylabel('billions of dollars') # add the y-axis label
# ax.set_xlabel('year') # add the x-axis label
ax.set_title('U.S. Gross Domestic Product and Government Spending')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.text(1989, 8500, 'GDP') # text(x, y, string)
ax.text(1999, 4500, 'Gov. Spending') # text(x, y, string)
plt.savefig('gdp.pdf', bbox_inches='tight') # Create a pdf and save to cwd
plt.savefig('../gdp.png') # Create a png and save to the folder that contains the cwd
plt.show() # tells jupyter to show the figure
When saving a pdf, I use the bbox_inches='tight' argument to kill extra whitespace around the figure. You can also set things like orientation, dpi, and metadata. Check the documentation if you need to tweak your output.
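A sketch of the same idea without touching the disk, handy for checking what savefig produces. It writes to an in-memory `io.BytesIO` buffer (an assumption for the demo; in practice you would pass a file name like 'gdp.pdf') and calls `fig.savefig` on an explicit figure object rather than `plt.savefig`, which saves whatever figure is current. The `dpi` argument matters mostly for raster formats such as png.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [1, 4, 9])   # made-up data
ax.set_title('Example')

# Save to an in-memory buffer instead of a file, so nothing is written to disk
pdf_buf = io.BytesIO()
fig.savefig(pdf_buf, format='pdf', bbox_inches='tight')   # vector output, whitespace trimmed

# A raster format such as png also takes dpi (dots per inch)
png_buf = io.BytesIO()
fig.savefig(png_buf, format='png', dpi=200, bbox_inches='tight')

print(pdf_buf.getvalue()[:4], png_buf.getvalue()[:4])   # file-type signatures
```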
# Create a histogram of gdp growth rates.
gdp['gdp_growth'] = gdp['gdp'].pct_change()*100 # pct_change() returns the fractional change (e.g., 0.05), NOT a percent, so multiply by 100. Not a self-documenting name.
gdp.head()
| DATE | gdp | inv | gov | ex | im | gdp_growth |
|---|---|---|---|---|---|---|
| 1929 | 104.556 | 17.170 | 9.622 | 5.939 | 5.556 | NaN |
| 1930 | 92.160 | 11.428 | 10.273 | 4.444 | 4.121 | -11.855848 |
| 1931 | 77.391 | 6.549 | 10.169 | 2.906 | 2.905 | -16.025391 |
| 1932 | 59.522 | 1.819 | 8.946 | 1.975 | 1.932 | -23.089248 |
| 1933 | 57.154 | 2.276 | 8.875 | 1.987 | 1.929 | -3.978361 |
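What does pct_change() compute? For each period it is x_t / x_{t-1} - 1, a fraction, so multiplying by 100 gives percent growth. A minimal sketch on a made-up series shows that the same numbers come out of shift() and diff(); the first entry is NaN because there is no previous period.

```python
import pandas as pd

# Hypothetical levels series indexed by year
s = pd.Series([100.0, 110.0, 99.0, 108.9], index=[2000, 2001, 2002, 2003])

growth_pct = s.pct_change() * 100            # built-in: (x_t / x_{t-1} - 1) * 100
growth_shift = (s / s.shift(1) - 1) * 100    # same formula written with shift()
growth_diff = s.diff() / s.shift(1) * 100    # (x_t - x_{t-1}) / x_{t-1}, using diff()

print(pd.DataFrame({'pct_change': growth_pct,
                    'shift': growth_shift,
                    'diff': growth_diff}).round(2))
```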
We could have used the diff() or the shift() methods to do something similar, but wow, pct_change is so luxe. A quick plot to take a look.
fig, ax = plt.subplots()
ax.plot(gdp.index, gdp['gdp_growth'], # line plot of gdp growth vs. time
color='red', # set the line color to red
label = 'GDP Growth'
)
ax.set_ylabel('percent growth') # add the y-axis label
ax.set_title('U.S. Gross Domestic Product Growth Rates')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.axhline(y=0, color='black', linewidth=0.75) # Add a horizontal line at y=0
plt.show() # tells jupyter to show the figure
The Great Depression and the WWII buildup really stick out.
Notice that I added a line at zero. My thinking is that this line adds information: the reader can easily see that growth rates are mostly positive and that the Great Depression was really bad.
It is also obvious that the volatility of gdp growth has fallen over time, but let's approach it a bit differently.
fig, ax = plt.subplots()
# hist does not like NaN. (I'm a bit surprised.) I use the dropna() method to kill off the missing value
ax.hist(gdp['gdp_growth'].dropna(), bins=20, color='red', alpha=0.75) # histogram of GDP growth rates
ax.set_ylabel('Frequency') # add the y-axis label
ax.set_xlabel('Annual growth rate (%)')
ax.set_title('Frequency of US GDP growth rates (1929-2017)')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
#ax.axhline(y=0, color='black', linewidth=0.75) # Add a horizontal line at y=0
plt.show() # tells jupyter to show the figure
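ax.hist returns three things: the count in each bin, the bin edges, and the drawn bar patches. Capturing them is a quick way to confirm that dropna() removed the missing value and to see where the bins fall. A sketch on made-up growth rates (not the BEA data):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical growth rates with a leading NaN, as pct_change() produces
growth = pd.Series([float('nan'), 2.0, 3.5, -1.0, 4.0, 2.5, 3.0])

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the drawn patches
counts, edges, patches = ax.hist(growth.dropna(), bins=4, color='red', alpha=0.75)

print(counts)          # observations in each bin
print(counts.sum())    # total binned: the NaN was dropped
print(edges)           # bins + 1 edges, spanning min to max of the data
```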
Practice: Histograms
Take a few minutes and try the following. Feel free to chat with those around if you get stuck.
- Break the data up into two periods: 1929-1985 and 1986-2017
- Compute the mean and the standard deviation for the gdp growth rate in each sample.
- Create a separate histogram for each sample. Make the early period histogram blue and the late histogram black. Make any changes to them that you deem appropriate.
- Use text() to add the mean and std to a blank area of the histograms.
- Save the two histograms as pdfs. Give them reasonable names.
Challenging. Can you find a way to store the values of the mean and std in variables and print them on the histogram? Redo part 4.
gdp_early = gdp[gdp.index <= 1985] # 1929-1985; the late sample starts in 1986, so no year is counted twice
gdp_late = gdp[gdp.index > 1985]
avg_early = gdp_early['gdp_growth'].mean()
sd_early = gdp_early['gdp_growth'].std()
avg_late = gdp_late['gdp_growth'].mean()
sd_late = gdp_late['gdp_growth'].std()
print(avg_early, sd_early)
print(avg_late, sd_late)
7.195155329738043 8.32030467983293
4.822433250409485 1.886061415326792
fig, ax = plt.subplots()
# hist does not like NaN. (I'm a bit surprised.) I use the dropna() method to kill off the missing value
ax.hist(gdp_early['gdp_growth'].dropna(), bins=20, color='blue', alpha=0.75) # histogram of early-period GDP growth rates
ax.set_ylabel('Frequency') # add the y-axis label
ax.set_xlabel('Annual growth rate (%)')
ax.set_title('Frequency of US GDP growth rates (1929-1985)')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.text(-20,14,'Avg growth: '+str(round(avg_early,2))) # text(x, y, string) in data coordinates
ax.text(-20,12,'Std growth: '+str(round(sd_early,2)))
plt.savefig('gdp_growth_hist_early.pdf', bbox_inches='tight') # save the figure as a pdf
plt.show() # tells jupyter to show the figure
fig, ax = plt.subplots()
# hist does not like NaN. (I'm a bit surprised.) I use the dropna() method to kill off the missing value
ax.hist(gdp_late['gdp_growth'].dropna(), bins=20, color='black', alpha=0.75) # histogram of late-period GDP growth rates
ax.set_ylabel('Frequency') # add the y-axis label
ax.set_xlabel('Annual growth rate (%)')
ax.set_title('Frequency of US GDP growth rates (1986-2017)')
ax.spines['right'].set_visible(False) # get rid of the line on the right
ax.spines['top'].set_visible(False) # get rid of the line on top
ax.text(-1.5,4,'Avg growth: '+str(round(avg_late,2))) # use the late-sample statistics here
ax.text(-1.5,3.5,'Std growth: '+str(round(sd_late,2)))
plt.savefig('gdp_growth_hist_late.pdf', bbox_inches='tight') # save the figure as a pdf
plt.show() # tells jupyter to show the figure
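One take on the challenge, as a sketch on made-up numbers: format the stored values with an f-string and position the text in axes coordinates (`transform=ax.transAxes`) instead of data coordinates. Axes coordinates run from (0, 0) at the lower-left corner to (1, 1) at the upper right, so the label lands in the same corner of every histogram and you don't have to hunt for blank spots like (-20, 14) by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt

growth = pd.Series([2.0, 3.5, -1.0, 4.0, 2.5, 3.0])   # hypothetical sample
avg, sd = growth.mean(), growth.std()                  # store the statistics in variables

fig, ax = plt.subplots()
ax.hist(growth, bins=5, color='black', alpha=0.5)

# :.2f rounds to two decimals inside the f-string; \n starts a second line
label = f'Mean: {avg:.2f}\nStd. dev.: {sd:.2f}'
# (0.05, 0.95) in axes coordinates is near the upper-left corner; va='top' hangs the text below that point
txt = ax.text(0.05, 0.95, label, transform=ax.transAxes, va='top')
print(label)
```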
fig, ax = plt.subplots(1, 2) # one row, two columns of axes
print(type(ax))
<class 'numpy.ndarray'>
So ax is now an array that holds the axes for each plot. Each axes works just like before. Now we just have to tell python which axes to act on.
# Set a variable for plot color so I can change it everywhere easily
my_plot_color = 'black'
# I am using the figsize parameter here. It takes (width, height) in inches.
fig, ax = plt.subplots(1, 2, figsize=(10,4)) # one row, two columns of axes
# The first plot
ax[0].plot(gdp.index, gdp['gdp_growth'], color=my_plot_color, label = 'GDP Growth') # a line plot of GDP growth rates
ax[0].axhline(y=0, color='black', linewidth=0.75) # Add a horizontal line at y=0
ax[0].set_xlabel('year')
ax[0].set_title('GDP growth rates')
ax[0].spines['right'].set_visible(False) # get rid of the line on the right
ax[0].spines['top'].set_visible(False) # get rid of the line on top
# The second plot
ax[1].hist(gdp['gdp_growth'].dropna(), bins=20, color=my_plot_color, alpha=0.25) # histogram of GDP growth rates
ax[1].set_xlabel('annual growth rate')
ax[1].set_title('Histogram of GDP growth rates')
ax[1].spines['right'].set_visible(False) # get rid of the line on the right
ax[1].spines['top'].set_visible(False) # get rid of the line on top
plt.savefig('double.pdf')
plt.show() # tells jupyter to show the figure
You can imagine how useful this can be. We can loop over sets of axes and automate making plots if we have several variables.
I changed a couple of other things here, too.
- I used the figsize parameter to subplots(). This is a tuple of figure width and height in inches. (Inches! Take that, rest of the world!) The height and width are of the printed figure. You will notice that jupyter notebook scaled it down for display. This is useful when you are preparing graphics for a publication and you need to meet an exact figure size.
- I made the line color a variable, so it is easy to change all the line colors at once. For example, I like red figures when I am giving presentations, but black figures when I am creating pdfs that will be printed on a black-and-white printer.
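Both ideas combine naturally with a loop over the axes array. A minimal sketch on a hypothetical two-column DataFrame: `zip(ax, df.columns)` pairs each panel with one variable, and the figure's size can be checked in inches and in pixels. Pixels are inches times dpi; dpi=100 is passed explicitly here so the numbers are predictable, and savefig uses the figure's dpi unless you override it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two series sharing a year index
df = pd.DataFrame({'gdp': [100, 104, 109], 'gov': [20, 21, 23]},
                  index=[2000, 2001, 2002])

my_plot_color = 'black'   # one variable controls every line color

# figsize is (width, height) in inches; dpi is dots (pixels) per inch
fig, ax = plt.subplots(1, 2, figsize=(10, 4), dpi=100)

# zip pairs each axes in the array with one column name
for this_ax, col in zip(ax, df.columns):
    this_ax.plot(df.index, df[col], color=my_plot_color)
    this_ax.set_title(col)
    this_ax.spines['right'].set_visible(False)
    this_ax.spines['top'].set_visible(False)

print([a.get_title() for a in ax])     # one title per panel
print(fig.get_size_inches())           # size in inches, as passed to figsize
print(fig.canvas.get_width_height())   # size in pixels: inches times dpi
```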
4. Bar charts
Bar charts are useful for describing relatively few observations of categorical data, meaning that one of the axes is not quantitative. Tufte would complain that they have a lot of redundant ink, but they are quite popular...and Tufte is not our dictator. Still, it's always good to think about what our figures are doing for us.
Bar charts are much better than pie charts for displaying the relative size of data. There are discussions of this all over the net (here is one I like) but the anti-pie-chart argument boils down to: pie charts are hard to read.
- Humans are bad at judging the relative sizes of 2D spaces. They cannot tell if one slice is 10% larger than another slice.
- The MS Excel style of coloring each slice a different color creates problems. Humans judge darker colors to have larger areas.
- To get quantitative traction, people label the slices with the data values. In this case, a table of numbers is probably a better way to share the data.
# PPP GDP data from the Penn World Tables
code = ['USA', 'FRA', 'JPN', 'CHN', 'IND', 'BRA', 'MEX']
country = ['United States', 'France', 'Japan', 'China', 'India',
'Brazil', 'Mexico']
gdppc = [53.1, 36.9, 36.3, 11.9, 5.4, 15.0, 16.5]
gdp = pd.DataFrame({'gdppc': gdppc, 'country': country}, index=code)
gdp
| | gdppc | country |
|---|---|---|
| USA | 53.1 | United States |
| FRA | 36.9 | France |
| JPN | 36.3 | Japan |
| CHN | 11.9 | China |
| IND | 5.4 | India |
| BRA | 15.0 | Brazil |
| MEX | 16.5 | Mexico |
fig, ax = plt.subplots(figsize=(10,5))
ax.bar(gdp.index, gdp['gdppc'], color='blue', alpha=0.25) # bar(x positions or labels, bar heights)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylabel('PPP GDP per capita')
ax.set_title('Income per person (at purchasing power parity)')
plt.show() # tells jupyter to show the figure
The ordering of the bars is pretty random. We could sort them from poor to rich.
fig, ax = plt.subplots(figsize=(10,5))
gdp_sort= gdp.sort_values('gdppc')
ax.bar(gdp_sort.index, gdp_sort['gdppc'], color='blue', alpha=0.25) # bar(x positions or labels, bar heights)
ax.grid(axis='y', color='white')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_title('Income per person (at purchasing power parity)')
ax.set_ylabel('PPP GDP per capita')
plt.show() # tells jupyter to show the figure
Notice the use of grid() to specify grid lines on the y axis. I made them white, so they only show up in the bars. It's something I'm experimenting with. I'm not sure I like it.
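If the white grid lines feel too experimental, a more conventional alternative is light gray grid lines drawn behind the bars. `ax.set_axisbelow(True)` moves the grid and ticks below the other artists, so the bars sit on top of the lines. A sketch reusing the lecture's PPP numbers; the color and line width are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# PPP GDP per capita values from the table above
gdp = pd.DataFrame({'gdppc': [53.1, 36.9, 36.3, 11.9, 5.4, 15.0, 16.5]},
                   index=['USA', 'FRA', 'JPN', 'CHN', 'IND', 'BRA', 'MEX'])
gdp_sort = gdp.sort_values('gdppc')   # poor to rich

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(gdp_sort.index, gdp_sort['gdppc'], color='blue', alpha=0.25)
ax.set_axisbelow(True)                               # draw grid lines underneath the bars
ax.grid(axis='y', color='lightgray', linewidth=0.5)  # thin gray horizontal reference lines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylabel('PPP GDP per capita')
ax.set_title('Income per person (at purchasing power parity)')
```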
Maybe you prefer a horizontal bar chart. Same data, same approach. We need to swap all the y labels for x labels.
Practice: Bar Charts
Take a few minutes and try the following. Feel free to chat with those around if you get stuck.
- Create a horizontal bar chart. Check the documentation for barh().
- Fix up your figure labels, etc.
fig, ax = plt.subplots(figsize=(10,5))
gdp_sort= gdp.sort_values('gdppc')
ax.barh(gdp_sort.index, gdp_sort['gdppc'], color='red', alpha=0.25) # barh(y positions or labels, bar widths)
ax.grid(axis='x', color='white')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('PPP GDP per capita')
ax.set_title('Income per person (at purchasing power parity)')
plt.show() # tells jupyter to show the figure
- Create a new horizontal bar chart where each bar is gdp per capita relative to the United States. So USA = 1, MEX = 0.31, etc.
gdp_sort['rel_gdp'] = gdp_sort['gdppc']/gdp_sort.loc['USA', 'gdppc']
fig, ax = plt.subplots(figsize=(10,5))
ax.barh(gdp_sort.index, gdp_sort['rel_gdp'], color='red', alpha=0.25) # barh(y positions or labels, bar widths)
ax.grid(axis='x', color='white')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('PPP GDP per capita relative to the United States')
ax.set_title('Income per person (at purchasing power parity)')
plt.show() # tells jupyter to show the figure
from pandas_datareader import data, wb # we are grabbing the data and wb modules from the package
import datetime as dt # for time and date. We'll learn more about datetime in the next lecture when we do time-series
codes = ['GDPC1', 'UNRATE'] # real gdp, unemployment rate
start = dt.datetime(1970, 1, 1)
fred = data.DataReader(codes, 'fred', start)
fred.head()
| DATE | GDPC1 | UNRATE |
|---|---|---|
| 1970-01-01 | 4939.759 | 3.9 |
| 1970-02-01 | NaN | 4.2 |
| 1970-03-01 | NaN | 4.4 |
| 1970-04-01 | 4946.770 | 4.6 |
| 1970-05-01 | NaN | 4.8 |
Ugh. The gdp data are quarterly, but the unemployment rate is monthly. Let's fix this by downsampling to quarterly frequency.
fred_q=fred.resample('q').mean() # Create an average quarterly unemployment rate
fred_q.head()
| DATE | GDPC1 | UNRATE |
|---|---|---|
| 1970-03-31 | 4939.759 | 4.166667 |
| 1970-06-30 | 4946.770 | 4.766667 |
| 1970-09-30 | 4992.357 | 5.166667 |
| 1970-12-31 | 4938.857 | 5.833333 |
| 1971-03-31 | 5072.996 | 5.933333 |
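What resample('q').mean() did: every monthly value is assigned to its calendar quarter and the values within a quarter are averaged (for GDP, which has one observation per quarter, the "average" is just that value). A sketch of the same aggregation on made-up monthly data using `to_period('Q')` and `groupby`, which labels quarters as periods like 2000Q1 instead of quarter-end dates. Newer pandas releases have renamed some frequency aliases (quarter-end resampling is spelled 'QE' in pandas 2.2 and later), so `resample('q')` may raise a deprecation warning there.

```python
import pandas as pd

# Hypothetical monthly series: six months starting January 2000
monthly = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    index=pd.date_range('2000-01-01', periods=6, freq='MS'))

# Group each month with the other months in its calendar quarter, then average
quarterly = monthly.groupby(monthly.index.to_period('Q')).mean()
print(quarterly)
```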
fred_q['gdp_gr'] = fred_q['GDPC1'].pct_change()*100 # growth rate of gdp. we've seen this a few times...
fred_q['unemp_dif'] = fred_q['UNRATE'].diff() # diff() takes the first difference: u(t)-u(t-1)
fred_q.head()
| DATE | GDPC1 | UNRATE | gdp_gr | unemp_dif |
|---|---|---|---|---|
| 1970-03-31 | 4939.759 | 4.166667 | NaN | NaN |
| 1970-06-30 | 4946.770 | 4.766667 | 0.141930 | 0.600000 |
| 1970-09-30 | 4992.357 | 5.166667 | 0.921551 | 0.400000 |
| 1970-12-31 | 4938.857 | 5.833333 | -1.071638 | 0.666667 |
| 1971-03-31 | 5072.996 | 5.933333 | 2.715993 | 0.100000 |
fig, ax = plt.subplots(figsize=(10,5))
ax.scatter(fred_q.gdp_gr, fred_q.unemp_dif)
ax.set_title('Okun\'s Law in the United States' )
ax.set_ylabel('change in unemployment rate')
ax.set_xlabel('gdp growth rate')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show() # tells jupyter to show the figure
Practice: Scatter Plots
Take a few minutes and try the following. Feel free to chat with those around if you get stuck.
Let's explore some of scatter plot's options.
- Change the color of the dots to red and lighten them up using alpha
Check out the documentation for marker styles.
- Change the marker to a triangle.
- Use text() or annotate() to label the point corresponding to third quarter 2009: '2009Q3'
fig, ax = plt.subplots(figsize=(10,5))
ax.scatter(fred_q.gdp_gr, fred_q.unemp_dif, color='red', alpha = 0.25, marker = '^')
ax.text(fred_q.loc['2009-09-30', 'gdp_gr']+0.1, fred_q.loc['2009-09-30', 'unemp_dif'], '2009Q3', ha='left') # label the 2009Q3 point, nudged 0.1 to the right
ax.set_title('Okun\'s Law in the United States' )
ax.set_ylabel('change in unemployment rate')
ax.set_xlabel('gdp growth rate')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show() # tells jupyter to show the figure
Scatter plots are very useful and we can do a lot more with them. Places to go from here:
- Add a line of best fit. A bit clunky in matplotlib (use np's polyfit command), but not too bad. Seaborn is a package that automates some matplotlib commands while also introducing some new, useful plots. For example, it has a regplot command that makes adding a trend line simple.
import seaborn as sns
sns.regplot(x="gdp_gr", y="unemp_dif", data=fred_q, ax=ax) # ax=ax tells seaborn to draw the regplot on the existing axes "ax"
- Make data markers different colors or sizes depending on the value of a third variable. For example, you could get some more data and color the markers for years with a Republican president red and markers for years with a Democratic president blue.
- Zoom in on the bulk of the data by either dropping outliers or changing the axis limits. You can zoom in on either (or both) axes using
ax.set_xlim(xmin,xmax) # xmin is the lower bound, xmax is the upper bound
ax.set_ylim(ymin,ymax) # ymin is the lower bound, ymax is the upper bound
- Other ideas?
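For the line of best fit mentioned above, here is a sketch with NumPy's `polyfit`: with `deg=1` it returns the slope and intercept of the least-squares line, which you then draw with `ax.plot`. The data below are made up to lie exactly on y = 1 - 0.5x so the fit is easy to check; with the FRED data you would first drop the NaN rows (for example `fred_q[['gdp_gr', 'unemp_dif']].dropna()`), since polyfit does not skip missing values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data lying exactly on y = 1 - 0.5 x (stand-ins for gdp_gr and unemp_dif)
x = np.array([-2.0, 0.0, 1.0, 2.0, 4.0])
y = 1.0 - 0.5 * x

# deg=1 fits a straight line; coefficients come back from the highest power down
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(x, y, color='red', alpha=0.25)
x_line = np.linspace(x.min(), x.max(), 50)                  # evenly spaced x values for the line
ax.plot(x_line, intercept + slope * x_line, color='black')  # the fitted line
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
print(f'slope = {slope:.3f}, intercept = {intercept:.3f}')
```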