Lecture 16: Creating Maps [MY SOLUTIONS]
Okay, we now know enough about figures and pandas to get a map off the ground.
We will use the geopandas package to plot our maps. Maps are really quite complicated. We are trying to project a spherical surface onto a flat figure, which is inherently a complicated endeavor. Luckily, geopandas will handle most of it for us.
Here's an outline of today's lecture:
1. Installing Geopandas (return home)¶
Plan A: Install via Conda (not pip)¶
Open your terminal and type the following
conda install geopandas
If you don't get any errors or conflict notifications, super -- you're ready to go!
If you do get conflicts, we have to go a different route to get geopandas installed on your computer. Geopandas is really useful but it's also very needy and requires specific versions of packages.
Plan B - Establish an Alternate Python Environment¶
Step 1 - Open the Anaconda command prompt (on Windows, type "Anaconda" in the Start menu and you'll see an option for "Anaconda Prompt") and create new environment variable say “geo_env” in our case using the command
conda create -n geo_env
What we're doing is creating a new environment for an additional set of python libraries to live.
Step 2 - Activate the environment "geo_env" and add and set conda-forge
channel. This amounts to letting the new environment know how to access new packages to add.
conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
Step 3 - Install Geopandas in the environment just created.
conda install geopandas
The geopanda docs strongly argue that we should use conda
rather than pip
to install geopandas so we'll do that. You can check if everything worked by typing in the command line:
python
import geopandas
exit()
where python
initiated a python instance, import geopandas
imported geopandas, and exit()
returned us to the command line becaue we did not receive any errors.
Step 4 - Install jupyter notebook in this new environment since Jupyter is only installed in the "base" environment by default.
conda install jupyter notebook
Step 5- Add the environment to jupyter notenook.
python -m ipykernel install --name geo_env
Now, when you want to run Jupyter from the geo_env
, open the "Anaconda Prompt" and type
conda activate geo_env
to activate the environment and then type
jupyter notebook
to initialize Jupyter. Everything else will work just like before but do be aware that you will have to re-install any packages that you want to use in this environment but were not included by default. For instance, we'll need matplotlib and descartes in this notebook:
pip install matplotlib
pip install descartes
Feeling like computer scientists yet? Yeah, sorry. Let's get on with the data wrangling and map making!
2. Creating Our First Map (return home)¶
import pandas as pd # pandas for data management
import geopandas # geopandas for maps work
from shapely.geometry import Point # shapely handles the coordinate references for plotting shapes
import matplotlib.pyplot as plt # matplotlib for plotting details
Setting up the GeoDataFrame¶
Let's start by plotting some cities. The DataFrame below holds longitudes and latitudes of major South American cities. Our goal is to turn them into something we can plot---in this case, a GeoDataFrame.
cities = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
cities.head()
City | Country | Latitude | Longitude | |
---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 |
1 | Brasilia | Brazil | -15.78 | -47.91 |
2 | Santiago | Chile | -33.45 | -70.66 |
3 | Bogota | Colombia | 4.60 | -74.08 |
4 | Caracas | Venezuela | 10.48 | -66.86 |
We need tuples of coordinates to map the cities. We zip together lat and long to create the tuples and store them in a column named 'Coordinates'.
cities['Coordinates'] = list(zip(cities.Longitude, cities.Latitude))
cities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | (-58.66, -34.58) |
1 | Brasilia | Brazil | -15.78 | -47.91 | (-47.91, -15.78) |
2 | Santiago | Chile | -33.45 | -70.66 | (-70.66, -33.45) |
3 | Bogota | Colombia | 4.60 | -74.08 | (-74.08, 4.6) |
4 | Caracas | Venezuela | 10.48 | -66.86 | (-66.86, 10.48) |
So far, we haven't done anything that requires new packages. We just have a dataFrame full of numbers and strings.
Next, we turn the tuple into a Point object. Notice that we imported Point from the Shapely package in the first code cell. We use the .apply()
method of DataFrame to apply the Point function to each row of the Coordinates column.
cities['Coordinates'] = cities['Coordinates'].apply(Point)
cities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | POINT (-58.66 -34.58) |
1 | Brasilia | Brazil | -15.78 | -47.91 | POINT (-47.91 -15.78) |
2 | Santiago | Chile | -33.45 | -70.66 | POINT (-70.66 -33.45) |
3 | Bogota | Colombia | 4.60 | -74.08 | POINT (-74.08 4.6) |
4 | Caracas | Venezuela | 10.48 | -66.86 | POINT (-66.86 10.48) |
We now have a column of Point objects.
We turn the DataFrame into a GeoDataFrame, which is a data structure that understands how to plot maps. The important part here is that we specify the column that contains the geometery
data. From the docs:
The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame’s “geometry”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), this commands will always act on the “geometry” column.
In our case, the geometry data are the points in the 'Coordinates' column.
gcities = geopandas.GeoDataFrame(cities, geometry='Coordinates')
gcities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | POINT (-58.66000 -34.58000) |
1 | Brasilia | Brazil | -15.78 | -47.91 | POINT (-47.91000 -15.78000) |
2 | Santiago | Chile | -33.45 | -70.66 | POINT (-70.66000 -33.45000) |
3 | Bogota | Colombia | 4.60 | -74.08 | POINT (-74.08000 4.60000) |
4 | Caracas | Venezuela | 10.48 | -66.86 | POINT (-66.86000 10.48000) |
# Doesn't look different than a vanilla DataFrame...let's make sure we have what we want.
print('gdf is of type:', type(gcities))
# And how can we tell which column is the geometry column?
print('\nThe geometry column is:', gcities.geometry.name)
gdf is of type: <class 'geopandas.geodataframe.GeoDataFrame'> The geometry column is: Coordinates
Okay, we have our points in the GeoDataFrame. Let's plot the locations on a map. We proceed in three steps:
A. Get the map.
B. Plot the map.
C. Plot points on the map / Annotate the map.
A. Get the map¶
Natural Earth is the name of the organiztion that compiled the map data. The file provides the outlines of countries over which we will plot the locations of the cities in our GeoDataFrame. Geopandas comes with this data bundled into it, so we do not have to go get it. Thanks geopandas!
# Step 1: Get the map.
# geopandas comes with some datasets that define maps.
# Here, we grab a low-resolution Natural Earth map and load it into a GeoDataFrame.
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world.sample(10)
pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
---|---|---|---|---|---|---|
134 | 287800.0 | Oceania | New Caledonia | NCL | 10770 | POLYGON ((165.77999 -21.08000, 166.59999 -21.7... |
63 | 4937374.0 | Africa | Liberia | LBR | 3070 | POLYGON ((-8.43930 7.68604, -8.48545 7.39521, ... |
1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
168 | 44269594.0 | Africa | Uganda | UGA | 35165 | POLYGON ((33.90371 -0.95000, 31.86617 -1.02736... |
72 | 30366036.0 | Africa | Mozambique | MOZ | 15291 | POLYGON ((34.55999 -11.52002, 35.31240 -11.439... |
40 | 28515829.0 | South America | Venezuela | VEN | 482359 | POLYGON ((-60.73357 5.20028, -60.60118 4.91810... |
99 | 163046161.0 | Asia | Bangladesh | BGD | 302571 | POLYGON ((92.67272 22.04124, 92.65226 21.32405... |
33 | 4246439.0 | North America | Panama | PAN | 66800 | POLYGON ((-77.35336 8.67050, -77.47472 8.52429... |
158 | 34268528.0 | Asia | Saudi Arabia | SAU | 792966 | POLYGON ((34.95604 29.35655, 36.06894 29.19749... |
89 | 299882.0 | Oceania | Vanuatu | VUT | 934 | MULTIPOLYGON (((167.21680 -15.89185, 167.84488... |
# Which column holds the geometry data?
world.geometry.name
'geometry'
This is another GeoDataFrame. The geometry data are the column named 'geometry'.
A quick word about polygons¶
Instead of Points, the geometery are POLYGONs. The polygons are the shapes of countries.
world = world.set_index('name')
# Hello Albania
world.loc['Albania','geometry']
Wow, that was cool.
A polygon is a loop of points connected by straight lines (e.g., triangle or rectangle). The more points we have, the closer the polygon can approximate non-linear shapes. So Albania's shape is defined by many points connected by lines.
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Albania','geometry'].exterior.coords.xy
# How many points?
print('Points in the exterior of Albania:', len(x))
Points in the exterior of Albania: 24
# Afghanistan is a more complicated shape
world.loc['Afghanistan','geometry']
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Afghanistan', 'geometry'].exterior.coords.xy
# How many points?
print('Points in the exterior of Afghanistan:', len(x))
Points in the exterior of Afghanistan: 69
Wow, that was cool.
A polygon is a loop of points connected by straight lines (e.g., triangle or rectangle). The more points we have, the closer the polygon can approximate non-linear shapes. So Albania's shape is defined by many points connected by lines.
B. Plot the Map¶
# Here is the code. Details in the cell below it.
# Step 2: Plot the map
fig, gax = plt.subplots(figsize=(10,10))
# By only plotting rows in which the continent is 'South America' we only plot SA.
world[world['continent'] == 'South America'].plot(ax = gax, edgecolor='blue', color='red',alpha=.2)
gax.set_ylabel('longitude') # By the way, if you haven't read the book 'longitude' by Dava Sobel, you should...
gax.set_xlabel('latitude')
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
plt.show()
GeoDataFrame plots¶
Nice one! That was easy, and now we have a map.
Note the different syntax for plot. We have been using the plot()
method of matplotlib axes objects, so we usually called
gax.plot(x, y)
which plotted x against y on the axis gax.
With a GeoDataFrame, we invoke the plot()
method of a GeoDataFrame object with
world.plot(ax = gax)
which will plot the geometry data in world
on the axis gax
. Note that this is similar to the syntax that seaborn uses.
Other options¶
Notice that lots of the regular matplotlib options still work. I can still turn of the top and right spines (do I want to?) and I can add x and y axes labels. The parameter 'edgecolor' sets the line colors, etc.
It looks like I didn't put a title on my plot. Poor form. Let's fix it when we add the cities.
C. Plot Cities on the Map / Annotate the Map¶
# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
# By only plotting rows in which the continent is 'South America' we only plot SA.
world[world['continent'] == 'South America'].plot(ax = gax, edgecolor='black',color='white')
# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame.
# I want the cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.set_ylabel('longitude')
gax.set_xlabel('latitude')
gax.set_title('South America',fontsize=20)
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
plt.show()
This looks pretty nice. What else might we want to do?
- Maybe add some labels to the points? We can add labels to the figure using
text()
just like we would in a regular figure. We need to be a little careful about specifying where to put the label. We do this below if you are interested.
- Knowing the latitude and longitude is important so let's kill the axes using
plt.axis('off')
.
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
# By only plotting rows in which the continent is 'South America' we only plot, well, South America.
world[world['continent'] == 'South America'].plot(ax = gax, edgecolor='black',color='white')
# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
gax.text(x, y, label)
plt.axis('off')
plt.show()
That took more work than I expected. Let's talk through that code. The first bit of code is
for x, y, label in zip(gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']):
for
is looping over the rows of the GeoDataFrame.
gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']
takes the x part of the coordinate, the y part of the coordinate and the name of the city for the row being looped over.
zip()
combines the x-coord, the y-coord, and the name together (zip is a handy function that takes multiple iterable objects and aggregates them together).
x, y, label
will hold the three values.
So, for each row the for
loops over, x is the x-coord, y is the y-coord, and label is the city name for city defined in that row. We use this data with .text()
to apply the label at point (x,y).
gax.text(x , y, label)
Improve the Labels¶
The labels get the job done, but they are a bit ugly. In particular, they are sitting on top of the dot.
We can use annotate()
to fix this up. We have used annotate()
before to add arrows connecting the text to a point. Here, we will use the ability to specify an offset of the text from the point. Here is the syntax
gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')
The parameter 'xy' is the point we are referencing. The parameter 'xytext' hold the number of points to offset the text from the point. The argument 'offset points' tells annotate that the (3,3) tuple we passed to 'xytext' is full of points to offset the label from the text.
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
# By only plotting rows in which the continent is 'South America' we only plot, well, South America.
world[world['continent'] == 'South America'].plot(ax = gax, edgecolor='black',color='white')
# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.set_title('South America',fontsize=20)
# Kill the spines...
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points',color='red',size=14)
plt.axis('off')
plt.show()
3. Creating Maps Using Shapefiles (return home)¶
Let's work on developing a map of our own.
Our Agenda:
A. Plot state borders
B. Plot county borders
C. Projections
A. Plot State Borders¶
Go to https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
Here we find shapefiles for all kinds of maps: states, counties, congressional districts, zip codes. These are the cartographic boundary files. They are simplified maps. For example, they do not include the tiny islands off the Wisconsin coast, even though those islands are part of the state. These simplified files are not meant for navigation, but for "thematic" mapping. Which is what we are doing. Here is some background on the files.
We are going to work with the '5m' state border file. This means the resolution of the map is 1:5,000,000. There are 1:500,000 and 1:20,000,000 files, too. Different resolutions carry different levels of detail. You can experiment with the different shapefiles and see the differences.
It's a zipped file which you can get from the course website. I included the above stuff because it turns out the census has a ton of good demographic information so you should get used to working through the website looking for data. Put the zip file in your cwd and extract the files. Note that I extracted all of the files to a similarly titled sub-folder in my cwd. If you extract the files elsewhere, adjust the location in the read_file
in line 4 below. We need all of the files in the unzipped folder. Pass the '.shp' file to geopandas.read_file()
which works like other 'read' methods from pandas, but will create a geoDataFrame.
# Data from
# https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
states = geopandas.read_file('./Data/cb_2017_us_state_5m/cb_2017_us_state_5m.shp')
states['STATEFP'] = states['STATEFP'].astype(float)
states = states[states['STATEFP']<60] # drop places like the US Virgin Islands
states = states[(states['STATEFP']!=15) & (states['STATEFP']!=2)] # drop Hawaii and Alaska
states.head()
STATEFP | STATENS | AFFGEOID | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 01779775 | 0400000US01 | 01 | AL | Alabama | 00 | 131174431216 | 4592944701 | MULTIPOLYGON (((-88.04374 30.51742, -88.03661 ... |
2 | 4.0 | 01779777 | 0400000US04 | 04 | AZ | Arizona | 00 | 294198661567 | 1027245114 | POLYGON ((-114.79968 32.59362, -114.80939 32.6... |
3 | 8.0 | 01779779 | 0400000US08 | 08 | CO | Colorado | 00 | 268425964573 | 1178495763 | POLYGON ((-109.06025 38.59933, -109.05954 38.7... |
4 | 12.0 | 00294478 | 0400000US12 | 12 | FL | Florida | 00 | 138911437206 | 31398800291 | MULTIPOLYGON (((-80.75164 24.85725, -80.72906 ... |
5 | 13.0 | 01705317 | 0400000US13 | 13 | GA | Georgia | 00 | 149177524294 | 4733385577 | POLYGON ((-85.60516 34.98468, -85.55259 34.984... |
# What do we have here...
print(type(states))
print(states.geometry.name)
<class 'geopandas.geodataframe.GeoDataFrame'> geometry
This is already set up with the correct geometry and ready to go.
Plot the Georgia border.
fig, gax = plt.subplots(figsize=(10,10))
# Only take the state named 'Georgia'
states[states['NAME'] == 'Georgia'].plot(ax = gax, edgecolor='black',color='white')
plt.show()
Practice</font >
Change the state to edgecolor to orange, the line style to dashed, and the line width to 3.
fig, gax = plt.subplots(figsize=(7,7))
# Only take the state named 'Georgia'
states[states['NAME'] == 'Georgia'].plot(ax = gax, color='white',edgecolor='orange',ls='--',lw=3)
plt.show()
B. Plot County Borders¶
Let's add the county borders. To do so, we first need to get the shape files from https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html. Again, let's use the '5m' files and again you can also get the files from the website, should extract all of the files, and should be careful about where you extract the files.
Like before, read the file in to a GeoDataFrame.
counties = geopandas.read_file('./Data/cb_2017_us_county_5m/cb_2017_us_county_5m.shp')
counties.head()
STATEFP | COUNTYFP | COUNTYNS | AFFGEOID | GEOID | NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 04 | 015 | 00025445 | 0500000US04015 | 04015 | Mohave | 06 | 34475503964 | 387344307 | POLYGON ((-114.75562 36.08717, -114.75364 36.0... |
1 | 22 | 105 | 00559500 | 0500000US22105 | 22105 | Tangipahoa | 15 | 2049488093 | 136678798 | POLYGON ((-90.56717 30.82481, -90.56719 30.999... |
2 | 16 | 063 | 00395624 | 0500000US16063 | 16063 | Lincoln | 06 | 3111451190 | 11606076 | POLYGON ((-114.59461 43.19834, -114.37496 43.1... |
3 | 27 | 119 | 00659505 | 0500000US27119 | 27119 | Polk | 06 | 5105067510 | 69169913 | POLYGON ((-97.14667 48.17148, -97.14570 48.173... |
4 | 38 | 017 | 01034226 | 0500000US38017 | 38017 | Cass | 06 | 4571107601 | 7732062 | POLYGON ((-97.70603 47.23998, -97.45151 47.239... |
This GeoDataFrame has all the counties in it. We only want the ones from Georgia. The Georgia federal information processing standard (fips) code is 13. Keep only counties from Georgia. (A chance to practice our subsetting!)
Plot the counties onto the same map as the state border. You will probably want to re-use the code from above.
# GA fips code is 13. Keep only those counties
ga_counties = counties[counties['STATEFP']=='13'] # the state codes came in as text. should look into this...
# Give it a quick sort so we can take a look
ga_counties = ga_counties.sort_values('NAME') # is Adams county here?
ga_counties.head()
STATEFP | COUNTYFP | COUNTYNS | AFFGEOID | GEOID | NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
1109 | 13 | 001 | 00349113 | 0500000US13001 | 13001 | Appling | 06 | 1313488286 | 13263005 | POLYGON ((-82.55069 31.74911, -82.52016 31.749... |
646 | 13 | 003 | 00345784 | 0500000US13003 | 13003 | Atkinson | 06 | 878787518 | 13550957 | POLYGON ((-83.14048 31.42039, -82.95852 31.417... |
2826 | 13 | 005 | 00344784 | 0500000US13005 | 13005 | Bacon | 06 | 735650672 | 4625861 | POLYGON ((-82.62734 31.67267, -82.52139 31.672... |
1918 | 13 | 007 | 00342832 | 0500000US13007 | 13007 | Baker | 06 | 885665382 | 18598655 | POLYGON ((-84.63808 31.33208, -84.62758 31.332... |
1661 | 13 | 009 | 00345255 | 0500000US13009 | 13009 | Baldwin | 06 | 670027550 | 24802965 | POLYGON ((-83.41855 33.17746, -83.39147 33.183... |
Plot the countiesby re-using the code from above. We can add shading for Fulton (dark orange) and Clarke (dark red) counties. Notice that we do this by overlapping layers onto the same figure axis.
fig, gax = plt.subplots(figsize=(7,7))
# Plot the counties
ga_counties.plot(ax=gax, edgecolor='black', color = 'white')
# Shade Clarke County
ga_counties[ga_counties['NAME'] == 'Clarke'].plot(ax = gax, edgecolor='black',color='darkred',alpha=.9)
# Shade Fulton County
ga_counties[ga_counties['NAME'] == 'Fulton'].plot(ax = gax, edgecolor='black',color='darkorange')
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
plt.axis('off')
plt.show()
And we can combine the maps to create a two-by-one figure:
fig, gax = plt.subplots(1,2,figsize=(15,10))
#---------------------------------------------
# Plot the US
states.plot(ax = gax[0], edgecolor='black',color='white')
states[states['NAME'] == 'Georgia'].plot(ax = gax[0], edgecolor='black',ls='-',facecolor='red',lw=3)
gax[0].set_title('The United States \n', fontsize=20)
gax[0].axis('off')
gax[0].margins(0)
#---------------------------------------------
#---------------------------------------------
# Plot Georgia
states[states['NAME'] == 'Georgia'].plot(ax = gax[1], color='white', edgecolor='black',ls='-',lw=3)
ga_counties.plot(ax=gax[1], color = 'white',edgecolor='black',ls='-',lw=1)
gax[1].set_title('The Great State of Georgia \n', fontsize=20)
gax[1].axis('off')
gax[1].margins(0)
#---------------------------------------------
fig.subplots_adjust(hspace=0.0, wspace=0.0)
plt.show()
Okay, this is opening up possibilities. Ideally, we'd like to balance out the white space between the left figure (USA) and the right figure (Georgia) as well as add some obvious reasoning as to how the two are connected. But first let's deal with the fact that these look a little different than what we're used to.
Practice</font >
- We have a big game with Florida coming up. Create a figure with US states where you highlight Georgia and Florida. Color each according to the official html color codes. This will require some googling.
fig, gax = plt.subplots(1,3,figsize=(15,10))
#---------------------------------------------
# Plot the US
states.plot(ax = gax[0], edgecolor='black',color='white')
states[states['NAME'] == 'Georgia'].plot(ax = gax[0], edgecolor='black',ls='-',facecolor='#BA0C2F',lw=3)
states[states['NAME'] == 'Florida'].plot(ax = gax[0], edgecolor='black',ls='-',facecolor='#0021A5',lw=3)
gax[0].set_title('The United States \n', fontsize=20)
gax[0].axis('off')
gax[0].margins(0)
#---------------------------------------------
#---------------------------------------------
# Plot Georgia
states[states['NAME'] == 'Georgia'].plot(ax = gax[1], color='white', edgecolor='black',ls='-',lw=3)
ga_counties.plot(ax=gax[1], color = 'white',edgecolor='black',ls='-',lw=1)
gax[1].set_title('The Great State of Georgia \n', fontsize=20)
gax[1].axis('off')
gax[1].margins(0)
#---------------------------------------------
#---------------------------------------------
fl_counties = counties[counties['STATEFP']=='12'] # subset for just florida counties
# Plot Florida
states[states['NAME'] == 'Florida'].plot(ax = gax[2], color='white', edgecolor='black',ls='-',lw=3)
fl_counties.plot(ax=gax[2], color = 'white',edgecolor='black',ls='-',lw=1)
gax[2].set_title('The Mediocre State of Florida \n', fontsize=20)
gax[2].axis('off')
gax[2].margins(0)
#---------------------------------------------
fig.subplots_adjust(hspace=0.0, wspace=0.0)
plt.show()
C. Projections¶
You might be looking at our figure and thinking that Georgia looks too fat (insert joke here).
We are trying to represent a spherical object (the Earth) on a flat surface. Doing so deforms the shapes. How shapes are modified to be represented on a 2-dimensional plane depends on the projection method. Projections are complicated and are not a new problem. It's even an active area of research!
We can change the projection of our figures by changing the coordinate reference system (crs) of our data. We check the crs
using the 'crs' property of a geoDataFrame; ie,
states.crs
<Geographic 2D CRS: EPSG:4269> Name: NAD83 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands. - bounds: (167.65, 14.92, -40.73, 86.45) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
states=states.to_crs({'init': 'epsg:3395'})
ga_counties=ga_counties.to_crs({'init': 'epsg:3395'})
C:\Users\jt83241\Anaconda3\envs\geo_env\lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 in_crs_string = _prepare_from_proj_string(in_crs_string) C:\Users\jt83241\Anaconda3\envs\geo_env\lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 in_crs_string = _prepare_from_proj_string(in_crs_string)
What just happened!? The to_crs
just converted all of the polygons in the geoDataFrame to the mercator projection. Seriously, how cool is that! If you remember your trigonometry or physics, this amounted to all sorts of trig functions and radians.
Okay, now let's plot again and add some additional improvements using shading to investigate how colors may help tell a story.
fig, gax = plt.subplots(1,2,figsize=(15,10))
#---------------------------------------------
# Plot the US
states.plot(ax = gax[0], edgecolor='black',color='white')
# Shade Georgia
states[states['NAME']=='Georgia'].plot(ax = gax[0], edgecolor='black',color='darkred')
gax[0].set_title('The United States \n', fontsize=20)
gax[0].axis('off')
gax[0].margins(0)
#---------------------------------------------
#---------------------------------------------
# Plot Georgia
states[states['NAME']=='Georgia'].plot(ax = gax[1], linewidths = 10, edgecolor='darkred',color='white')
ga_counties.plot(ax=gax[1], edgecolor='black', color = 'white')
# Shade Clarke County
ga_counties[ga_counties['NAME'] == 'Clarke'].plot(ax = gax[1], edgecolor='black',color='blue',alpha=.7)
gax[1].set_title('The Great State of Georgia', fontsize=20)
gax[1].axis('off')
gax[1].margins(.08)
#---------------------------------------------
fig.subplots_adjust(hspace=0.0, wspace=0.0)
plt.show()
Practice</font >
- Make the figure with a different projection. Try 4326.
states=states.to_crs({'init': 'epsg:4326'})
ga_counties=ga_counties.to_crs({'init': 'epsg:4326'})
fig, gax = plt.subplots(1,2,figsize=(15,10))
#---------------------------------------------
# Plot the US
states.plot(ax = gax[0], edgecolor='black',color='white')
# Shade Georgia
states[states['NAME']=='Georgia'].plot(ax = gax[0], edgecolor='black',color='darkred')
gax[0].set_title('The United States \n', fontsize=20)
gax[0].axis('off')
gax[0].margins(0)
#---------------------------------------------
#---------------------------------------------
# Plot Georgia
states[states['NAME']=='Georgia'].plot(ax = gax[1], linewidths = 10, edgecolor='darkred',color='white')
ga_counties.plot(ax=gax[1], edgecolor='black', color = 'white')
# Shade Clarke County
ga_counties[ga_counties['NAME'] == 'Clarke'].plot(ax = gax[1], edgecolor='black',color='blue',alpha=.7)
gax[1].set_title('The Great State of Georgia', fontsize=20)
gax[1].axis('off')
gax[1].margins(.08)
#---------------------------------------------
fig.subplots_adjust(hspace=0.0, wspace=0.0)
plt.show()
C:\Users\jt83241\Anaconda3\envs\geo_env\lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 in_crs_string = _prepare_from_proj_string(in_crs_string) C:\Users\jt83241\Anaconda3\envs\geo_env\lib\site-packages\pyproj\crs\crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6 in_crs_string = _prepare_from_proj_string(in_crs_string)
- Reproduce the above figure with a mercator projection but a different state and county. Your choice.
states=states.to_crs({'init': 'epsg:3395'})
counties=counties.to_crs({'init': 'epsg:3395'})
fig, gax = plt.subplots(1,2,figsize=(15,10))
#---------------------------------------------
# Plot the US
states.plot(ax = gax[0], edgecolor='black',color='white')
# Shade Georgia
states[states['NAME']=='Minnesota'].plot(ax = gax[0], edgecolor='black',color='darkred')
gax[0].set_title('The United States \n', fontsize=20)
gax[0].axis('off')
gax[0].margins(0)
#---------------------------------------------
#---------------------------------------------
mn_counties = counties[counties['STATEFP']=='27'] # subset for just minnesota counties
# Plot Minnesota in US map
states[states['NAME']=='Minnesota'].plot(ax = gax[1], linewidths = 10, edgecolor='darkred',color='white')
# Plot Minnesota county boundaries
mn_counties.plot(ax=gax[1], edgecolor='black', color = 'white')
# Shade Hennepin County
mn_counties[mn_counties['NAME'] == 'Hennepin'].plot(ax = gax[1], edgecolor='black',color='blue',alpha=.7)
gax[1].set_title('The True North: Minnesota', fontsize=20)
gax[1].axis('off')
gax[1].margins(.08)
#---------------------------------------------
fig.subplots_adjust(hspace=0.0, wspace=0.0)
plt.show()
Coming Up¶
So now we've figured out how to make maps using shapefiles. Next time we'll work on merging our maps with data to illustrate spatial variation in data; i.e., heat maps aka "chloropleth" maps. To be continued...