Lecture 10: Working with time-series data [SUGGESTED SOLUTIONS]
Time-series data are very common in finance and economics. Date and time systems have their own set of rules and logic, which makes them potentially complicated but also powerful. Both pandas and Python have very powerful sets of tools to handle dates and times. We will work through the basics here. As always, the internet is full of more detailed applications and examples.
In this workbook we will cover datetime objects, resampling to change data frequency, date-based indexing and slicing, and moving-window calculations.
1. DateTime (top)
Python uses the `datetime` type of object to hold date and time data. This allows for very sophisticated handling of time-series data. Most of this is beyond the scope of our work, but if you ever find yourself having to deal with time zones or which week of the month a date lies in, datetime has you covered. [Python was developed to do a lot more than data analysis...]

We will mostly need to create `datetime` objects to hold dates. We pass `datetime` objects to methods that need to know about dates. We begin by importing the `datetime` module and cleverly calling it `dt`.
import pandas as pd #load the pandas package and call it pd
import matplotlib.pyplot as plt # load the pyplot set of tools from the package matplotlib. Name it plt for short.
import datetime as dt # data types and methods for dealing with dates and time. Name it dt for short.
Now for an example:
time_1 = dt.datetime(1776, 7, 4) # year, month, day (datetime will add a time code of 00:00:00)
print('Declare independence on:', time_1)
Declare independence on: 1776-07-04 00:00:00
What type of object is `time_1`?
print(type(time_1))
<class 'datetime.datetime'>
Ladies and gentlemen, we have a new object: `datetime`!
For those of you scoring at home, the function we used above is actually `datetime.datetime()`, but we imported the package as `dt`, so we call it `dt.datetime()`. Silly programmers.

We can convert datetime objects to different string outputs using the `strftime()` method, which takes format codes as arguments so that you can format the string however you wish. For instance, I added a comma -- can you find it? A list of codes is in table 11-3 of McKinney.
print('Declare independence on:', time_1.strftime('%B %d, %Y')) # I have no idea why month is 'B'
Declare independence on: July 04, 1776
# We can convert strings to datetime using the same codes. It gives us the ability to handle strange formats.
string_date = '1815/1/8' # What kind of mad man writes dates this way?
time_2 = dt.datetime.strptime(string_date, '%Y/%m/%d')
print('Battle in New Orleans on:', time_2.strftime('%B %d, %Y'))
Battle in New Orleans on: January 08, 1815
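Datetime objects also support arithmetic: subtracting two datetimes yields a `timedelta`, and adding a `timedelta` shifts a date. A minimal sketch, reusing the two dates from the examples above:

```python
import datetime as dt

independence = dt.datetime(1776, 7, 4)
new_orleans = dt.datetime(1815, 1, 8)

# Subtracting two datetimes yields a timedelta object
gap = new_orleans - independence
print('Days between the two events:', gap.days)

# Adding a timedelta shifts a date forward (or backward with a negative one)
one_week_later = independence + dt.timedelta(weeks=1)
print('One week later:', one_week_later.strftime('%B %d, %Y'))
```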
We might need some more datetime functionality, but that should be enough for now. Let's start working with some data!
The file 'VIXCLSD.csv' contains daily end-of-trading values of the 'VIX,' a measure of expected market volatility as implied by S&P 500 options. Business-news types like to refer to it as the 'fear index'. [That's a bit too dramatic for my tastes, but those guys have ads to sell.] The idea is that expected volatility rises when people are worried about the future.
vix = pd.read_csv('./Data/VIXCLSD.csv', na_values='.') # There are some missing values marked as '.'
vix.head()
DATE | VIXCLS | |
---|---|---|
0 | 1990-01-02 | 17.24 |
1 | 1990-01-03 | 18.19 |
2 | 1990-01-04 | 19.22 |
3 | 1990-01-05 | 20.11 |
4 | 1990-01-08 | 20.26 |
The data look about like I would expect. What kind of variables do we have?
print(vix.dtypes)
DATE       object
VIXCLS    float64
dtype: object
The 'DATE' variable is stored as a string right now. This often happens because the computer sees things like hyphens and slashes among the numbers and assumes the data are strings. Let's convert the DATE column to a datetime object. We'll start by converting the dates and saving them to a new variable so that we can inspect it. Fortunately, pandas has a `to_datetime()` function, so we'll use that.
date_objs = pd.to_datetime(vix['DATE'])
print(date_objs.dtypes)
datetime64[ns]
So the `to_datetime()` conversion creates datetime64[ns] objects. Let's convert the 'DATE' column to datetime, save it in the DataFrame, and set the index to the dates:
vix['DATE'] = pd.to_datetime(vix['DATE'], yearfirst=True) # Convert date strings to date time
print( vix.dtypes )
vix.head()
DATE      datetime64[ns]
VIXCLS           float64
dtype: object
DATE | VIXCLS | |
---|---|---|
0 | 1990-01-02 | 17.24 |
1 | 1990-01-03 | 18.19 |
2 | 1990-01-04 | 19.22 |
3 | 1990-01-05 | 20.11 |
4 | 1990-01-08 | 20.26 |
vix.set_index('DATE',inplace=True)
vix.head()
VIXCLS | |
---|---|
DATE | |
1990-01-02 | 17.24 |
1990-01-03 | 18.19 |
1990-01-04 | 19.22 |
1990-01-05 | 20.11 |
1990-01-08 | 20.26 |
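With the dates in the index, pandas exposes date components directly as attributes of the index. A small sketch with synthetic data (a stand-in for the vix DataFrame, which is not reloaded here):

```python
import pandas as pd

# Synthetic daily data standing in for the vix DataFrame
idx = pd.date_range('1990-01-02', periods=5, freq='D')
df = pd.DataFrame({'VIXCLS': [17.24, 18.19, 19.22, 20.11, 20.26]}, index=idx)

# A DatetimeIndex exposes date components as attributes
print(df.index.year)        # the year of each observation
print(df.index.day_name())  # 'Tuesday', 'Wednesday', ...
```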
2. Changing frequency (top)
We can resample the data to change its frequency. We need to specify how to aggregate the VIX data when we change frequency. In this case, I want to average it. If we had GDP data, I would want to sum it.
We are downsampling data if we are reducing the number of observations (daily$\rightarrow$monthly) and we are upsampling the data if we are increasing the number of observations (monthly$\rightarrow$daily). In my work, I have downsampled a lot. I do not think I have ever upsampled.
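Since upsampling never shows up in this lecture, here is a minimal sketch of what it looks like, using a made-up two-month series (the dates and values are illustrative):

```python
import pandas as pd

# A hypothetical monthly series to illustrate upsampling
monthly = pd.Series([1.0, 2.0],
                    index=pd.to_datetime(['2020-01-31', '2020-02-29']))

# Upsampling creates new rows; we must choose how to fill them.
daily = monthly.resample('D').ffill()  # carry each value forward day by day
print(daily.head())
```

Without `ffill()` (or some other fill rule), the new daily rows would simply be NaN.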
vix_w = vix.resample('W').mean() # weekly
vix_w.head()
VIXCLS | |
---|---|
DATE | |
1990-01-07 | 18.690 |
1990-01-14 | 21.918 |
1990-01-21 | 24.304 |
1990-01-28 | 25.744 |
1990-02-04 | 25.648 |
We had 7,502 daily observations. Now we have 1,501 weekly observations. Note that the weeks are Sunday-based. Let's downsample some more.
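The Sunday anchoring is just the default. The weekly alias accepts a suffix that sets the day each week ends on; a small sketch with synthetic data:

```python
import pandas as pd

s = pd.Series(range(10), index=pd.date_range('1990-01-02', periods=10, freq='D'))

sundays = s.resample('W').mean()      # default anchor: weeks end on Sunday
fridays = s.resample('W-FRI').mean()  # weeks end on Friday instead
print(sundays.index.day_name().unique(), fridays.index.day_name().unique())
```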
vix_m = vix.resample('M').mean() # monthly
vix_y = vix.resample('Y').mean() # yearly
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(vix_m.index, vix_m['VIXCLS'], color = 'red', label = 'monthly average')
ax.plot(vix_y.index, vix_y['VIXCLS'], color = 'blue', label = 'yearly average')
ax.set_title('CBOE Volatility Index: VIX')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
If we zoom in tight enough, the time axis will relabel in a natural way to reflect the changed time scale. When we refer to coordinates in a time-series figure, we pass the x-coordinates as datetime objects. Below, we use datetimes to limit the x-axis and to make some annotations.
# Any place we need to reference the x axis we use a datetime object.
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(vix.index, vix['VIXCLS'], color = 'red', label = 'daily close')
ax.set_xlim( dt.datetime(2008,1,1), dt.datetime(2010, 12, 31) ) # The limits need datetime objects
ax.annotate('Lehman Collapse', xy=(dt.datetime(2008,9,15), 70), xytext=(dt.datetime(2008,2,1), 70),
arrowprops={'facecolor':'black', 'arrowstyle':'->'})
ax.annotate('Greek Debt Crisis', xy=(dt.datetime(2010,5,1), 40), xytext=(dt.datetime(2009,8,1), 40),
arrowprops={'facecolor':'black', 'arrowstyle':'->'})
ax.set_title('CBOE Volatility Index: VIX')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
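If the automatic date labels are not what you want, matplotlib's `dates` module lets you control tick placement and formatting on a time axis. A sketch (the three-month interval and the label format are illustrative choices):

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
months = [dt.datetime(2008, m, 1) for m in range(1, 13)]
ax.plot(months, range(12), color='red')

# A tick every third month, labeled like 'Sep 2008'
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
plt.show()
```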
Practice: Dates
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.
- Read the file 'osk.csv' into a DataFrame. The file contains daily closing prices for Oshkosh Corp. and the S&P 500.
- Set the index to the date. Make sure the dates are datetime objects!
prices = pd.read_csv('./Data/osk.csv')
prices['Date']=pd.to_datetime(prices['Date'])
prices.set_index('Date', inplace=True)
prices.head()
osk | sp500 | |
---|---|---|
Date | ||
2018-01-02 | 92.169998 | 2695.810059 |
2018-01-03 | 92.029999 | 2713.060059 |
2018-01-04 | 91.279999 | 2723.989990 |
2018-01-05 | 89.970001 | 2743.149902 |
2018-01-08 | 90.660004 | 2747.709961 |
- Normalize each series by dividing every observation by the value at 01-02-2018. Try `prices.loc['2018-01-02','osk']`, where I called my DataFrame `prices`. Do something similar to the S&P 500.
prices['osk_n'] = prices['osk'] / prices.loc['2018-01-02','osk']
prices['sp500_n'] = prices['sp500'] / prices.loc['2018-01-02','sp500']
prices.head()
osk | sp500 | osk_n | sp500_n | |
---|---|---|---|---|
Date | ||||
2018-01-02 | 92.169998 | 2695.810059 | 1.000000 | 1.000000 |
2018-01-03 | 92.029999 | 2713.060059 | 0.998481 | 1.006399 |
2018-01-04 | 91.279999 | 2723.989990 | 0.990344 | 1.010453 |
2018-01-05 | 89.970001 | 2743.149902 | 0.976131 | 1.017561 |
2018-01-08 | 90.660004 | 2747.709961 | 0.983617 | 1.019252 |
- Plot the two normalized series
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(prices.index, prices['osk_n'], color = 'red', label = 'Osh Kosh (daily)', alpha = 1)
ax.plot(prices.index, prices['sp500_n'], color = 'blue', label = 'S&P 500 (daily)', alpha = 1)
ax.set_ylabel('closing price (Jan 2, 2018 = 1.0)')
ax.set_title('Normalized closing prices')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
- Resample the data to a weekly frequency creating average weekly prices. (I would recommend creating a new DataFrame)
- Create new indexes, normalizing by the first week of the year.
- Add the weekly data to your figure from part 4. Use the alpha parameter to lighten up the daily data lines.
- How does your plot look? Are the titles correct? Is there a legend or some other way of identifying the lines?
prices_w = prices[['osk', 'sp500']].resample('W').mean()
prices_w['osk_n'] = prices_w['osk'] / prices_w.loc['2018-01-07','osk']
prices_w['sp500_n'] = prices_w['sp500'] / prices_w.loc['2018-01-07','sp500']
prices_w.head()
osk | sp500 | osk_n | sp500_n | |
---|---|---|---|---|
Date | ||||
2018-01-07 | 91.362499 | 2719.002502 | 1.000000 | 1.000000 |
2018-01-14 | 91.040003 | 2760.206006 | 0.996470 | 1.015154 |
2018-01-21 | 91.939999 | 2796.827515 | 1.006321 | 1.028623 |
2018-01-28 | 94.536000 | 2844.352002 | 1.034735 | 1.046101 |
2018-02-04 | 91.056000 | 2816.775977 | 0.996645 | 1.035959 |
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(prices.index, prices['osk_n'], color = 'red', label = 'Osh Kosh (daily)', alpha = 0.25)
ax.plot(prices.index, prices['sp500_n'], color = 'blue', label = 'S&P 500 (daily)', alpha = 0.25)
ax.plot(prices_w.index, prices_w['osk_n'], color = 'red', label = 'Osh Kosh (weekly)')
ax.plot(prices_w.index, prices_w['sp500_n'], color = 'blue', label = 'S&P 500 (weekly)')
ax.set_ylabel('closing price (start of 2018 = 1.0)')
ax.set_title('Normalized closing prices')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
Our old way of referencing a value:
print('The vix on June 18, 2008 is', vix.loc['2008-06-18', 'VIXCLS'])
The vix on June 18, 2008 is 22.24
Our date-specific way:
# Option 1: create a datetime and pass it
date_to_extract = dt.datetime(2008, 6, 18) # create a datetime with the day I want
print('The vix on', date_to_extract.strftime('%B %d, %Y'), 'is', vix.loc[date_to_extract, 'VIXCLS'] ) # extract with loc[], like usual
The vix on June 18, 2008 is 22.24
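A second option: pandas' own `Timestamp` type works anywhere a `datetime` does. A sketch with a tiny stand-in Series (the vix DataFrame is not reloaded here, and the second value is made up for illustration):

```python
import pandas as pd

# Stand-in for the vix data around the date of interest
s = pd.Series([22.24, 21.00],
              index=pd.to_datetime(['2008-06-18', '2008-06-19']))

date_to_extract = pd.Timestamp(2008, 6, 18)  # interchangeable with dt.datetime(2008, 6, 18)
print('The vix on', date_to_extract.strftime('%B %d, %Y'), 'is', s.loc[date_to_extract])
```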
An alternative way to 'zoom in' on the data is to plot a slice of the DataFrame. With an index of datetime, this is super easy.
Let's use the VIX data as an example to develop intuition for what's going on when we slice dates:
- We can grab every observation from 2008 by indexing with the string '2008'.
print('Every week in 2008:', vix_w.loc['2008']) # all the weeks in 2008
Every week in 2008:              VIXCLS
DATE
2008-01-06  23.0250
2008-01-13  24.0940
2008-01-20  25.2520
2008-01-27  29.2225
2008-02-03  26.5880
2008-02-10  27.7740
2008-02-17  25.8740
2008-02-24  24.7925
2008-03-02  23.5380
2008-03-09  26.2880
2008-03-16  28.2820
2008-03-23  28.6225
2008-03-30  25.8240
2008-04-06  23.4760
2008-04-13  22.6060
2008-04-20  21.5260
2008-04-27  20.2560
2008-05-04  19.5460
2008-05-11  19.1300
2008-05-18  17.2400
2008-05-25  18.1560
2008-06-01  18.6700
2008-06-08  20.6120
2008-06-15  22.9940
2008-06-22  21.7540
2008-06-29  22.7140
2008-07-06  24.5750
2008-07-13  25.4480
2008-07-20  26.2360
2008-07-27  22.3780
2008-08-03  22.5960
2008-08-10  21.3340
2008-08-17  20.5520
2008-08-24  20.2620
2008-08-31  20.2600
2008-09-07  22.6275
2008-09-14  24.5360
2008-09-21  32.6780
2008-09-28  34.4640
2008-10-05  43.2640
2008-10-12  59.4260
2008-10-19  63.4620
2008-10-26  64.5320
2008-11-02  67.9540
2008-11-09  55.1500
2008-11-16  62.8040
2008-11-23  72.9160
2008-11-30  58.9500
2008-12-07  63.1560
2008-12-14  56.6380
2008-12-21  50.2480
2008-12-28  44.2925
- We can grab every observation between Jan 1, 2008 and Jan 31, 2008 by slicing with '2008/01/01':'2008/01/31'. Python is smart enough to understand references to dates that are not in the DataFrame. There is no January 1 or January 31 observation.
print('\nEvery week in Jan 2008:', vix_w.loc['2008/01/01':'2008/01/31']) # all the weeks in January 2008
Every week in Jan 2008:              VIXCLS
DATE
2008-01-06  23.0250
2008-01-13  24.0940
2008-01-20  25.2520
2008-01-27  29.2225
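Partial-string slices grab contiguous date ranges. For non-contiguous selections, say, every January in the sample, we can filter on the index components instead. A sketch with synthetic monthly data:

```python
import pandas as pd

# Two years of synthetic monthly observations
s = pd.Series(range(24), index=pd.date_range('2007-01-01', periods=24, freq='MS'))

januaries = s[s.index.month == 1]  # every January, regardless of year
print(januaries)
```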
5. Moving-window calculations (top)
The `rolling()` method of DataFrame allows methods like `mean()` and `var()` to be calculated over a rolling window of the data. We often use moving averages to extract trends from noisy data.

Calling

`vix['VIXCLS'].rolling(60)`

creates a rolling-window object. In this case, I am asking for a window of 60 observations, which is 60 trading days in daily data. We can call a method like `mean()` or `var()` on it directly to compute rolling statistics:
vix['VIXCLS'].rolling(60).mean()
A couple of important things to note:

- Rolling windows make the most sense with time-series data. With an integer window like `rolling(60)`, pandas counts observations, so the data should be sorted by date; time-based windows such as `rolling('60D')` also require a datetime index.
- We strip out any missing values in the raw data using `dropna()` before building the rolling windows. Otherwise, any 60-observation window that contains a missing value evaluates to NaN.
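An alternative to `dropna()` worth knowing about: `rolling()` accepts a `min_periods` argument, which emits a value as soon as the window contains that many non-missing observations. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0, 5.0])

strict = s.rolling(3).mean()                  # NaN for any window with a gap
relaxed = s.rolling(3, min_periods=2).mean()  # average the non-missing values instead
print(strict.tolist())   # only the last window is complete
print(relaxed.tolist())  # values appear once 2 observations are in the window
```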
vix['vix_ma'] = vix['VIXCLS'].dropna().rolling(60).mean() # create 60-day moving average
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(vix.index, vix['VIXCLS'], color = 'red', label = 'daily close', alpha=0.25)
ax.plot(vix.index, vix['vix_ma'], color = 'blue', label = '60-day moving average')
ax.set_title('CBOE Volatility Index: VIX')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
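A closely related smoother, not used in this lecture but a one-line variation: `ewm()` computes an exponentially weighted average, which weights recent observations more heavily and has no startup NaNs. A sketch (the span of 3 is an illustrative choice):

```python
import pandas as pd

s = pd.Series([20.0, 22.0, 80.0, 21.0, 20.0])

ma = s.rolling(3).mean()   # equal weights within the window
ew = s.ewm(span=3).mean()  # geometrically declining weights on past values
print(ma.tolist())
print(ew.tolist())
```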
Practice: Constructing and Plotting Moving Averages
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.
- Compute and plot the 30-day rolling standard deviation of the Osh Kosh and S&P 500 normalized daily prices.
prices['std_osk'] = prices['osk_n'].rolling(30).std()
prices['std_sp500'] = prices['sp500_n'].rolling(30).std()
fig, ax = plt.subplots(figsize=(15,5))
ax.plot(prices.index, prices['std_osk'], color = 'red', label = 'OSK: 30-day std')
ax.plot(prices.index, prices['std_sp500'], color = 'blue', label = 'S&P500: 30-day std')
ax.set_title('Rolling standard deviations')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
- Plot only the standard deviation for the first half of the year. Use a slice on the index to do it.
fig, ax = plt.subplots(figsize=(15,5))
# With a slice
ax.plot(prices['std_sp500'].loc['01-01-2018':'06-30-2018'], color = 'blue', label = 'S&P500: 30-day std')
# With truncate()
ax.plot(prices['std_osk'].truncate(after='06-30-2018'), color = 'red', label = 'OSK: 30-day std')
ax.set_title('Rolling standard deviations')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()