Lecture 6: Slicing and Functions [SUGGESTED SOLUTIONS]

Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work --- we do not need to be Python Jedi.

We now know about the basic data structures in python, how types work, and how to do some basic computation and string manipulation. We can use flow control statements to steer our program to different blocks of code depending on conditional statements, and we have sorted out loops and list comprehensions.

Up next are a few more important topics before we get started with pandas. Today, we will cover:

Slicing

User-defined functions

Objects and TAB completion

1. Slicing

Slicing is an important part of python life. We slice a list (or a tuple or a string) when we take a subset of it. As you can probably imagine, slicing will be a common thing we do with data. We often want to grab slices of the data set and analyze them.

The slice syntax uses square brackets --- even if we are slicing a string or a tuple. The basic command is

some_list[start:stop:stride]

start is the first element to include in the slice
stop is the first element we do NOT include
stride is the step size

Notice that the start is inclusive and the stop is exclusive. Think of a slice as a half-open interval in mathematics, [start, stop): we include start in the interval but exclude stop.

The default stride is 1, meaning take every element from [start, stop).

In [1]:

some_list = [5, 6, 7, 8, 9]

print(some_list[0:2])   # indexes start with zero; stride defaults to 1
print(some_list[0:2:1]) # this should be the same
print(some_list[0:5:2]) # take every other element

[5, 6]
[5, 6]
[5, 7, 9]
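One way to check which positions a slice visits is to compare it with range(start, stop, stride), which follows the same half-open rule when the arguments are non-negative. The sketch below is only an illustration; the names indexes and picked are made up for this example.

```python
some_list = [5, 6, 7, 8, 9]

start, stop, stride = 0, 5, 2

# range() uses the same half-open [start, stop) rule as a slice
indexes = list(range(start, stop, stride))

# pick the elements one position at a time
picked = [some_list[i] for i in indexes]

print(indexes)                                    # the positions the slice visits
print(picked)                                     # the elements at those positions
print(picked == some_list[start:stop:stride])     # compare with the slice itself
```

If the last line prints True, the slice and the hand-built list agree.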

In [2]:

# take a slice out of the middle
print(some_list[1:3])     #take the second element and the third element

[6, 7]

If we want to take a start and then 'everything to the end' we just leave the second argument blank. A similar syntax takes everything from the beginning up to a stop: leave the first argument blank.

In [3]:

print(some_list[2:])     # the third element to the end of the list
print(some_list[:4])     # everything up to but not including the fifth element

[7, 8, 9]
[5, 6, 7, 8]

One nice thing about this half open interval syntax is that we can divide up a list very neatly:

In [4]:

first_part = some_list[:3]
second_part = some_list[3:]
print(first_part, second_part, some_list)

[5, 6, 7] [8, 9] [5, 6, 7, 8, 9]
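The same idea extends to more than two pieces. Here is a minimal sketch that cuts a list into consecutive chunks; chunk_size is a hypothetical value chosen only for illustration. Because the stop is exclusive, each chunk starts exactly where the previous one ended, so nothing is repeated or dropped.

```python
some_list = [5, 6, 7, 8, 9]
chunk_size = 2          # hypothetical chunk size, chosen for illustration

chunks = []
for start in range(0, len(some_list), chunk_size):
    # a stop past the end of the list is fine: the slice just ends at the last element
    chunks.append(some_list[start:start + chunk_size])

print(chunks)
```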

Slice arguments can be negative. When we use a negative number for start or stop, we are telling python to count from the end of the list.

In [5]:

print(some_list[:-1])    # all but the last one
print(some_list[:-2])    # all but the last two
print(some_list[-4:-2])    # ugh (again, we don't take the -2 value)

# [5 | 6 | 7 | 8 | 9]     # The list
# -5  -4  -3  -2  -1      # backwards counting
#  0   1   2   3   4      # forwards counting

[5, 6, 7, 8]
[5, 6, 7]
[6, 7]
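If the negative positions feel slippery, it can help to translate them: position -k is the same as position len(some_list) - k. A short sketch of that translation (the variable names n and same are just for this example):

```python
some_list = [5, 6, 7, 8, 9]
n = len(some_list)

print(some_list[-4:-2])            # negative positions
print(some_list[n - 4:n - 2])      # the same slice written with positive positions

same = some_list[-4:-2] == some_list[n - 4:n - 2]
print(same)
```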

If we use a negative number for the stride argument, we iterate backwards.

In [6]:

print(some_list[::-1])   # print the list out backwards
print(some_list[4:1:-1]) # we are counting backwards, so be careful about start and stop
                         # start at the [4] element in the list and end at the [2]

[9, 8, 7, 6, 5]
[9, 8, 7]
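Here is a small sketch of why we need to be careful. With a negative stride, start has to sit to the right of stop; if it does not, python returns an empty list rather than an error. The variable names are made up for this example.

```python
some_list = [5, 6, 7, 8, 9]

backwards = some_list[4:1:-1]            # start is right of stop: works as expected
empty = some_list[1:4:-1]                # start is left of stop: nothing to step through
two_steps = some_list[2:5][::-1]         # slice forwards first, then reverse the result

print(backwards)
print(empty)
print(two_steps)
```

Slicing forwards and then reversing is a bit more typing, but it is often easier to read than working out the backwards start and stop.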

In [7]:

# don't forget, we can do this with strings, too
slogan = 'onward'
print(slogan[:2])       # just print 'on'
print(slogan[::-1])     # backwards

on
drawno

Practice: Slicing

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.

Create the variable boss = 'Kirby Smart'

In [8]:

boss = 'Kirby Smart'

Slice boss to create the variables first_name and last_name

In [9]:

first_name = boss[:5]
last_name = boss[6:]
print('First name:', first_name)
print('Last name:', last_name)

First name: Kirby
Last name: Smart

Redo the previous question, this time slicing boss with negative indexes that count from the end of the string.

In [10]:

first_name = boss[:-6]     # stop before the space that separates the two names
last_name = boss[-5:]
print('First name:', first_name)
print('Last name:', last_name)

First name: Kirby
Last name: Smart

Consider this list of sorted data.

x_sorted = [10, 40, 100, 1000, 50000]

Print out the 3 largest elements

In [11]:

x_sorted = [10, 40, 100, 1000, 50000]
print(x_sorted[-3:])     # the list is sorted, so the last three elements are the largest

[100, 1000, 50000]

Print out the two smallest elements

In [12]:

print(x_sorted[:2])

[10, 40]

2. User-defined functions

We have seen some of python's built-in functions: print(), type(), and len(). Like many other languages, python allows users to create their own functions.

Using functions lets us (or someone else) write and debug the code once --- then we can reuse it. Very powerful stuff. Here is a simple example:

In [13]:

def lb_to_kg(pounds):
    """
    Input a weight in pounds. Return the weight in kilograms.
    """
    
    kilos = pounds * 0.453592                  # 1 pound = 0.453592 kilos...
    
    return kilos                               # this is the value the function returns

When you run the cell above, it looks like nothing happened, but python read the code and created the function. We can use the whos statement (a jupyter notebook 'magic' command) to learn about what objects are in the namespace. [A namespace is a list of all the objects we have created and the names we have assigned them.]

In [14]:

whos

Variable      Type        Data/Info
-----------------------------------
boss          str         Kirby Smart
first_name    str         Kirby 
first_part    list        n=3
last_name     str         Smart
lb_to_kg      function    <function lb_to_kg at 0x000001B45AD9B8B0>
second_part   list        n=2
slogan        str         onward
some_list     list        n=5
x_sorted      list        n=5

We can see the variables we have created earlier as well as the function lb_to_kg. Notice functions are of type function. Just like any other variable, lb_to_kg is loaded into the namespace.

Now that our function is defined, we are ready to use it.

In [15]:

car_weight_pounds = 5000
car_weight_kilos = lb_to_kg(car_weight_pounds)
print('The car weighs', car_weight_kilos, 'kilos.')

The car weighs 2267.96 kilos.

Since it is our function, it is our job to handle potentially bad inputs. If we do not, python will raise an error.

In [16]:

truck_weight_pounds = '5000'       #A classic problem with real data
truck_weight_kilos = lb_to_kg(truck_weight_pounds)
print('The truck weighs', truck_weight_kilos, 'kilos.')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [16], in <cell line: 2>()
      1 truck_weight_pounds = '5000'       #A classic problem with real data
----> 2 truck_weight_kilos = lb_to_kg(truck_weight_pounds)
      3 print('The truck weighs', truck_weight_kilos, 'kilos.')

Input In [13], in lb_to_kg(pounds)
      1 def lb_to_kg(pounds):
      2     """
      3     Input a weight in pounds. Return the weight in kilograms.
      4     """
----> 6     kilos = pounds * 0.453592                  # 1 pound = 0.453592 kilos...
      8     return kilos

TypeError: can't multiply sequence by non-int of type 'float'

We can write a second version of the function that checks the input before using it.

In [ ]:

def lb_to_kg_v2(pounds):
    """
    Input a weight in pounds. Return the weight in kilograms.
    Return -99 if the input is not an integer or a float.
    """
    
    if isinstance(pounds, (int, float)):           # check that the input 'pounds' is an allowable type
        kilos = pounds * 0.453592                  # 1 pound = 0.453592 kilos...
        return kilos                               # this is the value the function returns
    else:
        print('error: lb_to_kg_v2 only takes integers or floats.')
        return -99                                 # a sentinel value that flags bad input

In [17]:

truck_weight_pounds = '5000'       #A classic problem with real data
truck_weight_kilos = lb_to_kg_v2(truck_weight_pounds)
print('The truck weighs', truck_weight_kilos, 'kilos.')

error: lb_to_kg_v2 only takes integers or floats.
The truck weighs -99 kilos.

How much time you spend writing code that is safe from errors is a tradeoff between your time and how robust your code needs to be. Life is all about tradeoffs.
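As one point on that tradeoff, here is a sketch of a more forgiving version that tries to convert the input to a float before giving up. The name lb_to_kg_v3 and the choice to return None on failure are assumptions made for this example, not part of the lecture's code. It uses try/except, which we have not covered yet: python runs the try block and jumps to the except block if the conversion fails.

```python
def lb_to_kg_v3(pounds):
    """
    Input a weight in pounds (a number, or a string that looks like a number).
    Return the weight in kilograms, or None if the input cannot be converted.
    """
    try:
        pounds = float(pounds)              # '5000' becomes 5000.0
    except (TypeError, ValueError):         # raised for inputs like None or 'heavy'
        print('error: lb_to_kg_v3 could not convert', repr(pounds), 'to a number.')
        return None

    return pounds * 0.453592                # 1 pound = 0.453592 kilos


print(lb_to_kg_v3('5000'))      # a string that holds a number is converted
print(lb_to_kg_v3('heavy'))     # prints the error message, then None
```

Whether silently converting strings is a good idea depends on the data. Sometimes it is better to fail loudly so we notice the problem.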

We can have functions with several input variables:

In [18]:

def name_fixer(first, middle, last):
    """
    Fix any capitalization problems and create a single variable with the complete name.
    """
    return first.title() + ' ' + middle.title() + ' ' + last.title()           # the string method title() capitalizes the first letter and lowercases the rest

In [19]:

mascot_first = 'HarRy'
mascot_middle = 'the'
mascot_last = 'DaWg'

full_name = name_fixer(mascot_first, mascot_middle, mascot_last)
print(full_name)

Harry The Dawg

Important: A function can also return several values at once, and we can assign them to several variables in one statement. This is called multiple assignment. First, let's look at multiple assignment outside of a function, then we use it in a function.

In [20]:

# this is an example of multiple assignment. 
a, b = 'foo', 10            #assign 'foo' to a and 10 to b...all in one statement
print(a, b)

foo 10

Back on day one, we worked on the following problem: "In a code cell, set m=2 and n=3. Write some code that swaps the values of m and n."

Back then, we created a temp variable to help us make the swap. Now that we have some python under our belts we can just do this:

In [21]:

m = 2
n = 3  #I could have used multiple assignment here, too, but didn't
print('m=', m, 'n=', n)

m, n = n, m                  # make the swap
print('m=', m, 'n=', n)

m= 2 n= 3
m= 3 n= 2

Multiple assignment lets us return several objects from a function.

In [22]:

def temp_converter(temp_in_fahrenheit):
    """
    Takes a temperature in fahrenheit and returns it in celsius and in kelvin.
    """
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5/9
    temp_in_kelvin = (temp_in_fahrenheit + 459.67) * 5/9
    return temp_in_celsius, temp_in_kelvin


# Note that I am defining the function and using it in the same code cell. 
# The code below is NOT part of the function definition. We can see that because it is not indented. 

t_f = 65        # temp in fahrenheit
t_c, t_k = temp_converter(t_f)
print(t_f, f'degrees fahrenheit is {t_c:6.2f} degrees celsius and {t_k:6.2f} degrees kelvin.')

65 degrees fahrenheit is  18.33 degrees celsius and 291.48 degrees kelvin.
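Under the hood, a function that returns several values is returning a single tuple, and multiple assignment unpacks it. The sketch below repeats the function definition so it runs on its own; the names result, boiling_c, and boiling_k are made up for this example.

```python
def temp_converter(temp_in_fahrenheit):
    """
    Takes a temperature in fahrenheit and returns it in celsius and in kelvin.
    """
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5/9
    temp_in_kelvin = (temp_in_fahrenheit + 459.67) * 5/9
    return temp_in_celsius, temp_in_kelvin


result = temp_converter(212)       # keep the return value in ONE variable
print(type(result))                # the two values come back packed together
print(len(result))

boiling_c, boiling_k = result      # multiple assignment unpacks the pair
print(f'{boiling_c:6.2f} degrees celsius, {boiling_k:6.2f} degrees kelvin')
```

The number of variables on the left has to match the number of values returned, or python raises an error.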

A comment about memory: Think of a function as a temporary list of instructions. I say "temporary" because any variables created inside the function are local: they exist only while the function runs and are not added to the notebook's namespace. Objects already in the namespace, however, are available to the function. Consider the following revision to the temp_converter function:

In [23]:

def temp_converter(temp_in_fahrenheit):
    """
    Takes a temperature in fahrenheit and returns it in celsius and in kelvin.
    """
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5/9
    temp_in_kelvin = (temp_in_fahrenheit + 459.67) * 5/9
    print(boss)    
    return temp_in_celsius, temp_in_kelvin

We created the object boss at the beginning of the lecture so it's sitting in memory and can be accessed by the function -- even though it makes no sense to access it in the function. Note that when we ran the cell above, it did not print out boss even though the object exists in memory. All we did was define (i.e., def) a function object called temp_converter.

In [24]:

t_f = 65        # temp in fahrenheit
t_c, t_k = temp_converter(t_f)
print(t_f, f'degrees fahrenheit is {t_c:6.2f} degrees celsius and {t_k:6.2f} degrees kelvin.')

Kirby Smart
65 degrees fahrenheit is  18.33 degrees celsius and 291.48 degrees kelvin.

Let's try the opposite: Print out the variables temp_in_celsius (or temp_in_kelvin) which are created in the function.

In [25]:

print(temp_in_celsius)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [25], in <cell line: 1>()
----> 1 print(temp_in_celsius)

NameError: name 'temp_in_celsius' is not defined

We get an error because the variables a function creates are local to it. They are stored temporarily while the function is running and are discarded when it returns. Only the values we return make it back to the namespace.
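Here is a compact sketch of both directions at once: a function reading a name defined outside it, and the outside world failing to see a name defined inside it. The names rate, lb_to_kg_scope_demo, answer, and kilos_visible are hypothetical and used only for this illustration.

```python
rate = 0.453592                       # defined outside any function

def lb_to_kg_scope_demo(pounds):
    kilos = pounds * rate             # the function can read the outside name 'rate'
    return kilos                      # 'kilos' itself is local to the function


answer = lb_to_kg_scope_demo(10)      # the returned value is stored under a new name
print(answer)

# 'kilos' was never added to our namespace, so asking for it raises a NameError
kilos_visible = True
try:
    kilos
except NameError:
    kilos_visible = False

print('Can we see kilos outside the function?', kilos_visible)
```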

Practice: Functions

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.

Write a change counting function. Pass the function the number of pennies, nickels, dimes, and quarters, and return the value of the coins. Test it with 5 pennies, 4 dimes, 2 quarters.

In [26]:

def change(pennies,nickels,dimes,quarters):
    val = pennies + 5*nickels + 10*dimes + 25*quarters
    return val
    
p = 5
n = 0
d = 4
q = 2
print(f'Total value of the change: {change(p,n,d,q):2d} cents.')

Total value of the change: 95 cents.

Modify the name_fixer() function to return both the fixed-up full name and the length of the full name. Use multiple assignment.

In [27]:

def name_fixer(first, middle, last):
    """
    Fix any capitalization problems and create a single variable with the complete name.
    """
    full_name = first.title() + ' ' + middle.title() + ' ' + last.title()
    full_name_len = len(first) + len(middle) + len(last)
    return full_name, full_name_len, len(full_name)

mascot_first = 'HarRy'
mascot_middle = 'the'
mascot_last = 'DaWg'

full_name, full_name_length0, full_name_length1  = name_fixer(mascot_first, mascot_middle, mascot_last)
print(full_name)
print(f'{full_name_length0:2d} characters w/o spaces.')
print(f'{full_name_length1:2d} characters with spaces.')

Harry The Dawg
12 characters w/o spaces.
14 characters with spaces.

The split(delim) string method breaks up a string into sub-strings. The argument delim defines the delimiting character. For example:

In [28]:

test_string = 'There is a place where the sidewalk ends'
test_string_chunks = test_string.split(' ')        #use the space as the delimiter
print(type(test_string_chunks))
print(test_string_chunks)

<class 'list'>
['There', 'is', 'a', 'place', 'where', 'the', 'sidewalk', 'ends']

Write a function that takes names of the form 'last,first,middle' and returns three strings: first, middle, and last. Test your function with 'Silverstein,Sheldon,Allan'.

In [29]:

def name_fixer_split(name):
    name_list = name.split(',')
    return name_list[1],name_list[2],name_list[0]

author = 'Silverstein,Sheldon,Allan'

first, middle, last = name_fixer_split(author)
print('First name: ', first)
print('Middle name: ', middle)
print('Last name: ', last)

First name:  Sheldon
Middle name:  Allan
Last name:  Silverstein
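Real data is rarely this tidy. If the names come with stray spaces around the commas, split(',') keeps those spaces in the pieces. A minimal sketch of one way to clean them up with the string method strip(); the input messy_name is made up for this example.

```python
messy_name = 'Silverstein, Sheldon , Allan'       # hypothetical input with stray spaces

pieces = messy_name.split(',')
print(pieces)                                     # the spaces are still attached

# strip() removes whitespace from the beginning and end of each piece
clean_pieces = [piece.strip() for piece in pieces]
print(clean_pieces)

last, first, middle = clean_pieces                # multiple assignment unpacks the list
print(first, middle, last)
```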

3. Objects and TAB completion

Like C++ or JavaScript, python is an object-oriented language. This is a topic that a computer science course could devote weeks to, but our goal is to understand objects well enough to use them.

Everything in python is an object. The variables we have been creating are objects. The functions we have written are objects. Objects are useful because they have attributes and methods associated with them. What attributes and methods an object has depends on the object's type. Let's take lists for example.

list_1 = ['a', 'b', 'c']
list_2 = [4, 5, 6, 7, 8]

Both lists are objects and both have type list, but their attributes are different. For example, list length is an attribute: list_1 is of length 3, while list_2 is of length 5.

Methods are like functions that are attached to an object. Different types of objects have different methods available. Methods implement operations that we often use with a particular data type. We access methods with the 'dot' notation.

list_1.method()

where method() is a method associated with the list type. We have been using the lower(), upper(), and title() methods of the string class already.

In [30]:

list_1 = ['a', 'c', 'b']
print(list_1)

['a', 'c', 'b']

In [31]:

list_1.sort()        # the sort() method from the 'list' type

print(list_1)

['a', 'b', 'c']

How do we find out what methods are available for an object? Google is always a good way.

Important: We can also use TAB completion in jupyter. Type list_1. in the cell below and hit the TAB key.

In [ ]:

The TAB gives us a list of possible methods. We have already seen append(). reverse() looks interesting. Let's give it a try.

In [32]:

list_1.reverse()
print(list_1)

['c', 'b', 'a']

TAB completion is also there to make it easier to reference variables in the namespace. Start typing lis and hit tab. It should bring up a list of variables in the namespace that start with 'lis'. This is handy: it saves typing and avoids errors from typos.
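If TAB completion is not available (say, in a plain python script), the built-in dir() function gives a similar list of names attached to an object. A short sketch; filtering out the names that start with an underscore is a choice made for this example, to hide python's internal methods.

```python
list_1 = ['a', 'c', 'b']

# dir() returns the names of the attributes and methods attached to an object;
# keep only the names that do not start with an underscore
public_names = [name for name in dir(list_1) if not name.startswith('_')]

print(public_names)
```

The list should include familiar methods such as append, reverse, and sort.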

Practice: Objects and TAB completion

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.

Suppose you have data gdp = '18,570.50'. Convert the variable to a float. Use TAB completion (and Google, if needed) to find a method that removes the comma.

In [33]:

gdp = '18,570.50'
print('Before: ',gdp)
gdp = float(gdp.replace(',',''))   # remove the comma, then convert the string to a float
print('After: ',gdp)

Before:  18,570.50
After:  18570.5

Sort the list scores = [50, 32, 78, 99, 39, 75] and use TAB completion (.) and the object inspector (?) to insert new_score into the list in the correct position so that the list stays sorted.

In [34]:

scores = [50, 32, 78, 99, 39, 75]
scores.sort()
print(scores)

[32, 39, 50, 75, 78, 99]

In [35]:

new_score = 85
scores.insert(5,new_score)  # of course, we could have appended 'new_score' to 'scores' and then sorted.
print(scores) 

[32, 39, 50, 75, 78, 85, 99]
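In the solution above we worked out the position 5 by eye. As a sketch of an alternative, python's standard library module bisect can find the position for us. This is an extra tool beyond what the exercise asks for, and the variable name position is made up for this example.

```python
import bisect

scores = [32, 39, 50, 75, 78, 99]      # the list must already be sorted
new_score = 85

# bisect_left() returns the position where new_score belongs in the sorted list
position = bisect.bisect_left(scores, new_score)
print(position)

scores.insert(position, new_score)     # same insert() method as before
print(scores)
```

The module also has bisect.insort(scores, new_score), which finds the position and inserts in one step.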

Jeff Thurk // jeff.thurk@uga.edu // Department of Economics // University of Georgia