Topics: Using matplotlib & numpy: plotting mathematical functions and arrays
Today's lab focuses on other features of the popular matplotlib module: in particular, drawing functions and plotting data.
y = log(x) or y = √ x ?
To test out this question, we will write a program that:
import math import matplotlib.pyplot as pltSince it's unwieldly to type "matplotlib.pyplot" before every function we'd like to use from that library, instead we'll use the common abbreviation of "plt". With this, we can plt.plot(), instead of matplotlib.pyplot.plot().
x = range(1,101)Remember: Python starts counting at 0 and goes up to, but not including the 101. So, this creates the list [1,2,...,100].
y1 = [] for i in x: y = math.log(i) y1.append(y) y2 = [] for i in x: y = math.sqrt(i) y2.append(y)We need two separate lists since we have two separate functions to graph.
How could you rewrite the above using list comprehensions?
plt.plot(x,y1,label='y1 = log(x)') plt.plot(x,y2,label='y2 = sqrt(x)') plt.legend()Creates the plot for safe keeping but does not display it until told to (see next lines).
plt.show()This line pops up the new graphics window to display the plots.
From your plots, which do you think grows faster: log(x) or sqrt(x)?
It is commonly used with the abbreviation, np:
import numpy as np import matplotlib.pyplot as plt
Numpy's main object is a homogeneous multidimensional array. That is, sequences of similar objects in most any dimension. You can access the elements in the arrays by tuples of positive numbers. For example, if you have a 10 x 10 grid of numbers, it could be stored in a 2-dimensional array, and each element could be accessed by specifying the row and column using a tuple: (1,2) would give the element at row 1 and column 2. The number of dimensions is called the rank of the array and the dimensions are often called axes.
Try the following at the prompt:
import numpy as np import matplotlib.pyplot as plt a = np.array([2,3,4]) #Sets up a 1-dimensional array with 3 elements print a b = np.array(range(100)) #Not common but does work print b c = np.linspace(0,99,100) #More common, linearly space 100 numbers between 0 and 99, inclusive print c
Let's use these commands to create the simple plots from above:
x = np.linspace(1,100,100) y1 = np.log(x) y2 = np.sqrt(x)We need to use numpy's log and square root functions since those can handle arrays as inputs (the ones in the math library are expecting just single numbers, not np.arrays).
We can plot these in the same way as before:
plt.plot(x,y1) plt.plot(x,y2) plt.show()
Side Note: As we just demonstrated, the matplotlib library accepts both lists of numbers and arrays of numbers from the numpy library. We will go into more detail about arrays in the next lab.
Using the Python program you wrote above, try the following:
Next, Let's focus on the question: "Has Lyme Disease Increased?" and examine data from the Center for Disease Control (CDC) to answer that question. Let's start with the tri-state area. Here are the years and occurrences:
years = [2003,2004,2005,2006,2007,2008,2009,2010,2011] ny = [5399,5100,5565,4460,4165,5741,4134,2385,3118] nj = [2887,2698,3363,2432,3134,3214,4598,3320,3398] ct = [1403,1348,1810,1788,3058,2738,2751,1964,2004]
To plot New York data as a `scatter plot' (dots at each (x,y) point), we add the commands:
import matplotlib.pyplot as plt #Library of plotting functions plt.scatter(years, ny) plt.show()
matplotlib.pyplot.plot(years, ny, label='NY')Add in labels for New Jersey and Connecticut data. You can then display a legend by adding the command:
plt.legend()Add axis labels and a title:
plt.title("Lyme Disease in NY, NJ, & CT") plt.xlabel('Years') plt.ylabel('Number of Cases')
The data file statesSummary.csv is from the CDC. Before starting the program, open up the csv file and see what it looks like.
import matplotlib.pyplot as plt import numpy as np import csv
infile = open('statesSummary.csv','r')
infile = open('statesSummary.csv','rU') reader = csv.reader(infile) yearLine = reader.next() years = [int(w) for w in yearLine[1:]]
for i in range(5): line = reader.next() stateName = line[0] stateValues = [int(w) for w in line[1:]] color = (0.0, i/5.0, 0.0) plt.scatter(years, stateValues, c=color, label=stateName)
plt.title("Cases of Lyme Disease") plt.xlabel('Years') plt.ylabel('Number of Cases') plt.legend(loc = 2, fontsize = 'x-small') plt.show()
color = (1.0,0.0,i/5.0) #Color tuple: 100% red, 0% green, and blue increasing with iWhat happens?
The basemap webpage has an example of coloring in states by population, fillstates.py. Once you have the 3 files with the shapes of the states, you can run this program to see the map (don't worry to much about all the details, we will go through in detail a simpler one first).
We will use a simpler version of it to map the Lyme Disease data, statesFilled.py.
For each state, we will need the total number of incidences. We start out as before:
import matplotlib.pyplot as plt import numpy as np import csv infile = open('statesSummary.csv','r') reader = csv.reader(infile) yearLine = reader.next() years = [int(w) for w in yearLine[1:]]for each state, we'll save the name and total to a list:
stateNames = [] stateTotals = [] for row in reader: stateNames.append(row[0]) stateTotals.append(sum([int(r) for r in row[1:]]))
Note: The use of two 'parallel' arrays, stateNames and stateTotals, is not the best programming practice. Instead, since the information is linked (i.e. the ith state total is for the ith state name), we should store them in a linked way. In the future, we will (using a dictionary which will be covered in next week's lecture), but for now, this works!
We will scale every state total to be a fraction of the highest total:
maxCases = float(max(stateTotals)) scaledTotals = [i/maxCases for i in stateTotals]Now, let's add in the plotting of each state. The first part is as before:
import matplotlib.pyplot as plt from mpl_toolkits.basemap import Basemap from matplotlib.patches import Polygon # create the map map = Basemap(llcrnrlon=-119,llcrnrlat=22,urcrnrlon=-64,urcrnrlat=49, projection='lcc',lat_1=33,lat_2=45,lon_0=-95) # load the shapefile, use the name 'states' map.readshapefile('st99_d00', name='states', drawbounds=True) ax = plt.gca() # get current axes instance # collect the state names from the shapefile attributes so we can # look up the shape obect for a state by it's name names = [] for shape_dict in map.states_info: names.append(shape_dict['NAME'])What changes is how we add colors to each state. The line c = ... sets the color to be 100% red, a percentage of green that's based on the scaled totals, and 100% blue. When the scaled total for a state is low, this is close to 100% red, 100% green, and 100% blue, which is white (on the computer, colors mix like light, instead of the traditional paint-- that is, as you add more, instead of getting darker (like paint), it gets lighter). When the scaled total for a state is high, the color is still 100% red and blue but the green decreases, so, the color appears more purple:
#For each state that we have Lyme Disease data: for i in range(len(stateNames)): print "Plotting", stateNames[i] seg = map.states[names.index(stateNames[i])] c = (1.0,1.0-scaledTotals[i],1.0) poly = Polygon(seg, facecolor=c,edgecolor='black') ax.add_patch(poly) plt.show()(The whole file is in lymeMapped.py).
For each lab, you should submit a lab report by the target date to: kstjohn AT amnh DOT org. The reports should be about a page for the first labs and contain the following:
Target Date: 7 March 2016
Title: Lab 6:
Name & Email:
Purpose: Give summary of what was done in this lab.
Procedure: Describe step-by-step what you did (include programs or program outlines).
Results: If applicable, show all data collected. Including screen shots is fine (can capture via the Grab program).
Discussion: Give a short explanation and interpretation of your results here.
This course will use the on-line Rosalind system for submitting programs electronically. The password for the course has been sent to your email. Before leaving lab today, complete the first two challenges.