Plotting Data from Files:

Often there is too much data to type into your program. In these cases, it is easier to read the information in from a file. Below is a mixture of new and previously used commands for accessing data from files and strings. Try to puzzle each one out on paper, and then try it in Python.

The data file statesSummary.csv is from the CDC. Before starting the program, open the CSV file and see what it looks like.
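
If you would rather peek at the file from Python than from an editor, a minimal sketch is below. The helper name previewLines is hypothetical (it is not part of the course files), and the example use assumes statesSummary.csv sits in the same directory as your program.

```python
def previewLines(filename, count=3):
    """Return the first `count` lines of a text file, without line endings."""
    lines = []
    with open(filename, 'r') as infile:
        for line in infile:
            if len(lines) >= count:
                break
            lines.append(line.rstrip('\r\n'))
    return lines

# Example use (assumes the data file is in the current directory):
# for line in previewLines('statesSummary.csv'):
#     print(line)
```

Judging from the code later in this section, the first row should hold the years and each following row should hold a state name followed by its yearly counts, but check this against what you actually see.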

Challenges:

Shapes of Regions

Since the plot of all 50 states was quite crowded, let's plot the colors on a map. We will use basemap, as we did last week. In addition, we will need the outlines of the states, which are stored in three files in the basemap examples directory: st99_d00.dbf, st99_d00.shp, and st99_d00.shx.

The basemap webpage has an example of coloring in states by population, fillstates.py. Once you have the three files with the shapes of the states, you can run this program to see the map (don't worry too much about the details; we will go through a simpler one first).

We will use a simpler version of it to map the Lyme Disease data, statesFilled.py.

Mapping Regions

Let's combine the two programs so that we fill in each state with a color. We will make the state with the highest incidence of Lyme disease the darkest color and the state with the lowest incidence the lightest. To keep the visualization folks happy, we will use a gradient of a single color ('rainbow' gradients are misleading).

For each state, we will need the total number of cases. We start out as before:

import matplotlib.pyplot as plt
import numpy as np
import csv
infile = open('statesSummary.csv','r')
reader = csv.reader(infile)
yearLine = next(reader)   # header row of years; next() works in Python 2.6+ and 3
years = [int(w) for w in yearLine[1:]]

For each state, we'll save the name and the total in two lists:

stateNames = []
stateTotals = []
for row in reader:
    stateNames.append(row[0])
    stateTotals.append(sum(int(r) for r in row[1:]))

Note: The use of two 'parallel' arrays, stateNames and stateTotals, is not the best programming practice. Instead, since the information is linked (i.e. the ith state total is for the ith state name), we should store them in a linked way, such as a dictionary.
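
As a sketch of the dictionary approach, the function below builds a single dictionary that maps each state name to its total. The name readStateTotals and the sample rows are made up for illustration; the real numbers come from statesSummary.csv. It assumes the same layout as the code above: a header row of years, then one row per state.

```python
import csv

def readStateTotals(lines):
    """Map each state name to the sum of its yearly counts.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.reader(lines)
    next(reader)                      # skip the header row of years
    totals = {}
    for row in reader:
        if not row:                   # ignore blank lines
            continue
        totals[row[0]] = sum(int(r) for r in row[1:])
    return totals

# Made-up rows, for illustration only (not real CDC figures):
sampleLines = ["State,2001,2002",
               "Alpha,1,2",
               "Beta,10,20"]
sampleTotals = readStateTotals(sampleLines)
```

With the real data, you could pass the open file to readStateTotals instead of sampleLines. Each total is then looked up by its state name, so a name and its total cannot get out of step the way two separate lists can.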

We will scale every state total to be a fraction of the highest total:

maxCases = float(max(stateTotals))
scaledTotals = [i/maxCases for i in stateTotals]
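
To see what the scaling does, here is a small worked sketch with made-up totals (not CDC data). The helper name scaleTotals is hypothetical. Converting the maximum to a float matters in Python 2, where dividing one integer by another rounds the result down, so every fraction below 1 would become 0.

```python
def scaleTotals(totals):
    """Return each total as a fraction of the largest total."""
    maxCases = float(max(totals))
    if maxCases == 0:                 # avoid dividing by zero if every total is 0
        return [0.0 for t in totals]
    return [t / maxCases for t in totals]

# Made-up totals, for illustration only:
exampleScaled = scaleTotals([25, 50, 100])   # [0.25, 0.5, 1.0]
```

The largest total always scales to 1.0, and, as long as no total is negative, every other state falls between 0 and 1.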

Now, let's add in the plotting of each state. The first part is as before:

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon

# create the map
map = Basemap(llcrnrlon=-119,llcrnrlat=22,urcrnrlon=-64,urcrnrlat=49,
        projection='lcc',lat_1=33,lat_2=45,lon_0=-95)

# load the shapefile, use the name 'states'
map.readshapefile('st99_d00', name='states', drawbounds=True)

ax = plt.gca() # get current axes instance

# collect the state names from the shapefile attributes so we can
# look up the shape object for a state by its name
names = []
for shape_dict in map.states_info:
    names.append(shape_dict['NAME'])

What changes is how we add colors to each state. The line c = ... sets the color to 100% red, 100% blue, and a percentage of green based on the scaled totals. When the scaled total for a state is low, the color is close to 100% red, 100% green, and 100% blue, which is white. (On the computer, colors mix like light rather than like paint: adding more makes the color lighter, not darker.) When the scaled total for a state is high, the color is still 100% red and blue but the green decreases, so the color appears more purple:

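# For each state that we have Lyme Disease data for:
for i in range(len(stateNames)):
    print("Plotting " + stateNames[i])
    # names.index() gives the position of the first shape with this name
    seg = map.states[names.index(stateNames[i])]
    # full red and blue; less green as the scaled total rises
    c = (1.0,1.0-scaledTotals[i],1.0)
    poly = Polygon(seg, facecolor=c,edgecolor='black')
    ax.add_patch(poly)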

plt.show()

(The whole file is in lymeMapped.py.)
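
The color rule can be pulled out into a small function, which makes it easy to check a few values by hand before drawing the map. This is only a sketch: the name lymeColor is hypothetical, and it assumes the scaled total is between 0 and 1.

```python
def lymeColor(scaled):
    """Return a (red, green, blue) tuple for a scaled total in [0, 1].

    Red and blue stay at 1.0; green drops as the scaled total rises,
    so low totals are white and high totals are magenta.
    """
    return (1.0, 1.0 - scaled, 1.0)

lowColor = lymeColor(0.0)    # (1.0, 1.0, 1.0), white
midColor = lymeColor(0.5)    # (1.0, 0.5, 1.0), pale magenta
highColor = lymeColor(1.0)   # (1.0, 0.0, 1.0), magenta
```

Inside the plotting loop, lymeColor(scaledTotals[i]) would give the same color as the line c = ... does.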

Challenges: