In addition to reading CSV files directly, pandas has a built-in function for reading JSON files. Since many of the team projects will likely need to extract data from JSON files, this section covers the format in more detail. To illustrate the concepts, we will work through an example: building a map of CitiBike stations from their real-time data feed.
New York City's bike share program, CitiBike, provides extensive data about their system. This includes a real-time feed of the status of stations in the system. The feed is in JSON. Here's the beginning of the file:
{"executionTime":"2017-03-21 10:37:12 PM", "stationBeanList":[ {"id":72,"stationName":"W 52 St & 11 Ave","availableDocks":18,"totalDocks":39,"latitude":40.76727216,"longitude":-73.99392888, "statusValue":"In Service", "statusKey":1,"availableBikes":19,"stAddress1":"W 52 St & 11 Ave","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false, "lastCommunicationTime":"2017-03-21 10:36:30 PM","landMark":""}, {"id":79,"stationName":"Franklin St & W Broadway","availableDocks":31,"totalDocks":33,"latitude":40.71911552,"longitude":-74.00666661,"statusValue":"In Service", "statusKey":1,"availableBikes":0,"stAddress1":"Franklin St & W Broadway","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false, "lastCommunicationTime":"2017-03-21 10:33:23 PM","landMark":""}, {"id":82,"stationName":"St James Pl & Pearl St","availableDocks":23,"totalDocks":27,"latitude":40.71117416,"longitude":-74.00016545,"statusValue":"In Service", "statusKey":1,"availableBikes":4,"stAddress1":"St James Pl & Pearl St","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false, "lastCommunicationTime":"2017-03-21 10:34:12 PM","landMark":""},
It begins with the time the file was created, followed by information about each station in a field named stationBeanList. Each entry is organized as a dictionary of key:value pairs. Let's look at the bean list entry for the first station:
{"id":72, "stationName":"W 52 St & 11 Ave", "availableDocks":18, "totalDocks":39, "latitude":40.76727216, "longitude":-73.99392888, "statusValue":"In Service", "statusKey":1, "availableBikes":19, "stAddress1":"W 52 St & 11 Ave", "stAddress2":"", "city":"", "postalCode":"", "location":"", "altitude":"", "testStation":false, "lastCommunicationTime":"2017-03-21 10:36:30 PM", "landMark":""}
Which fields are useful for making a map? We need latitude and longitude. For our popup message, it would also be good to give the station name as well as the number of bikes available and the number of docks. If we call the stationBeanList entry for a station beanList, these entries would be:
beanList['latitude']
beanList['longitude']
beanList['stationName']
beanList['availableBikes']
beanList['totalDocks']
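As a quick check of how these lookups behave, here is a minimal sketch that parses an abbreviated copy of the feed with Python's standard json module. The variable names sample and feed are only for illustration, and the sample keeps just the fields we need from the first station shown above.

```python
import json

# Abbreviated copy of the feed shown above (only the fields we need).
sample = '''
{"executionTime": "2017-03-21 10:37:12 PM",
 "stationBeanList": [
   {"id": 72, "stationName": "W 52 St & 11 Ave",
    "availableDocks": 18, "totalDocks": 39,
    "latitude": 40.76727216, "longitude": -73.99392888,
    "statusValue": "In Service", "availableBikes": 19}
 ]}
'''

feed = json.loads(sample)              # a dictionary with two keys
beanList = feed['stationBeanList'][0]  # the entry for the first station

print(beanList['stationName'])         # W 52 St & 11 Ave
print(beanList['latitude'], beanList['longitude'])
print(beanList['availableBikes'], "of", beanList['totalDocks'])
```

The outer object is a dictionary, stationBeanList is a list, and each item in that list is again a dictionary, which is why two levels of indexing are needed.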
To build our map, we need to:
The new part is reading in a JSON file, but pandas has a built-in function to do that for us, read_json():
stations = pd.read_json('https://feeds.citibikenyc.com/stations/stations.json')
The second step, creating a map object, is the same as before:
mapBikes = folium.Map(location=[40.75, -73.99],tiles="Cartodb Positron",zoom_start=14)
Extracting the information from each row has an added level, since the information is stored in the dictionary, stationBeanList. To make the lines more readable, we will save row['stationBeanList'] as the variable beanList. The rest of the for loop is the same as in previous programs:
for i, row in stations.iterrows():
    # Each row holds one station's dictionary in the stationBeanList column
    beanList = row['stationBeanList']
    lat = beanList['latitude']
    lon = beanList['longitude']
    if beanList['statusValue'] == 'Not In Service':
        name = beanList['stationName'] + ": Not In Service"
        icon = folium.Icon(color='lightgray')
    else:
        name = beanList['stationName'] + ": " + str(beanList['availableBikes']) + " bikes available of " + str(beanList['totalDocks']) + " total docks"
        # Flag stations that are nearly out of bikes
        if beanList['availableBikes'] < 2:
            icon = folium.Icon(color='red')
        else:
            icon = folium.Icon(color='green')
    folium.Marker([lat, lon], popup=name, icon=icon).add_to(mapBikes)

#Create the html file with the map:
mapBikes.save(outfile='bikeLocations.html')
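The decisions made inside the loop above (what the popup says and which color the marker gets) can be tried out without pandas or folium. The following is a sketch only: popup_text and marker_color are hypothetical helper names, not part of cbStations.py, and the station dictionary is an abbreviated copy of the second entry in the feed excerpt.

```python
def popup_text(beanList):
    """Build the popup message for one station dictionary."""
    if beanList['statusValue'] == 'Not In Service':
        return beanList['stationName'] + ": Not In Service"
    return (beanList['stationName'] + ": "
            + str(beanList['availableBikes']) + " bikes available of "
            + str(beanList['totalDocks']) + " total docks")


def marker_color(beanList):
    """Pick a marker color: gray if out of service, red if nearly empty."""
    if beanList['statusValue'] == 'Not In Service':
        return 'lightgray'
    if beanList['availableBikes'] < 2:
        return 'red'
    return 'green'


# Abbreviated copy of the "Franklin St & W Broadway" entry shown earlier.
station = {"stationName": "Franklin St & W Broadway",
           "statusValue": "In Service",
           "availableBikes": 0,
           "totalDocks": 33}

print(popup_text(station))
print(marker_color(station))   # red
```

Splitting the logic out this way also makes it easy to change the threshold or wording in one place.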
Putting this all together gives the Python program, cbStations.py.
Note: You can directly read from a URL, as we did in this program, or if you would like to work off-line, you can download the JSON file, save it locally, and use it as before.
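One possible way to work off-line is sketched below using only the standard library. It assumes the feed is still served at the URL used in this section, and the names download_feed, load_stations, and the file name stations.json are made up for illustration.

```python
import json
import urllib.request

# Same URL as in the read_json() call above; assumed to still be available.
FEED_URL = 'https://feeds.citibikenyc.com/stations/stations.json'


def download_feed(url, filename):
    """Save a copy of the JSON feed to a local file (needs a network connection)."""
    urllib.request.urlretrieve(url, filename)


def load_stations(filename):
    """Return the list of station dictionaries from a saved copy of the feed."""
    with open(filename) as f:
        feed = json.load(f)
    return feed['stationBeanList']


# Example use (run the download line once while on-line):
# download_feed(FEED_URL, 'stations.json')
# for beanList in load_stations('stations.json'):
#     print(beanList['stationName'])
```

Once the file is saved, the same local file name can also be passed to pd.read_json() in place of the URL.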
Our first Voronoi diagram will be for libraries across the city (using the library data set from Homework #6). Our map highlights the regions closest to each library:
To make our map, we
If you are using Anaconda (with Spyder, IDLE 3, or Jupyter), the scipy and matplotlib packages are included. To install geojson, type in a terminal window:
pip install geojson
The program, makeVor.py, is a bit rambling, but contains all the steps above (a better design would be to split into separate functions or files for the different tasks). Try running it on the library dataset.
Note that it does very well in dense regions but behaves oddly at the edges of the map, since we did not include any point at infinity in the .json file and did not clip the regions to the city boundaries.
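To see what "the region closest to each library" means, and why the outer regions run off the edge of the map, here is a small sketch of the definition itself. It is not how makeVor.py builds its diagram: it only asks, for a single point, which site is nearest. The function name nearest_site and the three coordinates are made up for illustration, and distances are treated as plain straight-line distances on a flat plane.

```python
import math


def nearest_site(point, sites):
    """Return the name of the site closest to point (straight-line distance)."""
    return min(sites, key=lambda name: math.dist(point, sites[name]))


# Made-up coordinates for illustration; not taken from the library data set.
sites = {'A': (0.0, 0.0), 'B': (4.0, 0.0), 'C': (0.0, 4.0)}

print(nearest_site((1.0, 1.0), sites))     # A
print(nearest_site((3.5, 0.5), sites))     # B
print(nearest_site((100.0, 1.0), sites))   # B, even though the point is far away
```

The last call shows the edge effect: a site on the outside of the data set "owns" every point beyond it, so its region is unbounded unless it is clipped to a boundary.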
Time is set aside in this class for teams to meet, and for each team to give a short presentation to the class on the status of their project. The informal presentation should include a checklist of what you have done and what still needs to be done, as well as one visualization.