MfA Workshop: Python in the City

Katherine St. John
Fall 2019

What's the noisiest street in the city? Which streets have the most collisions? What trees are most commonly planted on the streets in your neighborhood? What parts of the city are above the 100-year flood level? We will explore these and related questions using Python and publicly available data about New York City. This mini-course is organized into three sessions, each focused on a challenge that introduces intermediate programming concepts in Python, along with popular packages for analyzing structured data and visualizing it as graphs and as navigable HTML maps. Through the challenges and their variations, the workshop provides many scalable projects for use in the classroom.

Prerequisites: Basic knowledge of Python, including familiarity with basic variable types (integers, real numbers, characters, strings and lists), input/output, definite loops (for-loops), and decisions.

Session 1: Elevation Maps (Arrays & Images)

Wednesday, 6 November 2019

Our first challenge is to build a 'flood map' of the New York City metro area. We start by exploring what data is needed and how to present the information, and then introduce the numerical analysis package, numpy, to store grids (two-dimensional arrays) of elevations of the region. We then use loops and decisions to traverse the array and create an image showing the waterways and flood regions of the metropolitan area. We introduce red-green-blue color codes to construct our flood maps. The session ends with variations on the theme of maps based on elevations, slicing (accessing subsections of arrays), and representing colors in hexadecimal codes.
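The traversal described above can be sketched in a few lines. This is a minimal illustration rather than the workshop's own code: the tiny elevation grid, the FLOOD_LEVEL threshold, the color choices, and the to_hex helper are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 3x4 grid of elevations; a real grid would be loaded from a data file.
elevations = np.array([
    [-2.0, 0.0, 4.0, 12.0],
    [-1.0, 3.0, 6.0, 25.0],
    [5.0, 7.0, 9.0, 40.0],
])

FLOOD_LEVEL = 6.0  # assumed threshold, in the same units as the grid

rows, cols = elevations.shape
# One red-green-blue triple (each value between 0.0 and 1.0) per grid cell.
flood_map = np.zeros((rows, cols, 3))

# Loops and decisions: visit every cell and pick a color from its elevation.
for r in range(rows):
    for c in range(cols):
        if elevations[r, c] <= 0:
            flood_map[r, c] = [0.0, 0.0, 1.0]  # blue: at or below sea level (water)
        elif elevations[r, c] <= FLOOD_LEVEL:
            flood_map[r, c] = [1.0, 0.0, 0.0]  # red: would flood at FLOOD_LEVEL
        else:
            flood_map[r, c] = [0.0, 1.0, 0.0]  # green: above the flood level

# Slicing: all rows, but only the left half of the columns.
left_half = flood_map[:, :cols // 2]


def to_hex(rgb):
    """Convert an RGB triple with values in 0.0-1.0 to a hexadecimal color code."""
    return "#" + "".join(f"{int(round(v * 255)):02x}" for v in rgb)


print(to_hex(flood_map[0, 0]))
```

An array shaped like flood_map can be displayed as an image, for example with matplotlib's imshow function, if that package is installed.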

Session 2: Analyzing City Data (Structured data & File I/O)

Wednesday, 20 November 2019

The focus for the second meeting is analyzing structured data. We will start with opening spreadsheets (CSV files) and graphing the data (a la Excel). The opening challenge uses historical population data for New York City to answer questions such as: How has the population changed over time? What fraction of the population lives in your borough? We also demonstrate some of the basic statistics that are included (e.g. minimum, mean, correlation). We then look at some data sets from NYC OpenData: daily school attendance, homeless shelter populations, parking tickets and 311 data. Our follow-up questions include: What days have the highest attendance at your school? What color cars get the most parking tickets in your neighborhood? What is the most common 311 complaint?
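As a rough sketch of the opening challenge, the example below reads a small CSV table and answers the two population questions. The description does not name the package used for structured data, so this version relies only on Python's standard library; the column names and all of the numbers are made-up placeholders, not real census figures.

```python
import csv
import io
import statistics

# Made-up placeholder table; a real file would be opened with open("filename.csv").
csv_text = """Year,Brooklyn,Queens,Total
1900,100,50,300
1950,250,150,700
2000,240,220,800
"""

# Each row becomes a dictionary keyed by the column headers.
rows = list(csv.DictReader(io.StringIO(csv_text)))

years = [int(row["Year"]) for row in rows]
total = [int(row["Total"]) for row in rows]
brooklyn = [int(row["Brooklyn"]) for row in rows]

# How has the population changed over time?
change = total[-1] - total[0]

# What fraction of the population lives in one borough, year by year?
fraction = [b / t for b, t in zip(brooklyn, total)]

# Basic statistics on a column.
smallest = min(total)
average = statistics.mean(total)

for year, frac in zip(years, fraction):
    print(year, round(frac, 3))
print("change:", change, "min:", smallest, "mean:", average)
```

The same pattern (read the rows, pull out a column, summarize it) carries over to the follow-up questions, with the column names replaced by those in the chosen data set.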

Session 3: Mapping City Data (Using Objects & Mapping Coordinates)

Wednesday, 11 December 2019

Our opening challenge is: where do collisions occur? For this last session, we focus on mapping GIS coordinates. We introduce the folium package (a Python wrapper for leaflet.js), which makes interactive maps that can be viewed in a browser (like Google Maps). We then discuss how to filter data to build maps of locations, starting with the NYC public libraries. We then work on filtering the data and plotting collisions by GIS location, color coding by vehicles involved (did it involve a taxi? a commercial vehicle?) and by time. Our second theme focuses on comparing and clustering based on distances between the data. We start with a challenge of finding catchment areas (Voronoi diagrams) for an increasing number of libraries, and discuss different algorithms and their time complexity. We explore how the results change under two common distance metrics: Euclidean (L2) and taxicab (L1). Using the collision datasets, we next consider where k tow trucks should be placed at the start of rush hour to minimize delays (using k-means clustering).
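To illustrate how the choice of metric can change a catchment area, here is a minimal brute-force sketch that assigns a point to its nearest library under each distance. The library names and coordinates are hypothetical points on a flat grid (not real locations, and not latitude and longitude), and the helper functions are introduced only for this example.

```python
import math

# Hypothetical library positions on a flat grid.
libraries = {
    "A": (3.0, 3.0),
    "B": (5.0, 0.0),
}


def euclidean(p, q):
    """L2 distance: straight-line distance between p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def taxicab(p, q):
    """L1 distance: total distance traveled along the grid axes."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest(point, sites, distance):
    """Return the name of the site closest to point under the given distance."""
    return min(sites, key=lambda name: distance(point, sites[name]))


home = (0.0, 0.0)
print(nearest(home, libraries, euclidean))  # A: about 4.24 away, versus 5 for B
print(nearest(home, libraries, taxicab))    # B: 5 away, versus 6 for A
```

Checking every site for every point takes time proportional to the number of points times the number of sites, which is one reason to compare it against other algorithms as the number of libraries grows.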