Homework #9

CMP 464-C401/MAT 456-01:
Topics Course: Data Science
Spring 2016

Topics: Shading Maps & PCA
Deadline: Thursday, 7 April 2016, 10:30am

Data

For this assignment, you will need to download three different data sets:

  1. New York Federal Reserve Labor Market for Recent Graduates: Download the Excel file at the bottom of the page and convert it to CSV for use in the problem set below:
    https://www.newyorkfed.org/research/college-labor-market/college-labor-market_compare-majors.html
  2. Shapefiles for New York City School Districts: This page includes the shapefiles as well as a CSV file (useful for exploring what data fields are available for each region):
    https://data.cityofnewyork.us/Education/2013-2014-School-Zones/pp5b-95kq
  3. Test Scores for New York City School Districts: For this homework, we will be using the District "Math Data Files":
    http://schools.nyc.gov/Accountability/data/TestResults/ELAandMathTestResults

We will use these data sets in later homework assignments. Since scraping the data takes time, save the files so that you can reuse them in future programs.

Assignment

The work to be submitted differs by whether you are enrolled in the computer science or mathematics course.

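Problems #1-5 are the same for CMP 464 and MAT 456. Problems #6-7 differ: the CMP 464 version (programs and screen shots) is listed first, followed by the MAT 456 version (typeset or handwritten answers).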
#1-3 Analyse the NY Fed's Labor Market Data for Recent Graduates (see link above) using a Principal Components Analysis. There are three parts to this exercise:
  1. Compute and display the covariance matrix for the data,
  2. Generate a 3D plot of the data under the first three axes of a Principal Components Analysis. On this plot, highlight (using a different color) the computer science and mathematics majors, and
  3. Include the Python code that you used to generate your plots.
Make sure to include in the title of your plot the date plotted.

#1: Submit your Python program as a .py file.
#2: Submit a text file or screen shot that includes the covariance matrix.
#3: Submit a screen shot of the graphics window containing the plot.
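
One way to organize the three parts above is sketched below. It is an illustration under stated assumptions, not a required approach: the data here is random stand-in data (you would load your converted CSV instead), the function names are made up for this example, and the labels "Computer Science" and "Mathematics" are guesses at how those majors appear in the spreadsheet.

```python
import numpy as np


def covariance_and_pca(X, k=3):
    """Return (covariance matrix, top-k principal axes, projected data).

    X has one row per observation (for example, one major) and one
    column per numeric variable.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # PCA works on mean-centered data
    cov = np.cov(centered, rowvar=False)     # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]    # largest variance first
    axes = eigvecs[:, order]
    return cov, axes, centered @ axes


def plot_first_three(projected, labels, highlight, title):
    """3D scatter of the first three components; highlighted labels in red."""
    # Imported here so the numeric part above runs without a display.
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed on older matplotlib)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    colors = ["red" if name in highlight else "gray" for name in labels]
    ax.scatter(projected[:, 0], projected[:, 1], projected[:, 2], c=colors)
    ax.set_title(title)
    plt.show()


# Stand-in data (ASSUMPTION): 20 "majors" with 5 numeric columns each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
cov_demo, axes_demo, proj_demo = covariance_and_pca(X_demo)
print(cov_demo)
```

With real data, a call such as `plot_first_three(proj, major_names, {"Computer Science", "Mathematics"}, "your title")` would produce the plot, where `major_names` is the (hypothetical) list of major labels read from your CSV.
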
#4-5 Using basemap, create a map of the New York City School Districts (elementary and middle school) and shade each district by borough (that is, all districts in the Bronx will be the same color; the districts in Brooklyn will be another color, etc.).

#4: Submit your Python program as a .py file.
#5: Submit a screen shot of the graphics window containing the plot.
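
A rough sketch of the shading step follows. The district-to-borough ranges use the standard numbering of the 32 community school districts. Everything about the shapefile is an assumption: `shapefile_base` (the path without the `.shp` extension) and `district_field` (the attribute holding the district number) are placeholders, so check the accompanying CSV for the real field name. Note also that basemap's `readshapefile` expects coordinates in longitude/latitude, so a shapefile in another projection would need converting first.

```python
# One color per borough; the specific colors are arbitrary choices.
BOROUGH_COLORS = {
    "Manhattan": "tab:blue",
    "Bronx": "tab:orange",
    "Brooklyn": "tab:green",
    "Queens": "tab:red",
    "Staten Island": "tab:purple",
}


def borough_for_district(district):
    """Map a community school district number (1-32) to its borough."""
    d = int(district)
    if 1 <= d <= 6:
        return "Manhattan"
    if 7 <= d <= 12:
        return "Bronx"
    if 13 <= d <= 23 or d == 32:
        return "Brooklyn"
    if 24 <= d <= 30:
        return "Queens"
    if d == 31:
        return "Staten Island"
    raise ValueError("unknown school district: %r" % (district,))


def shade_by_borough(shapefile_base, district_field):
    """Draw each district polygon filled with its borough's color.

    ASSUMPTIONS: shapefile_base and district_field are placeholders for
    the downloaded file and its district-number attribute.
    """
    import matplotlib.pyplot as plt
    from matplotlib.patches import Polygon
    from mpl_toolkits.basemap import Basemap

    # Bounding box roughly covering the five boroughs.
    m = Basemap(projection="merc", llcrnrlon=-74.3, llcrnrlat=40.45,
                urcrnrlon=-73.65, urcrnrlat=40.95)
    # Creates m.districts (polygons) and m.districts_info (attributes).
    m.readshapefile(shapefile_base, "districts", drawbounds=True)
    ax = plt.gca()
    for info, shape in zip(m.districts_info, m.districts):
        color = BOROUGH_COLORS[borough_for_district(info[district_field])]
        ax.add_patch(Polygon(shape, facecolor=color, edgecolor="black"))
    plt.title("NYC school districts by borough")
    plt.show()
```

Keeping the borough lookup in its own function means it can be checked without a map, for example `borough_for_district(32)` gives `"Brooklyn"`.
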
#6-7 (CMP 464) Using the New York City data for district test scores, shade your map above by the percentage of students proficient in mathematics (i.e., those who scored a 3 or 4 on the exam; this is the last column in the CSV file).

#6: Submit your Python program as a .py file.
#7: Submit a screen shot of the graphics window containing the plot.
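
The two pieces this needs, reading the proficiency percentage and turning it into a fill color, might look like the sketch below. The sample rows and header names are invented for illustration, and the assumption that the district number sits in the first column should be checked against the real file; only "the percentage is in the last column" comes from the assignment. If the file has several rows per district (for example, one per grade), you would filter to the row you want before building the dictionary.

```python
import csv
import io


def read_proficiency(lines, district_col=0):
    """Return {district number: percent proficient} from CSV lines.

    The percentage is taken from the last column. Rows whose district
    or percentage is not numeric (e.g. suppressed values) are skipped.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    result = {}
    for row in reader:
        if not row:
            continue
        try:
            result[int(row[district_col])] = float(row[-1])
        except ValueError:
            continue
    return result


def shade_for_percent(percent, low=0.0, high=100.0):
    """Linear ramp from white (low) to dark blue (high), as an RGB tuple."""
    t = min(max((percent - low) / (high - low), 0.0), 1.0)
    return (1.0 - t, 1.0 - t, 1.0 - 0.5 * t)


# Invented sample data (ASSUMPTION): not the real file layout.
sample = io.StringIO(
    "District,Grade,Pct Level 3+4\n"
    "1,All Grades,35.5\n"
    "2,All Grades,60.0\n"
    "3,All Grades,s\n"
)
scores = read_proficiency(sample)
colors = {d: shade_for_percent(p) for d, p in scores.items()}
```

The resulting RGB tuples can be passed as the `facecolor` of each district polygon in place of a per-borough color.
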
#6 (MAT 456): Using the New York City data for district test scores, compute the covariance matrix for District 1 schools across grades (that is, rows 8 to 28, columns G onward). Which rows are most highly correlated? Submit a typeset or neatly handwritten image of your answer.
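
If you want to check a hand computation, the sketch below shows one possible way to get from a covariance matrix to correlations. The numbers are made up, and "most highly correlated" is interpreted here as the largest absolute correlation, which is an assumption about what the question intends. It also assumes no row is constant (a constant row has zero variance, so its correlation is undefined).

```python
import numpy as np


def most_correlated_rows(data):
    """Return (covariance, correlation, (i, j)) treating each row as a variable.

    (i, j) is the pair of distinct rows with the largest absolute correlation.
    """
    data = np.asarray(data, dtype=float)
    cov = np.cov(data)                      # np.cov treats rows as variables
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)         # corr_ij = cov_ij / (s_i * s_j)
    off_diag = np.abs(corr)
    np.fill_diagonal(off_diag, -1.0)        # ignore each row's match with itself
    i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
    return cov, corr, (int(i), int(j))


# Made-up example: rows 0 and 1 move together, row 2 does not.
demo = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
]
cov, corr, pair = most_correlated_rows(demo)
print(pair)
```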

#7 (MAT 456): Compute the first and second axes of a Principal Components Analysis using the covariance matrix above. Submit a typeset or neatly handwritten image of your answer.
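
As a reminder of the mechanics, here is a small worked example on a made-up 2x2 covariance matrix (not the District 1 data). The principal axes are the unit eigenvectors of the covariance matrix, ordered by decreasing eigenvalue.

```latex
C = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
\det(C - \lambda I) = (2 - \lambda)^2 - 1 = 0
\;\Longrightarrow\;
\lambda_1 = 3,\quad \lambda_2 = 1.

(C - 3I)\,v_1 = 0 \;\Longrightarrow\; v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
(C - I)\,v_2 = 0 \;\Longrightarrow\; v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
```

Here the first axis is v_1 and accounts for 3/(3+1) = 75% of the total variance; the second axis is v_2. For the larger matrix in this problem the same steps apply, though finding the eigenvalues will likely require numerical help.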

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.