Homework #4

CMP 464-C401/MAT 456-01:
Topics Course: Data Science
Spring 2016

Topics: Distributions & Correlations
Deadline: Thursday, 3 March 2016, 10:30am

Textbook's Code

This assignment uses the basic statistics functions developed by the textbook's author and available at:

https://github.com/joelgrus/data-science-from-scratch/blob/master/code/statistics.py

Datasets

This assignment uses the following datasets:

Assignment

The work to be submitted differs by whether you are enrolled in the computer science or mathematics course.

CMP 464 Homework: MAT 456 Homework:
#1 Examine the correlation between the changes in incidence of Lyme disease in Connecticut, New Jersey, and New York. Use the textbook's code to compute the pairwise correlations, that is, ρ(CT,NJ), ρ(CT,NY), and ρ(NJ,NY). Include all correlations that you computed in your written answer.

#1: Submit a .txt or .pdf file with your answer.
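
A minimal sketch of the pairwise computation in #1, for illustration only. It defines its own Pearson correlation function instead of importing the textbook's statistics.py, and the yearly changes below are made-up numbers, not the Lyme disease data. The names correlation, changes, and pairwise are assumptions of this sketch.

```python
import math
from itertools import combinations


def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one list; returning 0 is a choice made for this sketch
    return sxy / math.sqrt(sxx * syy)


# Hypothetical year-over-year changes in incidence (NOT the real data).
changes = {
    "CT": [120, -80, 45, 200, -150],
    "NJ": [90, -60, 10, 150, -100],
    "NY": [-30, 40, 25, 60, -20],
}

# One correlation per unordered pair of states: (CT,NJ), (CT,NY), (NJ,NY).
pairwise = {
    (a, b): correlation(changes[a], changes[b])
    for a, b in combinations(sorted(changes), 2)
}

for (a, b), rho in pairwise.items():
    print("rho({},{}) = {:.3f}".format(a, b, rho))
```

Because ρ(x,y) = ρ(y,x), three pairs cover every comparison among the three states.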

#2-3 Use the dataset of New York City's historical population to answer the following: Which borough's change in population is most closely correlated with the city's change in population? Justify your answer. Use the textbook's code to compute the correlations between each borough's population and the overall city population. Include all correlations that you computed in your written answer. For the second part, include a plot of the population of the most closely correlated borough together with the city's population from 1790 to 2010. Make sure the title of your plot includes the dates plotted.

#2: Submit a .txt or .pdf file with your answer.
#3: Submit a screen shot of the graphics window containing the plot of the borough population that most closely correlated to the city, as well as the city's population.
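
One way to read "change in population" is as the difference between consecutive census counts. The sketch below follows that reading, which is an assumption, and then picks the borough whose changes correlate most strongly with the city's. The populations are made-up numbers (with only two hypothetical boroughs), and the names changes_of, populations, and best are assumptions of this sketch.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one list; a choice made for this sketch
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def changes_of(series):
    """Differences between consecutive entries: series[i+1] - series[i]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical populations at five consecutive censuses (NOT the real data).
populations = {
    "Borough A": [10, 14, 20, 29, 35],
    "Borough B": [50, 48, 51, 47, 52],
    "City": [60, 62, 71, 76, 87],
}

city_changes = changes_of(populations["City"])

# Correlation of each borough's changes with the city's changes.
rho = {
    name: correlation(changes_of(series), city_changes)
    for name, series in populations.items()
    if name != "City"
}

best = max(rho, key=rho.get)
for name, value in rho.items():
    print("rho({}, City) = {:.3f}".format(name, value))
print("Most closely correlated:", best)
```

In this toy data the small, steadily growing borough tracks the city's changes less closely than the larger, fluctuating one, which is why the correlations are worth computing rather than guessing.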
#4-6 Could it be the case that drivers racing to get somewhere by the top of the hour drive more recklessly and get in more accidents than those who are driving just past the hour? Using the birthday data set, display the number of collisions that occurred on your birthday, binned by minute. That is, for 0 (zero minutes after the hour), your y-value should be the fraction: collisions that occurred on your birthday at 0 minutes after the hour, divided by the total number of collisions. The x-axis of your plot should be the minutes from 0 to 59 (minutes after the hour), and the y-axis should be the sum of the accidents that occur at each minute after the hour. Include in your plot a label containing the correlation, ρ(minutes,accidents).

#4: Submit your Python program as a .py file.
#5: Submit a screen shot of the graphics window containing the plot.
#6: Do minutes after the hour correlate with more accidents in your zipcode? Justify your answer. Include a .txt or .pdf file with your answer.

Hint: See Homework #3, #3-4.
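
The binning step can be sketched as follows. This is a rough illustration, not the required program: it assumes each collision time is available as an "H:MM" string, which may not match the actual data set, and the list of times is made up. The names times, counts, accidents, and fractions are assumptions of this sketch, and plotting is left out.

```python
import math
from collections import Counter


def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one list; a choice made for this sketch
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


# Hypothetical collision times for one day (NOT the real data),
# assumed to be formatted as "H:MM".
times = ["0:00", "8:15", "8:15", "13:59", "17:30", "17:00", "23:59", "9:05"]

# Count collisions by minutes after the hour, ignoring the hour itself.
counts = Counter(int(t.split(":")[1]) for t in times)

minutes = list(range(60))                           # x-values: 0..59
accidents = [counts[m] for m in minutes]            # collisions per minute bin
fractions = [c / len(times) for c in accidents]     # share of all collisions

rho = correlation(minutes, accidents)
label = "rho(minutes, accidents) = {:.3f}".format(rho)
print(label)
```

A Counter returns 0 for minutes with no collisions, so every minute from 0 to 59 gets a value even when the data are sparse.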
#4 (MAT 456): Create two lists of 10 numbers, x and y, such that their correlation satisfies 0.7 < ρ(x,y) < 0.8. Submit your lists in a .txt file, along with a description of the process you followed to find lists with a correlation within these bounds.

#5 (MAT 456): Create two lists of 10 numbers, x and y, such that their correlation satisfies -0.1 < ρ(x,y) < 0.1. Submit your lists in a .txt file, along with a description of the process you followed to find lists with a correlation within these bounds.

#6 (MAT 456): Create two lists of 10 numbers, x and y, such that their correlation satisfies -1 < ρ(x,y) < -0.9. Submit your lists in a .txt file, along with a description of the process you followed to find lists with a correlation within these bounds.
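
One possible process for finding such lists, sketched under stated assumptions: start from a perfectly correlated (or anti-correlated) pair and add random noise of varying size until the correlation lands inside the bounds. This is only one approach among many, and the function find_lists, its parameters, and the noise range are all hypothetical choices made for this sketch.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one list; a choice made for this sketch
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def find_lists(low, high, n=10, seed=0, max_tries=10000):
    """Random search for lists x, y with low < correlation(x, y) < high."""
    rng = random.Random(seed)       # seeded so the search is repeatable
    x = list(range(1, n + 1))
    for _ in range(max_tries):
        noise_scale = rng.uniform(0.0, 10.0)   # more noise pulls rho toward 0
        sign = rng.choice([-1, 1])             # -1 gives negative correlations
        y = [round(sign * xi + rng.gauss(0, noise_scale), 2) for xi in x]
        if low < correlation(x, y) < high:
            return x, y
    raise ValueError("no lists found; try more attempts or a wider noise range")


x, y = find_lists(0.7, 0.8)
print(x)
print(y)
print("rho(x,y) = {:.3f}".format(correlation(x, y)))
```

The same call with bounds (-0.1, 0.1) or (-1, -0.9) searches for the other two cases. The correlation is checked after rounding y, so the values written to a file satisfy the bounds as printed.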

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.