Homework #1

CMP 464-C401/MAT 456-01:
Topics Course: Data Science
Spring 2016

Topics: Simple plots using matplotlib, scraping data from WeatherUnderground
Deadline: Thursday, 11 February 2016, 10:30am

Getting Started with matplotlib

This homework uses matplotlib, which is included with Anaconda and is also available from the matplotlib website.

If you would prefer not to upgrade your Python installation (or would like to work through a browser), there are several web-based services. One that already has the libraries we are using is PythonAnywhere.

Weather Data

Built into Python are functions for downloading pages ('scraping data') directly from the web. We will use the urllib2 library to download historical weather data, which we will then plot.

We will use just one function from urllib2, urlopen(), which takes as input a URL (uniform resource locator, the address of a web page) and opens the page for reading. The format is:

	import urllib2
	page = urllib2.urlopen("http://lehman.edu")

Once the page variable is set up, it can be used just like a file variable. For example, you can read all the lines into a list of strings, in the same way as for a file:

	lines = page.readlines()
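The snippets above are written for Python 2. The urllib2 module does not exist in Python 3, where the same function is available as urllib.request.urlopen. The following is a minimal sketch of a helper that runs under either version; the name fetch_lines is made up for illustration and is not part of weather.py or of any library.

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3


def fetch_lines(url):
    """Open url and return the page contents as a list of lines.

    fetch_lines is a hypothetical helper name, not a library function.
    Under Python 3 the lines are bytes objects; call .decode() on them
    if you need text strings.
    """
    page = urlopen(url)
    try:
        return page.readlines()
    finally:
        page.close()  # release the connection once the lines are read
```

With this helper, the two steps above collapse into one call, for example lines = fetch_lines("http://lehman.edu").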

We can also combine data from multiple pages into a single program. We will use Weather Underground's historical weather data to plot temperatures. The idea is to download the page for the same date in each year and read the temperature from it.

The hard part is figuring out the URLs for the web pages. Let's look at the URLs for February 2 of five consecutive years:

http://www.wunderground.com/history/airport/KLGA/2000/02/02/DailyHistory
http://www.wunderground.com/history/airport/KLGA/2001/02/02/DailyHistory
http://www.wunderground.com/history/airport/KLGA/2002/02/02/DailyHistory
http://www.wunderground.com/history/airport/KLGA/2003/02/02/DailyHistory
http://www.wunderground.com/history/airport/KLGA/2004/02/02/DailyHistory

The only thing that changes is the year; the prefix before it and the suffix after it stay the same. Let's store those in variables and then loop through the years:

    prefix = "http://www.wunderground.com/history/airport/KLGA/"
    suffix = "/02/02/DailyHistory"
    for year in range(2000,2015):
        url = prefix+str(year)+suffix
        ...

Each time through the loop, the url variable holds the complete address for one year, built as prefix+str(year)+suffix.
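To see exactly which addresses the loop produces, it can help to build them all first and print a couple. This is a small sketch only: build_urls is a made-up helper name, and the February 2 suffix is taken from the URLs listed above.

```python
def build_urls(prefix, suffix, years):
    """Return one URL per year by joining prefix + year + suffix.

    build_urls is a hypothetical helper, used here only for illustration.
    """
    return [prefix + str(year) + suffix for year in years]


prefix = "http://www.wunderground.com/history/airport/KLGA/"
suffix = "/02/02/DailyHistory"   # month/day part; 02/02 is February 2
urls = build_urls(prefix, suffix, range(2000, 2015))

print(urls[0])     # the address for the year 2000
print(len(urls))   # range(2000, 2015) stops before 2015, so 15 addresses
```

Note that range(2000,2015) covers the years 2000 through 2014; to include 2015, the second argument would have to be 2016.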

Try running the program weather.py, and then start the assignment.

Assignment

The work to be submitted differs by whether you are enrolled in the computer science or mathematics course.

CMP 464 Homework: MAT 456 Homework:
#1-2 Using the above as a starting point, use matplotlib to produce a plot of the high temperature over the last 25 years for your birthday. For example, if you were born on February 2, then your plot would be the same as the first plot of the sample program weather.py. Make sure to change the title of your plot to include your name and birthday.

#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.
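A plot of this kind might be set up roughly as follows. This is only a sketch: it assumes the high temperatures have already been collected into a list, the function name plot_highs and the axis labels are placeholders, and the sample program weather.py may organize things differently.

```python
import matplotlib.pyplot as plt


def plot_highs(years, highs, title):
    """Plot one high temperature per year and label the figure.

    years and highs are equal-length sequences; how highs gets filled
    in (by scraping) is left to the rest of the program. plot_highs is
    a hypothetical name, not part of weather.py.
    """
    fig, ax = plt.subplots()
    ax.plot(years, highs, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("High temperature")
    ax.set_title(title)   # the assignment asks for your name and birthday here
    return fig, ax
```

After collecting the data, one would call plot_highs(years, highs, "...") with a title of one's own and then plt.show() to open the graphics window.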

Note: You will use this same data set below. Scraping the data takes up most of the program's running time, so save the data and reuse it in the later programs.
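One possible way to save the scraped values and read them back later is sketched here. It assumes the data is held in a dictionary mapping each year to a temperature; the function names are made up, and JSON is just one convenient choice of file format (a CSV or plain text file would work as well).

```python
import json


def save_temperatures(filename, temps_by_year):
    """Write a {year: temperature} dictionary to filename as JSON."""
    with open(filename, "w") as outfile:
        json.dump(temps_by_year, outfile)


def load_temperatures(filename):
    """Read the dictionary back from filename.

    JSON stores dictionary keys as strings, so the years are
    converted back to integers here.
    """
    with open(filename) as infile:
        raw = json.load(infile)
    return {int(year): temp for year, temp in raw.items()}
```

The scraping program would call save_temperatures once, and each later program would start with load_temperatures instead of downloading the pages again.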
#3-4 (CMP 464) Modify the above program to plot both the minimum and maximum temperature for the last 25 years for your birthday.

#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot.

#3-4 (MAT 456) Using the data you collected, compute the average high temperature for your birthday over the last 20 years. Plot the maximum temperature from above as well as a constant line representing the average temperature (i.e. y = ave).

#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot.
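The average and the constant line y = ave could be computed and drawn along the following lines. This is a sketch under assumptions: the highs are already in a list, and the names average and plot_with_average are made up for illustration.

```python
import matplotlib.pyplot as plt


def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # float() avoids integer division under Python 2
    return float(sum(values)) / len(values)


def plot_with_average(years, highs):
    """Plot the highs together with a horizontal line at their average."""
    ave = average(highs)
    fig, ax = plt.subplots()
    ax.plot(years, highs, marker="o", label="High")
    ax.axhline(ave, linestyle="--", label="Average")   # the constant line y = ave
    ax.set_xlabel("Year")
    ax.set_ylabel("Temperature")
    ax.legend()
    return ave, ax
```

Calling plt.show() after plot_with_average(years, highs) opens the graphics window; axhline draws the line across the full width of the plot, so no x values are needed for it.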
#5-6 Collect the minimum temperatures for January 2016. Display the collected data as a histogram.

#5: Submit your Python program as a .py file.
#6: Submit a screen shot of the graphics window containing the plot.
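For the January data, the loop runs over days instead of years. The sketch below assumes the same airport (KLGA) and the same URL pattern as the yearly pages shown earlier; daily_urls and plot_histogram are hypothetical names, and extracting the minimum temperature from each page is left out.

```python
import calendar
import matplotlib.pyplot as plt

# Assumed to be the same prefix used for the yearly pages.
PREFIX = "http://www.wunderground.com/history/airport/KLGA/"


def daily_urls(year, month):
    """Return one DailyHistory URL for each day of the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    # %02d pads the month and day with a leading zero, as in /02/02/
    return ["%s%d/%02d/%02d/DailyHistory" % (PREFIX, year, month, day)
            for day in range(1, days_in_month + 1)]


def plot_histogram(min_temps, title):
    """Show how often each range of minimum temperatures occurred."""
    fig, ax = plt.subplots()
    ax.hist(min_temps)   # uses matplotlib's default binning
    ax.set_xlabel("Minimum temperature")
    ax.set_ylabel("Number of days")
    ax.set_title(title)
    return ax
```

Here daily_urls(2016, 1) gives the 31 addresses for January 2016; once a minimum temperature has been read from each page, the resulting list is passed to plot_histogram.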

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.