HW #11, Data Science at Lehman College, CUNY, Spring 2016

Data

For this assignment, you will need to download the following data sets:

CUNY Locations: the locations of the campuses of the City University of New York on a satellite image of the city:
https://data.ny.gov/Education/City-University-of-New-York-CUNY-University-Campus/i5b5-imzn

We will use these data sets for later homework assignments. Since scraping the data takes time, save the data sets so you can reuse them in future programs.

Assignment

CMP 464 Homework: MAT 456 Homework:

#1-2 Using the Voronoi functions in scipy.spatial (for example, Voronoi and voronoi_plot_2d), create a Voronoi diagram of the CUNY campus locations.

Make sure to include in the title in your plot.
#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.
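
As a rough starting point for #1-2, here is a minimal sketch of building and plotting a Voronoi diagram with scipy.spatial. The coordinates in sample_points are made-up placeholders, not real campus locations, and the helper names build_voronoi and plot_voronoi are hypothetical. You would replace the placeholders with the coordinates you extract from the CUNY data set.

```python
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d


def build_voronoi(points):
    # points: a sequence of (x, y) pairs, e.g. one (longitude, latitude) per campus.
    return Voronoi(np.asarray(points, dtype=float))


def plot_voronoi(vor, title, filename=None):
    # matplotlib is imported here so the diagram can be built without a display.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    voronoi_plot_2d(vor, ax=ax)   # draws the input points, vertices, and ridges
    ax.set_title(title)
    if filename is not None:
        fig.savefig(filename)
    plt.show()


# Placeholder coordinates (assumption): NOT actual campus locations.
sample_points = [(0, 0), (2, 1), (1, 3), (4, 4), (3, 0)]
vor = build_voronoi(sample_points)
```

Calling plot_voronoi(vor, "your title here") opens the graphics window for the screen shot.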

#3-4

In class, we wrote the function

	computeNearestNeighbor(fixedPoints, newPoint)

where fixedPoints is a list of points and newPoint is a new point, and the function returns the point from the list fixedPoints that is closest to newPoint. Use this function to shade an image based on distance to the nearest point in a list.
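
If you no longer have the version from class, the following is one possible sketch of a function with the same signature. It assumes points are (x, y) tuples and that "closest" means Euclidean distance. The version written in class may differ in its details.

```python
def computeNearestNeighbor(fixedPoints, newPoint):
    """Return the point in fixedPoints that is closest to newPoint."""
    closest = None
    bestDistance = float("inf")
    for point in fixedPoints:
        # Squared Euclidean distance; the square root is not needed
        # to decide which point is closest.
        d = (point[0] - newPoint[0]) ** 2 + (point[1] - newPoint[1]) ** 2
        if d < bestDistance:
            bestDistance = d
            closest = point
    return closest
```

When two fixed points are equally close, this sketch returns whichever appears first in the list.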

Your program should:

Create an array to hold an image of 400 by 400 pixels.
Ask the user for a list of points.
Assign a (random) color to each fixed point (a dictionary would work well here).
For each 0 <= i, j < 400:
	Compute the closest fixed point, fp, to the point (i,j).
	Shade the point (i,j) with the color assigned to fp.
Display the image created.
Save the image to a file.
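
The steps above could be organized roughly as follows. This is only a sketch under several assumptions: the input format ("x,y" pairs separated by spaces), the helper names parse_points, shade_image and main, the output file name, and the compact stand-in for computeNearestNeighbor are all hypothetical, not part of the assignment.

```python
import random

import numpy as np


def computeNearestNeighbor(fixedPoints, newPoint):
    # Compact stand-in for the in-class function (squared Euclidean distance).
    return min(fixedPoints,
               key=lambda p: (p[0] - newPoint[0]) ** 2 + (p[1] - newPoint[1]) ** 2)


def parse_points(text):
    # Assumed input format: "100,100 200,350 5,395"
    return [tuple(int(v) for v in pair.split(",")) for pair in text.split()]


def shade_image(fixedPoints, size=400):
    # Image array: size x size pixels, 3 color channels with values in [0, 1].
    img = np.zeros((size, size, 3))
    # One random RGB color per fixed point, stored in a dictionary.
    colors = {fp: (random.random(), random.random(), random.random())
              for fp in fixedPoints}
    for i in range(size):
        for j in range(size):
            fp = computeNearestNeighbor(fixedPoints, (i, j))
            img[i, j] = colors[fp]
    return img, colors


def main():
    # matplotlib is only needed for displaying and saving the image.
    import matplotlib.pyplot as plt

    points = parse_points(input("Enter points as x,y pairs separated by spaces: "))
    img, _ = shade_image(points)
    plt.imshow(img)
    plt.title("Nearest-neighbor shading")
    plt.savefig("shaded.png")   # save before show(), which may clear the figure
    plt.show()
```

Call main() to run the program interactively. The double loop makes 160,000 calls to computeNearestNeighbor for a 400 by 400 image, so a run takes a moment.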

Make sure to include in the title in your plot.
#3: Submit your Python program as a .py file.
#4: Submit a .png file that contains the image generated by a run of your program on the points: (100,100), (200,350), (5,395), (375,25), and (200,100).
Note that your plot might look upside down since, in images, (0,0) is the upper left corner.
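
To see why the orientation differs, here is a tiny illustration on an assumed 4 by 4 array: row 0 of an image array is drawn at the top, and reversing the row order with numpy's flipud moves it to the bottom. If you display with matplotlib, passing origin="lower" to imshow has a similar visual effect without changing the array.

```python
import numpy as np

img = np.zeros((4, 4))
img[0, 1] = 1.0            # mark a pixel in row 0, the top row when displayed

flipped = np.flipud(img)   # reverse the row order: row 0 becomes the last row
```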

#5-7 Use the book's clustering.py program to produce images of Gillet Hall with 5 colors. Modify the program to compute the sum of squared errors of your clustering to 5 colors (i.e., the sum of the squares of the distances between each point's original color and the color assigned to it).

Make sure to include in the title in your plot.

#5: Submit your modified Python program as a .py file.
#6: Submit the .png file of the image with 5 colors.
#7: Include a screen shot or text file with the sum of squared errors that you computed for the images.

Hint: The k-means clustering is slow, so it will take a while to compute the new images. The book's program includes a method for computing the squared error that reruns the classifier (which takes quite a while). Instead of recomputing, store your clusters and compute the error directly on those.
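
One way to follow the hint is sketched below. It assumes that, after a single clustering run, you have kept three things: the original pixel colors, the index of the cluster each pixel was assigned to, and the cluster means. The names pixels, assignments, means and sum_squared_error are placeholders chosen for this sketch, not identifiers from the book's program.

```python
def sum_squared_error(pixels, assignments, means):
    """Sum, over all pixels, of the squared distance between the pixel's
    original color and the mean color of the cluster it was assigned to."""
    total = 0.0
    for pixel, cluster in zip(pixels, assignments):
        mean = means[cluster]
        total += sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return total


# Tiny made-up example: three RGB pixels and two clusters.
pixels = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
assignments = [0, 1, 1]                          # cluster index per pixel
means = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.5)]       # mean color per cluster
error = sum_squared_error(pixels, assignments, means)
```

Because this reuses the stored assignments, it makes one pass over the pixels and does not rerun k-means.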

Homework #11

CMP 464-C401/MAT 456-01:
Topics Course: Data Science
Spring 2016
