Homework #11

CMP 464-C401/MAT 456-01:
Topics Course: Data Science
Spring 2016

Topics: Voronoi Diagrams & Clustering
Deadline: Thursday, 21 April 2016, 10:30am

Textbook's Code

For this assignment, the following code from the textbook will be useful:

Data

For this assignment, you will need to download the following data sets:

  1. CUNY Locations: the locations of the campuses of the City University of New York on a satellite image of the city:
    https://data.ny.gov/Education/City-University-of-New-York-CUNY-University-Campus/i5b5-imzn

We will use these data sets in later homework assignments. Since scraping the data takes time, save these data sets so you can reuse them in future programs.
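One way to avoid downloading the data again is to cache it in a local file and read the cached copy on later runs. This is a minimal sketch under assumptions: it assumes the data set can be exported as CSV from the Socrata-style URL below (check that URL against the data set's page before relying on it), and load_cached_csv is a hypothetical helper name.

```python
import csv
import os
import urllib.request

# ASSUMPTION: Socrata-style CSV export URL for the data set above; verify it first.
CUNY_CSV_URL = "https://data.ny.gov/api/views/i5b5-imzn/rows.csv?accessType=DOWNLOAD"


def load_cached_csv(path, url=CUNY_CSV_URL):
    """Download url to path the first time, then read rows from the local copy."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)  # network access only on the first run
    with open(path, newline="") as f:
        return list(csv.DictReader(f))  # one dict per row, keyed by column header
```

Later programs can call load_cached_csv("cuny_locations.csv") and will reuse the saved file instead of downloading it again.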

Assignment

CMP 464 Homework: MAT 456 Homework:
#1-2 Using the Voronoi functions from scipy (in the scipy.spatial module), create a Voronoi diagram of the CUNY campus locations.

Make sure to include a title in your plot.
#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.
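A minimal sketch of #1-2 using scipy.spatial's Voronoi class and voronoi_plot_2d function. It assumes you have already extracted the campus coordinates from the downloaded file into a list of (x, y) pairs (for example longitude and latitude); the file's column names are not specified here, and plot_voronoi is a hypothetical helper name. The Agg backend line renders off-screen so the script also runs without a display; delete it to get the interactive window for your screenshot.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; delete this line to open a window
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d


def plot_voronoi(points, title, filename=None):
    """Compute and draw the Voronoi diagram of a list of (x, y) points."""
    vor = Voronoi(np.asarray(points, dtype=float))  # one region per input point
    voronoi_plot_2d(vor)  # draws into a new current figure
    plt.title(title)
    if filename is not None:
        plt.savefig(filename)
    return vor
```

For example, plot_voronoi(campus_points, "CUNY campuses: Voronoi diagram", "cuny_voronoi.png"), where campus_points is your list of coordinates.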
#3-4 In class, we wrote the function
	computeNearestNeighbor(fixedPoints, newPoint)
where fixedPoints is a list of points and newPoint is a new point, and the function returns the point from the list fixedPoints that is closest to newPoint. Use this function to shade an image based on distance to the nearest point in a list.
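A minimal version of such a function might look like the following. It assumes points are (x, y) tuples and uses Euclidean distance; the version written in class may differ.

```python
def computeNearestNeighbor(fixedPoints, newPoint):
    """Return the point in fixedPoints that is closest (Euclidean) to newPoint."""
    x, y = newPoint
    # Squared distances pick the same winner as true distances, so no sqrt is needed.
    return min(fixedPoints, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
```

If two fixed points are equally close, min returns the one listed first.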

Your program should:

  1. Create an array to hold an image of 400 by 400 pixels.
  2. Ask the user for a list of points.
  3. Assign a (random) color to each fixed point (a dictionary would work well here).
  4. For each 0 <= i, j < 400:
  5.    Compute the fixed point fp closest to the point (i,j).
  6.    Shade the point (i,j) with the color assigned to fp.
  7. Display the image created.
  8. Save the image to a file.
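The steps above can be sketched in Python as follows. This is a minimal sketch under assumptions: the user types the points as comma-separated (x,y) pairs, which are parsed with ast.literal_eval; colors are random RGB triples stored in a NumPy uint8 array; shade_image, the title, and the filename shading.png are hypothetical placeholders. The image is saved before plt.show() because calling plt.savefig after the window has been closed saves a blank figure.

```python
import ast
import random

import numpy as np


def computeNearestNeighbor(fixedPoints, newPoint):
    """Return the point in fixedPoints with the smallest squared distance to newPoint."""
    x, y = newPoint
    return min(fixedPoints, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)


def shade_image(fixedPoints, size=400, seed=None):
    """Color pixel (i, j) with the random color of its nearest fixed point."""
    rng = random.Random(seed)
    # One random RGB color per fixed point, kept in a dictionary.
    colors = {fp: (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255))
              for fp in fixedPoints}
    # size-by-size RGB image, initially black; i is the row, j the column.
    img = np.zeros((size, size, 3), dtype=np.uint8)
    for i in range(size):
        for j in range(size):
            fp = computeNearestNeighbor(fixedPoints, (i, j))
            img[i, j] = colors[fp]
    return img, colors


def main():
    import matplotlib.pyplot as plt

    # Example input: (100,100), (200,350), (5,395), (375,25), (200,100)
    text = input("Enter points as (x,y) pairs separated by commas: ")
    points = [tuple(p) for p in ast.literal_eval("[" + text + "]")]
    img, _ = shade_image(points)
    plt.imshow(img)
    plt.title("Nearest-neighbor shading")  # replace with the required title
    plt.savefig("shading.png")  # save first: after show() returns, the figure is gone
    plt.show()


if __name__ == "__main__":
    main()
```

The double loop calls computeNearestNeighbor 160,000 times, which is slow in pure Python but fine for a handful of fixed points.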
Make sure to include a title in your plot.
#3: Submit your Python program as a .py file.
#4: Submit a .png file containing the image generated by running your program on the points: (100,100), (200,350), (5,395), (375,25), and (200,100).
Note: your plot might look upside down, since in images (0,0) is the upper-left corner.
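If you prefer row 0 at the bottom, one option, assuming the image is displayed with matplotlib's imshow, is to pass origin="lower" (the default is origin="upper", with row 0 at the top); calling np.flipud(img) before plotting has a similar effect. imshow still treats the first index as the row, so a pixel stored at img[i, j] is drawn at horizontal position j. The helper name show_with_origin_lower is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; delete this line to open a window
import matplotlib.pyplot as plt
import numpy as np


def show_with_origin_lower(img, title):
    """Draw img with row 0 at the bottom of the plot instead of the top."""
    fig, ax = plt.subplots()
    ax.imshow(np.asarray(img), origin="lower")
    ax.set_title(title)
    return fig, ax
```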
#5-7 Use the book's clustering.py program to produce images of Gillet Hall with 5 colors. Modify the program to compute the sum of squared errors of your 5-color clustering (i.e., the sum, over all pixels, of the squared distance between each pixel's original color and the color assigned to it).
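In symbols (notation introduced here for illustration): let x_p be the original color of pixel p, k(p) the cluster that pixel is assigned to, and mu_k the color used for cluster k (for k-means, typically the cluster mean). Then

```latex
\mathrm{SSE} \;=\; \sum_{p} \bigl\lVert \mathbf{x}_p - \boldsymbol{\mu}_{k(p)} \bigr\rVert^2
\;=\; \sum_{p} \sum_{c \in \{R,G,B\}} \bigl( x_{p,c} - \mu_{k(p),c} \bigr)^2
```

The value depends on the color scale in use (for example 0-1 floats versus 0-255 integers), so report which one your image uses.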

Make sure to include a title in your plot.

#5: Submit your modified Python program as a .py file.
#6: Submit the .png file of the image with 5 colors.
#7: Include a screen shot or text file showing the sum of squared errors that you computed for the images.

Hint: K-means clustering is slow, so it will take a while to compute the new images. The book's program includes a method for computing the squared error that reruns the classifier (which takes quite a while). Instead of recomputing, store your clusters and compute the error directly on those.
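A minimal sketch of computing the error without retraining, assuming you have kept the pixel colors, the cluster means found during training, and each pixel's cluster index from the same pass you used to recolor the image. Both function names are hypothetical and are not part of the book's clustering.py. The second function gives the same result when you have kept the original and recolored images as arrays of the same shape.

```python
import numpy as np


def sse_from_assignments(pixels, means, assignments):
    """Sum of squared distances between each pixel and the mean of its cluster.

    pixels:      sequence of color vectors, e.g. [r, g, b]
    means:       list of cluster centers, indexed by cluster number
    assignments: cluster number of each pixel, in the same order as pixels
    """
    total = 0.0
    for pixel, cluster in zip(pixels, assignments):
        center = means[cluster]
        total += sum((a - b) ** 2 for a, b in zip(pixel, center))
    return total


def sse_from_images(original, recolored):
    """Same error, computed directly from the original and recolored images."""
    original = np.asarray(original, dtype=float)
    recolored = np.asarray(recolored, dtype=float)
    return float(np.sum((original - recolored) ** 2))
```

Either way, each pixel is compared once against a color you already have, so no call to the slow training step is needed.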

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.