Topics: Modular Design, Analyzing sorts by running time on varying data sets (sorted, almost sorted, and random data); Traversing Trees with Recursive Functions;

Designing Modular Programs

Today's lab has several themes:

Let's start with designing a general sorting program that we can use as a framework. The basic outline of our sorting program will be:

  1. Get list of items to sort.
  2. Sort list.
  3. Print sorted list.

We will translate this list into a "main" function, and then fill in the pieces. As we fill in the pieces, if easier to test each part ('unit test') independently, then waiting to test at the end (since if there's a problem at the end, we have to search through all the code to find the problem). Before we start our main, write down what the inputs and outputs for each part are:

  1. Get list of items to sort.
    Inputs: nothing obvious
    Outputs: the list to sort
  2. Sort list.
    Inputs: the original list (sorts it in place)
    Outputs: nothing obvious
  3. Print sorted list.
    Inputs: the sorted list
    Outputs: nothing obvious
We start by creating a main function and including our "To Do" list as comments:
def main():
	print "Welcome to the sorting program!"
	#Get list of items to sort.
	#Sort list.
	#Print sorted list

if __name__ == "__main__":
	main()

Next, we'll add in empty functions (or "stubs") for each item on our annotated To Do list. We have included "dummy" variables corresponding to the inputs and outputs from the annotated list:

def getInputList():
	"""
	No inputs
	Returns the list to sort
	"""
	a = []
	return a
	
def sortList(a):
	"""
	Takes a list and sorts it in place
	"""

	
def printList(b):
	"""
	Prints sorted list.
	No outputs
	"""

def main():
	print "Welcome to the sorting program!"
	#Get list of items to sort.
	a = getInputList()
	#Sort list.
	sortList(a)
	#Print sorted list
	printList(a)

if __name__ == "__main__":
	main()

Make sure the program runs (you can also look at the completed program at sortingLab.py) before continuing.

Now, from the To Do list, let's write the easiest functions first. For printList(), we will just use a simple print() (we can improve on it later):

def printList(b):
	"""
	Prints sorted list.
	No outputs
	"""
	print "The sorted list is:"
	print b

The next easiest function is the getInputList. To start, let's ask the user for a list, separated by spaces (again, we can improve this function later and have it read from a file for really large lists):

def getInputList():
	"""
	No inputs
	Returns the list to sort
	"""
	s = raw_input('Please enter a list, separated by spaces: ')
	a = [int(w) for w in s.split()]
	return a

(We're assuming that we're sorting numbers-- if not we can remove the "int() from the loop comprehension.)

For our sorting algorithm, we'll write a couple from pseudo-code. Here's a popular one that we haven't discussed: selectionSort (but is in the textbook):

  1. Search the list for the largest element.
  2. Move it to last position.
  3. Search the remaining list for the largest element.
  4. Move to second to last position.
  5. Repeat until list is sorted.

Let's translate that to pseudocode:

  1. For each i, starting at len(aList)-1 and decrementing to 1.
    1. Find the index of the largest element in the list aList[0:i+1]
    2. Swap the largest element with the element at position i.

Try writing out each part (note finding the index of the largest element will take another loop). Then try out your code (or look below at the solution from the book).

def selectionSort(alist):
   for fillslot in range(len(alist)-1,0,-1):
       positionOfMax=0
       for location in range(1,fillslot+1):
           if alist[location]>alist[positionOfMax]:
               positionOfMax = location

       temp = alist[fillslot]
       alist[fillslot] = alist[positionOfMax]
       alist[positionOfMax] = temp

Try on several cases to make sure your program works. When choosing test cases, try 'border cases' where the items are close to the edges of the arrays or the edges of the conditional tests (broadly interpreted).

Challenges:

Timing Functions

In lecture, we analyzed running times theoretically. We discussed worst-case time bounds for several of the algorithms. But how does that work in practice? We saw the sorting algorithms demonstration that showed that different types of input yield very different running times. For example, while the worst case analysis says InsertionSort is O(n^2) and MergeSort is O(n log n), when the data is nearly sorted, then InsertionSort finishes more quickly.

Python provides several built-in methods for timing functions. While there is a complete module for analyzing complete programs to determine which functions are using the most time (called ``profiling'' the code), we will use a simpler function, timeit() that times how long a single function takes. In the simplest form, we can time how long a command will take:

timeit() runs in its own environment to give as accurate a measurement as possible. The environment knows about built-in functions, but to use other functions, we need to "import" them into the environment. For example, the following will time a simple function that compute the factorial:

import timeit

def factR(n):
     if n <= 1:
          return 1
     else:
          return n*factR(n-1)

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("factR(100)", setup="from __main__ import factR", number=10))
The last parameter controls the number of times it runs the function to compute the average. The default value is 1,000,000 runs. The function returns the amount of time, in seconds, that it took to run through the code the specified number of times.

Challenges

Recursion vs. Iteration

Everything you can write with recursion, you can write with iteration (just loops). Should you? Each function call requires Python to set up a new frame that keeps track of the parameters passed and current state. For small datasets, that a small portion of the overall running time, but what happens for larger computations?

Let's use a very simple algorithm to test recursion versus iteration running times:

def factR(n):
    if n <= 1:
        return 1
    else:
        return n*factR(n-1)
and
def factI(n):
    prod = 1
    for i in range(2,n):
        prod = i*prod
    return prod

Challenges



Side Note: In practice, after you have a program running, you profile the code to determine which functions are using the most time. Then, you look at the most time-intensive function and see if there are ways to improve its efficiency. Sometimes, this can be accomplished by simple changes to the code. Other times, it might be necessary to rewrite the entire function. For time-critical programs, this could mean rewriting code in a specialized language for the task or one that allows low-level manipulation to optimize efficiency (e.g. c or assembly language).

Lab Report

For each lab, you should submit a lab report by the target date to: kstjohn AT amnh DOT org. The reports should be about a page for the first labs and contain the following:

Target Date: 6 April 2016
Title: Lab 9: Sorting & Recursion
Name & Email:

Purpose: Give summary of what was done in this lab.
Procedure: Describe step-by-step what you did (include programs or program outlines).
Results: If applicable, show all data collected. Including screen shots is fine (can capture via the Grab program).
Discussion: Give a short explanation and interpretation of your results here.

Using Rosalind

This course will use the on-line Rosalind system for submitting programs electronically. The password for the course has been sent to your email. Before leaving lab today, complete the first two challenges.