RGGS 670: Algoritmic Approaches to Biological Data

Spring 2016

Lecture: Mondays: 2:30-5:30pm, Wednesdays: 1-2pm
Lab: Wednesdays: 3-5pm
Instructor: Dr. Katherine St. John

Announcements:

Useful Links:

Summary:

The emphasis of this course is to develop analytic reasoning and computational thinking skills to address novel questions that arise from analyzing biological data. It assumes no previous programming knowledge, only that the students have a graduate-level analysis skills for biological data. The course is structured in two parts. The first part is an intensive programming course in the Python programming language. For the first part, multiple small programming and laboratory exercises are required to reinforce skills. The second part of the course is organized into an in-depth exploration of scientific topics and the algorithms needed to address them. For the second part, each topic starts with small programming exercises that culminate in a project addressing how to solve a biological problem using computer programming. A final project, preferably using data from the student's thesis project, is required.

The specific topics are:

Outline:

Date:             Time:         Topics: Handouts: Reading: Target Dates:        
Wednesday,
20 January
Lecture 1,
1-2pm
Course Overview, Introduction to Python programs: printing and simple functions, definite loops, problem solving and the design process Syllabus, Slides Think Like CS, Introduction and Data chapters
Lab 1,
3-5pm
Brownian motion: Simulating simple 1D and 2D motion; Simple graphics & object oriented programming introduction: using the built-in turtle and random libraries Lab 1 Think Like CS, Debugging Interlude 1, Turtle Graphics, and Python Modules & Getting Help chapters
Monday,
25 January
Lecture 2,
2:30-5:30pm
Simple Python data; Using Python Modules: math & random libraries; Input-Process-Output (IPO) design pattern;
Accumulator Design Pattern: Using variables as accumulators. Introduction to Functions
Slides, variables example, for & range example Think Like CS, Data and Python Modules & Getting Help chapters, and Functions chapter through accumulator pattern
Wednesday,
27 January
Lecture 3,
1-2pm
Simple Input, More on Functions, using parameters and return values, Program Design: top-down design and stepwise refinement Slides, simple function example, harder function example Think Like CS, Functions chapter HW #1
Lab 2,
3-5pm
Python-as-Calculator: using the built-in math library & loops to compute values and make simple tables; Debugging: common parse errors Lab 2 Think Like CS, Debugging Interlude 1, and Functions chapters
Monday,
1 February
Lecture 4,
2:30-5:30pm
Boolean Values & Expressions, Logical Operators, Simple Decisions,
Nested Decisions, Exception Handling, Boolean Functions
Slides Think Like CS, Selection chapter Lab 1 report
Wednesday,
3 February
Lecture 5,
1-2pm
More on Iteration, Indefinite (while) Loops Slides, loop example Think Like CS, Selection and Iteration Revisited chapters HW #2
Lab 3,
3-5pm
Newick tree format: parsing and (simple) visualization of trees; Looping through strings; Using decisions Lab 3 Think Like CS, Strings chapter
Monday,
8 February
Lecture 6,
2:30-5:30pm
Computing with strings: string variables, simple string processing,
Strings as Lists, Lists as Strings, Using Files
Slides, splits & joins Think Like CS, Strings, Lists, and Files chapters Lab 2 report
Wednesday,
10 February
Lecture 7,
1-2pm
More on Files, Finding Patterns Slides, textbook on in & not in, regular expressions for biologists, regex cheat sheet Think Like CS, Strings and Files chapters HW #3
Lab 4,
3-5pm
Counting Bases, Finding motifs in sequences,
More on program design and debugging
Lab 4 Think Like CS, Strings and Files chapters
Monday,
15 February
Holiday: No Class
Wednesday,
17 February
Lecture 8,
1-2pm
More on Pattern Finding: Regular Expressions,
Processing CSV files
Slides, email.txt Think Like CS, Strings and Lists chapters Lab 3 report, HW #4-6
Lab 5,
3-5pm
Parsing CSV files; introducing matplotlib and basemap Lab 5 Think Like CS, Lists and Files chapters
Monday,
22 February
Lecture 9,
2:30-5:30pm
More on Lists: built-in functions, as parameters, nestings, comprehensions Slides, function demo, list comprehensions demo Think Like CS, Lists chapter Lab 4 write-up
Wednesday,
24 February
Lecture 10,
1-2pm
Formatting data; Strings & Lists; 2D Arrays and numpy Slides Think Like CS, Files chapter HW #7-9
Lab 6,
3-5pm
Numerical analysis and plotting with numpy and matplotlib Lab 6 Think Like CS, Files chapter
Monday,
29 February
Lecture 11,
2:30-5:30pm
Dictionaries, built-in methods, lists versus dictionaries; Hashing; Program design: using dictionaries and traversing multi-dimensional arrays efficiently Slides, dictionary demo Think Like CS, Dictionaries chapter Lab 5 report
Wednesday,
2 March
Lecture 12,
1-2pm
Using Files; Accessing Data from image files Slides,
butterfly images: bf1.png, bf2.png, bfly.png, bflyBlue.png, bflyGreen.png, bflySwarm.png,
Think Like CS, Dictionaries chapter HW #10-12
Lab 7,
3-5pm
Multi-dimensional arrays and image files Lab 7 Bioinformatics Algorithms, Volume 1, Chapter 1, Think Like CS, Dictionaries chapter,
Monday,
7 March
Lecture 13,
2:30-5:30pm
Insertion, Bubble, and Merge Sorts, Analyzing sorts by running time on varying data sets (sorted, almost sorted, and random data);
Recursion: functions that call themselves
Slides,
sorting examples, wiki insertion sort, sorting demos, nested.py
Think Like CS, Recursion chapter, Problem Solving with Algorithms, sorting and analysis chapters Lab 6 report
Wednesday,
9 March
Lecture 14,
1-2pm
Parsing Structured Data: genbank & html; Counting Revisited: Using Dictionaries to accumulate pattern counts; Scraping data from the web Slides Dictionaries chapter HW #13-15
Lab 8,
3-5pm
Counting Revisited: Using Dictionaries to accumulate pattern counts.
Parsing structured data: genbank and html files.
Lab 8 Dictionaries chapter
Monday,
14 March
Spring Break: No Class
Wednesday,
16 March
Spring Break: No Class
Monday,
21 March
No Class
Wednesday,
23 March
Lecture 15,
1-2pm
Review; Sorting by Keys and Partially Sorted Data; Recursion slides, moreSorting.py Problem Solving with Algorithms, sorting chapter HW #16-18,
Lab 7 report
Lab 9,
3-5pm
Analyzing sorts by running time on varying data sets (sorted, almost sorted, and random data); Traversing Trees with Recursive Functions; Modular Design Lab 9 Problem Solving with Algorithms, sorting chapter, Think Like CS, Recursion chapter Project Abstract
Monday,
28 March
No Class
Wednesday,
30 March
Lecture 16,
1-2pm
Program Design: reframing biological problems into computational terms; Representing graphs: adjacency matrices, adjacency lists slides Bioinformatics Algorithms, Volume 1, Chapter 3 HW #19-21
Lab 8 report
Lab 10,
3-5pm
Representing Networks & Trees: Adjacency Matrices and Adjacency Lists Lab 10 Bioinformatics Algorithms, Volume 1, Chapter 3 Project Checklist
Monday,
4 April
No Class
Wednesday,
6 April
Lecture 17,
1-2pm
Paths in Graphs: Hamiltonian & Eulerian Paths; Overlap Graphs and de Bruijn Graphs recursion demo, slides Bioinformatics Algorithms, Volume 1, Chapter 3,
Compeau, Pevzner, & Tesler 2011
Lab 9, report
HW #22-25
Lab 11,
3-5pm
Assembling short reads: de Bruijn graphs Lab 11 Bioinformatics Algorithms, Volume 1, Chapter 3
Monday,
11 April
No Class
Wednesday,
13 April
Lecture 18,
1-2pm
Sequence Alignment and the Longest Common Substring; Dynamic Programming Example: Manhattan Tourist Problem slides Bioinformatics Algorithms, Volume 1, Chapter 5 Lab 10 report
Lab 12,
3-5pm
More on Accessing Structured Data: SQL Lab 12 Project Progress Report
Monday,
18 April
Lecture 19,
2:30-5:30pm
Dynamic Programming for pairs of sequences; Scoring Matrices
More graph techniques: minimal spanning trees
slides Bioinformatics Algorithms, Volume 1, Chapter 5
Wednesday,
20 April
Lecture 20,
1-2pm
Using Dynamic Programming; Variations on the theme: Gap Penalties and other scoring functions;
Breadth first and depth first searching
slides Bioinformatics Algorithms, Volume 1, Chapter 5 Lab 11 report
Lab 13,
3-5pm
Pairwise Alignment of Sequences Lab 13 Bioinformatics Algorithms, Volume 1, Chapter 5
Monday,
25 April
Lecture 21,
2:30-5:30pm
Searching Graphs: More on Breadth First and Depth First Searching, Local Search Paradigms;
Optimality Criteria & Complexity, Parsimony Example: scoring trees in linear time
slides Bioinformatics Algorithms, Volume 2, Chapter 1 Project Progress Report
Wednesday,
27 April
Lecture 22,
1-2pm
More on Fitch's Algorithm, Wagner Builds, Distance Matrices, UPGMA, Neighbor Joining slides Bioinformatics Algorithms, Volume 2, Chapter 1 Lab 12 report
Lab 14,
3-5pm
Implementing Fitch's Algorithm for the Small Parsimony Problem Lab 14 Bioinformatics Algorithms, Volume 2, Chapter 1
Monday,
2 May
Lecture 23,
2:30-5:30pm
Complexity revisited: NP-hardness, heuristic search, approximation algorithms, Tree Rearangements & Edit Distances slides Bioinformatics Algorithms, Volume 2, Chapter 1
Wednesday,
4 May
Lecture 24,
1-2pm
Approximation algorithms for starting trees: understanding strengths & limitations of Wagner Builds and Neighbor Joining slides Bioinformatics Algorithms, Volume 2, Chapter 1 Lab 13 report
Lab 15,
3-5pm
Tree Algorithms Lab 15 Bioinformatics Algorithms, Volume 2, Chapter 1
Monday,
9 May
Project Presentations,
2:30-5:30pm
Project Presentations; Open Lab Final Project

? May
Last Day to Submit Lab Reports and Rosalind Lab 14 & Lab 15 reports



(Last updated: 4 May 2016)