CSci 493 Syllabus        FAQ



Problem Set 2: Trees as Vectors
CSci 493: Algorithms for Tree Distributions
Fall 2022


Learning Objective: to recap standard tree traversal techniques and gain fluency with the biopython phylo and nexus submodules.

General Notes

Program Descriptions

Program 2.1: Bipartitions.Target Date: Wednesday, 12 October.
Learning Objective: to use built-in functions to extract properties (in this case, edge information) from tree objects.
Available Libraries: biopython and core Python 3.7+.

Description: read in tree in Newick format and print out all the splits of that tree. The splits should be ordered, lexicographically.

Your program should print out the non-trivial bipartitions, including the root, of the tree. A trivial bipartition has one side with only 1 element, e.g. 1 | 0 2 3 4 5.

For example, if the user enters:

((1,2),3);
There is one non-trivial split of 0 3 | 1 2 and the rest of the splits are a single leaf (or root) versus all the other leaves (i.e., 0 | 1 2 3; 1 | 0 2 3; 2 | 0 1 3; 3 | 0 1 2; 0 3 | 1 2). Your program should only print out the non-trivial splits. Further, each split should have each "side" ordered lexicographically, as well as ordering the sides (i.e. 0 3 | 1 2 instead of 1 2 | 3 0). For this example, your program should print:
0 3 | 1 2

For example, if the user enters:

((1,2),((3,4),5));
your program would print:
0 3 4 5 | 1 2; 0 5 | 1 2 3 4

Program 2.2: LCA's.Target Date: Thursday, 13 October.
Learning Objective: to increase facility with Newick tree format and the biopython libraries.
Available Libraries: biopython and core Python 3.7+.

Description: Write a function that takes a tree in Newick format and two leaf labels and returns the depth of their least common ancestor.

For example, if the user enters:

((1,2),((3,4),5));
as the tree and the nodes 1 and 5, your program would return 0 since the least common ancestor of 1 and 5 is the root, and the root's depth is 0.

With the same tree, if the nodes entered are 3 and 5, then the least common ancestor is the child of the root with leaves 3, 4, and 5, which has depth 1. So, 1 would be returned.

Hint: In the tree class of Biopython, there are methods that will find the common ancestor of two nodes as well as the distance between two nodes that may be useful for computing the depth of the least common ancestor.

Program 2.3: Co-Phenetic Vectors.Target Date: Friday, 14 October.
Learning Objective: to increase facility with Newick tree format and the biopython libraries.
Available Libraries: biopython and core Python 3.7+.

Description: Write a program that takes a tree in Newick format and returns a cophenetic vector of the tree. The ordering of the coordinates in the vector should match the ordering of the leaves in Newick format.

For example, if the user enters:

((1,2),(3,4));
There are 4 leaves, so, 4*3/2 = 6 pairs of distances. If we order the pairs, lexicographically, we have
pairs:121314232434
depths:100001
and your program should return (1, 0, 0, 0, 0, 1).

For example, if the user enters:

((1,2),((3,4),5));
There are 5*4/2=10 pairs, and the program should return (1, 0, 0, 0, 0, 0, 0, 2, 1, 1).