Topics: Representing networks and trees; Adjacency Matrices and Adjacency Lists
There are two canonical ways to represent graphs (networks) and trees:
Below, there are several examples to think through in pairs or in triples.
Can you get here from there? Say you an airlines that flies:
New York | Chicago | Houston | San Jose | |
---|---|---|---|---|
New York | 1 | 1 | 1 | 0 |
Chicago | 1 | 1 | 0 | 1 |
Houston | 1 | 0 | 1 | 1 |
San Jose | 0 | 1 | 1 | 1 |
We're making the assumption that if you start in a city, you can stay in the city (i.e. there is a `self-loop' from New York to New York). You can also model this without self-loops, depending on the application.
Let's use a dictionary so, we don't have to remember which index corresponds to which city:
cityNum = {} cityNum["New York"] = 0 cityNum["Chicago"] = 1 cityNum["Houston"] = 2 cityNum["San Jose"] = 3
Using numpy, the matrix (without labels):
import numpy as np flights = np.zeros( (4,4) ) flights[cityNum["NYC"],cityNum["Chicago"]] = 1 flights[cityNum["NYC"],cityNum["Houston"]] = 1 flights[cityNum["Chicago"],cityNum["NYC"]] = 1 flights[cityNum["Chicago"],cityNum["NYC"]] = 1 flights[cityNum["Houston"],cityNum["NYC"]] = 1 flights[cityNum["Houston"],cityNum["San Jose"]] = 1 flights[cityNum["San Jose"],cityNum["Chicago"]] = 1 flights[cityNum["San Jose"],cityNum["Houston"]] = 1 #Create the self-loops: for i in range(4): flights[i,i] = 1(complete program in flights.py).
We can reach San Francisco from New York by taking flights to an intermediary city (in this case, Chicago or Houston) and then continuing on. That is:
There are flights: (NYC -> Chicago and Chicago -> San Jose) or (NYC -> Houston and Houston -> San Jose)In the matrix, we have existing flights as 1 and no flight as 0, so instead of the and, we can multiply and just check that sum is greater than 0 (the sum is non-zero only when one of the terms is, which happens exactly when there's flights via the intermediary cities):
(NYC -> Chicago * Chicago -> San Jose) + (NYC -> Houston * Houston -> San Jose) > 0
In terms of math, we are taking a vector of the outgoing flights from a city and combining it with a vector of the incoming flights to the next city. This combination is called the dot product:
v dot w = v1*w1 + v2*w2 + ... vN*wNwhere the vectors are v = (v1, v2, ... vN) and w = (w1, w2,... wN).
In numpy, there is a built-in function for taking the dot product of matrices (often called `multiplying matrices', though the numpy matrix multiplication does something very different).
This is a lovely feature of using adjacency matrices to reprensent graphs. If there is an edge from a to b and an edge from b to c, then a to c is non-zero in the matrix A.dot(A). Why? The i,jth element of the A.dot(A) is the dot product of the row indexed by i by the column index by j, which is:
A[i,0]*A[0,j] + A[i,1]*A[1,j] + A[i,2]*A[2,j] + ... + A[i,n-1]*A[n-1,j]This entry is not zero, only when one of the terms of the sum is not zero (there's no cancelling since all distances are positive).
The result of A.dot(A) is:
New York | Chicago | Houston | San Jose | |
---|---|---|---|---|
New York | 3 | 2 | 2 | 2 |
Chicago | 2 | 3 | 2 | 2 |
Houston | 2 | 2 | 3 | 2 |
San Jose | 2 | 2 | 2 | 3 |
That is, we can get from every city to every other city (the values indicate the number of different ways to do so, including taking the 'self-loops' or empty flights).
Before continuing, let's draw the cities we have. To do, we'll use basemap focused on the US (complete program in flights.py). In github, there is already a dictionary set up with entries for cities and there longitude and latitudes major_us_city_dma_codes. Instead of typing the coordinates in by hand, we will access the dictionary entries for each city we're mapping.
Try the program and see how it works.
A good use of adjacency lists is for storing sparsely populated graphs, such as trees. We will write a function that prints out all the leaves in a clade of a tree.
It is not hard, but a bit tedious, to parse a tree in Newick format (since you need to worry about extra spaces, names of more than one character, etc.). The file adjTrees.py has a tree already stored as an adjacency list in the dictionary tree. Pictorally, the tree is:
(Image from Smithsonian Institute)
The entries in our dictionary have the following structure:
(parentName,[child1,child2,...])If there are no children, the last part of the tuple is an empty list.
The goal is to write a function that given a vertex name, v, to print out all leaves below it. Let's break this into parts:
printLeaves(v): If v is a leaf, print out v Else, for each child of v, printLeaves(child)Let's translate that into Python:
def printLeaves(v): if len(adjList[v][1]) == 0: print v else: for child in adjList[v][1]: printLeaves(child)
Try running the program adjTrees.py.