Topics: Representing networks and trees; Adjacency Matrices and Adjacency Lists

Adjacency Matrices

Can you get here from there? Say you an airlines that flies:

Can you fly from New York to San Francisco? Well, not directly, but it is possible if you change plans in Chicago or Houston. We can represent this in a matrix with entries of 1 if there is a flight and 0 if there is not:
New York Chicago Houston San Jose
New York1110
San Jose0111
We're making the assumption that if you start in a city, you can stay in the city (i.e. there is a `self-loop' from New York to New York). You can also model this without self-loops, depending on the application.

Let's use a dictionary so, we don't have to remember which index corresponds to which city:

cityNum = {}
cityNum["New York"] = 0
cityNum["Chicago"] = 1
cityNum["Houston"] = 2
cityNum["San Jose"] = 3

Using numpy, the matrix (without labels):

import numpy as np
flights = np.zeros( (4,4) )

flights[cityNum["NYC"],cityNum["Chicago"]] = 1
flights[cityNum["NYC"],cityNum["Houston"]] = 1
flights[cityNum["Chicago"],cityNum["NYC"]] = 1
flights[cityNum["Chicago"],cityNum["NYC"]] = 1
flights[cityNum["Houston"],cityNum["NYC"]] = 1
flights[cityNum["Houston"],cityNum["San Jose"]] = 1
flights[cityNum["San Jose"],cityNum["Chicago"]] = 1
flights[cityNum["San Jose"],cityNum["Houston"]] = 1

#Create the self-loops:
for i in range(4): flights[i,i] = 1
(complete program in

We can reach San Francisco from New York by taking flights to an intermediary city (in this case, Chicago or Houston) and then continuing on. That is:

There are flights: (NYC -> Chicago and Chicago -> San Jose) or (NYC -> Houston and Houston -> San Jose)
In the matrix, we have existing flights as 1 and no flight as 0, so instead of the and, we can multiply and just check that sum is greater than 0 (the sum is non-zero only when one of the terms is, which happens exactly when there's flights via the intermediary cities):
(NYC -> Chicago * Chicago -> San Jose) + (NYC -> Houston * Houston -> San Jose) > 0

In terms of math, we are taking a vector of the outgoing flights from a city and combining it with a vector of the incoming flights to the next city. This combination is called the dot product:

	v dot w = v1*w1 + v2*w2 + ... vN*wN
where the vectors are v = (v1, v2, ... vN) and w = (w1, w2,... wN).

In numpy, there is a built-in function for taking the dot product of matrices (often called `multiplying matrices', though the numpy matrix multiplication does something very different).

This is a lovely feature of using adjacency matrices to reprensent graphs. If there is an edge from a to b and an edge from b to c, then a to c is non-zero in the matrix Why? The i,jth element of the is the dot product of the row indexed by i by the column index by j, which is:

A[i,0]*A[0,j] + A[i,1]*A[1,j] + A[i,2]*A[2,j] + ... + A[i,n-1]*A[n-1,j]
This entry is not zero, only when one of the terms of the sum is not zero (there's no cancelling since all distances are positive).

The result of is:
New York Chicago Houston San Francisco
New York3222
San Francisco2223
That is, we can get from every city to every other city (the values indicate the number of different ways to do so, including taking the 'self-loops' or empty flights).

Before continuing, let's draw the cities we have. To do, we'll use basemap focused on the US (complete program in In github, there is already a dictionary set up with entries for cities and there longitude and latitudes major_us_city_dma_codes. Instead of typing the coordinates in by hand, we will access the dictionary entries for each city we're mapping.

  1. Set up a dictionary with city names and coordinates from file.
  2. Set up a basemap with state boundaries.
  3. Include our flight table as a numpy array.
  4. Draw a marker on each city in the flights table.
  5. For each edge in the table, draw a great circle between the cities.
  6. TO DO: Compute the cities that can be reached in two flights.
  7. TO DO: Draw the cities and great circles for those flights.

Try the program and see how it works.


Adjacency Lists

A good use of adjacency lists is for storing sparsely populated graphs, such as trees. In this part of the lab, we will write a function that prints out all the leaves in a clade of a tree.

It is not hard, but a bit tedious, to parse a tree in Newick format (since you need to worry about extra spaces, names of more than one character, etc.). The file has a tree already stored as an adjacency list in the dictionary tree. Pictorally, the tree is:

(Image from Smithsonian Institute)

The entries in our dictionary have the following structure:

If there are no children, the last part of the tuple is an empty list.

The goal of this part of the lab is to write a function that given a vertex name, v, to print out all leaves below it. Let's break this into parts:

    If v is a leaf, print out v
    Else, for each child of v, printLeaves(child)
Let's translate that into Python:
def printLeaves(v):
    if len(adjList[v][1]) == 0:
    	print v
	    for child in adjList[v][1]:

Try running the program


Lab Report

For each lab, you should submit a lab report by the target date to: kstjohn AT amnh DOT org. The reports should be about a page for the first labs and contain the following:

Target Date: 13 April 2016
Title: Lab 10: Representing Networks and Trees
Name & Email:

Purpose: Give summary of what was done in this lab.
Procedure: Describe step-by-step what you did (include programs or program outlines).
Results: If applicable, show all data collected. Including screen shots is fine (can capture via the Grab program).
Discussion: Give a short explanation and interpretation of your results here.

Using Rosalind

This course will use the on-line Rosalind system for submitting programs electronically. The password for the course has been sent to your email. Before leaving lab today, complete the first two challenges.