Today's lab focuses on reading data from files in the R programming language.

Try R Vector Tutorial

In the last labs, we worked through the first two chapters of Try R.

Today, we will work through the chapters on summary statistics and 2D data. Work through the following:

• Chapter 3: Matrices of Try R: this chapter reviews what we covered in lecture on matrices.
• Chapter 4: Summary Statistics of Try R: three useful ways to summarize data (mean, median, and standard deviation).
• Chapter 5: Factors of Try R: a short, but useful, chapter on factors or "types" of data.
• Chapter 6: Data Frames of Try R: very similar to DataFrames from pandas in Python. This chapter also covers reading in data from a file.
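
As a quick recap of the ideas in these four chapters, here is a minimal sketch. The numbers and the names (scores, sizes, df) are made up for illustration and are not taken from Try R.

```r
# Toy data, made up for illustration (not from Try R).
scores <- c(2, 4, 4, 6)

# Chapter 3: a matrix is a vector arranged in rows and columns.
m <- matrix(scores, nrow = 2, ncol = 2)

# Chapter 4: three summary statistics.
avg <- mean(scores)      # arithmetic mean
mid <- median(scores)    # middle value
spread <- sd(scores)     # sample standard deviation

# Chapter 5: a factor stores categories ("types") of data.
sizes <- factor(c("small", "large", "small", "large"))

# Chapter 6: a data frame holds equal-length columns, like a pandas DataFrame.
df <- data.frame(score = scores, size = sizes)
print(df)
```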

Let's use the commands you just learned to load in data (if you haven't done so already, work through Chapters 2-6 of Try R).

We'll use the NYC historical population data that we explored with pandas. If you do not have a copy, download it and save it to the desktop.

Open R and use the list.files() command to make sure the file is there:

list.files(path = "~/Desktop")
R will print out a list of the files currently on the desktop. If nycHistPop.csv does not show up on that list, download it again from NYC historical population data.
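
If the desktop holds many files, a direct check can be easier to read than the full listing. The sketch below assumes the file was saved as nycHistPop.csv on the desktop; file.exists() is a base R function that returns TRUE or FALSE.

```r
# Assumed location of the downloaded file; adjust if you saved it elsewhere.
csv_path <- "~/Desktop/nycHistPop.csv"

# TRUE if the file is there, FALSE if it still needs to be downloaded.
found <- file.exists(csv_path)
print(found)
```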

Use Excel, or your favorite spreadsheet program, to open up the file. How many lines of comments and introductory material are there? As with pandas, we need to skip those rows when we read in the file.

To find the option for skipping lines, bring up the help page for read.csv:

help(read.csv)

Read down the documentation until you find the argument for skipping extra lines at the beginning of the file. Try it before looking at the answer below.

To read in the CSV file:
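
The following is a sketch of the technique rather than the exact command for the lab file. It builds a tiny stand-in file with two made-up introductory lines and made-up numbers, then skips those lines with the skip argument of read.csv. For the real file, replace the path with "~/Desktop/nycHistPop.csv" and set skip to the number of introductory lines you counted in the spreadsheet. The name pop matches the data frame used in the rest of the lab; toy_path is a hypothetical name.

```r
# Stand-in file: two introductory lines, then a header row and data.
# All values here are placeholders, not real population figures.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Source: placeholder introductory line",
  "Note: another placeholder line",
  "Year,Bronx,Manhattan",
  "1800,,100",
  "1900,200,300",
  "2000,400,500"
), toy_path)

# skip = 2 tells read.csv to ignore the first two lines of the file,
# so the third line is used as the column header.
pop <- read.csv(toy_path, skip = 2)
print(pop)
```

Blank numeric fields, like the Bronx entry in the first data row, are read in as NA.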

Let's use the notation for R data frames to make a bar plot of the Bronx population over time:

barplot(pop$Bronx, names.arg = pop$Year)
Next, let's add in the average population into the graph. First, we need to compute the mean:
mean(pop$Bronx)
When we do this in R, we end up with the value NA ("not available"). Let's check why:
print(pop)
The early years in the table have no values for the Bronx population. Remember from lecture that there's an option to ignore missing values? If not, try help:
help(mean)
What is the option to ignore the values? (The answer is below.)
mean(pop$Bronx, na.rm = TRUE)
Great! Now we have the average population of the Bronx, over time. Let's assign it to a variable and add a line to the graph:
m <- mean(pop$Bronx, na.rm = TRUE)
abline(h = m)
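
Putting these steps together, a minimal self-contained sketch follows. It uses a small made-up data frame in place of the real file, so the numbers are placeholders; the column names Year and Bronx are the ones used in the commands above.

```r
# Placeholder data standing in for the NYC file; the first year has no Bronx value.
pop <- data.frame(Year = c(1800, 1900, 2000),
                  Bronx = c(NA, 200, 400))

# Without na.rm, a single missing value makes the mean NA.
print(mean(pop$Bronx))

# na.rm = TRUE drops missing values before averaging.
m <- mean(pop$Bronx, na.rm = TRUE)
print(m)

# Bar plot of the Bronx column, labeled by year, with a horizontal line at the mean.
barplot(pop$Bronx, names.arg = pop$Year)
abline(h = m)
```

Note that abline() draws on the most recent plot, so it has to be called after barplot().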

Challenges

• Add in a legend to your plot (hint: see the tutorial on reading in data frames for details).
• Make a plot of the population of Manhattan. Include a line marking the mean on your plot.

In-class Quiz

During lab, there is a quiz on vectors in R. The password to access the quiz will be given during lab.

What's Next?

If you finish the lab early, now is a great time to get a head start on the programming problems due early next week. There are instructors to help you, and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.