Today's lab focuses on reading data from files in the R programming language.
In the last labs, we worked through the first two chapters of Try R.
Today, we will work through the chapters on summary statistics and 2D data. Work through the following:
Let's use the commands you just learned to load in data (if you haven't done so already, work through Chapters 2-6 of Try R).
We'll use the NYC historical population data that we explored with pandas. If you do not have a copy, download it and save it to the desktop.
Open R and use the list.files() command to make sure the file is there:
list.files(path = "~/Desktop")
R will print out a list of the files currently on the desktop. If nycHistPop.csv does not show up on that list, download it again from NYC historical population data.
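If the desktop holds many files, a direct check can be easier to read than the full listing. A minimal sketch, assuming the file was saved as nycHistPop.csv on the desktop (the variable name csv_path is just for illustration):

```r
# Path assumed from the lab instructions; adjust it if you saved the file elsewhere.
csv_path <- "~/Desktop/nycHistPop.csv"

# file.exists() returns TRUE if the file is there and FALSE otherwise.
file.exists(csv_path)

# list.files() can also filter by a pattern; here, names ending in .csv
list.files(path = "~/Desktop", pattern = "\\.csv$")
```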
Use Excel, or your favorite spreadsheet program, to open up the file. How many lines of comments and introductory material are there? As with pandas, we need to skip those rows when we read in the file.
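You can also peek at the top of a file without leaving R. The sketch below writes a small stand-in file (its contents are invented for illustration and are not the real NYC data) and prints its first lines with readLines(), so the introductory lines can be counted by eye.

```r
# A made-up stand-in file: two intro lines, then a header row and data.
# (Hypothetical contents, NOT the real nycHistPop.csv.)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("Example population table",
             "Source: invented for this sketch",
             "Year,Bronx",
             "1900,",
             "1910,100"),
           demo_path)

# Read only the first four lines, exactly as they appear in the file.
first_lines <- readLines(demo_path, n = 4)
print(first_lines)
```

For the lab file, the same call would take "~/Desktop/nycHistPop.csv" in place of demo_path.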
To find the option for skipping lines, bring up the help page for read.csv:
help(read.csv)
Read down the documentation until you find the argument for skipping extra lines at the beginning of the file. Try it before looking at the answer below.
To read in the CSV file:
pop <- read.csv("~/Desktop/nycHistPop.csv", skip=5)
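To see what skip does on a file small enough to check by hand, here is a minimal sketch. The file contents and the names demo_path and demo are invented for illustration; only read.csv() and its skip argument come from the lab.

```r
# A made-up file with two intro lines before the header row.
# (Hypothetical contents, NOT the real nycHistPop.csv.)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("Example population table",
             "Source: invented for this sketch",
             "Year,Bronx",
             "1900,",
             "1910,100",
             "1920,300"),
           demo_path)

# skip = 2 tells read.csv() to ignore the two intro lines,
# so the "Year,Bronx" row is used as the column header.
demo <- read.csv(demo_path, skip = 2)

# Quick checks that the file was read as intended.
names(demo)   # column names taken from the header row
str(demo)     # column types and the first few values
```

If the column names look like data, or the first data rows look like comments, the skip count is probably off by a line or two.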
Let's use the notation for R data frames to make a bar plot of the Bronx population over time:
barplot(pop$Bronx, names.arg = pop$Year)
Next, let's add the average population to the graph. First, we need to compute the mean:
mean(pop$Bronx)
When we do this in R, we end up with the value NA ("not available"). Let's check why:
print(pop)
The early years in the table have no values for the Bronx population. Remember from lecture that there's an option to ignore missing values? If not, try help:
help(mean)
What is the option to ignore the values? (The answer is below.)
mean(pop$Bronx, na.rm = TRUE)
Great! Now we have the average population of the Bronx, over time. Let's assign it to a variable and add a line to the graph:
m <- mean(pop$Bronx, na.rm = TRUE)
abline(h = m)
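The steps above can be tried on a tiny data frame where the answer is easy to verify by hand. This is only a sketch: the numbers and the name demo are invented for illustration, and is.na() is used here as one possible way to count the missing entries.

```r
# Invented numbers for illustration; the first two years have no value.
demo <- data.frame(Year  = c(1900, 1910, 1920, 1930),
                   Bronx = c(NA, NA, 100, 300))

mean(demo$Bronx)          # NA, because of the missing values
sum(is.na(demo$Bronx))    # 2 missing entries

# na.rm = TRUE drops the missing values before averaging.
m <- mean(demo$Bronx, na.rm = TRUE)   # 200, the mean of 100 and 300

# Draw the bars first, then add a horizontal line at the mean.
barplot(demo$Bronx, names.arg = demo$Year)
abline(h = m)
```

Note that abline() draws on the most recent plot, so the barplot() call has to come first.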
If you finish the lab early, now is a great time to get a head start on the programming problems due early next week. There are instructors to help you and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.