Today's lab focuses on standard statistical tools for comparing data series and useful packages for plotting in the R programming language.
As with Python, R has both built-in functionality and a large community that writes additional, useful packages. R packages can be found at the Comprehensive R Archive Network (CRAN) repository.
In lecture, we used both the built-in (base) graphics package and ggplot2, a fancier graphics package that makes prettier images. Here are directions for installing it on your computer and loading it into your R session:
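install.packages("ggplot2")   # download and install from CRAN (needed only once per computer)
library("ggplot2")            # load the package into the current R session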
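As a quick check that the package loaded, the sketch below draws the same scatter plot twice: once with base graphics' plot() and once with ggplot2. It uses R's built-in mtcars dataset (car weight wt versus fuel economy mpg) purely as an illustration; the plots from lecture may use different data.

```r
# Assumes ggplot2 has already been installed with install.packages("ggplot2")
library(ggplot2)

# Base graphics: draw directly onto the current graphics device
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Base graphics")

# ggplot2: build a plot object layer by layer, then print it to draw it
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", title = "ggplot2")
print(p)
```

Note the difference in style: base graphics draws immediately, while ggplot2 returns an object (p) that you can modify with further layers before printing.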
Today's lab uses ggplot2 as well as GGally, a package that builds on ggplot2, to visualize the correlations among the columns of a data frame. Repeat the steps above to install and load GGally.
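A minimal sketch of the two GGally functions most often used for correlations is shown below. The choice of the built-in mtcars data and of these four columns is an illustrative assumption; the lab exercises may use other data frames.

```r
# Assumes GGally is installed: install.packages("GGally")
library(GGally)

# Illustrative numeric subset of the built-in mtcars data
cars <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# Pairs plot: scatter plots below the diagonal, density plots on the
# diagonal, and correlation coefficients above the diagonal
pairs_plot <- ggpairs(cars)
print(pairs_plot)

# Heat map of the correlation matrix, with each coefficient printed in its cell
corr_plot <- ggcorr(cars, label = TRUE)
print(corr_plot)
```

ggpairs() is useful for seeing the shape of each relationship, while ggcorr() gives a compact overview when there are many columns.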
Some packages that we have seen (or will see) in lecture, lab, and programming include:
In lecture, we briefly touched on Harvard's R graphics tutorial. Work through the tutorial, and then try the related challenges/programming problems.
When analyzing data, a common question is whether two variables behave in an intertwined way. That is, if one is larger than average, would you expect the other to be larger than average as well? There are rigorous tools for measuring this interrelated behavior. We will focus on an incredibly useful one that is easy to compute in R: correlation. Work through DataCamp's correlation tutorial, and then try the related challenges/programming problems.
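The most common measure, Pearson's correlation coefficient r, is the covariance of two variables divided by the product of their standard deviations: r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). It ranges from -1 to 1, with values near 0 meaning no linear relationship. The sketch below computes r by hand, checks it against base R's cor(), and uses cor.test() to ask whether the correlation differs from zero. The mtcars variables are an illustrative assumption; the DataCamp tutorial may use different data.

```r
# Illustrative data: car weight (1000 lbs) and fuel economy (miles per gallon)
x <- mtcars$wt
y <- mtcars$mpg

# Pearson's r computed directly from its definition
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The same quantity from base R (method = "pearson" is the default)
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))

# Rank-based alternative: captures monotonic (not just linear) trends
r_spearman <- cor(x, y, method = "spearman")
print(r_spearman)

# Test H0: the true correlation is 0 (returns an "htest" object)
result <- cor.test(x, y)
print(result$estimate)   # sample correlation
print(result$p.value)    # a small p-value is evidence of a nonzero correlation

# Correlation matrix for several columns at once
print(round(cor(mtcars[, c("mpg", "wt", "hp")]), 2))
```

Here r is strongly negative: heavier cars tend to get fewer miles per gallon. Remember that correlation measures association, not causation.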
If you finish the lab early, now is a great time to get a head start on the programming problems due early next week. There are instructors available to help you, and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.