Lab 2 - Data Visualization in R
Base R Graphics
In Lab 1, we learned a few base R functions for plotting data ("base R" refers to R with no additional packages loaded):
- plot
- hist
- boxplot
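As a quick reminder of how these three functions are called, here is a minimal sketch. It uses `mtcars`, a data set that ships with R, purely as a stand-in so the example runs without downloading anything; the choice of data set and variables is an assumption for illustration and is not taken from Lab 1.

```r
# mtcars is a built-in data frame (32 cars); used here only as a stand-in.

# Scatterplot of two quantitative variables: plot(x, y)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Histogram of one quantitative variable
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")

# Side-by-side boxplots: quantitative response ~ grouping variable
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")
```

The formula interface in the last call (`response ~ group`, together with `data =`) saves retyping the data frame name for each variable.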
Let's review these functions here using Current Population Survey (CPS) data. These particular data consist of a random sample of 534 people from the CPS in 1985, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. Variables in the data set are described below.
| Variable | Description |
|---|---|
| educ | Number of years of education |
| south | Indicator variable for living in a southern region: |
| sex | Gender: M = male, F = female |
| exper | Number of years of work experience (inferred from age and education) |
| union | Indicator variable for union membership: Union or Not |
| wage | Wage (dollars per hour) |
| age | Age (years) |
| race | Race: W = white, NW = not white |
| sector | Sector of the economy: clerical, const (construction), management, |
| married | Marital status: Married or Single |
Load these data into your R session by running the following command.
```r
CPS <- read.csv("http://math.montana.edu/shancock/data/cps.csv")
```
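After loading, it is worth confirming that the data arrived as described. The sketch below is one way to do that. The helper `missing_vars` is hypothetical (it is not part of the lab), and it assumes the CSV columns are spelled exactly as in the variable table, which may not hold.

```r
# Variable names copied from the table above; whether the CSV columns are
# spelled exactly this way is an assumption.
documented_vars <- c("educ", "south", "sex", "exper", "union",
                     "wage", "age", "race", "sector", "married")

# Hypothetical helper (not part of the lab): which documented variables
# are missing from a data frame?
missing_vars <- function(df) setdiff(documented_vars, names(df))

# Usage, once CPS has been loaded with the read.csv() command above:
# dim(CPS)           # the description above implies 534 rows
# str(CPS)           # type of each column
# head(CPS)          # first six rows
# missing_vars(CPS)  # character(0) if every documented variable is present
```

If `missing_vars(CPS)` returns any names, compare them against `names(CPS)` before continuing, since the plotting calls later depend on the column names.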
Exercise
Practice the base R graphics functions by answering the following questions:
- Is there an association between number of years of education and wage?
- Is there an association between age and union membership?
- Do men make more than women?
Use both plots and summary statistics to investigate these questions.
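The general pattern of pairing each plot with a matching numerical summary can be sketched as follows. To avoid giving away the exercise, the sketch uses the built-in `ToothGrowth` data rather than `CPS`; the data set and the particular summaries chosen (`cor`, `tapply`) are assumptions for illustration, and other summaries may suit your questions better.

```r
# ToothGrowth ships with R: len (numeric), supp (factor: OJ, VC), dose (numeric).

# Two quantitative variables: scatterplot + correlation
plot(ToothGrowth$dose, ToothGrowth$len, xlab = "Dose", ylab = "Tooth length")
r <- cor(ToothGrowth$dose, ToothGrowth$len)
r

# Quantitative response by group: side-by-side boxplots + group summaries
boxplot(len ~ supp, data = ToothGrowth,
        xlab = "Supplement", ylab = "Tooth length")
group_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
group_means
tapply(ToothGrowth$len, ToothGrowth$supp, summary)  # five-number summary plus mean
```

With `CPS`, the same two patterns apply: decide which variable is the response and whether the other variable is quantitative or categorical, then pick the plot and summary to match.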
Data Visualization with ggplot2
R has numerous packages (libraries) for data visualization and graphics beyond what is available in base R. One of the more popular packages is ggplot2. Since excellent tutorials on using ggplot2 already exist, we will outsource this portion of the lab to Garrett Grolemund and Hadley Wickham: work through their tutorial on data visualization (Chapter 3 of their book, R for Data Science).
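For a first taste before starting the tutorial, here is a minimal sketch of a ggplot2 scatterplot. It uses the `mpg` data set that ships with ggplot2 and assumes the package is already installed; the variables and axis labels are chosen here for illustration.

```r
# install.packages("ggplot2")  # run once if the package is not installed
library(ggplot2)

# Build the plot in layers: data first, then a geom with an aesthetic mapping.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

p  # printing the object draws the plot
```

Unlike the base R calls, the plot is stored as an object and built up with `+`, so further layers can be added to `p` later.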