STAT 408 - Week 4:
Tidy Data, Data Manipulation, and Processing
First, read in the data set, which is available at: http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv.
baltimore.tow <-
  read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv',
           stringsAsFactors = FALSE)
# Drop the leading "$" from totalPaid and convert the remainder to numeric
baltimore.tow$totalNumeric <-
  as.numeric(substr(baltimore.tow$totalPaid,
                    start = 2, stop = nchar(baltimore.tow$totalPaid)))
str(baltimore.tow)
## 'data.frame': 30263 obs. of 6 variables:
## $ vehicleType : chr "Van" "Car" "Car" "Car" ...
## $ vehicleMake : chr "LEXUS" "Mercedes" "Chysler" "Chevrolet" ...
## $ vehicleModel : chr "" "" "Cirrus" "Cavalier" ...
## $ receivingDateTime: chr "10/24/2010 12:41:00 PM" "04/28/2015 09:27:00 AM" "07/23/2015 07:55:00 AM" "10/23/2010 11:35:00 AM" ...
## $ totalPaid : chr "$322.00" "$130.00" "$280.00" "$1057.00" ...
## $ totalNumeric : num 322 130 280 1057 469 ...
Now use a group-by procedure to compute the average towing cost for each vehicle type.
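As a rough sketch of a group-by average, the example below uses base R's aggregate() on a small made-up data frame. The object tow.toy and its values are assumptions for illustration only; the column names mirror those in baltimore.tow.

```r
# Toy stand-in for baltimore.tow; the values are made up for illustration.
tow.toy <- data.frame(
  vehicleType  = c("Car", "Car", "Van", "Van", "Truck"),
  totalNumeric = c(130, 280, 322, 278, 469),
  stringsAsFactors = FALSE
)

# Mean of totalNumeric within each level of vehicleType.
mean.cost <- aggregate(totalNumeric ~ vehicleType, data = tow.toy, FUN = mean)
print(mean.cost)
```

The same idea could be written with dplyr's group_by() followed by summarise(), if that package is preferred.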
The first goal is to determine how many vehicles were towed for each year in the data set.
Use the substr() function to extract the year and store it in a new variable.
# baltimore.tow$Year <-
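One possible approach, sketched on two sample strings taken from the str() output above: if every record follows the fixed-width layout "MM/DD/YYYY hh:mm:ss AM", the year sits in characters 7 through 10. The names date.time, year.chr, and year.num are hypothetical, and the fixed-width assumption should be checked against the full data set.

```r
# Sample values copied from the receivingDateTime column shown above.
date.time <- c("10/24/2010 12:41:00 PM", "04/28/2015 09:27:00 AM")

# Assuming the layout "MM/DD/YYYY hh:mm:ss AM", the year is characters 7 to 10.
year.chr <- substr(date.time, start = 7, stop = 10)
year.num <- as.numeric(year.chr)

# Count how many records fall in each year.
year.counts <- table(year.num)
print(year.counts)
```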
Now we can extract the year from the pieces stored in pieces.mat.
#baltimore.tow$Year <-
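The construction of pieces.mat is not shown here. As one plausible illustration (an assumption, not necessarily how pieces.mat was built), the date-time string can be split on spaces with strsplit() and stacked into a matrix with one row per record. The names pieces.toy and year.toy are hypothetical.

```r
# Sample values copied from the receivingDateTime column shown above.
date.time <- c("10/24/2010 12:41:00 PM", "04/28/2015 09:27:00 AM")

# Split each string on spaces: date, clock time, AM/PM tag.
pieces <- strsplit(date.time, split = " ")

# Stack the pieces into a matrix, one row per record (assumed layout).
pieces.toy <- matrix(unlist(pieces), ncol = 3, byrow = TRUE)

# The year is characters 7 to 10 of the date column.
year.toy <- substr(pieces.toy[, 1], start = 7, stop = 10)
print(year.toy)
```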
Next we wish to compute how many vehicles were towed in the AM and PM for each type of vehicle.
However, we want to take a close look at the vehicle types in the data set and perhaps create more useful groups.
Spelling errors can be addressed by reassigning vehicles to the correct spelling.
baltimore.tow$vehicleMake[baltimore.tow$vehicleMake ==
'Peterbelt'] <- 'Peterbilt'
baltimore.tow$vehicleMake[baltimore.tow$vehicleMake ==
'Izuzu'] <- 'Isuzu'
baltimore.tow$vehicleMake[baltimore.tow$vehicleMake ==
'Frightliner'] <- 'Freightliner'
baltimore.tow$vehicleMake[baltimore.tow$vehicleMake ==
'Internantional'] <- 'International'
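When the list of corrections grows, a lookup vector is one alternative to repeating the assignment for each misspelling. The sketch below applies the four corrections above to a toy vector; make.toy and fixes are hypothetical names.

```r
# Toy vector of makes; "Ford" is already spelled correctly.
make.toy <- c("Izuzu", "Ford", "Peterbelt", "Frightliner", "Internantional")

# Named vector: misspelling -> correct spelling (from the corrections above).
fixes <- c(Peterbelt      = "Peterbilt",
           Izuzu          = "Isuzu",
           Frightliner    = "Freightliner",
           Internantional = "International")

# Replace only the entries that appear in the lookup vector.
bad <- make.toy %in% names(fixes)
make.toy[bad] <- unname(fixes[make.toy[bad]])
print(make.toy)
```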
Also note that many of the groupings contain misclassified vehicles, but we will not focus on that yet.
First we will delete golf carts, boats, and trailers. There are several ways to do this; consider making a new data frame called balt.tow.small that does not include golf carts, boats, and trailers.
balt.tow.small <-
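A minimal sketch of one way to drop rows by category, using %in% with logical negation on a toy data frame. The labels "Golf Cart", "Boat", and "Trailer" are guesses at how these types are coded; run unique(baltimore.tow$vehicleType) to find the actual values before adapting this.

```r
# Toy stand-in for baltimore.tow; the values are made up for illustration.
tow.toy <- data.frame(
  vehicleType  = c("Car", "Boat", "Van", "Trailer", "Golf Cart", "Car"),
  totalNumeric = c(130, 500, 322, 210, 95, 280),
  stringsAsFactors = FALSE
)

# Hypothetical labels; the real data set may code these types differently.
drop.types <- c("Golf Cart", "Boat", "Trailer")

# Keep only the rows whose vehicleType is NOT in drop.types.
small.toy <- tow.toy[!(tow.toy$vehicleType %in% drop.types), ]
print(small.toy)
```

The subset() function or dplyr's filter() would give the same result.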
Now we need to create a variable for the additional groups below.
One way to create groups is by creating a new variable.
First we need to extract the AM/PM tag from the time-date character string.
As the tag that we are looking for falls at the end of the string, we can use nchar() to find the length of the string.
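The idea can be sketched on two sample strings from the str() output above: nchar() gives each string's length, and substr() then pulls the last two characters. The names date.time, n, and am.pm are hypothetical.

```r
# Sample values copied from the receivingDateTime column shown above.
date.time <- c("10/24/2010 12:41:00 PM", "04/28/2015 09:27:00 AM")

# Length of each string; the AM/PM tag is the last two characters.
n <- nchar(date.time)
am.pm <- substr(date.time, start = n - 1, stop = n)
print(am.pm)
```

Because start and stop are computed per string, this also works if the strings differ in length.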
We could use aggregate, as such:
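A minimal sketch of counting with aggregate(), assuming an AM/PM column has already been created. The toy data frame and the column names am.pm and count are made up for illustration.

```r
# Toy stand-in for the towing data with an AM/PM tag already extracted.
tow.toy <- data.frame(
  vehicleType = c("Car", "Car", "Car", "Van", "Van"),
  am.pm       = c("AM", "PM", "PM", "AM", "AM"),
  stringsAsFactors = FALSE
)

# A column of ones, so that summing within groups yields counts.
tow.toy$count <- 1

# Number of tows for each combination of vehicle type and AM/PM.
counts <- aggregate(count ~ vehicleType + am.pm, data = tow.toy, FUN = sum)
print(counts)
```

Note that aggregate() reports only the combinations that occur in the data; table(tow.toy$vehicleType, tow.toy$am.pm) would also show the zero counts.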