STAT 408 - Statistical Learning
Predictive Modeling
library(dplyr)  # provides %>% and filter(), used below
titanic <- read.csv(
'http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/titanic.csv')
set.seed(11142017)
titanic <- titanic %>% filter(!is.na(Age))
num.pass <- nrow(titanic)
test.ids <- base::sample(1:num.pass, size=round(num.pass*.3))
test.titanic <- titanic[test.ids,]
train.titanic <- titanic[-test.ids,]
dim(titanic)
## [1] 714 12
dim(test.titanic)
## [1] 214 12
dim(train.titanic)
## [1] 500 12
See if you can reduce the classification error of the model below.
glm.titanic <- glm(Survived ~ Age, data=train.titanic, family = binomial)
pred.survived <- round(predict(glm.titanic, test.titanic, type='response'))
Class.Error <- mean(test.titanic$Survived != pred.survived)
The logistic regression model using only age misclassifies 40% of the passengers in the test set.
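One possible starting point is sketched below. It wraps the error calculation in a small helper so that competing models can be compared on the same test set. The helper name class_error and the model name glm.more are hypothetical, and the columns Sex and Pclass are assumed to exist in the titanic data (they are not shown above). No result is reported here; run it to see whether the error actually drops.

```r
# Hypothetical helper: test-set misclassification rate for a binomial glm.
# Predicted probabilities above 0.5 are classified as 1, otherwise as 0.
class_error <- function(model, newdata, truth) {
  pred.prob <- predict(model, newdata = newdata, type = "response")
  mean(truth != as.integer(pred.prob > 0.5))
}

# Only runs when the train/test split from above is in the workspace.
if (exists("train.titanic") && exists("test.titanic")) {
  # Assumption: the data set has Sex and Pclass columns.
  glm.more <- glm(Survived ~ Age + Sex + factor(Pclass),
                  data = train.titanic, family = binomial)
  print(class_error(glm.more, test.titanic, test.titanic$Survived))
}
```

Scoring every candidate with the same function on the same held-out rows keeps the comparison fair. Other predictors or interactions can be swapped into the formula in the same way.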