--- title: 'Case Study: Challenger Explosion' author: "Stacey Hancock" date: "1/14/2020" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Load required libraries: ```{r message = FALSE} library(tidyverse) ``` In this case study, we will examine part of the analysis in Dalal et al. (1989) on O-ring data on shuttle launches prior to the 1986 Challenger shuttle explosion. Variables in this data set are: * Flight = Flight number * Date = Date of launch * Field = Number of primary field-joint O-rings (out of 6) that experienced erosion or blowby * Temp = Joint temperature at time of launch * Pres = Field-joint leak-check pressure Note that the Challenger launched at a temperature of 31 degrees F. Copy and paste data from: ```{r} Orings <- structure(list( Flight = structure(c(1L, 2L, 3L, 8L, 17L, 22L, 23L, 24L, 4L, 5L, 6L, 7L, 9L, 11L, 12L, 10L, 14L, 13L, 15L, 16L, 18L, 19L, 20L), .Label = c("1", "2", "3", "41-B", "41-C", "41-D", "41-G", "5", "51-A", "51-B", "51-C", "51-D", "51-F", "51-G", "51-I", "51-J", "6", "61-A", "61-B", "61-C", "61-I", "7", "8", "9"), class = "factor"), Date = structure(c(8L, 22L, 5L, 21L, 6L, 11L, 15L, 24L, 4L, 7L, 16L, 18L, 20L, 2L, 9L, 10L, 12L, 13L, 14L, 17L, 19L, 23L, 1L), .Label = c("01/12/86", "01/24/85", "01/28/86", "02/03/84", "03/22/82", "04/04/83", "04/06/84", "04/12/81", "04/12/85", "04/29/85", "06/16/83", "06/17/85", "07/29/85", "08/27/85", "08/30/83", "08/30/84", "10/03/85", "10/05/84", "10/30/85", "11/08/84", "11/11/82", "11/12/81", "11/26/85", "11/28/83"), class = "factor"), Field = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 1), Temp = c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58), Pres = c(50, 50, 50, 50, 50, 50, 100, 100, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200)), .Names = c("Flight", "Date", "Field", "Temp", "Pres"), row.names = c(NA, 23L), class = "data.frame") ``` Let's replicate the statistical analysis of scientists before the Challenger launch. First, they only looked at flights that experienced some thermal distress. Let's subset the data: ```{r} Orings.fail <- Orings %>% filter(Field != 0) ``` They examined a plot of these data, and then fit a simple linear regression. ```{r} Orings.fail %>% ggplot(aes(x = Temp, y = Field)) + geom_point() + ylim(0,6) + labs(x = "Temperature (F)", y = "Number of O-ring Failures", title = "Launches with at least one O-ring failure") + geom_smooth(method = "lm", se = FALSE) mod1 <- lm(Field ~ Temp, data = Orings.fail) summary(mod1) ``` With a p-value of 0.568, it was concluded that temperature had no significant effect on the predicted number of O-ring failures.... what is wrong with this analysis? * Extrapolated outside of available data * Not normal data - p-value not valid * Not linear * Only looked at flights with failures! Let's look at an appropriate analysis, including all of the past launches (not just those that experienced O-ring failure!). ```{r} Orings %>% ggplot(aes(x = Temp, y = Field)) + geom_point() + ylim(0,6) + labs(x = "Temperature (F)", y = "Number of O-ring Failures", title = "All past launches") + geom_smooth(method = "lm", se = FALSE) mod2 <- lm(Field ~ Temp, data = Orings) summary(mod2) ``` Now, the p-value for the slope of a simple linear regression is 0.00532 -- a very different conclusion! But is simple linear regression appropriate? (Hint: Is the response variable normally distributed?) For a binomial response variable, we use logistic regression. ```{r} mod3 <- glm(cbind(Field, 6) ~ Temp, family = binomial, data = Orings) summary(mod3) ``` Logistic regression models a transformation of the probability of failure. ```{r} Orings %>% ggplot(aes(x = Temp, y = Field/6)) + geom_point() + ylim(0,1) + labs(x = "Temperature (F)", y = "Probability of O-ring Failure", title = "All past launches") + stat_function(fun = function(x){ exp(mod3$coef[1] + mod3$coef[2]*x)/(1+exp(mod3$coef[1] + mod3$coef[2]*x)) }) ``` So what was the predicted probability an O-ring fails at a temperature of 31 degrees F? ```{r} pred.prob <- predict(mod3, newdata = data.frame(Temp = 31), type = "response") pred.prob ``` Thus, the probability of _at least one_ O-ring failing is ```{r} sum(dbinom(1:6, 6, pred.prob)) ```