---
title: 'Case Study: Challenger Explosion'
author: "Stacey Hancock"
date: "1/14/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Load required libraries:
```{r message = FALSE}
library(tidyverse)
```
In this case study, we will examine part of the analysis in
Dalal et al. (1989) on O-ring data on shuttle launches
prior to the 1986 Challenger shuttle explosion. Variables in
this data set are:

* Flight = Flight number
* Date = Date of launch
* Field = Number of primary field-joint O-rings (out of 6) that experienced erosion or blowby
* Temp = Joint temperature at time of launch
* Pres = Field-joint leak-check pressure

Note that the Challenger launched at a temperature of 31 degrees F.

Copy and paste data from:
<http://www.math.montana.edu/shancock/data/Challenger_Data.html>

```{r}
Orings <- structure(list(
  Flight = structure(c(1L, 2L, 3L, 8L, 17L, 22L, 
    23L, 24L, 4L, 5L, 6L, 7L, 9L, 11L, 12L, 10L, 14L, 13L, 15L, 16L, 
    18L, 19L, 20L), .Label = c("1", "2", "3", "41-B", "41-C", "41-D", 
    "41-G", "5", "51-A", "51-B", "51-C", "51-D", "51-F", "51-G", 
    "51-I", "51-J", "6", "61-A", "61-B", "61-C", "61-I", "7", "8", 
    "9"), class = "factor"), 
  Date = structure(c(8L, 22L, 5L, 21L, 
    6L, 11L, 15L, 24L, 4L, 7L, 16L, 18L, 20L, 2L, 9L, 10L, 12L, 13L, 
    14L, 17L, 19L, 23L, 1L), .Label = c("01/12/86", "01/24/85", "01/28/86", 
    "02/03/84", "03/22/82", "04/04/83", "04/06/84", "04/12/81", "04/12/85", 
    "04/29/85", "06/16/83", "06/17/85", "07/29/85", "08/27/85", "08/30/83", 
    "08/30/84", "10/03/85", "10/05/84", "10/30/85", "11/08/84", "11/11/82", 
    "11/12/81", "11/26/85", "11/28/83"), class = "factor"), 
  Field = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 1), 
  Temp = c(66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78, 
    67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58), 
  Pres = c(50, 50, 50, 50, 50, 50, 100, 100, 200, 200, 200, 
    200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200)), 
  .Names = c("Flight", "Date", "Field", "Temp", "Pres"), 
  row.names = c(NA, 23L), class = "data.frame")
```

Let's replicate the statistical analysis of
scientists before the Challenger launch. First,
they only looked at flights that experienced some thermal distress.
Let's subset the data:
```{r}
Orings.fail <- Orings %>%
  filter(Field != 0)
```

They examined a plot of these data, and then fit a simple linear
regression.
```{r}
Orings.fail %>% ggplot(aes(x = Temp, y = Field)) +
  geom_point() + 
  ylim(0,6) +
  labs(x = "Temperature (F)",
       y = "Number of O-ring Failures",
       title = "Launches with at least one O-ring failure") +
  geom_smooth(method = "lm", se = FALSE)

mod1 <- lm(Field ~ Temp, data = Orings.fail)
summary(mod1)
```
With a p-value of 0.568, it was concluded that temperature had no significant effect
on the predicted number of O-ring failures.... what is wrong with this analysis?

* Extrapolated outside of available data
* Not normal data - p-value not valid
* Not linear
* Only looked at flights with failures!


Let's look at an appropriate analysis, including
all of the past launches (not just those that experienced
O-ring failure!).
```{r}
Orings %>% ggplot(aes(x = Temp, y = Field)) +
  geom_point() + 
  ylim(0,6) +
  labs(x = "Temperature (F)",
       y = "Number of O-ring Failures",
       title = "All past launches") +
  geom_smooth(method = "lm", se = FALSE)

mod2 <- lm(Field ~ Temp, data = Orings)
summary(mod2)
```

Now, the p-value for the slope of a simple linear regression
is 0.00532 -- a very different conclusion!

But is simple linear regression appropriate? (Hint: Is the
response variable normally distributed?)

For a binomial response variable, we use logistic regression.
```{r}
mod3 <- glm(cbind(Field, 6) ~ Temp, family = binomial, data = Orings)
summary(mod3)
```

Logistic regression models a transformation of the
probability of failure.
```{r}
Orings %>% ggplot(aes(x = Temp, y = Field/6)) +
  geom_point() + 
  ylim(0,1) +
  labs(x = "Temperature (F)",
       y = "Probability of O-ring Failure",
       title = "All past launches") +
    stat_function(fun = function(x){
        exp(mod3$coef[1] + mod3$coef[2]*x)/(1+exp(mod3$coef[1] + mod3$coef[2]*x))
    })
```
So what was the predicted probability an O-ring fails
at a temperature of 31 degrees F?
```{r}
pred.prob <- predict(mod3, newdata = data.frame(Temp = 31), type = "response")
pred.prob
```
Thus, the probability of _at least one_ O-ring failing is
```{r}
sum(dbinom(1:6, 6, pred.prob))
```