--- title: "STAT 491 - Lecture 6" output: pdf_document --- ```{r setup, include=FALSE} library(knitr) knitr::opts_chunk$set(echo = TRUE) set.seed(01152018) ``` # Ch.6 Inferring a Binomial Probability via Exact Mathematical Analysis Again we focus on binary outcomes, or coin flips. A motivating example that will show up as a lab exercise uses a dataset from Royle and Dorazio's __Hierarchical Modeling and Inference in Ecology__ related to the occurence of a bird (willow tit) in a sampled region. \vfill ## Likelihood for Bernoulli Distribution We saw the Bernoulli distribution in the previous section, recall that the probability distribution function can be written as: \vfill where the outcome of a single binary event is $y = \{0,1\}$. - Using this equation what is the probability that $y=1$? \vfill - What about the probability of $y=0$ \vfill With the probability distribution function, or as it is sometimes callled the sampling function, the function is about the data $y$ conditioned upon the the parameter(s) $\theta$. \vfill Points about a likelihood function: - in this case, \vfill - the likelihood \vfill - in classical statistics, \vfill For notational purposes, I'll usually denote the likelihood function as $\mathcal{L}(\theta|y)$, but unfortunately it can also be denoted as $p(y|\theta)$. \newpage Now, consider a series of $N$ binary outcomes, $\{y_1, y_2, \dots, y_n\}$, then we are interested in the joint distribution $p(\{y_1, y_2, \dots, y_n \}| \theta)$ \begin{eqnarray*} p(\{y_1, y_2, \dots, y_n \}| \theta) &=& \prod_i p(y_i|\theta) \; \; \text{ by independent trials}\\ &=& \prod_i \theta^{y_i} (1-\theta)^{(1-y_i)}\\ &=& \theta^{\sum_i y_i} (1-\theta)^{\sum_i (1-y_i)}\\ &=& \theta^z (1-\theta)^{N-z} \end{eqnarray*} where $z=\sum_i y_i$. \vfill ## Properties of a Prior Distribution Now given the likelihood of the data $\mathcal{L}(\theta|\{y_1, y_2, \dots, y_n \})$, a prior distribution, $p(\theta)$ for the parameter $\theta$ is required. \vfill Properties of a prior distribution: - a \vfill - it \vfill - Furthermore, \vfill ### Properties of the Beta Distribution We briefly saw the beta distribution in the previous chapter, talking about a prior distribution on the probability of rolling a 6. - In essence this distribution is \vfill - The mathematical foundation \vfill - The Beta distribution \vfill \newpage A Beta distribution for $\theta$ has the probability density function: \vfill \vfill where $a$ and $b$ are parameters in this distribution and $\Gamma()$ is the gamma function ($\Gamma(a) = \int_0^\infty t^{(a-1)} exp(-t) dt$). 
- What is $\int \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \theta^{(a-1)} (1- \theta)^{(b-1)} d\theta$ =
\vfill
- What is $\int \theta^{(a-1)} (1- \theta)^{(b-1)} d\theta$ =
\vfill
- What is
\begin{eqnarray*}
\int \theta^{(a-1)} (1-\theta)^{(b-1)} \theta^z (1-\theta)^{(N-z)} d\theta &=&
\end{eqnarray*}
\vfill
\vfill
\vfill

\newpage

### Intuition about $a$ and $b$ in the Beta Distribution

- The mean of a $Beta(\theta|a,b)$
\vfill
- $a+b$ is known as the concentration,
\vfill

*Exercise*: in R, create plots for the following five Beta distributions, and try a few more to get a sense of what changing $a$, $b$, and $a+b$ does to the distribution:

- Beta(1,1)
- Beta(1,10)
- Beta(10,1)
- Beta(5,5)
- Beta(10,10)

```{r, fig.width=5, fig.align='center'}
a <- 1
b <- 1
theta <- seq(0, 1, by = .01)
plot(theta, dbeta(theta, a, b), ylim = c(0, 10), type = 'l',
     ylab = expression(paste('p(', theta, ')')), xlab = expression(theta))
```

*Q:* summarize:

- how does $a$ affect the shape of the distribution?
\vfill
- how does $b$ affect the shape of the distribution?
\vfill
- how does $a+b$ affect the shape of the distribution?
\vfill

\newpage

We will see that:

- $a$ corresponds
\vfill
- $b$ corresponds
\vfill
- $a+b$ corresponds
\vfill
- as $a+b$ gets large,
\vfill
- the larger $a$ is relative to $b$,
\vfill
- the larger $b$ is relative to $a$,
\vfill

### Posterior Distribution

The goal of the analysis is to infer $p(\theta|z,N)$; in other words, we want to learn about the parameter $\theta$ from a series of $N$ trials with $z$ successes. Assume we use a Beta($a$,$b$) distribution as a prior on $\theta$.

\newpage

\begin{eqnarray*}
p(\theta|z,N) &=& \frac{p(z,N|\theta) p(\theta)}{p(z,N)}\\
&=& \frac{p(z,N|\theta) p(\theta)}{\int p(z,N|\theta) p(\theta) d\theta}\\
&=& \frac{\theta^z (1-\theta)^{(N-z)} \times \left(\Gamma(a+b) / \Gamma(a)\Gamma(b)\right)\theta^{(a-1)} (1-\theta)^{(b-1)}}{\int\theta^z (1-\theta)^{(N-z)} \times \left(\Gamma(a+b) / \Gamma(a)\Gamma(b)\right)\theta^{(a-1)} (1-\theta)^{(b-1)} d\theta}
\end{eqnarray*}
\vfill
\vfill
\vfill

#### Posterior is a compromise of likelihood and prior

For more intuition, consider the posterior mean:
\begin{eqnarray*}
\frac{z+a}{N+a+b} &=& \frac{z}{N+a+b} + \frac{a}{N+a+b} \\
&=& \frac{N}{N}\frac{z}{N+a+b} + \frac{a+b}{a+b}\frac{a}{N+a+b}
\end{eqnarray*}
\vfill
\vfill
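
To make this compromise concrete, here is a minimal R sketch (the values $N = 20$, $z = 14$, and a Beta(2, 2) prior are arbitrary illustrative choices, not from the lecture) verifying that the posterior mean is a weighted average of the sample proportion and the prior mean:

```{r}
# arbitrary illustrative values: N trials with z successes, Beta(a, b) prior
N <- 20; z <- 14
a <- 2; b <- 2

# posterior mean of the conjugate Beta(z + a, N - z + b) posterior
post_mean <- (z + a) / (N + a + b)

# the same quantity as a compromise: the weight on the data grows with N,
# the weight on the prior grows with the concentration a + b
w_data  <- N / (N + a + b)
w_prior <- (a + b) / (N + a + b)
c(post_mean, w_data * (z / N) + w_prior * (a / (a + b)))
```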