--- title: "STAT 491 - Lecture 6" output: pdf_document --- ```{r setup, include=FALSE} library(knitr) knitr::opts_chunk$set(echo = TRUE) set.seed(01152018) ``` # Ch.6 Inferring a Binomial Probability via Exact Mathematical Analysis Again we focus on binary outcomes, or coin flips. A motivating example that will show up as a lab exercise uses a dataset from Royle and Dorazio's __Hierarchical Modeling and Inference in Ecology__ related to the occurence of a bird (willow tit) in a sampled region. \vfill ## Likelihood for Bernoulli Distribution We saw the Bernoulli distribution in the previous section, recall that the probability distribution function can be written as: \vfill where the outcome of a single binary event is $y = \{0,1\}$. - Using this equation what is the probability that $y=1$? \vfill - What about the probability of $y=0$ \vfill With the probability distribution function, or as it is sometimes callled the sampling function, the function is about the data $y$ conditioned upon the the parameter(s) $\theta$. \vfill Points about a likelihood function: - in this case, \vfill - the likelihood \vfill - in classical statistics, \vfill For notational purposes, I'll usually denote the likelihood function as $\mathcal{L}(\theta|y)$, but unfortunately it can also be denoted as $p(y|\theta)$. \newpage Now, consider a series of $N$ binary outcomes, $\{y_1, y_2, \dots, y_n\}$, then we are interested in the joint distribution $p(\{y_1, y_2, \dots, y_n \}| \theta)$ \begin{eqnarray*} p(\{y_1, y_2, \dots, y_n \}| \theta) &=& \prod_i p(y_i|\theta) \; \; \text{ by independent trials}\\ &=& \prod_i \theta^{y_i} (1-\theta)^{(1-y_i)}\\ &=& \theta^{\sum_i y_i} (1-\theta)^{\sum_i (1-y_i)}\\ &=& \theta^z (1-\theta)^{N-z} \end{eqnarray*} where $z=\sum_i y_i$. \vfill ## Properties of a Prior Distribution Now given the likelihood of the data $\mathcal{L}(\theta|\{y_1, y_2, \dots, y_n \})$, a prior distribution, $p(\theta)$ for the parameter $\theta$ is required. \vfill Properties of a prior distribution: - a \vfill - it \vfill - Furthermore, \vfill ### Properties of the Beta Distribution We briefly saw the beta distribution in the previous chapter, talking about a prior distribution on the probability of rolling a 6. - In essence this distribution is \vfill - The mathematical foundation \vfill - The Beta distribution \vfill \newpage A Beta distribution for $\theta$ has the probability density function: \vfill \vfill where $a$ and $b$ are parameters in this distribution and $\Gamma()$ is the gamma function ($\Gamma(a) = \int_0^\infty t^{(a-1)} exp(-t) dt$). 
- What is $\int \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \theta^{(a-1)} (1- \theta)^{(b-1)} d\theta$ =
\vfill
- What is $\int \theta^{(a-1)} (1- \theta)^{(b-1)} d\theta$ =
\vfill
- What is
\begin{eqnarray*}
\int \theta^{(a-1)} (1-\theta)^{(b-1)} \theta^z (1-\theta)^{(N-z)} d\theta &=&
\end{eqnarray*}
\vfill
\vfill
\vfill

\newpage

### Intuition about $a$ and $b$ in the Beta Distribution

- The mean of a $Beta(\theta|a,b)$
\vfill
- $a+b$ is known as the concentration,
\vfill

*Exercise*: in R, create plots for the following five Beta distributions, and try a few more to get a sense of what changing $a$, $b$, and $a+b$ does to the distribution:

- Beta(1,1)
- Beta(1,10)
- Beta(10,1)
- Beta(5,5)
- Beta(10,10)

```{r, fig.width=5, fig.align='center'}
a <- 1
b <- 1
theta <- seq(0, 1, by = .01)
plot(theta, dbeta(theta, a, b), ylim = c(0, 10), type = 'l',
     ylab = expression(paste('p(', theta, ')')), xlab = expression(theta))
```

*Q:* summarize:

- how does $a$ affect the shape of the distribution?
\vfill
- how does $b$ affect the shape of the distribution?
\vfill
- how does $a+b$ affect the shape of the distribution?
\vfill

\newpage

We will see that:

- $a$ corresponds
\vfill
- $b$ corresponds
\vfill
- $a+b$ corresponds
\vfill
- as $a+b$ gets large,
\vfill
- the larger $a$ is relative to $b$,
\vfill
- the larger $b$ is relative to $a$,
\vfill

### Posterior Distribution

The goal of the analysis is to infer $p(\theta|z,N)$; in other words, we want to learn about the parameter $\theta$ from a series of $N$ trials with $z$ successes. Assume we use a Beta($a$,$b$) distribution as a prior on $\theta$.

\newpage

\begin{eqnarray*}
p(\theta|z,N) &=& \frac{p(z,N|\theta) p(\theta)}{p(z,N)}\\
&=& \frac{p(z,N|\theta) p(\theta)}{\int p(z,N|\theta) p(\theta) d\theta}\\
&=& \frac{\theta^z (1-\theta)^{(N-z)} \times \left(\Gamma(a+b) / \Gamma(a)\Gamma(b)\right)\theta^{(a-1)} (1-\theta)^{(b-1)}}{\int\theta^z (1-\theta)^{(N-z)} \times \left(\Gamma(a+b) / \Gamma(a)\Gamma(b)\right)\theta^{(a-1)} (1-\theta)^{(b-1)} d\theta}
\end{eqnarray*}
\vfill
\vfill
\vfill

#### Posterior is a compromise of likelihood and prior

For more intuition, consider the posterior mean:
\begin{eqnarray*}
\frac{z+a}{N+a+b} &=& \frac{z}{N+a+b} + \frac{a}{N+a+b} \\
&=& \frac{N}{N}\frac{z}{N+a+b} + \frac{a+b}{a+b}\frac{a}{N+a+b}
\end{eqnarray*}
\vfill
\vfill
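
To make this compromise concrete, here is a minimal R sketch (the values $N = 20$, $z = 14$, and a Beta(2, 2) prior are arbitrary illustrative choices, not from the lecture) verifying that the posterior mean is a weighted average of the sample proportion and the prior mean:

```{r}
# arbitrary illustrative values: N trials with z successes, Beta(a, b) prior
N <- 20; z <- 14
a <- 2; b <- 2

# posterior mean of the conjugate Beta(z + a, N - z + b) posterior
post_mean <- (z + a) / (N + a + b)

# the same quantity as a compromise: the weight on the data grows with N,
# the weight on the prior grows with the concentration a + b
w_data  <- N / (N + a + b)
w_prior <- (a + b) / (N + a + b)
c(post_mean, w_data * (z / N) + w_prior * (a / (a + b)))
```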