Write Stan code to estimate the parameters of these models
A fun question
Breaking statistical models down
response = deterministic component + random component
This section / lecture is based heavily on Kery and Schaub 2012; Kery and Royle 2016
The random (noise) component
Hallmark of statistical models: they must account for randomness
Check out ?ddist in R, and replace dist by any of the following: pois, binom, norm, multinom, exp, and unif
The first letter determines what you get: d for the density (or mass) function, p for the distribution function (CDF), q for the quantiles, and r for random draws from these distributions
Note R calls mass functions density functions (e.g., dbinom())
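For example, a quick sketch of the four binomial functions (the values here are arbitrary, for illustration only):

```r
dbinom(3, size = 10, prob = 0.5)       # d: probability mass of exactly 3 successes in 10 trials
pbinom(3, size = 10, prob = 0.5)       # p: cumulative probability of 3 or fewer successes
qbinom(0.975, size = 10, prob = 0.5)   # q: smallest count with cumulative probability >= 0.975
rbinom(5, size = 10, prob = 0.5)       # r: five random draws from Binomial(10, 0.5)
```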
Kery and Schaub 2012; Kery and Royle 2016
The deterministic (signal) component
The signal component of the model contains the predictable parts of a response or the mean structure of a model
Often the mean structure is described by a linear model, although nonlinear models can also be used (Seber and Wild 2003)
A linear model is just one specific way to describe how we imagine our explanatory variables influencing our response
This model is linear in the parameters and does not need to represent a straight line when plotted (e.g., \(y_{i} = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \varepsilon_{i}\) is linear in the parameters but curved in \(x\))
t-test, simple and multiple linear regressions, ANOVA, ANCOVA, and many mixed models are all linear models (see the sketch below for the t-test case)
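As a quick sketch of the first case (with made-up data), a two-sample t-test and a linear model with a group factor give the same answer:

```r
set.seed(1)
group <- factor(rep(c("a", "b"), each = 20))        # two groups of 20 units
y <- rnorm(40, mean = ifelse(group == "a", 0, 1))   # group "b" shifted up by 1

t.test(y ~ group, var.equal = TRUE)   # classic two-sample t-test
summary(lm(y ~ group))                # same t statistic and p-value for the group effect
```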
Kery and Schaub 2012; Kery and Royle 2016
A brief illustration of an analysis of covariance (ANCOVA)
\(y_{i}\) is a response of unit (data point, individual, row) \(i\), \(X_{i}\) is the value of the continuous explanatory variable \(x\) for unit \(i\)
Factor \(A\) codes for the group membership of each unit, with indices \(g\) for groups 1, 2, or 3
Two kinds of parameters in the mean relationship: a group-specific intercept \(\beta_{g(i)}\) and a common slope \(\beta_{1}\)
The random (noise) part of the model consists of the part of the response which we cannot explain using our linear combination of explanatory variables
Represented by residuals \(\varepsilon_{i}\)
Assume they come from a normal distribution with mean zero and common variance \(\sigma^{2}\)
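Putting these pieces together, the full model (written here in the means parameterization) is:

\[
y_{i} = \beta_{g(i)} + \beta_{1} X_{i} + \varepsilon_{i}, \qquad \varepsilon_{i} \sim \text{Normal}(0, \sigma^{2})
\]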
How many parameters in total does this model have?
The value of the linear predictor for the first data point is given by \(1 \cdot \beta_{g = 1} + 0 \cdot \beta_{g = 2} + 0 \cdot \beta_{g = 3} + 1 \cdot \beta_{1}\)
A trick for learning about the design matrix in R: model.matrix()
```r
# Means parameterization
X_ij <- model.matrix(~ A - 1 + X)
X_ij  # rows = i, columns = j
```
See also the effects (treatment contrast) parameterization: model.matrix(~ A + X)
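A minimal sketch with made-up data, showing the two parameterizations side by side:

```r
set.seed(1)
A <- factor(rep(c("g1", "g2", "g3"), each = 2))   # three groups of two units
X <- round(runif(6, 0, 10), 1)                    # a continuous covariate

model.matrix(~ A - 1 + X)   # means parameterization: one column per group, plus X
model.matrix(~ A + X)       # effects parameterization: intercept + group contrasts, plus X
```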
For more information on linear models, see Kery 2010
The ANCOVA in Stan
The code looks very similar to the algebraic specification of this linear model
Note I am picking vague-ish priors, but the specific priors aren’t the point of this lesson
The ANCOVA.stan file
```stan
data {
  int<lower=0> n_obs;           // number of observations = i
  int<lower=0> n_col;           // columns of design matrix = j
  vector[n_obs] y_obs;          // observed data
  matrix[n_obs, n_col] X_ij;    // design matrix: model.matrix(~ A - 1 + X)
}
parameters {
  vector[n_col] b_j;            // one parameter for each column of X_ij
  real<lower=0> sig;            // sigma must be positive
}
model {
  vector[n_obs] y_pred;         // container for mean response
  b_j ~ normal(0, 100);         // priors for b_j
  sig ~ normal(0, 100);         // prior for sig
  y_pred = X_ij * b_j;          // linear algebra sneakery
  y_obs ~ normal(y_pred, sig);  // likelihood
}
```
And the corresponding R code:
```r
library(cmdstanr)
mod <- cmdstan_model("week3/soln_files/ANCOVA.stan")  # compile

# names in tagged list correspond to the data block in the Stan program
X_ij <- model.matrix(~ A - 1 + X)
stan_data <- list(
  n_obs = nrow(X_ij),
  n_col = ncol(X_ij),
  y_obs = my_df$y,
  X_ij  = as.matrix(X_ij)
)

# write a function to get starting values
inits <- function() {
  list(
    b_j = jitter(rep(0, ncol(X_ij)), amount = 0.5),
    sig = jitter(10, 1)
  )
}

fit <- mod$sample(
  data = stan_data,
  init = inits,
  seed = 1,             # ensure simulations are reproducible
  chains = 4,           # multiple chains
  iter_warmup = 1000,   # how long to warm up the chains
  iter_sampling = 1000, # how many samples after warmup
  parallel_chains = 4   # run them in parallel?
)
```
Note this is all in the solution file for this week
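Once sampling finishes, one quick way to inspect the posterior is cmdstanr's built-in summary method:

```r
fit$summary()                    # posterior mean, median, quantiles, rhat, and ESS per parameter
fit$summary(variables = "b_j")   # or restrict the summary to the regression coefficients
```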
Break
Now we will move into Generalized Linear Models (GLMs), where all of the information we just learned still applies
The primary difference is that GLMs allow us to model non-normal response variables in a manner similar to what we just did with the ANCOVA
We do this via a link function
Generalized Linear Models (GLMs)
The GLM is a flexible generalization of linear regression, developed by Nelder and Wedderburn in 1972 while working together at the Rothamsted Experimental Station in the U.K.
GLMs extend the concept of a linear effect of covariates to response variables for which a distribution other than the normal is assumed
e.g., Poisson, binomial/Bernoulli, gamma, exponential, etc.
Linear effect of covariates is expressed not for the expected response directly, but rather for a transformation of the expected response (McCullagh and Nelder 1989)
Unifies various statistical methods, and thus fundamental to much contemporary statistical modeling
Hilbe et al. 2017; Gelman and Hill 2007; Kery and Royle 2016
Generalized Linear Models (GLMs)
The linear effect of covariates is expressed not for the expected response directly, but for a transformation of the expected response (Kery and Royle 2016)
This transformation is called a link function
We generally describe a GLM for a response \(y_{i}\) in terms of three things:
A random component (i.e., the likelihood)
A link function (i.e., a mathematical transformation)
A systematic component (i.e., the linear predictor)
Hilbe et al. 2017; Gelman and Hill 2007; Kery and Royle 2016
The three parts of a GLM
Random component of the response: a statistical distribution \(f\) with parameter(s) \(\theta\):
\[
y_{i} \sim f(\theta)
\]
A link function \(g\), which is applied to the expected response \(E(y_{i}) = \mu_{i}\), with \(\eta_{i}\) known as the linear predictor:
\[
g(E(y_{i})) = g(\mu_{i}) = \eta_{i}
\]
Systematic part of the response (the mean structure of the model, containing a linear model), e.g.:
\[
\eta_{i} = \beta_{0} + \beta_{1} x_{i} + \ldots
\]
In words: a response \(y_{i}\) follows a distribution \(f\) with parameter(s) \(\theta\), and a transformation \(g\) of the expected response is modeled as a linear function of covariates (made concrete in the sketch below)
This is exactly how we will code them in Stan, which is part of what makes the Bayesian framework so useful for learning GLMs
Many exciting ecological models can be viewed as coupled GLMs
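To make the three parts concrete before we write any Stan, here is a minimal R sketch that simulates from a Poisson GLM with a log link and then recovers the parameters (all names and values are made up for illustration):

```r
set.seed(1)
n <- 100
x <- runif(n, -1, 1)      # a covariate

eta <- 0.5 + 1.2 * x      # systematic part: the linear predictor eta_i
mu  <- exp(eta)           # link function: log(mu_i) = eta_i, so mu_i = exp(eta_i)
y   <- rpois(n, mu)       # random part: y_i ~ Poisson(mu_i)

glm(y ~ x, family = poisson)   # log is the default link; estimates should be near 0.5 and 1.2
```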
GLMs are defined for all members of statistical distributions belonging to the so-called “exponential family” (McCullagh and Nelder 1989; Dobson and Barnett 2008)
normal, Poisson, binomial/Bernoulli, multinomial, beta, gamma, lognormal, exponential, and Dirichlet
Principles of linear modeling can be carried over to models other than normal linear regression 😎
The mean and variance of the Poisson distribution are equal
This assumption almost never holds in real data, and over-dispersion usually requires a negative binomial or another model structure (see the sketch below)
Causes of over/under dispersion are complex
see Chapter 3, Kery and Royle 2016; Zuur et al. 2017
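A small sketch of what over-dispersion looks like, comparing Poisson draws to negative binomial draws with the same mean (the values are arbitrary):

```r
set.seed(1)
y_pois <- rpois(1e5, lambda = 5)           # Poisson: variance equals the mean
y_nb   <- rnbinom(1e5, mu = 5, size = 2)   # negative binomial: extra-Poisson variation

c(mean(y_pois), var(y_pois))   # both approximately 5
c(mean(y_nb), var(y_nb))       # mean ~ 5, but variance ~ 5 + 5^2/2 = 17.5
```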
Testing for over- or under-dispersion: Poisson GLM
mean = variance = \(\lambda\)
Another way: Bayesian p-value
Idea: define a test statistic and compare the posterior distribution of that statistic for the original data to its posterior distribution for “replicate” datasets (a sketch follows below)
Note we could also use graphical posterior predictive checks
Gelman et al. 2006; see Chapter 3, Kery and Royle 2016; Zuur et al. 2017
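A minimal R sketch of the Bayesian p-value idea, assuming we already have posterior draws of each observation's Poisson mean (the object lambda_draws and the chi-square discrepancy used here are illustrative choices, not the only option):

```r
# Hypothetical inputs: y = observed counts (length n);
# lambda_draws = S x n matrix of posterior draws of each observation's Poisson mean
bayes_p <- function(y, lambda_draws) {
  S <- nrow(lambda_draws)
  T_obs <- T_rep <- numeric(S)
  for (s in 1:S) {
    lam <- lambda_draws[s, ]
    y_rep <- rpois(length(y), lam)          # one "replicate" dataset per posterior draw
    T_obs[s] <- sum((y - lam)^2 / lam)      # chi-square discrepancy for the observed data
    T_rep[s] <- sum((y_rep - lam)^2 / lam)  # same statistic for the replicate data
  }
  mean(T_rep >= T_obs)  # ~0.5 is consistent; near 0 or 1 signals lack of fit (e.g., over-dispersion)
}
```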