Even more hierarchical models

FW 891

Christopher Cahill

25 October 2023

Purpose

  • Today we introduce a powerful extension of mixed effects models
    • Random slopes (aka varying effects models)
    • Repent for our earlier sins
  • Adventures in covariance à la McElreath (2023)
    • Cover some math necessary for working with covariance matrices
  • Simulate a varying effects problem
  • Develop both centered and noncentered varying effects models in Stan

Some references

  • This lecture draws heavily on McElreath (2023), and much of the code and analysis is adapted from that text
    • If you want more information, see the “Adventures in Covariance” chapter of that book
  • See also Gelman and Hill (2007)
    • Specifically chapter 13

Thinking about variability in ecological systems

Three key takeaways

Point #1

  • Variability in both intercepts and slopes among replicates

Point #2

  • Slopes get steeper as intercepts get bigger

Point #3

Simpson’s paradox

Group question

  • What is a question in your field of study that might show a similar pattern?

Varying effects

  • Generalization of standard multilevel regression
    • Specifically, models that allow slopes and intercepts to vary by group
  • There are several ways to write this; here is one for the model we just visualized

Varying effects maths

\[ \begin{array}{ll} y_{i} \sim \operatorname{N}\left(\mu_{i}, \sigma\right) & \text{ [likelihood] } \\ \mu_{i}=\beta_{0[group]}+\beta_{1[group]} x_{1[i]} & \text{ [linear model] } \end{array} \]

Then comes the matrix of varying intercepts and slopes, with its covariance matrix \(\mathbf{\Sigma}\): \[ \begin{aligned} \left[\begin{array}{c}\beta_{0[group]} \\ \beta_{1[group]}\end{array}\right] & \sim \operatorname{MVN}\left(\left[\begin{array}{c}\beta_{0} \\ \beta_{1}\end{array}\right], \mathbf{\Sigma}\right) & \text{ [population of varying effects] } \\ \mathbf{\Sigma} & =\left(\begin{array}{cc}\sigma_{\beta_{0}} & 0 \\ 0 & \sigma_{\beta_{1}}\end{array}\right) \mathbf{\Omega}\left(\begin{array}{cc}\sigma_{\beta_{0}} & 0 \\ 0 & \sigma_{\beta_{1}}\end{array}\right) & \text{ [construct covariance matrix] } \end{aligned} \]

  • \(\mathbf{\Omega}\) is a correlation matrix

adapted from McElreath 2023

Let’s break that down

  • The math on the previous slide says that each group has a \(\beta_{0[group]}\) and a \(\beta_{1[group]}\), with a prior defined by a two-dimensional Gaussian distribution with means \(\beta_{0}\) and \(\beta_{1}\) and covariance matrix \(\mathbf{\Sigma}\)
    • This is a multivariate normal distribution
  • The final line defines how we construct our covariance matrix \(\mathbf{\Sigma}\) by factoring it into a diagonal matrix of \(\sigma_{\beta_{0}}\) and \(\sigma_{\beta_{1}}\) and a correlation matrix \(\mathbf{\Omega}\)
    • There are several ways to construct \(\mathbf{\Sigma}\), but splitting it into standard deviations, \(\sigma_{\beta_{0}}\) and \(\sigma_{\beta_{1}}\), and a correlation matrix \(\mathbf{\Omega}\) helps with understanding (a small R sketch follows below)
  • Compare this with a univariate normal distribution, which takes a mean and a standard deviation

adapted from McElreath 2023
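
To make the construction concrete, here is a minimal R sketch that draws group-level intercepts and slopes from the multivariate normal above; the means, standard deviations, correlation, and number of groups are hypothetical values chosen for illustration.

# draw varying intercepts and slopes from a multivariate normal
library(MASS)
beta_means <- c(2, 0.5) # population-mean intercept and slope (hypothetical)
sigmas <- c(1, 0.3) # sd_b0, sd_b1
rho <- 0.7 # correlation between intercepts and slopes
OMEGA <- matrix(c(1, rho, rho, 1), nrow = 2) # correlation matrix
SIGMA <- diag(sigmas) %*% OMEGA %*% diag(sigmas) # covariance matrix
betas <- mvrnorm(n = 30, mu = beta_means, Sigma = SIGMA) # one row per group
head(betas)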

The correlation matrix

  • For this simple example, the correlation matrix looks like

\[ \mathbf{\Omega}=\left(\begin{array}{ll} 1 & \rho \\ \rho & 1 \end{array}\right) \]

  • where \(\rho\) is the correlation between \(\beta_{0}\) and \(\beta_{1}\)
  • \(\mathbf{\Omega}\) gets more complicated for models with more varying slopes

adapted from McElreath 2023

Cholesky decomposition

  • Note that we can take any symmetric, positive-definite matrix \(\mathbf{A}\) and factor (decompose) it as
    \[ \mathbf{A}=\mathbf{L} \mathbf{L}^{T} \]

  • where \(\mathbf{L}\) is a lower triangular matrix with real, positive diagonal entries and \(\mathbf{L}^{T}\) is the transpose of \(\mathbf{L}\); a quick check in R follows below
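
A quick base-R check of this identity, using a small symmetric, positive-definite matrix chosen for illustration:

# verify A = L L^T for a small positive-definite matrix
A <- matrix(c(4, 2, 2, 3), nrow = 2)
L <- t(chol(A)) # chol() returns the upper triangle, so transpose
L %*% t(L) # reproduces A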

Cholesky decomposition

  • If we visualize a Cholesky decomposition

\[ \left[\begin{array}{lll} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{array}\right]=\left[\begin{array}{lll} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & L_{22} \end{array}\right]\left[\begin{array}{ccc} L_{00} & L_{10} & L_{20} \\ 0 & L_{11} & L_{21} \\ 0 & 0 & L_{22} \end{array}\right] \]

  • This factorization is helpful from a numerical perspective, particularly for noncentered parameterizations (see the sketch below)
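
For instance, here is a minimal sketch of that trick in R: rather than sampling \(\boldsymbol{\beta}\) from the multivariate normal directly, we sample independent standard normal deviates and shift and scale them with the Cholesky factor (values hypothetical).

# noncentered construction: beta = mu + L z, with z ~ N(0, 1)
mu <- c(0, 0)
SIGMA <- matrix(c(1, 1.4, 1.4, 4), nrow = 2)
L <- t(chol(SIGMA))
z <- rnorm(2) # iid standard normal deviates
beta <- mu + L %*% z # beta has covariance SIGMA
beta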

Cholesky factors continued

  • Note that there is a lot of convenient linear algebra that can be done with Cholesky factors of covariance matrices \(\mathbf{L}\) or of correlation matrices \(\mathbf{L_{corr}}\)
    • For example, \[ \mathbf{L}=\left(\begin{array}{cc} \sigma_{\beta_{0}} & 0 \\ 0 & \sigma_{\beta_{1}} \end{array}\right) \mathbf{L_{corr}} \]
  • See https://mlisi.xyz/post/simulating-correlated-variables-with-the-cholesky-factorization/ for a useful review

Cholesky factors

# create a correlation matrix and declare sigmas
OMEGA <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)
sigmas <- c(1, 2) # sd_b0, sd_b1

OMEGA
     [,1] [,2]
[1,]  1.0  0.7
[2,]  0.7  1.0
sigmas
[1] 1 2
# note also
diag(sigmas) # diagonal matrix 
     [,1] [,2]
[1,]    1    0
[2,]    0    2

Cholesky factors

# calculate covariance matrix:
SIGMA <- diag(sigmas) %*% OMEGA %*% diag(sigmas)
SIGMA
     [,1] [,2]
[1,]  1.0  1.4
[2,]  1.4  4.0

Cholesky factors

# Cholesky factor of correlation matrix to 
# covariance matrix Cholesky factor:
L_corr <- t(chol(OMEGA)) # note chol() returns upper tri
Lambda <- diag(sigmas) %*% L_corr

t(chol(SIGMA)) # L of SIGMA
     [,1]     [,2]
[1,]  1.0 0.000000
[2,]  1.4 1.428286
Lambda
     [,1]     [,2]
[1,]  1.0 0.000000
[2,]  1.4 1.428286

Cholesky factors

# generate random values with desired covariance
Z <- rbind(rnorm(1e4), rnorm(1e4))
X <- Lambda %*% Z
par(mfrow=c(1,2))
plot(Z[1,]~Z[2,], main = "uncorrelated deviates", col = "firebrick")
plot(X[1,]~X[2,], main = "correlated deviates", col = "firebrick")
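
Following on from that chunk, a quick sanity check that the transformed deviates have approximately the intended covariance:

# empirical covariance of the correlated deviates should be close to SIGMA
round(cov(t(X)), 2)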

A problem

  • People want to know the extent to which juvenile walleye growth rate is density-dependent
    • Has implications for both basic ecology and management
  • DNR biologists go to a collection of lakes and measure the length of age-0 walleye in fall as a proxy for juvenile growth rate
  • Each year, the biologists attempt to visit 30 lakes in total (weather permitting)
    • They also conduct surveys to get an estimate of juvenile density
  • Let’s simulate some fake data representing this problem (a minimal sketch follows below), and then build some Stan models to recover the known parameters
  • Go to the varying_effects.r script

see also Cahill et al. 2020
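
For reference, here is a minimal sketch of the kind of data simulation developed in varying_effects.r; every numeric value below is a hypothetical placeholder, not the values used in the script.

# simulate fake walleye data with lake-specific intercepts and slopes
# (all parameter values are hypothetical placeholders)
library(MASS)
set.seed(1)
n_lakes <- 30
n_obs <- 10 # age-0 walleye measured per lake
beta_means <- c(150, -5) # mean fall length (mm) and density effect
sigmas <- c(10, 2) # sd of intercepts and slopes among lakes
rho <- -0.5 # correlation between intercepts and slopes
SIGMA <- diag(sigmas) %*% matrix(c(1, rho, rho, 1), 2, 2) %*% diag(sigmas)
betas <- mvrnorm(n_lakes, beta_means, SIGMA) # one row per lake
lake <- rep(1:n_lakes, each = n_obs)
density <- runif(n_lakes * n_obs, 0, 5) # juvenile density proxy
length_mm <- rnorm(n_lakes * n_obs,
                   mean = betas[lake, 1] + betas[lake, 2] * density, sd = 5)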

Hyperpriors for varying effects model

\[ \begin{aligned} \beta_{0} & \sim \operatorname{Normal}(0,25) & \text{ [prior for average intercept] } \\ \beta_{1} & \sim \operatorname{Normal}(0,25) & \text{ [prior for average slope] } \\ \sigma & \sim \operatorname{Exponential}(0.01) & \text{ [prior for stddev within group] } \\ \sigma_{\beta_{0}} & \sim \operatorname{Exponential}(0.01) & \text{ [prior stddev among intercepts] } \\ \sigma_{\beta_{1}} & \sim \operatorname{Exponential}(0.01) & \text{ [prior stddev among slopes] } \\ \mathbf{\Omega} & \sim \operatorname{LKJcorr}(2) & \text{ [prior for correlation matrix] } \end{aligned} \]

  • \(\operatorname{LKJcorr}(2)\) defines a weakly informative prior on \(\rho\) that is skeptical of extreme correlations near -1 or 1 (prior draws are sketched below)

adapted from McElreath 2023
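
One quick way to see how diffuse these hyperpriors are is to draw from them in R; a minimal sketch:

# draw from the hyperpriors to see how weakly informative they are
beta0 <- rnorm(1e4, 0, 25)
sigma_b0 <- rexp(1e4, rate = 0.01) # Exponential(0.01) has mean 100
quantile(sigma_b0, c(0.05, 0.5, 0.95))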

Visualizing the LKJcorr prior

  • Implied prior on \(\rho\) for \(\operatorname{LKJcorr}(\eta)\) with \(\eta = 1, 2, 4, 10\)
  • \(\eta = 1\) is flat over \((-1, 1)\); larger \(\eta\) concentrates mass near \(\rho = 0\) (a base-R sketch follows below)
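
A base-R way to draw these pictures: for a 2 x 2 correlation matrix, \(\operatorname{LKJcorr}(\eta)\) implies \((\rho + 1)/2 \sim \operatorname{Beta}(\eta, \eta)\), so we can sample \(\rho\) directly and compare the implied densities.

# implied LKJcorr(eta) prior on rho for a 2x2 correlation matrix
etas <- c(1, 2, 4, 10)
rho <- sapply(etas, function(eta) 2 * rbeta(1e4, eta, eta) - 1)
plot(density(rho[, 1]), xlim = c(-1, 1), ylim = c(0, 2),
     main = "LKJcorr priors on rho", xlab = "rho")
for (j in 2:4) lines(density(rho[, j]), lty = j)
legend("topleft", legend = paste("eta =", etas), lty = 1:4)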

Why go through all of that?

Wrap up

  • Introduced a powerful extension to mixed effects models: varying effects
  • Went through a bunch of math to show how to play with multivariate normal distributions
  • Simulated an example, conducted prior predictive checks, estimated the model
  • Showed a noncentered version of this model as well
  • Up next, spatial random effects

References

Cahill, C., et al. 2020. A spatial-temporal approach to modeling somatic growth across inland fisheries landscapes. Canadian Journal of Fisheries and Aquatic Sciences.

Gelman, A., and J. Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

McElreath, R. 2023. Statistical Rethinking: A Bayesian Course with Examples in R and Stan.

Simpson’s paradox, Wikipedia: https://en.wikipedia.org/wiki/Simpson%27s_paradox

Simulating correlated variables with the Cholesky factorization: https://mlisi.xyz/post/simulating-correlated-variables-with-the-cholesky-factorization/