Runtime warnings, errors, and debugging in R and Stan

FW 891
Click here to view presentation online

Christopher Cahill

13 September 2023

Purpose

Talk through some common errors and warning messages in Stan
Most of this is available on the Stan webpage
- Can be useful for debugging or improving models
How I debug Stan files (probably the only way)
How I debug R code (not the only way)

Runtime warnings and convergence problems webpage and cmdstan documentation:

https://mc-stan.org/misc/warnings.html

https://mc-stan.org/docs/2_24/cmdstan-guide-2_24.pdf

Overview

Stan throws a range of diagnostics, and throws warnings or errors for anything suspicious
Some of these are more important than others, but in general having an understanding of these warnings and errors can help you debug models or figure out problems with your model
Figuring out what’s going on is as much an art as science

The folk theorem of statistical computing

This is a pithy rule that originated from Andrew Gelman (author of Bayesian Data Analysis). It is paraphrased as

“When you have computational problems, often there is a problem with your model [rather than the software]”—— Andrew Gelman

Types of warnings

Divergent transitions after warmup
Exceptions thrown when Hamiltonian proposal rejected
R-hat
Bulk and Tail ESS
Maximum treedepth
BFMI low

Divergent transitions after warmup

Probably the most important warning
Commonplace with mixed effects models
Hamiltonian Monte Carlo provides internal checks on efficiency and accuracy
- These come free, arising from constraints of the physics simulation
Recall that HMC simulates a frictionless particle
- In any given transition, total energy at the start of a simulation should be equal to the total energy at the end
- When this doesn’t happen, the energy is “divergent”
  - often means the posterior distribution is very steep in some region of parameter space, hard to sample

McElreath 2023

Divergent transitions after warmup

Critical to recognize that divergent transitions cannot be safely ignored if completely reliable inference is desired
- Proceed at your own risk: you may not be faithfully representing the posterior uncertainty if you ignore DTs
If you only get a few divergences and good Rhat values and ESS, the resulting posterior is often good enough to move forward
There are cases where a small number of divergences without any pattern in their locations can be shown to be unimportant, but this requires a lot of simulation and math
The bottom line is your inferences may be biased if there are DTs after warmup

Diagnosing and eliminating divergences

Divergent transitions after warmup

How to deal with DTs:

Double check the code (even coding errors can cause DTs)
Increase the adaptation target acceptance statistic, adapt_delta, > 0.8
- Must be between 0,1, so values >> 0.999 have diminishing returns
- The step size used by the numerical integrator underlying Stan is a function of adapt_delta, so this results in a smaller step size and (sometimes) fewer divergences
If (2) does not work, consider priors or reparameterizing the model
- Plot the data, hone in on where in the model the divergent transitions are occurring
- Lots of pairs plots, divergent transition plots
- DTs can indicate many things, and often reveal information about your experimental design or the lack of information content contained in your data
Weep 😭

https://mc-stan.org/misc/warnings.html

Exceptions thrown when Hamiltonian proposal rejected

Exception thrown at line 24: normal_log: Scale parameter is 0, 
but must be positive! 
-----
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.`

First warning indicates that the standard deviation parameter of the normal distribution is zero, but it must be positive for Stan to compute the density function
The second message indicates that the gradient of the target (as computed by Stan’s automatic differention) is infinite, indicating numerical problems somewhere in the model but unfortunately without clear information about where exactly.

Exceptions thrown when Hamiltonian proposal rejected

Exception thrown at line 24: normal_log: Scale parameter is 0, 
but must be positive! 
-----
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.`

Note: it is possible that even in a correctly specified model a numerical inaccuracy is occasionally seen, especially during warmup
- For example, scale parameter with zero lower bound can still become numerically indistinguishable from zero

https://mc-stan.org/misc/warnings.html

Split-\(\widehat{R}\) aka Rhat

This statistic compares means and variances of the chains
Remember, we use several chains to make convergence diagnostics easier, starting chains at different starting points
Values < 1.1 often sufficient
- We remove draws from the beginning of the chains and run the chains long enough so that it is not possible to distinguish where each chain started
- A fair amount of math behind this (see link below)

https://avehtari.github.io/masterclass/slides_rhat_neff.pdf

Some slides from Aki Vehtari on convergence to a common posterior

See here:

https://avehtari.github.io/masterclass/slides_rhat_neff.pdf

Nonconvergence

https://avehtari.github.io/masterclass/slides_rhat_neff.pdf

Nonconvergence viewed another way

https://avehtari.github.io/masterclass/slides_rhat_neff.pdf

Visually converged chains

https://avehtari.github.io/masterclass/slides_rhat_neff.pdf

High \(\widehat{R}\) values

Indicate the chains have not mixed well
Good reasons to NOT think of them as being fully representative of the posterior = bad inference
When there are also DTs in the model, just another symptom of problematic geometry that caused the divergences
If high \(\widehat{R}\) without other warnings, often associated with posteriors that have multiple, well-separated modes
Running chains longer may help

https://mc-stan.org/misc/warnings.html; Gelman et al. 2003

Bulk and tail Effective Sample Size (ESS)

Roughly speaking, ESS tells you how many independent draws contain the same amount of information as the dependent sample obtained by the MCMC algorithm
Higher is better
For final analyses, Stan recommends bulk-ESS > 100x the number of chains
- 4 chains, want rank-normalized ESS of at least 400
- Early in an analysis, ESS > 20 is fine

https://mc-stan.org/misc/warnings.html

Bulk and tail Effective Sample Size (ESS)

Tail-ESS computes the minimum of the ESS of the 5th and 95th quantiles
- Warnings about tail-ESS can help diagnose slow mixing in the tails
These warnings are often accompanied by large \(\widehat{R}\) values
Useful quick summary, but for final results it can be useful to check Monte Carlo standard error for quantities of interest (not shown)
Introduce thinning, and the only reason to do it

https://mc-stan.org/docs/reference-manual/effective-sample-size.html

Maximum treedepth warnings

Not as serious as other warnings
- DTs, high \(\widehat{R}\) and low ESS are a validity concern
- Maximum treedepth warnings are an efficiency concern
Configuring the NUTS sampler (variant of HMC used by Stan) involves putting a cap on the number of simulation steps it evaluates during each iteration
- This is controlled by the max_treedepth parameter, where maximum number of steps = 2^max_treedepth
If you are only getting this warning and other diagnostics are good, likely safe to ignore (but you may lose out on efficiency)

<https://mc-stan.org/misc/warnings.html#bulk-and-tail-ess >

Bayesian Fraction of Missing Information (BFMI)

Example:

The E-BFMI 0.2786 is below the nominal threshold of 0.3 
which suggests that HMC may have trouble exploring the target 
distribution.

There is an energy__ output from Stan’s samplers is used to diagnose the accuracy of HMC
Implication of this warning is that HMC is not exploring the posterior well, in particular the tails of the posterior
- Run the model longer or reparameterize your model
Mathematically dense (see Betancourt 2016) https://arxiv.org/pdf/1604.00695.pdf

Debugging Stan code

Stan provides print() statements that can print literal strings and the values of expressions
print(): accepts any number of arguments

for (n in 1:N) { 
    print("loop iteration: ", n); ... 
}

Each time the loop body is executed, it will print the string “loop iteration:” (with the trailing space), followed by the value of the expression n, followed by a new line

https://mc-stan.org/docs/reference-manual/print-statements.html

Useful things to do with `print()`

Very useful for debugging, particularly when coupled with target()
- target() is the log density accumulator, and is actually a reserved word in Stan

vector[2] y;
y[1] = 1;
print("log density before =", target());
y ~ normal(0,1);  // bug!  y[2] not defined
print("log density after =", target());

https://mc-stan.org/docs/reference-manual/print-statements.html

Debugging R code

Ten minute warp-speed lesson on debugging in R
Several ways to do this, but here we’ll do a demo of the functions traceback() and browser()

`traceback()`

foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
# foo(2) # gives a strange error
# Error in bar(2) : object 'a.variable.which.does.not.exist' not found

traceback()
2: bar(2) at #1
1: foo(2)

identifies which function generated the error

`browser()`

Abstract and not well implemented (Belinsky, pers comm 2023)
This is what I (Cahill) mainly use to debug R code
Interrupts the execution of an expression and allow the inspection of the environment where browser was called from

my_fx <- function(x){
    y <- x^2
    z <- y - x + 3
    a <- z - 1 / y
    a
}

my_fx(1)

[1] 2

`browser()`

# demonstrate some weird behavior:
my_fx(NA)

[1] NA

my_fx(3)

[1] 8.888889

my_fx(0)

[1] -Inf

Put a browser() in it to win it

https://stat.ethz.ch/R-manual/R-devel/library/base/html/browser.html

`browser()`

my_fx <- function(x){
    browser() # slap a browser in it
    y <- x^2
    z <- y - x + 3
    a <- z - 1 / y
    a
}

my_fx(0) # call the function

Go to R to demonstrate how to use
Note this works with for-loops and other horrid things

Summary and outlook

Went through some common Stan runtime warnings and error messages
- Discussed that some checks are about the validity of inferences while others are about efficiency
Debugging Stan code via print() statements
Debugging R code via traceback() and browser()
Next few weeks, move into nonlinear and mixed effects modeling (where we will experience some of these warnings/issues)

References

Betancourt, M. 2016. Diagnosing suboptimal cotangent disintegrations in Hamiltonian Monte Carlo. https://arxiv.org/pdf/1604.00695.pdf

Gelman et al. 2003. Bayesian Data Analysis.

McElreath 2023. Statistical Rethinking. Second Edition.

Stan documentation. 2023. https://mc-stan.org/docs/reference-manual/index.html

Runtime warnings, errors, and debugging in R and Stan

Purpose

Runtime warnings and convergence problems webpage and cmdstan documentation:

Overview

The folk theorem of statistical computing

Types of warnings

Divergent transitions after warmup

Divergent transitions after warmup

Divergent transitions after warmup

Exceptions thrown when Hamiltonian proposal rejected

Exceptions thrown when Hamiltonian proposal rejected

Split-\(\widehat{R}\) aka Rhat

Some slides from Aki Vehtari on convergence to a common posterior

Nonconvergence

Nonconvergence viewed another way

Visually converged chains

High \(\widehat{R}\) values

Bulk and tail Effective Sample Size (ESS)

Bulk and tail Effective Sample Size (ESS)

Maximum treedepth warnings

Bayesian Fraction of Missing Information (BFMI)

Debugging Stan code

Useful things to do with print()

Debugging R code

traceback()

browser()

browser()

browser()

Summary and outlook

References

Useful things to do with `print()`

`traceback()`

`browser()`

`browser()`

`browser()`