Model selection and cross validation

FW 891

Christopher Cahill

8 November 2023

Purpose

  • Goal
  • A philosophical preface to model selection
  • K-fold and LOO cross validation
  • Performance criteria
  • Challenges
  • Approximate methods for LOO cross validation
  • R and Stan demo on how to implement these ideas

Useful reference on cross validation in Stan:

https://users.aalto.fi/~ave/CV-FAQ.html

TLDR: Model selection is hard and requires careful thought

Model selection: are we stuck between the devil and the deep blue sea?

Burnham and Anderson 1998; Navarro 2019; Bolker 2023

Between the devil and the deep blue sea?

  • We can calculate how well we predict things
  • The difference between prediction and inference
  • If scientific reasoning takes place in a world where all our models are systematically wrong in some sense, what do we hope to achieve by selecting a model?
  • The devil: statistical decision making
  • The deep blue sea: addressing scientific questions
  • A question well worth pondering that I have no intention of answering:
    • Are scientific model selection questions addressable with statistical tools?

Navarro 2019; Bolker 2023

With that in mind

Arthur Schopenhauer, the philosophy bunny, peering into the inferential abyss

Introduction

  • After fitting a Bayesian model, we often want to measure its predictive accuracy
  • We might do this:
    • For its own sake
    • To compare models
    • For model selection
    • For model averaging

Vehtari et al. 2019

What is cross validation?

  • Cross validation is a family of techniques that estimate how well a model would predict previously unseen data
    • Typically we do this by fitting the model to a subset of the data and then predicting the left-out data
  • Cross validation can be used to:
    • Assess the predictive performance of a single model
    • Assess model misspecification
    • Compare multiple models
    • Select a single model from multiple candidates
    • Combine the predictions of multiple models

Vehtari 2023

K-fold and leave-one-out cross validation

  • K-fold cross validation refers to splitting a dataset into K approximately equal-sized chunks
    • Often K = 10
  • Procedure (a minimal R sketch follows):
    • Estimate the model on K − 1 of the chunks, then predict the left-out chunk
    • Repeat this process until we’ve cycled through each chunk or fold of the data
  • Leave-one-out (LOO) cross validation is the limit of K-fold cross validation, where K equals the number of data points
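The bookkeeping above is easy to implement by hand. Here is a minimal R sketch of K-fold cross validation; an ordinary linear model stands in for whatever Stan model you would actually fit, and the data, fold labels, and MSE criterion are simulated purely for illustration.

```r
# Minimal K-fold cross validation sketch (stand-in linear model in place of a Stan fit)
set.seed(1)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- 2 + 3 * dat$x + rnorm(n, sd = 0.5)

K <- 10
fold <- sample(rep(1:K, length.out = n))     # random fold labels 1..K for each row

sq_err <- numeric(n)
for (k in 1:K) {
  train <- dat[fold != k, ]                  # fit on the other K - 1 folds
  test  <- dat[fold == k, ]                  # held-out fold
  fit   <- lm(y ~ x, data = train)           # stand-in for your Stan model
  sq_err[fold == k] <- (test$y - predict(fit, newdata = test))^2
}
mean(sq_err)                                 # K-fold estimate of out-of-sample MSE
```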

Some measures of predictive accuracy

Mean Square Error (MSE)

\[ \frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\mathrm{E}\left(y_{i} \mid \theta\right)\right)^{2} \]

  • \(y_{i}\) is data point \(i\)
  • \(\theta\) represents the fitted model parameters
  • Proportional to the log predictive density if the model is normal with constant variance
  • Easy to compute and understand, but less appropriate for non-normal models (see the R sketch below)

Gelman et al. 2014; Vehtari et al. 2016
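As a quick illustration of the formula above, here is a minimal R sketch. `mu_draws` is a hypothetical S x n matrix of posterior draws of \(\mathrm{E}(y_i \mid \theta)\) (the kind of thing you might extract from a Stan fit); the data and draws are faked so the example stands alone.

```r
# Minimal sketch: MSE against the posterior mean of E(y_i | theta)
set.seed(1)
n <- 50
S <- 1000
y <- rnorm(n, mean = 1)                                   # observed data (simulated)
mu_draws <- matrix(rnorm(S * n, mean = 1, sd = 0.1), S)   # hypothetical S x n draws of E(y_i | theta)

mu_hat <- colMeans(mu_draws)    # posterior mean of E(y_i | theta) for each observation
mean((y - mu_hat)^2)            # MSE as defined above
```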

Expected log pointwise predictive density (elpd)

  • Consider data \(y_{1}, ... , y_{n}\) modeled as independent given parameters \(\theta\)

  • Also suppose we have a prior distribution \(p(\theta)\) yielding a posterior \(p(\theta \mid y)\)

  • And a posterior predictive distribution \(p(\tilde{y}_{i} \mid y)=\int p\left(\tilde{y}_{i} \mid \theta\right) p(\theta \mid y) d \theta\)

  • We can then define a measure of predictive accuracy for the n data points as:

\[ \begin{aligned} \text { elpd } & =\text { expected } \log \text { pointwise predictive density for a new dataset } \\ & =\sum_{i=1}^{n} \log \left(\frac{1}{S} \sum_{s=1}^{S} p\left(y_{i} \mid \theta^{s}\right)\right) . \end{aligned} \]

  • where \(\theta^{s}\), \(s=1, \ldots, S\), are posterior simulation draws (see the R sketch below)

Gelman et al. 2014; Vehtari et al. 2016
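A minimal R sketch of this computation, assuming `log_lik` is an S x n matrix of pointwise log-likelihoods \(\log p(y_i \mid \theta^{s})\) (in Stan these are usually written out in the generated quantities block); the toy normal-mean model and its "posterior draws" are faked for illustration.

```r
# Minimal sketch: elpd / lppd from an S x n matrix of pointwise log-likelihoods
set.seed(1)
n <- 50
S <- 1000
y <- rnorm(n)                                        # observed data (simulated)
theta <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))  # fake posterior draws of a normal mean

# log p(y_i | theta^s); with Stan this would come from generated quantities
log_lik <- sapply(seq_len(n), function(i) dnorm(y[i], mean = theta, sd = 1, log = TRUE))

# sum_i log( (1/S) * sum_s p(y_i | theta^s) ), computed stably on the log scale
lppd <- sum(apply(log_lik, 2, function(ll) max(ll) + log(mean(exp(ll - max(ll))))))
lppd
```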

Some extensions to simple k-fold cross validation

  • Often the data are subsetted randomly; however, this may not always represent the relevant prediction task
  • Ecological data are commonly correlated in space, time, groups, or even phylogenetic structure
    • Dependency in groups, space, or time
  • There are many strategies we can use depending on our prediction task (e.g., blocked or group-wise folds; see the R sketch below)

Roberts et al. 2017
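One common strategy is to make the folds respect the grouping structure, so that whole groups (e.g., lakes, sites, or years) are held out together and the prediction task matches the dependency in the data. A minimal R sketch, with hypothetical group labels and a stand-in linear model:

```r
# Minimal sketch of group-wise (blocked) folds: entire groups are held out together
set.seed(1)
dat <- data.frame(group = rep(1:20, each = 5), x = runif(100))
dat$y <- 2 + 3 * dat$x + rnorm(20)[dat$group] + rnorm(100, sd = 0.3)  # group-level dependence

K <- 5
group_fold <- sample(rep(1:K, length.out = length(unique(dat$group))))
dat$fold <- group_fold[dat$group]            # all rows in a group share a fold

sq_err <- numeric(nrow(dat))
for (k in 1:K) {
  fit <- lm(y ~ x, data = dat[dat$fold != k, ])   # fit without the held-out groups
  idx <- dat$fold == k
  sq_err[idx] <- (dat$y[idx] - predict(fit, newdata = dat[idx, ]))^2
}
mean(sq_err)                                 # leave-group-out estimate of out-of-sample MSE
```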

Cross validation and LOO have many limitations

Some known issues

  • Computationally demanding
  • Methods run into problems with sparse data
  • When should you not use cross validation?
    • In general, there is no need to do any model selection
    • The best approach is to build a rich model that includes all relevant uncertainties, do model checking, and adjust that model if necessary
  • Cross validation cannot directly answer the question “do the data provide evidence for some effect being non-zero?”
  • What does cross validation tell you?

Gelman et al. 2013; Vehtari 2023

How do you view the world?

M-closed vs. M-open worlds (M-closed: the true data-generating process is assumed to be among the candidate models; M-open: it is not)

Navarro 2019

Approximate methods for calculating elpd (sneakery)

Approximate cross validation

  • Vehtari et al. (2016; 2017) introduced a method that approximates leave-one-out cross validation inexpensively, using only the pointwise log-likelihoods from a single model fit

  • Pareto-smoothed importance sampling (PSIS-LOO) lets us approximate LOO without re-fitting the model many times

Vehtari et al. 2016; 2017

Importance sampling LOO

  • Since we are Bayesian, we have samples from a posterior
  • We want the predictive density our model would give datum \(y_{1}\) if we hadn’t observed it; start from the ordinary posterior predictive density:

\[ p\left(y_{1} \mid y\right)=\int p\left(y_{1} \mid \theta\right) p(\theta \mid y) d \theta \]

  • Since we are working with samples, we replace the integral with an average over posterior samples:

\[ \frac{1}{S} \sum_{s} p\left(y_{1} \mid \theta_{s}\right) \]

Vehtari et al. 2016; 2017

Importance sampling LOO

  • Now we want to reweight the posterior samples as though \(y_{1}\) hadn’t been observed:

\[ \frac{1}{\sum_{s} w_{s}} \sum_{s} w_{s} p\left(y_{1} \mid \theta_{s}\right) \]

  • The weights we will use (see the R sketch below):

\[ w_{s}=\frac{1}{p\left(y_{1} \mid \theta_{s}\right)} \]

Vehtari et al. 2016; 2017
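A minimal R sketch of this (unsmoothed) importance-sampling estimate for a single datum \(y_{1}\); the toy normal model, fake posterior draws, and variable names are all illustrative, and in practice you would use the smoothed version discussed next.

```r
# Minimal sketch of raw importance-sampling LOO for one datum y_1
set.seed(1)
n <- 50
S <- 4000
y <- rnorm(n)                                        # observed data (simulated)
theta <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))  # fake posterior draws of a normal mean
log_lik_1 <- dnorm(y[1], mean = theta, sd = 1, log = TRUE)   # log p(y_1 | theta_s)

log_w <- -log_lik_1                              # log raw weights, w_s = 1 / p(y_1 | theta_s)
w <- exp(log_w - max(log_w))                     # rescale before exponentiating for stability
loo_dens_1 <- sum(w * exp(log_lik_1)) / sum(w)   # weighted average of p(y_1 | theta_s)
log(loo_dens_1)                                  # IS-LOO estimate of log p(y_1 | y_{-1})
```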

The Pareto part

  • It turns out that importance sampling with these weights is very noisy, and the weights can have very heavy tails
  • We need to smooth out the tails so that a few extreme draws don’t dominate our adjusted posterior
  • It turns out the upper tail of the importance weights is well approximated by a generalized Pareto distribution, and we can use this fit to smooth our weights \(w_{s}\)

Vehtari et al. 2016; 2017

PSIS-LOO implementation

  • If all of that overwhelms you…
  • There are packages and functions that do this for you (e.g., the loo package in R; see the sketch below)
  • They include diagnostics (e.g., Pareto \(\hat{k}\) values) that warn you when the approximation may be unreliable

Vehtari et al. 2016; 2017
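A minimal sketch using the loo R package, which implements PSIS-LOO and its diagnostics. Here the S x n log-likelihood matrix is faked so the example stands alone; with a real Stan fit that stores pointwise log-likelihoods in a generated quantities variable (conventionally named `log_lik`), you would build the matrix with `loo::extract_log_lik()` instead.

```r
# Minimal sketch of PSIS-LOO via the loo package, using a fake log-likelihood matrix
library(loo)
set.seed(1)
n <- 50
S <- 4000
y <- rnorm(n)
theta <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))  # fake posterior draws of a normal mean
log_lik <- sapply(seq_len(n), function(i) dnorm(y[i], mean = theta, sd = 1, log = TRUE))

fit_loo <- loo(log_lik)   # PSIS-LOO from the pointwise log-likelihood matrix
print(fit_loo)            # elpd_loo, p_loo, looic, and a summary of Pareto k diagnostics
# pareto_k_table(fit_loo) tabulates how many observations have worryingly large k
# With two fitted models: loo_compare(fit_loo_m1, fit_loo_m2) compares elpd_loo
```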

To the R and Stan code

References

  • Burnham, K.P., and Anderson, D.R. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York, USA.

  • Gelman, A., Hwang, J., and Vehtari, A. 2014. Understanding predictive information criteria for Bayesian models. Statistics and Computing 24: 997–1016.

  • Navarro, D.J. 2019. Between the devil and the deep blue sea: tensions between scientific judgement and statistical model selection. Computational Brain & Behavior 2: 28–34.

  • Vehtari, A., Gelman, A., and Gabry, J. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27(5): 1413–1432.

  • Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. 2019. Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.

  • Vehtari, A. 2023. Cross-validation FAQ. https://users.aalto.fi/~ave/CV-FAQ.html#1_What_is_cross-validation