## Bayes Factors

Introduced by Harold Jeffreys, the 'Bayes factor' is a Bayesian alternative to frequentist hypothesis testing, most often used to compare multiple models and determine which better fits the data (Jeffreys, 1961). Bayes factors are notoriously difficult to compute, and the Bayes factor is defined only when the marginal density of **y** under each model is proper. However, Bayes factors are easy to approximate with the Laplace-Metropolis Estimator (Kass and Raftery, 1995; Lewis and Raftery, 1997).

Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model selection bias, evaluates evidence in favor of the null hypothesis, includes model uncertainty, and allows non-nested models to be compared (though of course the models must have the same dependent variable). Also, frequentist significance tests become biased in favor of rejecting the null hypothesis with sufficiently large sample size.

The Bayes factor for comparing two models may be approximated as the ratio of the marginal likelihoods of the data under model 1 and model 2. Formally, the Bayes factor in this case is

B = *p*(**y**|M_{1}) / *p*(**y**|M_{2})

where *p*(**y**|M_{1}) is the marginal likelihood of the data in model 1.

The Bayes factor, B, is the posterior odds in favor of the hypothesis divided by the prior odds in favor of the hypothesis, where the hypothesis is usually M_{1} > M_{2}. Put another way,

B = [*p*(M_{1}|**y**) / *p*(M_{2}|**y**)] / [*p*(M_{1}) / *p*(M_{2})]

For example, when B = 2, the data favor M_{1} over M_{2} with 2:1 odds.
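As a numerical illustration of the ratio above, here is a minimal Python sketch (the function name and example values are hypothetical) that computes B from log marginal likelihoods, which is more numerically stable than dividing the raw marginal likelihoods:

```python
import math

def bayes_factor(log_ml_1, log_ml_2):
    """Bayes factor B for model 1 vs. model 2, computed from
    log marginal likelihoods to avoid numerical underflow."""
    return math.exp(log_ml_1 - log_ml_2)

# Hypothetical log marginal likelihoods for two models:
B = bayes_factor(-104.2, -104.9)
print(round(B, 2))  # ~2.01: the data favor M1 over M2 with roughly 2:1 odds
```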

Here is an interpretation of the strength of evidence for Bayes factor B:

| Bayes Factor | Strength of Evidence |
| --- | --- |
| B < 1/10 | Strong against |
| 1/10 ≤ B < 1/3 | Substantial against |
| 1/3 ≤ B < 1 | Barely worth mentioning against |
| 1 ≤ B < 3 | Barely worth mentioning for |
| 3 ≤ B < 10 | Substantial for |
| B ≥ 10 | Strong for |

The above interpretation is by Jeffreys (1961), though he included two additional categories: very strong for B > 30, and decisive for B > 100. The table above has also been extended to values of B below 1, which indicate evidence against M_{1}. Other interpretations of Bayes factors have been proposed as well.
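The verbal scale above, including Jeffreys' two additional categories, can be sketched as a small helper function (the name `jeffreys_category` and the exact boundary conventions are assumptions for illustration):

```python
def jeffreys_category(B):
    """Map a Bayes factor B (for M1 vs. M2) onto Jeffreys' (1961)
    verbal scale, including the 'very strong' and 'decisive' categories."""
    if B <= 0:
        raise ValueError("A Bayes factor must be positive")
    if B < 1:
        # Evidence against M1 mirrors evidence for M2, so classify 1/B.
        return jeffreys_category(1.0 / B).replace("for", "against")
    if B < 3:
        return "barely worth mentioning for"
    if B < 10:
        return "substantial for"
    if B < 30:
        return "strong for"
    if B < 100:
        return "very strong for"
    return "decisive for"
```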

In a non-hierarchical model, the marginal likelihood may easily be approximated with the Laplace-Metropolis Estimator for model *m* as

*p*(**y**|m) = (2π)^{d/2} |Σ|^{1/2} *p*(**y**|θ*, m) *p*(θ*|m)

where θ* is the posterior mode, *d* is the number of parameters, and Σ is the inverse of the negative of the Hessian matrix of second derivatives, evaluated at θ*. Lewis and Raftery (1997) introduce the Laplace-Metropolis method of approximating the marginal likelihood in MCMC, though it naturally works with Laplace Approximation as well. For a hierarchical model that involves both fixed and random effects, the Compound Laplace-Metropolis Estimator must be used.
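A minimal sketch of this approximation, assuming a toy one-parameter conjugate model (y_i ~ N(θ, 1) with prior θ ~ N(0, 1), chosen because the posterior is Gaussian there, so the Laplace approximation can be checked against the analytic marginal likelihood):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)
n = len(y)

def log_joint(theta):
    """log p(y|theta) + log p(theta): the unnormalized log posterior."""
    log_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)
    log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * theta ** 2
    return log_lik + log_prior

theta_star = np.sum(y) / (n + 1)  # posterior mode (conjugate result)
neg_hessian = n + 1.0             # -(d^2/dtheta^2) of log_joint, constant here
Sigma = 1.0 / neg_hessian         # inverse of the negative Hessian
d = 1                             # number of parameters

# Laplace approximation: (2*pi)^(d/2) |Sigma|^(1/2) p(y|theta*) p(theta*), in logs:
log_ml_laplace = (0.5 * d * np.log(2 * np.pi)
                  + 0.5 * np.log(Sigma)
                  + log_joint(theta_star))

# Exact log marginal likelihood for this conjugate model, for comparison:
log_ml_exact = (-0.5 * n * np.log(2 * np.pi)
                - 0.5 * np.log(n + 1)
                - 0.5 * (np.sum(y ** 2) - np.sum(y) ** 2 / (n + 1)))
```

Because the posterior here is exactly Gaussian, the two quantities agree; for non-Gaussian posteriors the Laplace-Metropolis estimate is only an approximation.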

Gelman finds Bayes factors generally to be irrelevant, because they compute the relative probabilities of the models conditional on one of them being true. Gelman prefers approaches that measure the distance of the data to each of the approximate models (Gelman et al., 2004, p. 180). However, Kass and Raftery (1995) explain that "the logarithm of the marginal probability of the data may also be viewed as a predictive score. This is of interest, because it leads to an interpretation of the Bayes factor that does not depend on viewing one of the models as 'true'".

Three of many possible alternatives are to use

- pseudo Bayes factors (PsBF) based on a ratio of pseudo marginal likelihoods (PsMLs)
- Deviance Information Criterion (DIC)
- Widely Applicable Information Criterion (WAIC)

DIC is the most popular method of assessing model fit and comparing models, though Bayes factors are preferable, when appropriate, because they take more information into account. WAIC is a newer criterion.