In Bayesian inference, the most common method of assessing the goodness of fit of an estimated statistical model is a generalization of the frequentist Akaike Information Criterion (AIC). The Bayesian method, like AIC, is not a test of the model in the sense of hypothesis testing, though Bayesian inference has Bayes factors for such purposes. Instead, like AIC, it provides a model fit statistic to be used as a tool for refining the current model or for selecting the better-fitting model among those produced by different methodologies.
To begin with, model fit can be summarized with deviance, which is defined as -2 times the log-likelihood (Gelman et al., 2004, p. 180), such as

$$D(y, \Theta) = -2 \log[p(y \mid \Theta)].$$
Just as with the likelihood, p(y|Θ), or log-likelihood, the deviance exists at both the record- and model-level. Due to the development of BUGS software (Gilks, Thomas, and Spiegelhalter, 1994), deviance is defined differently in Bayesian inference than in frequentist inference. In frequentist inference, deviance is -2 times the log-likelihood ratio of a reduced model compared to a full model, whereas in Bayesian inference, deviance is simply -2 times the log-likelihood. In Bayesian inference, the model with the lowest expected deviance has the highest posterior probability (Gelman et al., 2004, p. 181).
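The record- and model-level deviance can be illustrated with a minimal sketch. It assumes a normal likelihood with a known standard deviation; the data values and the function names (`normal_logpdf`, `record_deviance`, `model_deviance`) are hypothetical and chosen only for illustration.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - mu) ** 2 / (2.0 * sigma ** 2))

def record_deviance(y, mu, sigma):
    """Record-level deviance: -2 times each record's log-likelihood."""
    return [-2.0 * normal_logpdf(yi, mu, sigma) for yi in y]

def model_deviance(y, mu, sigma):
    """Model-level deviance: the sum of the record-level deviances."""
    return sum(record_deviance(y, mu, sigma))

y = [1.2, 0.7, 1.9, 1.1]  # hypothetical data
print(record_deviance(y, mu=1.0, sigma=1.0))  # one value per record
print(model_deviance(y, mu=1.0, sigma=1.0))   # a single value for the model
```

Because the log-likelihood of independent records is a sum, the model-level deviance is simply the sum of the record-level deviances.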
It is possible to have a negative deviance. Deviance is derived from the likelihood, which is derived from probability density functions (PDFs). Evaluated at a certain point in parameter space, a PDF can have a density larger than 1 due to a small standard deviation or lack of variation. Likelihoods greater than 1 lead to negative deviance, and are appropriate.
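A small numerical check makes this concrete. The sketch below assumes a normal density with a standard deviation of 0.1, evaluated at its mean; the specific numbers are illustrative only.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a Normal(mu, sigma) distribution evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# With a small standard deviation, the density at the mean exceeds 1.
density = normal_pdf(0.0, mu=0.0, sigma=0.1)   # about 3.99
deviance = -2.0 * math.log(density)            # about -2.77, so negative

print(density, deviance)
```

The density integrates to 1 over the whole real line, but its height at a point is not a probability, so values above 1 are unremarkable.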
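On its own, the deviance is an insufficient model fit statistic, because it does not take model complexity into account. The effect of model fitting, pD, is used as the 'effective number of parameters' of a Bayesian model. The pD is the difference between the posterior mean of the model-level deviance, taken over the draws Θi, and the model-level deviance evaluated at the posterior mean of Θ:

$$p_D = \bar{D} - D(y, \bar{\Theta}), \qquad \bar{D} = \frac{1}{S} \sum_{i=1}^{S} D(y, \Theta_i),$$

where S is the number of posterior draws.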
A related way to measure model complexity is as half the posterior variance of the model-level deviance, known as pV (Gelman et al., 2004, p. 182):

$$p_V = \frac{1}{2} \mathrm{var}[D(y, \Theta)].$$
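Both complexity measures can be estimated from posterior draws. The following is a minimal sketch, not a definitive implementation. It assumes a normal model with known standard deviation and a flat prior on the mean, so that the posterior can be sampled directly, and it computes pD in the usual form of mean deviance minus deviance at the posterior mean. The data, seed, and variable names are hypothetical.

```python
import math
import random
import statistics

def model_deviance(y, mu, sigma):
    """Model-level deviance of a Normal(mu, sigma) likelihood."""
    sse = sum((yi - mu) ** 2 for yi in y)
    return len(y) * math.log(2.0 * math.pi * sigma ** 2) + sse / sigma ** 2

random.seed(1)
sigma = 1.0                                         # assumed known
y = [random.gauss(2.0, sigma) for _ in range(20)]   # hypothetical data
ybar = statistics.fmean(y)

# Under a flat prior, the posterior of mu is Normal(ybar, sigma / sqrt(n)).
draws = [random.gauss(ybar, sigma / math.sqrt(len(y))) for _ in range(10000)]

dev = [model_deviance(y, mu, sigma) for mu in draws]      # deviance per draw
d_bar = statistics.fmean(dev)                             # mean deviance
d_at_mean = model_deviance(y, statistics.fmean(draws), sigma)

p_d = d_bar - d_at_mean                 # pD
p_v = statistics.variance(dev) / 2.0    # pV: half the variance of the deviance

print(p_d, p_v)
```

With one unconstrained parameter and no prior information, both estimates should land near 1, in line with the interpretation of pD and pV as an effective number of parameters.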
The effect of model fitting, pD or pV, can be thought of as the number of 'unconstrained' parameters in the model, where a parameter counts as: 1 if it is estimated with no constraints or prior information; 0 if it is fully constrained or if all the information about the parameter comes from the prior distribution; or an intermediate value if both the data and the prior are informative (Gelman et al., 2004, p. 182). Therefore, by including prior information, Bayesian inference is more efficient in terms of the effective number of parameters than frequentist inference. Hierarchical, mixed effects, or multilevel models are even more efficient regarding the effective number of parameters.
Model complexity, pD or pV, should be positive. Although pV must be positive since it is related to variance, it is possible for pD to be negative, which indicates one or more problems: the log-likelihood is non-concave, the prior conflicts with the data, or the posterior mean is a poor estimator (such as with a bimodal posterior).
The sum of the mean model-level deviance and the model complexity (pD or pV) is the Deviance Information Criterion (DIC), a model fit statistic that is also an estimate of the expected loss, with deviance as a loss function (Spiegelhalter, Best, and Carlin, 1998; Spiegelhalter, Best, Carlin, and van der Linde, 2002). DIC is

$$\mathrm{DIC} = \bar{D} + p_D \quad \text{or} \quad \mathrm{DIC} = \bar{D} + p_V,$$

where $\bar{D}$ is the posterior mean of the model-level deviance.
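As a sketch, DIC can be computed from a vector of model-level deviances, one per posterior draw, as the mean deviance plus pD or pV. The function name `dic`, its arguments, and the deviance values below are hypothetical, and the choice to fall back to pV when no deviance at the posterior mean is supplied is a convenience of this example only.

```python
import statistics

def dic(deviance_draws, deviance_at_mean=None):
    """Mean deviance plus model complexity (pD if available, else pV)."""
    d_bar = statistics.fmean(deviance_draws)
    if deviance_at_mean is None:
        # pV: half the variance of the deviance across draws
        complexity = statistics.variance(deviance_draws) / 2.0
    else:
        # pD: mean deviance minus deviance at the posterior mean
        complexity = d_bar - deviance_at_mean
    return d_bar + complexity

deviance_draws = [10.0, 12.0, 14.0]                 # hypothetical values
print(dic(deviance_draws, deviance_at_mean=11.0))   # 12 + 1 = 13.0
print(dic(deviance_draws))                          # 12 + 4/2 = 14.0
```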
DIC may be compared across different models and even different methods, as long as the dependent variable does not change between models, making DIC the most flexible model fit statistic. DIC is a hierarchical modeling generalization of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Like AIC and BIC, it is an asymptotic approximation as the sample size becomes large. DIC is valid only when the joint posterior distribution is approximately multivariate normal. Models with smaller DIC should be preferred. Since DIC increases with model complexity (pD or pV), simpler models are preferred when their fit is comparable.
It is difficult to say what would constitute an important difference in DIC. Very roughly, differences of more than 10 might rule out the model with the higher DIC, differences between 5 and 10 are substantial, but if the difference in DIC is, say, less than 5, and the models make very different inferences, then it could be misleading just to report the model with the lowest DIC.
The Widely Applicable Information Criterion (WAIC) is an information criterion that is more fully Bayesian than DIC. WAIC is more difficult to calculate because the record-level log-likelihood is required over numerous samples. However, when available, the result more closely resembles leave-one-out cross-validation (LOO-CV).
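The text does not give a formula for WAIC. The sketch below assumes a commonly used form on the deviance scale: -2 times the difference between the log pointwise predictive density and a complexity term equal to the summed per-record variance of the log-likelihood across draws. The function name `waic` and the input layout (`log_lik[s][i]` for draw s and record i) are assumptions made for this example.

```python
import math
import statistics

def waic(log_lik):
    """Return (WAIC, lppd, p_waic) from record-level log-likelihoods.

    log_lik[s][i] is the log-likelihood of record i under posterior draw s.
    At least two draws are needed for the variance term.
    """
    n_draws = len(log_lik)
    n_records = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_records):
        col = [log_lik[s][i] for s in range(n_draws)]
        # log of the mean likelihood across draws, via a stable log-sum-exp
        m = max(col)
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # variance of the log-likelihood across draws for this record
        p_waic += statistics.variance(col)
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Two hypothetical draws, two records.
value, lppd, p_waic = waic([[-1.0, -2.0], [-1.2, -1.8]])
print(value, lppd, p_waic)
```

This shows why the record-level log-likelihood must be stored for every draw: both terms are computed record by record before being summed.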
The Bayesian Predictive Information Criterion (BPIC) was introduced as a criterion of model fit when the goal is to pick a model with the best out-of-sample predictive power (Ando, 2007). BPIC is a variation of DIC where the effective number of parameters is 2pD (or 2pV). BPIC may be compared between ynew and yholdout, and has many other extensions, such as with Bayesian Model Averaging (BMA).
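Taking the description above literally, BPIC replaces the complexity term of DIC with twice its value. The sketch below assumes pD is computed as the mean deviance minus the deviance at the posterior mean; the function name `bpic` and the deviance values are hypothetical.

```python
import statistics

def bpic(deviance_draws, deviance_at_mean):
    """Mean deviance plus twice pD, following the description of BPIC."""
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return d_bar + 2.0 * p_d

deviance_draws = [10.0, 12.0, 14.0]       # hypothetical values
print(bpic(deviance_draws, 11.0))         # 12 + 2 * 1 = 14.0
```

Relative to DIC on the same draws, the heavier penalty pushes model selection further toward simpler models.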