Empirical Assignment of Modeling Errors

The Bayesian statistical framework makes the tradeoffs in modeling distinct data types explicit through the assignment of error terms. A typical empirical-data approach is to initially adopt a reasonable set of errors (\(\sigma_0\))—often based on modeling experience or prior studies—and then iteratively rescale the overall errors within each data type until the average model residuals match the imposed errors within each data-type category: \[ \sigma^2_{t+1}(\{d\}) = \langle \Delta y_t^2(\{ d \})\rangle \] where \(\Delta y_t\) is the current estimate of model misfit (at calibration step \(t\)) for a particular subset of the data \(\{d\}\), and \(\sigma_{t+1}\) is the updated error for the next iteration of the calibration process \((t+1)\). Convergence of the model residuals and errorbar values implies that each data point deviates from its model value by \(\pm1\sigma\) on average, reflecting an error estimate that captures (by construction) the empirical deviations of the data from the model. This method is generally appropriate when model limitations dominate the error budget, as is the case for many geological modeling applications. The error-reweighting scheme explicitly encapsulates the typical (unstated) assumption that the final best-fit model is a useful representation of the processes that generated the data (a complex rock formation history in the case of MCS) and that remaining deviations from the model can be treated as residual statistical fluctuations. The errors determined through this method therefore provide a fully quantified summary of the benefits and limitations of a particular set of modeling choices.
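
As a concrete illustration, the update rule above can be written as a short iteration. The sketch below is schematic: the `refit` callables, which rerun the calibration with the current error set and return the per-data-type misfits, are hypothetical stand-ins for the actual modeling code.

```python
import numpy as np

def rescale_errors(residuals_by_type, sigma0, n_iter=20, tol=1e-3):
    """Iteratively rescale per-data-type errors until the imposed errors
    match the RMS model residuals within each data-type category.

    residuals_by_type : dict mapping data-type name -> callable that refits
        the model using the current error set and returns the residual
        array (data - model) for that data type.
    sigma0 : dict of initial error guesses per data type.
    """
    sigma = dict(sigma0)
    for _ in range(n_iter):
        sigma_new = {}
        for dtype, refit in residuals_by_type.items():
            dy = refit(sigma)                           # misfits at calibration step t
            sigma_new[dtype] = np.sqrt(np.mean(dy**2))  # sigma_{t+1}^2 = <dy_t^2>
        # stop when imposed errors and model residuals have converged
        if all(abs(sigma_new[k] - sigma[k]) < tol * sigma[k] for k in sigma):
            return sigma_new
        sigma = sigma_new
    return sigma
```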

Sensitivity to initial assignment of model-dominated errors

One clear potential drawback of this empirical model-error assessment scheme is its dependence on the initially assumed values. If a modeling exercise is dominated by measurement uncertainties (where all underlying processes that generated the data are directly and accurately captured), then no iteration is required and the fitting algorithm will immediately converge on a single (accurate) family of solutions. In the far more typical case of simulation using incomplete models, the model must strike a compromise between the conflicting evidence presented by the data. In truth, this apparent conflict is resolvable (in principle) by adopting a more sophisticated model and possibly a more complete dataset, but we are charged to make progress with the data and model in hand.

This inherent compromise introduces an ambiguity into the modeling exercise, in which the researcher’s choices about how to weight the various data types inject an implicit bias toward one family of solutions over another. In some cases, such a bias is welcome, as it stems from a priori knowledge about the system or deep physical insight into the plausibility of particular scenarios. In most cases, however, the bias is an accidental consequence of the initially assumed data weights; the iterative error update scheme partially mitigates this danger, but it remains easy to unwittingly fall into one family of solutions over another, perhaps without even realizing that a tradeoff was made. The most insidious example of this accidental bias occurs when analytical uncertainties are (seemingly reasonably) adopted for each measurement, but the system is actually dominated by model errors. In this case, the calibration will tend to give undue credence to the data with the smallest analytical errors—ignoring that such accuracy is impossible to achieve simultaneously for all measurements given model limitations—and the resulting model will be hopelessly biased toward some data while giving inappropriately little weight to other data. Due to the zero-sum nature of data weights in model calibration, a seemingly innocuous initial assignment of analytical errors will result in a lopsided compromise between the conflicting data that remains entirely hidden from the modeler. It is therefore critical to explore and resolve the issue of relative data weighting through sensitivity analysis, as illustrated by the toy example below.
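
A toy numerical example (with purely illustrative values) shows how inverse-variance weights built from analytical errors alone can silently produce this lopsided compromise when model error, not measurement error, actually dominates.

```python
import numpy as np

# Two data types give conflicting estimates of a single parameter because the
# model is incomplete; the values and errors below are purely illustrative.
y_a, sig_a = 1.00, 0.01   # data type A: tiny analytical error
y_b, sig_b = 2.00, 0.10   # data type B: larger analytical error

def weighted_fit(sig_a, sig_b):
    # zero-sum inverse-variance weights: credence flows to the smallest error
    w_a, w_b = 1 / sig_a**2, 1 / sig_b**2
    return (w_a * y_a + w_b * y_b) / (w_a + w_b)

# Analytical errors alone: the fit is dragged to within ~1% of data type A,
# even though neither value can be matched to analytical precision.
print(weighted_fit(sig_a, sig_b))   # ~1.01

# After empirical rescaling, both errors grow to roughly the model misfit
# (~0.5 here), and the compromise becomes an even-handed average.
print(weighted_fit(0.5, 0.5))       # 1.5
```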

Bayesian sensitivity analysis for mixed-data models

In the Bayesian statistical framework, the assumed values for model-dominated errors are viewed as additional parameters of the statistical model, and their impact on the physical/chemical model results is explored through a systematic sensitivity analysis. The complete set of model parameters is thus composed of the parameters describing the physical processes being studied, together with additional statistical model parameters (or hyperparameters) describing the data error-model needed to holistically combine all measurement constraints on the system. A complete analysis of the system requires exploration of both the physical parameters and the statistical ones. In addition to sampling the range of plausible physical parameter values, the hyperparameters controlling the error-model are also sampled within a range of reasonable values. Ideally, we can demonstrate that within these plausible ranges the model solution is not particularly sensitive to the assumed error-model, or alternatively we can marginalize (combine results through averaging) across the plausible value ranges. To implement this procedure, we must therefore quantify our assumptions about the acceptable range of model deviations for each observational data type. Establishing appropriate values and bounds for these hyperparameters is a non-trivial task involving considerable exploration, as it codifies deep intuition and past experience with both the underlying data and the limitations of this class of physical/chemical models; it can be made simpler and more explicit, however, by decomposing the different sources of model variance for each data type.
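
One minimal way to implement this joint sampling is sketched below, under the assumptions of Gaussian likelihoods and a log-uniform prior on per-data-type error scales; all function and variable names are illustrative rather than part of any specific code base.

```python
import numpy as np

def log_posterior(params, data_groups):
    """Joint log-posterior over physical parameters and error-model
    hyperparameters (one log error-scale per data type).

    params      : [theta_1, ..., theta_m, log_s_1, ..., log_s_k]
    data_groups : list of (y_obs, model_fn, sigma0) tuples, one per data type,
                  where model_fn(theta) returns predictions for that data type.
    """
    k = len(data_groups)
    theta, log_s = params[:-k], params[-k:]

    # Hyperparameter prior: restrict each error scale to a plausible range,
    # here a broad log-uniform window (factor of 10) around the initial guess.
    if np.any(np.abs(log_s) > np.log(10.0)):
        return -np.inf

    logp = 0.0
    for (y_obs, model_fn, sigma0), ls in zip(data_groups, log_s):
        sigma = sigma0 * np.exp(ls)          # sampled, not fixed, error level
        resid = y_obs - model_fn(theta)
        # Gaussian log-likelihood for this data type
        logp += -0.5 * np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))
    return logp
```

A function of this form can be handed to any standard MCMC sampler; the marginal posteriors for the physical parameters then automatically average over the plausible range of error-model hyperparameters.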

Decomposing errors into variability & relative weights

One of the most challenging aspects of assigning errorbars for model calibration to mixed data types is that the exercise inherently involves comparing apples and oranges. The different data classes typically come from separate measurement techniques, use different sample preparation methods, observe disparate characteristics of the system, and have unique, non-overlapping sensitivities to distinct subsets of the model parameters. While direct top-down assignment of errorbars does place all measurements on a common statistical deviance scale, the researcher has almost no methodological basis for selecting the magnitudes of these errors. The errors themselves reflect a combination of analytical uncertainties, model limitations, and the researcher’s own biases toward faithfully reproducing some physical phenomena over others, depending on the intended uses of the resulting model. Selecting error magnitudes in this space is too difficult a task to yield quality results without obfuscating the choices and tradeoffs being made.

To improve clarity and enable a more data-based approach, we decompose these errors into the separate contributing factors, exposing the choices being made and providing better guidance for value selection. We propose to split the error-model uncertainties \((\sigma_{tot})\) into three sources: \[ \sigma_{tot}^2 = (\sigma_{var}^2 + \sigma_{meas}^2) / w_{bias} \] where \(\sigma_{var}\) is the natural variability of the quantity measured, \(\sigma_{meas}\) is the analytical measurement uncertainty, and \(w_{bias}\) is the overall (positive) model-bias weight. The relative importance of each data type is thus tuned by selecting model-bias weights above (or below) one, thereby shrinking (or inflating) errors to guide the calibration process. By decomposing the overall error weights \(\sigma_{tot}\) into these three separate sources, we capture the primary causes for data-model disagreement, while separating largely objective error sources (\(\sigma_{var}\) and \(\sigma_{meas}\)) that can be determined empirically from the subjective model-bias term, which reflects the intentions and judgement of the researcher building the model.
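
In practice this decomposition reduces to a one-line computation per data type; the sketch below (with purely illustrative values) shows how the model-bias weight shrinks or inflates the effective error.

```python
import numpy as np

def total_error(sigma_var, sigma_meas, w_bias):
    """Combine natural variability and measurement uncertainty, then apply the
    model-bias weight: sigma_tot^2 = (sigma_var^2 + sigma_meas^2) / w_bias."""
    if np.any(np.asarray(w_bias) <= 0):
        raise ValueError("model-bias weights must be positive")
    return np.sqrt((sigma_var**2 + sigma_meas**2) / w_bias)

# w_bias > 1 shrinks the effective error (up-weighting that data type);
# w_bias < 1 inflates it (down-weighting). Values are illustrative only.
print(total_error(0.05, 0.02, 1.0))   # neutral weighting
print(total_error(0.05, 0.02, 4.0))   # effective error halved
print(total_error(0.05, 0.02, 0.25))  # effective error doubled
```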