Probabilistic-Numerics.org

# Connections, Part I: Uncertainty Quantification

### By Philipp Hennig, 2015-01-14 07:00:00 +0100

As I go around presenting the idea of probabilistic numerics to various audiences, certain questions about related areas come up repeatedly. This post explains how probabilistic numerics compares to the established area of Uncertainty Quantification. Subsequent posts will discuss connections to certain kinds of stochastic numerical methods, and the young and popular area of Bayesian Optimization.

A disclaimer: Obviously, everyone has different opinions about the scope and purpose of certain concepts and academic fields. And I am not an expert in the areas discussed here. This post relates a part of my own personal justification, why I think certain ideas are novel and interesting. It is not intended as a holistic overview of a field. If anyone disagrees with characterizations I make here, I would be glad if you could relate your opinion in the comments section below. I’m grateful to Mike Osborne for some comments on a draft for this post.

Uncertainty Quantification (UQ) is a relatively young area (but considerably older than probabilistic numerics) at the boundary of numerical mathematics and statistics, dealing with, as SIAM and the ASA define it: “the interface of complex modeling of processes and data, especially characterizations of the uncertainties inherent in the use of such models.” This description is (probably deliberately) vague and might well describe all of statistics. As Tony O’Hagan writes (O’Hagan, 2013):

It is the AM [applied mathematics] community that is responsible for the term UQ (Uncertainty Quantification). To a member of the Stat community, however, UQ seems a curiously inappropriate term because (a) it sounds as though it should cover the whole of Statistics, since one way of viewing the science of Statistics is precisely as the science of quantifying uncertainty, whereas (b) the uncertainties that are studied in the AM community do not even cover the range of uncertainties that statisticians would recognise in the predictions of models. Nevertheless, UQ is the de facto term that has been accepted by the SIAM/ASA joint initiative.

In my impression – and my knowledge of the field is very limited – UQ is still evolving and continues to expand and address new questions. But the principle problem at its core is the propagation of input uncertainty. In very simple terms: Consider a complicated model $f:x\to y$ mapping inputs $x$ to outputs $y$. If I “wiggle” $x$ at a given value $x_0$ according to some perturbation $\delta x$, what is the effect on $y$? For example, $f$ may be a climate model, or a flow field over some complicated airframe.

There is a variety of methods used for this task in UQ, including the popular concept of a polynomial chaos expansion, which, intriguingly, has a connection to the Gaussian process surrogates used in Bayesian Optimization, which will be the subject of a later blog post on connections. And clearly, the uncertainty propagation described above addresses a certain type of probabilistic uncertainty we are also concerned with in Probabilistic Numerics: Given uncertain inputs to a problem and a particular algorithm used to solve it, what should the uncertainty over the output be?

Nevertheless, our questions in probabilistic numerics take a different, and in some cases broader view that warrants its own research. For us, the object of interest is the computation itself, rather than its reaction to a change in input. Questions we would like to answer in probabilistic numerics include, for a particular computation task: Is it performed as efficiently as possible (from an information theoretic perspective) by a particular algorithm? What kind of information does it collect about related objects? And could it be made more adaptive to salient prior knowledge about a certain task at hand?

### many different types of uncertainty, not all quantified

To get an intuition for the many different ways that uncertainty can play a role in computation, let’s look at an elementary problem: the linear problem of finding $x\in\mathbb{R}^N$ such that $Ax=b$ with $b\in\mathbb{R}^M$ and a matrix $A\in\mathbb{R}^{M\times N}$. Here are some questions one could ask:

• Uncertainty from ill-posedness: If $% $, there are generally many $x$ that solve the problem. What is the space of such admissible $x$? This is (a base case of) the kind of uncertainty typically studied in statistics, when a dataset is not sufficiently informative to identify the parameters of a model.

• Uncertainty from imprecise inputs: If $b$ is perturbed to $b+\delta b$, what is the effect on $x$? This is the elementary version of a central point of interest for Uncertainty Quantification. (But, as Tony O’Hagan says above, this kind of uncertainty is of course also of interest in statistics sometimes).

These two questions are classic ones, and in this linear problem, they have straightforward, or in fact trivial answers, while in nonlinear problems, they can pose formidable challenges. With probabilistic numerics, we add another set of questions:

• Uncertainty from computation: To make the point very clear, let’s assume $N=M$ and $A$ is nonsingular, so that there is only one unique $x$ satisfying $Ax=b$. Then the problem is ‘well-posed’, and perturbations in $b$ have linear effects on $x$. But there can still be computational uncertainty: If $N$ is too large for an exact solution (e.g. by Gauss-Jordan elimination) we would use an iterative method. For example, if $A$ happens to also be symmetric positive definite, the method of conjugate gradients (CG) would be a standard approach. CG is an iterative algorithm producing a sequence of increasingly better approximations $\{x_i\}_{i=1,\dots,N}$ to the correct $x$. It is usually stopped long before its (theoretical) convergence. If we stop CG after $k$ iterations, what kind of information have we effectively collected? The quality of the constructed approximation itself can be assessed by the residual $r_k = Ax_k-b$, but there are questions for which the residual alone is not sufficient: Maybe we also need to solve the related problem $Ay=c$. How much do we learn about $y$ from computing $x_k$?
Another question one might ask is whether CG is always the right method to use. Maybe we have some vague prior information about $A$. For example we may know that the elements in row and column 1 through 10 are a lot larger than those in the remaining columns. Or we know that, while it is not quite a Toeplitz matrix, it tends to have a banded structure. Can we use this information to find a solution quicker than standard CG?
These are the kinds of problem that entice us in probabilistic numerics (shameless plug: answers for this linear positive definite setting can be found in a recent paper by myself, about to come out in the SIAM JOPT (Hennig, 2015)).

To summarize: Probabilistic numerics focusses on the computations used to solve a particular problem: what is the uncertainty added by performing the computation approximately, and how can this information be used in ways not yet anticipated? PN and UQ can live very well next to each other. PN can learn much from UQ, in particular in areas like uncertainty propagation. Conversely, I hope colleagues working in UQ will be interested in how our results on PN can help in large scale UQ. The existence of UQ as such is no reason not to study PN.

### References

1. Hennig, P. (2015). Probabilistic Interpretation of Linear Solvers. SIAM J on Optimization, 25(1).

This paper proposes a probabilistic framework for algorithms that iteratively solve unconstrained linear problems Bx = b with positive definite B for x. The goal is to replace the point estimates returned by existing methods with a Gaussian posterior belief over the elements of the inverse of B, which can be used to estimate errors. Recent probabilistic interpretations of the secant family of quasi-Newton optimization algorithms are extended. Combined with properties of the conjugate gradient algorithm, this leads to uncertainty-calibrated methods with very limited cost overhead over conjugate gradients, a self-contained novel interpretation of the quasi-Newton and conjugate gradient algorithms, and a foundation for new nonlinear optimization methods.

@article{2014arXiv14022058H,
author = {{Hennig}, P.},
journal = {SIAM J on Optimization},
month = jan,
title = {{Probabilistic Interpretation of Linear Solvers}},
year = {2015},
volume = {25},
issue = {1},
file = {http://probabilistic-numerics.org/assets/pdf/HennigLinear2015.pdf}
}

2. O’Hagan, A. (2013). Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective. University of Sheffield, UK.
@techreport{ohagan13-polyn-chaos,
author = {O'Hagan, Anthony},
title = {Polynomial Chaos: A Tutorial and Critique from a
Statistician's Perspective},
institution = {University of Sheffield, UK},
year = {2013},
month = may
}