To avoid a frequent initial confusion for new readers, it may be helpful to
point out that there are two common ways in which probabilistic methods are
used in combination with ordinary differential equations: The “classic” problem
of numerics is to infer the solution to an initial value problem given access
to the differential equation. Below, we term this problem “solving ODEs”. The reverse problem, in some sense, has
also found interest in machine learning: inferring a differential equation from
(noisy) observations of trajectories that are assumed to be governed by this
ODE. Below, this is listed under “inferring ODEs”.

### Solving ODEs

## 2020

A role for symmetry in the Bayesian solution of differential equations

*Bayesian Analysis*
2020

## 2018

Convergence Rates of Gaussian ODE Filters

*ArXiv e-prints*
2018

A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution x and its first q derivatives a priori as a Gauss–Markov process X, which is then iteratively conditioned on information about x’. We prove worst-case local convergence rates of order h^q+1 for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order h^q in the case of q=1 and an integrated Brownian motion prior, and analyze how inaccurate information on x’ coming from approximate evaluations of f affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence—in the sense that they globally contract at the same rate as the truncation error.

Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration

Abdulle, A.,
and Garegnani, G.

*ArXiv e-prints*
2018

A novel probabilistic numerical method for quantifying the uncertainty induced by the time integration of ordinary differential equations (ODEs) is introduced. Departing from the classical strategy to randomize ODE solvers by adding a random forcing term, we show that a probability measure over the numerical solution of ODEs can be obtained by introducing suitable random time-steps in a classical time integrator. This intrinsic randomization allows for the conservation of geometric properties of the underlying deterministic integrator such as mass conservation, symplecticity or conservation of first integrals. Weak and mean-square convergence analysis are derived. We also analyze the convergence of the Monte Carlo estimator for the proposed random time step method and show that the measure obtained with repeated sampling converges in mean-square sense independently of the number of samples. Numerical examples including chaotic Hamiltonian systems, chemical reactions and Bayesian inferential problems illustrate the accuracy, robustness and versatility of our probabilistic numerical method.

Implicit Probabilistic Integrators for ODEs

2018

We introduce a family of implicit probabilistic integrators for initial value problems (IVPs) taking as a starting point the multistep Adams–Moulton method. The implicit construction allows for dynamic feedback from the forthcoming time-step, by contrast with previous probabilistic integrators, all of which are based on explicit methods. We begin with a concise survey of the rapidly-expanding field of probabilistic ODE solvers. We then introduce our method, which builds on and adapts the work of Conrad et al. (2016) and Teymur et al. (2016), and provide a rigorous proof of its well-definedness and convergence. We discuss the problem of the calibration of such integrators and suggest one approach. We give an illustrative example highlighting the effect of the use of probabilistic integrators – including our new method – in the setting of parameter inference within an inverse problem.

## 2016

A probabilistic model for the numerical solution of initial value problems

*ArXiv e-prints*
2016

Like many numerical methods, solvers for initial value problems (IVPs) on ordinary differential equations estimate an analytically intractable quantity, using the results of tractable computations as inputs. This structure is closely connected to the notion of inference on latent variables in statistics. We describe a class of algorithms that formulate the solution to an IVP as inference on a latent path that is a draw from a Gaussian process probability measure (or equivalently, the solution of a linear stochastic differential equation). We then show that certain members of this class are connected precisely to generalized linear methods for ODEs, a number of Runge–Kutta methods, and Nordsieck methods. This probabilistic formulation of classic methods is valuable in two ways: analytically, it highlights implicit prior assumptions favoring certain approximate solutions to the IVP over others, and gives a precise meaning to the old observation that these methods act like filters. Practically, it endows the classic solvers with ‘docking points’ for notions of uncertainty and prior information about the initial value, the value of the ODE itself, and the solution of the problem.

Active Uncertainty Calibration in Bayesian ODE Solvers

*In Uncertainty in Artificial Intelligence (UAI)*
2016

There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al. introduced a sampling-based class of methods that are ’well-calibrated’ in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al. pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers, that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF) which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations.

Probabilistic Linear Multistep Methods

Teymur, Onur,
Zygalakis, Kostas,
and Calderhead, Ben

2016

We present a derivation and theoretical investigation of the Adams-Bashforth and Adams-Moulton family of linear multistep methods for solving ordinary differential equations, starting from a Gaussian process (GP) framework. In the limit, this formulation coincides with the classical deterministic methods, which have been used as higher-order initial value problem solvers for over a century. Furthermore, the natural probabilistic framework provided by the GP formulation allows us to derive probabilistic versions of these methods, in the spirit of a number of other probabilistic ODE solvers presented in the recent literature. In contrast to higher-order Runge-Kutta methods, which require multiple intermediate function evaluations per step, Adams family methods make use of previous function evaluations, so that increased accuracy arising from a higher-order multistep approach comes at very little additional computational cost. We show that through a careful choice of covariance function for the GP, the posterior mean and standard deviation over the numerical solution can be made to exactly coincide with the value given by the deterministic method and its local truncation error respectively. We provide a rigorous proof of the convergence of these new methods, as well as an empirical investigation (up to fifth order) demonstrating their convergence rates in practice.

## 2015

A Random Riemannian Metric for Probabilistic Shortest-Path Tractography

Hauberg, Søren,
Schober, Michael,
Liptrot, Matthew,

Hennig, Philipp,
and Feragen, Aasa

*In Medical Image Computing and Computer-Assisted Intervention (MICCAI)*
2015

Shortest-path tractography (SPT) algorithms solve global optimization problems defined from local distance functions. As diffusion MRI data is inherently noisy, so are the voxelwise tensors from which local distances are derived. We extend Riemannian SPT by modeling the stochasticity of the diffusion tensor as a “random Riemannian metric”, where a geodesic is a distribution over tracts. We approximate this distribution with a Gaussian process and present a probabilistic numerics algorithm for computing the geodesic distribution. We demonstrate SPT improvements on data from the Human Connectome Project.

## 2014

On solving Ordinary Differential Equations using Gaussian
Processes

Barber, D.

*ArXiv pre-print 1408.3807*
2014

We describe a set of Gaussian Process based approaches that
can be used to solve non-linear Ordinary Differential
Equations. We suggest an explicit probabilistic solver and
two implicit methods, one analogous to Picard iteration and
the other to gradient matching. All methods have greater
accuracy than previously suggested Gaussian Process
approaches. We also suggest a general approach that can yield
error estimates from any standard ODE solver.

Probabilistic Solutions to Differential Equations and their
Application to Riemannian Statistics

*In Proc. of the 17th int. Conf. on Artificial Intelligence and
Statistics (AISTATS)*
2014

We study a probabilistic numerical method for the solution of
both boundary and initial value problems that returns a joint
Gaussian process posterior over the solution. Such methods
have concrete value in the statistics on Riemannian
manifolds, where non-analytic ordinary differential equations
are involved in virtually all computations. The probabilistic
formulation permits marginalising the uncertainty of the
numerical solution such that statistics are less sensitive to
inaccuracies. This leads to new Riemannian algorithms for
mean value computations and principal geodesic
analysis. Marginalisation also means results can be less
precise than point estimates, enabling a noticeable speed-up
over the state of the art. Our approach is an argument for a
wider point that uncertainty caused by numerical calculations
should be tracked throughout the pipeline of machine learning
algorithms.

Probabilistic shortest path tractography in DTI using
Gaussian Process ODE solvers

Schober, Michael,
Kasenburg, Niklas,
Feragen, Aasa,

Hennig, Philipp,
and Hauberg, Søren

*In Medical Image Computing and Computer-Assisted Intervention –
MICCAI 2014*
2014

Tractography in diffusion tensor imaging estimates
connectivity in the brain through observations of local
diffusivity. These observations are noisy and of low
resolution and, as a consequence, connections cannot be found
with high precision. We use probabilistic numerics to
estimate connectivity between regions of interest and
contribute a Gaussian Process tractography algorithm which
allows for both quantification and visualization of its
posterior uncertainty. We use the uncertainty both in
visualization of individual tracts as well as in heat maps of
tract locations. Finally, we provide a quantitative
evaluation of different metrics and algorithms showing that
the adjoint metric combined with our algorithm produces paths
which agree most often with experts.

Probabilistic ODE Solvers with Runge-Kutta Means

2014

Runge-Kutta methods are the classic family of solvers for
ordinary differential equations (ODEs), and the basis for the
state of the art. Like most numerical methods, they return
point estimates. We construct a family of probabilistic
numerical methods that instead return a Gauss-Markov process
defining a probability distribution over the ODE solution. In
contrast to prior work, we construct this family such that
posterior means match the outputs of the Runge-Kutta family
exactly, thus inheriting their proven good
properties. Remaining degrees of freedom not identified by
the match to Runge-Kutta are chosen such that the posterior
probability measure fits the observed structure of the
ODE. Our results shed light on the structure of Runge-Kutta
solvers from a new direction, provide a richer, probabilistic
output, have low computational cost, and raise new research
questions.

## 2013

Simulation of stochastic network dynamics via entropic matching

Ramalho, Tiago,
Selig, Marco,
Gerland, Ulrich,
and Enßlin, Torsten A.

*Phys. Rev. E*
2013

The simulation of complex stochastic network dynamics arising, for instance, from models of coupled biomolecular processes remains computationally challenging. Often, the necessity to scan a model’s dynamics over a large parameter space renders full-fledged stochastic simulations impractical, motivating approximation schemes. Here we propose an approximation scheme which improves upon the standard linear noise approximation while retaining similar computational complexity. The underlying idea is to minimize, at each time step, the Kullback-Leibler divergence between the true time evolved probability distribution and a Gaussian approximation (entropic matching). This condition leads to ordinary differential equations for the mean and the covariance matrix of the Gaussian. For cases of weak nonlinearity, the method is more accurate than the linear method when both are compared to stochastic simulations.

Bayesian Uncertainty Quantification for Differential
Equations

Chkrebtii, O.,
Campbell, D.A.,
Girolami, M.A.,
and Calderhead, B.

*Bayesin Analysis (discussion paper)*
2013

This paper advocates expansion of the role of Bayesian
statistical inference when formally quantifying uncertainty
in computer models defined by systems of ordinary or partial
differential equations. We adopt the perspective that
implicitly defined infinite dimensional functions
representing model states are objects to be inferred
probabilistically. We develop a general methodology for the
probabilistic integration of differential equations via model
based updating of a joint prior measure on the space of
functions and their temporal and spatial derivatives. This
results in a posterior measure over functions reflecting how
well they satisfy the system of differential equations and
corresponding initial and boundary values. We show how this
posterior measure can be naturally incorporated within the
Kennedy and O’Hagan framework for uncertainty quantification
and provides a fully Bayesian approach to model
calibration. By taking this probabilistic viewpoint, the full
force of Bayesian inference can be exploited when seeking to
coherently quantify and propagate epistemic uncertainty in
computer models of complex natural and physical systems. A
broad variety of examples are provided to illustrate the
potential of this framework for characterising discretization
uncertainty, including initial value, delay, and boundary
value differential equations, as well as partial differential
equations. We also demonstrate our methodology on a large
scale system, by modeling discretization uncertainty in the
solution of the Navier-Stokes equations of fluid flow,
reduced to over 16,000 coupled and stiff ordinary
differential equations. Finally, we discuss the wide range of
open research themes that follow from the work presented.

## 2009

A quantitative probabilistic investigation into the accumulation of rounding errors in numerical ODE solution

Mosbach, Sebastian,
and Turner, Amanda G.

*Computers & Mathematics with Applications*
2009

We examine numerical rounding errors of some deterministic solvers for systems of ordinary differential equations (ODEs). We show that the accumulation of rounding errors results in a solution that is inherently random and we obtain the theoretical distribution of the trajectory as a function of time, the step size and the numerical precision of the computer. We consider, in particular, systems which amplify the effect of the rounding errors so that over long time periods the solutions exhibit divergent behaviour. By performing multiple repetitions with different values of the time step size, we observe numerically the random distributions predicted theoretically. We mainly focus on the explicit Euler and RK4 methods but also briefly consider more complex algorithms such as the implicit solvers VODE and RADAU5.

## 2003

Solving noisy linear operator equations by Gaussian
processes: Application to ordinary and partial differential
equations

Graepel, Thore

*In ICML*
2003

## 1991

Bayesian solution of ordinary differential equations

Skilling, J.

*Maximum Entropy and Bayesian Methods, Seattle*
1991

In the numerical solution of ordinary differential equations,
a function y(x) is to be reconstructed from knowledge of the
functional form of its derivative: dy/dx=f(x,y), together
with an appropriate boundary condition. The derivative f is
evaluated at a sequence of suitably chosen points (x_k,y_k),
from which the form of y(.) is estimated. This is an
inference problem, which can and perhaps should be treated by
Bayesian techniques. As always, the inference appears as a
probability distribution prob(y(.)), from which random
samples show the probabilistic reliability of the
results. Examples are given.

## 1973

A Statistical Study Of The Accuracy Of Floating Point Number Systems

Kuki, H.,
and Cody, W. J.

*Communications of the ACM*
1973

This paper presents the statistical results of tests of the accuracy of certain arithmetic systems in evaluating sums, products and inner products, and analytic error estimates for some of the computations. The arithmetic systems studied are 6-digit hexadecimal and 22-digit binary floating point number representations combined with the usual chop and round modes of arithmetic with various numbers of guard digits, and with a modified round mode with guard digits. In a certain sense, arithmetic systems differing only in their use of binary or hexadecimal number representations are shown to be approximately statistically equivalent in accuracy. Further, the usual round mode with guard digits is shown to be statistically superior in accuracy to the usual chop mode in all cases save one. The modified round mode is found to be superior to the chop mode in all cases.

## 1966

Test of Probabilistic Models for the Propagation of Roundoff Errors

Hull, T. E.,
and Swenson, J. R.

*Communications of the ACM*
1966

In any prolonged computation it is generally assumed that the accumulated effect of roundoff errors is in some sense statistical. The purpose of this paper is to give precise descriptions of certain probabilistic models for roundoff error, and then to described a series of experiments for testing the validity of these models. It is concluded that the models are in general very good. Discrepancies are both rare and mild. The test techniques can also be used to experiment with various types of special arithmetic.

### Inferring ODEs

## 2017

Scalable Variational Inference for Dynamical Systems

Gorbach, Nico S,
Bauer, Stefan,
and Buhmann, Joachim M

2017

Gradient matching is a promising tool for learning parameters and state dynamics of ordinary differential equations. It is a grid free inference approach, which, for fully observable systems is at times competitive with numerical integration. However, for many real-world applications, only sparse observations are available or even unobserved variables are included in the model description. In these cases most gradient matching methods are difficult to apply or simply do not provide satisfactory results. That is why, despite the high computational cost, numerical integration is still the gold standard in many applications. Using an existing gradient matching approach, we propose a scalable variational inference framework which can infer states and parameters simultaneously, offers computational speedups, improved accuracy and works well even under model misspecifications in a partially observable system.

## 2014

Gaussian Processes for Bayesian Estimation in Ordinary
Differential Equations

Wang, Yali,
and Barber, David

*In International Conference on Machine Learning – ICML*
2014

Bayesian parameter estimation in coupled ordinary
differential equations (ODEs) is challenging due to the high
computational cost of numerical integration. In gradient
matching a separate data model is introduced with the
property that its gradient may be calculated
easily. Parameter estimation is then achieved by requiring
consistency between the gradients computed from the data
model and those specified by the ODE. We propose a Gaussian
process model that directly links state derivative
information with system observations, simplifying previous
approaches and improving estimation accuracy.

## 2013

Adaptive Markov chain Monte Carlo forward projection for
statistical analysis in epidemic modelling of human
papillomavirus

Korostil, Igor A,
Peters, Gareth W,
Cornebise, Julien,
and Regan, David G

*Statistics in medicine*
2013

A Bayesian statistical model and estimation methodology based on
forward projection adaptive Markov chain Monte Carlo is
developed in order to perform the calibration of a
high-dimensional nonlinear system of ordinary differential
equations representing an epidemic model for human
papillomavirus types 6 and 11 (HPV-6, HPV-11). The model is
compartmental and involves stratification by age, gender and
sexual-activity group. Developing this model and a means to
calibrate it efficiently is relevant because HPV is a very
multi-typed and common sexually transmitted infection with
more than 100 types currently known. The two types studied in
this paper, types 6 and 11, are causing about 90% of
anogenital warts. We extend the development of a sexual
mixing matrix on the basis of a formulation first suggested
by Garnett and Anderson, frequently used to model sexually
transmitted infections. In particular, we consider a
stochastic mixing matrix framework that allows us to jointly
estimate unknown attributes and parameters of the mixing
matrix along with the parameters involved in the calibration
of the HPV epidemic model. This matrix describes the sexual
interactions between members of the population under study
and relies on several quantities that are a priori
unknown. The Bayesian model developed allows one to estimate
jointly the HPV-6 and HPV-11 epidemic model parameters as
well as unknown sexual mixing matrix parameters related to
assortativity. Finally, we explore the ability of an
extension to the class of adaptive Markov chain Monte Carlo
algorithms to incorporate a forward projection strategy for
the ordinary differential equation state
trajectories. Efficient exploration of the Bayesian posterior
distribution developed for the ordinary differential equation
parameters provides a challenge for any Markov chain sampling
methodology, hence the interest in adaptive Markov chain
methods. We conclude with simulation studies on synthetic and
recent actual data.

## 2009

Accelerating Bayesian Inference over Nonlinear Differential
Equations with Gaussian Processes

2009