Probabilistic-Numerics.org
A community website collecting research on algorithms that assign probability distributions to the unknown result of deterministic computations.
Probabilistic Numerics for Computer Scientists
<p><em>This post is the first of a series originating in a PhD student meeting
in Tübingen in April 2016.</em></p>
<p><em>It has been two years since the inaugural probabilistic numerics roundtable
meeting. While the field has started to generate some momentum, it is growing
more important to explain the concept of probabilistic numerics to all relevant
parties. This post will attempt to explain the basic ideas, research goals and
obstacles to a trained computer scientist.</em></p>
<p>Probabilistic numerics is an emerging research area with goals, problems and
tools from both applied mathematics – numerical analysis and probability theory
– and theoretical computer science – mostly algorithms and data structures.
Numerical algorithms are tools for solving equations that either are too big to
be solved manually or don’t possess a closed-form solution at all. In both
cases, an iterative procedure is implemented in a computer which is then proven
to converge to the correct solution. Common problems are finding optimal values
of some function, evaluating tricky integrals or solving differential equations.</p>
<p>However, in many cases even these algorithms are too expensive to be run
to very high precision. For instance, when computing integrals over many variables,
the number of function evaluations needed to reach a given accuracy grows
exponentially with the number of variables – the so-called <em>curse of dimensionality</em>. Or, the function to be
optimized might be too costly to evaluate accurately, which is the case in
<em>deep learning</em> and <em>big data</em> (just to throw in some more buzzwords). In these
situations, the (slightly modified) algorithms might still work, but guarantees
are harder to come by.</p>
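<p>To make the curse of dimensionality concrete, here is a minimal sketch that integrates over the unit cube with a midpoint rule on a full grid. The function names and the test integrand are illustrative assumptions, not taken from any particular library; the point is only that ten nodes per axis cost 10^d evaluations in d dimensions.</p>

```python
import itertools


def grid_evaluations(points_per_axis, dimension):
    """Number of integrand evaluations used by a full tensor-product grid."""
    return points_per_axis ** dimension


def midpoint_rule(f, points_per_axis, dimension):
    """Integrate f over the unit cube [0, 1]^d with a midpoint rule on a full grid."""
    h = 1.0 / points_per_axis
    nodes = [(i + 0.5) * h for i in range(points_per_axis)]
    total = sum(f(x) for x in itertools.product(nodes, repeat=dimension))
    return total * h ** dimension


def sum_of_squares(x):
    """Illustrative integrand; its exact integral over [0, 1]^d is d / 3."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    for d in (1, 2, 3, 10):
        print(d, grid_evaluations(10, d))
    print(midpoint_rule(sum_of_squares, 10, 3))  # exact value is 1.0
```

<p>With ten nodes per axis, three dimensions already need a thousand evaluations and ten dimensions need ten billion, which is why grid-based rules are rarely used beyond a handful of variables.</p>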
<p>To draw an analogy to a classical computer science problem, I will use the A*
search algorithm. Imagine a huge weighted graph. The task is to compute the cost
of the cheapest path from node X to node Y, but you are only given a small computational time budget,
a tiny fraction of the guaranteed worst-case running time. Since the A*
algorithm uses a heuristic to estimate the remaining cost, the algorithm can be
terminated at any time to produce an approximate output.</p>
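<p>The early-stopping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function <code>anytime_astar</code>, the 6-by-6 grid with a wall, and the expansion budget are all hypothetical. When the budget runs out, the sketch returns the smallest "cost so far plus heuristic" value on the frontier, which is a lower bound on the true cost if the heuristic never overestimates.</p>

```python
import heapq


def anytime_astar(neighbors, heuristic, start, goal, max_expansions):
    """A* search that gives up after `max_expansions` node expansions.

    Returns (estimate, exact). If the goal is reached, `estimate` is the true
    path cost. Otherwise it is the smallest g + h value on the frontier.
    """
    frontier = [(heuristic(start), 0.0, start)]
    best_g = {start: 0.0}
    expansions = 0
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale queue entry, a cheaper route was found later
        if node == goal:
            return g, True
        if expansions >= max_expansions:
            return f, False  # budget exhausted: report the current best guess
        expansions += 1
        for nxt, weight in neighbors(node):
            new_g = g + weight
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return float("inf"), True  # goal unreachable


# Hypothetical test graph: a 6x6 grid with a wall in column 3 (gap in row 5).
SIZE = 6
WALLS = {(r, 3) for r in range(5)}
START, GOAL = (0, 0), (0, 5)


def grid_neighbors(node):
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (r + dr, c + dc)
        if 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE and nxt not in WALLS:
            yield nxt, 1.0


def manhattan(node):
    """Admissible heuristic: ignores the wall, so it never overestimates."""
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])


if __name__ == "__main__":
    print(anytime_astar(grid_neighbors, manhattan, START, GOAL, 3))
    print(anytime_astar(grid_neighbors, manhattan, START, GOAL, 10 ** 6))
```

<p>The early answer is only a point estimate. Nothing in this sketch says how far it may be from the truth, which is the gap that a probabilistic treatment aims to fill.</p>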
<p>Now, the second key ingredient comes into play: probability theory. In
probabilistic numerics, probabilities can represent the uncertainties stemming
from approximations in the algorithm or finite run time. The calculus of
probability theory, most importantly Bayes’ rule, makes it possible to combine and
extend algorithms in many ways within a consistent and well-studied framework. E.g.,
the early stopping A* algorithm might not only return an <em>expected value</em> for
the cost, but also give a <em>standard deviation</em>. Or the algorithm could spend
some time comparing the estimated remaining cost with the observed values from
the graph weights, adjusting the overall estimation if, for instance, all
previous estimates have been too high. Another possible extension: the weights
of the graph might be represented by probability distributions themselves. If
the graph represents a road network, there might be traffic jams with some
probability.</p>
<p>Note, however, that this does not necessarily mean that there are random or
stochastic elements in a probabilistic numerical algorithm. While some
researchers also use sampling-based methods in their algorithms, other methods
are completely deterministic, working on deterministic problems to produce
deterministic results.</p>
<p>While many problems in probabilistic numerics are of a mathematical nature, there
are also good reasons to take an interest in this area when your focus is more on
computer science. One common problem is to represent the necessary probability
distributions with low memory cost. What are good programming interfaces when
dealing with probability distributions? There are also challenges when dealing
with specialized hardware, e.g., GPUs for large-scale optimization.</p>
<p>In conclusion, probabilistic numerics is a young and active research area using
probability theory to describe iterative approximate algorithms. There is a
plethora of open problems and potential applications and lots of interesting
challenges covering the whole spectrum of applied mathematics and (theoretical)
computer science.</p>
Wed, 18 May 2016 12:00:00 +0200
/general/2016/05/18/PN-for-CompSci/
Probabilistic Integration
<p>Recent months have been very exciting for the probabilistic numerics community, with a series of new interesting papers appearing and a scoping workshop at the Alan Turing Institute discussing avenues for future research. In particular, there has been a lot of interest in solving differential equations, which was the topic of two workshops at the University of Warwick and at SciCADE, but the focus is now slightly shifting towards quadrature with two workshops on Probabilistic Integration (not to be missed!) at <a href="/meetings/NIPS2015.html">NIPS 2015 in December</a> and <a href="/meetings/MCMSki2016.html">MCMSki V in January</a>.</p>
<p>Probabilistic Integration, and, in particular, Bayesian Quadrature (BQ), was one of the first areas to be investigated in probabilistic numerics. The overall idea is to fully handle our epistemic uncertainty over the numerical solution of the integral, which can be done by adopting a Bayesian approach. More precisely, we can model the integrand using a Gaussian process and, because integration is a linear operation and Gaussian distributions are closed under linear maps, obtain a Gaussian posterior distribution over the value of the integral. We then usually take the posterior mean of this Gaussian distribution as our approximation of the integral, while the posterior variance tells us how uncertain we are about that approximation.
This method has been extended numerous times in order to improve its applicability to statistical tasks, such as computing ratios of integrals, or considering cases where the integrand is a probability distribution and hence intrinsically non-negative.</p>
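<p>As a rough illustration of the basic recipe, the sketch below computes the BQ posterior mean and variance for an integral over [0, 1]. Everything specific here is an assumption made for the example: a zero-mean Gaussian process prior with a unit-variance squared-exponential kernel, a fixed length scale of 0.2, eight equally spaced nodes, and a small diagonal jitter for numerical stability. For this kernel the required kernel integrals are available in closed form via the error function.</p>

```python
import math


def rbf(x, y, ell):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((x - y) / ell) ** 2)


def kernel_mean(x, ell):
    """Closed form of the integral of rbf(x, y) over y in [0, 1]."""
    s = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (math.erf((1.0 - x) / s) + math.erf(x / s))


def kernel_double_integral(ell):
    """Closed form of the integral of rbf(x, y) over the unit square."""
    s = math.sqrt(2.0) * ell
    return 2.0 * (ell * math.sqrt(math.pi / 2.0) * math.erf(1.0 / s)
                  - ell ** 2 * (1.0 - math.exp(-0.5 / ell ** 2)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bayesian_quadrature(f, nodes, ell=0.2, jitter=1e-10):
    """Posterior mean and variance of the integral of f over [0, 1]."""
    gram = [[rbf(a, b, ell) + (jitter if i == j else 0.0)
             for j, b in enumerate(nodes)] for i, a in enumerate(nodes)]
    z = [kernel_mean(x, ell) for x in nodes]
    weights = solve(gram, z)  # K^{-1} z: these act as quadrature weights
    mean = sum(w * f(x) for w, x in zip(weights, nodes))
    variance = kernel_double_integral(ell) - sum(w * zi for w, zi in zip(weights, z))
    return mean, max(variance, 0.0)  # clamp tiny negative round-off


def integrand(x):
    """Illustrative integrand; exact integral is (1 - cos 3) / 3 + 1 / 3."""
    return math.sin(3.0 * x) + x * x


if __name__ == "__main__":
    nodes = [i / 7.0 for i in range(8)]
    print(bayesian_quadrature(integrand, nodes))
```

<p>Note that the posterior variance depends only on the kernel and the node locations, not on the observed function values. In practice the length scale would be chosen or learned rather than fixed by hand.</p>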
<p>Recent work has focussed on studying the links between BQ and other methods. For example, <a href="#2015arXiv150405994S">(Särkkä, Hartikainen, Svensson, & Sandblom, 2015)</a> discussed how many popular quadrature methods can be obtained as the posterior mean of a BQ method with a specific kernel. Similarly, <a href="#bach2015equivalence">(Bach, 2015)</a> showed that performing BQ is equivalent to using a specific type of random features.</p>
<p>On the other hand, existing methods can, and sometimes should, also help to design more efficient BQ methods. In a recent paper by <a href="#briol_frank-wolfe_2015">(Briol, Oates, Girolami, & Osborne, 2015)</a>, the authors showed how a convex optimisation method, called the Frank-Wolfe algorithm, could provide a probabilistic solution to integration problems. This method also provided the first ever provable posterior convergence and contraction rates for BQ. Follow-up work <a href="#briol_probabilistic_2015">(Briol, Oates, Girolami, Osborne, & Sejdinovic, 2015)</a> also demonstrated that similar theory could be obtained when using Monte Carlo methods (such as Markov Chain or Quasi Monte Carlo) to pick the points at which the integrand is evaluated. This paper also provides rates for the newly proposed methods, as well as a framework for obtaining rates of any new future BQ methods.</p>
<p>Clearly, much remains to be done before similar theoretical guarantees can be provided for alternative methods, but these early successes are very promising, and the increasing popularity of workshops in probabilistic numerics demonstrates the interest from potential users!</p>
<h2 id="references">References</h2>
<ol class="bibliography"><li><span id="briol_probabilistic_2015">Briol, F.-X., Oates, C. J., Girolami, M., Osborne, M. A., & Sejdinovic, D. (2015). Probabilistic Integration: A Role for Statisticians in Numerical Analysis? <i>ArXiv:1512.00933 [Cs, Math, Stat]</i>. Retrieved from http://arxiv.org/abs/1512.00933</span>
<span id="briol_probabilistic_2015_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#briol_probabilistic_2015_abstract" data-toggle="collapse" href="#briol_probabilistic_2015" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#briol_probabilistic_2015_bibtex" data-toggle="collapse" href="#briol_probabilistic_2015" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://arxiv.org/pdf/1512.00933.pdf">PDF</a></li>
</ul>
<p id="briol_probabilistic_2015_abstract" class="collapse">A research frontier has emerged in scientific computation, founded on the principle that numerical error entails epistemic uncertainty that ought to be subjected to statistical analysis. This viewpoint raises several interesting challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational pipeline. This paper examines thoroughly the case for probabilistic numerical methods in statistical computation and a specific case study is presented for Markov chain and Quasi Monte Carlo methods. A probabilistic integrator is equipped with a full distribution over its output, providing a measure of epistemic uncertainty that is shown to be statistically valid at finite computational levels, as well as in asymptotic regimes. The approach is motivated by expensive integration problems, where, as in kriging, one is willing to expend, at worst, cubic computational effort in order to gain uncertainty quantification. There, probabilistic integrators enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assessment of the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and uncertainty quantification in oil reservoir modelling.</p>
<pre id="briol_probabilistic_2015_bibtex" class="pre pre-scrollable collapse">@article{briol_probabilistic_2015,
title = {Probabilistic {Integration}: A Role for Statisticians in Numerical Analysis?},
url = {http://arxiv.org/abs/1512.00933},
urldate = {2015-07-22},
journal = {arXiv:1512.00933 [cs, math, stat]},
author = {Briol, François-Xavier and Oates, Chris J. and Girolami, Mark and Osborne, Michael A. and Sejdinovic, Dino},
year = {2015},
note = {arXiv: 1512.00933},
file = {http://arxiv.org/pdf/1512.00933.pdf}
}
</pre>
</span>
</li>
<li><span id="bach2015equivalence">Bach, F. (2015). On the Equivalence between Quadrature Rules and Random Features. <i>ArXiv Preprint ArXiv:1502.06800</i>.</span>
<span id="bach2015equivalence_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#bach2015equivalence_abstract" data-toggle="collapse" href="#bach2015equivalence" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#bach2015equivalence_bibtex" data-toggle="collapse" href="#bach2015equivalence" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://arxiv.org/pdf/1502.06800v2.pdf">PDF</a></li>
</ul>
<p id="bach2015equivalence_abstract" class="collapse">
We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels. We provide a theoretical analysis of the number of required samples for a given approximation error, leading to both upper and lower bounds that are based solely on the eigenvalues of the associated integral operator and match up to logarithmic terms. In particular, we show that the upper bound may be obtained from independent and identically distributed samples from a specific non-uniform distribution, while the lower bound is valid for any set of points. Applying our results to kernel-based quadrature, while our results are fairly general, we recover known upper and lower bounds for the special cases of Sobolev spaces. Moreover, our results extend to the more general problem of full function approximations (beyond simply computing an integral), with results in L2- and L∞-norm that match known results for special cases. Applying our results to random features, we show an improvement of the number of random features needed to preserve the generalization guarantees for learning with Lipschitz-continuous losses.
</p>
<pre id="bach2015equivalence_bibtex" class="pre pre-scrollable collapse">@article{bach2015equivalence,
title = {On the Equivalence between Quadrature Rules and Random Features},
author = {Bach, Francis},
journal = {arXiv preprint arXiv:1502.06800},
file = {http://arxiv.org/pdf/1502.06800v2.pdf},
year = {2015}
}
</pre>
</span>
</li>
<li><span id="briol_frank-wolfe_2015">Briol, F.-X., Oates, C. J., Girolami, M., & Osborne, M. A. (2015). Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees. In <i>Advances in Neural Information Processing Systems (NIPS)</i>. Retrieved from http://arxiv.org/abs/1506.02681</span>
<span id="briol_frank-wolfe_2015_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#briol_frank-wolfe_2015_abstract" data-toggle="collapse" href="#briol_frank-wolfe_2015" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#briol_frank-wolfe_2015_bibtex" data-toggle="collapse" href="#briol_frank-wolfe_2015" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://arxiv.org/pdf/1506.02681v1.pdf">PDF</a></li>
</ul>
<p id="briol_frank-wolfe_2015_abstract" class="collapse">There is renewed interest in formulating integration as an inference problem, motivated by obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is to reconcile these probabilistic integrators with rigorous convergence guarantees. In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ). Under FWBQ, convergence to the true value of the integral is shown to be exponential and posterior contraction rates are proven to be superexponential. In simulations, FWBQ is competitive with state-of-the-art methods and out-performs alternatives based on Frank-Wolfe optimisation. Our approach is applied to successfully quantify numerical error in the solution to a challenging model choice problem in cellular biology.</p>
<pre id="briol_frank-wolfe_2015_bibtex" class="pre pre-scrollable collapse">@inproceedings{briol_frank-wolfe_2015,
title = {Frank-{Wolfe} {Bayesian} {Quadrature}: {Probabilistic} {Integration} with {Theoretical} {Guarantees}},
shorttitle = {Frank-{Wolfe} {Bayesian} {Quadrature}},
url = {http://arxiv.org/abs/1506.02681},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
author = {Briol, François-Xavier and Oates, Chris J. and Girolami, Mark and Osborne, Michael A.},
year = {2015},
keywords = {Statistics - Machine Learning},
file = {http://arxiv.org/pdf/1506.02681v1.pdf}
}
</pre>
</span>
</li>
<li><span id="2015arXiv150405994S">Särkkä, S., Hartikainen, J., Svensson, L., & Sandblom, F. (2015). On the relation between Gaussian process quadratures and sigma-point methods. <i>ArXiv Preprint Stat.ME 1504.05994</i>.</span>
<span id="2015arXiv150405994S_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#2015arXiv150405994S_abstract" data-toggle="collapse" href="#2015arXiv150405994S" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#2015arXiv150405994S_bibtex" data-toggle="collapse" href="#2015arXiv150405994S" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://arxiv.org/pdf/1504.05994v1.pdf">PDF</a></li>
</ul>
<p id="2015arXiv150405994S_abstract" class="collapse">This article is concerned with Gaussian process quadratures, which are numerical integration methods based on Gaussian process regression methods, and sigma-point methods, which are used in advanced non-linear Kalman filtering and smoothing algorithms. We show that many sigma-point methods can be interpreted as Gaussian quadrature based methods with suitably selected covariance functions. We show that this interpretation also extends to more general multivariate Gauss–Hermite integration methods and related spherical cubature rules. Additionally, we discuss different criteria for selecting the sigma-point locations: exactness for multivariate polynomials up to a given order, minimum average error, and quasi-random point sets. The performance of the different methods is tested in numerical experiments.</p>
<pre id="2015arXiv150405994S_bibtex" class="pre pre-scrollable collapse">@article{2015arXiv150405994S,
author = {{S{\"a}rkk{\"a}}, S. and {Hartikainen}, J. and {Svensson}, L. and {Sandblom}, F.},
title = {{On the relation between Gaussian process quadratures and sigma-point methods}},
journal = {arXiv preprint stat.ME 1504.05994},
year = {2015},
month = apr,
file = {http://arxiv.org/pdf/1504.05994v1.pdf}
}
</pre>
</span>
</li></ol>
Thu, 03 Dec 2015 10:00:00 +0100
/2015/12/03/probabilistic_integration/
Connections, Part III: Bayesian Optimization
<p><em>As I go around presenting the idea of probabilistic numerics to various
audiences, certain questions about related areas come up repeatedly. This post
explains how probabilistic numerics compares to the area of Bayesian
Optimization. Previous posts discussed connections to
<a href="/general/2015/01/14/UQ/">uncertainty quantification</a> and
<a href="/general/2015/01/15/Stochastics/">stochastic methods</a>.</em></p>
<p><em>A disclaimer: Obviously, everyone has different opinions about the scope and
purpose of certain concepts and academic fields. And I am not an expert in the
areas discussed here. This post relates a part of my own personal
justification, why I think certain ideas are novel and interesting. It is not
intended as a holistic overview of a field. If anyone disagrees with
characterizations I make here, I would be glad if you could relate your
opinion in the comments section below.</em></p>
<p>One of the most frequent questions regarding PN from the machine learning
community is “what about Bayesian Optimization? Shouldn’t it be part of PN,
too?” Some of the things I write here arose from an email thread mainly driven by
<a href="http://www-kd.iai.uni-bonn.de/index.php?page=people*details&id=60">Roman Garnett</a>,
<a href="http://stat.columbia.edu/~cunningham/">John Cunningham</a>, Mike Osborne and
myself. </p>
<p><strong><a href="http://bayesianoptimization.org">Bayesian Optimization</a> (BO)</strong> is a
probabilistic description of the task of finding the global extremum of a
function that is not “directly” accessible, either because it is embodied in
some physical process, or because it has very high evaluation cost. A good
example is the search for good parameters of a robotic control problem (where
each function evaluation involves a physical experiment), or finding good
setups for a large machine learning algorithm, such as a deep net (where each
function evaluation involves training the net to convergence, which may take
weeks on a cluster).</p>
<p>BO is a relatively young community, but already very successful. A workshop on
BO is now an annual fixture at
NIPS. <a href="http://bayesopt.github.io">The recent 2014 installment</a> attracted around
50 people (my own rough guess), making it one of NIPS’s larger satellites. And
there is some overlap in people between the PN and BO communities, for example
through Roman and Mike (together with Christian Schuler, I also contributed
a BO algorithm myself <a href="#HennigS2012">(Hennig & Schuler, 2012)</a>).
Nevertheless, in my personal opinion, there is
a difference in focus that makes BO rather different from what we are trying
to achieve with probabilistic numerical methods.</p>
<p>There is a spectrum of algorithms in BO, but most Bayesian optimizers fit into
the following scheme: We aim to find the minimum of
<script type="math/tex">f(x):\mathbb{X}\to\mathbb{R}</script>. To decide at which <script type="math/tex">x_*</script> to collect the
next datapoint(s) <script type="math/tex">y_*</script>, the algorithm builds a regression posterior
<script type="math/tex">p(f\mid X,Y)</script> from previously collected evaluations <script type="math/tex">(X,Y)</script>. This is the
“Bayesian” bit, and virtually always involves Gaussian process models. Said
posterior is then transformed into a utility for the next evaluation,
<script type="math/tex">u[x_*,p(f\mid X,Y)]</script>. For example, this functional may predict how likely
<script type="math/tex">y_*</script> is to be lower than previous function values, or the expected distance
between <script type="math/tex">y_*</script> and the previous lowest achieved value, or something altogether
more complicated, like the expected information gain about the location of the
minimum from an evaluation at <script type="math/tex">x_*</script>. Then – and this is crucial in this
context – the Bayesian global optimizer <strong>performs a <em>numerical</em> global
optimization</strong> to find the global minimum <script type="math/tex">\operatorname{arg
min}_{x_*}(u)</script>. How exactly this is done varies across implementations. There
are two main differences between the original optimization task of the Bayesian
optimizer and this new numerical global optimization problem: The latter is
free of noise, and the objective may be a lot cheaper, because it only involves
computations on the surrogate, not physical experiments. The “structural
complexity” (the shape) of the two problems may be quite similar, though.</p>
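<p>The scheme above can be sketched in a few lines. This is a toy illustration under several assumptions that are mine, not the post's: a one-dimensional domain [0, 1], a zero-mean Gaussian process with a fixed squared-exponential kernel, expected improvement as the utility, and a brute-force grid search standing in for the inner numerical global optimization. The sketch maximizes the utility, which is the same as minimizing its negative.</p>

```python
import math


def rbf(x, y, ell=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((x - y) / ell) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_posterior(xs, ys, x_star, noise=1e-6):
    """Posterior mean and variance of f(x_star) given evaluations (xs, ys)."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve(gram, k_star)  # K^{-1} k_*
    mean = sum(a * y for a, y in zip(alpha, ys))
    var = rbf(x_star, x_star) - sum(a * k for a, k in zip(alpha, k_star))
    return mean, max(var, 1e-12)


def expected_improvement(mean, var, best):
    """Expected amount by which a new value falls below the current best."""
    std = math.sqrt(var)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


def bayes_opt(f, initial_xs, n_steps, grid_size=201):
    """Minimize f on [0, 1]; returns (best x, best y, all evaluated x)."""
    xs = list(initial_xs)
    ys = [f(x) for x in xs]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_steps):
        best = min(ys)
        # Inner numerical optimization of the utility: noise-free and cheap,
        # because it only touches the surrogate, never the real objective.
        x_next = max(grid, key=lambda x: expected_improvement(
            *gp_posterior(xs, ys, x), best))
        xs.append(x_next)
        ys.append(f(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs


def objective(x):
    """Stand-in for an expensive experiment; true minimum at x = 0.3."""
    return (x - 0.3) ** 2


if __name__ == "__main__":
    print(bayes_opt(objective, [0.0, 1.0], n_steps=6))
```

<p>Grid search only works here because the domain is one-dimensional. In realistic settings the inner problem is itself a multimodal optimization task, which is exactly the point made above.</p>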
<p>In this sense BO “just” turns a tough physical optimization problem into a
tough numerical optimization problem (the quotation marks are truly necessary;
the models used in this area are anything but trivial). This doesn’t mean BO is
pointless. It has been hugely successful recently at providing automated,
surprisingly smart strategies saving time and money on design problems. But
for our purposes it puts BO on a conceptual level “above” the base layer of
numerics.</p>
<p>In our email thread, Roman offered a pointed criticism of this characterization:
Don’t all numerical methods just turn a hard problem into an easier one?
Quasi-Newton methods cheaply approximate expensive Hessian functions, for
example. This is true, and one could have a philosophical debate about the
boundaries between a numerical method and an algorithm merely <em>using</em> numerics
(Bayesian quadrature methods, for example, may well use a numerical
optimization algorithm to design their evaluation strategy). But what is more
important from a practical standpoint is the difference in research focus:
Having attended most of the recent NIPS workshop on BO, I would say the focus
of work in this community is currently on more elaborate structured models
(high-dimensional, heteroscedastic, with varying evaluation cost, etc.), and on
identifying interesting application areas (molecular biology, automated machine
learning, robotics, etc.). There is only limited debate about the computational
cost of BO methods, and on the empirical robustness of these algorithms. This
makes sense, because BO algorithms are quite elaborate, so their behaviour and
cost are difficult to analyse.</p>
<p>In contrast, numerical methods form the base layer, the <em>inner loop</em> of
algorithms across many problem domains. They have to be parsimonious, and
trustworthy. When building probabilistic numerical methods, we have to compare
ourselves with the elegant, rugged algorithms built in the applied mathematical
communities over the past century. In my opinion it would be a mistake to
demand a large computational budget from our potential users. Instead we need
to show that probabilistic functionality can deliver efficiency and
expressivity gains at very limited overhead. In this respect, I think it is a good
strategy to keep the developments in BO and PN separate for the time being.</p>
<p>However, our tiny cottage industry can learn a lot from the great success that
BO has had in recent years. In about half a decade, BO has grown from an
existing basis of just a handful of (much older) early papers to a thriving
community with many different competing key players. There are about as many
existing basic papers on PN ideas as there were on BO a few years ago. I would
be thrilled if, in 5 years, PN were where BO is today.</p>
<h3 id="references">References</h3>
<ol class="bibliography"><li><span id="HennigS2012">Hennig, P., & Schuler, C. J. (2012). Entropy Search for Information-Efficient Global Optimization. <i>Journal of Machine Learning Research</i>, <i>13</i>, 1809–1837.</span>
<span id="HennigS2012_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#HennigS2012_abstract" data-toggle="collapse" href="#HennigS2012" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#HennigS2012_bibtex" data-toggle="collapse" href="#HennigS2012" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://jmlr.csail.mit.edu/papers/volume13/hennig12a/hennig12a.pdf">PDF</a></li>
<li><a class="bib-materials" href="http://jmlr.csail.mit.edu/papers/v13/hennig12a.html">web</a></li>
<li><a class="bib-materials" href="http://probabilistic-optimization.org/Global.html">code</a></li>
</ul>
<p id="HennigS2012_abstract" class="collapse">Contemporary global optimization algorithms are based on
local measures of utility, rather than a probability measure
over location and value of the optimum. They thus attempt to
collect low function values, not to learn about the
optimum. The reason for the absence of probabilistic global
optimizers is that the corresponding inference problem is
intractable in several ways. This paper develops desiderata
for probabilistic optimization algorithms, then presents a
concrete algorithm which addresses each of the computational
intractabilities with a sequence of approximations and
explicitly addresses the decision problem of maximizing
information gain from each evaluation. </p>
<pre id="HennigS2012_bibtex" class="pre pre-scrollable collapse">@article{HennigS2012,
title = {Entropy Search for Information-Efficient Global Optimization},
author = {Hennig, P. and Schuler, CJ.},
month = jun,
volume = {13},
pages = {1809-1837},
journal = {Journal of Machine Learning Research},
year = {2012},
file = {http://jmlr.csail.mit.edu/papers/volume13/hennig12a/hennig12a.pdf},
link = {http://jmlr.csail.mit.edu/papers/v13/hennig12a.html},
code = {http://probabilistic-optimization.org/Global.html}
}
</pre>
</span>
</li></ol>
Fri, 16 Jan 2015 07:00:00 +0100
/optimization/2015/01/16/BO/
Connections, Part II: Stochastic numerical methods
<p><em>As I go around presenting the idea of probabilistic numerics to various
audiences, certain questions about related areas come up repeatedly. This post
explains how probabilistic numerics compares to existing ideas of using
stochasticity in numerical problems.
<a href="/general/2015/01/14/UQ/">A previous post</a> discussed connections to
uncertainty quantification. A subsequent one will look at
<a href="/optimization/2015/01/16/BO/">Bayesian Optimization</a>.</em></p>
<p><em>A disclaimer: Obviously, everyone has different opinions about the scope and
purpose of certain concepts and academic fields. And I am not an expert in the
areas discussed here. This post relates a part of my own personal
justification, why I think certain ideas are novel and interesting. It is not
intended as a holistic overview of a field. If anyone disagrees with
characterizations I make here, I would be glad if you could relate your
opinion in the comments section below.</em></p>
<p>Using <strong>stochasticity</strong> to solve deterministic problems is anything but a new idea
in numerical mathematics. Random numbers have been employed in at least two quite different
ways for numerical purposes, one lowering cost and precision, the other
increasing cost and robustness.</p>
<h3 id="randomized-methods">Randomized methods</h3>
<p><em>Projecting</em> a large problem onto a randomly chosen smaller space can reduce
computational cost while introducing a new kind of imprecision. Such methods
are called <em>randomized</em> or, unfortunately, also <em>probabilistic</em> algorithms
<a href="#liberty2007randomized">(Liberty, Woolfe, Martinsson, Rokhlin, & Tygert, 2007)</a>
<a href="#halko2011finding">(Halko, Martinsson, & Tropp, 2011)</a>.
The key idea is that given a particular numerical problem spanning variables
from a certain high-dimensional space, one chooses a random projection into a
space of much lower dimensionality, and solves the problem in that space. The
surprising aspect, the strength of this approach, is that it tends to cause
far fewer deficiencies than one would intuitively assume; and this has been
studied quite rigorously.</p>
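<p>A simple way to see why random projections lose so little is the following sketch. It is not the low-rank algorithm of the papers cited above but a plainer illustration in the same spirit: a scaled Gaussian random matrix (a Johnson-Lindenstrauss style projection) maps vectors from 500 to 200 dimensions, and the distance between two points is compared before and after. The dimensions, seed, and function names are illustrative assumptions.</p>

```python
import math
import random


def random_projection(dim_in, dim_out, rng):
    """Gaussian random matrix scaled so squared lengths are preserved on average."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, vector):
    """Matrix-vector product: maps a dim_in vector to a dim_out vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distortion(dim_in=500, dim_out=200, seed=0):
    """Ratio of projected distance to original distance for two random points."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(dim_in)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim_in)]
    p = random_projection(dim_in, dim_out, rng)
    return distance(project(p, u), project(p, v)) / distance(u, v)


if __name__ == "__main__":
    print(distortion())  # close to 1: the distance survives the projection
```

<p>The ratio fluctuates around one with a spread that shrinks as the target dimension grows, independently of the original dimension. That is the new kind of imprecision traded for the lower cost.</p>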
<h3 id="perturbations">Perturbations</h3>
<p>Repeatedly solving randomly <em>perturbed</em> variants of a problem can help quantify
the robustness of a task. This is in some sense the counterpart to the above
projection approach: Instead of removing degrees of freedom to get drastically
lower cost at surprisingly low loss of precision, perturbation methods enrich
the description of a problem to probe new interesting aspects – at drastically
higher cost. (For clarity: I’m not talking about
<a href="http://en.wikipedia.org/wiki/Perturbation_theory">“perturbation methods”</a>,
which are a theoretical tool, not a computational one; random numbers only
play a conceptual role there.)</p>
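<p>A toy version of the perturbation idea, under assumptions chosen purely for illustration: a 2-by-2 linear system is solved 200 times with small Gaussian noise added to the right-hand side, and the spread of the first solution component is reported. The matrices, noise level, and function names are hypothetical.</p>

```python
import random
import statistics


def solve_2x2(a, b):
    """Solve a 2x2 linear system a x = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det)


def solution_spread(a, b, noise=1e-3, trials=200, seed=0):
    """Std. deviation of the first solution component when b is perturbed."""
    rng = random.Random(seed)
    firsts = []
    for _ in range(trials):
        b_perturbed = [bi + rng.gauss(0.0, noise) for bi in b]
        firsts.append(solve_2x2(a, b_perturbed)[0])
    return statistics.pstdev(firsts)


WELL_CONDITIONED = ((1.0, 0.0), (0.0, 1.0))
ILL_CONDITIONED = ((1.0, 1.0), (1.0, 1.001))

if __name__ == "__main__":
    rhs = [2.0, 2.001]
    print(solution_spread(WELL_CONDITIONED, rhs))  # about the noise level
    print(solution_spread(ILL_CONDITIONED, rhs))   # amplified by orders of magnitude
```

<p>The robustness information costs 200 solves instead of one, which is the "drastically higher cost" side of the trade.</p>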
<h3 id="monte-carlo-methods">Monte Carlo methods</h3>
<p>A third, much-studied area that is of particular relevance for probabilistic
numerics is that of <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo</a>
methods. In statistics and machine learning, MC methods are used primarily to
compute expectations and marginals – i.e., for quadrature. In contrast to the
two other uses of random numbers above, MC methods really solve an unmodified
given numerical integration task. Although they can be quite elaborate
algorithms, the basic idea is quickly explained: To compute the intractable
expectation <script type="math/tex">\langle f\rangle_p= \int f(x) p(x) dx</script>, draw <script type="math/tex">N</script> samples
<script type="math/tex">x_i\sim p</script>, and approximate <script type="math/tex">\langle f\rangle_p\approx \frac{1}{N}\sum_i
f(x_i)</script>. This is an unbiased statistical estimator which converges at the
<em>statistically</em> optimal rate of <script type="math/tex">\mathcal{O}(N^{-1/2})</script>. If you can’t draw
exact samples from <script type="math/tex">p</script>, you have to compute them approximately, and this is
where a lot of the research (particularly in Markov Chain MC methods) has
focussed over the past decades.</p>
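<p>The basic estimator described above fits in a few lines. In this sketch the choices of integrand (the square function), distribution (a standard normal, so the exact expectation is 1), sample size, and seed are illustrative assumptions. The function also returns the usual standard-error estimate, which shrinks like the inverse square root of the number of samples.</p>

```python
import math
import random


def monte_carlo_expectation(f, sampler, n, rng):
    """Estimate the expectation of f(x) for x drawn by `sampler`.

    Returns (estimate, standard_error).
    """
    values = [f(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_var / n)


def square(x):
    return x * x


def standard_normal(rng):
    """Draw one exact sample from N(0, 1)."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000, 100_000):
        print(n, monte_carlo_expectation(square, standard_normal, n, rng))
```

<p>Multiplying the sample size by one hundred buys only one extra decimal digit of accuracy, which is what the rate above means in practice.</p>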
<h3 id="uncertainty-does-not-need-random-numbers">Uncertainty does not need random numbers</h3>
<p>The area currently of biggest interest for probabilistic numerics is
complementary to the first two settings described above. With a probabilistic
numerical method, we do not necessarily want to reduce a given problem to a
less costly variant, but are interested in as good a solution as possible to the actual
problem at hand (however, it is interesting that recent results suggest that
choosing projections in a non-random, “most informative” way can be done at
reasonable cost and may improve performance
<a href="#GarnettOH2013">(Garnett, Osborne, & Hennig, 2014)</a>). And
while we may well wonder about the sensitivity of the found solution to
perturbations of the task, our chief interest is in the error created by the
approximate computation itself, and the sensitivity of our computation’s result
to further steps.</p>
<p>The story is much more intricate – and exciting – with regard to MC
methods. Recently, I have found myself repeatedly in discussions with
colleagues about the helpfulness of random numbers for computations, in
connection with our recent NIPS paper on fast Bayesian quadrature
<a href="#gunter14-fast-bayesian-quadrature">(Gunter, Osborne, Garnett, Hennig, & Roberts, 2014)</a>. There is a famous
polemic by Tony O’Hagan <a href="#o1987monte">(O’Hagan, 1987)</a> pointing out that
randomness leads to serious inefficiencies when performing a deterministic
computation. It is well-known that, on low-dimensional problems, classic
quadrature rules can converge <em>much</em> faster than MC estimators: Depending on
the rule used, and the smoothness of the integrand, <script type="math/tex">\mathcal{O}(N^{-p})</script> for
<script type="math/tex">p\in\mathbb{N}</script> is not unusual (here <script type="math/tex">p</script> denotes the order of the rule, not
the density above). Several classic quadrature rules can be identified with maximum a
posteriori estimators under various Gaussian process priors over the integrand
<a href="#minka2000deriving">(Minka, 2000)</a>. Building on this insight, our
recent NIPS paper presents a general purpose quadrature method for strictly
positive integrands (such as the probability distribution <script type="math/tex">p</script>) which
empirically outperforms (in wall-clock time) established MCMC methods. Given
the ubiquity of MC methods, this kind of result is very exciting, and once again
suggests an area of research that is only just beginning to take shape.</p>
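<p>To make the rate comparison concrete, here is a small sketch of two classic
rules on a smooth one-dimensional integrand. It assumes the test integral
<script type="math/tex">\int_0^\pi \sin(x)\,dx = 2</script>, chosen only for illustration; for such smooth
integrands the composite trapezoid rule has error of order <script type="math/tex">N^{-2}</script> and the
composite Simpson rule of order <script type="math/tex">N^{-4}</script>. This shows classic quadrature, not
the Bayesian quadrature method of the cited paper.</p>

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def simpson(f, a, b, n):
    """Composite Simpson rule with n equal subintervals; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior nodes alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return h * total / 3.0


exact = 2.0  # integral of sin(x) over [0, pi]

for n in (8, 16, 32, 64):
    err_trapezoid = abs(trapezoid(math.sin, 0.0, math.pi, n) - exact)
    err_simpson = abs(simpson(math.sin, 0.0, math.pi, n) - exact)
    # Doubling n should cut the errors by factors of about 4 and 16.
    print(n, err_trapezoid, err_simpson)
```

<p>An MC estimator would need to multiply <script type="math/tex">N</script> by 16 to gain the factor of 4
that the trapezoid rule obtains from a single doubling.</p>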
<h3 id="references">References</h3>
<ol class="bibliography"><li><span id="gunter14-fast-bayesian-quadrature">Gunter, T., Osborne, M. A., Garnett, R., Hennig, P., & Roberts, S. (2014). Sampling for Inference in Probabilistic Models with Fast
Bayesian Quadrature. In C. Cortes & N. Lawrence (Eds.), <i>Advances in Neural Information Processing Systems (NIPS)</i>.</span>
<span id="gunter14-fast-bayesian-quadrature_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#gunter14-fast-bayesian-quadrature_abstract" data-toggle="collapse" href="#gunter14-fast-bayesian-quadrature" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#gunter14-fast-bayesian-quadrature_bibtex" data-toggle="collapse" href="#gunter14-fast-bayesian-quadrature" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="https://github.com/OxfordML/wsabi">code</a></li>
</ul>
<p id="gunter14-fast-bayesian-quadrature_abstract" class="collapse">We propose a novel sampling framework for inference in
probabilistic models: an active learning approach that
converges more quickly (in wall-clock time) than Markov chain
Monte Carlo (MCMC) benchmarks. The central challenge in
probabilistic inference is numerical integration, to average
over ensembles of models or unknown (hyper-)parameters (for
example to compute marginal likelihood or a partition
function). MCMC has provided approaches to numerical
integration that deliver state-of-the-art inference, but can
suffer from sample inefficiency and poor convergence
diagnostics. Bayesian quadrature techniques offer a
model-based solution to such problems, but their uptake has
been hindered by prohibitive computation costs. We introduce
a warped model for probabilistic integrands (likelihoods)
that are known to be non-negative, permitting a cheap active
learning scheme to optimally select sample locations. Our
algorithm is demonstrated to offer faster convergence (in
seconds) relative to simple Monte Carlo and annealed
importance sampling on both synthetic and real-world
examples.</p>
<pre id="gunter14-fast-bayesian-quadrature_bibtex" class="pre pre-scrollable collapse">@inproceedings{gunter14-fast-bayesian-quadrature,
author = {Gunter, Tom and Osborne, Michael A. and Garnett, Roman and Hennig, Philipp and Roberts, Stephen},
title = {Sampling for Inference in Probabilistic Models with Fast
Bayesian Quadrature},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
year = {2014},
editor = {Cortes, C. and Lawrence, N.},
code = {https://github.com/OxfordML/wsabi}
}
</pre>
</span>
</li>
<li><span id="GarnettOH2013">Garnett, R., Osborne, M., & Hennig, P. (2014). Active Learning of Linear Embeddings for Gaussian Processes. In N. L. Zhang & J. Tian (Eds.), <i>Proceedings of the 30th Conference on Uncertainty in
Artificial Intelligence</i> (pp. 230–239). AUAI Press. Retrieved from http://auai.org/uai2014/proceedings/individuals/152.pdf</span>
<span id="GarnettOH2013_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#GarnettOH2013_abstract" data-toggle="collapse" href="#GarnettOH2013" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#GarnettOH2013_bibtex" data-toggle="collapse" href="#GarnettOH2013" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://auai.org/uai2014/proceedings/individuals/152.pdf">PDF</a></li>
<li><a class="bib-materials" href="https://github.com/rmgarnett/mgp">code</a></li>
</ul>
<p id="GarnettOH2013_abstract" class="collapse">We propose an active learning method for discovering
low-dimensional structure in high-dimensional Gaussian
process (GP) tasks. Such problems are increasingly frequent
and important, but have hitherto presented severe practical
difficulties. We further introduce a novel technique for
approximately marginalizing GP hyperparameters, yielding
marginal predictions robust to hyperparameter
misspecification. Our method offers an efficient means of
performing GP regression, quadrature, or Bayesian
optimization in high-dimensional spaces.</p>
<pre id="GarnettOH2013_bibtex" class="pre pre-scrollable collapse">@inproceedings{GarnettOH2013,
title = {Active Learning of Linear Embeddings for Gaussian Processes},
author = {Garnett, R. and Osborne, M. and Hennig, P.},
booktitle = {Proceedings of the 30th Conference on Uncertainty in
Artificial Intelligence},
editor = {Zhang, NL and Tian, J},
publisher = {AUAI Press},
pages = {230-239},
year = {2014},
url = {http://auai.org/uai2014/proceedings/individuals/152.pdf},
url2 = {https://github.com/rmgarnett/mgp},
department = {Department Sch{\"o}lkopf},
file = {http://auai.org/uai2014/proceedings/individuals/152.pdf},
code = {https://github.com/rmgarnett/mgp}
}
</pre>
</span>
</li>
<li><span id="halko2011finding">Halko, N., Martinsson, P.-G., & Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms
for constructing approximate matrix decompositions. <i>SIAM Review</i>, <i>53</i>(2), 217–288.</span>
<span id="halko2011finding_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#halko2011finding_bibtex" data-toggle="collapse" href="#halko2011finding" onclick="return false">Bib</a></li>
</ul>
<pre id="halko2011finding_bibtex" class="pre pre-scrollable collapse">@article{halko2011finding,
title = {Finding structure with randomness: Probabilistic algorithms
for constructing approximate matrix decompositions},
author = {Halko, Nathan and Martinsson, Per-Gunnar and Tropp, Joel A},
journal = {SIAM review},
volume = {53},
number = {2},
pages = {217--288},
year = {2011},
publisher = {SIAM}
}
</pre>
</span>
</li>
<li><span id="liberty2007randomized">Liberty, E., Woolfe, F., Martinsson, P.-G., Rokhlin, V., & Tygert, M. (2007). Randomized algorithms for the low-rank approximation of
matrices. <i>Proceedings of the National Academy of Sciences</i>, <i>104</i>(51), 20167–20172.</span>
<span id="liberty2007randomized_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#liberty2007randomized_bibtex" data-toggle="collapse" href="#liberty2007randomized" onclick="return false">Bib</a></li>
</ul>
<pre id="liberty2007randomized_bibtex" class="pre pre-scrollable collapse">@article{liberty2007randomized,
title = {Randomized algorithms for the low-rank approximation of
matrices},
author = {Liberty, Edo and Woolfe, Franco and Martinsson, Per-Gunnar and Rokhlin, Vladimir and Tygert, Mark},
journal = {Proceedings of the National Academy of Sciences},
volume = {104},
number = {51},
pages = {20167--20172},
year = {2007}
}
</pre>
</span>
</li>
<li><span id="minka2000deriving">Minka, T. P. (2000). <i>Deriving quadrature rules from Gaussian processes</i>. Statistics Department, Carnegie Mellon University.</span>
<span id="minka2000deriving_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#minka2000deriving_abstract" data-toggle="collapse" href="#minka2000deriving" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#minka2000deriving_bibtex" data-toggle="collapse" href="#minka2000deriving" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://research.microsoft.com/en-us/um/people/minka/papers/quadrature.html">web</a></li>
</ul>
<p id="minka2000deriving_abstract" class="collapse">Quadrature rules are often designed to achieve zero error on
a small set of functions, e.g. polynomials of specified
degree. A more robust method is to minimize average error
over a large class or distribution of functions. If functions
are distributed according to a Gaussian process, then
designing an average-case quadrature rule reduces to solving
a system of 2n equations, where n is the number of nodes in
the rule (O’Hagan, 1991). It is shown how this very general
technique can be used to design customized quadrature rules,
in the style of Yarvin & Rokhlin (1998), without the need for
singular value decomposition and in any number of
dimensions. It is also shown how classical Gaussian
quadrature rules, trigonometric lattice rules, and spline
rules can be extended to the average-case and to multiple
dimensions by deriving them from Gaussian processes. In
addition to being more robust, multidimensional quadrature
rules designed for the average-case are found to be much less
ambiguous than those designed for a given polynomial degree.</p>
<pre id="minka2000deriving_bibtex" class="pre pre-scrollable collapse">@techreport{minka2000deriving,
author = {Minka, T.P.},
institution = {Statistics Department, Carnegie Mellon University},
title = {{Deriving quadrature rules from {G}aussian processes}},
year = {2000},
link = {http://research.microsoft.com/en-us/um/people/minka/papers/quadrature.html}
}
</pre>
</span>
</li>
<li><span id="o1987monte">O’Hagan, A. (1987). Monte Carlo is fundamentally unsound. <i>The Statistician</i>, 247–249.</span>
<span id="o1987monte_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#o1987monte_bibtex" data-toggle="collapse" href="#o1987monte" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://www.jstor.org/stable/2348519?seq=1#page_scan_tab_contents">web</a></li>
</ul>
<pre id="o1987monte_bibtex" class="pre pre-scrollable collapse">@article{o1987monte,
title = {Monte Carlo is fundamentally unsound},
author = {O'Hagan, Anthony},
journal = {The Statistician},
pages = {247--249},
year = {1987},
link = {http://www.jstor.org/stable/2348519?seq=1#page_scan_tab_contents}
}
</pre>
</span>
</li></ol>
Thu, 15 Jan 2015 07:00:00 +0100
/general/2015/01/15/Stochastics/
Connections, Part I: Uncertainty Quantification
<p><em>As I go around presenting the idea of probabilistic numerics to various
audiences, certain questions about related areas come up repeatedly. This post
explains how probabilistic numerics compares to the established area of
Uncertainty Quantification. Subsequent posts will discuss connections to
certain kinds of
<a href="/general/2015/01/15/Stochastics/">stochastic numerical methods</a>,
and the young and popular area of
<a href="/optimization/2015/01/16/BO/">Bayesian Optimization</a>.</em></p>
<p><em>A disclaimer: Obviously, everyone has different opinions about the scope and
purpose of certain concepts and academic fields. And I am not an expert in the
areas discussed here. This post relates a part of my own personal
justification, why I think certain ideas are novel and interesting. It is not
intended as a holistic overview of a field. If anyone disagrees with
characterizations I make here, I would be glad if you could relate your
opinion in the comments section below. I’m grateful to Mike Osborne for some
comments on a draft for this post.</em></p>
<p><strong>Uncertainty Quantification (UQ)</strong> is a relatively young area (but
considerably older than probabilistic numerics) at the boundary of numerical
mathematics and statistics, dealing with, as
<a href="http://www.siam.org/journals/juq.php">SIAM and the ASA</a> define it: <em>“the
interface of complex modeling of processes and data, especially
characterizations of the uncertainties inherent in the use of such models.”</em>
This description is (probably deliberately) vague and might well describe all
of statistics. As Tony O’Hagan writes
<a href="#ohagan13-polyn-chaos">(O’Hagan, 2013)</a>:</p>
<blockquote>
<p>It is the AM [applied mathematics] community that is responsible for the term
UQ (Uncertainty Quantification). To a member of the Stat community,
however, UQ seems a curiously inappropriate term because (a) it sounds as
though it should cover the whole of Statistics, since one way of viewing the
science of Statistics is precisely as the science of quantifying uncertainty,
whereas (b) the uncertainties that are studied in the AM community do not even
cover the range of uncertainties that statisticians would recognise in the
predictions of models. Nevertheless, UQ is the de facto term that has been
accepted by the SIAM/ASA joint initiative.</p>
</blockquote>
<p>My impression – and my knowledge of the field is very limited – is that UQ is
still evolving and continues to expand and address new questions. But the
principal problem at its core is the <em>propagation of input uncertainty</em>. In
very simple terms: Consider a complicated model <script type="math/tex">f:x\to y</script> mapping inputs
<script type="math/tex">x</script> to outputs <script type="math/tex">y</script>. If I “wiggle” <script type="math/tex">x</script> at a given value <script type="math/tex">x_0</script> according
to some perturbation <script type="math/tex">\delta x</script>, what is the effect on <script type="math/tex">y</script>? For example,
<script type="math/tex">f</script> may be a climate model, or a flow field over some complicated airframe. </p>
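<p>The "wiggling" can be sketched in a few lines. The model below is a cheap,
made-up stand-in for an expensive simulator, and the Gaussian perturbation
with standard deviation <code>sigma</code> is likewise an assumption for
illustration. The sketch compares brute-force sampling of <script type="math/tex">\delta x</script> with a
first-order linearisation around <script type="math/tex">x_0</script>.</p>

```python
import math
import random
import statistics


def model(x):
    """Hypothetical stand-in for an expensive simulator f: x -> y."""
    return math.exp(0.5 * x) + math.sin(x)


def propagate_by_sampling(f, x0, sigma, n, rng):
    """Push Gaussian perturbations of x0 through f; return mean and std of y."""
    ys = [f(x0 + rng.gauss(0.0, sigma)) for _ in range(n)]
    return statistics.fmean(ys), statistics.stdev(ys)


def propagate_by_linearisation(f, x0, sigma, h=1e-5):
    """First-order estimate: std(y) is approximately |f'(x0)| * sigma."""
    slope = (f(x0 + h) - f(x0 - h)) / (2.0 * h)  # central finite difference
    return f(x0), abs(slope) * sigma


rng = random.Random(0)
x0, sigma = 1.0, 0.05
print("sampling:     ", propagate_by_sampling(model, x0, sigma, 20_000, rng))
print("linearisation:", propagate_by_linearisation(model, x0, sigma))
```

<p>For small perturbations the two estimates agree closely; they drift apart as
<code>sigma</code> grows and the nonlinearity of the model starts to matter.</p>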
<p>There is a variety of methods used for this task in UQ, including the popular
concept of a <em>polynomial chaos</em> expansion, which, intriguingly, has a
connection to the Gaussian process surrogates used in Bayesian Optimization,
which will be the subject of a later blog post on connections. And clearly, the
uncertainty propagation described above addresses a certain type of
probabilistic uncertainty we are also concerned with in Probabilistic Numerics:
Given uncertain inputs to a problem and a particular algorithm used to solve
it, what should the uncertainty over the output be?</p>
<p>Nevertheless, our questions in probabilistic numerics take a different, and in
some cases broader view that warrants its own research. For us, the object of
interest is the computation itself, rather than its reaction to a change in
input. Questions we would like to answer in probabilistic numerics include, for
a particular computation task: Is it performed as efficiently as possible (from
an information theoretic perspective) by a particular algorithm? What kind of
information does it collect about related objects? And could it be made more
adaptive to salient prior knowledge about a certain task at hand?</p>
<h3 id="many-different-types-of-uncertainty-not-all-quantified">Many different types of uncertainty, not all quantified</h3>
<p>To get an intuition for the many different ways that uncertainty can play a
role in computation, let’s look at an elementary problem: the linear problem of
finding <script type="math/tex">x\in\mathbb{R}^N</script> such that <script type="math/tex">Ax=b</script> with <script type="math/tex">b\in\mathbb{R}^M</script> and a
matrix <script type="math/tex">A\in\mathbb{R}^{M\times N}</script>. Here are some questions one could ask:</p>
<ul>
<li>
<p><strong>Uncertainty from ill-posedness</strong>: If <script type="math/tex">% <![CDATA[
M<N %]]></script>, there are generally many
<script type="math/tex">x</script> that solve the problem. What is the space of such admissible <script type="math/tex">x</script>? This
is (a base case of) the kind of uncertainty typically studied in statistics,
when a dataset is not sufficiently informative to identify the parameters of a
model.</p>
</li>
<li>
<p><strong>Uncertainty from imprecise inputs</strong>: If <script type="math/tex">b</script> is perturbed to <script type="math/tex">b+\delta
b</script>, what is the effect on <script type="math/tex">x</script>? This is the elementary version of a central
point of interest for Uncertainty Quantification. (But, as Tony O’Hagan says
above, this kind of uncertainty is of course also of interest in statistics
sometimes). </p>
</li>
</ul>
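<p>Both questions can be illustrated on tiny systems. The sketch below uses
made-up numbers: a single equation in two unknowns (<script type="math/tex">M=1</script>, <script type="math/tex">N=2</script>) to show
the family of admissible solutions, and a nonsingular <script type="math/tex">2\times 2</script> system to show
that a perturbation <script type="math/tex">\delta b</script> changes the solution by exactly the solution of
<script type="math/tex">A\,\delta x=\delta b</script>.</p>

```python
def solve_2x2(A, b):
    """Solve a 2x2 system by Cramer's rule; A must be nonsingular."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ValueError("A is singular")
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det)


def solutions_of_single_equation(a, b, t):
    """One point of the solution line of a[0]*x[0] + a[1]*x[1] = b, indexed by t."""
    norm2 = a[0] ** 2 + a[1] ** 2
    particular = (a[0] * b / norm2, a[1] * b / norm2)  # minimum-norm solution
    null_dir = (-a[1], a[0])                           # orthogonal to a
    return (particular[0] + t * null_dir[0],
            particular[1] + t * null_dir[1])


# Ill-posedness: every t gives a different, equally valid solution.
a, rhs = (1.0, 2.0), 3.0
for t in (-1.0, 0.0, 1.0):
    x = solutions_of_single_equation(a, rhs, t)
    print(t, x, a[0] * x[0] + a[1] * x[1])

# Imprecise inputs: the change in x is the solution of A dx = db.
A = ((4.0, 1.0), (1.0, 3.0))
b = (1.0, 2.0)
db = (0.01, -0.02)
x = solve_2x2(A, b)
x_perturbed = solve_2x2(A, (b[0] + db[0], b[1] + db[1]))
dx = solve_2x2(A, db)
print(x, x_perturbed, dx)
```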
<p>These two questions are classic ones, and in this linear problem, they have
straightforward, or in fact trivial answers, while in nonlinear problems, they
can pose formidable challenges. With probabilistic numerics, we add another set
of questions:</p>
<ul>
<li><strong>Uncertainty from computation</strong>: To make the point very clear, let’s assume
<script type="math/tex">N=M</script> and <script type="math/tex">A</script> is nonsingular, so that there is only one unique <script type="math/tex">x</script>
satisfying <script type="math/tex">Ax=b</script>. Then the problem is ‘well-posed’, and perturbations in
<script type="math/tex">b</script> have linear effects on <script type="math/tex">x</script>. But there can still be computational
uncertainty: If <script type="math/tex">N</script> is too large for an exact solution (e.g. by Gauss-Jordan
elimination) we would use an iterative method. For example, if <script type="math/tex">A</script> happens
to also be symmetric positive definite, the
<a href="http://en.wikipedia.org/wiki/Conjugate_gradient_method">method of conjugate gradients (CG)</a>
would be a standard approach. CG is an iterative algorithm producing a
sequence of increasingly better <em>approximations</em> <script type="math/tex">\{x_i\}_{i=1,\dots,N}</script> to
the correct <script type="math/tex">x</script>. It is usually stopped long before its (theoretical)
convergence. If we stop CG after <script type="math/tex">k</script> iterations, what kind of information
have we effectively collected? The quality of the constructed approximation
itself can be assessed by the residual <script type="math/tex">r_k = Ax_k-b</script>, but there are
questions for which the residual alone is not sufficient: Maybe we also need
to solve the related problem <script type="math/tex">Ay=c</script>. How much do we learn about <script type="math/tex">y</script> from
computing <script type="math/tex">x_k</script>?<br />
Another question one might ask is whether CG is always the right method to
use. Maybe we have some vague prior information about <script type="math/tex">A</script>. For example we
may know that the elements in rows and columns 1 through 10 are a lot larger
than those in the remaining rows and columns. Or we know that, while it is not quite a
Toeplitz matrix, it tends to have a banded structure. Can we use this
information to find a solution quicker than standard CG?<br />
These are the kinds of problem that entice us in probabilistic numerics
(shameless plug: answers for this linear positive definite setting can be
found in a recent paper by myself, about to come out in the SIAM JOPT
<a href="#2014arXiv14022058H">(Hennig, 2015)</a>).</li>
</ul>
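<p>For readers who have not seen CG written out, the following is a textbook
sketch in plain Python, stopped after a fixed number of iterations. The
tridiagonal test matrix and right-hand side are made up for illustration.
The code tracks <script type="math/tex">b-Ax_k</script>, which is the residual above with the opposite sign,
so the norms coincide. It does not implement the probabilistic
interpretation from the cited paper; it only shows the point estimate and
residual that classic CG returns when stopped early.</p>

```python
def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradients(A, b, max_iter, tol=1e-12):
    """Textbook CG for symmetric positive definite A, started at x = 0.

    Returns the last iterate and the history of residual norms ||b - A x_i||.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the starting point x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    history = [rs_old ** 0.5]
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        history.append(rs_new ** 0.5)
        beta = rs_new / rs_old               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, history


n = 6
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

x_early, history_early = conjugate_gradients(A, b, max_iter=2)
x_full, history_full = conjugate_gradients(A, b, max_iter=20)
print("residual norms, stopped after 2 steps:", history_early)
print("residual norms, run to convergence:  ", history_full)
```

<p>The early-stopped run returns an iterate and a residual norm, and nothing
else: no statement about how far <script type="math/tex">x_k</script> is from <script type="math/tex">x</script>, and nothing about what
was learned about <script type="math/tex">A^{-1}</script> for use on a related right-hand side.</p>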
<p>To summarize: Probabilistic numerics focusses on the computations used to solve
a particular problem: what is the uncertainty added by performing the
computation approximately, and how can this information be used in ways not yet
anticipated? PN and UQ can live very well next to each other. PN can learn much
from UQ, in particular in areas like uncertainty propagation. Conversely, I
hope colleagues working in UQ will be interested in how our results on PN can
help in large scale UQ. The existence of UQ as such is no reason not to study
PN.</p>
<h3 id="references">References</h3>
<ol class="bibliography"><li><span id="2014arXiv14022058H">Hennig, P. (2015). Probabilistic Interpretation of Linear Solvers. <i>SIAM J on Optimization</i>, <i>25</i>(1).</span>
<span id="2014arXiv14022058H_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#2014arXiv14022058H_abstract" data-toggle="collapse" href="#2014arXiv14022058H" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#2014arXiv14022058H_bibtex" data-toggle="collapse" href="#2014arXiv14022058H" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://probabilistic-numerics.org/assets/pdf/HennigLinear2015.pdf">PDF</a></li>
<li><a class="bib-materials" href="http://epubs.siam.org/doi/abs/10.1137/140955501?journalCode=sjope8">web</a></li>
</ul>
<p id="2014arXiv14022058H_abstract" class="collapse">This paper proposes a probabilistic framework for algorithms
that iteratively solve unconstrained linear problems Bx = b with positive
definite B for x. The goal is to replace the point estimates returned by
existing methods with a Gaussian posterior belief over the elements of the
inverse of B, which can be used to estimate errors. Recent probabilistic
interpretations of the secant family of quasi-Newton optimization algorithms
are extended. Combined with properties of the conjugate gradient algorithm,
this leads to uncertainty-calibrated methods with very limited cost overhead
over conjugate gradients, a self-contained novel interpretation of the
quasi-Newton and conjugate gradient algorithms, and a foundation for new
nonlinear optimization methods.</p>
<pre id="2014arXiv14022058H_bibtex" class="pre pre-scrollable collapse">@article{2014arXiv14022058H,
author = {{Hennig}, P.},
journal = {SIAM J on Optimization},
month = jan,
title = {{Probabilistic Interpretation of Linear Solvers}},
year = {2015},
link = {http://epubs.siam.org/doi/abs/10.1137/140955501?journalCode=sjope8},
volume = {25},
issue = {1},
file = {http://probabilistic-numerics.org/assets/pdf/HennigLinear2015.pdf}
}
</pre>
</span>
</li>
<li><span id="ohagan13-polyn-chaos">O’Hagan, A. (2013). <i>Polynomial Chaos: A Tutorial and Critique from a
Statistician’s Perspective</i>. University of Sheffield, UK.</span>
<span id="ohagan13-polyn-chaos_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#ohagan13-polyn-chaos_bibtex" data-toggle="collapse" href="#ohagan13-polyn-chaos" onclick="return false">Bib</a></li>
</ul>
<pre id="ohagan13-polyn-chaos_bibtex" class="pre pre-scrollable collapse">@techreport{ohagan13-polyn-chaos,
author = {O'Hagan, Anthony},
title = {Polynomial Chaos: A Tutorial and Critique from a
Statistician's Perspective},
institution = {University of Sheffield, UK},
year = {2013},
month = may
}
</pre>
</span>
</li></ol>
Wed, 14 Jan 2015 07:00:00 +0100
/general/2015/01/14/UQ/
Tübingen Manifesto: Community
<p><em>We in Probabilistic Numerics face many unanswered questions in growing the field.
Our <a href="/workshops/2014/08/22/Roundtable-2014-in-Tuebingen/">roundtable in Tübingen</a> aimed to bring together our new community to begin to address some of these questions.
This is the final post in a sequence that attempts to collate some of what we spoke about and, hopefully, to provoke further discussion.</em></p>
<h2 id="who-are-our-users">Who Are Our Users?</h2>
<p>Before we can address some of the questions considered in the <a href="/workshops/roundtable_manifesto/2014/09/03/Roundtable-Priors-and-Prior-Work/">last post</a>, there is a more fundamental question to be answered: <em>who are our users?</em>
Which other disciplines are likely to care most about the probabilistic numerics tools that we can potentially deliver?
Rather than waiting for users to come to us, it seems clear that we need to find a strong motivating application within which to deeply embed our approaches.
Even if we can find such an application, how are we to reach out to the relevant users?
These considerations would seem to have direct influence on the course of research undertaken within ProbNum.</p>
<p>One clear community we might wish to engage with is <em>numerical analysis,</em> for obvious reasons.
It is this much older field that has developed most<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> of the techniques against which ProbNum must compare itself.
To advertise our work to numerical analysts, we would need to engage on their terms.
This means that we must establish the reliability of ProbNum techniques according to the traditional notions of consistency and rates of convergence.</p>
<p>Working against establishing such theory for ProbNum is the very complexity of our algorithms.
Much of ProbNum is built upon Gaussian processes (GPs), for which existing theory is patchy at best: it’s difficult to say much of any significance if you don’t know the right covariance function.
Worse still, ProbNum often requires approximations to full (expensive) Gaussian process (GP) inference so that algorithms can serve as cheap, inner-loop, methods.
Producing firm results for these algorithms is likely to induce many headaches. However, it seems undesirable to sacrifice algorithmic complexity, and often performance, simply so as to establish theory.</p>
<p>Furthermore, using GPs ensures that the output of many ProbNum algorithms has unbounded support: any result is possible.
While it is, of course, possible to produce probabilistic bounds by thresholding the probability, this would seem to miss a central advantage of ProbNum: its production of full posteriors for quantities of interest.
Some rebels at the roundtable even posited that the bounds provided for many traditional numerical algorithms are far too loose, in practice.
The greater structure incorporable into ProbNum algorithms promises much tighter, better-calibrated, representations of uncertainty.</p>
<p>Uncertainty becomes a central concern if ProbNum thinks about turning to applications in the experimental sciences.
Most scientists are well aware of the importance of uncertainty management.
For example, climate scientists are well-versed in statistical reasoning, and climate science models are sensitive to numerical procedures:
quantifying the uncertainty introduced through the use of numerical algorithms seems a relatively easy sell.
<!-- Honest uncertainty estimates are required wherever experimental design, or active learning, is involved.
If an experiment requires digging time-consuming and expensive holes, it makes sense to spend a little more computation to incorporate uncertainty management into your numerical procedures.
Nobody wants to waste an experiment because
--></p>
<p>As an aside, the added output of uncertainty to numerical procedures adds value for users in interrogating performance. You could imagine ProbNum providing an enhanced profiler: able to assess the importance of every algorithm in your pipeline. Any user assembling a hierarchy of numerical procedures stands to benefit from a ProbNum approach to assessing sensitivity.</p>
<h2 id="whither-the-probnum-community">Whither the ProbNum Community?</h2>
<p>The final question we wrestled with at the roundtable was our collective direction as a community.
Here, again, <a href="http://stanford.edu/~ngoodman/">Noah Goodman’s</a> insights from the development of the <a href="http://probabilistic-programming.org/wiki/Home">Probabilistic Programming (ProbProg)</a> community were enlightening.</p>
<p>Noah pointed out that it had taken several years for ProbProg to achieve consensus on the prioritisation of research questions, and the question of who were the right group of users had yet to be satisfactorily answered: so we in ProbNum shouldn’t panic if it takes us a little longer.
Noah also highlighted the need not to <em>over-sell</em> what the field could deliver. I think all of us in ProbNum are aware that we have some way to go, and I think we agree that we must be careful to proceed with honesty about what our algorithms can achieve relative to the state-of-the-art.
Noah finally emphasised the need for a strong public presence, and the maintenance of an <em>open tent.</em>
This argument resonated with us, and has been in our minds in the establishment of this website and blog.
On that note: thanks for making it to the end of these blogs, and, if they have inflamed any interest within you, please do get in touch!</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>It’s probably fair to say that numerical analysis has, however, not been at the forefront of more modern means of quadrature, such as <a href="http://en.wikipedia.org/wiki/Monte_Carlo_integration">Monte Carlo</a>. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Fri, 05 Sep 2014 00:00:00 +0200
/workshops/roundtable_manifesto/2014/09/05/Roundtable-Community/
Tübingen Manifesto: Priors and Prior Work
<p><em>We in Probabilistic Numerics (ProbNum) face many unanswered questions in growing the field.
Our <a href="/workshops/2014/08/22/Roundtable-2014-in-Tuebingen/">roundtable in Tübingen</a> aimed to bring together our new community to begin to address some of these questions.
This is another post in a sequence that attempts to collate some of what we spoke about and, hopefully, to provoke further discussion.</em></p>
<h2 id="generalisation-is-great">Generalisation is Great</h2>
<p>There is a tension within ProbNum:
to what extent should we be in the business of reinterpreting existing numerics as learning algorithms, and to what extent should we be developing ab initio solutions to numerical problems?
On this topic, the roundtable arrived at a reasonable, if dull, conclusion.</p>
<p>Existing algorithms have become successful for good reason, and the experience of the ProbNum community is that these approaches are often tough to beat.
As such, we should always, where possible, aim to <em>generalise existing approaches.</em></p>
<p>Doing so combines the best of both worlds: benefitting from the insights embedded in successful techniques, while still allowing the benefits of the ProbNum framework.
Further, adding a probabilistic interpretation to existing algorithms has often proven to require little overhead.
Even providing theory to match the convergence analysis supplied for existing ODE solvers is not too great a hurdle: we can develop probabilistic convergence proofs in the place of traditional theory.
Finally, communicating the benefits of new ProbNum approaches is substantially aided by describing them in the language of older techniques.</p>
<h2 id="how-much-structure-should-we-cram-into-probnum">How Much Structure Should We Cram Into ProbNum?</h2>
<p>Another key question for ProbNum is: when are richer, more structured, models, worth their associated computational cost? As Bayesians, we’re often keen to incorporate as much prior information (structure) as possible.
However, for numerics, this must be balanced against the computational overheads induced.
In fact, one lesson hard-learned from early ProbNum research is that it is all-too-easy to use excessive structure.</p>
<h2 id="traditional-approaches-use-either-all-the-structure-or-none">Traditional Approaches Use Either All the Structure or None</h2>
<p>Here it is worth revisiting traditional approaches to understand their own choices for structure.
For example, it’s remarkable how much structure traditional optimisation algorithms ignore – see
<a href="#hennig13_quasi_newton_method">(Hennig & Kiefel, 2013)</a> – while remaining exceptionally difficult to outperform with better-informed approaches.</p>
<p>The amount of structure used is also polarised, varying wildly from community to community.
Broadly speaking, Runge-Kutta methods aren’t much used in real world applications: most problems are solved with highly structured, bespoke, solvers.
In contrast, in control engineering, structure is often ignored: most control engineers, in practice, will simply call the Matlab function <code>ode45.m</code>.</p>
<p>Similarly, in quadrature, users will either: </p>
<ul>
<li>call <code>integrate.m</code>, a completely black-box solution uninformed by problem structure, or: </li>
<li>use heavily structured, bespoke, <a href="http://www.math.wsu.edu/faculty/genz/homepage">Genz-style</a> algorithms for particular problems. </li>
</ul>
<p>The problem of bespoke solutions, of course, is that their applicability is limited.
Most at the roundtable did not feel they had the stamina to spend their careers solving a single, narrowly-defined, problem, as the developers of such solutions have done.</p>
<h2 id="probnum-should-embrace-its-flexibility">ProbNum Should Embrace Its Flexibility</h2>
<p>A key advantage of ProbNum algorithms is their <em>configurability.</em> As ProbNum algorithms are explicitly model-based, the model can be selected to match the structure of the problem at hand.
Where Gaussian processes are used, we are free<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> to choose their covariance and mean functions to suit;
hence an algorithm like Bayesian Quadrature can be adapted to a wide variety of problems.
The roundtable reached the conclusion that ProbNum should fully exploit this flexibility, one of its unique selling points.</p>
<p>Exposing design choices allows users to more easily adapt ProbNum to solve previously untackled problems.
For example, incorporating knowledge of specific low-dimensional structure seems like the only way to solve high-dimensional problems.
Further, this flexibility accommodates our long-term goal of assembling modular algorithms into an integrated ProbNum system.
That is, structure can be either incorporated into or ignored by an individual algorithm, using a <a href="/workshops/roundtable_manifesto/2014/09/01/Roundtable-ProbNum-ProbProg/">meta-reasoning</a> approach to meet the computational desiderata imposed by other algorithms in the ProbNum hierarchy.</p>
<h2 id="references">References</h2>
<ol class="bibliography"><li><span id="hennig13_quasi_newton_method">Hennig, P., & Kiefel, M. (2013). Quasi-Newton Methods – a new direction. <i>Journal of Machine Learning Research</i>, <i>14</i>, 834–865.</span>
<span id="hennig13_quasi_newton_method_materials">
<ul class="nav nav-pills">
<li><a class="bib-materials" data-target="#hennig13_quasi_newton_method_abstract" data-toggle="collapse" href="#hennig13_quasi_newton_method" onclick="return false">Abstract</a></li>
<li><a class="bib-materials" data-target="#hennig13_quasi_newton_method_bibtex" data-toggle="collapse" href="#hennig13_quasi_newton_method" onclick="return false">Bib</a></li>
<li><a class="bib-materials" href="http://jmlr.org/papers/volume14/hennig13a/hennig13a.pdf">PDF</a></li>
</ul>
<p id="hennig13_quasi_newton_method_abstract" class="collapse">Four decades after their invention, quasi-Newton methods are
still state of the art in unconstrained numerical
optimization. Although not usually interpreted thus, these
are learning algorithms that fit a local quadratic
approximation to the objective function. We show that many,
including the most popular, quasi-Newton methods can be
interpreted as approximations of Bayesian linear regression
under varying prior assumptions. This new notion elucidates
some shortcomings of classical algorithms, and lights the
way to a novel nonparametric quasi-Newton method, which is
able to make more efficient use of available information at
computational cost similar to its predecessors.</p>
<pre id="hennig13_quasi_newton_method_bibtex" class="pre pre-scrollable collapse">@article{hennig13_quasi_newton_method,
author = {Hennig, P. and Kiefel, M.},
journal = {Journal of Machine Learning Research},
month = mar,
pages = {834--865},
title = {Quasi-{N}ewton Methods -- a new direction},
volume = {14},
year = {2013},
file = {http://jmlr.org/papers/volume14/hennig13a/hennig13a.pdf}
}
</pre>
</span>
</li></ol>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>At least, we have some flexibility in incorporating such structure, although often the problem imposes constraints. For example, in Bayesian quadrature, we are limited by the requirement that the covariance function be integrable in closed form. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Wed, 03 Sep 2014 00:00:00 +0200
/workshops/roundtable_manifesto/2014/09/03/Roundtable-Priors-and-Prior-Work/
Tübingen Manifesto: Probabilistic Numerics and Probabilistic Programming<p><em>We in Probabilistic Numerics face many unanswered questions in growing the field.
Our <a href="/workshops/2014/08/22/Roundtable-2014-in-Tuebingen/">roundtable in Tübingen</a> aimed to bring together our new community to begin to address some of these questions.
This is another of a sequence of posts that attempt to collate some of what we spoke about, and to, hopefully, provoke further discussion.</em></p>
<p>We were very fortunate to benefit from the coincidental presence of <a href="http://stanford.edu/~ngoodman/">Noah Goodman</a> in Tübingen, who generously spent an afternoon at the roundtable talking with us.
Noah, of course, is a founding and deeply committed member of the <a href="http://probabilistic-programming.org/wiki/Home">Probabilistic Programming</a> community.
Noah had many fascinating reflections on developments within Probabilistic Programming (ProbProg), and how they might connect with Probabilistic Numerics (ProbNum). </p>
<p>ProbProg seems to be largely about allowing the design of complex generative models, and then ensuring that uncertainty is properly propagated in producing posteriors using the model.
This is certainly not the same as the propagation required to manage uncertainty introduced through the use of finite-precision or approximate numerical procedures, but there are some commonalities.
Below are a number of items drawing out some of the links between the fields.</p>
<h2 id="meta-reasoning">Meta-Reasoning</h2>
<p>ProbNum offers the attractive potential of performing decision-theoretic management of systems of probabilistic numerical algorithms.
That is, ProbNum could be used to select which part of a numerical pipeline to refine, or to decide when to stop a numerical algorithm before it achieves accuracy you don’t need.
Noah was interested in this process, which he likened to <a href="http://mlg.eng.cam.ac.uk/duvenaud/talks/tea_talk_metareasoning/index.html"><em>meta-reasoning,</em></a> and recommended making the connection explicit. </p>
<p>Noah also posed the excellent question of how much computation it was worth spending to perform this meta-reasoning.
Of course, this is a question we couldn’t readily answer.
Would a greedy selection of the numerical algorithm to spend the next unit of computation on be sufficient, or would more sophisticated strategies be required?
At this point, Noah quoted <a href="http://www.cs.berkeley.edu/~russell/">Stuart Russell</a> in recommending that, as a rule, “you should only do as much meta-reasoning as regular reasoning”. This seems sensible enough to me!</p>
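<p>The greedy strategy raised above can be sketched in a few lines. This is only an illustration under a strong, hypothetical assumption: each numerical component <code>i</code> has an error variance of <code>scales[i] / n</code> after <code>n</code> units of computation (as for Monte Carlo), and the total variance of the pipeline is the sum over components. The function name and the numbers are invented for the example.</p>

```python
def greedy_allocation(scales, budget):
    """Greedily spend `budget` units of computation across components.

    Hypothetical error model: component i has variance scales[i] / n_i
    after n_i units of computation, and total variance is the sum.
    """
    counts = [1] * len(scales)  # every component starts with one unit
    for _ in range(budget):
        # Reduction in total variance from giving one more unit to each component.
        gains = [s / n - s / (n + 1) for s, n in zip(scales, counts)]
        best = max(range(len(scales)), key=gains.__getitem__)
        counts[best] += 1
    total_var = sum(s / n for s, n in zip(scales, counts))
    return counts, total_var


counts, total_var = greedy_allocation([4.0, 1.0, 0.25], budget=27)
print(counts, total_var)
```

<p>Here the meta-reasoning itself costs one pass over the components per unit of computation spent, which is cheap relative to the work it directs. Whether such a one-step look-ahead is sufficient for real pipelines remains the open question.</p>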
<h2 id="lazy-evaluation">Lazy Evaluation</h2>
<p>Noah also mentioned links between ProbNum’s approach of returning numerical results of flexible degrees of accuracy and the notion of <a href="http://en.wikipedia.org/wiki/Lazy_evaluation"><em>lazy evaluation</em></a> common in functional languages like Haskell.
In either case, only as much computation is performed as is absolutely required.
For lazy evaluation, what is required can be determined exactly, whereas the ProbNum approach would treat this as a question to be answered with decision theory. </p>
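<p>As a rough illustration of the analogy, a Python generator can produce successively refined estimates of an integral on demand, so that a caller pays only for the accuracy it asks for. The sketch below uses plain trapezoid refinement and a fixed tolerance as a crude stand-in for the decision-theoretic stopping rule; the function names and the error proxy are assumptions made for the example, not a ProbNum method.</p>

```python
import math


def integrand(x):
    """Example integrand: exp(sin(3x)^2 + x^2)."""
    return math.exp(math.sin(3.0 * x) ** 2 + x ** 2)


def refinements(f, a, b):
    """Lazily yield (estimate, error_proxy, evaluations) for the integral of f."""
    n = 1
    estimate = 0.5 * (b - a) * (f(a) + f(b))
    evaluations = 2
    while True:
        h = (b - a) / n
        midpoints = sum(f(a + (i + 0.5) * h) for i in range(n))
        refined = 0.5 * (estimate + h * midpoints)  # trapezoid rule on 2n panels
        evaluations += n
        n *= 2
        # The change between successive levels serves as a crude error proxy.
        yield refined, abs(refined - estimate), evaluations
        estimate = refined


def integrate_to(f, a, b, tol):
    """Pull refinements only until the error proxy drops below `tol`."""
    for estimate, error_proxy, evaluations in refinements(f, a, b):
        if error_proxy < tol:
            return estimate, evaluations


coarse, n_coarse = integrate_to(integrand, 0.0, 1.0, 1e-2)
fine, n_fine = integrate_to(integrand, 0.0, 1.0, 1e-6)
print(coarse, n_coarse)
print(fine, n_fine)
```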
<h2 id="overloading">Overloading</h2>
<p>Another fundamental concept in languages that we discussed was that of <a href="http://en.wikipedia.org/wiki/Function_overloading"><em>overloading</em></a>.
If we treat, for example, a probability distribution over integers as a type that generalises the type <code>int</code>, then ProbNum might well benefit from being implemented using overloading.
That is, functions could be overloaded to permit the optional input and output of variances in addition to the usual input and output estimates.</p>
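<p>A minimal sketch of this idea in Python, using operator overloading, might look as follows. The <code>Uncertain</code> class is hypothetical; it assumes independent quantities and uses first-order propagation for products, so it is an illustration of the interface rather than a sound uncertainty calculus.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uncertain:
    """An estimate with an optional variance (zero means an exact number)."""
    mean: float
    var: float = 0.0

    @staticmethod
    def lift(value):
        """Promote a plain number to an Uncertain with zero variance."""
        return value if isinstance(value, Uncertain) else Uncertain(float(value))

    def __add__(self, other):
        other = Uncertain.lift(other)
        # Variances add, assuming the two quantities are independent.
        return Uncertain(self.mean + other.mean, self.var + other.var)

    __radd__ = __add__

    def __mul__(self, other):
        other = Uncertain.lift(other)
        # First-order propagation of variance, assuming independence.
        var = other.mean ** 2 * self.var + self.mean ** 2 * other.var
        return Uncertain(self.mean * other.mean, var)

    __rmul__ = __mul__


reading = Uncertain(2.0, 0.01)
result = 3 * reading + 1      # plain numbers and uncertain values mix freely
exact = Uncertain(2.0) + 1    # no variance supplied, none returned
print(result, exact)
```

<p>Code written for ordinary numbers keeps working unchanged, while callers that supply variances get variances back.</p>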
<h2 id="the-benefits-of-knowing-a-functions-source-code">The Benefits of Knowing a Function’s Source Code</h2>
<p>In probabilistic programming, uncertainty is represented largely by bags of samples.
It would certainly be interesting to use Bayesian Quadrature downstream of a probabilistic program to try to make better use of those samples.
In such a setting, Bayesian quadrature might even benefit from access to the structure of the probabilistic model, available in the probabilistic program source code.
This structure might inform the mean and covariance functions chosen for the Gaussian process model used within Bayesian quadrature, for example.
A similar approach is at the heart of <a href="http://www.autodiff.org/">autodiff</a>, which analyses the code for a function so as to allow for the computation of its derivatives.
Why shouldn’t we do the same thing in computing integrals?</p>
Mon, 01 Sep 2014 00:00:00 +0200
/workshops/roundtable_manifesto/2014/09/01/Roundtable-ProbNum-ProbProg/
Tübingen Manifesto: Uncertainty<p><em>We in Probabilistic Numerics face many unanswered questions in growing the field.
Our <a href="/workshops/2014/08/22/Roundtable-2014-in-Tuebingen/">roundtable in Tübingen</a> aimed to bring together our new community to begin to address some of these questions.
This is the first of a sequence of posts that attempt to collate some of what we spoke about, and to, hopefully, provoke further discussion.</em></p>
<p>Our first major question was: <em>what is a well-defined notion of “uncertainty” for a probabilistic numerical method?</em>
Consider <script type="math/tex">\psi = \int_{0}^{1} \exp\bigl((\sin 3 x)^2 + x^2\bigr)\mathrm{d}x</script>.
In a sense, we know <script type="math/tex">\psi</script>: I was able to write it down using a small number of mathematical symbols.
However, of course, to find <script type="math/tex">\psi</script> to three significant figures, I, for one, would have to use a numerical algorithm (please let <a href="mailto:mosb@robots.ox.ac.uk">me</a> know if you do actually have a closed form expression for <script type="math/tex">\psi</script>!).
Given the finite precision of any numerical algorithm, can we really still be said to know <script type="math/tex">\psi</script>?
If not, how should we think about the uncertainty in <script type="math/tex">\psi</script>?</p>
<p>On this, I think the roundtable arrived at a helpful consensus.
We came to the conclusion that uncertainty for probabilistic numerical algorithms was <em>exactly the same quantity</em> that we are used to in statistics.
That is, a probabilistic numerical algorithm is doing exactly what we do in statistics: collecting data and estimating.
Specifically, a numerical integration algorithm, tasked with finding <script type="math/tex">\psi</script>, would collect evaluations of the integrand, and use them to estimate the integral.
For a probabilistic numerical integration algorithm, these evaluations will be used to inform a probabilistic model for the integrand, typically a Gaussian process, which can then be used to determine a posterior probability density for the integral.
As such, we can use exactly the notion of uncertainty we’re used to from any other form of statistical or probabilistic reasoning.
As with other probabilistic calculations, we could, if we desired, reframe the whole of probabilistic numerics as compression, using the language of coding theory.
Whichever language we use, uncertainty in probabilistic numerics is clearly building on solid, existing, foundations.</p>
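<p>To make this concrete, the following is a minimal sketch of Bayesian quadrature for <script type="math/tex">\psi</script>, not a definitive implementation. It assumes a zero-mean Gaussian process prior with a squared-exponential covariance of unit amplitude and length-scale 0.2, and twelve equally spaced evaluations. These choices, and all function names, are illustrative assumptions. The squared-exponential covariance is chosen because its integrals over [0, 1] are available in closed form via the error function.</p>

```python
import math


def integrand(x):
    """Integrand of psi: exp(sin(3x)^2 + x^2)."""
    return math.exp(math.sin(3.0 * x) ** 2 + x ** 2)


def rbf(a, b, ell):
    """Squared-exponential covariance with unit amplitude (assumed prior)."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def kernel_mean(xi, ell):
    """Closed form of the integral of rbf(x, xi) over x in [0, 1]."""
    s = ell * math.sqrt(2.0)
    scale = ell * math.sqrt(math.pi / 2.0)
    return scale * (math.erf((1.0 - xi) / s) + math.erf(xi / s))


def kernel_double_integral(ell):
    """Closed form of the integral of rbf(x, x') over [0, 1] x [0, 1]."""
    s = ell * math.sqrt(2.0)
    return 2.0 * (ell * math.sqrt(math.pi / 2.0) * math.erf(1.0 / s)
                  - ell ** 2 * (1.0 - math.exp(-0.5 / ell ** 2)))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def bayesian_quadrature(f, nodes, ell=0.2, jitter=1e-10):
    """Posterior mean and variance of the integral of f over [0, 1]."""
    gram = [[rbf(a, b, ell) + (jitter if i == j else 0.0)
             for j, b in enumerate(nodes)] for i, a in enumerate(nodes)]
    z = [kernel_mean(x, ell) for x in nodes]
    y = [f(x) for x in nodes]       # the "data": integrand evaluations
    weights = solve(gram, z)        # K^{-1} z, valid because K is symmetric
    mean = sum(w * yi for w, yi in zip(weights, y))
    var = kernel_double_integral(ell) - sum(w * zi for w, zi in zip(weights, z))
    return mean, max(var, 0.0)      # clamp tiny negative round-off


nodes = [i / 11.0 for i in range(12)]
psi_mean, psi_var = bayesian_quadrature(integrand, nodes)
print(f"psi ~ {psi_mean:.4f} +/- {math.sqrt(psi_var):.1e}")
```

<p>The posterior mean is a weighted sum of the integrand evaluations, and the posterior variance depends only on where the integrand was evaluated and on the assumed covariance, not on the values observed.</p>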
<p>Of course, there is a caveat: the uncertainty returned by a probabilistic numerics algorithm is conditional on the prior information upon which it is constructed.
Whether the returned uncertainty makes any sense will depend critically on the quality of the prior information included.
For example, in ODE solving, standard numerical algorithms typically ignore the uncertainty in past evaluations: final distributions will thus typically be unreasonably confident.
The output of numerical integration algorithms will likewise be
reliant on the assumptions built into the inference for the probabilistic model for the integrand.
In constructing probabilistic numerical algorithms, we must be careful to ensure that our prior assumptions reflect, as well as possible, our true state of knowledge.</p>
<p>More notes from the workshop to follow!</p>
Wed, 27 Aug 2014 00:00:00 +0200
/workshops/roundtable_manifesto/2014/08/27/Roundtable-Uncertainty/
Roundtable in Tübingen<p><strong>Researchers interested in probabilistic numerical methods gathered for a
two-day discussion forum on August 21-22 2014 at the Max Planck Institute for
Intelligent Systems in Tübingen.</strong></p>
<p><img src="/assets/images/roof.jpeg" width="100%" /></p>
<p>The roundtable was organised by Philipp Hennig (MPI Tübingen) and Michael
Osborne (Oxford). Mark Girolami (UCL / Warwick) presented an invited talk.</p>
<p>The roundtable took place from the morning of 21 August to the early afternoon
of 22 August. Hosted by the Max Planck Institute for Intelligent Systems in
Tübingen, Germany, it provided an informal setting for everyone interested in
the development of probabilistic numerical methods. The roundtable was neither
a workshop nor a conference. There were no proceedings, and attendees did not
have to submit a paper to attend. Apart from a small number of talks, the
schedule focused strongly on small-group discussions on specific aspects. Some
of the questions discussed in 2014 included</p>
<ul>
<li>
<p>What is a well-defined notion of “uncertainty” for a probabilistic numerical
method? What are the limits of error estimation?</p>
</li>
<li>
<p>What are good data-structures for the communication between numerical methods
in a pipeline? How can numerical methods convey requirements for precision
among each other?</p>
</li>
<li>
<p>To which degree should numerical methods be inspired by existing numerical
frameworks, and where should we deviate from established concepts? Is there a
place for ab initio probabilistic solutions to existing numerical problems?</p>
</li>
<li>
<p>In building probabilistic methods, when are richer models worth their
associated computational cost? Can we develop families of numerical methods
with cost/accuracy trade-offs tunable to the requirements of the problem at
hand?</p>
</li>
<li>
<p>Can we support new probabilistic numerical methods with the theory required
for them to find broad acceptance?</p>
</li>
</ul>
<h2 id="forming-a-community">Forming a community</h2>
<p>Over the past years, a number of researchers stemming largely from the areas of
machine learning and statistics have attempted to build such probabilistic
numerical methods. A first meeting point for this group was the 2013 NIPS
Workshop on Probabilistic Numerics. Probabilistic Numerics is still a fledgling
community; in fact, many of us do not know each other well, and we do not always
know of each other’s work. The Probabilistic Numerics Roundtable hopes to
alleviate this problem.</p>
<p><img src="/assets/images/roundtable2014.JPG" width="70%" align="right" /></p>
<h2 id="participants">Participants</h2>
<ul>
<li><a href="http://www.imperial.ac.uk/AP/faces/pages/read/Home.jsp?person=b.calderhead&_adf.ctrl-state=14q2kzdbf_3&_afrRedirect=1867638372907362">Ben Calderhead</a></li>
<li><a href="http://mlg.eng.cam.ac.uk/frellsen/">Jes Frellsen</a></li>
<li><a href="http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/girolami/">Mark Girolami</a></li>
<li>Tom Gunter</li>
<li><a href="http://www2.compute.dtu.dk/~sohau/">Søren Hauberg</a></li>
<li><a href="http://www.is.tuebingen.mpg.de/nc/employee/details/phennig.html#=0">Philipp Hennig</a></li>
<li><a href="http://www.is.tuebingen.mpg.de/nc/employee/details/mkiefel.html">Martin Kiefel</a></li>
<li><a href="http://www.is.tuebingen.mpg.de/nc/employee/details/eklenske.html">Edgar Klenske</a></li>
<li><a href="http://www.is.tuebingen.mpg.de/nc/employee/details/mmahsereci.html">Maren Mahsereci</a></li>
<li><a href="http://www.linkedin.com/pub/ted-meeds/4/357/982">Ted Meeds</a></li>
<li><a href="http://www.robots.ox.ac.uk/~mosb">Michael A Osborne</a></li>
<li><a href="http://becs.aalto.fi/~ssarkka/">Simo Särkkä</a></li>
<li><a href="http://www.is.tuebingen.mpg.de/nc/employee/details/mschober.html">Michael Schober</a></li>
</ul>
Fri, 22 Aug 2014 11:00:00 +0200
/workshops/2014/08/22/Roundtable-2014-in-Tuebingen/