Probabilistic Numerics | Linear Algebra

2022

Posterior and Computational Uncertainty in Gaussian processes

Wenger, Jonathan, Pleiss, Geoff, Pförtner, Marvin, Hennig, Philipp, and Cunningham, John P.

In Advances in Neural Information Processing Systems (NeurIPS) 2022

@inproceedings{wenger2022computational, author = {Wenger, Jonathan and Pleiss, Geoff and Pf{\"o}rtner, Marvin and Hennig, Philipp and Cunningham, John P.}, bibtex_show = {true}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, title = {Posterior and Computational Uncertainty in {G}aussian processes}, link = {https://arxiv.org/abs/2205.15449}, pdf = {https://arxiv.org/pdf/2205.15449.pdf}, year = {2022} }

2020

Probabilistic Iterative Methods for Linear Systems

Cockayne, Jon, Ipsen, Ilse CF, Oates, Chris J, and Reid, Tim W

arXiv preprint arXiv:2012.12615 2020

@article{cockayne2020probabilistic, author = {Cockayne, Jon and Ipsen, Ilse CF and Oates, Chris J and Reid, Tim W}, bibtex_show = {true}, journal = {arXiv preprint arXiv:2012.12615}, title = {Probabilistic Iterative Methods for Linear Systems}, year = {2020} }

A Probabilistic Numerical Extension of the Conjugate Gradient Method

Reid, Tim W, Ipsen, Ilse CF, Cockayne, Jon, and Oates, Chris J

arXiv preprint arXiv:2008.03225 2020

@article{reid2020probabilistic, author = {Reid, Tim W and Ipsen, Ilse CF and Cockayne, Jon and Oates, Chris J}, bibtex_show = {true}, journal = {arXiv preprint arXiv:2008.03225}, title = {A Probabilistic Numerical Extension of the Conjugate Gradient Method}, year = {2020} }

Probabilistic Linear Solvers for Machine Learning

Wenger, Jonathan, and Hennig, Philipp

In Advances in Neural Information Processing Systems (NeurIPS) 2020

@inproceedings{wenger2020problinsolve,
  author = {Wenger, Jonathan and Hennig, Philipp},
  bibtex_show = {true},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  code = {https://github.com/JonathanWenger/probabilistic-linear-solvers-for-ml},
  link = {https://arxiv.org/abs/2010.09691},
  pdf = {https://arxiv.org/pdf/2010.09691.pdf},
  title = {Probabilistic Linear Solvers for Machine Learning},
  year = {2020}
}

2019

A Bayesian conjugate gradient method (with discussion)

Cockayne, Jon, Oates, Chris J, Ipsen, Ilse CF, and Girolami, Mark

Bayesian Analysis 2019

@article{cockayne2019bayesian, author = {Cockayne, Jon and Oates, Chris J and Ipsen, Ilse CF and Girolami, Mark}, bibtex_show = {true}, journal = {Bayesian Analysis}, number = {3}, pages = {937--1012}, publisher = {International Society for Bayesian Analysis}, title = {A Bayesian conjugate gradient method (with discussion)}, volume = {14}, year = {2019} }

2017

Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity

Schäfer, Florian, Sullivan, T. J., and Owhadi, Houman

arXiv:1706.02205 [cs, math] 2017

Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{ x_{i} \}_{1 \leq i \leq N}$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green’s functions of elliptic boundary value problems and approximately equally spaced sampling points, we show how to identify a subset $S \subset \{ 1, \dots, N \} \times \{ 1, \dots, N \}$ with $\# S = O ( N \log (N) \log^{d} ( N / \epsilon ) )$ such that the zero fill-in block-incomplete Cholesky decomposition of $\Theta_{i,j} 1_{( i,j ) \in S}$ is an $\epsilon$-approximation of $\Theta$. This block-factorisation can provably be obtained in $\mathcal{O}\left( N \log^{2} ( N ) \left( \log (1/\epsilon ) + \log^{2} ( N ) \right)^{4d+1} \right)$ complexity in time. Numerical evidence further suggests that element-wise Cholesky decomposition with the same ordering constitutes an $\mathcal{O}\left( N \log^{2} ( N ) \log^{2d} ( N/\epsilon ) \right)$ solver. The algorithm only needs to know the spatial configuration of the $x_{i}$ and does not require an analytic representation of $G$. Furthermore, an approximate PCA with optimal rate of convergence in the operator norm can be easily read off from this decomposition. Hence, by using only subsampling and the incomplete Cholesky decomposition, we obtain at nearly linear complexity the compression, inversion and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky decomposition we also obtain a near-linear-time solver for elliptic PDEs.
@article{schafer_compression_2017, author = {Sch{\"a}fer, Florian and Sullivan, T. J. and Owhadi, Houman}, bibtex_show = {true}, file = {https://arxiv.org/pdf/1706.02205.pdf}, journal = {arXiv:1706.02205 [cs, math]}, keywords = {65F30, 42C40, 65F50, 65N55, 65N75, 60G42, 68Q25, 68W40, Computer Science - Computational Complexity, Computer Science - Data Structures and Algorithms, Mathematics - Numerical Analysis, Mathematics - Probability}, month = jun, note = {arXiv: 1706.02205}, title = {Compression, inversion, and approximate {PCA} of dense kernel matrices at near-linear computational complexity}, url = {http://arxiv.org/abs/1706.02205}, urldate = {2017-09-10}, year = {2017} }

Bayesian Inference of Log Determinants

Fitzsimons, Jack, Cutajar, Kurt, Osborne, Michael, Roberts, Stephen, and Filippone, Maurizio

In Uncertainty in Artificial Intelligence 2017

The log-determinant of a kernel matrix appears in a variety of machine learning problems, ranging from determinantal point processes and generalized Markov random fields, through to the training of Gaussian processes. Exact calculation of this term is often intractable when the size of the kernel matrix exceeds a few thousand. In the spirit of probabilistic numerics, we reinterpret the problem of computing the log-determinant as a Bayesian inference problem. In particular, we combine prior knowledge in the form of bounds from matrix theory and evidence derived from stochastic trace estimation to obtain probabilistic estimates for the log-determinant and its associated uncertainty within a given computational budget. Beyond its novelty and theoretic appeal, the performance of our proposal is competitive with state-of-the-art approaches to approximating the log-determinant, while also quantifying the uncertainty due to budget-constrained evidence.
@inproceedings{fitzsimons_bayesian_2017, author = {Fitzsimons, Jack and Cutajar, Kurt and Osborne, Michael and Roberts, Stephen and Filippone, Maurizio}, bibtex_show = {true}, booktitle = {Uncertainty in {Artificial} {Intelligence}}, file = {http://probabilistic-numerics.org/assets/pdf/Fitzsimons et al. - 2017 - Bayesian Inference of Log Determinants.pdf}, title = {Bayesian {Inference} of {Log} {Determinants}}, url = {https://arxiv.org/abs/1704.01445}, urldate = {2017-06-21}, year = {2017} }
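The abstract above names stochastic trace estimation as its source of evidence. This is relevant because, for a symmetric positive definite matrix A, log det(A) = tr(log A). The following is a minimal sketch of the plain Hutchinson trace estimator only, assuming Rademacher probe vectors; the function name `hutchinson_trace` and the test matrix are illustrative choices, and the paper's Bayesian combination with matrix-theoretic bounds is not reproduced here.

```python
import numpy as np


def hutchinson_trace(matvec, n, num_probes, rng):
    """Estimate tr(A) using only matrix-vector products with A.

    `matvec` is a hypothetical callable v -> A @ v (an assumption of this sketch).
    """
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        total += z @ matvec(z)               # z^T A z has expectation tr(A)
    return total / num_probes


rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)  # symmetric positive definite test matrix

estimate = hutchinson_trace(lambda v: A @ v, n, num_probes=500, rng=rng)
exact = np.trace(A)
print(f"estimate={estimate:.2f}, exact={exact:.2f}")
```

The estimator's variance is driven by the off-diagonal entries of A, which is why a small probe budget leaves residual uncertainty that a probabilistic treatment could quantify.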

2016

Probabilistic Approximate Least-Squares

Bartels, S., and Hennig, P.

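In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016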

Least-squares and kernel-ridge / Gaussian process regression are among the foundational algorithms of statistics and machine learning. Famously, the worst-case cost of exact nonparametric regression grows cubically with the data-set size; but a growing number of approximations have been developed that estimate good solutions at lower cost. These algorithms typically return point estimators, without measures of uncertainty. Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor). This allows estimating the error of the least-squares solution on a subset of the data relative to the full-data solution. The uncertainty can be used to control the computational effort invested in the approximation. Our algorithm has linear cost in the data-set size, and a simple formal form, so that it can be implemented with a few lines of code in programming languages with linear algebra functionality.
@inproceedings{BarHen16, author = {Bartels, S. and Hennig, P.}, bibtex_show = {true}, booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016)}, editor = {Gretton, A. and Robert, C. C.}, file = {http://jmlr.org/proceedings/papers/v51/bartels16.pdf}, link = {http://jmlr.org/proceedings/papers/v51/bartels16.html}, pages = {676--684}, series = {JMLR Workshop and Conference Proceedings}, title = {Probabilistic Approximate Least-Squares}, volume = {51}, year = {2016} }

2015

Stochastic determination of matrix determinants

Dorn, Sebastian, and Enßlin, Torsten A.

Phys. Rev. E 2015

Matrix determinants play an important role in data analysis, in particular when Gaussian processes are involved. Due to currently exploding data volumes, linear operations—matrices—acting on the data are often not accessible directly but are only represented indirectly in form of a computer routine. Such a routine implements the transformation a data vector undergoes under matrix multiplication. While efficient probing routines to estimate a matrix’s diagonal or trace, based solely on such computationally affordable matrix-vector multiplications, are well known and frequently used in signal inference, there is no stochastic estimate for its determinant. We introduce a probing method for the logarithm of a determinant of a linear operator. Our method rests upon a reformulation of the log-determinant by an integral representation and the transformation of the involved terms into stochastic expressions. This stochastic determinant determination enables large-size applications in Bayesian inference, in particular evidence calculations, model comparison, and posterior determination.
@article{PhysRevE92013302, author = {Dorn, Sebastian and En\ss{}lin, Torsten A.}, bibtex_show = {true}, number = {1}, journal = {Phys. Rev. E}, link = {https://link.aps.org/doi/10.1103/PhysRevE.92.013302}, month = jul, numpages = {8}, pages = {013302}, publisher = {American Physical Society}, title = {Stochastic determination of matrix determinants}, volume = {92}, year = {2015} }

Probabilistic Interpretation of Linear Solvers

Hennig, P.

SIAM Journal on Optimization 2015

This paper proposes a probabilistic framework for algorithms that iteratively solve unconstrained linear problems Bx = b with positive definite B for x. The goal is to replace the point estimates returned by existing methods with a Gaussian posterior belief over the elements of the inverse of B, which can be used to estimate errors. Recent probabilistic interpretations of the secant family of quasi-Newton optimization algorithms are extended. Combined with properties of the conjugate gradient algorithm, this leads to uncertainty-calibrated methods with very limited cost overhead over conjugate gradients, a self-contained novel interpretation of the quasi-Newton and conjugate gradient algorithms, and a foundation for new nonlinear optimization methods.
@article{2014arXiv14022058H, author = {{Hennig}, P.}, bibtex_show = {true}, file = {http://probabilistic-numerics.org/assets/pdf/HennigLinear2015.pdf}, number = {1}, journal = {SIAM Journal on Optimization}, link = {http://epubs.siam.org/doi/abs/10.1137/140955501?journalCode=sjope8}, month = jan, title = {{Probabilistic Interpretation of Linear Solvers}}, volume = {25}, year = {2015} }
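For orientation, the abstract above builds on the conjugate gradient algorithm for Bx = b with positive definite B. The sketch below is the classic, non-probabilistic conjugate gradient iteration, which returns only a point estimate; it is not the paper's method and computes no Gaussian posterior over the inverse of B. The function name, tolerance, and test problem are assumptions made for illustration.

```python
import numpy as np


def conjugate_gradient(B, b, tol=1e-10, max_iter=None):
    """Solve B x = b for symmetric positive definite B (point estimate only)."""
    max_iter = 10 * len(b) if max_iter is None else max_iter
    x = np.zeros_like(b, dtype=float)
    r = b - B @ x          # residual
    d = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:   # stop once the residual norm is small
            break
        Bd = B @ d
        alpha = rs_old / (d @ Bd)   # exact line search along d
        x += alpha * d
        r -= alpha * Bd
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d   # next B-conjugate direction
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
B = M @ M.T / n + np.eye(n)  # symmetric positive definite test matrix
b = rng.standard_normal(n)

x = conjugate_gradient(B, b)
print("residual norm:", np.linalg.norm(B @ x - b))
```

Each iteration costs one matrix-vector product with B, which is the budget against which the abstract's "very limited cost overhead" is measured.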

2012

Improving stochastic estimates with inference methods: Calculating matrix diagonals

Selig, Marco, Oppermann, Niels, and Enßlin, Torsten A.

Phys. Rev. E 2012

Estimating the diagonal entries of a matrix, that is not directly accessible but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method.
@article{PhysRevE85021134, author = {Selig, Marco and Oppermann, Niels and En\ss{}lin, Torsten A.}, bibtex_show = {true}, number = {2}, journal = {Phys. Rev. E}, link = {https://link.aps.org/doi/10.1103/PhysRevE.85.021134}, month = feb, numpages = {7}, pages = {021134}, publisher = {American Physical Society}, title = {Improving stochastic estimates with inference methods: Calculating matrix diagonals}, volume = {85}, year = {2012} }
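The matrix probing that the abstract above takes as its starting point can be sketched as follows. This is a minimal version of the plain probing baseline, assuming Rademacher probes and an operator available only as a routine; the name `probe_diagonal` and the test matrix are illustrative, and the Wiener filter refinement developed in the paper is not implemented.

```python
import numpy as np


def probe_diagonal(matvec, n, num_probes, rng):
    """Estimate diag(A) from matrix-vector products with random probes.

    `matvec` is a hypothetical callable v -> A @ v (an assumption of this sketch).
    """
    acc = np.zeros(n)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        acc += z * matvec(z)                 # entry i has expectation A[i, i]
    return acc / num_probes


rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)  # test operator with non-zero off-diagonal entries

diag_estimate = probe_diagonal(lambda v: A @ v, n, num_probes=2000, rng=rng)
max_error = np.max(np.abs(diag_estimate - np.diag(A)))
print("max abs error:", max_error)
```

The error of each entry shrinks only with the square root of the number of probes, which is the cost the abstract's inference-based approach aims to reduce.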