Dense kernel matrices {}Theta }in }mathbb{R}^{N }times N} obtained from point evaluations of a covariance function \G at locations {}{ x_{i} }}_{1 }leq i }leq N} arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green’s functions elliptic boundary value problems and approximately equally spaced sampling points, we show how to identify a subset \S }subset }{ 1 , }dots , N }} }times }{ 1 , }dots , N }} with {}# S = O ( N }log (N) }log^{d} ( N / }epsilon ) ) such that the zero fill-in block-incomplete Cholesky decomposition of {}Theta_{i,j} 1_{( i,j ) }in S} is an {}epsilon\-approximation of {}Theta\. This block-factorisation can provably be obtained in Ø}left(N }log^{2} ( N ) }left( }log (1/}epsilon ) + }log^{2} ( N ) }right)^{4d+1} }right) complexity in time. Numerical evidence further suggests that element-wise Cholesky decomposition with the same ordering constitutes an Ø}left( N }log^{2} ( N ) }log^{2d} ( N/}epsilon ) }right) solver. The algorithm only needs to know the spatial configuration of the \x_{i} and does not require an analytic representation of \G\. Furthermore, an approximate PCA with optimal rate of convergence in the operator norm can be easily read off from this decomposition. Hence, by using only subsampling and the incomplete Cholesky decomposition, we obtain at nearly linear complexity the compression, inversion and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky decomposition we also obtain a near-linear-time solver for elliptic PDEs.

The log-determinant of a kernel matrix appears in a variety of machine learning problems, ranging from determinantal point processes and generalized Markov random fields, through to the training of Gaussian processes. Exact calculation of this term is often intractable when the size of the kernel matrix exceeds a few thousand. In the spirit of probabilistic numerics, we reinterpret the problem of computing the log-determinant as a Bayesian inference problem. In particular, we combine prior knowledge in the form of bounds from matrix theory and evidence derived from stochastic trace estimation to obtain probabilistic estimates for the log-determinant and its associated uncertainty within a given computational budget. Beyond its novelty and theoretic appeal, the performance of our proposal is competitive with state-of-the-art approaches to approximating the log-determinant, while also quantifying the uncertainty due to budget-constrained evidence.

Least-squares and kernel-ridge / Gaussian process regression are among the foundational algorithms of statistics and machine learning. Famously, the worst-case cost of exact nonparametric regression grows cubically with the data-set size; but a growing number of approximations have been developed that estimate good solutions at lower cost. These algorithms typically return point estimators, without measures of uncertainty. Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor). This allows estimating the error of the least-squares solution on a subset of the data relative to the full-data solution. The uncertainty can be used to control the computational effort invested in the approximation. Our algorithm has linear cost in the data-set size, and a simple formal form, so that it can be implemented with a few lines of code in programming languages with linear algebra functionality.

Matrix determinants play an important role in data analysis, in particular when Gaussian processes are involved. Due to currently exploding data volumes, linear operations—matrices—acting on the data are often not accessible directly but are only represented indirectly in form of a computer routine. Such a routine implements the transformation a data vector undergoes under matrix multiplication. While efficient probing routines to estimate a matrix’s diagonal or trace, based solely on such computationally affordable matrix-vector multiplications, are well known and frequently used in signal inference, there is no stochastic estimate for its determinant. We introduce a probing method for the logarithm of a determinant of a linear operator. Our method rests upon a reformulation of the log-determinant by an integral representation and the transformation of the involved terms into stochastic expressions. This stochastic determinant determination enables large-size applications in Bayesian inference, in particular evidence calculations, model comparison, and posterior determination.

This paper proposes a probabilistic framework for algorithms
that iteratively solve unconstrained linear problems Bx = b with positive
definite B for x. The goal is to replace the point estimates returned by
existing methods with a Gaussian posterior belief over the elements of the
inverse of B, which can be used to estimate errors. Recent probabilistic
interpretations of the secant family of quasi-Newton optimization algorithms
are extended. Combined with properties of the conjugate gradient algorithm,
this leads to uncertainty-calibrated methods with very limited cost overhead
over conjugate gradients, a self-contained novel interpretation of the
quasi-Newton and conjugate gradient algorithms, and a foundation for new
nonlinear optimization methods.

2012

Improving stochastic estimates with inference methods: Calculating matrix diagonals

Selig, Marco,
Oppermann, Niels,
and Enßlin, Torsten A.

Estimating the diagonal entries of a matrix, that is not directly accessible but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method.