Gaussian Process-Based Modelling and Prediction of Image Time Series

Let $$u = (x,y,z)$$ be the 3-dimensional spatial coordinate system and t the temporal dimension. We consider the image time series I as a discretely sampled spatio-temporal signal of dimensions $$N \times N \times N \times N_T$$, where N is the dimension of the sampling grid on a single spatial axis, and $$N_T$$ is the number of time points. In the following sections we represent the image time series I as a one-dimensional array of length $$N^3 N_T$$. We model the image time series I(u,t) as a realization of a latent spatio-temporal process f(u,t) with additive noise:
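As a concrete sketch of this representation (NumPy, with purely illustrative grid sizes), the sampled 4-D signal is flattened into the one-dimensional array used throughout:

```python
import numpy as np

# Hypothetical sizes for illustration: a 4x4x4 spatial grid observed at 3 time points.
N, N_T = 4, 3
rng = np.random.default_rng(0)
I = rng.standard_normal((N, N, N, N_T))  # discretely sampled signal I(u, t)

# Represent the image time series as a one-dimensional array of length N^3 * N_T.
I_vec = I.reshape(-1)
assert I_vec.shape == (N**3 * N_T,)
```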



$$\begin{aligned} I(u,t) = f(u,t) + \epsilon (u,t). \end{aligned}$$

(1)
The true signal f will be modelled as a GP with zero mean and covariance $$\varSigma $$, while $$\epsilon $$ is assumed to be i.i.d. Gaussian distributed measurement noise $$\epsilon (u,t)\sim \mathcal {N}(0,\sigma ^2)$$. Here we first assume that spatial and temporal processes are separable, and thus that the covariance matrix $$\varSigma $$ can be factorised as the Kronecker product of independent spatial and temporal covariance matrices: $$\varSigma = \varSigma _S\otimes \varSigma _T$$.

This is a valid modeling assumption when the temporal properties of the signal are similar across space; for instance, when analyzing within-subject time series of brain MRIs in Alzheimer's disease (AD), the expected pathological change rates are generally mild and slowly varying across the brain. Second, a central assumption made in this paper is that the spatial dependencies of the signal are local, i.e. that the image intensities are smoothly varying and correlated within a spatial neighborhood of radius $$l_s$$. We note that our assumptions of separability and stationarity are compatible with the spatio-temporal correlation models commonly assumed by registration-based approaches.

A reasonable choice for such a local spatial covariance structure is a squared exponential model $$\varSigma _S(u_1,u_2) = \lambda _s \exp (-\frac{\Vert u_1-u_2\Vert ^2}{2l_s})$$, where $$\lambda _s$$ is the global spatial amplitude parameter, and $$l_s$$ is the length-scale of the Gaussian spatial neighborhood. We observe that such a covariance structure is stationary with respect to the spatial coordinates. Furthermore, we can exploit the separability of the exponential function to note that, given two separate spatial locations $$u_1 = (x_1,y_1,z_1)$$ and $$u_2=(x_2,y_2,z_2)$$, we have


$$\begin{aligned} \varSigma _S(u_1,u_2) = \lambda _s \exp \left(-\frac{(x_1-x_2)^2}{2l_s}\right)\exp \left(-\frac{(y_1-y_2)^2}{2l_s}\right)\exp \left(-\frac{(z_1-z_2)^2}{2l_s}\right). \end{aligned}$$
For this reason the spatial covariance matrix $$\varSigma _S$$ can be further decomposed as the Kronecker product of covariance matrices of 1-dimensional processes: $$\varSigma = K_x \otimes K_y \otimes K_z \otimes \varSigma _T$$. We observe that the model is here conveniently represented by the product of independent covariances of significantly smaller size, and is completely identified by the spatial, temporal and noise parameters. In particular, the proposed model is flexible with respect to the temporal covariance matrix $$\varSigma _T$$, which can be expressed in terms of a complex mixed-effects structure, and can account for covariates and different progression models. For instance, in this work the matrix $$\varSigma _T$$ is first specified in order to model the temporal progression observed in time series of images (Sect. 4), and is then used to model the influence of anatomical, genetic, clinical, and sociodemographic covariates on individual atrophy rates modelled by non-linear registration (Sect. 5).
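A minimal NumPy sketch of this factorisation (grid size, length-scale and amplitude are illustrative assumptions, not values from the paper): the separable squared exponential covariance over a 3-D grid coincides with the Kronecker product of three 1-D kernels.

```python
import numpy as np
from itertools import product

def se_1d(coords, l, amp=1.0):
    """1-D squared exponential kernel, amp * exp(-d^2 / (2*l))."""
    d2 = (coords[:, None] - coords[None, :]) ** 2
    return amp * np.exp(-d2 / (2.0 * l))

# Illustrative sizes and hyper-parameters.
N, l_s, lam_s = 3, 2.0, 1.5
axis = np.arange(N, dtype=float)

# Place the global amplitude lambda_s on a single factor.
Kx, Ky, Kz = se_1d(axis, l_s, lam_s), se_1d(axis, l_s), se_1d(axis, l_s)
Sigma_S_kron = np.kron(np.kron(Kx, Ky), Kz)

# Direct construction from the 3-D grid for comparison (x varies slowest).
pts = np.array(list(product(axis, axis, axis)))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
Sigma_S_direct = lam_s * np.exp(-d2 / (2.0 * l_s))

assert np.allclose(Sigma_S_kron, Sigma_S_direct)
```

The full spatio-temporal covariance then follows as `np.kron(Sigma_S_kron, Sigma_T)` for any chosen temporal covariance $$\varSigma _T$$.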



3 Inference in Gaussian Processes with Kronecker Structure


The GP-based generative model with Kronecker covariance structure outlined in this work provides a powerful and efficient framework for prediction using image time series. Here we provide the main results concerning the marginal likelihood computation, the hyper-parameter optimization and the posterior prediction.

Let $$(U_{K_x},S_{K_x} = {\text {diag}}(\lambda _{1}^x, \ldots ,\lambda _{N}^x))$$ and $$(U_{T},S_{T} = {\text {diag}}(\lambda _{1}^t,\ldots ,\lambda _{N_T}^t))$$ be the eigenvectors and eigenvalues associated with the one-dimensional spatial and temporal covariance matrices $$K_x$$ and $$\varSigma _T$$. These eigendecompositions involve only small matrices and can be efficiently computed beforehand offline. We further introduce the shorthand notation $$\bigotimes A = {A_x}\otimes {A_y}\otimes A_z$$.

Log-Marginal Likelihood. The marginal likelihood of the model (1) is the following:


$$\begin{aligned} \log \mathcal {L}&= - \frac{1}{2} \displaystyle \sum _{i,j,k,l} \log (\lambda _{i}^x\lambda _{j}^y\lambda _{k}^z\lambda _{l}^t + \sigma ^2) - \frac{1}{2}{V_I}^T \left(\bigotimes S_{K} \otimes S_T +\sigma ^2 Id\right)^{-1}V_I + const, \end{aligned}$$

(2)
with $$const = -\frac{N^3N_T}{2} \log (2\pi ) $$, $$ V_I = {\text {vec}}\left[ (U_{K_z}^T \otimes U_{T}^T) \tilde{I} (U_{K_x} \otimes U_{K_y})\right] $$, where $$\tilde{I}$$ is the matricization of I into a two-dimensional matrix of dimension $$N^2 \times NN_T$$, and $$\lambda _i^{x},\lambda _j^{y},\lambda _k^{z}$$ and $$\lambda _l^{t}$$ are the eigenvalues of $$K_x, K_y, K_z$$ and $$\varSigma _T$$ respectively. The computation of the vector $$V_I$$ requires the storage and multiplication of matrices of relatively small sizes, respectively $$N^2 \times N^2$$, $$N^2 \times NN_T$$ and $$NN_T\times NN_T$$. The product $$(\bigotimes S_{K} \otimes S_T +\sigma ^2 Id)^{-1} V_I$$ can finally be computed as the solution of the linear system $$(\bigotimes S_{K} \otimes S_T +\sigma ^2 Id) \mathbf {X} = V_I $$, which is straightforward since $$(\bigotimes S_{K} \otimes S_T +\sigma ^2 Id) $$ is diagonal.
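This diagonalization can be checked numerically on a toy problem. The sketch below (illustrative sizes and kernels; the dense Kronecker products are formed explicitly only to validate the result, which the efficient per-factor computation avoids) evaluates Eq. (2) through the small eigendecompositions and compares it with a direct dense computation:

```python
import numpy as np

rng = np.random.default_rng(0)

def se_1d(coords, l):
    d2 = (coords[:, None] - coords[None, :]) ** 2
    return np.exp(-d2 / (2.0 * l))

N, N_T, sigma2 = 3, 2, 0.1
Kx = se_1d(np.arange(N, dtype=float), 2.0)
Ky = se_1d(np.arange(N, dtype=float), 2.0)
Kz = se_1d(np.arange(N, dtype=float), 2.0)
Sigma_T = se_1d(np.arange(N_T, dtype=float), 1.0)  # placeholder temporal covariance

# Offline eigendecomposition of each small factor.
eigs = [np.linalg.eigh(K) for K in (Kx, Ky, Kz, Sigma_T)]
lam, U = eigs[0]
for w, V in eigs[1:]:
    lam = np.kron(lam, w)   # eigenvalues of Kx ⊗ Ky ⊗ Kz ⊗ Sigma_T
    U = np.kron(U, V)       # joint eigenvectors (dense here only for checking)

n = N**3 * N_T
I_vec = rng.standard_normal(n)   # stand-in vectorized image time series
V_I = U.T @ I_vec                # rotated data; cheap per-factor products in practice
logL = (-0.5 * np.sum(np.log(lam + sigma2))
        - 0.5 * np.sum(V_I**2 / (lam + sigma2))
        - 0.5 * n * np.log(2 * np.pi))

# Dense reference computation of the same log-marginal likelihood.
Sigma = np.kron(np.kron(np.kron(Kx, Ky), Kz), Sigma_T) + sigma2 * np.eye(n)
_, logdet = np.linalg.slogdet(Sigma)
ref = (-0.5 * logdet - 0.5 * I_vec @ np.linalg.solve(Sigma, I_vec)
       - 0.5 * n * np.log(2 * np.pi))
assert np.isclose(logL, ref)
```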

Hyperparameter Optimization. The derivative of the log-likelihood (2) with respect to the model parameters $$\varvec{\theta }$$ is:


$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}\varvec{\theta }}\log \mathcal {L}&= -\frac{1}{2} {\text {Tr}}\left( \left(\bigotimes {K} \otimes \varSigma _T +\sigma ^2 Id \right)^{-1} \frac{\mathrm {d}}{\mathrm {d}\varvec{\theta }} \left( \bigotimes {K} \otimes \varSigma _T +\sigma ^2 Id \right) \right) \nonumber \\&\quad -\frac{1}{2} \frac{\mathrm {d}}{\mathrm {d}\varvec{\theta }}\left[ I^T \left(\bigotimes {K} \otimes \varSigma _T +\sigma ^2 Id\right)^{-1} I\right]. \end{aligned}$$

(3)
It can be shown that formula (3) can be efficiently computed with respect to each model parameter. For instance, the gradient with respect to the noise parameter can be expressed in the form:


$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}\sigma ^2}\log \mathcal {L} = -\frac{1}{2} \sum _{i,j,k,l} (\lambda _{i}^x\lambda _{j}^y\lambda _{k}^z\lambda _{l}^t + \sigma ^2)^{-1} + \frac{1}{2} {V_I}^T \left(\bigotimes S_{K} \otimes S_T +\sigma ^2 Id\right)^{-2} V_I. \end{aligned}$$

(4)
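In the rotated eigenbasis, Eq. (4) reduces to sums over scalars. A quick finite-difference check on synthetic eigenvalues and rotated data (both purely illustrative stand-ins) confirms the expression:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.uniform(0.1, 2.0, size=50)  # stand-in eigenvalues of Kx⊗Ky⊗Kz⊗Sigma_T
V = rng.standard_normal(50)           # stand-in rotated data V_I
s2 = 0.3

def log_lik(s2):
    # Eq. (2) up to its additive constant.
    return -0.5 * np.sum(np.log(lam + s2)) - 0.5 * np.sum(V**2 / (lam + s2))

def grad_sigma2(s2):
    # Eq. (4): gradient of the log-marginal likelihood w.r.t. the noise variance.
    return -0.5 * np.sum(1.0 / (lam + s2)) + 0.5 * np.sum(V**2 / (lam + s2)**2)

# Central finite difference agrees with the analytic gradient.
eps = 1e-6
fd = (log_lik(s2 + eps) - log_lik(s2 - eps)) / (2 * eps)
assert np.isclose(grad_sigma2(s2), fd, atol=1e-5)
```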
Prediction. A major strength of a GP framework for image time series is that it easily enables probabilistic predictions based on given observations. The proposed generative model allows us to consider the predictive distributions of the latent spatio-temporal process at any testing locations $$u^*$$ and time points $$t^*$$. Given an image time series I(u,t), we now aim to predict the image $$I^*$$ at $$N^*\times N_T^*$$ testing coordinates $$\{u^*,t^*\}$$. Let us define $$\varSigma _{I,I^*} = \varSigma (u,t,u^*,t^*)$$ as the cross-covariance matrix of training and testing data, and $$\varSigma _{I^*,I^*} = \varSigma (u^*,t^*,u^*,t^*)$$ as the covariance evaluated on the new coordinates. The joint GP model of training and testing data is:


$$\begin{aligned} \begin{pmatrix} I(u,t)\\ I^*(u^*,t^*) \end{pmatrix} \sim \mathcal {N}\left[ \begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} \varSigma +\sigma ^2 Id &{} \varSigma _{I,I^*}\\ \varSigma _{I^*,I} &{} \varSigma _{I^*,I^*}+\sigma ^2 Id \end{pmatrix} \right], \end{aligned}$$

(5)
and it can be easily shown that the posterior distribution of $$I^*$$ conditioned on the observed time series I and parameters $$\varvec{\theta }$$ is [11]:


$$\begin{aligned} I^*\,|\,I,\{u^*,t^*\},&\varvec{\theta }\sim \mathcal {N}\big (\mu ^*, \varSigma ^*\big ), \text { where } \mu ^*=\varSigma _{I^*,I}\, (\varSigma +\sigma ^2 Id)^{-1} I \nonumber \\&\text {and }\;\varSigma ^* = \varSigma _{I^*,I^*} - \varSigma _{I^*,I}\, (\varSigma +\sigma ^2 Id)^{-1}\, \varSigma _{I,I^*} +\sigma ^2 Id. \end{aligned}$$

(6)
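A minimal one-dimensional sketch of this conditioning (training/testing locations, kernel, and noise level are illustrative assumptions; the cross-covariance is transposed where needed so dimensions conform, and real use would exploit the Kronecker structure of the covariances):

```python
import numpy as np

def se(a, b, l=1.0):
    """Squared exponential kernel between two coordinate vectors."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * l))

rng = np.random.default_rng(1)
t_train = np.linspace(0.0, 4.0, 5)
t_test = np.array([1.5, 2.5])
sigma2 = 0.01

Sigma = se(t_train, t_train)               # covariance on training coordinates
Sigma_cross = se(t_train, t_test)          # Sigma_{I,I*}
Sigma_ss = se(t_test, t_test)              # Sigma_{I*,I*}
I_obs = rng.standard_normal(t_train.size)  # stand-in observed series

# Posterior mean and covariance at the testing coordinates, Eq. (6).
A = np.linalg.solve(Sigma + sigma2 * np.eye(t_train.size), np.eye(t_train.size))
mu_star = Sigma_cross.T @ A @ I_obs
Sigma_star = Sigma_ss - Sigma_cross.T @ A @ Sigma_cross + sigma2 * np.eye(t_test.size)

# Conditioning on nearby observations reduces the predictive variance.
assert np.all(np.diag(Sigma_star) < np.diag(Sigma_ss) + sigma2)
```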
From the practical perspective, we notice that by definition the new covariance matrices still have a Kronecker product form: $$\varSigma _{I,I^*} = K_{x,x^*} \otimes K_{y,y^*} \otimes K_{z,z^*} \otimes \varSigma _{t,t^*}$$, and $$\varSigma _{I^*,I^*} = K_{x^*,x^*} \otimes K_{y^*,y^*} \otimes K_{z^*,z^*} \otimes \varSigma _{t^*,t^*}$$. The predicted mean $$\mu ^*$$ at coordinates $$\{u^*,t^*\}$$ is then
Sep 16, 2016 | Posted in GENERAL RADIOLOGY
