Computing on Non-Linear Spaces for Computational Anatomy

by specifying neighboring points and tangent vectors, which allows one to differentiate smooth functions on the manifold. The topology also impacts the global structure, as it determines whether there exists a connected path between two points. However, we need an additional structure to quantify how far apart two connected points are: a distance. By restricting to distances which are compatible with the differential structure, we enter the realm of Riemannian geometry. A Riemannian metric is defined by a continuous collection of scalar products $\left <\:.\:\left \vert \:.\right.\right >_{p}$ (or equivalently quadratic norms $\|.\|_{p}$) on each tangent space $T_{p}\mathcal{M}$ at point $p$ of the manifold. Thus, if we consider a curve on the manifold, we can compute at each point its instantaneous speed vector (this operation only involves the differential structure) and its norm to obtain the instantaneous speed (the Riemannian metric is needed for this operation). To compute the length of the curve, this value is integrated as usual along the curve. The distance between two points of a connected Riemannian manifold is the minimum length among the curves joining these points. The curves realizing this minimum are called geodesics. The calculus of variations shows that geodesics are the solutions of a system of second order differential equations depending on the Riemannian metric. In the following, we assume that the manifold is geodesically complete, i.e. that all geodesics can be indefinitely extended. This means that the manifold has neither boundary nor any singular point that can be reached in a finite time. As an important consequence, the Hopf-Rinow-De Rham theorem states that there always exists at least one minimizing geodesic between any two points of the manifold (i.e. a geodesic whose length is the distance between the two points).
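
As a concrete worked example (added here for illustration): on the unit sphere $\mathcal{S}_{2}$ embedded in $\mathbb{R}^{3}$ with the metric induced by the ambient scalar product, the minimizing curves are arcs of great circles and the resulting distance is the angle between the two unit vectors,


$$\mbox{ dist}(p,q) =\arccos \left (\left < p\:\vert \:q\right >\right ),$$
so that two antipodal points are at the maximal distance $\pi$.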




2.2 Exponential chart


Let $p$ be a point of the manifold that we consider as a local reference and $\vec{v}$ a vector of the tangent space $T_{p}\mathcal{M}$ at that point. From the theory of second order differential equations, we know that there exists one and only one geodesic $\gamma _{(p,\vec{v})}(t)$ starting from that point with this tangent vector. This allows us to wrap the tangent space onto the manifold, or equivalently to develop the manifold in the tangent space along the geodesics (think of rolling a sphere along its tangent plane at a given point), by mapping to each vector $\vec{v} \in T_{p}\mathcal{M}$ the point $q$ of the manifold that is reached after a unit time by the geodesic $\gamma _{(p,\vec{v})}(t)$. This mapping $\mbox{ Exp}_{p}(\vec{v}) =\gamma _{(p,\vec{v})}(1)$ is called the exponential map at point $p$. Straight lines going through 0 in the tangent space are transformed into geodesics going through point $p$ on the manifold, and distances along these lines are conserved (Fig. 1).



Fig. 1
Left: The tangent planes at points $p$ and $q$ of the sphere $\mathcal{S}_{2}$ are different: the vectors $v$ and $w$ of $T_{p}\mathcal{M}$ cannot be compared to the vectors $t$ and $u$ of $T_{q}\mathcal{M}$. Thus, it is natural to define the scalar product on each tangent plane. Right: The geodesics starting at $x$ are straight lines in the exponential map and the distance along them is conserved

The exponential map is defined in the whole tangent space $T_{p}\mathcal{M}$ (since the manifold is geodesically complete) but it is generally one-to-one only locally around 0 in the tangent space (i.e. around $p$ in the manifold). In the sequel, we denote by $\overrightarrow{pq} = \mbox{ Log}_{p}(q)$ the inverse of the exponential map: this is the smallest vector (in norm) such that $q = \mbox{ Exp}_{p}(\overrightarrow{pq})$. If we look for the maximal definition domain, we find out that it is a star-shaped domain delimited by a continuous curve $C_{p}$ called the tangential cut locus. The image of $C_{p}$ by the exponential map is the cut locus $\mathcal{C}_{p}$ of point $p$. This is (the closure of) the set of points where several minimizing geodesics starting from $p$ meet. On the sphere $\mathcal{S}_{2}(1)$ for instance, the cut locus of a point $p$ is its antipodal point and the tangential cut locus is the circle of radius $\pi$.

The exponential and log maps within this domain realize a chart (a local parameterization of the manifold) called the exponential chart at point $p$. It covers the whole manifold except the cut locus of the reference point $p$, which has null measure. In this chart, geodesics starting from $p$ are straight lines, and distances from the reference point are conserved. This chart is somehow the "most linear" chart of the manifold with respect to the reference point $p$. The set of all the exponential charts at each point of the manifold realizes an atlas which allows working very easily on the manifold, as we will see in the following.


2.3 Practical implementation


The exponential and logarithmic maps (from now on Exp and Log maps) are obviously different for each manifold and for each metric. Thus they have to be determined and implemented on a case-by-case basis. Examples for rotations and rigid-body transformations with a left-invariant metric can be found in [60], and examples for tensors in [4, 58]. Exponential charts constitute very powerful atomic functions in terms of implementation, on which we will be able to express practically all the geometric operations: the implementation of $\mbox{ Log}_{p}$ and $\mbox{ Exp}_{q}$ is the basis of programming on Riemannian manifolds, as we will see in the following.
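
As an illustration (added here, not part of the original text), a minimal Python sketch of these two atomic maps for the unit sphere $\mathcal{S}_{2}$ with the metric induced by the ambient Euclidean scalar product could look as follows; the function names are purely illustrative.

```python
import numpy as np

def sphere_exp(p, v, eps=1e-12):
    """Exponential map on the unit sphere: follow the great circle leaving p
    with initial (tangent) velocity v for a unit time."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:                       # Exp_p(0) = p
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_log(p, q, eps=1e-12):
    """Logarithmic map: the smallest tangent vector at p such that Exp_p maps it to q.
    It is undefined only at the cut locus of p (the antipodal point)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)           # geodesic distance on the sphere
    if theta < eps:
        return np.zeros_like(p)
    w = q - cos_theta * p                  # component of q orthogonal to p
    return theta * w / np.linalg.norm(w)

# Exp and Log are mutually inverse away from the cut locus,
# and the geodesic distance is the norm of the Log.
p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = sphere_log(p, q)
assert np.allclose(sphere_exp(p, v), q)
assert np.isclose(np.linalg.norm(v), np.pi / 2)
```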

In a Euclidean space, the exponential charts are nothing but one orthonormal coordinate system translated at each point: in this case we have $\overrightarrow{pq} = \mbox{ Log}_{p}(q) = q - p$ and $\mbox{ Exp}_{p}(\vec{v}) = p +\vec{ v}$. This example is more than a simple coincidence. In fact, most of the usual operations using additions and subtractions may be reinterpreted in a Riemannian framework using the notion of bi-point, an antecedent of the notion of vector introduced during the 19th century. Indeed, vectors are defined as equivalence classes of bi-points in a Euclidean space. This is possible because we have a canonical way (the translation) to compare what happens at two different points. In a Riemannian manifold, we can still compare things locally (by parallel transport), but no longer globally. This means that each "vector" has to remember at which point of the manifold it is attached, which comes back to a bi-point.

A second way to see the vector $\overrightarrow{pq}$ is as a vector of the tangent space at point $p$. Such a vector may be identified with a point on the manifold using the exponential map $q = \mbox{ Exp}_{p}(\overrightarrow{pq})$. Conversely, the logarithmic map may be used to map almost any bi-point $(p,q)$ into a vector $\overrightarrow{pq} = \mbox{ Log}_{p}(q)$ of $T_{p}\mathcal{M}$. This reinterpretation of addition and subtraction using logarithmic and exponential maps is very powerful for generalizing algorithms working on vector spaces to algorithms on Riemannian manifolds, as illustrated in Table 1 and in the following sections.


Table 1
Re-interpretation of standard operations in a Riemannian manifold

| Operation              | Euclidean space                                                              | Riemannian manifold                                                                        |
|------------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| Subtraction            | $\overrightarrow{pq} = q - p$                                                | $\overrightarrow{pq} = \mbox{ Log}_{p}(q)$                                                 |
| Addition               | $q = p +\vec{ v}$                                                            | $q = \mbox{ Exp}_{p}(\vec{v})$                                                             |
| Distance               | $\mbox{ dist}(p,q) =\Vert q - p\Vert$                                        | $\mbox{ dist}(p,q) =\Vert \overrightarrow{pq}\Vert _{p}$                                   |
| Mean value (implicit)  | $\sum _{i}(p_{i} -\bar{ p}) = 0$                                             | $\sum _{i}\overrightarrow{\bar{p}p_{i}} = 0$                                               |
| Gradient descent       | $p_{t+\varepsilon } = p_{t} -\varepsilon \,\overrightarrow{\nabla C(p_{t})}$ | $p_{t+\varepsilon } = \mbox{ Exp}_{p_{t}}(-\varepsilon \,\overrightarrow{\nabla C(p_{t})})$ |
| Geodesic interpolation | $p(t) = p_{0} + t\:\overrightarrow{p_{0}p_{1}}$                              | $p(t) = \mbox{ Exp}_{p_{0}}(t\:\overrightarrow{p_{0}p_{1}})$                               |
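
As a hedged illustration (added here, not part of the original text), the right-hand column of Table 1 translates almost literally into code once the manifold-specific Exp and Log maps are available, for instance the sphere maps sketched in Sect. 2.3; all names below are illustrative.

```python
import numpy as np

def riemannian_distance(log, p, q):
    """dist(p, q) = || Log_p(q) ||_p (here with the Euclidean norm of the chart)."""
    return np.linalg.norm(log(p, q))

def geodesic_interpolation(exp, log, p0, p1, t):
    """p(t) = Exp_{p0}( t * Log_{p0}(p1) )  -- last row of Table 1."""
    return exp(p0, t * log(p0, p1))

def riemannian_gradient_step(exp, p, grad_C, step):
    """p_{t+eps} = Exp_{p_t}( -eps * grad C(p_t) ), grad_C being a tangent vector at p."""
    return exp(p, -step * grad_C)

# Example with the sphere maps of the earlier sketch (Sect. 2.3): the geodesic
# midpoint between a pole and an equator point lies at 45 degrees of latitude.
p0, p1 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
mid = geodesic_interpolation(sphere_exp, sphere_log, p0, p1, 0.5)
assert np.allclose(mid, np.array([np.sqrt(0.5), 0.0, np.sqrt(0.5)]))
```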


2.4 Example of Metrics on Covariance matrices


Let us take an example with positive definite symmetric matrices, called tensors in medical image analysis. They are used for instance to encode the covariance matrix of the Brownian motion (diffusion) of water in Diffusion Tensor Imaging (DTI) [8, 40] or to encode the joint variability at different places (Green function) in shape analysis (see [29–31] and Sect. 4). They are also widely used in image analysis to guide the segmentation, grouping and motion analysis [16, 47, 73, 74].

The main problem is that the tensor space is a manifold which is not a vector space with the usual additive structure. Indeed, the positive definiteness constraint delimits a convex half-cone in the vector space of symmetric matrices. Thus, convex operations (like the mean) are stable in this space, but problems arise with more complex operations. For instance, when smoothing fields of tensors with gradient descent, there is inevitably a point in the image where the time step is not small enough, and this results in negative eigenvalues.

To answer that problem, we proposed in [58] to endow the space of tensors with a Riemannian metric invariant under any change of the underlying space coordinates, i.e. invariant under the action of affine transformations on covariance matrices. A few mathematical developments showed that the Exp, Log and distance maps are given by quite simple formulas involving the matrix exponential exp and logarithm log:



$$\displaystyle\begin{array}{rcl} \mbox{ Exp}_{\varSigma }(W) =\varSigma ^{1/2}\exp \left (\varSigma ^{-1/2}W\varSigma ^{-1/2}\right )\varSigma ^{1/2}& & {}\\ \mbox{ Log}_{\varSigma }(\varLambda ) =\varSigma ^{1/2}\log \left (\varSigma ^{-1/2}\varLambda \varSigma ^{-1/2}\right )\varSigma ^{1/2}& & {}\\ \:\mbox{ dist}^{2}(\varSigma,\varLambda ) = \mbox{ Tr}\left (\log (\varSigma ^{-1/2}\varLambda \varSigma ^{-1/2})^{2}\right )& & {}\\ \end{array}$$
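
As an added numerical illustration (not from the original text), these three formulas can be sketched with NumPy, using an eigendecomposition to apply scalar functions (square root, exp, log) to symmetric matrices; the helper and function names are illustrative.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def spd_exp(Sigma, W):
    """Affine-invariant Exp_Sigma(W) = Sigma^{1/2} exp(Sigma^{-1/2} W Sigma^{-1/2}) Sigma^{1/2}."""
    s_half = sym_fun(Sigma, np.sqrt)
    s_ihalf = sym_fun(Sigma, lambda w: 1.0 / np.sqrt(w))
    return s_half @ sym_fun(s_ihalf @ W @ s_ihalf, np.exp) @ s_half

def spd_log(Sigma, Lam):
    """Affine-invariant Log_Sigma(Lambda) = Sigma^{1/2} log(Sigma^{-1/2} Lambda Sigma^{-1/2}) Sigma^{1/2}."""
    s_half = sym_fun(Sigma, np.sqrt)
    s_ihalf = sym_fun(Sigma, lambda w: 1.0 / np.sqrt(w))
    return s_half @ sym_fun(s_ihalf @ Lam @ s_ihalf, np.log) @ s_half

def spd_dist(Sigma, Lam):
    """dist(Sigma, Lambda)^2 = Tr( log(Sigma^{-1/2} Lambda Sigma^{-1/2})^2 )."""
    s_ihalf = sym_fun(Sigma, lambda w: 1.0 / np.sqrt(w))
    L = sym_fun(s_ihalf @ Lam @ s_ihalf, np.log)
    return np.sqrt(np.trace(L @ L))

# Sanity check: Exp_Sigma and Log_Sigma are mutually inverse on the tensor cone.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Lam = np.array([[1.0, 0.2], [0.2, 3.0]])
assert np.allclose(spd_exp(Sigma, spd_log(Sigma, Lam)), Lam)
```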

This metric leads to a very regular Hadamard manifold structure, a kind of hyperbolic space without cut locus, which simplifies the computations. Tensors with null and infinite eigenvalues are both at an infinite distance from any positive definite symmetric matrix: the cone of positive definite symmetric matrices is changed into a space of "constant" (homogeneous) non-scalar curvature without boundaries. Moreover, there is one and only one geodesic joining any two tensors, the mean of a set of tensors is uniquely defined, and we can even define globally consistent orthonormal coordinate systems of tangent spaces. Thus, the structure we obtain is very close to that of a vector space, except that the space is curved.

This affine-invariant Riemannian metric derives from affine-invariant connections on homogeneous spaces [54]. It was introduced in statistics to model the geometry of the multivariate normal family (the Fisher information metric) [18, 19, 64] and simultaneously by several teams in medical image analysis to deal with DTI [9, 33, 42, 52, 58]. In [58], we showed that this metric can be used not only to compute distances between tensors, but also as the basis of a complete computational framework on manifold-valued images, as will be detailed in Sect. 3.

By trying to put a Lie group structure on the space of tensors, Vincent Arsigny observed later that the matrix exponential was a diffeomorphism from the space of symmetric matrices to the tensor space. Thus, one can seamlessly transport all the operations defined in the vector space of symmetric matrices to the tensor space [4, 5]. This gives a commutative Lie group structure to the tensors, and the Euclidean metric on symmetric matrices is transformed into a bi-invariant Riemannian metric on the tensor manifold. As geodesics are straight lines in the space of symmetric matrices, the expression of the Exp, Log and distance maps for the Log-Euclidean metric is easily determined:



$$\displaystyle\begin{array}{rcl} \mbox{ Exp}_{\varSigma }(W) =\exp (\log (\varSigma ) + \partial _{W}\log (\varSigma ))& & {}\\ \mbox{ Log}_{\varSigma }(\varLambda ) = D\exp (\log (\varSigma ))\left (\log (\varLambda ) -\log (\varSigma )\right )& & {}\\ \:\mbox{ dist}_{LE}^{2}(\varSigma _{ 1},\varSigma _{2}) = \mbox{ Tr}\left ((\log (\varSigma _{1}) -\log (\varSigma _{2}))^{2}\right )& & {}\\ \end{array}$$

These formulas look more complex than for the affine-invariant metric because they involve the differential of the matrix exponential and logarithm in order to transport tangent vectors from one space to another [59]. However, they are in fact nothing but the transport of the addition and subtraction through the exponential of symmetric matrices. In practice, the log-Euclidean framework consists in taking the logarithm of the tensor data, computing as usual in the Euclidean space of symmetric matrices, and coming back at the end to the tensor space using the exponential [3, 5].
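
As an added illustration (not part of the original text), this recipe takes only a few lines of code; the symmetric-matrix helper duplicates the one of the previous sketch so that the example is self-contained, and all names are illustrative.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def log_euclidean_mean(tensors):
    """Log-Euclidean mean: average the matrix logarithms, then map back with exp."""
    logs = [sym_fun(S, np.log) for S in tensors]
    return sym_fun(np.mean(logs, axis=0), np.exp)

def log_euclidean_dist(S1, S2):
    """dist_LE(S1, S2) = || log(S1) - log(S2) ||_F."""
    D = sym_fun(S1, np.log) - sym_fun(S2, np.log)
    return np.sqrt(np.trace(D @ D))

# The log-Euclidean mean of diag(1, 4) and diag(4, 1) is diag(2, 2): eigenvalues
# are averaged geometrically, so the determinant does not swell as with the
# Euclidean mean diag(2.5, 2.5).
tensors = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
assert np.allclose(log_euclidean_mean(tensors), np.diag([2.0, 2.0]))
```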

From a theoretical point of view, geodesics through the identity are the same for both the log-Euclidean and the affine-invariant metrics, but this is no longer true in general at other points of the tensor manifold [4]. A careful comparison of both metrics in practical applications [3, 5] showed that there were very few differences in the results (of the order of 1%) on real DTI images, but that the log-Euclidean computations were 4 to 10 times faster. For other types of applications, like adaptive re-meshing [53], the anisotropy of the tensors can be much larger, which may lead to larger differences. In any case, initializing the iterative optimizations of affine-invariant algorithms with the log-Euclidean result drastically speeds up the convergence. Important application examples of this tensor computing framework were provided in [27, 28] with a statistically grounded estimation and regularization of DTI images. The white matter tractography that was made possible by these methods in clinical DTI images with very poor signal-to-noise ratios could lead to new clinical indications of DTI, for instance in the spinal cord [23].



3 Statistical Computing on Manifolds


The previous section showed how to derive the atomic Exp and Log maps from a Riemannian metric. We now summarize how one generalizes on this basis many important statistical notions, like the mean, covariance and Principal Component Analysis (PCA), as well as many image processing algorithms like interpolation, diffusion and restoration of missing data (extrapolation). For details about the theory of statistics on Riemannian manifolds in itself, we refer the reader to [56, 57] and references therein. Manifold-valued image processing is detailed in [58] with the example of tensors.


3.1 First statistical moment: the mean


The Riemannian metric induces an infinitesimal volume element on each tangent space, and thus a reference measure $d\mathcal{M}(p)$ on the manifold that can be used to measure random events on the manifold and to define the probability density function (the function $\rho$ such that $dP(p) =\rho (p)\,d\mathcal{M}(p)$, if it exists). It is worth noticing that the induced measure $d\mathcal{M}$ represents the notion of uniformity according to the chosen Riemannian metric. This automatic derivation of the uniform measure from the metric gives a rather elegant solution to the Bertrand paradox for geometric probabilities [38, 62]. With the probability measure of a random element, we can integrate functions from the manifold to any vector space, thus defining the expected value of this function. However, we generally cannot integrate manifold-valued functions. Thus, one cannot define the mean or expected "value" of a random manifold element that way.

One solution is to rely on a distance-based variational formulation: the Fréchet (resp. Karcher) expected features minimize globally (resp. locally) the variance:



$$\displaystyle{\sigma ^{2}(q) =\int \: \mbox{ dist}(p,q)^{2}\:dP(p) = \frac{1} {n}\sum _{i=1}^{n}\:\mbox{ dist}(p_{ i},q)^{2},}$$
written respectively in the continuous and discrete forms. One can generalize the variance to a dispersion at order α by replacing the $L^{2}$ norm with an α-norm: $\sigma _{\alpha }(q) = \left(\int \:\mbox{ dist}(p,q)^{\alpha }\,dP(p)\right)^{1/\alpha }$. The minimizers are called the central Karcher values at order α. For instance, the median is obtained for α = 1 and the modes for α = 0, exactly like in the vector case. It is worth noticing that the median and the modes are not unique in general already in a vector space, and that even the mean may not exist (e.g. for heavy-tailed distributions). In Riemannian manifolds, the existence and uniqueness of all central Karcher values is generally not ensured, as they are obtained through a minimization procedure. However, for a finite number of discrete samples at a finite distance from each other, which is the practical case in statistics, a mean value always exists and is unique as soon as the distribution is sufficiently peaked [37, 39].

Local minima may be characterized as particular critical points of the cost function: at Karcher mean points, the gradient of the variance should be null. However, the distance is continuous but not differentiable at cut locus points, where several minimizing geodesics meet. For instance, the distance from a point of the sphere to its antipodal point is maximal, but it decreases continuously everywhere around it. One can show [56, 57] that the variance is differentiable at all points where the cut locus has null measure, with gradient $\overrightarrow{\nabla \sigma ^{2}}(q) = -2\int \overrightarrow{qp}\:dP(p) = \frac{-2} {n} \sum _{i=1}^{n}\overrightarrow{qp_{i}}$, respectively in the continuous (probabilistic) and discrete (statistical) formulations. In practice, this gradient is well defined for all distributions that have a pdf, since the cut locus has null measure. For discrete samples, the gradient exists if no sample lies exactly on the cut locus of the current test point. Thus, we end up with the implicit characterization of Karcher mean points as exponential barycenters, which was presented in Table 1.

To practically compute the mean value, we proposed in [60] for rigid-body transformations and in [56, 57] for the general Riemannian case to use a Gauss-Newton gradient descent algorithm. It essentially alternates the computation of the barycenter in the exponential chart centered at the current estimate of the mean value, and a re-centering step of the chart at the point of the manifold that corresponds to the computed barycenter (geodesic marching step). This gives the Newton iteration $\bar{p}^{t+1} = \mbox{ Exp}_{\bar{p}^{t}}\left ( \frac{1} {n}\sum _{i=1}^{n}\overrightarrow{\bar{p}^{t}p_{i}}\right )$. One can actually show that its convergence is locally quadratic towards non-degenerate critical points [22, 43, 55].
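
As an added illustration (not from the original text), this geodesic marching scheme only requires the Exp and Log maps of the manifold; a minimal sketch, reusing for instance the sphere maps of Sect. 2.3, could be:

```python
import numpy as np

def karcher_mean(exp, log, points, init=None, tol=1e-10, max_iter=100):
    """Gauss-Newton iteration  p^{t+1} = Exp_{p^t}( (1/n) sum_i Log_{p^t}(p_i) ).
    `exp` and `log` are the manifold-specific Exp/Log maps."""
    p = points[0] if init is None else init
    for _ in range(max_iter):
        # barycenter of the data in the exponential chart centered at p
        mean_tangent = sum(log(p, q) for q in points) / len(points)
        if np.linalg.norm(mean_tangent) < tol:
            break                      # exponential barycenter condition of Table 1
        p = exp(p, mean_tangent)       # re-centering (geodesic marching) step
    return p

# With two points, the Karcher mean is the geodesic midpoint.
p, q = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
m = karcher_mean(sphere_exp, sphere_log, [p, q])
assert np.allclose(m, np.array([1.0, 1.0, 0.0]) / np.sqrt(2))
```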


3.2 Covariance matrix and Principal Geodesic Analysis


Once the mean point is determined, using the exponential chart at the mean point is particularly interesting, as the random feature is represented by a random vector with null mean in a star-shaped domain. With this representation, there is no difficulty in defining the covariance matrix:



$$\displaystyle{\varSigma =\int \overrightarrow{\bar{p}q}.\overrightarrow{\bar{p}q}^{\text{T}}\:dP(q) = \frac{1} {n}\sum _{i=1}^{n}\overrightarrow{\bar{p}q_{ i}}.\overrightarrow{\bar{p}q_{i}}^{\text{T}}}$$
and potentially higher order moments. This covariance matrix can then be used to define the Mahalanobis distance between a random and a deterministic feature: $\mu _{(\bar{p},\varSigma )}(q) =\overrightarrow{ \bar{p}q}^{\text{T}}\varSigma ^{-1}\overrightarrow{\bar{p}q}$. Interestingly, the expected Mahalanobis distance of a random element is independent of the distribution and is equal to the dimension of the manifold, as in the vector case. This statistical distance can be used as a basis to generalize some statistical tests such as the Mahalanobis $D^{2}$ test [57].
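
A hedged sketch (added illustration, not from the original text) of these two definitions in their discrete form, with the data expressed in the exponential chart at the mean; the covariance is assumed non-degenerate and all names are illustrative.

```python
import numpy as np

def tangent_covariance(log, mean, points):
    """Empirical covariance in the exponential chart at the mean:
    Sigma = (1/n) sum_i Log_mean(p_i) Log_mean(p_i)^T."""
    vecs = np.stack([np.ravel(log(mean, q)) for q in points])
    return vecs.T @ vecs / len(points)

def mahalanobis(log, mean, cov, q):
    """mu_{(mean, Sigma)}(q) = Log_mean(q)^T Sigma^{-1} Log_mean(q).
    Assumes cov is expressed in a basis of the tangent space and is invertible."""
    v = np.ravel(log(mean, q))
    return float(v @ np.linalg.solve(cov, v))
```

In practice, the chart vectors should be expressed in an orthonormal basis of the tangent space at the mean so that the scalar products used above coincide with the Riemannian ones.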

To analyze the results of a set of measurements in a Euclidean space, one often performs a principal component analysis (PCA). A generalization to Riemannian manifolds called Principal Geodesic Analysis (PGA) was proposed in [32] to analyze shapes based on the medial axis representation (M-reps). The basic idea is to find a low-dimensional sub-manifold generated by some geodesic subspaces that best explains the measurements (i.e. such that the squared Riemannian distance from the measurements to that sub-manifold is minimized). Another point of view is to assume that the measurements are generated by a low-dimensional Gaussian model. Estimating the model parameters amounts to a covariance analysis in order to find the k-dimensional subspace that best explains the variance. In a Euclidean space, these two definitions coincide thanks to Pythagoras' theorem. However, in the Riemannian setting, geodesic subspaces are generally not orthogonal due to the curvature. Thus, the two notions differ: while the Riemannian covariance analysis (tangent PCA) can easily be performed in the tangent space of the mean, finding Riemannian sub-manifolds turns out to be a very difficult problem. As a matter of fact, the solution retained in [32] was finally to rely on the covariance analysis.
