(e.g. a time series of $N$ images with $D$ pixels each) lies on a manifold $\mathcal{M}$ which can be described by much fewer dimensions. Manifold learning techniques map each point $x_i$ in the high-dimensional dataset $X$ to a low-dimensional point $y_i \in \mathbb{R}^d$, where $d \ll D$, while preserving the local neighbourhood structure of the high-dimensional points on $\mathcal{M}$. The low-dimensional points form a new low-dimensional dataset $Y$.
LLE models the locality of the high-dimensional space by first constructing a graph of the data points and then calculating the weights $w_{ij}$ which reconstruct each high-dimensional point $x_i$ as a linear combination of its $k$ nearest neighbours. The optimal contribution of each point $j$ to the reconstruction of point $i$ is given by the weights $w_{ij}$, which can be calculated as described in [11]. The embedding preserving this structure is given by the $Y$ minimising the cost function

$$\phi(Y) = \sum_i \Big\| y_i - \sum_{j \in \mathcal{N}(i)} w_{ij}\, y_j \Big\|^2 = \mathrm{tr}\!\left(Y^{\top} M Y\right). \qquad (1)$$

Here, $\mathcal{N}(i)$ is the neighbourhood of each point $i$ and $\mathrm{tr}(\cdot)$ is the trace operator. This optimisation problem can be solved by calculating the eigendecomposition of the centred reconstruction weight matrix $M = (I - W)^{\top}(I - W)$. The embedding $Y$ is given by the eigenvectors corresponding to the second smallest to the $(d+1)$-smallest eigenvalues of $M$ [11].
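For illustration, the LLE steps above can be written down directly in NumPy. This is a minimal sketch of the standard algorithm of [11], not the implementation used in this work; the neighbourhood size $k$ and the regularisation of the local Gram matrix are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle_embed(X, k=12, d=2, reg=1e-3):
    """Minimal LLE sketch: reconstruction weights, then eigendecomposition of M.

    X : (N, D) array of high-dimensional points.
    Returns Y : (N, d) low-dimensional embedding.
    """
    N = X.shape[0]
    dists = cdist(X, X)
    np.fill_diagonal(dists, np.inf)                 # exclude self from neighbours
    neighbours = np.argsort(dists, axis=1)[:, :k]   # k nearest neighbours of each point

    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbours[i]] - X[i]                 # neighbours centred on x_i
        C = Z @ Z.T                                 # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)          # regularise (illustrative choice)
        w = np.linalg.solve(C, np.ones(k))
        W[i, neighbours[i]] = w / w.sum()           # reconstruction weights sum to one

    M = (np.eye(N) - W).T @ (np.eye(N) - W)         # centred reconstruction weight matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:d + 1]                      # 2nd to (d+1)-smallest eigenvectors
```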
It is well known that when manifold learning is applied to time series of medical images under free breathing, the low-dimensional coordinates contain useful information about the motion. In particular, when reducing the data to a one-dimensional space ($d = 1$) the embedding strongly correlates with a respiratory navigator obtained by traditional means [14]. Baumgartner et al. [1] showed that embeddings in higher dimensions can give additional information on respiratory variation such as hysteresis or respiratory drift over time.
2.2 Manifold Alignment by Simultaneous Embedding
Medical image datasets which have the same source of motion often lie on similar low-dimensional manifolds. Within the context of this work the different datasets are free breathing images acquired from different views, i.e. different slice positions (MR) or different probe locations (US). Bringing the manifolds into a globally consistent coordinate system allows the matching of images at similar respiratory positions between the different views.
Unfortunately, the embeddings obtained using manifold learning techniques are arbitrary with respect to the sign of the eigenvectors (i.e. arbitrary flipping). In addition, the manifolds obtained from different datasets often feature small but significant differences in shape such as small rotations or scaling differences, which further complicate their alignment. In order to relate the respiratory positions of L different datasets (e.g. free breathing images obtained from L different views) in the low-dimensional space, first the underlying manifold embeddings must be aligned.
Our solution to the alignment problem is to formulate a joint cost function and then compute an embedding for all datasets simultaneously in one embedding step. This approach generalises to $L$ datasets the formulation proposed in [1] for two datasets. However, to the best of our knowledge, LLE has not previously been extended to embed an arbitrary number of datasets simultaneously. We extend the LLE cost function of Eq. (1) as follows:

$$\Phi\big(Y^{(1)}, \ldots, Y^{(L)}\big) = \sum_{l=1}^{L} \phi\big(Y^{(l)}\big) + \mu \sum_{m=1}^{L} \sum_{\substack{n=1 \\ n \neq m}}^{L} \sum_{i,j} U^{(mn)}_{ij} \big\| y^{(m)}_i - y^{(n)}_j \big\|^2, \qquad (2)$$

where $\phi(Y^{(l)})$ is given by Eq. (1). The second term of Eq. (2) is the cost function relating all datasets to each other. $U^{(mn)}$ is a similarity kernel relating the points of dataset $m$ to those of dataset $n$. Large values in $U^{(mn)}$ force corresponding points to be close together in the resulting embedding. $\mu$ is a weighting parameter which governs the influence of the inter-dataset terms. Eq. (2) can be rewritten in matrix form as $\Phi = \mathrm{tr}\!\left(V^{\top} H V\right)$, where $V$ is a matrix containing the concatenated embeddings, $V = \big[ Y^{(1)\top}, Y^{(2)\top}, \ldots, Y^{(L)\top} \big]^{\top}$, and

$$H = \begin{bmatrix} M^{(1)} + \mu D^{(1)} & -\mu U^{(12)} & \cdots & -\mu U^{(1L)} \\ -\mu U^{(21)} & M^{(2)} + \mu D^{(2)} & \cdots & -\mu U^{(2L)} \\ \vdots & \vdots & \ddots & \vdots \\ -\mu U^{(L1)} & -\mu U^{(L2)} & \cdots & M^{(L)} + \mu D^{(L)} \end{bmatrix}. \qquad (3)$$

Here, the diagonal degree matrices are given by $D^{(m)}_{ii} = \sum_{n \neq m} \sum_j U^{(mn)}_{ij}$. Under the scaling constraint $V^{\top} V = I$, the embeddings $V$ are given by the second smallest to the $(d+1)$-smallest eigenvectors of $H$.
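A minimal sketch of how the block matrix $H$ in Eq. (3) might be assembled and the joint embedding computed, assuming the per-dataset matrices $M^{(l)}$, the inter-dataset kernels $U^{(mn)}$ (with $U^{(nm)} = U^{(mn)\top}$ so that $H$ is symmetric) and the weighting $\mu$ are already available; all function and variable names are illustrative.

```python
import numpy as np

def joint_embedding(Ms, Us, mu=1.0, d=1):
    """Simultaneous embedding of L datasets by eigendecomposition of H (Eq. 3).

    Ms : list of L centred reconstruction weight matrices, Ms[l] is (N_l, N_l).
    Us : dict mapping (m, n), m != n, to the (N_m, N_n) kernel U^(mn).
    Returns a list of L embeddings Y^(l), each of shape (N_l, d).
    """
    L = len(Ms)
    sizes = [M.shape[0] for M in Ms]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    N = offsets[-1]

    H = np.zeros((N, N))
    for m in range(L):
        sl_m = slice(offsets[m], offsets[m + 1])
        deg = np.zeros(sizes[m])
        for n in range(L):
            if n == m:
                continue
            U = Us[(m, n)]
            deg += U.sum(axis=1)                     # D^(m)_ii = sum_{n!=m} sum_j U^(mn)_ij
            sl_n = slice(offsets[n], offsets[n + 1])
            H[sl_m, sl_n] = -mu * U                  # off-diagonal blocks
        H[sl_m, sl_m] = Ms[m] + mu * np.diag(deg)    # diagonal blocks

    eigvals, eigvecs = np.linalg.eigh(H)
    V = eigvecs[:, 1:d + 1]                          # 2nd to (d+1)-smallest eigenvectors
    return [V[offsets[l]:offsets[l + 1]] for l in range(L)]
```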
2.3 Similarity Without Correspondences
All previous work on simultaneous embeddings has either used known prior correspondences to define the similarity kernels [5], or has used similarity measures between the high-dimensional data to define them [1, 8]. The notable exception is [15], which we discussed in the introduction. For many applications, however, it is desirable to simultaneously embed medical image datasets which are not comparable in image space and for which prior correspondences are not easily obtainable. The problem is therefore to find a suitable inter-dataset similarity kernel which connects the datasets without relying on comparisons between them in the high-dimensional space.
In this paper we propose a novel, robust method for deriving an inter-dataset similarity kernel. Instead of directly comparing the data from different datasets in the high-dimensional space we first analyse the internal graph structure of each dataset and derive a characteristic feature vector for each node. In the next step we compare these feature vectors across datasets. This approach is an extension of the method proposed in [3] for graph matching.
Deriving a Characteristic Feature Vector for Each Node. For each dataset $X$ we form a fully connected, weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, connecting each node in $\mathcal{V}$ to every other node with edges $\mathcal{E}$. The edge weights connecting node $v_i$ to $v_j$ are given by a Gaussian kernel

$$A_{ij} = \exp\!\left( -\frac{\| x_i - x_j \|^2}{2\sigma^2} \right),$$

where $x_i, x_j$ are the high-dimensional data points of $X$, and $\sigma$ is a shape parameter. Note that $\mathcal{G}$ is formed by intra-dataset comparisons only.
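As an illustration, the intra-dataset graph construction might look as follows; the function name and the heuristic for choosing $\sigma$ are assumptions made for this sketch, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_affinity(X, sigma=None):
    """Fully connected intra-dataset graph with Gaussian-kernel edge weights.

    X : (N, D) array of high-dimensional points of one dataset.
    sigma : shape parameter; if None, fall back to the median pairwise
            distance (an illustrative heuristic).
    """
    sq_dists = cdist(X, X, metric="sqeuclidean")
    if sigma is None:
        sigma = np.median(np.sqrt(sq_dists))
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                         # no self-loops
    return A
```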
As noted in [3], the steady-state distribution of a (lazy) random walk, i.e. the probability of ending up at a particular node after a large number of time steps, can provide information about each node’s location in the graph. In a lazy random walk the walker remains at its current node with probability 0.5 at each time step and walks to a random neighbour, with probabilities given by the edge weights, the other half of the time. The state of the random walk in a weighted graph after $t$ time steps is given by $p^{(t)} = p^{(0)} P^{\,t}$ with transition matrix $P = \frac{1}{2}\left( I + B^{-1} A \right)$, where $B$ is the diagonal degree matrix given by $B_{ii} = \sum_j A_{ij}$. The steady-state distribution $\pi^{\infty}$ is stationary and is given by $\pi^{\infty}_i = B_{ii} \big/ \sum_j B_{jj}$.
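Since the graph is undirected, the steady state can be obtained directly from the node degrees rather than by iterating the walk. The sketch below illustrates this, together with a check that $\pi^{\infty}$ is indeed a fixed point of the lazy transition matrix $P = \frac{1}{2}(I + B^{-1}A)$; function and variable names are illustrative.

```python
import numpy as np

def steady_state(A):
    """Steady-state distribution of a lazy random walk on a weighted graph.

    A : (N, N) symmetric matrix of non-negative edge weights.
    For an undirected graph the stationary distribution is proportional to
    the node degrees, i.e. pi_i = B_ii / sum_j B_jj.
    """
    degrees = A.sum(axis=1)                          # B_ii = sum_j A_ij
    return degrees / degrees.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((5, 5))
    A = 0.5 * (A + A.T)                              # symmetric weights
    np.fill_diagonal(A, 0.0)
    pi = steady_state(A)
    P = 0.5 * (np.eye(5) + A / A.sum(axis=1, keepdims=True))
    assert np.allclose(pi @ P, pi)                   # pi is stationary under P
```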
In [3], $\pi^{\infty}$ was used to perform graph matching. In our experiments, $\pi^{\infty}$ alone was not sufficiently robust to align our datasets. Therefore, to increase the robustness of this measure we propose to additionally analyse the neighbourhood of each node. For each node $v_i$ we select its $r$ nearest neighbours $\mathcal{N}_r(i)$ and evaluate their $\pi^{\infty}$ values as well, forming an average over the neighbourhood:

$$\hat{\pi}_i = \frac{1}{r + 1} \Big( \pi^{\infty}_i + \sum_{j \in \mathcal{N}_r(i)} \pi^{\infty}_j \Big). \qquad (4)$$
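A sketch of the neighbourhood-averaged descriptor in the spirit of Eq. (4); whether the node’s own value enters the average and the choice of $r$ are assumptions made for illustration.

```python
import numpy as np

def node_descriptors(A, pi, r=5):
    """Neighbourhood-averaged steady-state descriptor for each node.

    A  : (N, N) intra-dataset edge weights (larger weight = closer neighbour).
    pi : (N,) steady-state distribution of the lazy random walk.
    r  : number of nearest neighbours to average over (illustrative value).
    """
    N = A.shape[0]
    desc = np.empty(N)
    for i in range(N):
        nbrs = np.argsort(A[i])[::-1][:r]            # r largest edge weights from node i
        desc[i] = (pi[i] + pi[nbrs].sum()) / (r + 1) # average over node i and its neighbours
    return desc
```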