PET Reconstruction with Sparse Image Representation and Anatomical Priors

between two points i and j in the image domain, defined by

$\begin{aligned} D_{i,j} =&\ \sqrt{d_{f}^2 + \left( \frac{d_{s}}{S}\right) ^2 m^2}, \\ d_{s}=&\ \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}, \nonumber \\ d_{f} =&\ \sqrt{(f_i - f_j)^2}, \nonumber \end{aligned}$

(1)

where $d_{s}$ is the spatial Euclidean distance and $d_{f}$ is the image intensity similarity. The two measures are combined into a single distance using $S= \sqrt[3]{N/K}$ (with N the number of voxels and K the number of supervoxels), the mean spatial distance within a supervoxel, as a normalisation factor, and m as a weight between the intensity similarity and spatial proximity. Note that the intensity similarity $d_{f}$ can be extended to include additional dimensions when a group of images or multi-channel information is available for clustering.
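As an illustration of Eq. (1), the combined distance for scalar intensities could be computed as in the following minimal sketch. The function name `slic_distance` and its argument names are hypothetical and not taken from the original work; only the Python standard library is assumed.

```python
import math

def slic_distance(p_i, p_j, f_i, f_j, n_voxels, n_supervoxels, m):
    """Combined distance D_{i,j} of Eq. (1) for scalar intensities (illustrative only)."""
    # S = cube root of N/K: mean spatial extent of a supervoxel, used for normalisation.
    S = (n_voxels / n_supervoxels) ** (1.0 / 3.0)
    d_s = math.dist(p_i, p_j)   # spatial Euclidean distance between (x, y, z) positions
    d_f = abs(f_i - f_j)        # intensity term, sqrt((f_i - f_j)^2)
    return math.sqrt(d_f ** 2 + (d_s / S) ** 2 * m ** 2)

# Hypothetical values: N/K = 8, so S = 2 voxels.
D = slic_distance((0, 0, 0), (2, 0, 0), f_i=10.0, f_j=6.0,
                  n_voxels=1000, n_supervoxels=125, m=3.0)
```

A larger m makes the spatial term dominate and so favours compact supervoxels, while a smaller m lets the supervoxels follow intensity boundaries more closely.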

SLIC does not explicitly enforce connectivity. In this work, connected-component labelling [16] was therefore performed after SLIC supervoxel clustering, so that originally disconnected groups of voxels within the same supervoxel were assigned to new supervoxels in which all voxels are spatially connected. In addition, extremely small supervoxels generated by SLIC due to image noise were treated as “orphans” and merged into the nearest supervoxels.
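The connectivity step can be sketched with a simple flood fill. This is only an illustration under stated assumptions: face-adjacency is assumed as the connectivity rule, the label map is stored as a dictionary from voxel coordinates to supervoxel ids, and the function name `split_disconnected` is hypothetical. The merging of small "orphan" supervoxels is not shown.

```python
from collections import deque

def split_disconnected(labels):
    """Relabel so that every supervoxel is one face-connected component.

    `labels` maps voxel coordinates (tuples of ints) to supervoxel ids.
    """
    new_labels = {}
    next_id = 0
    for start in labels:
        if start in new_labels:
            continue
        # Breadth-first flood fill over face-adjacent voxels sharing the old label.
        new_labels[start] = next_id
        queue = deque([start])
        while queue:
            voxel = queue.popleft()
            for axis in range(len(voxel)):
                for step in (-1, 1):
                    nb = voxel[:axis] + (voxel[axis] + step,) + voxel[axis + 1:]
                    if (nb in labels and nb not in new_labels
                            and labels[nb] == labels[start]):
                        new_labels[nb] = next_id
                        queue.append(nb)
        next_id += 1
    return new_labels

# Supervoxel 0 is split in two by supervoxel 1, so it becomes two supervoxels.
example = {(0, 0): 0, (0, 1): 1, (0, 2): 0}
relabelled = split_disconnected(example)
```

Because the neighbours are generated per axis, the same function works unchanged for 3-D coordinates.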

The over-segmentation generated by the supervoxel clustering leads to a sparse image representation when the number of supervoxels K is much smaller than the number of voxels N. Let $\mathbf {A}$ denote the representation matrix in the image domain. $\mathbf {A}$ is binary and sparse, and indicates whether a voxel i belongs to a given supervoxel j, that is

$\begin{aligned} A_{i,j}= \begin{cases} 1, & i \in j, \\ 0, & i \notin j. \end{cases} \end{aligned}$

Then, from the supervoxel intensity values $\varvec{s}$ , the original image $\varvec{f}$ on the voxel grid can be established by $\varvec{f} = \mathbf {A}\varvec{s}$ where $\varvec{f}\in \mathbb {R}^{N \times 1}$ and $\varvec{s}\in \mathbb {R}^{K \times 1}$ . In PET reconstruction, using the image representation $\varvec{f} = \mathbf {A}\varvec{s}$ with a given $\mathbf {A}$ (not a square matrix) transfers the reconstruction of the original image $\varvec{f}$ to the estimation of $\varvec{s}$ , with fewer unknowns when $K \ll N$ , without losing the image details preserved in $\mathbf {A}$ . The joint determination of $\mathbf {A}$ from both the anatomical prior images and the PET data will be discussed in Sect. 2.3. Notably, in contrast to embedding the anatomical information within a Bayesian reconstruction framework based solely on image intensity, such as joint entropy or mutual information, the proposed method avoids the potential bias by using the image geometry instead.
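Because each row of $\mathbf {A}$ contains exactly one non-zero entry, the products $\mathbf {A}\varvec{s}$ and $\mathbf {A}^{T}\varvec{v}$ can be evaluated from a label vector without storing $\mathbf {A}$ itself. The sketch below illustrates this; the helper names `expand` and `collapse` and the toy labels are hypothetical.

```python
def expand(labels, s):
    """f = A s: voxel i takes the value of its supervoxel labels[i]."""
    return [s[j] for j in labels]

def collapse(labels, v, n_supervoxels):
    """A^T v: sum the voxel values that fall inside each supervoxel."""
    out = [0.0] * n_supervoxels
    for j, value in zip(labels, v):
        out[j] += value
    return out

# Hypothetical assignment of N = 6 voxels to K = 3 supervoxels.
labels = [0, 0, 1, 2, 2, 2]
s = [4.0, 1.5, 3.0]

f = expand(labels, s)                                  # piecewise-constant image
sizes = collapse(labels, [1.0] * len(labels), 3)       # A^T 1 = voxels per supervoxel
```

Here `f` has six entries but is fully determined by the three values in `s`, which is the reduction in unknowns described above.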

2.2 PET Reconstruction with Sparse Image Representation

The sparse image representation can be directly integrated into the forward model of PET reconstruction

$\begin{aligned} \bar{\varvec{g}} = \mathbf {P}\mathbf {A}\varvec{s} + \varvec{r}, \end{aligned}$

(2)

where $\bar{\varvec{g}}$ is the expected projection data, $\mathbf {P}$ is the system information matrix of the detection probabilities, and $\varvec{r}$ is the expected scatter and random events.

Within the maximum likelihood (ML) reconstruction framework, the estimate of the image $\varvec{f}$ (here $\varvec{f} = \mathbf {A}\varvec{s}$ ) is found by maximising the Poisson log likelihood [17]

$\begin{aligned} L(\varvec{g}| \bar{\varvec{g}})=\sum _i \left( g_i\log \bar{g}_i-\bar{g}_i \right) \end{aligned}$

(3)

with observed projection data $\varvec{g}$

$\begin{aligned} \hat{\varvec{s}} = \arg \max _{\varvec{s}\ge 0}L(\varvec{g}| \mathbf {A}\varvec{s}). \end{aligned}$

(4)

The iterative update to find the solution can be directly derived by the expectation-maximisation (EM) algorithm [8]

$\begin{aligned} \varvec{s}^{n+1}=\frac{\varvec{s}^{n}}{\mathbf {A}^{T}\mathbf {P}^{T}\mathbf {1}}\mathbf {A}^{T}\mathbf {P}^{T}\frac{\varvec{g}}{ \mathbf {P}\mathbf {A}\varvec{s}^n + \varvec{r}}, \end{aligned}$

(5)

where T denotes the matrix transpose and n denotes the iteration number.
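A minimal sketch of the update in Eq. (5) on a toy problem is given below, using dense Python lists for clarity. The system matrix, labels, and background values are hypothetical, the helper names are not from the original work, and a practical implementation would use projector and back-projector operators rather than an explicit matrix.

```python
def matvec(M, x):
    """M x for a dense matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def rmatvec(M, y):
    """M^T y without forming the transpose."""
    return [sum(row[j] * y_i for row, y_i in zip(M, y)) for j in range(len(M[0]))]

def collapse(labels, v, K):
    """A^T v: sum voxel values within each supervoxel."""
    out = [0.0] * K
    for j, value in zip(labels, v):
        out[j] += value
    return out

def em_update(s, labels, P, g, r):
    """One iteration of Eq. (5); labels[i] is the supervoxel of voxel i."""
    f = [s[j] for j in labels]                                    # A s
    g_bar = [p + r_i for p, r_i in zip(matvec(P, f), r)]          # P A s + r
    ratio = [g_i / gb_i for g_i, gb_i in zip(g, g_bar)]           # g / (P A s + r)
    numer = collapse(labels, rmatvec(P, ratio), len(s))           # A^T P^T ratio
    sens = collapse(labels, rmatvec(P, [1.0] * len(P)), len(s))   # A^T P^T 1
    return [s_j * n_j / d_j for s_j, n_j, d_j in zip(s, numer, sens)]

# Hypothetical toy problem: 3 detector bins, 4 voxels, 2 supervoxels.
P = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0, 0.0],
     [0.5, 0.0, 0.5, 1.0]]
labels = [0, 0, 1, 1]
r = [0.1, 0.1, 0.1]
s_true = [2.0, 5.0]
# Noiseless data generated from the forward model of Eq. (2).
g = [p + r_i for p, r_i in zip(matvec(P, [s_true[j] for j in labels]), r)]

s = [1.0, 1.0]                      # strictly positive initial estimate
for _ in range(100):
    s = em_update(s, labels, P, g, r)
```

With noiseless, consistent data the true supervoxel values are a fixed point of the update, since the ratio term is then equal to one in every bin. The multiplicative form also keeps a positive initial estimate non-negative, in line with the constraint in Eq. (4).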

With prior images, it is possible to have the sparse image representation matrix $\mathbf {A}$ defined on a denser voxel grid that does not match the PET imaging system characterised by $\mathbf {P}$ . Instead of downsampling the prior images to match the PET imaging resolution, in this work a resampling operator is introduced into the forward model to maintain the image at higher spatial resolution to avoid the loss of the edges and other image details. Let $\mathbf {R}$ denote the matrix form of the resampling operator in the image domain, then the forward model becomes $\bar{\varvec{g}} = \mathbf {P}\mathbf {R}\mathbf {A}\varvec{s} + \varvec{r}$ , and the iterative update becomes

$\begin{aligned} \varvec{s}^{n+1}=\frac{\varvec{s}^{n}}{\mathbf {A}^{T}\mathbf {R}^{T}\mathbf {P}^{T}\mathbf {1}}\mathbf {A}^{T}\mathbf {R}^{T}\mathbf {P}^{T}\frac{\varvec{g}}{ \mathbf {P}\mathbf {R}\mathbf {A}\varvec{s}^n + \varvec{r}}. \end{aligned}$

(6)
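Equation (6) requires both $\mathbf {R}$ and its transpose $\mathbf {R}^{T}$, and the two should form a matched pair. As a minimal sketch, assume a hypothetical one-dimensional $\mathbf {R}$ that averages consecutive blocks of fine-grid voxels; the actual resampling operator used in this work is not specified here. The inner-product check at the end is a common way to verify that a transpose implementation matches its forward operator.

```python
def resample(f_fine, factor):
    """R f: average consecutive blocks of `factor` fine voxels (length must divide evenly)."""
    return [sum(f_fine[k:k + factor]) / factor
            for k in range(0, len(f_fine), factor)]

def resample_T(v_coarse, factor):
    """R^T v: spread each coarse value over its block with the same weights as R."""
    return [v / factor for v in v_coarse for _ in range(factor)]

f_fine = [1.0, 3.0, 2.0, 2.0, 0.0, 4.0]     # image on the dense prior grid
v_coarse = [1.0, 2.0, 3.0]                  # arbitrary vector on the PET grid

f_coarse = resample(f_fine, 2)
lhs = sum(a * b for a, b in zip(f_coarse, v_coarse))                   # <R f, v>
rhs = sum(a * b for a, b in zip(f_fine, resample_T(v_coarse, 2)))      # <f, R^T v>
```

Since $\mathbf {P}\mathbf {R}$ acts as a single effective system matrix, Eq. (6) has the same structure as Eq. (5) with $\mathbf {P}$ replaced by $\mathbf {P}\mathbf {R}$.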

So far, the use of the sparse image representation in reconstructing PET images has been described. For dynamic PET data, directly reconstructing the parametric images from the raw projection data can achieve improved accuracy and robustness [9, 18, 19]. The sparse image representation is directly applicable to dynamic PET data as it is a linear operation in the image domain. For dynamic PET data, the sparse representation matrix $\mathbf {A}$ is the same for all time frames, and in $\varvec{f} = \mathbf {A}\varvec{s}$ , $\varvec{f}$ and $\varvec{s}$ are expanded so that $\varvec{f}\in \mathbb {R}^{N \times nt}$ and $\varvec{s}\in \mathbb {R}^{K \times nt}$ where nt is the number of time frames. Using a linearised kinetic model [20], $\varvec{s}$ can be described as $\varvec{s}=\varvec{\theta }\mathbf {B}$ , where $\mathbf {B}\in \mathbb {R}^{nk \times nt}$ are the temporal basis functions and $\varvec{\theta }\in \mathbb {R}^{K \times nk}$ are the kinetic parameters for all supervoxels, with nk being the number of kinetic parameters for each supervoxel. The direct estimation of the kinetic parameters $\varvec{\theta }$ can be solved by applying the optimisation transfer technique [9] to obtain a closed-form update equation with improved convergence performance.
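The dimensions involved in the dynamic case can be checked with a small sketch. The numerical values of $\varvec{\theta }$ and $\mathbf {B}$ below are made up purely for illustration and do not correspond to any particular kinetic model.

```python
def matmul(X, Y):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical sizes: K = 3 supervoxels, nk = 2 kinetic parameters, nt = 4 frames.
theta = [[1.0, 0.5],
         [0.0, 2.0],
         [3.0, 1.0]]                       # K x nk
B = [[1.0, 0.8, 0.5, 0.2],
     [0.0, 0.1, 0.3, 0.6]]                 # nk x nt temporal basis functions

s = matmul(theta, B)                       # K x nt supervoxel time-activity curves

labels = [0, 0, 1, 2, 2]                   # N = 5 voxels assigned to K = 3 supervoxels
f = [s[j] for j in labels]                 # N x nt: f = A s, same A for every frame
```

Only K × nk parameters are estimated, rather than N × nt voxel values, which is where the sparse representation and the kinetic model jointly reduce the number of unknowns.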

2.3 Aggregation of Multi-layer Supervoxels and Joint Clustering

A single layer of supervoxels provides a sparse representation of the image which is affected by the algorithm and parameters used to generate the over-segmentation. As suggested in [21], aggregation of multi-layer supervoxels generated by different algorithms with varying parameters can improve the performance of capturing the diverse and multi-scale visual features in a natural image. For PET reconstruction, to reduce the bias introduced by a specific algorithm or parameter, the aggregation can be performed as an average of multiple PET images reconstructed from the same projection data and prior images using different over-segmentations generated by varying the supervoxel clustering algorithm and/or the parameters. In this work the multi-layer supervoxels were generated by varying the number of supervoxels K and the weight m between the intensity similarity and spatial proximity in the SLIC algorithm.
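The averaging step can be sketched as follows, assuming each layer has already been reconstructed and is available as a pair of a label vector and supervoxel values. The function name `aggregate` and the toy layers are hypothetical, and an unweighted mean is assumed.

```python
def aggregate(layers):
    """Voxel-wise mean of the images A_l s_l over all layers.

    `layers` is a list of (labels, s) pairs defined on the same voxel grid.
    """
    n_voxels = len(layers[0][0])
    mean = [0.0] * n_voxels
    for labels, s in layers:
        for i, j in enumerate(labels):
            mean[i] += s[j] / len(layers)
    return mean

# Two hypothetical layers over the same 4 voxels, with K = 2 and K = 3 supervoxels.
layers = [
    ([0, 0, 1, 1], [2.0, 6.0]),
    ([0, 1, 1, 2], [2.0, 4.0, 6.0]),
]
f_mean = aggregate(layers)
```

Note that the averaged image is generally no longer piecewise constant on any single over-segmentation, since supervoxel boundaries differ between layers.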
