of 3D Vertebral Models from a Single 2D Lateral Fluoroscopic Image

Let $\{ \mathbf{x}_{i} ;\; i = 0, \ldots, m - 1\}$ be the $m$ ($m = 39$) aligned training surface models. Each member is described by a vector $\mathbf{x}_{i}$ of $N$ ($N = 5000$) vertices:

$\mathbf{x}_{i} = ( x_{0}, y_{0}, z_{0}, x_{1}, y_{1}, z_{1}, \ldots, x_{N - 1}, y_{N - 1}, z_{N - 1} )^{T}$

(1)

A PDM was then obtained by applying principal component analysis [24] to the aligned training surface models:

$\begin{aligned} \mathbf{D} &= \frac{1}{m - 1} \sum_{i = 0}^{m - 1} (\mathbf{x}_{i} - \bar{\mathbf{x}})(\mathbf{x}_{i} - \bar{\mathbf{x}})^{T} \\ P &= (\mathbf{p}_{0}, \mathbf{p}_{1}, \ldots);\quad \mathbf{D} \cdot \mathbf{p}_{k} = \sigma_{k}^{2} \cdot \mathbf{p}_{k} \end{aligned}$

(2)

where $\bar{\mathbf{x}}$ and $\mathbf{D}$ are the mean vector and the covariance matrix of the PDM, respectively; $\{ \sigma_{k}^{2} \}$ are the non-zero eigenvalues of $\mathbf{D}$, and $\{ \mathbf{p}_{k} \}$ are the corresponding eigenvectors. Because the $m$ centered training vectors span a subspace of dimension at most $m - 1$, $\mathbf{D}$ has at most $m - 1 = 38$ non-zero eigenvalues. The eigenvectors $\mathbf{p}_{k}$, sorted by descending eigenvalue $\sigma_{k}^{2}$, are the principal directions spanning a shape space whose origin is $\bar{\mathbf{x}}$; $\sigma_{k}^{2}$ is the variance of the training set along $\mathbf{p}_{k}$. Figure 1 shows the variability captured by the first two modes of variation of the PDM.
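A minimal NumPy sketch of how Eq. (2) can be evaluated is given below. It is an illustration under our own assumptions, not the implementation used in this work: the names `build_pdm` and `shape_instance` are hypothetical, and instead of forming the $3N \times 3N$ matrix $\mathbf{D}$ (15000 × 15000 for N = 5000) it takes the SVD of the $m \times 3N$ centered data matrix, whose right singular vectors are the eigenvectors $\mathbf{p}_k$ and whose squared singular values divided by $m - 1$ are the eigenvalues $\sigma_k^2$. Modes are indexed from 0, as in Eq. (2).

```python
import numpy as np


def build_pdm(shapes, eps=1e-10):
    """Build a point distribution model from aligned training shapes.

    shapes -- (m, 3N) array; row i is x_i = (x_0, y_0, z_0, ..., z_{N-1}).
    Returns the mean x_bar (3N,), the standard deviations sigma_k (K,) in
    descending order, and the eigenvectors P (3N, K) stored as columns.
    """
    shapes = np.asarray(shapes, dtype=float)
    m = shapes.shape[0]
    x_bar = shapes.mean(axis=0)
    centered = shapes - x_bar               # rows are x_i - x_bar
    # Right singular vectors of the centered data are the eigenvectors of
    # D = centered.T @ centered / (m - 1), with eigenvalues s_k^2 / (m - 1).
    # This avoids building the 3N x 3N covariance matrix explicitly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (m - 1)
    keep = var > eps * var.max()            # drop (numerically) zero eigenvalues
    return x_bar, np.sqrt(var[keep]), vt[keep].T


def shape_instance(x_bar, sigma, P, k, alpha):
    """Evaluate x_bar + alpha * sigma_k * p_k (k is 0-based)."""
    return x_bar + alpha * sigma[k] * P[:, k]
```

Shape instances such as those in Fig. 1 correspond to calling `shape_instance` with `alpha` in (-2, -1, 1, 2) for the first two modes.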

Fig. 1

The first two principal modes of variation of the PDM used in this investigation. The shape instances in each row, from left to right, were generated by evaluating $\bar{\mathbf{x}} + \alpha \sigma_{k} \mathbf{p}_{k}$ with $\alpha \in \{ -2, -1, 1, 2\}$. (a) Posterior view of the PDM (k = 1: first row; k = 2: second row); (b) lateral view of the PDM (k = 1: first row; k = 2: second row)

3 Statistically Deformable 2D/3D Reconstruction

Without loss of generality, we assume here that the input image is calibrated and that image distortion has been corrected. For details about fluoroscopic image calibration, we refer to our previous work [25]. Thus, for any pixel in the input image we can compute the projection ray that starts at the focal point and passes through that pixel.
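To make the pixel-to-ray step concrete, the following sketch assumes a hypothetical calibration model: the imaging plane is $z = 0$, pixel $(u, v)$ maps to a plane position through a pixel spacing and an origin offset, and the focal point is a known 3D point above the plane. The actual calibration model of [25] is not reproduced here, and `pixel_to_ray` is an illustrative name.

```python
import numpy as np


def pixel_to_ray(u, v, focal_point, pixel_spacing, origin=(0.0, 0.0)):
    """Return (start, unit direction) of the ray from the focal point through pixel (u, v).

    Hypothetical calibration model: the imaging plane is z = 0, pixel (u, v)
    lies at (origin[0] + u * sx, origin[1] + v * sy, 0), and focal_point is a
    3D point with z = F > 0.
    """
    sx, sy = pixel_spacing
    on_plane = np.array([origin[0] + u * sx, origin[1] + v * sy, 0.0])
    f = np.asarray(focal_point, dtype=float)
    direction = on_plane - f
    return f, direction / np.linalg.norm(direction)
```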

The single-image surface model reconstruction technique proposed in this paper is based on a hybrid 2D/3D deformable registration process that couples a landmark-based scaled rigid registration with an adapted SSM-based 2D/3D reconstruction algorithm [20, 21]. Unlike our previous works [20, 21], where two or more calibrated X-ray images were required as input for a successful reconstruction, here only a single lateral fluoroscopic image is available. As in the multiple-image case, the convergence of the single-image 2D/3D reconstruction depends on the initialization and on the image contour extraction. In the following, we therefore focus on the image contour extraction and on the landmark-based scaled rigid registration used to initialize the single-image 2D/3D reconstruction.

3.1 Image Contour Extraction

As a feature-based 2D/3D reconstruction approach, our technique requires image contours to be extracted beforehand. Explicit and accurate contour extraction is challenging, especially when the shapes involved or the image background are complex. We therefore believe it is better to provide the user with a tool that supports interactive segmentation while speeding up the tedious manual segmentation process and making the results repeatable. This led us to develop a semi-automatic segmentation tool.

Our semi-automatic segmentation tool is based on the Livewire algorithm introduced by Mortensen and Barrett [26]. In their formulation, each graph edge connects two 8-adjacent image pixels, and a local cost function is assigned to the graph edges to weight their probability of being included in an optimal path. In this work, we use two static feature components to form this cost function. The first component, $f_{G}$, is computed from the Canny edge detector [27] applied at three scales (the standard deviations of the Gaussian smoothing operator at these scales are 1.0, 2.0, and 3.0, respectively) as follows.

Let us denote the edge maps produced by the Canny edge detector at the three scales as $E^{1}(\mathbf{q})$, $E^{2}(\mathbf{q})$, and $E^{3}(\mathbf{q})$: if pixel $\mathbf{q}$ is detected as an edge pixel at the $i$th scale, then $E^{i}(\mathbf{q}) = 1$; otherwise, $E^{i}(\mathbf{q}) = 0$. Let us further denote the gradient magnitudes at the three scales as $G^{1}(\mathbf{q})$, $G^{2}(\mathbf{q})$, and $G^{3}(\mathbf{q})$, respectively. Then

$f_{G}(\mathbf{q}) = \sum_{i = 1}^{3} \left( 1.0 - \frac{G^{i}(\mathbf{q})}{\max_{\mathbf{q}'} G^{i}(\mathbf{q}')} \cdot E^{i}(\mathbf{q}) \right)$

(3)

According to Eq. (3), if $\mathbf{q}$ is not detected as an edge pixel at the $i$th scale, the $i$th term contributes a constant cost of 1.0. Otherwise, its cost depends on the gradient magnitude: the larger the magnitude, the smaller the cost. $f_{G}(\mathbf{q})$ therefore lies in $[0, 3]$, with the lowest values at strong edges detected at all three scales.
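Eq. (3) can be evaluated directly once the edge maps $E^i$ and gradient magnitudes $G^i$ are available. In the sketch below (hypothetical function name `edge_cost`) they are passed in as arrays. They could, for example, be obtained with scikit-image's `skimage.feature.canny(image, sigma=s)` and SciPy's `scipy.ndimage.gaussian_gradient_magnitude(image, s)` for `s` in (1.0, 2.0, 3.0), but the gradient operator used in this work is not stated, so that pairing is an assumption.

```python
import numpy as np


def edge_cost(edges, grads):
    """Multi-scale edge cost f_G of Eq. (3), evaluated for every pixel.

    edges -- sequence of binary arrays E^i (1 where an edge was detected at scale i)
    grads -- sequence of gradient-magnitude arrays G^i at the same scales
    """
    cost = np.zeros(np.shape(edges[0]), dtype=float)
    for E, G in zip(edges, grads):
        G = np.asarray(G, dtype=float)
        g_max = G.max()                     # maximum over all pixels at this scale
        G_norm = G / g_max if g_max > 0 else np.zeros_like(G)
        # Non-edge pixels add 1.0; edge pixels add 1 - G / max(G).
        cost += 1.0 - G_norm * np.asarray(E, dtype=float)
    return cost
```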

The second component, the gradient direction cost $f_{D}(\mathbf{p},\mathbf{q})$, is computed as proposed in the original paper [26]. It adds a smoothness constraint to the contour definition by assigning high costs to sharp changes in boundary direction.
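For reference, the sketch below implements the gradient direction cost as we read it from Mortensen and Barrett's formulation [26]: with $D'(\mathbf{p}) = (I_y(\mathbf{p}), -I_x(\mathbf{p}))$ normalized, and a normalized link vector $L(\mathbf{p},\mathbf{q})$ oriented so that $D'(\mathbf{p}) \cdot L \ge 0$, the cost is $f_D = \frac{2}{3\pi}\left(\cos^{-1} d_p + \cos^{-1} d_q\right)$ with $d_p = D'(\mathbf{p}) \cdot L$ and $d_q = L \cdot D'(\mathbf{q})$, which lies in $[0, 1]$. The function name `direction_cost` and the handling of zero gradients are our own assumptions; the details should be checked against [26].

```python
import numpy as np


def direction_cost(p, q, grad_p, grad_q, eps=1e-12):
    """Gradient-direction cost f_D(p, q) for two 8-adjacent pixels.

    p, q           -- (x, y) pixel coordinates
    grad_p, grad_q -- image gradients (I_x, I_y) at p and q
    """
    def unit_perp(g):
        # D' = (I_y, -I_x), normalised; a zero gradient gives a zero vector.
        d = np.array([g[1], -g[0]], dtype=float)
        n = np.linalg.norm(d)
        return d / n if n > eps else d

    dp, dq = unit_perp(grad_p), unit_perp(grad_q)
    link = np.asarray(q, dtype=float) - np.asarray(p, dtype=float)
    if dp @ link < 0:
        link = -link                        # orient the link to agree with D'(p)
    link /= np.linalg.norm(link)
    d_p = np.clip(dp @ link, -1.0, 1.0)
    d_q = np.clip(link @ dq, -1.0, 1.0)
    return 2.0 / (3.0 * np.pi) * (np.arccos(d_p) + np.arccos(d_q))
```

A link running along an edge (gradient perpendicular to the link at both ends) costs 0, whereas a link crossing the edge direction costs more.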

Finally, the two static features are combined by a weighted sum into a single static local cost function:

$l(\mathbf{p},\mathbf{q}) = 0.6\, f_{G}(\mathbf{q}) + 0.4\, f_{D}(\mathbf{p},\mathbf{q})$

(4)

where the weights of the two terms were determined empirically.

Based on the Livewire algorithm, semi-automatic contour extraction starts with a seed point that the user places interactively with a left mouse click. During the extraction, the user can add more seed points with further left clicks. A right click finishes the definition of the current contour; a subsequent left click starts the extraction of a new contour. Figure 2 shows an example of how the livewire segmentation technique is used to extract contours from the input image.
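The optimal path between a seed point and the cursor is a shortest-path search on the 8-connected pixel graph. The following Dijkstra sketch (hypothetical name `livewire_path`) takes the local cost as a callable, so the cost of Eq. (4) can be plugged in; pixel coordinates are (row, column) tuples. An interactive tool would typically expand the search over the whole image once per seed so that the path to any cursor position is immediately available; this sketch simply stops once the target pixel is settled.

```python
import heapq
import math

# Offsets (d_row, d_col) of the 8 neighbours of a pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def livewire_path(shape, seed, target, link_cost):
    """Minimum-cost 8-connected path from seed to target (Dijkstra's algorithm).

    shape     -- (rows, cols) of the image
    seed      -- (row, col) of the seed point
    target    -- (row, col) of the current cursor position
    link_cost -- callable (p, q) -> non-negative cost of the edge p -> q,
                 e.g. an implementation of Eq. (4)
    """
    rows, cols = shape
    dist = {seed: 0.0}
    prev = {}
    settled = set()
    heap = [(0.0, seed)]
    while heap:
        d, p = heapq.heappop(heap)
        if p in settled:
            continue                        # stale heap entry
        settled.add(p)
        if p == target:
            break                           # optimal cost to target is final
        for dr, dc in NEIGHBOURS:
            q = (p[0] + dr, p[1] + dc)
            if not (0 <= q[0] < rows and 0 <= q[1] < cols) or q in settled:
                continue
            nd = d + link_cost(p, q)
            if nd < dist.get(q, math.inf):
                dist[q] = nd
                prev[q] = p
                heapq.heappush(heap, (nd, q))
    if target not in settled:
        raise ValueError("target is outside the image")
    # Walk the predecessor links back from the target to the seed.
    path = [target]
    while path[-1] != seed:
        path.append(prev[path[-1]])
    return path[::-1]
```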

Fig. 2

Example of using the livewire segmentation algorithm to extract image contours. The white crosses mark where the user clicked the mouse button

3.2 Landmark-Based Scaled Rigid Registration for Initialization

Initialization here means estimating the initial scale and the rigid transformation between the mean model of the PDM and the input fluoroscopic image. For this purpose, we adopt an iterative landmark-to-ray scaled rigid registration based on four anatomical landmarks: the geometric center of the vertebral body, the center of the top surface of the vertebral body, the center of the bottom surface of the vertebral body, and the center of the spinous process tip. Their positions on the mean model of the PDM are obtained by point picking or center calculation (the center of the vertebral body is computed as the center of four boundary landmarks along the anterior-posterior direction, as shown in Fig. 3a), while their positions on the fluoroscopic image are defined by interactive picking (see Fig. 3a, b for details).

Fig. 3

Definition of initialization landmarks. a Landmarks extracted from the mean model of the PDM. b Landmarks extracted from the fluoroscopic image

Let us denote the landmarks defined on the mean model of the PDM, i.e., the vertebral body center, the center of the top surface of the vertebral body, the center of the bottom surface of the vertebral body, and the center of the spinous process tip, as $v_{Mean}^{1}$, $v_{Mean}^{2}$, $v_{Mean}^{3}$, and $v_{Mean}^{4}$, respectively, and the corresponding landmarks interactively picked from the fluoroscopic image as $v_{\text{X-ray}}^{1}$, $v_{\text{X-ray}}^{2}$, $v_{\text{X-ray}}^{3}$, and $v_{\text{X-ray}}^{4}$, respectively. For each X-ray landmark, we compute the projection ray from the focal point through the landmark. We then compute the distance between $v_{Mean}^{1}$ and $v_{Mean}^{4}$, denoted $l_{Mean}^{1,4}$, and, using the known image scale, the distance $l_{\text{X-ray}}^{1,4}$ between $v_{\text{X-ray}}^{1}$ and $v_{\text{X-ray}}^{4}$. The initialization then proceeds as follows:

Data Preparation. In this step, we assume that the line connecting the center of the vertebral body and the center of the spinous process tip is parallel to the imaging plane of the input fluoroscopic image and lies at a certain distance from it. Using this assumption and the landmark correspondences between the mean model and the fluoroscopic image, we compute two points $\bar{v}_{\text{X-ray}}^{1}$ and $\bar{v}_{\text{X-ray}}^{4}$ on the projection rays of $v_{\text{X-ray}}^{1}$ and $v_{\text{X-ray}}^{4}$, respectively (see Fig. 4a), which satisfy:

Fig. 4

Iterative landmark-to-ray registration. a Schematic view of data preparation. b Schematic view of finding 3D point pairs

$\begin{aligned} & \bar{v}_{\text{X-ray}}^{1} \bar{v}_{\text{X-ray}}^{4} \,//\, v_{\text{X-ray}}^{1} v_{\text{X-ray}}^{4}; \quad \text{and} \\ & |\bar{v}_{\text{X-ray}}^{1} - \bar{v}_{\text{X-ray}}^{4}| = l_{\text{X-ray}}^{1,4} \cdot \frac{F - d}{F} \end{aligned}$

(5)

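where the symbol “//” indicates that the two lines are parallel, $F$ is the calibrated distance from the focal point to the imaging plane, and $d$ is the assumed distance from the line connecting the center of the vertebral body and the center of the spinous process tip to the imaging plane. Since a segment at distance $d$ from the imaging plane lies at distance $F - d$ from the focal point, its projection is magnified by $F/(F - d)$, which the factor $(F - d)/F$ undoes.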
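The data preparation step can be sketched with similar triangles. Under a hypothetical frame with the imaging plane at $z = 0$ and the focal point at height $F$, the assumed plane $z = d$ cuts every projection ray at the fraction $(F - d)/F$ of the way from the focal point to the image, which satisfies both conditions above. The function name `back_project_axis_points` and the frame convention are illustrative assumptions.

```python
import numpy as np


def back_project_axis_points(v1_img, v4_img, focal_point, F, d):
    """Points on the rays of v^1 and v^4 lying in the plane z = d.

    Hypothetical frame: imaging plane z = 0, focal_point = (fx, fy, F),
    and v1_img, v4_img are 3D landmark positions on the imaging plane.
    """
    f = np.asarray(focal_point, dtype=float)
    t = (F - d) / F                          # fraction of the focal-point-to-image distance
    bar_v1 = f + t * (np.asarray(v1_img, dtype=float) - f)
    bar_v4 = f + t * (np.asarray(v4_img, dtype=float) - f)
    return bar_v1, bar_v4
```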

The current scale s between the mean model and the input image is then estimated as,

$s = |\bar{v}_{\text{X-ray}}^{1} - \bar{v}_{\text{X-ray}}^{4} |/l_{Mean}^{1,4}$

(6)

Using $s$, we scale all landmark positions on the mean model and denote the scaled landmarks as $\{ \bar{v}_{Mean}^{i} ;\; i = 1,2,3,4\}$. We then compute the distances from $\bar{v}_{Mean}^{2}$ and $\bar{v}_{Mean}^{3}$ to the line $\bar{v}_{Mean}^{1} \bar{v}_{Mean}^{4}$ and denote them as $\bar{l}_{Mean}^{2,1 - 4}$ and $\bar{l}_{Mean}^{3,1 - 4}$, respectively.
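The scale estimate and the two point-to-line distances can be sketched as follows. The names are hypothetical, and scaling about the mean model's coordinate origin is our assumption; the source does not state the scaling center, but the distances to the line $\bar{v}_{Mean}^{1}\bar{v}_{Mean}^{4}$ are the same for any choice.

```python
import numpy as np


def point_line_distance(x, a, b):
    """Distance from point x to the infinite line through a and b."""
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    w = (b - a) / np.linalg.norm(b - a)
    r = x - a
    return np.linalg.norm(r - (r @ w) * w)   # remove the component along the line


def scale_mean_landmarks(mean_lms, bar_v1_xray, bar_v4_xray):
    """Scale s, scaled mean landmarks, and distances of landmarks 2 and 3 to line 1-4.

    mean_lms -- (4, 3) array whose rows are v^1..v^4 on the mean model
    """
    mean_lms = np.asarray(mean_lms, dtype=float)
    l_mean_14 = np.linalg.norm(mean_lms[0] - mean_lms[3])
    l_xray_14 = np.linalg.norm(np.asarray(bar_v1_xray, dtype=float)
                               - np.asarray(bar_v4_xray, dtype=float))
    s = l_xray_14 / l_mean_14
    scaled = s * mean_lms                     # scaling about the origin (assumption)
    l2 = point_line_distance(scaled[1], scaled[0], scaled[3])
    l3 = point_line_distance(scaled[2], scaled[0], scaled[3])
    return s, scaled, l2, l3
```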

Next, we find two points: $\bar{v}_{\text{X-ray}}^{2}$ on the projection ray of $v_{\text{X-ray}}^{2}$, whose distance to the line $\bar{v}_{\text{X-ray}}^{1} \bar{v}_{\text{X-ray}}^{4}$ equals $\bar{l}_{Mean}^{2,1 - 4}$, and $\bar{v}_{\text{X-ray}}^{3}$ on the projection ray of $v_{\text{X-ray}}^{3}$, whose distance to the same line equals $\bar{l}_{Mean}^{3,1 - 4}$. A paired-point matching between $\{ \bar{v}_{Mean}^{i} ;\; i = 1,2,3,4\}$ and $\{ \bar{v}_{\text{X-ray}}^{i} ;\; i = 1,2,3,4\}$ is then used to compute an updated scale $s_{0}$ and a rigid transformation $\bar{T}_{Mean}^{\text{X-ray}}$ (see Fig. 4b). From now on, we assume that all information defined in the mean model coordinate frame has been transformed into the fluoroscopic image coordinate frame using $s_{0}$ and $\bar{T}_{Mean}^{\text{X-ray}}$. We denote the transformed mean model landmarks as $\{ \tilde{v}_{Mean}^{i} ;\; i = 1,2,3,4\}$.
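This step needs two operations: locating a point on a projection ray at a prescribed distance from a 3D line, which reduces to a quadratic in the ray parameter, and a paired-point similarity fit. The source does not name its paired-point method, so the sketch uses Umeyama's closed-form least-squares similarity estimate as one standard choice. It also does not say how to choose between two valid intersections, so `points_on_ray_at_distance` returns all candidates. Both function names are hypothetical.

```python
import numpy as np


def points_on_ray_at_distance(f, u, a, b, l):
    """Points f + t*u (t >= 0) whose distance to the line through a and b equals l.

    With c = f - a and w the unit line direction, the squared distance is
    A t^2 + B t + (|c|^2 - (c.w)^2); setting it to l^2 gives a quadratic in t.
    """
    f, u, a, b = (np.asarray(v, dtype=float) for v in (f, u, a, b))
    u = u / np.linalg.norm(u)
    w = (b - a) / np.linalg.norm(b - a)
    c = f - a
    A = 1.0 - (u @ w) ** 2
    B = 2.0 * (c @ u - (c @ w) * (u @ w))
    C = c @ c - (c @ w) ** 2 - l ** 2
    if A < 1e-12:                            # ray parallel to the line
        return []
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return []                            # no point at that distance
    roots = {(-B - np.sqrt(disc)) / (2.0 * A), (-B + np.sqrt(disc)) / (2.0 * A)}
    return [f + t * u for t in sorted(roots) if t >= 0]


def similarity_fit(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    E = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        E[2, 2] = -1.0                       # enforce a proper rotation (no reflection)
    R = U @ E @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ E) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```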