Improved Shape-Constrained Deformable Model for Segmentation of Vertebrae from CT Lumbar Spine Images

We describe an improved shape-constrained deformable model for the segmentation of vertebrae from computed tomography (CT) images of the lumbar spine. The overall segmentation performance of $$0.43\,{\pm }\,0.14$$ mm in terms of mean symmetric absolute surface distance and of $$93.76\,{\pm }\,1.61\,\%$$ in terms of the Dice coefficient, computed against corresponding reference vertebra segmentations, indicates that the proposed framework can accurately segment vertebrae from CT images of the lumbar spine.





1 Introduction


Accurate and robust segmentation of spinal and vertebral structures from medical images is a challenging task due to the relatively high anatomical complexity of vertebrae (i.e. each consisting of the vertebral body, pedicles, laminae and spinous process) and due to the articulation of adjacent vertebrae. In addition to this complexity and articulation, segmentation is hampered by insufficient image spatial resolution, intensity inhomogeneity and a low signal-to-noise ratio. Since skeletal structures have high contrast when observed in computed tomography (CT) images, CT is commonly the modality of choice for assessing three-dimensional (3D) skeletal structures, such as the spine and vertebrae.

In recent years, several automated and semi-automated methods focusing on the vertebra segmentation problem have been developed for CT images. Kim and Kim [8] proposed a fully automated method that constructs 3D fences to separate vertebrae in valley-emphasized Gaussian images, and then applies region growing within the 3D fences to obtain the final segmentation. Klinder et al. [9] progressively adapted tube-shaped segments to extract the spine curve, performed vertebra detection on curved-planar reformatted images using the generalized Hough transform, identified vertebrae by rigid registration of appearance models to the detected candidates, and obtained the final segmentation by adapting shape-constrained models of the individual vertebrae. Kadoury et al. [6, 7] built an articulated shape manifold from a training database by embedding the data into a low-dimensional sub-space, and applied Markov random field optimization to infer between the unseen target shape and the shape manifold. Lim et al. [10] incorporated local geometrical features using the Willmore flow and prior shape knowledge by kernel density estimation into a level set segmentation framework. Ma and Lu [12] introduced a learning-based bone structure edge detection algorithm and a hierarchical coarse-to-fine deformable surface-based segmentation that relied on response maps of a trained edge detector. Rasoulian et al. [14] developed a statistical multi-vertebrae model of shape and pose, and proposed a novel iterative expectation maximization registration technique to align the model to CT images of the spine. Ibragimov et al. [5] presented a segmentation framework, in which a novel landmark-based shape representation of vertebrae was combined with a landmark detection framework based on game theory.

In this paper, we describe an improved framework for vertebra segmentation that is based on the shape-constrained deformable model [9, 15]. Our framework is initialized by the results of a novel vertebra detection and alignment algorithm [4], and the segmentation of each vertebra is then obtained by a mesh deformation technique that moves mesh vertices to their optimal locations while preserving the underlying vertebral shape. The performance of the proposed segmentation framework was evaluated on vertebrae from CT images of the lumbar spine, and the obtained results, with a mean error below $$0.5$$ mm, indicate that accurate segmentation of vertebrae was achieved.


2 Methodology



2.1 Mean Shape Model of the Lumbar Spine


Let set $$\fancyscript{T}$$ contain 3D images of the lumbar spine, where each image is assigned a series of binary masks representing reference segmentations of each individual lumbar vertebra from level L1 to L5. To extract a shape model of each vertebra from each image in $$\fancyscript{T}$$, the marching cubes algorithm [11] is applied to each corresponding binary mask, resulting in a 3D face-vertex mesh consisting of vertices with triangle connectivity information. The dependency of the number of vertices in each mesh on the image voxel size and on the size of the observed vertebra is removed by isotropic remeshing [1]. In order to establish pointwise correspondences among vertices of the same vertebral level, the nonrigid transformation among sets of vertices is recovered using the state-of-the-art coherent point drift algorithm [13], which outperforms other methods for point set registration. Finally, the generalized Procrustes alignment [3] is used to remove translation, rotation and scaling from corresponding meshes, yielding the mean shape model of each vertebra, represented by a 3D face-vertex mesh $$\fancyscript{M}=\big \{\fancyscript{V},\fancyscript{F}\big \}$$ of $$\left| \fancyscript{V}\right| $$ vertices and $$\left| \fancyscript{F}\right| $$ faces (i.e. triangles). The mean shape model of the whole lumbar spine, i.e. a chain of mean shape models of individual vertebrae, is further used for spine detection, while the mean shape models of individual vertebrae are used for vertebra detection and segmentation in an unknown 3D image $$I$$.
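The final alignment step can be illustrated with a small sketch of generalized Procrustes analysis over corresponding point sets. This is only an illustrative implementation, not the exact procedure of [3]: the helper `similarity_align`, the number of iterations and the normalization choices (centering the mean at the origin and fixing its Frobenius norm) are our own assumptions.

```python
import numpy as np

def similarity_align(X, Y):
    """Align point set X to Y with translation, rotation and scaling
    (ordinary Procrustes analysis); rows of X and Y correspond."""
    muX, muY = X.mean(0), Y.mean(0)
    X0, Y0 = X - muX, Y - muY
    # Optimal rotation via SVD of the cross-covariance matrix (Kabsch).
    U, S, Vt = np.linalg.svd(X0.T @ Y0)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (X0 ** 2).sum()   # optimal scale
    return s * X0 @ R + muY

def generalized_procrustes(shapes, iters=10):
    """Mean shape of corresponding point sets after removing
    translation, rotation and scaling (generalized Procrustes)."""
    mean = shapes[0].copy()
    for _ in range(iters):
        aligned = [similarity_align(S, mean) for S in shapes]
        new_mean = np.mean(aligned, axis=0)
        new_mean -= new_mean.mean(0)           # remove translation
        new_mean /= np.linalg.norm(new_mean)   # remove scale
        mean = new_mean
    return mean
```

Each mesh of a given vertebral level (after coherent point drift has put its vertices into correspondence) would enter as one `shapes` entry, and the returned array would serve as the vertex set $$\fancyscript{V}$$ of the mean shape model.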


2.2 Vertebra Detection


The detection of vertebrae in an unknown 3D image $$I$$ is performed by a novel optimization scheme that is based on interpolation theory [4]. The optimization scheme consists of three steps: spine detection, vertebra detection and vertebra alignment. To detect the spine in image $$I$$, the pose of mesh $$\fancyscript{M}$$ representing the mean shape model of the lumbar spine (i.e. a chain of meshes representing individual vertebrae from L1 to L5) is optimized against three translations (i.e. coordinates $$x$$, $$y$$ and $$z$$, representing the sagittal, coronal and axial anatomical directions, respectively). The global maximum of the resulting interpolation represents the location of the spine in the 3D image, and is further used to initialize the detection of individual vertebrae. To detect individual vertebrae, the pose of mesh $$\fancyscript{M}$$, now representing the mean shape model of the observed lumbar vertebra, is again optimized against three translations; however, in this case all local maxima of the resulting interpolation are extracted, corresponding to locations of the observed and neighboring vertebrae. The correct location of each vertebra is determined by the optimal path that passes through a set of locations, where each location corresponds to a local maximum at a different vertebral level. Finally, a more accurate alignment of the mean shape model of each observed vertebra is performed by optimizing the pose of each model against three translations, one scaling (i.e. factor $$s$$) and three rotations (i.e. angles $$\varphi _x$$, $$\varphi _y$$ and $$\varphi _z$$ about coordinate axes $$x$$, $$y$$ and $$z$$, respectively). The resulting alignment represents the final vertebra detection result. A detailed description of the interpolation-based optimization scheme can be found in [4].
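The search for the optimal path through per-level local maxima can be illustrated with a simple Viterbi-style dynamic program. The transition cost below (a quadratic penalty on deviation from an assumed expected inter-vertebral distance `expected_gap`) is our own illustrative choice; the actual formulation is given in [4].

```python
import numpy as np

def optimal_path(candidates, scores, expected_gap, weight=1.0):
    """Pick one candidate location per vertebral level (L1..L5).
    candidates[l]: (n_l, 3) array of candidate positions at level l;
    scores[l]: their detection responses.  Consecutive picks are
    encouraged to lie `expected_gap` apart (illustrative cost only)."""
    L = len(candidates)
    best = [np.asarray(scores[0], float)]
    back = []
    for l in range(1, L):
        cur = np.asarray(candidates[l], float)
        prev = np.asarray(candidates[l - 1], float)
        # pairwise transition cost: deviation from the expected gap
        gap = np.linalg.norm(cur[:, None, :] - prev[None, :, :], axis=2)
        trans = best[-1][None, :] - weight * (gap - expected_gap) ** 2
        back.append(trans.argmax(axis=1))
        best.append(np.asarray(scores[l], float) + trans.max(axis=1))
    # backtrack the highest-scoring path
    path = [int(best[-1].argmax())]
    for l in range(L - 1, 0, -1):
        path.append(int(back[l - 1][path[-1]]))
    return path[::-1]
```

The point of the path constraint is visible when a spurious local maximum (e.g. a neighboring vertebra) has a stronger detection response than the correct one: greedy per-level selection would pick it, while the path with geometrically consistent inter-vertebral spacing does not.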


2.3 Vertebra Segmentation


After the interpolation-based alignment [4] of the mean shape model of each lumbar vertebra to the unknown image $$I$$, segmentation of each lumbar vertebra is performed by a mesh deformation technique that moves mesh vertices to their optimal locations while preserving the underlying vertebral shape [9, 15]. In this iterative procedure, the following two steps are executed in each iteration: image object detection for mesh face centroids that are represented by the centers of mass for mesh faces $$\fancyscript{F} \in \fancyscript{M}$$, followed by reconfiguration of mesh vertices $$\fancyscript{V} \in \fancyscript{M}$$. By combining the robust initialization resulting from vertebra detection (Sect. 2.2) with modifications to the mesh deformation technique, we improve the accuracy of the resulting vertebra segmentation.


2.3.1 Object Detection


By displacing each mesh face centroid $$\mathbf {c}_i$$; $$i=1,2,\ldots ,\left| \fancyscript{F}\right| $$, along its corresponding mesh face normal $$\mathbf {n}(\mathbf {c}_i)$$, a new candidate mesh face centroid $$\mathbf {c}_i^{*}$$ is found in each iteration $$k$$:


$$\begin{aligned} \mathbf {c}_i^{*} = \mathbf {c}_i + \delta \, j_i^{*} \, \mathbf {n}(\mathbf {c}_i), \end{aligned}$$

(1)
where $$\delta $$ is the length of the unit displacement, and $$j_i^{*}$$ is an element of set $$\fancyscript{J}$$; $$j_i^{*} \in \fancyscript{J}$$. Set $$\fancyscript{J}$$ represents the search profile along $$\mathbf {n}(\mathbf {c}_i)$$, called the sampling parcel (Fig. 1):


$$\begin{aligned} \fancyscript{J} = \Big \{ -j, -j+1, \ldots , j-1, j\Big \}; \quad j = J - k + 1, \end{aligned}$$

(2)
which is of size $$2J+1$$ at initial iteration $$k=1$$ and $$2(J-K+1)+1$$ at final iteration $$k=K$$. The element $$j_i^{*}$$ that defines the location of $$\mathbf {c}_i^{*}$$ is determined by detecting vertebra boundaries:


$$\begin{aligned} j_i^{*} = \underset{j \in \fancyscript{J}}{\arg \max } \Big \{F\big (\mathbf {c}_i, \mathbf {c}_i + \delta \, j \, \mathbf {n}(\mathbf {c}_i)\big ) - D \, \delta ^2 \, j^2 \Big \}, \end{aligned}$$

(3)
where $$\mathbf {c}_i'=\mathbf {c}_i+\delta \,j\,\mathbf {n}(\mathbf {c}_i)$$ is the candidate location for $$\mathbf {c}_i^{*}$$ (Eq. 1), and parameter $$D$$ controls the tradeoff between the response of the boundary detection operator $$F$$ (Eq. 4) and the distance from $$\mathbf {c}_i$$ to $$\mathbf {c}_i'$$. In comparison to the original approach [9, 15], we propose an improved boundary detection operator $$F$$ that is based on image intensity gradients weighted by an image appearance operator:


$$\begin{aligned} F(\mathbf {c}_i, \mathbf {c}_i') = \frac{g_{\text {max}}\left( g_{\text {max}} + \left||\mathbf {g}_W(\mathbf {c}_i')\right||\right) }{g_{\text {max}}^2 + \left||\mathbf {g}_W(\mathbf {c}_i')\right||^2}\left\langle \mathbf {n}(\mathbf {c}_i),\mathbf {g}_W(\mathbf {c}_i')\right\rangle , \end{aligned}$$

(4)
where $$\left||\cdot \right||$$ denotes the vector norm, $$\left\langle \cdot ,\cdot \right\rangle $$ denotes the dot product, $$g_{\text {max}}$$ is the estimated mean amplitude of intensity gradients at vertebra boundaries, used to suppress excessively strong weighted gradients, which may occur when the gradient magnitude at the boundary of the object of interest is considerably smaller than that at the boundary of another object in its neighborhood (e.g. pedicle screws), and $$\mathbf {g}_W$$ is the image appearance operator at the candidate mesh face centroid location $$\mathbf {c}_i'$$:


$$\begin{aligned} \mathbf {g}_W(\mathbf {c}_i') = \big (1 + C(\mathbf {c}_i')\big ) \, \mathbf {g}(\mathbf {c}_i'), \end{aligned}$$

(5)
where $$\mathbf {g}(\mathbf {c}_i')$$ is the intensity gradient at $$\mathbf {c}_i'$$ and $$C(\mathbf {c}_i') \in [0,1]$$ is the continuous response of the Canny edge operator [2]. By additionally weighting the image intensity gradients, vertebra boundary points are more likely to be detected. In contrast to the original technique [9, 15], the size of the sampling parcel $$J$$ (Eq. 2) is reduced in each iteration $$k$$ and the image intensity gradients $$\mathbf {g}$$ (Eq. 5) are additionally weighted, both of which improve the accuracy of the resulting segmentation.
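As an illustration, the object detection step (Eqs. 1-5) can be sketched as follows. This is a minimal reimplementation for a single face centroid: the response function passed as `F`, the gradient and Canny values, and all parameter settings are assumptions for the sketch, not values from the paper.

```python
import numpy as np

def boundary_response(n, g, canny, g_max):
    """Boundary detection operator F (Eq. 4): project the appearance-
    weighted gradient (Eq. 5) onto the face normal n, damped so that
    gradients much stronger than g_max (e.g. from pedicle screws)
    cannot dominate the response."""
    g_w = (1.0 + canny) * g                    # Eq. (5), canny in [0, 1]
    mag = np.linalg.norm(g_w)
    damp = g_max * (g_max + mag) / (g_max ** 2 + mag ** 2)
    return damp * float(np.dot(n, g_w))

def detect_boundary(c, n, F, delta, J, k, D):
    """Displace a face centroid c along its normal n (Eqs. 1-3):
    maximise the boundary response F minus a quadratic distance
    penalty, over a sampling parcel that shrinks with iteration k."""
    j_max = J - k + 1                          # Eq. (2): parcel shrinks
    js = np.arange(-j_max, j_max + 1)
    cands = c + delta * js[:, None] * n        # candidates along the normal
    scores = np.array([F(c, p) for p in cands]) - D * (delta * js) ** 2
    return c + delta * js[int(scores.argmax())] * n   # Eq. (1)
```

With a synthetic response that peaks at a planar boundary, `detect_boundary` moves the centroid toward that plane; a larger `D` keeps it closer to its current position, and a later iteration `k` restricts how far it may travel.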