Shape-Constrained Deformable Model Approach for Segmentation of Vertebrae from CT Spine Images

contain three-dimensional (3D) images of the thoracolumbar spine, where each image is assigned a series of binary masks representing reference segmentations of each individual thoracolumbar vertebra from level T1 to L5, and let each vertebral level be represented by a 3D face-vertex mesh $\fancyscript{M}=\{\fancyscript{V},\fancyscript{F}\}$ of $|\fancyscript{V}|$ vertices and $|\fancyscript{F}|$ faces (i.e., triangles). A chain of mean vertebra shape models forms the mean shape model of the whole thoracolumbar spine, which is used for spine detection, while the mean shape models of individual vertebrae are used for vertebra detection and segmentation in an unknown 3D image.

1.1 Vertebra Detection

The detection of vertebrae in an unknown 3D image is performed by a novel optimization scheme based on interpolation theory [3], which consists of three steps: spine detection, vertebra detection and vertebra alignment. To detect the spine in the image, the pose of the mean shape model of the thoracolumbar spine $\fancyscript{M}$ is optimized against three translations along the coordinate axes that represent the sagittal, coronal and axial anatomical directions, respectively. The resulting global maximum represents the location of the spine in the 3D image and is used to initialize vertebra detection. To detect each vertebra, the pose of the corresponding mean vertebra shape model $\fancyscript{M}$ is again optimized against three translations; in this case, however, all local maxima of the resulting interpolation are extracted, as they correspond to the locations of the observed vertebra and its neighboring vertebrae. The correct location of each vertebra is determined by the optimal path that passes through a set of locations, where each location corresponds to a local maximum at a different vertebral level. Finally, a more accurate alignment of the mean vertebra shape model is performed by optimizing the pose of each model against three translations, one scaling factor and three rotations (i.e., angles $\varphi _x$, $\varphi _y$ and $\varphi _z$ about the respective coordinate axes). The resulting alignment represents the final vertebra detection result.
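The text does not state the objective that defines the "optimal path" through the local maxima. Purely as an illustration, the sketch below assumes that each local maximum carries its interpolated similarity value and that each pair of consecutive levels is penalized by how far their displacement deviates from a hypothetical expected inter-vertebral offset. Under these assumptions, the path can be found by Viterbi-style dynamic programming. The function name and the `expected_offsets` and `weight` parameters are hypothetical and not taken from the method.

```python
import numpy as np


def optimal_vertebra_path(candidates, scores, expected_offsets, weight=1.0):
    """Choose one local maximum per vertebral level by dynamic programming.

    candidates[l]       -- (n_l, 3) locations of local maxima at level l
    scores[l]           -- (n_l,) interpolated similarity at those maxima
    expected_offsets[l] -- (3,) assumed displacement from level l to level l + 1
    weight              -- assumed trade-off between similarity and spacing
    Returns the index of the selected candidate at every level.
    """
    best = [np.asarray(scores[0], dtype=float)]  # best path value ending at each candidate
    back = []                                    # back-pointers for each level transition
    for l in range(1, len(candidates)):
        prev = np.asarray(candidates[l - 1], dtype=float)
        curr = np.asarray(candidates[l], dtype=float)
        # displacement between every (previous, current) pair: (n_prev, n_curr, 3)
        disp = curr[None, :, :] - prev[:, None, :]
        # penalty: distance of each displacement from the expected one
        expected = np.asarray(expected_offsets[l - 1], dtype=float)
        dev = np.linalg.norm(disp - expected, axis=2)
        total = best[-1][:, None] - weight * dev
        back.append(np.argmax(total, axis=0))
        best.append(total.max(axis=0) + np.asarray(scores[l], dtype=float))
    # trace the highest-valued path back from the last level
    path = [int(np.argmax(best[-1]))]
    for l in range(len(candidates) - 2, -1, -1):
        path.append(int(back[l][path[-1]]))
    return path[::-1]


# toy example with three levels spaced 30 units apart along the axial axis
cands = [[[0, 0, 0], [0, 0, 50]], [[0, 0, 30], [0, 0, 80]], [[0, 0, 60]]]
print(optimal_vertebra_path(cands, [[1.0, 1.0], [1.0, 0.9], [1.0]], [[0, 0, 30]] * 2))
```

In the toy example, the candidates at 0, 30 and 60 form a path that matches the assumed spacing exactly, so the sketch selects index 0 at every level.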

1.2 Vertebra Segmentation

After the interpolation-based alignment [3], segmentation of each vertebra in the unknown image is performed by an improved mesh deformation technique [5] that moves mesh vertices to their optimal locations while preserving the underlying vertebral shape [4, 6]. This iterative procedure executes two steps in each iteration: image object detection for the mesh face centroids, which are the centers of mass of the mesh faces $\fancyscript{F} \in \fancyscript{M}$, and reconfiguration of the mesh vertices $\fancyscript{V} \in \fancyscript{M}$.
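For a triangle mesh stored in face-vertex form, the face centroids (centers of mass of the triangles) and the unit face normals used in the object detection step can be computed as shown below. This is a minimal NumPy sketch. It assumes consistently ordered (counter-clockwise) vertex indices, so that the normals point in a consistent direction. The function name is hypothetical.

```python
import numpy as np


def face_centroids_and_normals(vertices, faces):
    """Centroids and unit normals of the triangles of a face-vertex mesh.

    vertices -- (|V|, 3) vertex coordinates
    faces    -- (|F|, 3) vertex indices of each triangle
    """
    tri = np.asarray(vertices, dtype=float)[np.asarray(faces)]   # (|F|, 3, 3)
    # the center of mass of a uniform triangle is the mean of its three vertices
    centroids = tri.mean(axis=1)
    # right-hand rule: the vertex order determines the normal direction
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    # degenerate (zero-area) faces would yield NaN here
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return centroids, normals


# a single triangle in the xy-plane: centroid (1/3, 1/3, 0), normal (0, 0, 1)
c, n = face_centroids_and_normals([[0, 0, 0], [1, 0, 0], [0, 1, 0]], [[0, 1, 2]])
print(c, n)
```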

1.2.1 Object Detection

By displacing each mesh face centroid $\mathbf {c}_i$; $i=1,2,\ldots ,\left| \fancyscript{F}\right|$ along its corresponding mesh face normal $\mathbf {n}(\mathbf {c}_i)$, a new candidate mesh face centroid $\mathbf {c}_i^{*}$ is found in each $k$-th iteration:

$\begin{aligned} \mathbf {c}_i^{*} = \mathbf {c}_i + \delta \, j_i^{*} \, \mathbf {n}(\mathbf {c}_i), \end{aligned}$

(1)

where $\delta$ is the length of the unit displacement and $j_i^{*} \in \fancyscript{J}$ is an element of the set $\fancyscript{J}$, which represents the search profile along $\mathbf {n}(\mathbf {c}_i)$ and is called the sampling parcel:

$\begin{aligned} \fancyscript{J} = \Big \{ -j, -j+1, \ldots , j-1, j\Big \}; \quad j = J - k + 1, \end{aligned}$

(2)

which is of size $2J+1$ at the initial iteration $k=1$ and of size $2(J-K)+3$ at the final iteration $k=K$, i.e., the search profile shrinks by one element on each side per iteration. The element $j_i^{*}$ that defines the location of $\mathbf {c}_i^{*}$ is determined by detecting vertebra boundaries:

$\begin{aligned} j_i^{*} = \underset{j \in \fancyscript{J}}{\arg \max } \Big \{F\big (\mathbf {c}_i, \mathbf {c}_i + \delta \, j \, \mathbf {n}(\mathbf {c}_i)\big ) - D \, \delta ^2 \, j^2 \Big \}, \end{aligned}$

(3)

where $\mathbf {c}_i'=\mathbf {c}_i+\delta \,j\,\mathbf {n}(\mathbf {c}_i)$ is the candidate location for $\mathbf {c}_i^{*}$ (Eq. 1), and parameter $D$ controls the trade-off between the distance from $\mathbf {c}_i$ to $\mathbf {c}_i'$ and the response of the boundary detection operator $F$:

$\begin{aligned} F(\mathbf {c}_i, \mathbf {c}_i') = \frac{g_{\text {max}}\left( g_{\text {max}} + \lVert \mathbf {g}_W(\mathbf {c}_i')\rVert \right) }{g_{\text {max}}^2 + \lVert \mathbf {g}_W(\mathbf {c}_i')\rVert ^2}\left\langle \mathbf {n}(\mathbf {c}_i),\mathbf {g}_W(\mathbf {c}_i')\right\rangle , \end{aligned}$

(4)

where $\lVert \cdot \rVert$ denotes the vector norm, $\left\langle \cdot ,\cdot \right\rangle$ denotes the dot product, and $g_{\text {max}}$ is the estimated mean amplitude of intensity gradients at vertebra boundaries, which is used to suppress large weighted gradients. Such suppression is needed when the gradient magnitude at the boundary of the object of interest is considerably smaller than that of another object in its neighborhood (e.g., pedicle screws). Finally, $\mathbf {g}_W$ is the image appearance operator at the candidate mesh face centroid location $\mathbf {c}_i'$:

$\begin{aligned} \mathbf {g}_W(\mathbf {c}_i') = \big (1 + \alpha C(\mathbf {c}_i') + (1-\alpha ) R(\mathbf {c}_i')\big ) \, \mathbf {g}(\mathbf {c}_i'), \end{aligned}$

(5)

where $C(\mathbf {c}_i')\in [0, 1]$ is the response of the Canny edge operator, $R(\mathbf {c}_i')\in [-1,1]$ is the response of a random forest [1] regression model built upon an intensity-based descriptor, $\mathbf {g}(\mathbf {c}_i')$ is the image intensity gradient, and $\alpha$ is the weighting parameter.
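Equations (1) to (5) can be combined into a single boundary search for one face centroid, as in the sketch below. It is an illustrative NumPy implementation, not the authors' code. `features` is a hypothetical callable that stands in for the intensity gradient $\mathbf{g}$, the Canny response $C$ and the random forest output $R$ at a 3D point, all function names are hypothetical, and the normal `n` is assumed to be a unit vector.

```python
import numpy as np


def sampling_parcel(J, k):
    """Eq. (2): offsets {-j, ..., j} with j = J - k + 1 (k is 1-based)."""
    j = J - k + 1
    return np.arange(-j, j + 1)


def weighted_gradient(g, C, R, alpha):
    """Eq. (5): scale the gradient g by (1 + alpha*C + (1 - alpha)*R)."""
    return (1.0 + alpha * C + (1.0 - alpha) * R) * g


def boundary_response(n, g_w, g_max):
    """Eq. (4): projection of g_w onto n, damped when |g_w| greatly exceeds g_max."""
    mag = np.linalg.norm(g_w)
    damping = g_max * (g_max + mag) / (g_max ** 2 + mag ** 2)
    return damping * np.dot(n, g_w)


def detect_candidate(c, n, delta, J, k, D, g_max, alpha, features):
    """Eqs. (1) and (3) for one face centroid c with unit normal n.

    features(p) -> (g, C, R) is a hypothetical callable returning the
    intensity gradient, Canny response and random forest output at point p.
    """
    best_j, best_val = None, -np.inf
    for j in sampling_parcel(J, k):
        p = c + delta * j * n                          # candidate location c_i'
        g, C, R = features(p)
        g_w = weighted_gradient(np.asarray(g, dtype=float), C, R, alpha)
        # boundary response minus the distance penalty D * (delta * j)^2
        val = boundary_response(n, g_w, g_max) - D * delta ** 2 * j ** 2
        if val > best_val:
            best_j, best_val = int(j), val
    return c + delta * best_j * n, best_j             # c_i* from Eq. (1)


# toy example: a boundary whose gradient points along +z near z = 2
def toy_features(p):
    g = [0.0, 0.0, 1.0] if abs(p[2] - 2.0) < 0.5 else [0.0, 0.0, 0.0]
    return g, 0.0, 0.0


c_star, j_star = detect_candidate(np.zeros(3), np.array([0.0, 0.0, 1.0]), delta=1.0,
                                  J=3, k=1, D=0.01, g_max=1.0, alpha=0.5,
                                  features=toy_features)
print(j_star, c_star)   # 2 [0. 0. 2.]
```

The damping factor in `boundary_response` equals 1 when $\lVert \mathbf{g}_W \rVert = g_{\text{max}}$ and decays roughly as $g_{\text{max}} / \lVert \mathbf{g}_W \rVert$ for much larger gradients. This is how Eq. (4) keeps strong edges of neighboring objects from dominating the search.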

1.2.2 Mesh Reconfiguration

Once the new candidate mesh face centroids $\mathbf {c}_i^{*}$ are detected, mesh $\fancyscript{M}=\{\fancyscript{V},\fancyscript{F}\}$ is reconfigured in each $k$-th iteration by minimizing the weighted sum $E$ of energy terms:

$\begin{aligned} \underset{\fancyscript{M}}{\min }\big \{E\big \} = \underset{\fancyscript{M}}{\min }\big \{E_{\text {ext}} + \beta E_{\text {int}}\big \}, \end{aligned}$