Detection of Lumbar Vertebrae in CT Spine Images

computed tomography (CT) images of the lumbar spine. The resulting mean symmetric absolute surface distance of $1.25\,{\pm }\,0.41$ mm and Dice coefficient of $83.67\,{\pm }\,4.44$ %, computed from the final vertebra detection results against corresponding reference vertebra segmentations, indicate that the proposed framework can successfully detected vertebrae in CT images of the lumbar spine.

1 Introduction

Detection and segmentation of an object of interest can always be represented as a specific optimization problem. As optimization problems are usually non-deterministic polynomial-time hard (NP-hard) or non-analytically defined, they can be solved by brute force algorithms that are computationally demanding or by heuristic approaches that may not find the globally optimal solution. Moreover, if the optimization problem is ill-posed, there is no guarantee that the optimal detection or segmentation result is associated with the global optimum of the problem. Such lack of correspondence is observed in the case of vertebra detection in medical images of the spine, where neighboring vertebrae are of similar appearance and shape, which often causes mis-detections. We can therefore conclude that there is a need for an optimizer that will consider not only the global optimum but also local optima of the problem that represent candidate locations for each vertebra, and that can be further used for accurate and robust spine and vertebra detection. As standard optimization techniques, such as the Powell’s optimizer [1] or covariance matrix adaptation evolution strategy (CMA-ES) [2], detect only a single local optimum, we propose to use interpolation theory to overcome the above mentioned limitations and ensure the detection of all local optima. Interpolation theory estimates the behavior of smooth functions according to a small set of points, for which the behavior is known. In the case of object detection, interpolation theory can predict the position of the object of interest by computing a specified detector response for a sparse set of points, i.e. possible transformations of the object of interest. In this paper, we describe a novel framework for automated spine and vertebra detection in three-dimensional (3D) computed tomography (CT) images of the lumbar spine, where we first apply interpolation-based optimization to detect the location of the whole lumbar spine in the 3D image, and then detect the location of individual lumbar vertebrae within the spinal column.

2 Methodology

Detection of the object of interest (e.g. spine, vertebra) in an unknown 3D image $$I$$

can be, in general, performed by optimizing an objective function $f=f(\mathbf {\rho },I)$ , which maps the model of the object of interest, represented by a set $\mathbf {\rho }$ of $$n$$

parameters describing its geometrical properties (e.g. position, rotation, scaling etc.), to image $$I$$

-dimensional parameter space. The most straightforward but computationally demanding approach for optimizing $$f$$

is to apply brute force to compute its response for all $\mathbf {\rho } \in \varOmega ^n$ , where $\varOmega ^n$ is a bounded domain in the parameter space. Alternative approaches [1, 2] often compute function responses in a neighborhood of an initial set $\mathbf {\rho }'; \mathbf {\rho }' \in \varOmega ^n$ , predict the optimal $\mathbf {\rho }^{*}$ and iteratively move towards $\mathbf {\rho }^{*}$ . Although such optimization is computationally not demanding and may, especially if $\mathbf {\rho }'$ is relatively close to $\mathbf {\rho }^{*}$ , lead to the globally optimal solution, it may still converge to local optima. Therefore, there is a need to observe the whole $\varOmega ^n$ and, at the same time, minimize actual computations of $$f$$

. To overcome these problems, we propose a novel optimization scheme that is based on interpolation theory [3]. The proposed optimization scheme does not depend on initialization and observes the whole $\varOmega ^n$ , and we apply it for spine and vertebra detection in 3D spine images. Depending on the objective function $$f$$

, the scheme searches for optima in the form of either minima or maxima, however, we will refer to them as maxima in the following text.

Fig. 1

An illustration of the L3 vertebra detection. a Reference segmentation binary mask, overlayed onto a selected sagittal cross-section of the CT image. b The interpolation grid. c The objective function $$f$$

is computed only for nodes forming the interpolation grid. d The local maxima of the interpolation function $\bar{f}$ indicate candidate locations of the observed vertebra. In this case, the global maximum (in green) does not correspond to the location of L3 (Color figure online)

2.1 Interpolation Theory

By assuming that the objective function $$f$$

is smooth and its values are similar for similar sets $\mathbf {\rho }$ of parameters, $$f$$

can be approximated by function $\bar{f}$ if the response of $$f$$

is known for a limited set of nodes $\fancyscript{P}; \fancyscript{P} \subset \varOmega ^n$ (Fig. 1). If nodes in $\fancyscript{P}$ are selected properly, the difference between $$f$$

and its approximation $\bar{f}$ , i.e. the interpolation error, is relatively small, and the maximum of $\bar{f}$ corresponds to the maximum of $$f$$

at the same point $\mathbf {\rho }^{*}$ . The main concept of interpolation theory [3] is to select the interpolation function, in our case $\bar{f}$ , from a given class of functions so that its response passes through the nodes in $\fancyscript{P}$ , which form the interpolation grid over the observed domain $\varOmega ^n$ . However, interpolation of high-dimensional functions is computationally challenging, as the number of nodes grows exponentially with the increasing number of dimensions $$n$$

. To reduce the computational complexity, various strategies can be applied [4, 5], but in our case we take into account the properties of spine detection. The standard spine image acquisition procedure restricts the orientation of the imaged subject, and therefore the relative inclination of the spinal column in the image is more constant than its position, and similarly the orientation of a vertebra is restricted by the position of neighboring vertebrae. It can be concluded that dimensions related to object translation have to be associated with a larger number of interpolation nodes than dimensions related to object rotations. To optimize the objective function $$f$$

, defined on domain $\varOmega ^n$ of $$n$$

dimensions, we therefore propose the following algorithm:

Initialize set $\sigma$ of optimized dimensions as $\sigma =\big \{\big \}$ , and set $\bar{\sigma }$ of the remaining dimensions as $\bar{\sigma } = \big \{1,2,\ldots ,n\big \}$ .

Interpolate $$f$$

with $\bar{f}$ against the first $k \ll n$ dimensions.

Find

local maxima of $\bar{f}$ , then insert the observed dimensions into $\sigma$ ; $\sigma \leftarrow \sigma \cup \big \{1,2,\ldots ,k\big \}$ , and remove them from $\bar{\sigma }$ ; $\bar{\sigma } \leftarrow \bar{\sigma } \setminus \big \{1,2,\ldots ,k\big \}$ .

For each local maximum among $$t$$

, perform the following steps:

4a.

If $\bar{\sigma } \ne \big \{\big \}$ , take the next $$k$$

available dimensions and interpolate $$f$$

with $\bar{f}$ by considering dimensions from $\sigma$ to be optimized.

4b.

Find the global maximum of $\bar{f}$ , insert the observed dimensions into $\sigma$ and remove them from $\bar{\sigma }$ . Return to step 4a.

The resulting global maxima represent the locations of $$t$$

local maxima of $$f$$

In the proposed algorithm, the optimization dimensions have to be ordered according to the decreasing number of corresponding interpolation nodes, while the number of dimensions $$k$$

considered in each interpolation step may vary for different steps. Nevertheless, by taking into account only $$k$$

dimensions at once, the computational complexity of the high-dimensional optimization problem is reduced.

2.2 Mean Shape Model of the Lumbar Spine

Let set $\fancyscript{T}$ contain 3D images of the lumbar spine, where each image is assigned a series of binary masks representing reference segmentations of each individual lumbar vertebra from level L1 to L5. To extract a shape model of each vertebra from each image in $\fancyscript{T}$ , the marching cubes algorithm [6] is applied to each corresponding binary mask, resulting in a 3D face-vertex mesh consisting of vertices with triangle connectivity information. The dependency of the number of vertices in each mesh on the size of the image voxel and of the observed vertebra is removed by isotropic remeshing [7], and the coherent point drift algorithm [8] is used to recover the nonrigid transformation among sets of vertices and establish point wise correspondences among vertices of the same vertebral level. Finally, the generalized Procrustes alignment [9] is used to remove translation, rotation and scaling from corresponding meshes, yielding the mean shape model of each vertebra, represented by a 3D face-vertex mesh $\fancyscript{M}=\big \{V,F\big \}$ of $$|V|$$

vertices and $$|F|$$

faces (i.e. triangles). The mean shape model of the whole lumbar spine, i.e. a chain of mean shape models of individual vertebrae, is further used for spine detection, while mean shape models of individual vertebrae are used for vertebra detection in an unknown 3D image $$I$$

2.3 Spine Detection

Let the 3D mesh $\fancyscript{M} = \big \{V,F\big \}$ represent the mean shape model of the lumbar spine (Sect. 2.2), i.e. a chain of meshes representing individual vertebrae from L1 to L5. The objective function $$f$$

that measures the agreement between mesh $\fancyscript{M}$ and image $$I$$

is defined as:

$\begin{aligned} f(\fancyscript{M},I) = \sum _{i = 1}^{|V|} \left\langle \mathbf {g}_H\left( \mathbf {v}_i,I\right) ,\mathbf {n}\left( \mathbf {v}_i\right) \right\rangle , \end{aligned}$

(1)

where $\left\langle \cdot ,\cdot \right\rangle$ denotes the dot product, $$|V|$$

is the number of mesh vertices, $\mathbf {v}_i \in V$ is a mesh vertex, $\mathbf {g}_H(\mathbf {v}_i,I)$ is the Haar-like gradient [10] of image $$I$$

at the location of vertex $\mathbf {v}_i$ , and $\mathbf {n}(\mathbf {v}_i)$ is the mesh normal at vertex $\mathbf {v}_i$ . If mesh $\fancyscript{M}$ is correctly aligned with the spine in image $$I$$

, then mesh normals pointing outwards of the mesh are in maximal agreement with the Haar-like gradients pointing outwards of the spine, resulting in the maximum of the objective function $f(\fancyscript{M},I)$ . As Haar-like gradients are described as a difference between two neighboring cuboids instead of two neighboring voxels, they are not sensitive to local intensity fluctuations and therefore make spine detection more robust.