Shape and Appearance Models

apply a transformation defined by parameters $\mathbf{t}$ . In 2D, similarity transformations have 4 parameters, affine transformations are defined by 6 parameters. The configurations of points defined by $\mathbf{x}$ and $T_{\mathbf{t}}(\mathbf{x})$ are considered to have the same shape. Shape differences are those changes that cannot be explained by application of such a global transformation. If we use n 2D points, $\{(x_{j},y_{j})\}$ , to describe the shape, then we can represent the shape as the 2n element vector, $\mathbf{x}$ , where

$\displaystyle{ \mathbf{x} = (x_{1},\ldots,x_{n},y_{1},\ldots,y_{n})^{T} }$

(1)

For instance, Fig. 1 gives an example of a set of 46 points used to define the shape of the outline of a vertebra in a DXA image

Fig. 1

Set of points used to define the shape of a vertebra in a DXA image (image enhanced to aid visibility of structures)

Given s training examples, we generate s such vectors $\mathbf{x}_{i}$ ( $i = 1\ldots s$ ). Before we can perform statistical analysis on these vectors it is important that the shapes represented are in the same co-ordinate frame. This can be achieved by using Procrustes Analysis [15]. This transforms each shape in a set, $\mathbf{x}_{i}$ , so that the sum of squared distances of the shape to the mean ( $D =\sum \vert S(\mathbf{x}_{i}) -\bar{\mathbf{x}}\vert ^{2}$ ) is minimised.

Let the vector $\mathbf{x}_{i}$ contain the n coordinates of the i ^thshape. These vectors form a distribution in 2n dimensional space.

If we can model this distribution, we can generate new examples, similar to those in the original training set, and we can examine new shapes to decide whether they are plausible examples.

To simplify the problem, we first wish to reduce the dimensionality of the data from nd to something more manageable. An effective approach is to apply Principal Component Analysis (PCA) to the data [10]. The data form a cloud of points in the 2n-D space. PCA computes the main axes of this cloud, allowing one to approximate any of the original points using a model with fewer than 2n parameters. The result is a linear model of the form

$\displaystyle{ \mathbf{x} =\bar{ \mathbf{x}} + \mathbf{P}\mathbf{b} }$

(2)

where $\bar{\mathbf{x}}$ is the mean of the data, $\mathbf{P} = (\phi _{1}\vert \phi _{2}\vert \ldots \vert \phi _{t})$ contains the t eigenvectors of the covariance matrix of the training set, corresponding to the largest eigenvalues, $\mathbf{b}$ is a t dimensional parameter vector.

Equation (2) generates a shape in the model frame, where the effects of the global transformation have been removed by the Procrustes Analysis. A shape in the image frame, $\mathbf{X}$ , can be generated by applying a suitable transformation to the points, $\mathbf{x}$ : $\mathbf{X} = T_{\mathbf{t}}(\mathbf{x})$ . Typically $T_{\mathbf{t}}$ will be a similarity transformation described by a scaling, s, an in-plane rotation, $\theta$ , and a translation $(t_{x},t_{y})$ .

The best choice of parameters for a given shape $\mathbf{x}$ is given by

$\displaystyle{ \mathbf{b} = \mathbf{P}^{T}(\mathbf{x} -\bar{\mathbf{x}}) }$

(3)

The vector $\mathbf{b}$ defines a set of parameters of a deformable model. By varying the elements of $\mathbf{b}$ we can vary the shape, $\mathbf{x}$ , using Eq. (2). The variance of the i ^thparameter, b _i, across the training set is given by $\lambda _{i}$ . By applying suitable limits to the model parameters we ensure that the shape generated is similar to those in the original training set.

If the original data, $\{\mathbf{x}_{i}\}$ , is distributed as a multivariate Gaussian, then the parameters $\mathbf{b}$ are distributed as an axis-aligned Gaussian, $p(\mathbf{b}) = N(\mathbf{0},\varLambda )$ where $\varLambda = \mathit{diag}(\lambda _{1},\ldots,\lambda _{t})$ . In practise it has been found that the assumption of normality works well for many anatomical structures.

2.1 Examples of shape models

Figure 2 shows the effect of changing the first two shape parameters on a model of the outline of a vertebra, trained from 350 examples such as that shown in Fig. 1.

Fig. 2

First two modes of a shape model of a vertebra (Parameters varied by ± 3 s.d. from the mean)

Figure 3 shows the effect of varying the parameter controlling the first mode of shape variation of a 3D model of a set of deep brain structures, constructed from shapes extracted from MR images of 69 different subjects¹. Note that since the method only represents points, it easily model multiple structures.

Fig. 3

Two views of the first mode of a shape model of structures in the brain

3 Active Shape Models

Given a new image containing the structure of interest, we wish to find the pose parameters, $\mathbf{t}$ , and the shape parameters, $\mathbf{b}$ , which best approximate the shape of the object. To do this we require a model of how well a given shape would match to the image.

Suppose we have a set of local models, each of which can estimate how well the image information around a point matches that expected from the training set, returning a value $q_{i}(X_{i},Y _{i})$ . We then seek the shape and pose parameters which generate points $\mathbf{X} =\{ X_{i},Y _{i}\}$ so as to maximise

$\displaystyle{ Q_{asm}(\mathbf{t},\mathbf{b}) =\sum _{ i=1}^{n}q_{ i}(X_{i},Y _{i}) }$

(4)

where $\mathbf{X} = T_{\mathbf{t}}(\bar{\mathbf{x}} + \mathbf{P}\mathbf{b})$ .

The basic Active Shape Model algorithm is a simple iterative approach to locating this optima. Each iteration involves two steps

Search around each point $(X_{i},Y _{i})$ for a better position, $(X_{i}',Y _{i}')$

Update the pose and shape parameters to best fit the model to the new positions $\mathbf{X}'$

By decoupling the search and update steps the method can be fast. Since each point is searched for independently, it is simple to implement and can make use of parallel processing where available.

Because of the local nature of the search, there is a danger of falling into local minima. To reduce the chance of this it is common to use a multi-resolution framework, in which we first search at coarse resolutions, with low resolution models, then refine at finer resolutions as the search progresses. This has been shown to significantly improve the speed, robustness and final accuracy of the method.

The approach has been found to be very effective, and numerous variants explored. In the following we will summarise some of the more significant approaches to each step.

3.1 Searching for Model Points

The simplest approach is to have q _i(X, Y ) return the edge strength at a point. However we will usually have a direction and scale associated with each point, allowing the local models to be more specific. For instance, since most points are defined along boundaries (or surfaces in 3D), the normal to the boundary/surface defines a direction, and a scale can be estimated from the global pose transformation.

Since we have a training set, a natural approach is to learn statistical models of the local structure from that around each point in the known examples [10]. For each model point in each image, we can sample at a set of nearby positions. For instance, at points along smooth curves we can sample profiles (Fig. 4a), whereas at corner points we can sample on a grid around the point (Fig. 4b).

Fig. 4

ASM search requires models of the image around each point

The samples around model point i in training image j can be concatenated into a vector, $\mathbf{g}_{ij}$ . Typically each such vector is normalised to introduce invariance to overall intensity effects (for instance, by subtracting the mean intensity and scaling so the vector is of unit length). We then can estimate the probability density distribution $p_{i}(\mathbf{g})$ , which may be a Gaussian, or something more sophisticated if sufficient samples are available.

Given such a PDF for each point, we can evaluate a new proposed position by sampling the image around the point into a new vector $\mathbf{g}$ , and setting $q_{i}(\mathbf{X}_{i}) =\log p_{i}(\mathbf{g})$ .

To search for a better position, we evaluate the PDF at a number of displaced positions, and choose the best. For profile models, we typically search at positions along a profile normal to the current boundary. For grid based models, we would test points on a grid. There are natural extensions to 3D volume images.

Rather than simply sample image intensities, it is possible to sample other features. For instance van Ginneken et al. [36] sampled Gabor features at multiple scales.

This approach has been shown to be effective in many applications [10]. However, van Ginneken et al. [36] pointed out that improved estimates of the model point positions can be obtained by explicitly training classifiers for each point, to discriminate the correct position from incorrect nearby positions. The approach is to treat samples at the model points in the training set as positive examples, then create a set of negative examples by sampling nearby. These examples are used to train a classifier. During search samples around a set of candidate positions are evaluated, and the one most strongly classified as positive is used. It is assumed that the classifier gives a continuous response which is larger for positive examples, and can thus be used to rank the candidates.

In the case of [36] a nearest neighbour classifier was used, but encouraging results have been obtained with a range of methods. For instance, Li and Ito [37] used Adaboost to train a classifier for facial feature point detection.

A related approach is to attempt to learn an objective function which has its minimum at the correct position, and has a smooth form to allow efficient optimisation [28].

3.2 Updating the Model Parameters

After locating the best local match for each model point, $\mathbf{X}'$ , we update the model pose and shape parameters so as to best match these new points. This can be considered as a regularisation step, forcing points to form a ‘reasonable’ shape – one defined by the original model.

The simplest approach is to find $\mathbf{t}$ and $\mathbf{b}$ so as to minimise

$\displaystyle{ \vert \mathbf{X}' - T_{\mathbf{t}}(\bar{\mathbf{x}} + \mathbf{P}\mathbf{b})\vert ^{2} }$

(5)

For anything other than when T() is a translation, this is a non-linear sum of squares problem, so can be efficiently solved with Levenberg-Marquardt [29], or an iterative algorithm in which we alternate between solving for $\mathbf{b}$ with fixed $\mathbf{t}$ , and solving for $\mathbf{t}$ with fixed $\mathbf{b}$ , each of which has a closed form. To prevent the generation of extreme shapes, limits can be placed on the parameters. If we assume a normal distribution for the original shapes, then the parameters b _iare linearly independent with a variance $\lambda _{i}$ . A natural constraint is then to force

$\displaystyle{ M =\sum _{ i=1}^{t}\frac{b_{i}^{2}} {\lambda _{i}} < M_{\mathit{thresh}} }$

(6)

Since M will be distributed as $\chi ^{2}$ with t degrees of freedom, this distribution can be used to select a suitable value of M _thresh(for instance, one that includes 98% of the normal samples).

This approach effectively treats every choice of shape parameters within the threshold as equally likely. An alternative is to include a prior, rather than a hard threshold, for instance

$\displaystyle{ \log p(\mathbf{t},\mathbf{b}\vert \mathbf{X}') = \mathit{const} -\vert T_{\mathbf{t}}^{-1}(\mathbf{X}') - (\bar{\mathbf{x}} + \mathbf{P}\mathbf{b})\vert ^{2}/\sigma _{ r}^{2} -\sum _{ i=1}^{t}\frac{b_{i}^{2}} {\lambda _{i}} }$

(7)

where σ _r² is the variance of the residuals (those not explained by the model) in the model frame and we assume all pose parameters are equally likely. We can thus find the most likely values for $\mathbf{t},\mathbf{b}$ by optimising (7). Again, this is a non-linear sum of squares problem, for which fast iterative algorithms exist (for instance, see [38]).

Information about the uncertainty of the estimates of the best matching point positions can also be included into the optimisation [17].

Since the models for each point may return false matches (for instance by latching on to the wrong nearby structure), it can be effective to use a more robust method in which outliers are detected and discarded [27].

4 Active Appearance Models

The Active Shape Model treats the search for each model point as independent. However, there will usually be correlations between the local appearance around nearby points. This can be modelled, but leads to a more difficult optimisation problem. The Active Appearance Model algorithm is a local optimisation technique which can match models of appearance to new images efficiently.

Only gold members can continue reading. Log In or Register to continue