Fig. 1. Microscope images of Drosophila melanogaster imaginal discs with gene expression patterns (Color figure online).
A popular method that aligns images and simultaneously estimates a simple statistical shape model was proposed by E. Learned-Miller in [9] and is known as congealing. It uses the entropy of a simple, pixelwise independent distribution as the objective function for searching the unknown transformations. This idea is intuitive and appealing, and quite a few subsequent works have proposed variants and generalisations of the approach [1, 5, 7, 10].
Despite being intuitive and appealing, this idea raises several theoretical and practical questions. Firstly, entropy minimisation is not a commonly accepted estimator in mathematical statistics. Secondly, from the algorithmic point of view, one has to minimise a non-convex function of many variables. Almost all methods that apply congealing solve this optimisation task by block-wise local coordinate descent, without a guarantee of convergence to a local minimum. Finally, from a conceptual point of view, we may wish to decouple the shape model from the signal domain and to formulate it in terms of segmentation labellings. This requires unsupervised estimation (learning).
In this paper we address at least some of these questions. First, we show that the original congealing is in fact the DC-dual task (difference of convex functions) of a properly formulated Maximum Likelihood estimation task. This interpretation immediately suggests a different algorithm, which is substantially simpler and faster than the known congealing algorithm.
The second contribution is to show how to generalise the task to models in which the shape is formulated in terms of segmentation labellings and is related to the signal domain via a parametric appearance model. We show how to estimate the parameters of the shape model together with the parameters of the appearance model and the collection of transformations, given a set of object images. We call this generalisation unsupervised congealing.
In the experimental section we first apply the new unsupervised congealing to artificially generated images. This allows us to compare the results with known ground truth. In the second and main part of the section we apply the newly proposed method to the task of aligning and segmenting imaginal discs and compare it with a previously published approach [6].
2 Theory
In the first part of this section we show that supervised congealing, as introduced in [9], is indeed the DC-dual of a Maximum Likelihood estimation task. To avoid overloading the exposition with technicalities like boundary effects, infinite-dimensional function spaces etc., we consider a simple probabilistic model for binary images defined on a two-dimensional discrete torus $D$.
2.1 Notations and Model
Let $\mathcal{X} = \{0,1\}^D$ denote the set of all binary images defined on the two-dimensional discrete torus $D$. We will denote the value of the image $x$ in a particular node $i$ of the torus by $x(i)$.
Let us consider the following family of pixelwise independent probability distributions on $\mathcal{X}$

$$p(x;\theta,T) = \frac{1}{Z(\theta)} \exp\Bigl[\sum_{i\in D} \theta(i)\, x\bigl(T(i)\bigr)\Bigr], \tag{1}$$

where $\theta$ is a real-valued field of parameters defined on $D$ and $T$ denotes an orthogonal linear transformation $T\colon D\to D$, for example a rigid body transformation. Notice that the normalising factor $Z$ is independent of $T$ as a consequence. For the sake of simplicity we assume here that the set $\mathcal{T}$ of admissible transformations is the set of all discrete translations of the torus.
Suppose now that we are given a set of binary images $x^j$, $j = 1,\dots,m$, each of them independently generated by one of the distributions $p(\,\cdot\,;\theta,T_j)$. All of them share the parameter field $\theta$, but each of them has its own transformation $T_j$. Both the field $\theta$ and the collection of transformations $(T_1,\dots,T_m)$ are unknown and should be estimated from the given sample of (binary) images.
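For concreteness, the model can be written down numerically for discrete translations of the torus. The following is a minimal sketch (the function names and the use of `np.roll` to realise the wrap-around translations are our own choices, not part of the paper):

```python
import numpy as np

def log_Z(theta):
    # log of the normalising factor: Z(theta) = prod_i (1 + exp(theta(i))).
    # It does not depend on the translation T, since T merely permutes the nodes.
    return np.logaddexp(0.0, theta).sum()

def log_p(x, theta, shift):
    # log-probability of a binary image x under the pixelwise independent model,
    # where the discrete torus translation is realised by np.roll
    # (wrap-around boundary = torus topology)
    x_T = np.roll(x, shift, axis=(0, 1))
    return (theta * x_T).sum() - log_Z(theta)
```

Summing the probabilities over all binary images recovers 1 for any translation, which illustrates why $Z$ normalises the distribution independently of $T$.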
2.2 DC-duality of MLE and Congealing
Applying the Maximum Likelihood Estimator (MLE) to this task means to solve the following optimisation problem

$$(\theta, T_1,\dots,T_m) = \arg\max_{\theta,\,T_1,\dots,T_m}\ \sum_{j=1}^{m} \Bigl[\sum_{i\in D}\theta(i)\,x^j\bigl(T_j(i)\bigr) - \log Z(\theta)\Bigr]. \tag{2}$$

The normalising factor (considered as a function of $\theta$) is easy to compute:

$$\log Z(\theta) = \sum_{i\in D} \log\bigl(1 + e^{\theta(i)}\bigr). \tag{3}$$

We may rewrite the optimisation task equivalently as the minimisation problem

$$\min_{\theta}\ \bigl[g(\theta) - h(\theta)\bigr] \tag{4}$$

with the two convex functions

$$g(\theta) = m \log Z(\theta), \qquad h(\theta) = \sum_{j=1}^{m} \max_{T\in\mathcal{T}}\ \langle\theta,\, x^j\circ T\rangle, \tag{5}$$

where $x\circ T$ denotes the transformed image, i.e. $\langle\theta, x\circ T\rangle = \sum_{i\in D}\theta(i)\,x(T(i))$. This is a DC program, because we have to minimise a difference of convex functions. Every DC program has a related DC-dual program which has the same optimal value as the primal one

$$\min_{y}\ \bigl[h^*(y) - g^*(y)\bigr], \tag{6}$$

where for a given function $g$, $g^*$ is its Fenchel conjugate defined by

$$g^*(y) = \sup_{\theta}\ \bigl[\langle y, \theta\rangle - g(\theta)\bigr]. \tag{7}$$

See e.g. [2, 3] for the theory of DC programs, duality and related algorithms.
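The two convex pieces of this DC program, $g(\theta) = m\log Z(\theta)$ and $h(\theta) = \sum_j \max_T \langle\theta, x^j\circ T\rangle$, can be checked numerically: $h$ is a sum of pointwise maxima of linear functions of $\theta$, hence convex, and so is $g$. An illustrative sketch (not from the paper; translations again via `np.roll`):

```python
import numpy as np

rng = np.random.default_rng(0)
images = [rng.integers(0, 2, size=(4, 4)).astype(float) for _ in range(3)]
m = len(images)
shifts = [(dy, dx) for dy in range(4) for dx in range(4)]

def g(theta):
    # g(theta) = m * log Z(theta), a sum of convex log(1 + e^t) terms
    return m * np.logaddexp(0.0, theta).sum()

def h(theta):
    # h(theta) = sum_j max_T <theta, x_j o T>: a sum of maxima of linear
    # functions of theta, hence convex as well
    return sum(max((theta * np.roll(x, s, axis=(0, 1))).sum() for s in shifts)
               for x in images)
```

The primal objective $g(\theta) - h(\theta)$ is then the negative log-likelihood, already maximised over the unknown transformations.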
Let us have a closer look at the DC-dual program (6). It is easy to prove that the Fenchel conjugate of $g(\theta) = m\log Z(\theta)$ is the (negative) entropy of the corresponding average image $c = y/m$:

$$g^*(y) = m \sum_{i\in D} \Bigl[c(i)\log c(i) + \bigl(1 - c(i)\bigr)\log\bigl(1 - c(i)\bigr)\Bigr], \tag{8}$$

where we use the convention that $x\log x$ is zero if $x = 0$ and is $+\infty$ if $x$ is negative. Let us compute the Fenchel conjugate of the function $h$:

$$h^*(y) = \sup_{\theta}\ \Bigl[\langle y,\theta\rangle - \sum_{j=1}^m \max_{T\in\mathcal{T}} \langle \theta,\, x^j\circ T\rangle\Bigr]. \tag{9}$$

This problem has the form

$$\sup_{\theta}\ \min_{k}\ \langle a_k, \theta\rangle, \tag{10}$$

where the vectors $a_k = y - \sum_j x^j\circ T_j$ range over all collections $(T_1,\dots,T_m)$. Its value is either infinity, if the origin does not lie in the convex hull of the vectors $a_k$, or zero in the opposite case. This follows from Gordan's lemma (see e.g. [4]). Hence, we obtain $h^*(y) = \delta_C(y/m)$, where the indicator function $\delta_C$ is 0 on $C$ and $+\infty$ otherwise. The set $C$ is the convex hull of all possible average images, i.e.

$$C = \operatorname{conv}\Bigl\{\frac{1}{m}\sum_{j=1}^m x^j\circ T_j \;\Big|\; T_j\in\mathcal{T}\Bigr\}. \tag{11}$$

From all this we see that (6) requires minimising the entropy (8) over the convex polytope $C$. Since the entropy is a concave function, the minimum is attained at an extremal point (vertex) of $C$. This in turn is precisely the task which congealing aims to solve. Hence, we have proved the following theorem.
Theorem 1
For the model family (1) and any (finite) set $\mathcal{T}$ of orthogonal transformations, congealing and MLE are DC-duals of each other. In particular, their optimal values are equal.
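The theorem can be verified numerically on a toy example: the minimum, over all alignments, of the entropy of the average image coincides with the value of the primal DC objective evaluated at the log-odds field of the optimal average image. An illustrative sketch (our own construction, with clipping to avoid infinite log-odds):

```python
import itertools
import numpy as np

def ent(c):
    # pixelwise entropy of an average image, convention 0*log(0) = 0
    c = np.clip(c, 1e-12, 1 - 1e-12)
    return -(c * np.log(c) + (1 - c) * np.log(1 - c)).sum()

x1 = np.array([[1, 0], [0, 0]], float)
x2 = np.array([[1, 1], [0, 0]], float)
images, m = [x1, x2], 2
shifts = [(dy, dx) for dy in range(2) for dx in range(2)]

# dual (congealing) side: minimise m * entropy of the average image
# over all collections of torus translations
best, best_c = np.inf, None
for s in itertools.product(shifts, repeat=m):
    c = sum(np.roll(x, sj, axis=(0, 1)) for x, sj in zip(images, s)) / m
    if m * ent(c) < best:
        best, best_c = m * ent(c), c

# primal side: DC objective g(theta) - h(theta) at the log-odds of the
# optimal average image
cc = np.clip(best_c, 1e-9, 1 - 1e-9)
theta = np.log(cc) - np.log(1 - cc)
primal = m * np.logaddexp(0.0, theta).sum() - sum(
    max((theta * np.roll(x, sj, axis=(0, 1))).sum() for sj in shifts)
    for x in images)
```

Both values agree up to the clipping tolerance, as Theorem 1 predicts.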
2.3 A DC-algorithm
Congealing aims at solving the dual task (6) directly, i.e. at minimising the entropy (8) of the average image w.r.t. all possible collections of transformations $(T_1,\dots,T_m)$. This is usually done by block-wise local coordinate descent: given the current collection, the algorithm tries to improve the entropy by sequentially probing local changes of each $T_j$. Such an approach lacks a guarantee to converge to a local minimum.
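This block-wise coordinate descent can be sketched as follows (illustrative code, not the original congealing implementation; it searches exhaustively over torus translations with the entropy of the average image as objective, and the function names are ours):

```python
import numpy as np

def avg_entropy(images, shifts):
    # congealing objective: pixelwise entropy of the average of aligned images
    c = np.mean([np.roll(x, s, axis=(0, 1)) for x, s in zip(images, shifts)],
                axis=0)
    c = np.clip(c, 1e-12, 1 - 1e-12)
    return -(c * np.log(c) + (1 - c) * np.log(1 - c)).sum()

def congeal(images, n_sweeps=5):
    # block-wise coordinate descent: sweep over the images, re-optimising each
    # translation T_j while keeping all the others fixed
    H, W = images[0].shape
    shifts = [(0, 0)] * len(images)
    for _ in range(n_sweeps):
        for j in range(len(images)):
            shifts[j] = min(((dy, dx) for dy in range(H) for dx in range(W)),
                            key=lambda s: avg_entropy(
                                images, shifts[:j] + [s] + shifts[j + 1:]))
    return shifts
```

Each sweep re-optimises one translation at a time, which is exactly the scheme whose convergence guarantees are in question here.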
Another option is to apply a DC algorithm. This type of algorithm aims at solving the primal and the dual task simultaneously, by constructing a pair of sequences $\theta^{(t)}$, $y^{(t)}$ in an alternating way

$$y^{(t)} \in \partial h\bigl(\theta^{(t)}\bigr), \tag{12}$$

$$\theta^{(t+1)} \in \partial g^*\bigl(y^{(t)}\bigr), \tag{13}$$

where $\partial h(\theta)$ denotes the sub-differential of a convex function $h$ at the point $\theta$. In the case being considered, the algorithm reads as follows. Choose an initial $\theta^{(0)}$ and repeat applying the following two steps until convergence:
(a) Find $T_j \in \arg\max_{T\in\mathcal{T}} \langle \theta^{(t)},\, x^j\circ T\rangle$ for each image $x^j$ independently. Set