Registration Using Mutual Information

Let 
$$\mathcal{A}$$
 and 
$$\mathcal{B}$$
 be two images that are geometrically related by the registration transformation 
$$\mathbf{T}_{\boldsymbol{\alpha }}$$
 with parameters 
$$\boldsymbol{\alpha }$$
, such that voxels 
$$\mathbf{p}$$
 in 
$$\mathcal{A}$$
 with intensity a correspond to voxels 
$$\mathbf{q} = \mathbf{T}_{\boldsymbol{\alpha }}(\mathbf{p})$$
 in 
$$\mathcal{B}$$
 with intensity b. Taking random samples 
$$\mathbf{p}$$
 in 
$$\mathcal{A}$$
, a and b can be considered as discrete random variables A and B with joint and marginal distributions 
$$p_{AB}(a,b)$$
, 
$$p_{A}(a)$$
 and 
$$p_{B}(b)$$
 respectively. The mutual information I(A, B) of A and B measures the degree of dependence between A and B as the distance between the joint distribution 
$$p_{AB}(a,b)$$
 and the distribution associated with complete independence, 
$$p_{A}(a)\,p_{B}(b)$$
, by means of the Kullback-Leibler measure [4], i.e.




$$\displaystyle{ I(A,B) =\sum _{a,b}p_{AB}(a,b)\log \frac{p_{AB}(a,b)} {p_{A}(a)\,p_{B}(b)} }$$

(1)
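As a concrete illustration of Eq. (1), the following sketch (not part of the original method description) evaluates I(A, B) from a given joint distribution; the example distributions are invented for the demonstration, and the natural logarithm is used:

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information I(A,B) of a discrete joint distribution p_AB(a,b), per Eq. (1)."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)    # marginal p_A(a)
    p_b = p_ab.sum(axis=0, keepdims=True)    # marginal p_B(b)
    nz = p_ab > 0                            # convention: 0 log 0 = 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

# Independent variables: p_AB = p_A p_B, so the ratio in the log is 1 and I = 0
p_indep = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(p_indep))           # → 0.0

# Perfectly dependent variables: I(A,B) = H(A)
p_dep = np.diag([0.5, 0.5])
print(mutual_information(p_dep))             # ≈ 0.6931 (= ln 2)
```

The two extreme cases bracket the measure: complete independence gives I = 0, while a one-to-one intensity relation makes I equal to the marginal entropy.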
The relationship 
$$p_{AB}(a,b)$$
 between a and b, and hence their mutual information I(A, B), depends on 
$$\mathbf{T}_{\boldsymbol{\alpha }}$$
, i.e. on the registration of the images. The mutual information registration criterion postulates that the images are geometrically aligned by the transformation 
$$\mathbf{T}_{\boldsymbol{\alpha }^{{\ast} }}$$
 for which I(A, B) is maximal:



$$\displaystyle{\boldsymbol{\alpha }^{{\ast}} =\arg \max _{\boldsymbol{\alpha }}I(A,B)}$$
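The maximization above can be illustrated with a toy 1-D "registration": the signals, the integer shift, and the squaring intensity map below are all invented for this sketch, which recovers the shift by exhaustive search over candidate values of the parameter, exactly as the argmax criterion prescribes. Note that MI finds the alignment even though the intensity relation is nonlinear:

```python
import numpy as np

def mi_hist(a, b, bins=16):
    """MI of two equally sized sample arrays via joint-histogram binning."""
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()
    pa, pb = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz])))

rng = np.random.default_rng(1)
a = rng.normal(size=2000)
true_shift = 7
b = np.roll(a, true_shift) ** 2          # shifted copy with a nonlinear intensity map

# Exhaustive search over the (here 1-D) parameter alpha
best = max(range(-10, 11), key=lambda s: mi_hist(a, np.roll(b, -s)))
print(best)                              # → 7
```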


Mutual information is related to the information-theoretic notion of entropy by the equations 
$$I(A,B) = H(A) + H(B) - H(A,B) = H(A) - H(A\vert B) = H(B) - H(B\vert A)$$
, with H(A) and H(B) being the entropy of A and B respectively, H(A, B) their joint entropy, and H(A | B) and H(B | A) the conditional entropy of A given B and of B given A respectively. The entropy H(A) is a measure of the amount of uncertainty about the random variable A, while H(A | B) is the amount of uncertainty left in A when B is known. Hence, I(A, B) measures the amount of information that one image contains about the other, which should be maximal at registration. If both marginal distributions 
$$p_{A}(a)$$
 and 
$$p_{B}(b)$$
 can be considered independent of the registration parameters 
$$\boldsymbol{\alpha }$$
, the MI criterion reduces to minimizing the joint entropy H(A, B). If either 
$$p_{A}(a)$$
 or 
$$p_{B}(b)$$
 is independent of 
$$\boldsymbol{\alpha }$$
, which is the case if one of the images is always completely contained in the other, the MI criterion reduces to minimizing the conditional entropy H(A | B) or H(B | A). However, if both images only partially overlap, which is very likely during optimization, the volume of overlap will change as 
$$\boldsymbol{\alpha }$$
 is varied, and both marginal distributions 
$$p_{A}(a)$$
 and 
$$p_{B}(b)$$
, and therefore also their entropies H(A) and H(B), will in general depend on 
$$\boldsymbol{\alpha }$$
. Hence, maximizing mutual information will tend to find as much as possible of the complexity that is in the separate data sets so that at the same time they explain each other well.
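The entropy identities quoted above can be checked numerically; the joint distribution in this sketch is arbitrary and only serves to verify that the three expressions for I(A, B) agree:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H of a discrete distribution (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# An arbitrary example joint distribution (rows: a, columns: b)
p_ab = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
h_a, h_b, h_ab = entropy(p_a), entropy(p_b), entropy(p_ab)

# MI computed directly from Eq. (1) ...
i_direct = float(np.sum(p_ab * np.log(p_ab / np.outer(p_a, p_b))))
# ... equals H(A)+H(B)-H(A,B), and the conditional-entropy forms agree,
# using H(A|B) = H(A,B) - H(B) and H(B|A) = H(A,B) - H(A):
print(np.isclose(i_direct, h_a + h_b - h_ab))    # → True
print(np.isclose(i_direct, h_a - (h_ab - h_b)))  # → True
print(np.isclose(i_direct, h_b - (h_ab - h_a)))  # → True
```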

Other information-theoretic registration measures can be derived from the MI criterion presented above, such as the entropy correlation coefficient 
$$ECC(A,B) = 2\, \frac{I(A,B)} {H(A)+H(B)}$$
 [19] or the normalized mutual information 
$$NMI = \frac{H(A)+H(B)} {H(A,B)}$$
 [38], with 0 ≤ ECC ≤ 1, 1 ≤ NMI ≤ 2 and 
$$ECC = 2\,(1 - 1/NMI)$$
. Maximization of NMI or ECC may be superior to MMI itself in case the region of overlap of both images is relatively small at the correct registration solution, as MMI may be biased towards registration solutions with a larger total amount of information H(A) + H(B) within the region of overlap [38]. MI is only one example of the more general f-information measures of dependence, several of which were investigated for image registration in [31].
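The relation between ECC and NMI, and their stated ranges, follow directly from the entropy definitions; the following sketch verifies them on an invented joint distribution:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

p_ab = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
h_a = entropy(p_ab.sum(axis=1))
h_b = entropy(p_ab.sum(axis=0))
h_ab = entropy(p_ab)
i = h_a + h_b - h_ab                       # I(A,B)

ecc = 2 * i / (h_a + h_b)                  # entropy correlation coefficient
nmi = (h_a + h_b) / h_ab                   # normalized mutual information
print(np.isclose(ecc, 2 * (1 - 1 / nmi)))  # → True
print(0 <= ecc <= 1, 1 <= nmi <= 2)        # → True True
```

Algebraically, ECC = 2(H(A)+H(B)−H(A,B))/(H(A)+H(B)) = 2(1 − H(A,B)/(H(A)+H(B))) = 2(1 − 1/NMI), which is the identity quoted in the text.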



4 Implementation


The MMI registration criterion does not require any preprocessing or segmentation of the images. With each of the images is associated a 3-D coordinate frame in millimeter units that takes the pixel size, inter-slice distance and the orientation of the image axes relative to the patient into account. One of the images to be registered is selected to be the floating image 
$$\mathcal{F}$$
, from which samples s ∈ S are taken and transformed by the geometric transformation 
$$\mathbf{T}_{\boldsymbol{\alpha }}$$
 with parameters 
$$\boldsymbol{\alpha }$$
 into the reference image 
$$\mathcal{R}$$
. S may include all voxels in 
$$\mathcal{F}$$
 or a subset thereof to increase speed performance.
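The voxel-to-millimeter mapping and the transformation of a floating-image sample can be sketched as follows; the pixel spacing, origin, rotation and translation values are invented for the illustration, and orientation relative to the patient is reduced to a rotation matrix for simplicity:

```python
import numpy as np

def voxel_to_mm(index, spacing, origin):
    """Map a voxel index to millimeter coordinates using pixel size /
    inter-slice distance (spacing) and an image origin offset."""
    return np.asarray(index, dtype=float) * np.asarray(spacing) + np.asarray(origin)

def apply_rigid(p_mm, R, t):
    """A rigid transform T_alpha (rotation R, translation t) in mm space."""
    return R @ p_mm + t

# Hypothetical floating-image geometry: 0.9 mm in-plane pixels, 3 mm slices
spacing = (0.9, 0.9, 3.0)
origin = (0.0, 0.0, 0.0)

R = np.eye(3)                      # identity rotation for the sketch
t = np.array([5.0, -2.0, 0.0])     # 5 mm / -2 mm translation

s = (10, 20, 4)                    # a sample voxel s in the floating image
q = apply_rigid(voxel_to_mm(s, spacing, origin), R, t)
print(q)                           # → [14. 16. 12.]
```

In a full implementation q would then be mapped back through the reference image's own mm-to-voxel frame before sampling r(T_alpha(s)).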

The MMI method requires the estimation of the joint probability density p(f, r) of corresponding voxel intensities f and r in the floating and reference image respectively. This can be obtained from the joint intensity histogram of the region of overlap of the images. The joint image intensity histogram 
$$h_{\boldsymbol{\alpha }}(f,r)$$
 of the volume of overlap 
$$s \in S_{\boldsymbol{\alpha }} \subset S$$
 of 
$$\mathcal{F}$$
 and 
$$\mathcal{R}$$
 can be constructed by simple binning of the image intensity pairs 
$$(f(s),r(\mathbf{T}_{\boldsymbol{\alpha }}(s)))$$
 for all 
$$s \in S_{\boldsymbol{\alpha }}$$
. To do this efficiently, the floating and reference image intensities are first linearly rescaled to the ranges 
$$[0,n_{\mathcal{F}}- 1]$$
 and 
$$[0,n_{\mathcal{R}}- 1]$$
 respectively, with 
$$n_{\mathcal{F}}$$
 and 
$$n_{\mathcal{R}}$$
 the number of bins assigned to the floating and reference image respectively, and 
$$n_{\mathcal{F}}\times n_{\mathcal{R}}$$
 the total number of bins in the joint histogram. Estimates of the marginal and joint image intensity distributions 
$$p_{\mathcal{F},\boldsymbol{\alpha }}(f)$$
, 
$$p_{\mathcal{R},\boldsymbol{\alpha }}(r)$$
 and 
$$p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r)$$
 are obtained by normalization of 
$$h_{\boldsymbol{\alpha }}(f,r)$$
:



$$\displaystyle\begin{array}{rcl} p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r)& =& \frac{h_{\boldsymbol{\alpha }}(f,r)} {\sum _{f,r}h_{\boldsymbol{\alpha }}(f,r)}{}\end{array}$$

(2)




$$\displaystyle\begin{array}{rcl} p_{\mathcal{F},\boldsymbol{\alpha }}(f)& =& \sum _{r}p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r){}\end{array}$$

(3)




$$\displaystyle\begin{array}{rcl} p_{\mathcal{R},\boldsymbol{\alpha }}(r)& =& \sum _{f}p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r){}\end{array}$$

(4)
and the MI registration criterion is evaluated using



$$\displaystyle\begin{array}{rcl} I(\boldsymbol{\alpha })& =& \sum _{f,r}\,p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r)\,\log _{2} \frac{p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r)} {p_{\mathcal{F},\boldsymbol{\alpha }}(f)\ p_{\mathcal{R},\boldsymbol{\alpha }}(r)}.{}\end{array}$$

(5)
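The pipeline of Eqs. (2)–(5) — rescale, bin, normalize, evaluate — can be sketched compactly; the bin counts and the test signals below are invented, and a real implementation would take the paired samples from the actual overlap region:

```python
import numpy as np

def mi_criterion(f_vals, r_vals, n_f=32, n_r=32):
    """Evaluate I(alpha) per Eqs. (2)-(5) from paired intensity samples
    (f(s), r(T_alpha(s))) over the overlap region S_alpha."""
    def rescale(x, n):
        # Linearly rescale intensities to bin indices 0 .. n-1
        x = np.asarray(x, dtype=float)
        lo, hi = x.min(), x.max()
        return np.minimum((n * (x - lo) / (hi - lo + 1e-12)).astype(int), n - 1)
    fi, ri = rescale(f_vals, n_f), rescale(r_vals, n_r)
    h = np.zeros((n_f, n_r))
    np.add.at(h, (fi, ri), 1)                        # simple binning: h_alpha(f, r)
    p_fr = h / h.sum()                               # Eq. (2)
    p_f = p_fr.sum(axis=1, keepdims=True)            # Eq. (3)
    p_r = p_fr.sum(axis=0, keepdims=True)            # Eq. (4)
    nz = p_fr > 0
    return float(np.sum(p_fr[nz] * np.log2(p_fr[nz] / (p_f @ p_r)[nz])))  # Eq. (5)

# Identical samples are maximally dependent; shuffled ones nearly independent
rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=10000)
print(mi_criterion(f, f) > mi_criterion(f, rng.permutation(f)))  # → True
```

Note that Eq. (5) uses the base-2 logarithm, so I(alpha) is expressed in bits; with 32 equally filled bins the self-comparison above approaches log2(32) = 5 bits.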

Typically, 
$$n_{\mathcal{F}}$$
 and 
$$n_{\mathcal{R}}$$
 need to be chosen much smaller than the number of different intensity values in the original images in order to assure a sufficient number of counts in each bin. If not, the joint histogram 
$$h_{\boldsymbol{\alpha }}$$
 would be rather sparse, with many zero entries and entries that contain only one or a few counts, such that a small change in the registration parameters 
$$\boldsymbol{\alpha }$$
 would lead to many discontinuous changes in the joint histogram, with non-zero entries becoming zero and vice versa, that propagate into 
$$p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}$$
. Such abrupt changes in 
$$p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}$$
 induce discontinuities and many local maxima in 
$$I(\boldsymbol{\alpha })$$
, which deteriorates the optimization robustness of the MI measure. Appropriate values for 
$$n_{\mathcal{F}}$$
 and 
$$n_{\mathcal{R}}$$
 can only be determined by experimentation. Moreover, 
$$\mathbf{T}_{\boldsymbol{\alpha }}(s)$$
 will in general not coincide with a grid point of 
$$\mathcal{R}$$
, such that interpolation of the reference image is needed to obtain the image intensity value 
$$r(\mathbf{T}_{\boldsymbol{\alpha }}(s))$$
. Zeroth-order or nearest-neighbor interpolation of 
$$\mathcal{R}$$
 is most efficient, but is insensitive to translations of up to 1 voxel and therefore insufficient to guarantee subvoxel accuracy. But even when higher-order interpolation methods are used, such as linear, cubic or B-spline interpolation, simple binning of the interpolated intensity pairs 
$$(f(s),r(\mathbf{T}_{\boldsymbol{\alpha }}(s)))$$
 leads to discontinuous changes in the joint intensity probability 
$$p_{\mathcal{F}\mathcal{R},\boldsymbol{\alpha }}(f,r)$$
 and in the marginal probability 
$$p_{\mathcal{R},\boldsymbol{\alpha }}(r)$$
 for small variations of 
$$\boldsymbol{\alpha }$$
 when the interpolated values 
$$r(\mathbf{T}_{\boldsymbol{\alpha }}(s))$$
 fall in a different bin. Note that post-processing the histogram obtained by binning by convolution with a Gaussian or other smoothing kernel is not sufficient to eliminate these discontinuities.
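For reference, trilinear (first-order) interpolation of the reference image at a non-grid position can be sketched as below; the 3 × 3 × 3 test image, whose intensity is the linear function 9x + 3y + z of the voxel index and is therefore reproduced exactly by linear interpolation, is invented for the illustration:

```python
import numpy as np

def trilinear(img, p):
    """Trilinear interpolation of a 3-D image at continuous voxel position p
    (p must lie strictly inside the grid for this minimal sketch)."""
    p = np.asarray(p, dtype=float)
    i0 = np.floor(p).astype(int)       # lower corner of the enclosing cell
    d = p - i0                         # fractional offsets in each dimension
    val = 0.0
    for corner in np.ndindex(2, 2, 2): # weighted sum over the 8 cell corners
        w = np.prod(np.where(np.array(corner) == 1, d, 1 - d))
        val += w * img[tuple(i0 + np.array(corner))]
    return val

img = np.arange(27, dtype=float).reshape(3, 3, 3)  # intensity = 9x + 3y + z
print(trilinear(img, (0.5, 0.5, 0.5)))             # → 6.5
print(trilinear(img, (1.25, 0.5, 1.0)))            # → 13.75
```

Even with such sub-voxel intensity estimates, the subsequent binning step quantizes the interpolated value, which is exactly the source of the discontinuities discussed above.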


