Measures and Applications to Multimodal Variational Imaging

for the square domain (0,  1)2. Images are considered simultaneously as matrices and as functions on Ω: A discrete image is an N × N matrix $$U \in \left \{0,\ \ldots,\ 255\right \}^{N\times N}$$, where each entry represents an intensity value at a pixel. With U we associate the piecewise constant function



$$\displaystyle{ u_{N}(x) =\sum \nolimits _{ i=1}^{N}\sum \nolimits _{ j=1}^{N}U^{\mathit{ij}}\chi _{ \Omega _{\mathit{ij}}}(x), }$$

(1)
where


$$\displaystyle{\Omega _{\mathit{ij}}:= \left (\frac{i - 1} {N}, \frac{i} {N}\right ] \times \left (\frac{j - 1} {N}, \frac{j} {N}\right ]\;\text{for}\;1 \leq i,j \leq N,}$$
and $$\chi _{\Omega _{\mathit{ij}}}$$ is the characteristic function of $$\Omega _{\mathit{ij}}$$. In the context of image processing, U ij denotes the pixel intensity at the pixel $$\Omega _{\mathit{ij}}$$. A continuous image is a function $$u:\ \varOmega \rightarrow \mathbb{R}$$.
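As a minimal sketch of the identification between U and u_N in (1) (Python with NumPy; the function name `u_N` and the sample matrix are ours, for illustration only), the piecewise constant function can be evaluated by locating the pixel cell Ω_ij that contains a given point:

```python
import numpy as np

def u_N(U, x):
    """Evaluate the piecewise constant image function (1) at x in (0, 1]^2.

    U is an N x N intensity matrix; x = (x1, x2). The half-open pixel cell
    Omega_ij = ((i-1)/N, i/N] x ((j-1)/N, j/N] contains x exactly when
    i = ceil(N * x1) and j = ceil(N * x2).
    """
    N = U.shape[0]
    i = int(np.ceil(N * x[0])) - 1  # zero-based row index
    j = int(np.ceil(N * x[1])) - 1  # zero-based column index
    return U[i, j]

# A toy 2 x 2 discrete image (values are arbitrary)
U = np.array([[0, 255], [128, 64]])
```

For instance, any point with coordinates in (1/2, 1] × (0, 1/2] falls into Ω_21 and returns the entry U[1, 0].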



We emphasize that the measures for comparing images presented below can be applied in a straightforward way to higher-dimensional domains, for example, to voxel data. However, here, for the sake of notational simplicity and readability, we restrict attention to a two-dimensional square domain Ω. Moreover, we restrict attention to intensity data and do not consider vector-valued data, such as color images or tensor data. By this restriction, we exclude, for instance, feature-based intensity measures.



2 Distance Measures


In the following, we review distance measures for comparing discrete and continuous images. We first review the standard and a morphological distance measure, both of which are deterministic. Moreover, based on the idea of considering images as random variables, we consider two statistical approaches in the last two subsections.


Deterministic Pixel Measure



The most widely used distance measures for discrete and continuous images are the l p and L p distance measures, respectively, in particular with p = 2; see, for instance, the chapter Linear Inverse Problems in this handbook. There, two discrete images U 1 and U 2 are considered similar if


$$\displaystyle{\begin{array}{ll} &\left \|U_{1} - U_{2}\right \|_{p}:= \left (\sum \nolimits _{i=1}^{N}\sum \nolimits _{j=1}^{N}\left \vert U_{1}^{\mathit{ij}} - U_{2}^{\mathit{ij}}\right \vert ^{p}\right )^{\frac{1} {p} },\quad 1 \leq p < \infty, \\ &\qquad \quad \left \|U_{1} - U_{2}\right \|_{\infty }:=\sup _{i,j=1,\ldots,N}\left \vert U_{1}^{\mathit{ij}} - U_{2}^{\mathit{ij}}\right \vert,\quad p = \infty,\\ \end{array} }$$
respectively, is small. Two continuous images $$u_{1},\ u_{2}:\ \varOmega \rightarrow \mathbb{R}$$ are similar if


$$\displaystyle{\begin{array}{ll} &\left \|u_{1} - u_{2}\right \|_{p}:= \left (\int _{\Omega }\vert u_{1}(x) - u_{2}(x)\vert ^{p}\,\mathit{dx}\right )^{\frac{1} {p} },\quad 1 \leq p < \infty, \\ &\quad \left \|u_{1} - u_{2}\right \|_{\infty }:= \text{ess}\;\sup _{x\in \Omega }\vert u_{1}(x) - u_{2}(x)\vert,\quad p = \infty,\\ \end{array} }$$
is small. Here, ess sup denotes the essential supremum.
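The discrete l p distances above can be computed in a few lines; the following sketch (Python with NumPy, our function name) covers both the finite-p case and the supremum norm:

```python
import numpy as np

def lp_distance(U1, U2, p=2):
    """Discrete l^p distance between two N x N images.

    For 1 <= p < infinity this is the p-th root of the sum of
    |U1^ij - U2^ij|^p; p = np.inf gives the maximum absolute
    pixel difference (the supremum over the finite index set).
    """
    D = np.abs(U1.astype(float) - U2.astype(float))
    if np.isinf(p):
        return D.max()
    return (D ** p).sum() ** (1.0 / p)
```

For p = 2 this is the usual Euclidean (Frobenius) distance between the intensity matrices.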


Morphological Measures



In this subsection, we consider continuous images $$u_{i}:\ \varOmega \rightarrow [0,\ 255],\ i = 1,\ 2$$. u 1 and u 2 are morphologically equivalent (Fig. 1), if there exists a one-to-one gray value transformation $$\upbeta:\ [0,\ 255] \rightarrow [0,\ 255]$$, such that


$$\displaystyle{\beta \;\circ \;u_{1} = u_{2}.}$$
Level sets of a continuous function u are defined as


$$\displaystyle{\Omega _{t}(u):=\{ x \in \Omega: u(x) = t\}.}$$




Fig. 1
The gray values of the images are completely different, but the images u 1, u 2 have the same morphology

The level sets $$\varOmega _{\mathbb{R}}(u):=\{\varOmega _{t}(u):\ t \in [0,\ 255]\}$$ form the objects of an image that remain invariant under gray value transformations. The normal field (Gauss map) is given by the normals to the level lines and can be written as


$$\displaystyle{\begin{array}{ll} \mathbf{n}(u): \quad &\Omega \quad \rightarrow \quad \mathbb{R}^{d} \\ &x\quad \mapsto \quad \left \{\begin{array}{*{20}l} 0 &\text{if}\;\nabla u(x) = 0 \\ \frac{\nabla u(x)} {\left \|\nabla u(x)\right \|}&\text{else}.\\ \end{array} \right. \\ \end{array} }$$
Droske and Rumpf [7] consider images as similar if intensity changes occur at the same locations. They therefore compare the normal fields of the images with the similarity measure


$$\displaystyle{ \mathcal{S}_{g}(u_{1},u_{2}) =\int _{\Omega }g(\mathbf{n}(u_{1})(x),\mathbf{n}(u_{2})(x))\,\mathit{dx}, }$$

(2)
where they choose the function $$g: \mathbb{R}^{2} \times \mathbb{R}^{2} \rightarrow \mathbb{R}_{\geq 0}$$ appropriately. The vectors $$\mathbf{n}(u_{1})(x),\ \mathbf{n}(u_{2})(x)$$ form an angle that is minimal if the images are morphologically equivalent. Therefore, an appropriate choice for g is an increasing function of the minimal angle between $$v_{1},v_{2}$$ and $$v_{1},\ -v_{2}$$. For instance, setting g to be the cross or the negative dot product, we obtain


$$\displaystyle{\begin{array}{ll} \mathcal{S}_{\times }(u_{1},u_{2}) =&\frac{1} {2}\int _{\Omega }\vert \mathbf{n}(u_{1})(x) \times \mathbf{n}(u_{2})(x)\vert ^{2}\,\mathit{dx} \\ \mathcal{S}_{\circ }(u_{1},u_{2}) = &\frac{1} {2}\int _{\Omega }(1 -\mathbf{n}(u_{1})(x) \cdot \mathbf{n}(u_{2})(x))^{2}\,\mathit{dx}.\\ \end{array} }$$
(The vectors n have to be embedded in $$\mathbb{R}^{3}$$ in order to calculate the cross product.)


Example 1.

Consider the following scaled images $$u_{i}:\ [0,\ 1]^{2} \rightarrow [0,\ 1]$$,


$$\displaystyle{u_{1}(x) = x_{1}x_{2},\;\;u_{2}(x) = 1 - x_{1}x_{2},\;\;u_{3}(x) = (1 - x_{1})x_{2},}$$
with gradients


$$\displaystyle{\nabla u_{1}(x) = \left (\begin{array}{*{20}c} x_{2} \\ x_{1}\\ \end{array} \right ),\quad \nabla u_{2}(x) = \left (\begin{array}{*{20}c} -x_{2} \\ -x_{1}\\ \end{array} \right ),\quad \nabla u_{3}(x) = \left (\begin{array}{*{20}c} -x_{2} \\ 1 - x_{1}\\ \end{array} \right ).}$$
With $$g(u,v):= \frac{1} {2}\vert u_{1}v_{2} - u_{2}v_{1}\vert$$, the functional $$\mathcal{S}_{g}$$ defined in (2) attains the following values for the particular images:


$$\displaystyle{\begin{array}{ll} \mathcal{S}_{g}(u_{1},u_{2})& = \frac{1} {2}\int _{\Omega }\vert -x_{2}x_{1} + x_{2}x_{1}\vert \;dx = 0 \\ \mathcal{S}_{g}(u_{2},u_{3})& = \frac{1} {2}\int _{\Omega }\vert -x_{2}(1 - x_{1}) - x_{1}x_{2}\vert \;dx = \frac{1} {4} \\ \mathcal{S}_{g}(u_{3},u_{1})& = \frac{1} {2}\int _{\Omega }\vert -x_{2}x_{1} - (1 - x_{1})x_{2}\vert \;dx = \frac{1} {4}.\\ \end{array} }$$
The similarity measure indicates that u 1 and u 2 are morphologically equivalent.
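The values in Example 1 can be checked numerically. The following sketch (Python with NumPy; names are ours) evaluates $\mathcal{S}_{g}$ with $g(u,v)=\frac{1}{2}|u_1 v_2 - u_2 v_1|$ on a midpoint grid, using the unnormalized gradients exactly as in the example:

```python
import numpy as np

# Midpoint quadrature grid on (0, 1)^2
n = 200
x1, x2 = np.meshgrid((np.arange(n) + 0.5) / n,
                     (np.arange(n) + 0.5) / n, indexing="ij")

def S_g(g1, g2):
    """S_g with g(u, v) = 0.5 * |u1*v2 - u2*v1|, averaged over the grid
    (the mean approximates the integral over the unit square)."""
    return 0.5 * np.abs(g1[0] * g2[1] - g1[1] * g2[0]).mean()

# Gradients of u1, u2, u3 from Example 1
grad_u1 = (x2, x1)
grad_u2 = (-x2, -x1)
grad_u3 = (-x2, 1 - x1)

print(S_g(grad_u1, grad_u2))  # 0: u1 and u2 are morphologically equivalent
print(S_g(grad_u2, grad_u3))  # approximately 1/4
print(S_g(grad_u3, grad_u1))  # approximately 1/4
```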

The normalized gradient field is set valued in regions where the function is constant. Therefore, the numerical evaluation of the gradient field is highly unstable. To overcome this drawback, Haber and Modersitzki [15] suggested using regularized normalized gradient fields:


$$\displaystyle{\begin{array}{ll} \mathbf{n}_{\epsilon }(u): \quad \Omega \ & \rightarrow \quad \mathbb{R}^{d} \\ \quad \ \ \quad \quad \;\;x &\mapsto \quad \frac{\nabla u(x)} {\left \|\nabla u(x)\right \|_{\epsilon }}\\ \end{array} }$$
where $$\left \|v\right \|_{\epsilon }:= \sqrt{v^{T } v +\epsilon ^{2}}$$ for every $$v \in \mathbb{R}^{d}$$. The parameter ε is connected to the estimated noise level in the image. In regions where ε is much larger than the gradient, the regularized normalized fields n ε (u) are almost zero and therefore do not have a significant effect on the measures $$\mathcal{S}_{\times }$$ and $$\mathcal{S}_{\circ }$$, respectively. However, in regions where ε is much smaller than the gradient, the regularized normal fields are close to the non-regularized ones (Fig. 2).
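A possible discrete implementation of the regularized normal field and of $\mathcal{S}_{\times }$ is sketched below (Python with NumPy). The use of `np.gradient` for the derivatives and the specific test images are our choices, not prescribed by the text:

```python
import numpy as np

def regularized_normals(U, eps):
    """Regularized normalized gradient field n_eps(u) of a discrete image:
    grad(u) / sqrt(|grad(u)|^2 + eps^2), with finite differences."""
    gx, gy = np.gradient(U.astype(float))
    norm_eps = np.sqrt(gx**2 + gy**2 + eps**2)
    return gx / norm_eps, gy / norm_eps

def S_cross(n1, n2):
    """S_x: half the mean squared magnitude of the cross product of the
    two normal fields (z-component after embedding the 2D vectors in R^3)."""
    cross = n1[0] * n2[1] - n1[1] * n2[0]
    return 0.5 * (cross**2).mean()

# A linear ramp and its gray-value inversion are morphologically equivalent,
# so S_x should vanish; rotating the ramp by 90 degrees should not.
U = np.outer(np.arange(8.0), np.ones(8)) * 10
n_ramp = regularized_normals(U, 0.01)
n_inv = regularized_normals(255 - U, 0.01)
n_rot = regularized_normals(U.T, 0.01)
```

Here the gradients (magnitude 10) dominate ε, so the regularized normals are close to unit vectors; for the inverted image they are antiparallel to those of the ramp and the cross product vanishes.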



Fig. 2
Top: images u 1, u 2, u 3. Bottom: n(u 1), n(u 2), n(u 3)


Statistical Distance Measures



Several distance measures for pairs of images can be motivated from statistics by considering the images as random variables. In the following, we analyze discrete images from a statistical point of view. For this purpose, we need some elementary statistical definitions. Applications of the following measures are mentioned in section “Morphological Measures”:

Correlation Coefficient:


$$\displaystyle{\bar{U}:= \frac{1} {N^{2}}\sum \limits _{i,j=1}^{N}U^{\mathit{ij}}\quad \text{and}\quad \text{Var}(U) =\sum \limits _{ i,j=1}^{N}(U^{\mathit{ij}} -\bar{ U})^{2}}$$
denote the mean intensity and variance of the discrete image U.




$$\displaystyle{\text{Cov}(U_{1},U_{2}) =\sum \limits _{ i=1}^{N}\sum \limits _{ j=1}^{N}\left (U_{ 1}^{\mathit{ij}} -\bar{ U}_{ 1}\right )\left (U_{2}^{\mathit{ij}} -\bar{ U}_{ 2}\right )}$$
denotes the covariance of two images U 1 and U 2, and the correlation coefficient is defined by


$$\displaystyle{\rho (U_{1},U_{2}) = \frac{\text{Cov}(U_{1},U_{2})} {\sqrt{\text{Var} (U_{1 } )\;\text{Var} (U_{2 } )}}.}$$
The correlation coefficient is a measure of linear dependence of two images. Its range is [−1,  1], and if $$\vert \rho (U_{1},\ U_{2})\vert$$ is close to one, then U 1 and U 2 are close to being linearly dependent.
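The definitions above translate directly into code; a minimal sketch (Python with NumPy, our function name):

```python
import numpy as np

def correlation_coefficient(U1, U2):
    """rho(U1, U2) = Cov(U1, U2) / sqrt(Var(U1) Var(U2)).

    The normalization factors of mean-based variants cancel in the
    ratio, so plain sums over the deviations suffice, as in the text.
    """
    d1 = U1.astype(float) - U1.mean()
    d2 = U2.astype(float) - U2.mean()
    return (d1 * d2).sum() / np.sqrt((d1**2).sum() * (d2**2).sum())
```

An affine intensity change U2 = a U1 + b with a > 0 yields ρ = 1, and a = −1 yields ρ = −1, illustrating that ρ detects only linear dependence.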

Correlation Ratio: In statistics, the correlation ratio is used to measure the relationship between the statistical dispersion within individual categories and the dispersion across the whole population. The correlation ratio is defined by


$$\displaystyle{\eta (U_{2}\vert U_{1}) = \frac{\text{Var}(E(U_{2}\vert U_{1}))} {\text{Var}(U_{2})},}$$
where $$E(U_{2}\vert U_{1})$$ is the conditional expectation of U 2 subject to U 1.

To put this into the context of image comparison, let


$$\displaystyle{\Omega _{t}(U_{1}):= \left \{(i,j)\vert U_{1}^{\mathit{ij}} = t\right \}}$$
be the discrete level set of intensity t ∈ {0, …, 255}. Then the expected value of U 2 on the t-th level set of U 1 is given by


$$\displaystyle{E(U_{2}\vert U_{1} = t):= \frac{1} {\#(\Omega _{t}(U_{1}))}\sum \limits _{\Omega _{t}(U_{1})}U_{2}^{\mathit{ij}},}$$
where $$\#(\varOmega _{t}(U_{1}))$$ denotes the number of pixels in U 1 with gray value t. Moreover, the according conditional variance is defined by


$$\displaystyle{V (U_{2}\vert U_{1} = t) = \frac{1} {\#(\Omega _{t}(U_{1}))}\sum \limits _{\Omega _{t}(U_{1})}\left (U_{2}^{\mathit{ij}} - E(U_{ 2}\vert U_{1} = t)\right )^{2}.}$$
The function


$$\displaystyle{\begin{array}{lll} &H(U_{1}):\{ 0,\ldots,255\}& \rightarrow \mathbb{N} \\ & \qquad \qquad \qquad \qquad t &\mapsto \#(\Omega _{t}(U_{1}))\\ \end{array} }$$
is called the discrete histogram of U 1.

The correlation ratio is nonsymmetric, that is, in general $$\upeta (Y \vert X)\neq \upeta (X\vert Y )$$, and takes values in [0, 1]. It is a measure of (possibly nonlinear) functional dependence between two images. If $$U_{1} = U_{2}$$, then the correlation ratio is maximal.
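Combining the discrete level sets, the conditional expectation, and the variance ratio gives a compact implementation; the following sketch (Python with NumPy, our function name) follows the definitions above, with `np.unique` enumerating the occupied gray values:

```python
import numpy as np

def correlation_ratio(U1, U2):
    """eta(U2 | U1) = Var(E(U2 | U1)) / Var(U2).

    E(U2 | U1 = t) is the mean of U2 over the discrete level set
    Omega_t(U1) = {(i, j) : U1^ij = t}; the same (population) variance
    convention is used in numerator and denominator.
    """
    u1 = U1.ravel()
    u2 = U2.ravel().astype(float)
    cond_mean = np.empty_like(u2)
    for t in np.unique(u1):
        mask = u1 == t
        cond_mean[mask] = u2[mask].mean()  # E(U2 | U1 = t) on the level set
    return cond_mean.var() / u2.var()
```

When U2 is a (possibly nonlinear) function of U1, the conditional mean reproduces U2 exactly and the ratio is 1, as for the pair in Fig. 3.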

Variance of Intensity Ratio, Ratio Image Uniformity: This measure is based on the notion of similarity that two images are similar if the ratio $$R^{\mathit{ij}}\ (U_{1},\ U_{2}) = U_{1}^{\mathit{ij}}/U_{2}^{\mathit{ij}}$$ has a small variance. The ratio image uniformity (or normalized variance of the intensity ratio) can be calculated by


$$\displaystyle{{\mathrm{RIU}}(U_{1},U_{2}) = \frac{{\mathrm{Var}}(R)} {\bar{R}}.}$$
It is not symmetric.
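A minimal sketch of the ratio image uniformity (Python with NumPy, our function name); note that the pixelwise ratio is undefined where U 2 vanishes, which the sketch simply assumes away:

```python
import numpy as np

def riu(U1, U2):
    """Ratio image uniformity: Var(R) / mean(R) with R = U1 / U2 pixelwise.

    Assumes U2 has no zero entries. The measure is not symmetric,
    since swapping the images inverts R.
    """
    R = U1.astype(float) / U2.astype(float)
    return R.var() / R.mean()
```

If the images differ only by a positive scaling, R is constant and the measure vanishes.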


Example 2.

Consider the discrete images U 1, U 2, and U 3 in Fig. 3. Table 1 shows a comparison of the different similarity measures. The variance of the intensity ratio is insignificant and therefore cannot be used to determine similarities. The correlation ratio is maximal for the pairing U 1, U 2, and in fact there is a functional dependence between the intensity values of U 1 and U 2. However, this dependence is nonlinear; hence, the absolute value of the correlation coefficient (a measure of linear dependence) is close to one but not equal to one.




Fig. 3
Images for Examples 2 and 6. Note that there is a dependence between U 1 and U 2: $$U_{2} \sim 11 - (U_{1})^{3}$$




Table 1
Comparison of the different pixel-based similarity measures. The images U 1, U 2 are related in a nonlinear way; this is reflected in a correlation ratio of 1. We see that the variance of intensity ratio is not symmetric and not significant enough to make a statement on a correlation between the images

                                 U1,U2   U2,U1   U2,U3   U3,U2   U3,U1   U1,U3
    Correlation coefficient       0.98    0.98    0.10    0.10   -0.14   -0.14
    Correlation ratio             1.00    1.00    0.28    0.32    0.29    0.64
    Variance of intensity ratio   1.91    2.87    2.25    1.92    3.06    0.83


Statistical Distance Measures (Density Based)



In general, two images of the same object but acquired with different modalities have a large L p (respectively l p ) distance. Hence, the idea is to apply statistical tools that consider images as similar if there is some statistical dependence between them. Statistical similarity measures compare probability density functions; hence, we first need to relate images to density functions. To this end, we consider an image as a random variable. The basic terminology of random variables is as follows:


Definition 1.

A continuous random variable is a real-valued function $$X:\varOmega ^{S} \rightarrow \mathbb{R}$$ defined on the sample space Ω S . For a sample x, X(x) is called an observation.


Remark 1 (Images as Random Variables).

When we consider an image $$u:\varOmega \rightarrow \mathbb{R}$$ as a continuous random variable, the sample space is Ω. For a sample x ∈ Ω, the observation u(x) is the intensity of u at x.

Regarding the intensity values of an image as an observation of a random process allows us to compare images via their intrinsic probability densities. Since the density cannot be calculated directly, it has to be estimated. This is outlined in section “Density Estimation”. There exists a variety of distance measures for probability densities (see, for instance, [31]). In particular, we review f-divergences in section “Csiszár-Divergences (f-Divergences)” and explain how to use the f-information as an image similarity measure in section “f-Information”.


Density Estimation



This section reviews the problem of density estimation, which is the construction of an estimate of the density function from the observed data.


Definition 2.

Let $$X:\varOmega ^{S} \rightarrow \mathbb{R}$$ be a random variable, that is, a function mapping the (measurable) sample space Ω S of a random process to the real numbers.

The cumulated probability density function of X is defined by


$$\displaystyle{P(t):= \frac{1} {{\mathrm{meas}}(\Omega ^{S})}{\mathrm{meas}}\{x: X(x) \leq t\}\;t \in \mathbb{R}.}$$
The probability density function p is the derivative of P.

The joint cumulated probability density function of two random variables X 1, X 2 is defined by


$$\displaystyle{\hat{P}(t_{1},t_{2}):= \frac{1} {{\mathrm{meas}}(\Omega ^{S})^{2}}{\mathrm{meas}}\{(x_{1},x_{2}): X_{1}(x_{1}) \leq t_{1},X_{2}(x_{2}) \leq t_{2}\}\;t_{1},t_{2}\, \in \mathbb{R}.}$$
The joint probability density function $$\hat{p}$$ satisfies


$$\displaystyle{\hat{P}(t_{1},t_{2}) =\int _{ 0}^{t_{1} }\int _{0}^{t_{2} }\hat{p}(s_{1},s_{2})\mathit{ds}_{1}\;\mathit{ds}_{2}.}$$


Remark 2.

When we consider an image $$u:\ \varOmega \rightarrow \mathbb{R}$$ a random variable with sample space Ω, we write p(u)(t) for the probability density function of the image u. For the joint probability of two images u 1 and u 2, we write $$\hat{p}(u_{1},u_{2})(t_{1},t_{2})$$ to emphasize, as above, that the images are considered as random variables.

The terminology of Definition 2 is clarified by the following one-dimensional example:


Example 3.

Let Ω: = [0,  1] and


$$\displaystyle{\begin{array}{lll} &u: \Omega \ & \rightarrow [0,255]\\ &\ \ \quad x\ &\mapsto 255x^{2 }.\\ \end{array} }$$
The cumulated probability density function


$$P:\ [0,\ 255] \rightarrow [0,\ 1]$$
is obtained by integration:


$$\displaystyle{P(t):= \text{meas}\{x\,:\,255\;x^{2} \leq t\} = \text{meas}\left \{x\,:\,x \leq \sqrt{ \frac{t} {255}}\right \} =\int _{ 0}^{\sqrt{ \frac{t} {255}} } 1\;\mathit{dx} =\sqrt{ \frac{t} {255}}.}$$
The probability density function of u is given by the derivative of P, which is


$$\displaystyle{p(u)(t) = \frac{1} {2\sqrt{255}} \frac{1} {\sqrt{t}}.}$$
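Example 3 can be verified empirically by sampling the domain densely and counting: the fraction of samples with u(x) ≤ t should approach P(t) = √(t/255). A small sketch (Python with NumPy; the sample count is arbitrary):

```python
import numpy as np

# Empirical check of Example 3: u(x) = 255 x^2 on Omega = [0, 1].
x = np.linspace(0.0, 1.0, 200001)   # dense uniform sample of the domain
u = 255.0 * x**2

for t in (50.0, 100.0, 200.0):
    empirical = np.mean(u <= t)     # approximates meas{x : u(x) <= t}
    analytic = np.sqrt(t / 255.0)   # P(t) from the example
    print(t, empirical, analytic)
```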
In image processing, it is common to view the discrete image U (or u N as in (1)) as an approximation of an image u. We aim for the probability density function of u, which is approximated via kernel density estimation using the available information about u, namely U. A kernel histogram is the normalized probability density function of the discretized image U, where a kernel function (see (3)) is superimposed for each pixel. Kernel functions depend on a parameter, which can be used to control the smoothness of the kernel histogram.

We first give a general definition of kernel density estimation:


Definition 3 (Kernel Density Estimation).

Let t 1, $$t_{2},\ \ldots,\ t_{M}$$ be a sample of M independent observations from a measurable real random variable X with probability density function p. A kernel density approximation at t is given by


$$\displaystyle{p_{\sigma }(t) = \frac{1} {M}\sum \nolimits _{i=1}^{M}k_{\sigma }(t - t_{ i}),\quad t \in [0,255]}$$
where $$k_{\upsigma }$$ is a kernel function with bandwidth $$\upsigma$$. $$p_{\upsigma }$$ is called kernel density approximation with parameter $$\upsigma$$.

Let $$t_{1},t_{2},\ldots,t_{M}$$ and $$s_{1},s_{2},\ldots,s_{M}$$ be samples of M independent observations from measurable real random variables $$X_{1},X_{2}$$ with joint probability density function $$\hat{p}$$; then a joint kernel density approximation of $$\hat{p}$$ is given by


$$\displaystyle{\hat{p}_{\sigma }(s,t) = \frac{1} {M}\sum \nolimits _{i=1}^{M}K_{\sigma }(s - s_{ i},t - t_{i}),}$$
where $$K_{\upsigma }(s,\ t)$$ is a two-dimensional kernel function.


Remark 3 (Kernel Density Estimation of an Image, Fig. 4).

Let u be a continuous image, which is identified with a random variable. Moreover, let U be N × N samples of u. In analogy to Definition 3, we denote the kernel density estimation based on the discrete image U, by


$$\displaystyle{p_{\sigma }(t) = \frac{1} {N^{2}}\sum \nolimits _{i,j=1}^{N}k_{\sigma }(t - U^{\mathit{ij}})}$$
and remark that for u N as in (1)


$$\displaystyle{ p_{\sigma }(u_{N})(t):=\int _{\Omega }k_{\sigma }(t - u_{N}(x))\mathit{dx} = \frac{1} {N^{2}}\sum \nolimits _{i,j=1}^{N}k_{\sigma }(t - U^{\mathit{ij}}). }$$

(3)
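A direct implementation of the estimate (3) is sketched below (Python with NumPy). The Gaussian kernel is one common choice of k σ; the text leaves the kernel open, so this is an assumption for illustration:

```python
import numpy as np

def kernel_density(U, t, sigma):
    """Kernel density estimate (3) of an image U at gray values t,
    with a Gaussian kernel k_sigma of bandwidth sigma (our choice).

    Each of the N^2 pixel intensities contributes one kernel bump;
    the mean over pixels realizes the 1/N^2 normalization in (3).
    """
    k = lambda s: np.exp(-s**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    samples = U.ravel().astype(float)
    t = np.atleast_1d(t).astype(float)
    return np.array([k(ti - samples).mean() for ti in t])
```

Since each Gaussian bump integrates to one, the estimate integrates to one as well (up to truncation at the boundary of the gray value range), and larger σ produces a smoother kernel histogram, as in Fig. 4.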




Fig. 4
Density estimate for different parameters σ

The joint kernel density of two images u 1, u 2 with observations U 1 and U 2 is given by


$$\displaystyle{\hat{p}_{\sigma }(s,t) = \frac{1} {N^{2}}\sum \nolimits _{i,j=1}^{N}K_{\sigma }\left (s - U_{ 1}^{\mathit{ij}},t - U_{ 2}^{\mathit{ij}}\right ),}$$
where $$K_{\upsigma }(s,\ t) = k_{\upsigma }(s)k_{\upsigma }(t)$$ is the two-dimensional kernel function. Moreover, we remark that for $$u_{1,N}$$, $$u_{2,N}$$ as in (1)


$$\displaystyle\begin{array}{rcl} \hat{p}_{\sigma }(u_{1,N},u_{2,N})(s,t)&:=& \int _{\Omega }K_{\sigma }(s - u_{1,N}(x),t - u_{2,N}(x))\;\mathit{dx} {}\\ & =& \frac{1} {N^{2}}\sum \nolimits _{i,j=1}^{N}K_{\sigma }\left (s - U_{ 1}^{ij},t - U_{ 2}^{ij}\right ). {}\\ \end{array}$$
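The joint estimate can be sketched analogously (Python with NumPy; again a separable Gaussian kernel is our assumed choice of K σ):

```python
import numpy as np

def joint_kernel_density(U1, U2, s, t, sigma):
    """Joint kernel density estimate at (s, t) for two discrete images,
    using the separable kernel K_sigma(s, t) = k_sigma(s) * k_sigma(t).

    Pixels are paired positionwise, so each pixel (i, j) contributes
    one two-dimensional bump centered at (U1^ij, U2^ij).
    """
    k = lambda v: np.exp(-v**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    a = U1.ravel().astype(float)
    b = U2.ravel().astype(float)
    return (k(s - a) * k(t - b)).mean()
```

For U1 = U2 the mass concentrates along the diagonal s = t of the joint gray value plane, which is the qualitative signature of identical images.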
In the following, we review particular kernel functions and show that standard histograms are kernel density estimations.


Example 4.

Assume that $$u_{i}:\ \varOmega \rightarrow [0,\ 255],\ i = 1,\ 2$$ are continuous images, with discrete approximations u i, N as in (1):
