An unknown image or signal u_o is represented by data v according to an observation model, called also forward model

v = A(u_o) with noise n,  (1)

where A is a (possibly nonlinear) transform mapping the unknown to the data domain. A solution û is defined as a minimizer of an energy F(·, v),

û ∈ arg min_u F(u, v),  (2)

which combines a data-fidelity term Ψ and a prior term Φ weighted by a parameter β > 0:

F(u, v) = Ψ(u, v) + βΦ(u).  (3)

The term Ψ ensures that û satisfies (1) quite faithfully according to an appropriate measure. The noise n is random, and a natural way to derive Ψ from (1) is to use probabilities; see, e.g., [7, 32, 37, 56]. More precisely, if π(v | u) is the likelihood of data v, the usual choice is

Ψ(u, v) = −log π(v | u).  (4)

For instance, if A is a linear operator and v = Au + n, where n is additive independent and identically distributed (i.i.d.) zero-mean Gaussian noise, one finds that

Ψ(u, v) = ‖Au − v‖²  (5)

up to additive and multiplicative constants. This remains quite a common choice partly because it simplifies calculations.
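To see where (5) comes from, here is the standard one-line computation behind (4), written as a sketch under the assumption of i.i.d. zero-mean Gaussian noise with variance σ²; the factor 1/(2σ²) is the multiplicative constant absorbed into β:

```latex
% v = Au + n with n ~ N(0, sigma^2 Id) gives the likelihood
\pi(v \mid u) \propto \exp\!\Big(-\frac{\|Au-v\|_2^2}{2\sigma^2}\Big)
\quad\Longrightarrow\quad
\Psi(u,v) = -\log \pi(v \mid u) = \frac{1}{2\sigma^2}\,\|Au-v\|_2^2 + \mathrm{const}.
```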
The role of Φ in (3) is to push the solution to exhibit some a priori known or desired features. It is called the prior (or regularization, or penalty) term. In many image processing applications, Φ is of the form

Φ(u) = Σ_{i=1}^{r} ϕ(‖Dᵢu‖),  (6)

where for any i ∈ {1, …, r}, Dᵢ : ℝ^p → ℝ^s, for s ≥ 1 an integer, are linear operators and ‖·‖ is usually the ℓ1 or the ℓ2 norm. For instance, the family {Dᵢ} can represent the discrete approximation of the gradient or the Laplacian operator on u, or finite differences of various orders, or the combination of any of these with the synthesis operator of a frame transform, or the vectors of the canonical basis of ℝ^p. Note that s = 1 if {Dᵢ} are finite differences or a discrete Laplacian; then ϕ(‖Dᵢu‖) = ϕ(|Dᵢu|).

And if {Dᵢ} are the basis vectors of ℝ^p, one has ϕ(|Dᵢu|) = ϕ(|u[i]|). In (6), ϕ : ℝ₊ → ℝ is quite a “general” function, often called a potential function (PF). A very standard assumption is that
H1
ϕ is proper, lower semicontinuous (l.s.c.), and increasing on ℝ₊, with ϕ(t) > ϕ(0) for any t > 0.
Some typical examples for ϕ are given in Table 1 and their plots in Fig. 1.
Table 1
Commonly used PFs ϕ : ℝ₊ → ℝ, where α > 0 is a parameter. Note that among the nonconvex PFs, (f8), (f10), and (f12) are coercive, while the remaining ones, namely (f6), (f7), (f9), (f11), and (f13), are bounded. All nonconvex PFs with ϕ′(0+) > 0 are concave on ℝ₊. Recall that (f6) is the discrete equivalent of the Mumford-Shah (MS) prior [17, 72]
| Convex PFs | | | |
|---|---|---|---|
| ϕ′(0+) = 0 | | ϕ′(0+) > 0 | |
| (f1) | ϕ(t) = t^α, 1 < α ≤ 2 | (f5) | ϕ(t) = t |
| (f2) | ϕ(t) = √(α + t²) | | |
| (f3) | ϕ(t) = log(cosh(αt)) | | |
| (f4) | ϕ(t) = t∕α − log(1 + t∕α) | | |
| Nonconvex PFs | | | |
| ϕ′(0+) = 0 | | ϕ′(0+) > 0 | |
| (f6) | ϕ(t) = min{αt², 1} | (f10) | ϕ(t) = t^α, 0 < α < 1 |
| (f7) | ϕ(t) = αt²∕(1 + αt²) | (f11) | ϕ(t) = αt∕(1 + αt) |
| (f8) | ϕ(t) = log(αt² + 1) | (f12) | ϕ(t) = log(αt + 1) |
| (f9) | ϕ(t) = 1 − exp(−αt²) | (f13) | ϕ(0) = 0, ϕ(t) = 1 if t ≠ 0 |
Remark 1.
If ϕ′(0+) > 0, the function t ↦ ϕ(|t|) is nonsmooth at zero, in which case Φ is nonsmooth on the union of the subspaces {u ∈ ℝ^p : Dᵢu = 0}, i ∈ {1, …, r}. Conversely, ϕ′(0+) = 0 leads to a function t ↦ ϕ(|t|) that is smooth at zero. With the PF (f13), Φ yields the counting function, commonly called the ℓ0-norm.
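To make the notation in (3), (5), and (6) concrete, the following minimal Python sketch evaluates the energy for a one-dimensional signal, using first-order finite differences for {Dᵢ} (so s = 1) and the convex PF (f2); the names (energy, phi_f2) are illustrative, not from the chapter:

```python
import numpy as np

def phi_f2(t, alpha=0.1):
    # Convex PF (f2): phi(t) = sqrt(alpha + t^2), smooth at zero
    return np.sqrt(alpha + t**2)

def energy(u, v, A, beta=1.0, phi=phi_f2):
    # Data fidelity (5): ||Au - v||^2
    fidelity = np.sum((A @ u - v) ** 2)
    # Prior (6) with D_i = first-order finite differences (s = 1)
    prior = np.sum(phi(np.abs(np.diff(u))))
    return fidelity + beta * prior

# Example: denoising (A = Id) of a noisy step signal
rng = np.random.default_rng(0)
u_o = np.concatenate([np.zeros(50), np.ones(50)])
v = u_o + 0.1 * rng.standard_normal(100)
A = np.eye(100)
print(energy(v, v, A), energy(u_o, v, A))
```

Swapping phi_f2 for a nonconvex PF such as (f6) changes a single line, which is precisely the flexibility of the family (6).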
For human vision, an important requirement is that the prior promotes smoothing inside homogeneous regions but preserves sharp edges. According to a fine analysis conducted in the 1990s and summarized in [7], ϕ preserves edges if H1 holds and if H2, stated below, holds true as well:
H2

lim_{t→∞} ϕ′(t)∕t = 0.

This assumption is satisfied by all PFs in Table 1 except for (f1) in the case α = 2. Note that there are numerous other heuristics for edge preservation.
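For instance, a direct check for two entries of Table 1 (using H2 as stated above) shows why (f1) with α = 2 is the exception:

```latex
\text{(f2): } \phi'(t)=\frac{t}{\sqrt{\alpha+t^2}}
\;\Rightarrow\; \frac{\phi'(t)}{t}=\frac{1}{\sqrt{\alpha+t^2}}\xrightarrow[t\to\infty]{}0;
\qquad
\text{(f1), } \alpha=2:\ \phi'(t)=2t
\;\Rightarrow\; \frac{\phi'(t)}{t}\equiv 2\not\to 0.
```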
Background
Energy minimization methods, as described here, are at the crossroads of several well-established methodologies that are briefly sketched below.
Bayesian maximum a posteriori (MAP) estimation using Markov random field (MRF) priors. Such an estimation is based on the maximization of the posterior distribution π(u | v) = π(v | u) π(u)∕Z, where π(u) is the prior model for u_o and Z = π(v) can be seen as a constant. Equivalently, û minimizes with respect to u the energy

F(u, v) = −log π(v | u) − log π(u).
Identifying the first term above with Ψ(u, v) and the second one with βΦ(u) shows the basis of the equivalence. Classical papers on MAP energies using MRF priors are [14–16, 20, 51, 56]. Since the pioneering work of Geman and Geman [56], various nonconvex PFs ϕ have been explored in order to produce images involving neat edges; see, e.g., [54, 55, 65]. MAP energies involving MRF priors are also considered in many books, such as [32, 53, 64]. For a pedagogical account, see [96].
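In compact form, the identification runs as follows (a standard computation; Z = π(v) does not depend on u):

```latex
\hat{u} \;=\; \arg\max_{u}\,\pi(u \mid v)
        \;=\; \arg\max_{u}\,\frac{\pi(v \mid u)\,\pi(u)}{Z}
        \;=\; \arg\min_{u}\,\big(\underbrace{-\log\pi(v \mid u)}_{\Psi(u,v)}
              \;\underbrace{-\,\log\pi(u)}_{\beta\Phi(u)}\big).
```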
Regularization for ill-posed inverse problems was initiated in the book of Tikhonov and Arsenin [93] in 1977. The main idea can be stated in terms of the stabilization of this kind of problem. Useful textbooks in this direction are, e.g., [61, 69, 94] and especially the recent [91]. This methodology and its most recent achievements are nicely discussed from quite a general point of view in the chapter Regularization Methods for Ill-Posed Problems in this handbook.
Variational methods are related to PDE restoration methods and are naturally developed for signals and images defined on a continuous subset Ω ⊂ ℝ^d, d = 1, 2, …; for images, d = 2. Originally, the data-fidelity term is of the form (5) for A = Id, and Φ(u) = ∫_Ω ϕ(‖∇u‖) dx, where ϕ is a convex function as those given in Table 1 (top). Since the beginning of the 1990s, a remarkable effort has been made to find heuristics on ϕ that enable edges and breakpoints to be recovered in restored images and signals while smoothing the regions between them; see, e.g., [7, 13, 26, 31, 59, 64, 73, 85, 87]. One of the most successful is the total variation (TV) regularization, corresponding to ϕ(t) = t, which was proposed by Rudin, Osher, and Fatemi in [87]. Variational methods were rapidly applied along with more general data-fidelity terms of the form (5). The use of differential operators of various orders in the prior has recently been investigated; see, e.g., [22, 23]. More details on variational methods for image processing can be found in several textbooks like [3, 7, 91].
The equivalence between these approaches has been considered in several seminal papers; see, e.g., [37, 63]. The state of the art and the relationship among all these methodologies are nicely outlined in the recent book of Scherzer et al. [91]. This book gives a brief historical overview of these methodologies and attaches great importance to the functional analysis of the presented results.
The Main Features of the Minimizers as a Function of the Energy
Pushing curiosity further leads to various additional questions. One observes that frequently the data-fidelity and the prior terms are modeled separately. It is hence necessary to check whether the minimizer û of F(·, v) obeys all the information contained in the data model as well as in the prior. Hence the question: how are the prior Φ and the data fidelity Ψ effectively involved in û – a minimizer of F(·, v)? This leads to the formulation of the following inverse modeling problem:
Analyze the mutual relationship between the salient features exhibited by the minimizers û of an energy F(·, v) and the shape of F itself.  (7)

This problem was posed in a systematic way and studied since [74, 75]. The point of view provided by (7) is actually adopted by many authors. Problem (7) is totally general and involves crucial stakes:
- It yields rigorous and strong results on the minimizers û.
- Such knowledge enables real control over the solution – the reconstructed image or signal û.
- Conversely, it opens new perspectives for modeling.
- It enables the conception of specialized energies that fulfill the requirements of applications.
- This kind of result can help to derive numerical schemes that use knowledge of the solutions.
Problem (7) remains open. The results presented here concern images, signals, and data living on finite grids. In this practical framework, the results in this chapter are quite general: they hold for energies F which can be convex or nonconvex, smooth or nonsmooth, and they address both local and global minimizers.
Organization of the Chapter
Some preliminary notions and results that ease the reading of the chapter are sketched in Sect. 2. Section 3 is devoted to the regularity of the (local) minimizers of F(·, v), with a special focus on nonconvex regularization. Section 4 shows how edges are enhanced using nonconvex regularization. In Sect. 5 it is shown that nonsmooth regularization typically leads to minimizers that are sparse in the space spanned by {Dᵢ}. Conversely, Sect. 6 exhibits that the minimizers relevant to nonsmooth data fidelity achieve an exact fit for numerous data samples. Section 7 considers results when both Ψ and Φ are nonsmooth. Illustrations and applications are presented throughout.
2 Preliminaries
In this section we set the notations and recall some classical definitions and results on minimization problems.
Notation
We systematically denote by û a (local) minimizer of F(·, v). It is explicitly specified when û is a global minimizer.
Dⱼⁿ – The differential operator of order n with respect to the jth component of a function.

v[i] – The ith entry of vector v.

#J – The cardinality of the set J.

J^c = I ∖ J – The complement of J ⊂ I in I, where I is a set.

K^⊥ – The orthogonal complement of a sub-vector space K ⊂ ℝ^p.

A∗ – The transpose of a matrix (or a vector) A, where A is real valued.

A ≻ 0 (A ⪰ 0) – The matrix A is positive definite (positive semidefinite).

1ₙ – The n-length vector composed of ones, i.e., 1ₙ[i] = 1 for 1 ≤ i ≤ n.

𝕃ⁿ – The Lebesgue measure on ℝⁿ.

Id – The identity operator.

‖·‖_ρ – A vector or a matrix ρ-norm.

{e₁, …, eₙ} – The canonical basis of ℝⁿ, i.e., eᵢ[i] = 1 and eᵢ[j] = 0 if i ≠ j.
Reminders and Definitions
Definition 1.
A function F : ℝ^p → ℝ is coercive if F(u) → +∞ as ‖u‖ → ∞.
Since special attention is dedicated to nonsmooth functions, we recall some basic facts.
Definition 2.
Given v ∈ ℝ^q, the function F(·, v) : ℝ^p → ℝ admits at û ∈ ℝ^p a one-sided derivative in a direction d ∈ ℝ^p, denoted δ₁F(û, v)(d), if the following limit exists:

δ₁F(û, v)(d) = lim_{t↘0} [F(û + td, v) − F(û, v)]∕t,

where the index 1 in δ₁ means that derivatives with respect to the first variable of F are addressed.

Here δ₁F(û, v)(d) is a right-side derivative; the left-side derivative is −δ₁F(û, v)(−d). If F(·, v) is differentiable at û, then δ₁F(û, v)(d) = D₁F(û, v)d, where D₁ stands for differential with respect to the first variable (see paragraph “Notation”). For ϕ : ℝ₊ → ℝ, we denote by ϕ′(t−) and ϕ′(t+) its left-side and right-side derivatives, respectively.
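As a simple illustration of Definition 2, take F(u, v) = |u| (independent of v); at û = 0 one finds, for every direction d ∈ ℝ,

```latex
\delta_1 F(0)(d) \;=\; \lim_{t\searrow 0}\frac{|0+td|-|0|}{t} \;=\; |d| \;\ge\; 0 ,
```

so the necessary condition of Theorem 1 below holds at zero even though F is not differentiable there.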
The classical necessary condition for a local minimum of a (nonsmooth) function is recalled [60, 86]:
Theorem 1.
If F(·, v) has a local minimum at û ∈ ℝ^p, then δ₁F(û, v)(d) ≥ 0 for every d ∈ ℝ^p.

If F(·, v) is Fréchet differentiable at û, one finds D₁F(û, v) = 0.
Rademacher’s theorem states that if F is proper and Lipschitz continuous on ℝ^p, then the set of points in ℝ^p at which F is not Fréchet differentiable forms a set of Lebesgue measure zero [60, 86]. Hence F is differentiable at almost every u. However, when F(·, v) is nondifferentiable, its minimizers are typically located at points where F(·, v) is nondifferentiable; see, e.g., Example 1 below.
Example 1.
Consider F(u, v) = |u| + β(u − v)² for β > 0 and u, v ∈ ℝ. The minimizer û of F(·, v) reads as

û = 0 if |v| ≤ 1∕(2β),  û = v − sign(v)∕(2β) if |v| > 1∕(2β).

F(·, v) is not Fréchet differentiable only at zero. For any v ∈ [−1∕(2β), 1∕(2β)], the minimizer û of F(·, v) is located precisely at zero.
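The closed form in Example 1 is easy to confirm numerically. The sketch below (assuming the thresholding formula stated above) compares it against a brute-force grid search; the helper name soft_threshold is mine:

```python
import numpy as np

def soft_threshold(v, beta):
    # Closed-form minimizer of F(u, v) = |u| + beta * (u - v)^2:
    # it is zero exactly when |v| <= 1 / (2 * beta)
    if abs(v) <= 1.0 / (2.0 * beta):
        return 0.0
    return v - np.sign(v) / (2.0 * beta)

beta = 2.0
grid = np.linspace(-3, 3, 200001)
for v in [-1.0, -0.2, 0.05, 0.3, 1.5]:
    F = np.abs(grid) + beta * (grid - v) ** 2   # evaluate F(., v) on the grid
    u_num = grid[np.argmin(F)]                  # brute-force minimizer
    assert abs(u_num - soft_threshold(v, beta)) < 1e-3
    print(v, soft_threshold(v, beta))
```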
The next corollary shows what can happen if the necessary condition in Theorem 1 fails.
Corollary 1.
Let F(·, v) : ℝ^p → ℝ be differentiable on ℝ^p ∖ Θ₀, where Θ₀ ⊂ ℝ^p is a closed subset. Given v ∈ ℝ^q, if û is a (local) minimizer of F(·, v), then either û ∈ Θ₀, or

D₁F(û, v) = 0.  (9)
Proof.
Example 2.
Suppose that Ψ(·, v) in (3) is differentiable for any v ∈ ℝ^q. For a finite set of positive numbers, say θ₁, …, θ_k, suppose that the PF ϕ is differentiable on ℝ₊ ∖ {θ₁, …, θ_k} and that

ϕ′(θⱼ−) > ϕ′(θⱼ+) for 1 ≤ j ≤ k.  (10)

Suppose F(·, v) has a (local) minimizer û such that ‖D_{i₀}û‖ = θ_{j₀} for some i₀ ∈ I and some j₀ ∈ {1, …, k}. Define F̃(·, v) by replacing ϕ on a neighborhood of θ_{j₀} with its differentiable extension from the right; since ϕ′(θ_{j₀}−) > ϕ′(θ_{j₀}+), one has F̃(·, v) ≥ F(·, v) near û with equality at û, so û is also a local minimizer of F̃(·, v), which is differentiable at û. Applying the necessary condition (9) to F̃(·, v) yields D₁F̃(û, v) = 0. In particular, Theorem 1 then entails ϕ′(θ_{j₀}−) ≤ ϕ′(θ_{j₀}+), which contradicts the assumption on ϕ′ in (10). It follows that if û is a (local) minimizer of F(·, v), then

‖Dᵢû‖ ≠ θⱼ for all i ∈ I and all j ∈ {1, …, k}.
A typical case is the PF (f6) in Table 1, namely ϕ(t) = min{αt², 1}. Then k = 1 and θ₁ = 1∕√α.
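Indeed, for (f6) the one-sided derivatives at θ₁ = 1∕√α can be computed directly, and they exhibit exactly the decreasing jump required in (10) as stated above:

```latex
\phi(t)=\min\{\alpha t^2,\,1\},\qquad \theta_1=\tfrac{1}{\sqrt{\alpha}}:\qquad
\phi'(\theta_1-)=2\alpha\theta_1=2\sqrt{\alpha}\;>\;0=\phi'(\theta_1+).
```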
The following existence theorem can be found, e.g., in the textbook [35].
Theorem 2.
For v ∈ ℝ^q, let U ⊆ ℝ^p be a nonempty and closed subset and F(·, v) : ℝ^p → ℝ a lower semicontinuous (l.s.c.) proper function. If U is unbounded (with possibly U = ℝ^p), suppose that F(·, v) is coercive. Then there exists û ∈ U such that F(û, v) = inf_{u ∈ U} F(u, v).
This theorem gives only sufficient conditions for the existence of a minimizer. They are not necessary, as seen in the example below.
Example 3.
Let F(u, v) involve (f6) in Table 1 and read

F(u, v) = (u[1] − v)² + β min{α(u[1] − u[2])², 1},  u ∈ ℝ², v ∈ ℝ, β > 0.

For any v, F(·, v) is not coercive since it is bounded by β in the direction spanned by e₂ (along u = (v, t), t ∈ ℝ, one has F(u, v) ≤ β). However, its global minimum is strict and is reached for û = (v, v), with F(û, v) = 0.
To prove the existence of optimal solutions for more general energies, we refer to the textbook [9].
Most of the results summarized in this chapter exhibit the behavior of the minimizers û of F(·, v) under variations of v. In other words, they deal with local minimizer functions.
Definition 3.
Let F : ℝ^p × ℝ^q → ℝ and O ⊆ ℝ^q. We say that U : O → ℝ^p is a local minimizer function for the family of functions {F(·, v) : v ∈ O} if for any v ∈ O, the function F(·, v) reaches a strict local minimum at U(v).
Theorem 3.
Let F(·, v) be proper, convex, l.s.c., and coercive for every v ∈ ℝ^q.

(i)

Then F(·, v) has a unique (global) minimum value, which is reached on a closed convex set of minimizers.

(ii)

If in addition F(·, v) is strictly convex, then there is a unique minimizer û (which is also global). So F admits a unique minimizer function U : ℝ^q → ℝ^p.
The next lemma, which can be found, e.g., in [52], addresses the regularity of the local minimizer functions when F is smooth. It can be seen as a variant of the implicit function theorem.
Lemma 1.
Let F be C^m, m ≥ 2, on a neighborhood of (û, v) ∈ ℝ^p × ℝ^q. Suppose that F(·, v) reaches at û a local minimum such that D₁²F(û, v) ≻ 0. Then there are a neighborhood O ⊂ ℝ^q containing v and a unique C^{m−1} local minimizer function U : O → ℝ^p such that D₁²F(U(ν), ν) ≻ 0 for every ν ∈ O and U(v) = û.
This lemma is extended in several directions in this chapter.
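A small numerical illustration of Lemma 1 (the particular energy and all names below are mine, chosen for simplicity): when the Hessian stays positive definite, the local minimizer can be tracked as a smooth function of v by re-solving D₁F(u, v) = 0 with Newton steps warm-started at the previous minimizer.

```python
import numpy as np

alpha, beta = 0.5, 1.0

def grad(u, v):
    # D1 F(u, v) for the smooth energy F(u, v) = (u - v)^2 + beta * sqrt(alpha + u^2)
    return 2.0 * (u - v) + beta * u / np.sqrt(alpha + u**2)

def hess(u):
    # D1^2 F(u, v) = 2 + beta * alpha / (alpha + u^2)^(3/2) > 0 for all u,
    # so Lemma 1 applies at every v and U(v) is well defined and smooth
    return 2.0 + beta * alpha / (alpha + u**2) ** 1.5

def track_minimizer(vs, u0=0.0, newton_steps=20):
    # Warm-started Newton: the output traces the local minimizer function U(v)
    u, path = u0, []
    for v in vs:
        for _ in range(newton_steps):
            u -= grad(u, v) / hess(u)
        path.append(u)
    return np.array(path)

print(track_minimizer(np.linspace(-2.0, 2.0, 9)))  # varies smoothly with v
```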
Definition 4.
Let ϕ : ℝ₊ → ℝ and m ≥ 0 an integer. We say that ϕ is C^m on ℝ₊, or equivalently that ϕ ∈ C^m(ℝ₊), if and only if t ↦ ϕ(|t|) is C^m on ℝ.

By this definition, ϕ′(0) = 0. In Table 1, left, ϕ ∈ C¹ for (f1) if α < 2 and for (f4), while for (f2), (f3), and (f7)–(f9) we find ϕ ∈ C^∞.
3 Regularity Results
Here, we focus on the regularity of the minimizers of F : ℝ^p × ℝ^q → ℝ of the form

F(u, v) = ‖Au − v‖² + β Σ_{i∈I} ϕ(‖Dᵢu‖),  (11)

where I = {1, …, r}, A ∈ ℝ^{q×p}, and for any i ∈ I we have Dᵢ ∈ ℝ^{s×p} for an integer s ≥ 1. Let us denote by D the following rs × p matrix:

D = [D₁∗ ⋯ D_r∗]∗,

obtained by stacking the operators Dᵢ vertically. When A in (11) is not injective, a standard assumption in order to have regularization is

H3

ker(A) ∩ ker(D) = {0}.
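With H3 written as ker(A) ∩ ker(D) = {0}, the condition is equivalent to the stacked matrix [A; D] having full column rank p, which is straightforward to check numerically. A minimal sketch (the example operators are mine):

```python
import numpy as np

def check_H3(A, D):
    # ker(A) ∩ ker(D) = {0}  <=>  the stacked matrix [A; D] has full column rank p
    p = A.shape[1]
    return np.linalg.matrix_rank(np.vstack([A, D])) == p

p = 5
A = np.ones((1, p)) / p                            # ker(A) = zero-mean vectors: A not injective
D = np.eye(p, p, 1)[: p - 1] - np.eye(p)[: p - 1]  # rows give u[i+1] - u[i]; ker(D) = constants
print(check_H3(A, D))                              # True: the kernels intersect only at 0
```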
Some General Results
We first verify the conditions on F(·, v) in (11) that enable Theorems 2 and 3 to be applied. Since H1 holds, F(·, v) in (11) is l.s.c. and proper.
However, the PFs involved in (11) used for signal and image processing are often nonconvex, bounded, or nondifferentiable. One extension of the standard results is given in the next section.
Stability of the Minimizers of Energies with Possibly Nonconvex Priors
Related questions have been considered in critical point theory, sometimes in semidefinite programming; the well-posedness of some classes of smooth optimization problems was addressed in [42]. Other results have been established on the stability of the local minimizers of general smooth energies [52]. Typically, these results are too abstract to be applied directly to energies of the form (11).
Here the assumptions stated below are considered.
Under H1, H2, H4, and H5, the prior Φ (and hence F(·, v)) in (11) can be nonconvex and in addition nonsmooth. By H1 and H4, F(·, v) in (11) admits a global minimum – see Item 1 in section “Some General Results.” However, F(·, v) can present numerous local minima.
Energies with nonconvex and possibly nondifferentiable PFs ϕ are frequently used in engineering problems since they have been observed to give rise to high-quality solutions û. It is hence important to have good knowledge of the stability of the obtained solutions.
The results summarized in this section provide the state of the art for energies of the form (11).
Local Minimizers
The stability of local minimizers is an important matter in its own right for several reasons. First, a nonconvex energy is often minimized only locally, in the vicinity of some initial guess. Second, minimization schemes that guarantee finding the global minimum of a nonconvex objective function are exceptional. The solutions obtained in practice are usually only local minimizers.
The statements below are a simplified version of the results established in [44].
Theorem 4.
Since the excluded set is closed in ℝ^q and has Lebesgue measure zero, the stated properties are generic.
Commentary on the Assumptions
The assumptions H1, H2, and H5 bearing on the PF ϕ are nonrestrictive; they address all PFs in Table 1 except for (f13), which is discontinuous at zero. The assumption H4 cannot be avoided, as seen in Example 4.
Example 4.
Consider given by
where . The minimum is obtained after a simple computation.
In this case, assumption H4 fails and there is a local minimizer function only for .
Other Results
The derivations in [44] reveal several other practical results.
1.
If ϕ ∈ C^m(ℝ₊) with m ≥ 2 (see Definition 4), then for every v outside a negligible subset of ℝ^q, every local minimizer û of F(·, v) is strict and D₁²F(û, v) ≻ 0. Consequently, Lemma 1 is extended since its statement then holds true for almost every v ∈ ℝ^q.
2.

For real data v – a random sample of ℝ^q – whenever F(·, v) is differentiable and satisfies the assumptions of Theorem 4, it is a generic property that local minimizers û are strict and their Hessians are positive definite.
3.
If ϕ′(0+) > 0, define
Then every local minimizer û of F(·, v) is strict and
(12)
(a)
– a sufficient condition for a strict minimum on .
(b)
.
Global Minimizers of Energies with Possibly Nonconvex Priors
Theorem 5.

Under the assumptions of this section, there exists a closed negligible set N ⊂ ℝ^q such that for every v ∈ ℝ^q ∖ N, the energy F(·, v) in (11) has a unique global minimizer.

Otherwise said, in a real-world problem there is no chance of getting data v such that the energy (11) has more than one global minimizer.
Nonetheless, the negligible set N plays a crucial role for the recovery of edges; this issue is developed in Sect. 4.
Nonasymptotic Bounds on Minimizers
The aim here is to give nonasymptotic analytical bounds on the local and global minimizers û of F(·, v) in (11) that hold for all PFs ϕ in Table 1. Related questions have mainly been considered in particular cases or asymptotically; see, e.g., [4, 71, 92]. In [51], the mean and the variance of the minimizers û for strictly convex and differentiable functions ϕ were explored.
The bounds provided below are of practical interest for the initialization and the convergence analysis of numerical schemes. The statements given below are extracted from [82].
Bounds on the restored data.
One compares the “restored” data Aû with the given data v.
H6
Consider the alternative assumptions:
and where the set on .
This set allows us to address the PF given in (f6). Let us emphasize that under H1 and H6, the PF ϕ can be convex or nonconvex.
Theorem 6.
Comments on the results.
This bound holds for every (local) minimizer û of F(·, v). If A is a uniform tight frame (i.e., A∗A = Id), one has
The mean of restored data.
In many applications, the noise corrupting the data can be supposed to have a mean equal to zero. When A = Id, it is well known that mean(û) = mean(v); see, e.g., [7]. However, for a general A one has

(13)

The requirement in (13) is quite restrictive. In the simple case when ϕ(t) = t² and A is square and invertible, it is easy to see that this is also a sufficient condition. Finally, if the requirement fails, then generally mean(û) ≠ mean(v).
The residuals for edge-preserving regularization.
A bound on the data-fidelity term at a (local) minimizer û of F(·, v) shall be given. The edge-preserving assumption H2 (see Sect. 1) is replaced by a stronger edge-preserving assumption:

H7

‖ϕ′‖_∞ = sup_{t ∈ ℝ₊} |ϕ′(t)| < ∞.

Except for (f1) and (f13), all other PFs in Table 1 satisfy H7. Note that when ϕ′(0+) > 0 and H7 hold, one usually has ‖ϕ′‖_∞ = ϕ′(0+).
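For instance, for the concave PF (f11) one gets (consistent with H7 as stated above):

```latex
\text{(f11): } \phi(t)=\frac{\alpha t}{1+\alpha t}
\;\Rightarrow\;
\phi'(t)=\frac{\alpha}{(1+\alpha t)^2}\ \text{is decreasing on } \mathbb{R}_+,
\qquad
\|\phi'\|_\infty=\phi'(0^+)=\alpha<\infty .
```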
Theorem 7.
Let us emphasize that the bound in (14) is independent of data v and that it is satisfied for any local or global minimizer û of F(·, v). (Recall that for a real matrix C with entries C[i, j], one has ‖C‖_∞ = max_i Σ_j |C[i, j]| and ‖C‖₁ = max_j Σ_i |C[i, j]|; see, e.g., [35].)
If D corresponds to a discrete gradient operator for a two-dimensional image, then ‖D‖₁ ≤ 4. If in addition A = Id, (14) yields
The result of this theorem may seem surprising. In a statistical setting, the quadratic data-fidelity term in (11) corresponds to white Gaussian noise on the data, which is unbounded. However, if ϕ is edge preserving according to H7, any (local) minimizer û of F(·, v) gives rise to a noise estimate v − Aû that is tightly bounded, as stated in (14). Hence the model for Gaussian noise on the data v is distorted by the solution û.
When F(·, v) is convex and coercive