Solution Methods

F(x) = y,  (1)

where F: D(F) ⊆ X → Y with domain D(F). The exposition will be mainly restricted to the case of X and Y being Hilbert spaces with inner products ⟨·,·⟩ and norms ‖·‖. Some references for the Banach space case will be given.

We will assume attainability of the exact data y in a ball B_ρ(x_0) ⊆ D(F), i.e., that the equation F(x) = y is solvable in B_ρ(x_0). The element x_0 is an initial guess which may incorporate a-priori knowledge of an exact solution.

The actually available data y^δ will in practice usually be contaminated with noise, for which we here use a deterministic model, i.e.,

‖y^δ − y‖ ≤ δ,  (2)

where the noise level δ is assumed to be known. For a convergence analysis with stochastic noise, see the references in section “Further Literature on Gauss–Newton Type Methods”.

2 Preliminaries

Conditions on F

For the proofs of well-definedness and local convergence of the iterative methods considered here we need several conditions on the operator F. Basically, we inductively show that the iterates remain in a neighborhood of the initial guess. Hence, to guarantee applicability of the forward operator to these iterates, we assume that

B_{2ρ}(x_0) ⊆ D(F)  (3)

for some ρ > 0.

Moreover, we need that F is continuously Fréchet-differentiable, that F′(x) is uniformly bounded with respect to x ∈ B_{2ρ}(x_0), and that problem (1) is properly scaled, i.e., certain parameters occurring in the iterative methods have to be chosen appropriately in dependence of this uniform bound.

The assumption that F′ is Lipschitz continuous,

‖F′(x) − F′(x̃)‖ ≤ L ‖x − x̃‖,  x, x̃ ∈ B_{2ρ}(x_0),  (4)

that is often used to show convergence of iterative methods for well-posed problems, implies the Taylor remainder estimate

‖F(x̃) − F(x) − F′(x)(x̃ − x)‖ ≤ (L/2) ‖x̃ − x‖².  (5)
However, this Taylor remainder estimate is too weak for the ill-posed situation unless the solution is sufficiently smooth (see, e.g., case (ii) in Theorem 9 below). An assumption on F that can often be found in the literature on nonlinear ill-posed problems is the tangential cone condition

‖F(x) − F(x̃) − F′(x)(x − x̃)‖ ≤ η ‖F(x) − F(x̃)‖,  η < 1/2,  x, x̃ ∈ B_{2ρ}(x_0),  (6)

which implies that

(1/(1+η)) ‖F′(x)(x̃ − x)‖ ≤ ‖F(x̃) − F(x)‖ ≤ (1/(1−η)) ‖F′(x)(x̃ − x)‖

for all x, x̃ ∈ B_{2ρ}(x_0). One can even prove the following (see [70, Proposition 2.1]).

Proposition 1.

Let
for some , where if .

(i)

Then for all

and for all . Moreover,

where instead of equality holds if .

(ii)

If F(x) = y is solvable in B_ρ(x_0), then a unique x_0-minimum-norm solution x† exists. It is characterized as the solution of F(x) = y in B_ρ(x_0) satisfying the condition

x† − x_0 ∈ N(F′(x†))^⊥.  (7)

If F(x) = y is solvable but a condition like (6) is not satisfied, then at least existence (but not uniqueness) of an x_0-minimum-norm solution is guaranteed provided that F is weakly sequentially closed (see [36, Chapter 10]).

For the proofs of convergence rates one even needs stronger conditions on F than condition (6).

Source Conditions

It is well known by now that the convergence of regularized solutions can be arbitrarily slow. Rates can only be proven if the exact solution satisfies some regularity assumptions, so-called source conditions. They are usually of Hölder-type, i.e.,

x† − x_0 = (F′(x†)* F′(x†))^μ v  (8)

for some exponent μ > 0 and some v ∈ X. Due to typical smoothing properties of the linearized forward operator F′(x†), such a condition can be interpreted as a smoothness assumption on the initial error x† − x_0.

Since (8) is usually too strong for severely ill-posed problems, logarithmic source conditions, i.e., conditions of the form

x† − x_0 = (−ln(F′(x†)* F′(x†)))^{−μ} v,  μ > 0,  (9)

(under appropriate scaling of F′(x†)) are then used [56] (cf. [30, Theorem 2.7] for Landweber iteration, [54] for the iteratively regularized Gauss–Newton method (IRGNM), and [55] for the generalized IRGNM).

Stopping Rules

In the context of ill-posed problems it is essential to stop iterative solution methods according to an appropriate rule to avoid an unbounded growth of the propagated noise. There are two possibilities, either a-priori rules or a-posteriori rules. A-priori rules [see, e.g., (58)] are computationally very effective. However, the disadvantage is that one has to know the smoothness index μ in (8) or (9) explicitly.

This is avoided in a-posteriori stopping rules. The most well-known a-posteriori criterion is the so-called discrepancy principle, i.e., the iteration is stopped after k* = k*(δ, y^δ) steps with

‖y^δ − F(x^δ_{k*})‖ ≤ τδ < ‖y^δ − F(x^δ_k)‖,  0 ≤ k < k*,  (10)

where τ > 1.
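For illustration, the discrepancy principle (10) can be wrapped around any iterative update as a generic stopping loop. The following is a minimal Python sketch; the names `F` and `step` and the default value of `tau` are assumptions of this example, not part of the text:

```python
import numpy as np

def iterate_with_discrepancy(x0, F, step, y_delta, delta, tau=2.5, max_iter=10000):
    """Run x_{k+1} = step(x_k, residual_k) until the discrepancy principle (10)
    fires, i.e., stop at the first k* with ||y_delta - F(x)|| <= tau * delta."""
    x = x0
    for k in range(max_iter):
        residual = y_delta - F(x)
        if np.linalg.norm(residual) <= tau * delta:
            return x, k              # k* = k: discrepancy principle satisfied
        x = step(x, residual)        # one update of the chosen iterative method
    return x, max_iter
```

For instance, with the scalar map `F(x) = x` and the damped update `step(x, r) = x + r/2`, the loop stops as soon as the residual has been driven below `tau * delta`.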

The Lepskii type balancing principle is an interesting alternative a-posteriori rule, see [6, 8] and (62) below in the context of iterative methods.

If the noise level δ in (2) is unknown, heuristic stopping rules such as the quasioptimality principle, the Hanke–Raus rule, or the L-curve criterion still lead to convergence under certain structural assumptions on the noise, see [7, 74, 75, 89].

One way to derive iterative regularization methods is to apply gradient methods to the minimization problem

min_{x ∈ D(F)} (1/2) ‖y − F(x)‖².

Since the negative gradient of this functional is given by F′(x)*(y − F(x)), and taking into account that only noisy data y^δ are available, this yields methods of the form

x^δ_{k+1} = x^δ_k + ω_k F′(x^δ_k)* (y^δ − F(x^δ_k)),  k = 0, 1, 2, …,  (11)

where x^δ_0 = x_0 is an initial guess of the exact solution. Choosing the factor ω_k in a special way, one obtains well-known methods like Landweber iteration, the steepest descent method, and the minimal error method.
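In the linear special case F(x) = Ax (so that F′(x) ≡ A), the three step-size choices just named can be compared directly. The following Python sketch is illustrative only; the concrete matrix and all variable names are assumptions of this example, and the step-size formulas are the standard choices for (11):

```python
import numpy as np

def gradient_method(A, y_delta, x0, omega_rule, n_iter=200):
    """Iteration (11) for the linear toy case F(x) = A x, F'(x) = A:
    x_{k+1} = x_k + omega_k * A^T (y_delta - A x_k)."""
    x = x0.copy()
    for _ in range(n_iter):
        r = y_delta - A @ x          # residual y^delta - F(x_k)
        s = A.T @ r                  # direction: negative gradient F'(x_k)^* r
        if np.linalg.norm(s) < 1e-14:
            break                    # gradient vanished: stationary point reached
        x = x + omega_rule(A, r, s) * s
    return x

# standard step-size choices leading to the three methods named above
def landweber(A, r, s):        return 1.0                           # constant step
def steepest_descent(A, r, s): return (s @ s) / ((A @ s) @ (A @ s))
def minimal_error(A, r, s):    return (r @ r) / (s @ s)
```

Note that the constant Landweber step presumes the proper scaling ‖A‖ ≤ 1 discussed below.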

Nonlinear Landweber Iteration

If one chooses ω_k ≡ 1 to be constant, one obtains Landweber iteration. As already mentioned in the introduction of this chapter, well-definedness and convergence can only be proven if problem (1) is properly scaled. Without loss of generality we may assume that

‖F′(x)‖ ≤ 1,  x ∈ B_{2ρ}(x_0).  (12)

The nonlinear Landweber iteration is then given as the method

x^δ_{k+1} = x^δ_k + F′(x^δ_k)* (y^δ − F(x^δ_k)),  k = 0, 1, 2, …  (13)

We want to emphasize that for fixed iteration index k the iterate x^δ_k depends continuously on the data y^δ, since x^δ_k is the result of a combination of continuous operations.

The results on convergence and convergence rates for this method presented here were established in [49] (see also [70]). To begin with, we formulate the following monotonicity property that gives us a clue how to choose the number τ in the stopping rule (10) (see [70, Proposition 2.2]).
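As a concrete illustration of (13) combined with the discrepancy principle (10), consider the following Python toy problem. The forward map, the noise model, and all parameter values are assumptions of this sketch (chosen so that the scaling condition (12) holds near the solution); it is not taken from the text:

```python
import numpy as np

# toy mildly nonlinear forward map on R^20: F(x)_i = a_i x_i + 0.1 x_i^3,
# with decaying diagonal factors a_i; the derivative F'(x) = diag(a_i + 0.3 x_i^2)
# then satisfies the scaling condition (12) near the exact solution
n = 20
a_diag = 0.5 / (1.0 + np.arange(n))

def F(x):
    return a_diag * x + 0.1 * x**3

def Fprime_adjoint(x, v):
    # F'(x)^* v; F'(x) is diagonal here, so the adjoint is the same diagonal
    return (a_diag + 0.3 * x**2) * v

x_true = np.sin(np.linspace(0.0, np.pi, n))
delta = 1e-2
noise = np.random.default_rng(0).standard_normal(n)
y_delta = F(x_true) + delta * noise / np.linalg.norm(noise)  # ||y_delta - y|| = delta

# nonlinear Landweber iteration (13), stopped by the discrepancy principle (10)
tau = 2.5
x = np.zeros(n)
for k in range(20000):
    residual = y_delta - F(x)
    if np.linalg.norm(residual) <= tau * delta:
        break                              # stopping index k* reached
    x = x + Fprime_adjoint(x, residual)
```

Stopping at k* keeps the propagated data noise from being amplified through the small factors a_i.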

Proposition 2.

Assume that the conditions (12) and (6) hold and that the equation F(x) = y has a solution x* ∈ B_ρ(x_0). If x^δ_k ∈ B_ρ(x*), a sufficient condition for x^δ_{k+1} to be a better approximation of x* than x^δ_k is that

‖y^δ − F(x^δ_k)‖ > 2 (1 + η)/(1 − 2η) δ.

In view of this proposition, the number τ in the stopping rule (10) should be chosen subject to

τ ≥ 2 (1 + η)/(1 − 2η),

with η as in (6). To be able to prove that the stopping index k* in (10) is finite and hence well defined, it turns out that τ has to be chosen slightly larger (see [70, Corollary 2.3]), i.e.,

τ > 2 (1 + η)/(1 − 2η) > 2.  (14)

Corollary 1.

Let the assumptions of Proposition 2 hold and let k* be chosen according to the stopping rule (10), (14). Then

k* (τδ)² < Σ_{k=0}^{k*−1} ‖y^δ − F(x^δ_k)‖² ≤ (τ²/((1 − 2η)τ − 2(1 + η))) ‖x_0 − x*‖².

In particular, if y^δ = y (i.e., if δ = 0), then

Σ_{k=0}^{∞} ‖y − F(x_k)‖² < ∞.  (15)

Note that (15) implies that, if Landweber iteration is run with precise data y^δ = y, then the residual norms of the iterates tend to zero as k → ∞. That is, if the iteration converges, then the limit is necessarily a solution of F(x) = y. The following convergence result holds (see [70, Theorem 2.4]):

Theorem 1.

Assume that the conditions (12) and (6) hold and that the equation F(x) = y is solvable in B_ρ(x_0). Then the nonlinear Landweber iteration applied to exact data y converges to a solution of F(x) = y. If N(F′(x†)) ⊆ N(F′(x)) for all x ∈ B_ρ(x†), then x_k converges to x† as k → ∞.

We emphasize that, in general, the limit of the Landweber iterates is not an x_0-minimum-norm solution. However, since the monotonicity result of Proposition 2 holds for every solution, the limit of x_k has to be at least close to x†; as can be seen below, the larger ρ can be chosen, the closer it has to be (Fig. 1).

Fig. 1
The sketch shows the initial element x_0, the x_0-minimum-norm solution x†, the relevant subset, and, in bold, the region where the limit of the iterates x_k can lie

It is well known that, if y^δ does not belong to the range of F, then the iterates of (13) cannot converge but still allow a stable approximation of a solution of F(x) = y provided the iteration is stopped after k* steps. The next result shows that the stopping rule (10), (14) renders the Landweber iteration a regularization method (see [70, Theorem 2.6]):

Theorem 2.

Let the assumptions of Theorem 1 hold and let k* = k*(δ, y^δ) be chosen according to the stopping rule (10), (14). Then the Landweber iterates x^δ_{k*} converge to a solution of F(x) = y. If N(F′(x†)) ⊆ N(F′(x)) for all x ∈ B_ρ(x†), then x^δ_{k*} converges to x† as δ → 0.

To obtain convergence rates, the exact solution has to satisfy some source conditions. Moreover, one has to guarantee that the iterates remain in a neighborhood of x†. In [49] rates were proven under the additional assumption that F satisfies

F′(x) = R_x F′(x†)  and  ‖R_x − I‖ ≤ c ‖x − x†‖,  x ∈ B_{2ρ}(x_0),

where {R_x : x ∈ B_{2ρ}(x_0)} is a family of bounded linear operators and c is a positive constant.

Unfortunately, these conditions are not always satisfied (see [49, Example 4.3]). Therefore, we consider instead of (13) the following slightly modified iteration method,

x^δ_{k+1} = x^δ_k + ω G^δ(x^δ_k)* (y^δ − F(x^δ_k)),  (16)

where, as above, x^δ_0 = x_0 is an initial guess, G^δ(x) := G(x, y^δ), and G is a continuous operator mapping B_{2ρ}(x_0) × Y into the space of bounded linear operators from X to Y. The iteration will again be stopped according to the discrepancy principle (10).

To obtain local convergence and convergence rates for this modification we need the following assumptions:

Assumption 9.

Let ρ be a positive number such that B_{2ρ}(x_0) ⊆ D(F).

(i)

The equation F(x) = y has an x_0-minimum-norm solution x† in B_ρ(x_0).

(ii)

There exist positive constants c_1, c_2, c_3 and linear operators R^δ_x such that for all x ∈ B_{2ρ}(x_0) the following estimates hold:

(17)

(18)

(19)

(20)

(iii)

The scaling parameter ω in (16) satisfies the condition

Note that, if instead of (17) the slightly stronger condition

(21)
holds in B_{2ρ}(x_0) for some c > 0, then the unique existence of the x_0-minimum-norm solution x† follows from Proposition 1 if F(x) = y is solvable in B_ρ(x_0).

Convergence and convergence rates for the modification above are obtained as follows (see [70, Theorems 2.8 and 2.13]):

Theorem 3.

Let Assumption 9 hold and let k* = k*(δ, y^δ) be chosen according to the stopping rule (10).

(i)

If ‖x_0 − x†‖ is so small and if the parameter τ in (10) is so large that

and

then the modified Landweber iterates x^δ_{k*} converge to x† as δ → 0.

(ii)

If τ > 2 and if x† − x_0 satisfies (8) with some 0 < μ ≤ 1/2 and ‖v‖ sufficiently small, then it holds that

k* = O(δ^{−2/(2μ+1)})

and

‖x^δ_{k*} − x†‖ = O(δ^{2μ/(2μ+1)}).
Note that for the modified Landweber iteration we obtain the same convergence rates and the same asymptotic estimate for k* as for linear ill-posed problems (compare [36, Theorem 6.5]) if μ ≤ 1/2 in (8).

Under Assumption 9 and according to the theorem above, the best possible convergence rate is

‖x^δ_{k*} − x†‖ = O(√δ),

attained for μ = 1/2. Even if μ > 1/2, we cannot improve this rate without an additional restriction on the nonlinearity of F.

We will show for the following parameter estimation problem that the conditions of Assumption 9 are satisfied if F′(x) is replaced by a certain operator G^δ(x).

Example 1.

We treat the problem of estimating the diffusion coefficient a in

−(a u_s)_s = f  in (0, 1),  u(0) = u(1) = 0,  (22)

where f ∈ L²; the subscript s denotes the derivative with respect to s.

In this example, F is defined as the parameter-to-solution mapping

F: D(F) := {a ∈ H¹[0, 1] : a(s) ≥ a̲ > 0} → L²[0, 1],  a ↦ u(a),

where u(a) denotes the solution of (22). One can prove that F is Fréchet-differentiable (see, e.g., [24]) with

F′(a)h = A(a)^{−1}[(h u_s(a))_s],  F′(a)* w = −B^{−1}[u_s(a) (A(a)^{−1} w)_s],

where

A(a): H²[0, 1] ∩ H₀¹[0, 1] → L²[0, 1],  A(a)u := −(a u_s)_s,

and

B: D(B) := {ψ ∈ H²[0, 1] : ψ′(0) = ψ′(1) = 0} → L²[0, 1],  Bψ := −ψ_ss + ψ;

note that B^{−1} is the adjoint of the embedding operator from H¹[0, 1] into L²[0, 1].
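The parameter-to-solution map can be made concrete with a small finite-difference solver. The discretization below (uniform grid, coefficients averaged at midpoints, dense solve) is an assumption of this illustration, not part of the text:

```python
import numpy as np

def solve_forward(a, f):
    """F(a): solve -(a u_s)_s = f on (0, 1) with u(0) = u(1) = 0 by finite
    differences; a and f are nodal values on the uniform grid s_i = i/n."""
    n = len(a) - 1
    h = 1.0 / n
    am = 0.5 * (a[:-1] + a[1:])               # coefficient at midpoints s_{i+1/2}
    main = (am[:-1] + am[1:]) / h**2          # tridiagonal system for u_1..u_{n-1}
    off = -am[1:-1] / h**2
    M = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(M, f[1:-1])
    return u

# sanity check against the exact solution for a = 1, f = 2, namely u(s) = s (1 - s)
s = np.linspace(0.0, 1.0, 101)
u = solve_forward(np.ones_like(s), 2.0 * np.ones_like(s))
```

A Landweber-type iteration for this example additionally needs the adjoint, as characterized by the variational problems (24)–(25); the solver above only provides the forward map.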

First of all, we show that F satisfies condition (17): let F(a) = u, , and w ∈ L 2. Noting that and that A(a) is one-to-one and onto for we obtain that

This together with the fact that and that if g ∈ H 1 is such that g(ξ) = 0 for some ξ ∈ [0, 1] yields the estimate

(23)
This implies (17).

The conditions (18) and (19) are not fulfilled with G^δ(x) = F′(x). Noting that g = F′(a)* w is the unique solution of the variational problem: for all v ∈ H¹

(24)
we propose to choose G^δ in (16) as follows: G^δ(a) w := G(a, u^δ) w is the unique solution g of the variational problem

(25)
This operator G^δ obviously satisfies (18), since

with

The condition (19) is satisfied, since one can estimate as in (23) that

Note that a constant c_2 independent of a can be found, since a is assumed to be bounded away from zero. Now we turn to condition (20): using (24) and (25) we obtain, similarly to (23), the estimate

This together with and implies that

and hence (20) holds.

Thus, Theorem 3 is applicable, i.e., if ω and τ are chosen appropriately, then the modified Landweber iterates (cf. (16)), where k* is chosen according to the stopping rule (10), converge to the exact solution with the rate O(√δ) provided that

with ‖v‖ sufficiently small. Note that this means that

Basically, this means that one has to know all rough parts of the exact solution up to H³. But without this knowledge one cannot expect to get the rate O(√δ).

In [49] two other nonlinear problems were treated where the conditions (18) and (19) are satisfied with G^δ(x) = F′(x).

Landweber Iteration in Hilbert Scales

We have mentioned in the last subsection that for classical Landweber iteration the rates cannot be better than O(√δ) under the given assumptions. However, better rates may be obtained for solutions that satisfy stronger smoothness conditions if the iteration is performed in a subspace of X with a stronger norm. This leads us directly to regularization in Hilbert scales. On the other hand, for solutions with poor smoothness properties the number of iterations may be reduced if the iteration is performed in a space with a weaker norm.

First of all, we briefly recall the definition of a Hilbert scale: let L be a densely defined unbounded self-adjoint strictly positive operator in X. Then (X_r)_{r∈ℝ} denotes the Hilbert scale induced by L if X_r is the completion of ∩_{k=0}^∞ D(L^k) with respect to the Hilbert space norm ‖x‖_r := ‖L^r x‖; obviously, ‖x‖_0 = ‖x‖ (see [78] or [36, Section 8.4] for details).

The operator F′(x^δ_k)* in (13) will now be replaced by the adjoint of F′(x^δ_k) considered as an operator from X_s into Y. Usually s ≥ 0, but we will see below that there are special cases where a negative choice of s can be advantageous. Since by definition of X_s this adjoint is given by L^{−2s} F′(x^δ_k)*, (13) is replaced by the iteration process

x^δ_{k+1} = x^δ_k + L^{−2s} F′(x^δ_k)* (y^δ − F(x^δ_k)).  (26)

As in the previous section, the iteration process is stopped according to the discrepancy principle (10).
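In a sequence space, where L acts diagonally, iteration (26) is easy to demonstrate. The following Python sketch uses a purely diagonal toy problem (a linear forward map, L = diag(j), and all parameter values are assumptions of this illustration); with s < 0 the factor L^{−2s} amplifies the high-index components and acts as a preconditioner:

```python
import numpy as np

# diagonal toy setup: X = R^n, L = diag(j) induces the scale ||x||_r = ||L^r x||,
# and the linear forward map F(x) = A x has singular values j^(-a)
n, a, s = 200, 1.0, -0.5
j = np.arange(1, n + 1, dtype=float)
A = j ** (-a)                        # diagonal of F' (constant in the linear case)
L2s = j ** (-2 * s)                  # diagonal of L^(-2s); s < 0 amplifies high j
x_true = j ** (-1.5)                 # a smooth exact solution (an assumption)
delta = 1e-4
noise = np.random.default_rng(1).standard_normal(n)
y_delta = A * x_true + delta * noise / np.linalg.norm(noise)

omega = 1.0 / np.max(L2s * A * A)    # step scaling so that the iteration is stable
tau = 2.5
x = np.zeros(n)
for k in range(100000):
    r = y_delta - A * x
    if np.linalg.norm(r) <= tau * delta:
        break                        # discrepancy principle (10)
    x = x + omega * L2s * A * r      # one step of (26): L^(-2s) F'^* (y^d - F(x))
```

Re-running with s = 0 reproduces plain Landweber, which needs noticeably more iterations to satisfy the same stopping rule.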

Proofs of convergence and convergence rates for this method can be found in [34, 70, 88]. For an approach, where the Hilbert scale is chosen in the space Y, see [33].

The following basic conditions are needed.

Assumption 10.

(i)

is continuous and Fréchet-differentiable in .

(ii)

F(x) = y has a solution x†.

(iii)

‖F′(x†)h‖ ≤ c̄ ‖h‖_{−a} for all h ∈ X and some a > 0, and the extension of F′(x†) to X_{−a} is injective.

(iv)

B := F′(x†) L^{−s} is such that ‖B‖_{X→Y} ≤ 1, where −a < s. If s < 0, F′(x†) has to be replaced by its extension to X_s.

Usually, for the analysis of regularization methods in Hilbert scales a stronger condition than (iii) is used, namely (cf., e.g., [88])

‖F′(x†)h‖ ∼ ‖h‖_{−a},  h ∈ X,  (27)

where the number a can be interpreted as a degree of ill-posedness of the linearized problem in x†. However, this condition is not always fulfilled. Sometimes one can only prove that condition (iii) in Assumption 10 holds. It might also be possible that one can prove an estimate from below in a slightly weaker norm (see examples in [34]), i.e.,

‖F′(x†)h‖ ≥ c ‖h‖_{−ã}  for all h ∈ X and some ã > a.  (28)

The following proposition holds under Assumption 10 and (28). The proof follows the lines of [36, Corollary 8.22], noting that the results there not only hold for s ≥ 0 but also for s > −a.

Proposition 3.

Let Assumption 10 hold. Then for all ν ∈ [0,1] it holds that

Note that condition (iii) is equivalent to

If in addition condition (28) holds, then for all ν ∈ [0,1] it holds that

Note that condition (28) is equivalent to

In our convergence analysis the following shifted Hilbert scale will play an important role

where a, s, and B are as in Assumption 10. Some properties of this shifted Hilbert scale can be found in [70, Proposition 3.3].

For the convergence rates analysis we need the following smoothness conditions on the solution and the Fréchet-derivative of F.

Assumption 11.

(i)

for some ρ > 0.

(ii)

for all and some b ∈ [0,a], β ∈ (0,1], and c > 0.

(iii)

for some , i.e., there is an element so that

Condition (iii) is a smoothness condition on the exact solution comparable to (8). Usually, the norm ‖x_0 − x†‖_u of the ordinary Hilbert scale is used instead of |||x_0 − x†|||_u. However, these conditions are equivalent if (27) holds.

For the proof of the next convergence rates result see [70, Theorem 3.8].

Theorem 4.

Let Assumptions 10 and 11 hold. Moreover, let k* = k*(δ, y^δ) be chosen according to the stopping rule (10) with τ > 2 and let |||x_0 − x†|||_u be sufficiently small. Then the following estimates are valid for δ > 0 and some positive constants c_r:

(29)
and for − a ≤ r < u

As usual for regularization in Hilbert scales, we are interested in obtaining convergence rates with respect to the norm in X.

Corollary 2.

Under the assumptions of Theorem 4 the following estimates hold:

(30)

If in addition (28) holds, then for s > 0 the rate can be improved to

Note that (29) implies that k* is finite for δ > 0 and hence x^δ_{k*} is a stable approximation of x†.

Moreover, it can be seen from (29) that the larger s, the faster k* may grow as δ → 0. As a consequence, s should be kept as small as possible to reduce the number of iterations and hence the numerical effort. If u is close to 0, it might be possible to choose a negative s. According to (30), we would still get the optimal rate, but, due to (29), k* would not grow as fast. Choosing a negative s can be interpreted as a preconditioned Landweber method (cf. [34]).

We will now comment on the rates in Corollary 2: if only Assumption 10 (iii) is satisfied, i.e., if ‖F′(x†)h‖ can be estimated through the norm of h in X_{−a} only from above, convergence rates in X can only be given if s < u, i.e., only for the case of undersmoothing. If s > 0, the rates will not be optimal in general. To obtain rates also for s > u, i.e., for the case of oversmoothing, condition (28