
The area of mathematical inverse problems is quite broad and involves the qualitative and quantitative analysis of a wide variety of physical models. Moreover, a considerable number of problems arising in different scientific and technical fields belong to a class of ill-posed problems. For example, geophysicists scan the earth's subsurface by recording arrival times of waves reflected off different layers underneath the surface, and try to determine a meaningful solution and to understand which features in the solution are statistically significant.

From a statistical point of view, the problem can be seen as recovering an unobservable signal

\tilde{f}

based on observations

\begin{matrix} y (x_{i}) = A \tilde{f} (x_{i}) + ɛ_{i}, \end{matrix}

(1.1)

where

A : F \to Y

is some known compact linear operator defined over a separable Hilbert space

F

, with values in a separable Hilbert space

Y

and

x_{i}, i = 1, \dots, n

is a fixed observation scheme. We assume that the observations

y (x_{i}) \in \mathbb{R}

and that the observation noise terms

ɛ_{i}

are i.i.d. realizations of a certain random variable

ɛ

. Throughout the paper, we shall denote

y = (y (x_{i}))_{i = 1}^{n}

. In this article we study the problem of estimating

\tilde{f}

using fixed iterative methods.

The best possible accuracy, regardless of any discretization and noise corruption, is determined by some a priori smoothness assumption on the exact solution

\tilde{f}

. Here, smoothness is given in terms of some index function

η

on the spectrum of

A^{*} A

, namely

A_{η, ρ} = \{f \in F, f = η (A^{*} A) ω, ∥ ω ∥ \leq ρ\}

where

A_{η, ρ}

is called a source condition. For classical Hilbert scales, the smoothness is measured in terms of powers

η (t) : = t^{μ}

with

0 \leq μ \leq μ_{0}

,

μ_{0} > 0

.

In a deterministic framework, the statistical model 1.1 is formulated as the problem of finding the best-approximate solution of

A f = y,

in the situation where only perturbed data

y^{δ}

are available with

∥ y - y^{δ} ∥ \leq δ .

Here,

δ

is called the noise level. It is important to remark that whereas in this case consistency of the estimators depends on the approximation parameter

δ

, in ( 1.1 ) it depends on the number of observations

n

In general, the best

L^{2}

approximation

A^{†} y

, where

A^{†}

is the Moore-Penrose (generalized) inverse of A, does not depend continuously on the right-hand side

y

. We define the Moore-Penrose inverse in an operator-theoretic way by restricting the domain and range of

A

in such a way that the resulting restricted operator is invertible; its inverse will then be extended to its maximal domain

D (A^{†}) = ℛ (A) + ℛ (A)^{⊥}

, with

ℛ (A)

the range of the operator

A

and

ℛ (A)^{⊥}

the orthogonal complement of the range of

A

The inverse problems that we study in this article are called ill-posed problems because the operator

A

is compact and consequently equation (1.1) cannot be inverted directly since

A^{- 1}

is not a bounded operator. Ill-posed problems are usually treated by applying some linear regularization procedure, often based on a singular value decomposition; see Tikhonov and Arsenin [23]. An early survey of the statistical perspective on ill-posed problems is given by O'Sullivan [21].

In practice, however, (1.1) is hardly ever considered directly. Instead, we project the problem onto a lower-dimensional space

Y_{m} \subset Y

. This yields a sequence of closed subspaces

Y_{m}

indexed by

m \in M_{n}

, a collection of index sets. Clearly, an important problem is thus how to choose the subspace

Y_{m}

based on the data. This can be done by selection of a cutoff point or by threshold methods. Choosing the right subspace will be called model selection.

Sometimes this projection provides enough regularization to produce a good approximate solution, but often additional regularization is needed. Regularization methods replace an ill-posed problem by a family of well-posed problems; their solutions, called regularized solutions, are used as approximations to the desired solution of the inverse problem. These methods always involve some parameter measuring the closeness of the regularized and the original (unregularized) inverse problem. Rules (and algorithms) for the choice of these regularization parameters, as well as convergence properties of the regularized solutions, are central points in the theory of these methods, since they make it possible to find the right balance between stability and accuracy. The general principles of regularization for ill-posed problems are well known and were established, in particular, by A.N. Tikhonov. The literature on various regularization methods based on these general principles is extensive (Engl, Hanke, and Neubauer [9], Gilyazov and Gol'dman [11]).

In statistics, regularization is associated with penalty-based methods or thresholding methods or, more generally, with “smoothing” techniques. In applications, regularization offers a unifying perspective for many diverse ill-posed inverse problems, a wide range of problems concerned with recovering information from indirect and usually noisy measurements, arising in geophysics, tomography and econometrics. One of the most important, but still insufficiently developed, topics in the theory of ill-posed problems is connected with iterative regularization [11], i.e., with the use of iterative methods of any form for the stable approximate solution of ill-posed problems. Iterative regularization methods tend to be more attractive in terms of numerical cost and implementation, but a number of open questions remain in their theoretical analysis.

In this article we propose an iterative regularized estimator for linear ill-posed problems.

Necessary conditions for convergence are established. These conditions connect the choice of the regularization parameter (i.e., the iteration index) with the projection dimension.

Moreover, we prove that the iterative regularized estimator is optimal in the sense that the estimator achieves the best rate of convergence among all the regularized estimators. A recent work in this direction is that of Loubes and Ludeña [16], which discusses the problem of estimating nonlinear ill-posed inverse problems with different types of complexity penalties leading either to a model selection estimator or to a regularized estimator.

The choice of the regularization sequence is crucial here, and much work on the selection of a good regularization parameter can be found in the literature [10], [13]. When using iterative methods, the problem is to find a good stopping criterion for terminating the iteration procedure. In this article we will use tools developed in the context of model selection via penalization [2], [1], based on the use of concentration inequalities.

Our article is organized as follows. Section 2 presents basic assumptions and a statement of the discretized inverse problem. In section 3 we discuss regularization methods and we prove consistency of the estimator when the regularization parameter is known. In section 4 we present our main result, prove optimality of an adaptive regularized estimator and give its rate of convergence. Finally, in the last section we introduce regularization by iterative methods for the solution of inverse problems and provide some examples to explain the properties of iterative regularization methods.

2 Preliminaries

2.1 Formulation of the problem and basic assumptions

We assume that the inverse problem is given by

y = A \tilde{f} + ɛ .

where

ɛ

is a centered random variable satisfying the moment condition

\mathbb{E} (| ɛ |^{p} / σ^{p}) \leq p! / 2

and

\mathbb{E} (ɛ^{2}) = σ^{2}

.

We also need some notations concerning the fixed design settings,

x_{i}, i = 1, \dots, n

. Define the empirical measure:

P_{n} = \frac{1}{n} \sum_{i = 1}^{n} δ_{x_{i}} .

and the associated empirical norm

∥ y ∥_{n}^{2} = ∥ y ∥_{P_{n}}^{2} = \frac{1}{n} \sum_{i = 1}^{n} (y (x_{i}))^{2}

as well as the empirical scalar product

\langle y, ɛ \rangle_{n} = \frac{1}{n} \sum_{i = 1}^{n} ɛ_{i} y (x_{i}) .
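As a minimal illustration of these definitions, the following Python sketch evaluates the empirical norm and the empirical scalar product for values observed on a fixed design. The function names and the sample values are hypothetical and only serve to make the 1/n normalization concrete.

```python
def empirical_norm_sq(values):
    """Squared empirical norm: (1/n) * sum_i y(x_i)^2."""
    n = len(values)
    return sum(v * v for v in values) / n


def empirical_inner(values, noise):
    """Empirical scalar product: (1/n) * sum_i eps_i * y(x_i)."""
    if len(values) != len(noise):
        raise ValueError("both sequences must be observed on the same design")
    n = len(values)
    return sum(e * v for e, v in zip(noise, values)) / n


# Hypothetical observations y(x_i) and noise terms eps_i on a design of size n = 4.
y_values = [1.0, -2.0, 0.5, 3.0]
eps = [0.1, -0.1, 0.2, 0.0]
norm_sq = empirical_norm_sq(y_values)
inner = empirical_inner(y_values, eps)
```

Note that the empirical scalar product of a sequence with itself reproduces the squared empirical norm, as expected from the definitions.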

We assume

A : F \to Y

, with

F, Y

separable Hilbert spaces. Let

〈 \cdot, \cdot 〉

stand for the inner product defined over

F

. We assume that the range

ℛ (A)

of the operator

A

is closed in the sense that

\tilde{f} \in N (A)^{⊥}

, where

N (A)^{⊥}

is the orthogonal complement of the null space of the operator

A

. With this notation, let

J (f) = ∥ y - A f ∥_{n}^{2}

be the quadratic risk function. We will denote by

\hat{f}

the function that minimizes the risk (which may not be unique), defined as

\begin{matrix} \hat{f} = \arg\min_{f \in F} J (f), \end{matrix}

(2.1)

where the minimum is taken over all functions from

F

Y

An element

f \in F

solves the problem

{min}_{f \in F} J (f)

if and only if it is a solution of the normal equation

\begin{matrix} A^{*} A f = A^{*} y \end{matrix}

(2.2)

where

A^{*} : Y \to F

is the adjoint operator of

A

(introduced via the requirement that for all

f \in F

and

y \in Y

〈 A f, y 〉_{n} = \frac{1}{n} 〈 f, A^{*} y 〉

holds). It is important to remark that the operator

A^{*}

actually depends on the observation sequence

x_{i}, i = 1, \dots n

. If

Y

is generated by

{φ_{j}}_{j = 1}^{m}

and is such that this basis is orthonormal with respect to the

L^{2} (P_{n})

norm over

Y

, and

A

is the identity then

A^{*} = \frac{1}{n} (φ_{j} (x_{i}))_{i, j}

It is necessary to mention that the convergence rates can thus be given only over subsets of

F

, i.e., under a-priori assumptions on the exact solution

\tilde{f}

. We will formulate such a-priori assumptions, encountered typically in the inverse problem literature, in terms of the exact solutions by considering subsets of

F

given by some source condition of the form

A_{μ, ρ} = \{f \in F, f = (A^{*} A)^{μ} ω, ∥ ω ∥ \leq ρ\}

where

0 \leq μ \leq μ_{0}

,

μ_{0} > 0

, and use the notation

\begin{matrix} A_{μ} = ⋃_{ρ > 0} A_{μ, ρ} = ℛ ((A^{*} A)^{μ}) \end{matrix}

(2.3)

These sets are usually called source sets, and an element

f \in A_{μ, ρ}

is said to have a source representation.

The requirement that

f

be in

A_{μ, ρ}

can be considered as a smoothness condition.

2.2 Projection methods

For numerical calculations, we have to approximate the space

F

by a finite-dimensional subspace. Estimating over all

F

is in general not possible. One approach in this direction is regularization by projection, where the regularization is achieved by a finite-dimensional approximation through projection.

Let

M_{n}

be a collection of index sets (

m \in M_{n}, m = \{j_{1}, \dots, j_{d_{m}}\}

). We consider a sequence

Y_{1} \subset Y_{2} \dots \subset Y_{m} \dots \subset Y

whose union is dense in

Y

. We assume

d i m (Y_{m}) = d_{m} .

Let

Π_{Y_{m}}^{n}

be the orthogonal projector in the empirical norm over the subspace

Y_{m}

and let

A_{m} = Π_{Y_{m}}^{n} A

. Define

F_{m} = A^{*} Y_{m}

, with

A_{m}^{*} : Y_{m} \to F

the adjoint operator of

A_{m}

, and

Π_{F_{m}}

to be the orthogonal projector onto the subspace

F_{m}

. Then, by construction

Π_{F_{m}} = (Π_{Y_{m}}^{n} A)^{†} Π_{Y_{m}}^{n} A .

Thus, we shall assume that the data are given through an orthogonal design, corresponding to an orthogonal projection

Π_{Y_{m}}^{n}

\begin{matrix} Π_{Y_{m}}^{n} y = Π_{Y_{m}}^{n} A \tilde{f} + Π_{Y_{m}}^{n} ɛ . \end{matrix}

(2.4)

With this notation we have that the best-approximate

L^{2}

solution has the expression

Π_{F_{m}} \tilde{f} = A_{m}^{†} y_{m} .

for

y_{m} = Π_{Y_{m}}^{n} y

in the domain of

A_{m}^{†}

. In the following we shall denote

{\tilde{f}}_{m} = Π_{F_{m}} \tilde{f}

. Our goal is to find the solution of equation (1.1) in the finite-dimensional subspace

F_{m} \subset F

. We have that for projection without regularization the choice of

F_{m}

and of

A_{m}

has many advantages. For noisy data and severely ill-posed problems, the dimension of the subspace has to be rather low to keep the total error small, since for these problems the smallest singular value of

A_{m}

decreases rapidly as

m

increases. To be able to use larger dimensions we have to combine the projection method with additional regularization methods, such as iterative methods [9], [11].

2.3 Singular value decomposition

Since

A_{m}

is often not of full rank, the singular value decomposition (SVD) of the operator

A_{m}

is a useful tool. Let

(σ_{j}; φ_{j}, ψ_{j})_{j \in m}

be a singular system for the linear operator

A_{m}

, that is,

A_{m} φ_{j} = σ_{j} ψ_{j}

and

A_{m}^{*} ψ_{j} = σ_{j} φ_{j}

, where

\{σ_{j}^{2}\}_{j \in m}

are the nonzero eigenvalues of the selfadjoint operator

A_{m}^{*} A_{m}

(and also of

A_{m} A_{m}^{*}

), considered in decreasing order. Furthermore,

\{φ_{j}\}_{j \in m}

and

\{ψ_{j}\}_{j \in m}

are corresponding complete orthonormal systems of eigenvectors of

A_{m}^{*} A_{m}

and

A_{m} A_{m}^{*}

, respectively. For general linear operators with an SVD, we can write

\begin{matrix} A_{m} f = \sum_{j \in m} σ_{j} 〈 f, φ_{j} 〉 ψ_{j} \end{matrix}

(2.5)

\begin{matrix} A_{m}^{*} y_{m} = \sum_{j \in m} σ_{j} 〈 y_{m}, ψ_{j} 〉 φ_{j} . \end{matrix}

(2.6)

For

y_{m}

in the domain

D (A_{m}^{†})

of

A_{m}^{†}

, the best-approximate

L^{2}

solution has the expression

A_{m}^{†} y_{m} = \sum_{j \in m} \frac{〈 y_{m}, ψ_{j} 〉}{σ_{j}} φ_{j} = \sum_{j \in m} \frac{〈 A_{m}^{*} y_{m}, φ_{j} 〉}{σ_{j}^{2}} φ_{j} .

Note that for large

j

, the term

1 / σ_{j}

grows to infinity. Thus, the high-frequency errors are strongly amplified. We will assume that

σ_{j} = O (j^{- p})

for some

p > 1 / 2

, which is clearly related to the ill-posedness of the operator

A

and the approximation properties of

Y_{m}

. For the construction and analysis of regularization methods, we will require some general notation for functions of the operators

A_{m}^{*} A_{m}

and

A_{m} A_{m}^{*}
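The amplification of high-frequency errors by the factors 1 / σ_j can be made concrete with a small simulation. The sketch below works directly in the singular basis, so the operator is represented only by its singular values; the decay exponent, the dimension, the noise level and the signal coefficients are all hypothetical choices made for illustration.

```python
import random


def naive_inverse(y_coeffs, sigmas):
    """Unregularized inversion: divide the j-th data coefficient by sigma_j."""
    return [y / s for y, s in zip(y_coeffs, sigmas)]


p = 1.0          # assumed decay exponent in sigma_j = j^{-p}
m = 50           # assumed projection dimension
noise_sd = 1e-2  # assumed noise standard deviation
rng = random.Random(0)

sigmas = [j ** (-p) for j in range(1, m + 1)]
f_true = [j ** (-2.0) for j in range(1, m + 1)]   # hypothetical signal coefficients
y_clean = [s * f for s, f in zip(sigmas, f_true)]
noise = [noise_sd * rng.gauss(0.0, 1.0) for _ in range(m)]
y_noisy = [y + e for y, e in zip(y_clean, noise)]

f_naive = naive_inverse(y_noisy, sigmas)
# Up to rounding, the error in coefficient j equals noise_j / sigma_j = noise_j * j^p.
errors = [fn - ft for fn, ft in zip(f_naive, f_true)]
amplification = [1.0 / s for s in sigmas]
```

With these choices the last coefficient carries a noise amplification factor of 50, while the first one is left unchanged, which is the behaviour that regularization is meant to control.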

Let

E_{λ}

be the spectral decomposition of

A_{m}^{*} A_{m}

given by

E_{λ} (\cdot) = \sum_{j \in m, σ_{j}^{2} < λ} 〈 \cdot, φ_{j} 〉 φ_{j}

and

H_{λ}

the spectral decomposition of

A_{m} A_{m}^{*}

. Then

E_{λ}

is an orthogonal projector and projects onto

span \{φ_{j} | j \in m, σ_{j}^{2} < λ\} .

Since

(σ_{j}^{2}; φ_{j})

is an eigensystem for the selfadjoint compact operator

A_{m}^{*} A_{m}

A_{m}^{*} A_{m} f = \sum_{j \in m} σ_{j}^{2} 〈 f, φ_{j} 〉 φ_{j}

holds, which will be written (using the definition of the integral below) as

A_{m}^{*} A_{m} f = \int λ d E_{λ} f

for

f \in D (A_{m})

. Here the limits of integration could be 0 and

∥ A_{m} ∥^{2} + ε

for any

ε > 0

. We sometimes omit the limits of integration.

This motivates the definition

\begin{matrix} G (A_{m}^{*} A_{m}) : = \int G (λ) d E_{λ} : = \sum_{j \in m} G (σ_{j}^{2}) 〈 \cdot, φ_{j} 〉 φ_{j} \end{matrix}

(2.7)

of a (piecewise) continuous function

G

of a selfadjoint linear operator on

F_{m}

. If

A_{m}^{*} A_{m}

is continuously invertible, then

(A_{m}^{*} A_{m})^{- 1} = \int \frac{1}{λ} d E_{λ} .

In this case the best-approximate

L^{2}

solution, for

y_{m}

in the domain of

A_{m}^{†}

, can be characterized by the equation

\begin{matrix} {\tilde{f}}_{m} = A_{m}^{†} y_{m} = \int \frac{1}{λ} d E_{λ} A_{m}^{*} y_{m} . \end{matrix}

(2.8)

If

G (A_{m}^{*} A_{m})

is defined via (2.7), then for

f \in D (G_{1} (A_{m}^{*} A_{m}))

and

g \in D (G_{2} (A_{m}^{*} A_{m}))

\begin{matrix} 〈 G_{1} (A_{m}^{*} A_{m}) f, G_{2} (A_{m}^{*} A_{m}) g 〉 = \int G_{1} (λ) G_{2} (λ) d 〈 E_{λ} f, g 〉 \end{matrix}

(2.9)

and

\begin{matrix} ∥ G (A_{m}^{*} A_{m}) f ∥^{2} = \int G^{2} (λ) d ∥ E_{λ} f ∥^{2} . \end{matrix}

(2.10)

The source set

A_{μ}

in (2.3) can be characterized via the singular values as follows:

(A_{m}^{*} A_{m})^{μ} ω = \int λ^{μ} d E_{λ} ω = \sum_{j \in m} σ_{j}^{2 μ} 〈 ω, φ_{j} 〉 φ_{j}
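In the singular basis, applying a function of the operator amounts to multiplying each coefficient by the value of that function at the corresponding eigenvalue. The following sketch, with hypothetical singular values and a hypothetical source element, builds the coefficients of a source representation and evaluates the squared norm coefficient-wise.

```python
def apply_function(G, sigmas, coeffs):
    """Coefficients of G(A_m^* A_m) f, given the coefficients <f, phi_j> of f."""
    return [G(s * s) * c for s, c in zip(sigmas, coeffs)]


def norm_sq(coeffs):
    """Squared norm of an element given by its coefficients in an orthonormal basis."""
    return sum(c * c for c in coeffs)


sigmas = [1.0, 0.5, 0.25]   # hypothetical singular values sigma_j
omega = [1.0, 1.0, 1.0]     # hypothetical coefficients <omega, phi_j>
mu = 0.5                    # hypothetical smoothness index

# f = (A_m^* A_m)^mu omega has coefficients sigma_j^{2 mu} <omega, phi_j>.
f_coeffs = apply_function(lambda lam: lam ** mu, sigmas, omega)

# Squared norm of G(A_m^* A_m) omega equals sum_j G(sigma_j^2)^2 <omega, phi_j>^2.
f_norm_sq = norm_sq(f_coeffs)
```

A larger value of mu damps the coefficients attached to small singular values more strongly, which is the sense in which the source condition acts as a smoothness requirement.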

3 Regularization methods

After the general considerations of the last section, we now explain the construction of a regularization method for the important special case of selfadjoint linear operators. The basic idea for deriving a regularization method is to replace the amplification factors

1 / λ_{j}

by a filtered version

Q (λ_{j}, α)

, where the filter function is a piecewise continuous, nonnegative and nonincreasing function of

λ

on the segment

[0, ∥ A_{m} ∥^{2}]

for a regularization parameter

α > 0

. The assumptions on the regularizing coefficients

Q_{α} (λ)

are technical and are given in order to control fluctuations over the set

[0, ∥ A_{m} ∥^{2}]

.

The filter family

{Q (λ_{j}, α)}_{j \in m}

approximates the function

λ^{- 1}

for

α \to \infty

. Intuitively, a regularization on

A_{m}^{†}

should then be the replacement of the ill conditioned operator

A_{m}^{†}

by a family

{R (λ_{j}, α)}_{j \in m} : Y_{m} \to F_{m}

of continuous operators. Throughout the article, we shall denote

{Q (λ_{j}, α)}_{j \in m}

and

{R (λ_{j}, α)}_{j \in m}

by

Q_{α}

and

R_{α}

, respectively. Obviously, for all

α > 0,

R_{α}

is bounded.

As the approximation of

{\tilde{f}}_{m}

, we then take

f_{m, α} = Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} y_{m} = R_{α} y_{m},

where

R_{α} : = \int Q_{α} (λ) d E_{λ} A_{m}^{*}

Remark 3.1. Note that with the above notation

\begin{matrix} f_{m, α} = R_{α} y_{m} = R_{α} Π_{Y_{m}}^{n} A \tilde{f} + R_{α} Π_{Y_{m}}^{n} ɛ . \end{matrix}

(3.1)

Note also that we can write

R_{α} = Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} .
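To see how a filter acts in practice, the sketch below computes the coefficients of the filtered estimator in the singular basis. It uses a Tikhonov-type filter, parametrized so that it approaches 1 / λ as α grows; this particular filter, the singular values and the data coefficients are assumptions chosen for illustration and are not prescribed by the text.

```python
def tikhonov_filter(lam, alpha):
    """Tikhonov-type filter Q_alpha(lam) = 1 / (lam + 1/alpha); tends to 1/lam as alpha grows."""
    return 1.0 / (lam + 1.0 / alpha)


def filtered_estimate(y_coeffs, sigmas, filt, alpha):
    """Coefficients of Q_alpha(A_m^* A_m) A_m^* y_m in the singular basis."""
    return [filt(s * s, alpha) * s * y for y, s in zip(y_coeffs, sigmas)]


sigmas = [1.0, 0.5, 0.1]     # hypothetical singular values
y_coeffs = [2.0, 1.0, 0.3]   # hypothetical data coefficients

smooth = filtered_estimate(y_coeffs, sigmas, tikhonov_filter, alpha=10.0)
almost_inverse = filtered_estimate(y_coeffs, sigmas, tikhonov_filter, alpha=1e12)
unfiltered = [y / s for y, s in zip(y_coeffs, sigmas)]
```

For this filter the product λ Q_α(λ) stays below 1, so every filtered coefficient is smaller in absolute value than the corresponding unfiltered one, and the damping is strongest for the smallest singular values.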

The next theorem gives conditions under which the first term in 3.1 converges to

{\tilde{f}}_{m} = Π_{F_{m}} \tilde{f}

. The proof follows that of [9] , but we include it for the sake of completeness.

Theorem 3.2. Let, for all

α > 0

Q_{α} : [0, ∥ A_{m} ∥^{2}] \to \mathbb{R}

be a piecewise continuous and nonincreasing function of

λ

on the segment

[0, ∥ A_{m} ∥^{2}]

. Assume also that there is a

C > 0

such that

| λ Q_{α} (λ) | \leq C,

and

{lim}_{α \to \infty} Q_{α} (λ) = \frac{1}{λ}

for all

λ \in (0, ∥ A_{m} ∥^{2}) .

Then, for all

y \in D (A_{m}^{†})

{lim}_{α \to \infty} Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} Π_{Y_{m}}^{n} A \tilde{f} = {\tilde{f}}_{m}

holds with

{\tilde{f}}_{m} = A_{m}^{†} y_{m} .

Remark 3.3. In order to ensure convergence as

α \to \infty

, it is necessary to choose

Q_{α}

such that it approximates

1 / λ

for all

λ \in (0, ∥ A_{m} ∥^{2}]

. Also, note that the condition

| λ Q_{α} (λ) | \leq C

implies that

∥ A_{m} R_{α} ∥ = ∥ A_{m} A_{m}^{*} Q_{α} (A_{m}^{*} A_{m}) ∥ \leq C

, i.e.,

∥ A_{m} R_{α} ∥

is uniformly bounded.

Proof. As in [9] , if

{\tilde{f}}_{m}

is defined by 2.8 , then by 2.2 the residual norm has the representation

∥ {\tilde{f}}_{m} - Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m} \tilde{f} ∥^{2} = ∥ (I - Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m}) {\tilde{f}}_{m} ∥^{2}

From the formula 2.10 , it follows that

∥ {\tilde{f}}_{m} - Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m} \tilde{f} ∥^{2} = \int_{0}^{∥ A_{m} ∥^{2} +} (1 - λ Q_{α} (λ))^{2} d ∥ E_{λ} {\tilde{f}}_{m} ∥^{2} .

Since

(1 - λ Q_{α} (λ))^{2}

is bounded by the constant

(1 + C)^{2}

, which is integrable with respect to the measure

d ∥ E_{λ} {\tilde{f}}_{m} ∥^{2},

then by the Dominated Convergence Theorem,

\begin{matrix} {lim}_{α \to \infty} \int_{0}^{∥ A_{m} ∥^{2} +} (1 - λ Q_{α} (λ))^{2} d ∥ E_{λ} {\tilde{f}}_{m} ∥^{2} = \int_{0}^{∥ A_{m} ∥^{2} +} {lim}_{α \to \infty} (1 - λ Q_{α} (λ))^{2} d ∥ E_{λ} {\tilde{f}}_{m} ∥^{2} . \end{matrix}

(3.2)

Since for

λ > 0

{lim}_{α \to \infty} (1 - λ Q_{α} (λ)) = 0

then the integral on the right-hand side of 3.2 equals to

0

. On the other hand, if

λ = 0

{lim}_{α \to \infty} (1 - λ Q_{α} (λ)) = 1

then the equation 3.2 has the form

\begin{matrix} {lim}_{α \to \infty} \int_{0}^{∥ A_{m} ∥^{2} +} (1 - λ Q_{α} (λ))^{2} d ∥ E_{λ} {\tilde{f}}_{m} ∥^{2} = {lim}_{λ \to 0^{+}} ∥ E_{λ} {\tilde{f}}_{m} ∥^{2} - ∥ E_{0} {\tilde{f}}_{m} ∥^{2} \end{matrix}

(3.3)

which is equal to the jump of

∥ E_{λ} {\tilde{f}}_{m} ∥^{2}

at

λ = 0

. Since

{\tilde{f}}_{m} \in N (A_{m})^{⊥}

, the term on the right-hand side of 3.3 equals to 0. Thus,

R_{α} A_{m} \tilde{f}

converges to

{\tilde{f}}_{m}

as

α \to \infty

for

y_{m} \in D (A_{m}^{†}),

which ends the proof.

□

Let

T r (B)

denote the trace of the selfadjoint operator

B^{t} B

for any square matrix

B

, which is defined by

T r (B) = \frac{1}{n} \sum_{j \in m} b_{j}

for

b_{j}

eigenvalues of

B^{t} B

We then have the following result,

Theorem 3.4. Let

Q_{α}

be as in theorem 3.2 . Let

μ, ρ > 0

and let

ω_{μ} : (0, α_{0}) \to \mathbb{R}

be such that for all

α \in (0, α_{0})

and

λ \in [0, σ_{1}^{2}]

{sup}_{0 \leq λ \leq σ_{1}^{2}} λ^{μ} | 1 - λ Q_{α} (λ) | \leq ω_{μ} (α)

holds. Then for

\tilde{f} \in A_{μ, ρ},

the following inequality holds true

\begin{matrix} \mathbb{E} ∥ {\tilde{f}}_{m} - f_{m, α} ∥^{2} \leq 2 ω_{μ} (α)^{2} ρ^{2} + 2 σ^{2} T r (Q_{α}^{2} (A_{m}^{*} A_{m}) A_{m} A_{m}^{*}) . \end{matrix}

(3.4)

Proof. The proof of this inequality is based on the definition of the estimator

f_{m, α}

and on the assumptions over this function. We have that the

L^{2} -

norm of the difference between the regularized function and the true data function can be bounded by

\begin{matrix} I E ∥ {\tilde{f}}_{m} - f_{m, α} ∥^{2} \leq 2 I E ∥ {\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} ∥^{2} + 2 I E ∥ R_{α} A_{m} \tilde{f} - f_{m, α} ∥^{2} \end{matrix}

(3.5)

where

f_{m, α} = R_{α} y_{m}

This is the typical bias-variance decomposition. The first term on the right-hand side is an approximation error, which corresponds to the bias, whereas the second term, variance, is a stability bound on the regularizing operator

R_{α}

. Note that by the Theorem 3.2 , the first term in 3.5 goes to 0 if

y_{m} \in D (A_{m}^{†})

.

Let

ω \in F_{m}

with

∥ ω ∥ \leq ρ

. Since

\tilde{f} \in F_{m}

then

Π_{F_{m}} \tilde{f} = (A_{m}^{*} A_{m})^{μ} ω

. On the other hand, since

{sup}_{λ} λ^{μ} | 1 - λ Q_{α} (λ) | \leq ω_{μ} (α)

, the first term in (3.5) can be bounded by

\begin{matrix} I E ∥ {\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} ∥^{2} & = I E ∥ {\tilde{f}}_{m} - Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m} {\tilde{f}}_{m} ∥^{2} \end{matrix}

\begin{matrix} = I E ∥ (I - Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m}) {\tilde{f}}_{m} ∥^{2} \end{matrix}

\begin{matrix} \leq ω_{μ} (α)^{2} ρ^{2} . \end{matrix}

In order to control the term corresponding to the variance, we use that the data perturbation is white noise. Thus,

\begin{matrix} I E ∥ R_{α} A_{m} \tilde{f} - f_{m, α} ∥^{2} & = I E 〈 ɛ, (Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*})^{*} Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} ɛ 〉 \end{matrix}

\begin{matrix} = I E 〈 ɛ, Q_{α}^{2} (A_{m}^{*} A_{m}) A_{m} A_{m}^{*} ɛ 〉 \end{matrix}

\begin{matrix} = σ^{2} T r (Q_{α}^{2} (A_{m}^{*} A_{m}) A_{m} A_{m}^{*}) \end{matrix}


which yields the desired result. □
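The trade-off between the two terms of this bound can be explored numerically. The sketch below evaluates, in the singular basis, the squared approximation error and the variance term with the 1/n-normalized trace for a range of regularization parameters. The Tikhonov-type filter, the singular values, the signal coefficients, the noise variance and the sample size are hypothetical choices, not values taken from the text.

```python
def tikhonov_filter(lam, alpha):
    """Assumed filter Q_alpha(lam) = 1 / (lam + 1/alpha)."""
    return 1.0 / (lam + 1.0 / alpha)


def squared_bias(f_coeffs, sigmas, alpha):
    """|| (I - Q_alpha(A*A) A*A) f ||^2, computed coefficient by coefficient."""
    return sum(((1.0 - s * s * tikhonov_filter(s * s, alpha)) * c) ** 2
               for s, c in zip(sigmas, f_coeffs))


def variance_term(sigmas, alpha, noise_var, n):
    """sigma^2 * (1/n) * sum_j Q_alpha(sigma_j^2)^2 * sigma_j^2."""
    return noise_var / n * sum(tikhonov_filter(s * s, alpha) ** 2 * s * s
                               for s in sigmas)


m, n, noise_var = 20, 100, 1.0
sigmas = [j ** (-1.0) for j in range(1, m + 1)]     # hypothetical sigma_j = 1/j
f_coeffs = [j ** (-2.0) for j in range(1, m + 1)]   # hypothetical signal
alphas = [1.0, 10.0, 100.0, 1000.0]

biases = [squared_bias(f_coeffs, sigmas, a) for a in alphas]
variances = [variance_term(sigmas, a, noise_var, n) for a in alphas]
totals = [2.0 * b + 2.0 * v for b, v in zip(biases, variances)]
```

As α increases, the approximation error decreases while the variance term grows, so a good choice of α has to balance the two contributions.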

The next result will be useful when studying iterative methods.

Theorem 3.5. Let

Q_{α}

be as in theorem 3.2 . Assume also that

Q_{α}

is continuously differentiable and that the function

| 1 - λ Q_{α} (λ) |^{'} | λ Q_{α} (λ) - 1 |^{- 1}

does not decrease. Then the following estimates hold:

{sup}_{0 \leq λ \leq σ_{1}^{2}} | Q_{α} (λ) | = Q_{α} (0),

and

{sup}_{0 \leq λ \leq σ_{1}^{2}} λ^{μ} | 1 - λ Q_{α} (λ) | < μ^{μ} (μ + 1)^{- 1} ω_{μ} (α)

where

ω_{μ} (α) = Q_{α} (0)^{- μ} .

Proof. The proof can be carried out by standard techniques. A proof of this result can be found in [11] . □

4 Rates of convergence for the regularized estimator

In any regularization method, the regularization parameter

α

plays a crucial role. There are general methods of parameter selection, for example the Discrepancy Principle [20], Cross-Validation [7] and the L-curve [10]. They differ in the amount of a priori information required as well as in the decision criteria. The appropriate choice of the regularization parameter is a difficult problem. We would like to choose

α

, based on the data in such a way that optimal rates are maintained. This choice should not depend on a priori regularity assumptions.

Our goal is to introduce adaptive methods in the context of statistical inverse problems.

In this section we introduce our adaptive estimator, for a fixed

m = m_{0}

. We choose

m_{0}

such that

∥ \tilde{f} - Π_{F_{m_{0}}} \tilde{f} ∥^{2}

satisfies the optimal rates with high probability since we know

∥ \tilde{f} - Π_{F_{m_{0}}} \tilde{f} ∥^{2} < ∥ I - Π_{Y_{m_{0}}} ∥^{4 μ} = O (d_{m_{0}}^{- 4 μ p})

for a certain

p

and

0 < μ \leq 1 / 2

. It is satisfied if the dimension of the set is such that

\begin{matrix} d_{m_{0}} \geq n^{\frac{1}{2 p + 1}} . \end{matrix}

(4.1)

This leads to the rate

∥ \tilde{f} - f_{m, α} ∥^{2} = O (n^{- \frac{4 μ p}{4 μ p + 2 p + 1}}) .

Analogous results are obtained in the case of Hilbert scales ([12] ,[16] ).

Adaptive model selection is a technique which penalizes the regularization parameter, in such a way that we choose

{\hat{f}}_{m_{0}, α_{\hat{k}}}

by minimizing

\arg\min_{k \in K, f \in F_{m}} (∥ R_{α_{k}} (y_{m} - A_{m} f) ∥^{2} + \mathrm{pen} (α_{k}))

where

\hat{k} = \arg\min_{k \in K} (∥ R_{α_{k}} (y_{m} - A_{m} f) ∥^{2} + \mathrm{pen} (α_{k}))

and

\mathrm{pen} (α_{k}) = r σ^{2} (1 + L_{k}) [T r (R_{α_{k}}^{t} R_{α_{k}}) + ρ^{2} (R_{α_{k}})],

with

r > 2

and

L_{k}

is a sequence incorporated in order to control the complexity of the set

K = \{1, 2, \dots, k_{n}\}

of all possible indices up to

k_{n}

. Here

ρ^{2} (B) = ρ (B^{t} B)

is the spectral radius of the selfadjoint operator

B^{t} B

for any square matrix

B

, which is defined by

ρ^{2} (B) = \frac{1}{n} {max}_{j \in m} b_{j}

for

b_{j}

eigenvalues of

B^{t} B

Thus,

\hat{k}

is selected by minimizing

\begin{matrix} \arg\min_{k \in K} (∥ R_{α_{k}} (y_{m} - A_{m} f) ∥^{2} + \frac{r σ^{2} (1 + L_{k})}{n} [\sum_{j \in m} Q_{α_{k}}^{2} (λ_{j}) λ_{j} + \max_{j \in m} Q_{α_{k}}^{2} (λ_{j}) λ_{j}]) . \end{matrix}

(4.2)
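The penalty in this criterion depends only on the filter values and the eigenvalues, so it can be tabulated before any data are observed. The sketch below does this for the Landweber filter of Section 5, with the iteration index k playing the role of the regularization parameter. The eigenvalues, the constant r, the weights L_k (set to zero here) and the sample size are hypothetical, and the data-dependent residual term of the criterion is deliberately left out.

```python
def landweber_filter(lam, k):
    """Q_k(lam) = sum_{j=0}^{k-1} (1 - lam)^j, the Landweber filter."""
    return sum((1.0 - lam) ** j for j in range(k))


def penalty(lams, k, r, noise_var, L_k, n):
    """r * sigma^2 * (1 + L_k) / n * [sum_j Q^2(lam_j) lam_j + max_j Q^2(lam_j) lam_j]."""
    terms = [landweber_filter(lam, k) ** 2 * lam for lam in lams]
    return r * noise_var * (1.0 + L_k) / n * (sum(terms) + max(terms))


lams = [1.0, 0.25]   # hypothetical eigenvalues lambda_j of A_m^* A_m
pens = [penalty(lams, k, r=4.0, noise_var=1.0, L_k=0.0, n=9) for k in (1, 2, 3)]
```

In this toy setting the penalty increases with the iteration index, which reflects the growing variance of the estimator as the iteration proceeds and is what allows the criterion to stop the iteration early.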

The strategy as proposed in this article automatically provides the optimal order of accuracy.

The regularized estimator has a rate of convergence less than or equal to the best rate achieved by the best estimator for a selected model. We have the following result.

Theorem 4.1. For any

f \in F_{m}

and any

α_{k}

the following inequality holds true for

d

a positive constant that depends on r (as in Lemma 4.4 ),

\begin{matrix} \mathbb{E} ∥ {\tilde{f}}_{m} - {\hat{f}}_{α_{\hat{k}}} ∥^{2} \leq \frac{1}{(1 - ν)} \inf_{k \in K} [C (1 + ν) ∥ {\tilde{f}}_{m} - f_{α_{k}} ∥^{2} + 2 \mathrm{pen} (α_{k})] + \frac{C_{1} (d)}{n} \end{matrix}

(4.3)

where

C_{1} (d) = 4 σ^{2} \sum_{k} \frac{n ρ^{2} (R_{α_{k}})}{d} [\sqrt{d r L_{k} [\frac{T r (R_{α_{k}}^{t} R_{α_{k}})}{ρ^{2} (R_{α_{k}})} + 1]} + 1] e^{- \sqrt{d r L_{k} [\frac{T r (R_{α_{k}}^{t} R_{α_{k}})}{ρ^{2} (R_{α_{k}})} + 1]}} .

Remark 4.2. An important issue is that inequality (4.3) is non-asymptotic. The goodness of fit of the estimator is determined by the trace,

T r (R_{α}^{t} R_{α})

, and spectral radius,

ρ^{2} (R_{α})

. Also, the estimator is optimal in the sense that the adaptive estimator achieves the best rate of convergence among all the regularized estimators.

Remark 4.3. Note that under our assumptions, namely that the basis is orthonormal for the fixed design, both

n ρ^{2} (R_{k})

and

T r (R_{k}^{t} R_{k}) / ρ^{2} (R_{k})

do not depend on n.

Proof. For any

f_{α_{k}}

and any

k \in \mathbb{N}

\begin{matrix} ∥ R_{α_{\hat{k}}} (y_{m} - A_{m} {\hat{f}}_{α_{\hat{k}}}) ∥^{2} + p e n (α_{\hat{k}}) \leq ∥ R_{α_{k}} (y_{m} - A_{m} f_{α_{k}}) ∥^{2} + p e n (α_{k}) \end{matrix}

and

\begin{matrix} ∥ R_{α_{k}} (y_{m} - A_{m} f_{α_{k}}) ∥^{2} = ∥ R_{α_{k}} A_{m} (\tilde{f} - f_{α_{k}}) ∥^{2} + 2 〈 R_{α_{k}} A_{m} (\tilde{f} - f_{α_{k}}), R_{α_{k}} Π_{Y_{m}}^{n} ɛ 〉 + ∥ R_{α_{k}} Π_{Y_{m}}^{n} ɛ ∥^{2} \end{matrix}

Thus, following standard arguments we have

\begin{matrix} ∥ R_{α_{\hat{k}}} A_{m} (\tilde{f} - {\hat{f}}_{α_{\hat{k}}}) ∥^{2} \end{matrix}

\begin{matrix} \leq & ∥ R_{α_{k}} A_{m} (\tilde{f} - f_{α_{k}}) ∥^{2} - 2 \langle R_{α_{\hat{k}}} A_{m} (\tilde{f} - {\hat{f}}_{α_{\hat{k}}}), R_{α_{\hat{k}}} Π_{Y_{m}}^{n} ɛ \rangle \end{matrix}

\begin{matrix} + 2 \langle R_{α_{k}} A_{m} (\tilde{f} - f_{α_{k}}), R_{α_{k}} Π_{Y_{m}}^{n} ɛ \rangle - ∥ R_{α_{\hat{k}}} Π_{Y_{m}}^{n} ɛ ∥^{2} + ∥ R_{α_{k}} Π_{Y_{m}}^{n} ɛ ∥^{2} + \mathrm{pen} (α_{k}) - \mathrm{pen} (α_{\hat{k}}) . \end{matrix}

Let

0 < ν < 1

. Since the algebraic inequality

2 a b \leq ν a^{2} + \frac{1}{ν} b^{2}

holds for all

a, b \in I R

, we find that

\begin{matrix} (1 - ν) ∥ R_{α_{\hat{k}}} A_{m} (\tilde{f} - {\hat{f}}_{α_{\hat{k}}}) ∥^{2} \end{matrix}

\begin{matrix} \leq & (1 + ν) ∥ R_{α_{k}} A_{m} (\tilde{f} - f_{α_{k}}) ∥^{2} + 2 p e n (α_{k}) + 2 {sup}_{α_{k}} {\frac{1}{ν} ∥ R_{α_{k}} Π_{Y_{m}}^{n} ɛ ∥^{2} - p e n (α_{k})}, \end{matrix}

holds for any

k

and

f_{α_{k}} \in F_{m}

On the other hand, using that

1 \leq ∥ R_{α_{k}} A ∥ \leq C

, we have that for any

f_{α_{k}} \in F_{m_{0}}

and any

k \in \mathbb{N}

\begin{matrix} (1 - ν) ∥ {\tilde{f}}_{m} - {\hat{f}}_{α_{\hat{k}}} ∥^{2} \leq C (1 + ν) ∥ {\tilde{f}}_{m} - f_{α_{k}} ∥^{2} \end{matrix}

\begin{matrix} + & 2 p e n (α_{k}) + 2 C_{1} {sup}_{α_{k}} {∥ R_{α_{k}} Π_{Y_{m}}^{n} ɛ ∥^{2} - p e n (α_{k})} . \end{matrix}

The proof then follows directly from the following technical lemma ([3] ,[16] ) which characterizes the supremum of an empirical process by the regularization family.

Lemma 4.4. Let

η (A) = \sqrt{ɛ^{t} A^{t} A ɛ} = ∥ A ɛ ∥ .

Then, there exists a positive constant

d

that depends on

r / 2

such that the following inequality holds

\begin{matrix} P (η^{2} (A) \geq σ^{2} [T r (A^{t} A) + ρ (A^{t} A)] r / 2 (1 + L) + σ^{2} u) \end{matrix}

(4.4)

\begin{matrix} \leq exp {- \sqrt{d (1 / ρ (A^{t} A) u + r / 2 L [T r (A^{t} A) / ρ (A^{t} A) + 1])}} . \end{matrix}

With the above notation,

η (R_{α_{k}}) = ∥ R_{k} ɛ_{m} ∥

where

ɛ_{m} = Π_{Y_{m}}^{n} ɛ

Now, with this lemma we have

\begin{matrix} P ({sup}_{α_{k}} ∥ R_{α_{k}} ɛ_{m} ∥^{2} - p e n (α_{k}) > σ^{2} x) \end{matrix}

\begin{matrix} \leq & \sum_{k} P [η^{2} (R_{α_{k}}) \geq r σ^{2} (1 + L_{k}) [T r (R_{α_{k}}^{t} R_{α_{k}}) + ρ^{2} (R_{α_{k}})] + σ^{2} x] \end{matrix}

\begin{matrix} \leq & \sum_{k} exp {- \sqrt{d (1 / ρ^{2} (R_{α_{k}}) x + r L_{k} [T r (R_{α_{k}}^{t} R_{α_{k}}) / ρ^{2} (R_{α_{k}}) + 1])}} \end{matrix}

Since for

X

positive

I E X = \int_{0}^{\infty} P (X > u) d u

, we then have that

\begin{matrix} I E [{sup}_{α_{k}} ∥ R_{α_{k}} ɛ_{m} ∥^{2} - p e n (α_{k})] = \int_{0}^{\infty} P [{sup}_{α_{k}} ∥ R_{α_{k}} ɛ_{m} ∥^{2} - p e n (α_{k}) \geq x] d x \end{matrix}

\begin{matrix} = & σ^{2} \int_{0}^{\infty} P [{sup}_{α_{k}} ∥ R_{α_{k}} ɛ_{m} ∥^{2} - p e n (α_{k}) \geq σ^{2} u] d u \end{matrix}

\begin{matrix} \leq & σ^{2} \sum_{k} \int_{0}^{\infty} exp {- \sqrt{k_{1} u + k_{2}}} d u . \end{matrix}

where

k_{1} = d / ρ^{2} (R_{α_{k}})

and

k_{2} = d r L_{k} [T r (R_{α_{k}}^{T} R_{α_{k}}) / ρ^{2} (R_{α_{k}}) + 1]

Let

w = k_{1} u + k_{2},

then

\begin{matrix} I E [{sup}_{α_{k}} ∥ R_{α_{k}} ɛ_{m} ∥^{2} - p e n (α_{k})] & \leq σ^{2} \sum_{k} \int_{k_{2}}^{\infty} \frac{1}{k_{1}} exp {- \sqrt{w}} d w \end{matrix}

\begin{matrix} = σ^{2} \sum_{k} \frac{2}{k_{1}} [\sqrt{k_{2}} + 1] exp {- \sqrt{k_{2}}} . \end{matrix}

Finally, we have the desired result.

\begin{matrix} \mathbb{E} ∥ {\tilde{f}}_{m} - {\hat{f}}_{α_{\hat{k}}} ∥^{2} \leq \frac{1}{(1 - ν)} \inf_{k \in K} [C (1 + ν) ∥ {\tilde{f}}_{m} - f_{α_{k}} ∥^{2} + 2 \mathrm{pen} (α_{k})] + \frac{C_{1} (d)}{n} \end{matrix}

where

C_{1} (d) = 4 σ^{2} \sum_{k} \frac{n ρ^{2} (R_{α_{k}})}{d} [\sqrt{d r L_{k} [\frac{T r (R_{α_{k}}^{t} R_{α_{k}})}{ρ^{2} (R_{α_{k}})} + 1]} + 1] e^{- \sqrt{d r L_{k} [\frac{T r (R_{α_{k}}^{t} R_{α_{k}})}{ρ^{2} (R_{α_{k}})} + 1]}},

□

5 Regularization by iterative methods

Iterative regularization methods are very competitive for linear inverse problems.

In iterative regularization, one picks an initial guess

f_{0}

for the unknown

\tilde{f}

, and then one iteratively constructs updated approximations via a regularization scheme. The regularization parameter associated with iterative regularization is thus the “stopping point” of the iterative sequence, and an important part of the mathematical theory is the development of stopping criteria for terminating the iteration. In other words, the iteration index plays the role of the regularization parameter

α

, and the stopping criterion plays the role of the parameter selection method.

5.1 Descent Methods for Linear Inverse Problems

As an example of iterative regularization, we consider descent methods. Descent methods have become quite popular in recent years for the solution of linear inverse problems and of nonlinear inverse problems [11] . In this subsection we consider two examples.

As an approximation of

{\tilde{f}}_{m}

we will choose

f_{m, α}

such that

\begin{matrix} f_{m, α} = [I - A_{m}^{*} A_{m} Q_{α} (A_{m}^{*} A_{m})] f_{0} + Q_{α} (A_{m}^{*} A_{m}) A_{m}^{*} Π_{Y_{m}}^{n} y \end{matrix}

(5.1)

where

f_{0} \in F_{m}

is an initial guess with

f_{0} \in N (A_{m})^{⊥}

[11] .

Most iterative methods for approximating

\tilde{f}

are based on a transformation of the normal equation into an equivalent fixed point equation such as

f = f - A_{m}^{*} (A_{m} f - y) .

If

∥ A_{m} ∥^{2} < 2

then the corresponding fixed point operator

I - A_{m}^{*} A_{m}

is nonexpansive and one may apply the method of successive approximations. It must be emphasized that

I - A_{m}^{*} A_{m}

is not a contraction if our inverse problem is ill-posed, since the spectrum of

A_{m}^{*} A_{m}

clusters at the origin.
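The distinction between nonexpansive and contractive can be seen numerically. The following is a minimal sketch, assuming a real matrix (so that the adjoint is the transpose) and a hypothetical test matrix whose singular values decay like j^{-2}; none of these choices come from the text.

```python
import numpy as np

def fixed_point_operator_norm(A):
    """Spectral norm of I - A^T A (real case, so A^* is the transpose)."""
    return np.linalg.norm(np.eye(A.shape[1]) - A.T @ A, ord=2)

# Hypothetical test matrix with prescribed singular values j^(-2), j = 1..30,
# so that ||A||^2 = 1 < 2 and the small singular values cluster near zero.
rng = np.random.default_rng(0)
m = 30
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
sigma = np.arange(1, m + 1, dtype=float) ** -2.0
A = U @ np.diag(sigma) @ V.T

op_norm = fixed_point_operator_norm(A)
# The norm never exceeds 1 (nonexpansive) but is very close to 1,
# so there is no uniform contraction factor.
print(f"||I - A^T A|| = {op_norm:.8f}")
```

The norm equals one minus the smallest eigenvalue of A^T A, which explains why clustering of the spectrum at the origin rules out a contraction.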

5.2 Landweber iteration

In this subsection we present the well-known Landweber iteration, which arises from converting the necessary conditions for minimizing 2.1 into a fixed point iteration. In the last few years the theory of Landweber iteration for linear and nonlinear inverse problems has advanced considerably.

Using the terminology of the last sections, we introduce the function

\begin{matrix} Q_{k} (λ) = \sum_{j = 0}^{k - 1} (1 - λ)^{j} = λ^{- 1} (1 - (1 - λ)^{k}) \end{matrix}

(5.2)

We call

Q_{k}

the iteration polynomial of degree

k - 1

. Associated with it is the polynomial

r_{k} (λ) = 1 - λ Q_{k} (λ) = (1 - λ)^{k}

of degree

k

, which is called the residual polynomial since it determines the residual

y - A_{m} f_{m, k}

. Thus, inserting 5.2 into 5.1 we obtain, recursively,

\begin{matrix} f_{m, k + 1} = f_{m, k} - A_{m}^{*} (A_{m} f_{m, k} - y_{m}), k = 0, 1, \dots \end{matrix}

(5.3)

starting from an initial guess

f_{0}

. This is a steepest descent method called the linear version of Landweber's iteration. Each step of the iterative process 5.3 is carried out along the direction opposite to the direction of the gradient of the quadratic functional

J (f)

in 2.1 . The functional is known to decrease most rapidly along this direction. If

∥ A_{m} ∥ \leq 1

, we consider

λ \in (0, 1]

such that in this interval

λ Q_{k} (λ)

is uniformly bounded and since

Q_{k} (λ)

converges to

1 / λ

as

k \to \infty

, then according to Theorem 3.2 the sequence

f_{m, k}

converges to

{\tilde{f}}_{m}

when

y \in D (A_{m}^{†})

. If

∥ A_{m} ∥

is not bounded by one, then we introduce a relaxation parameter

0 < τ < ∥ A_{m} ∥^{- 2}

in front of

A_{m}^{*}

in 5.3 , i.e., we iterate

\begin{matrix} f_{m, k + 1} = f_{m, k} - τ A_{m}^{*} (A_{m} f_{m, k} - y), k = 0, 1, \dots \end{matrix}

(5.4)

If the step size is allowed to vary with the iteration,

τ \equiv τ_{k}

, one obtains variants of the method of steepest descent depending on the choice of the sequence

τ_{k}

. The Landweber iteration 5.4 is usually called a method of simple iteration.
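As an illustration, the relaxed iteration 5.4 can be sketched in a few lines. This is a minimal sketch under stated assumptions: a real finite-dimensional matrix A stands in for A_{m}, the initial guess is zero, the data are noise-free, and the function names are hypothetical. The closed form uses the filter (1 - (1 - τ λ)^{k}) / λ, which is what the iteration polynomial becomes once the relaxation parameter τ is included.

```python
import numpy as np

def landweber(A, y, tau, n_iter, f0=None):
    """Run n_iter steps of f <- f - tau * A^T (A f - y), starting from f0."""
    f = np.zeros(A.shape[1]) if f0 is None else np.array(f0, dtype=float)
    for _ in range(n_iter):
        f = f - tau * (A.T @ (A @ f - y))
    return f

def landweber_closed_form(A, y, tau, n_iter):
    """Spectral form for f0 = 0: apply q(lam) = (1 - (1 - tau*lam)^k) / lam
    to A^T A and then to A^T y, using the SVD of A (full column rank assumed)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    lam = s ** 2
    q = (1.0 - (1.0 - tau * lam) ** n_iter) / lam
    return Vt.T @ (q * s * (U.T @ y))

# Hypothetical well-conditioned test problem with exact data.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 10))
f_true = rng.standard_normal(10)
y = A @ f_true
tau = 0.5 / np.linalg.norm(A, 2) ** 2      # inside (0, ||A||^-2)

f_iter = landweber(A, y, tau, 50)
f_spec = landweber_closed_form(A, y, tau, 50)
f_long = landweber(A, y, tau, 5000)
print(np.max(np.abs(f_iter - f_spec)), np.max(np.abs(f_long - f_true)))
```

With noisy data the iteration would have to be stopped early, which is exactly the role of the stopping index as regularization parameter.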

In the following we derive a simple estimate for the error propagation in the Landweber iteration. We then have the following result.

Corollary 5.1. Let

τ = 1 / (2 ∥ A_{m} ∥^{2}) < 1 / λ_{1}

. If

y \in ℛ (A_{m})

, then the Landweber iteration is an order optimal regularization method, i.e.,

I E ∥ {\tilde{f}}_{m} - f_{k (m)} ∥^{2} \leq 2 c_{1} k^{- 2 μ} + 2 c_{2} \frac{σ^{2}}{n} (τ k)^{(2 p + 1) / 2 p},

where

c_{1} = ρ^{2} (\frac{μ}{τ e})^{2 μ}

and

c_{2} = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p} .

Proof. To apply Theorem 3.4 we have to study the terms of the bias,

I E ∥ {\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} ∥^{2}

, and variance

I E ∥ R_{α} A_{m} \tilde{f} - f_{k (m)} ∥^{2}

. By 5.2 we have

{\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} = (I - A_{m}^{*} A_{m} Q_{k} (A_{m}^{*} A_{m})) {\tilde{f}}_{m} = (I - A_{m}^{*} A_{m})^{k} {\tilde{f}}_{m}

We have to study the residual polynomial

r_{k} (λ) = (1 - λ)^{k}

of the Landweber iteration.

For

0 \leq λ \leq ∥ A_{m} ∥^{2}

the function

λ^{μ} | 1 - λ Q_{k} (λ) |

assumes its maximum for

λ = τ^{- 1} μ (μ + k)^{- 1}

. Thus, we have

\begin{matrix} λ^{μ} | 1 - λ Q_{k} (λ) | & \leq {max}_{0 \leq λ \leq ∥ A_{m} ∥^{2}} λ^{μ} | 1 - λ Q_{k} (λ) | \end{matrix}

\begin{matrix} < \frac{μ^{μ}}{τ^{μ} (μ + k)^{μ}} \frac{k^{k}}{(μ + k)^{k}} \end{matrix}

\begin{matrix} < {(\frac{μ}{τ e})}^{μ} k^{- μ} \end{matrix}


This leads to numbers

ω_{μ} (k)

as introduced in Theorem 3.4

ω_{μ} (k) = {(\frac{μ}{τ e})}^{μ} k^{- μ}

Thus, the term corresponding to the bias is bounded by

∥ {\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} ∥^{2} \leq ρ^{2} (\frac{μ}{τ e})^{2 μ} k^{- 2 μ} .
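The bound ω_{μ} (k) can be sanity-checked on a grid. The sketch below assumes the relaxed residual polynomial (1 - τ λ)^{k}, a spectrum contained in [0, 1], and τ = 1/2; the helper names and the tested values of μ and k are hypothetical.

```python
import numpy as np

def weighted_residual_sup(mu, tau, k, lam_max, n_grid=200_001):
    """Grid approximation of max over [0, lam_max] of lam^mu * |1 - tau*lam|^k."""
    lam = np.linspace(0.0, lam_max, n_grid)
    return float(np.max(lam ** mu * np.abs(1.0 - tau * lam) ** k))

def omega(mu, tau, k):
    """The bound omega_mu(k) = (mu / (tau * e))^mu * k^(-mu)."""
    return (mu / (tau * np.e)) ** mu * float(k) ** (-mu)

tau, lam_max = 0.5, 1.0   # hypothetical: ||A_m||^2 = 1 and tau = 1 / (2 ||A_m||^2)
checks = []
for mu in (0.5, 1.0, 2.0):
    for k in (1, 5, 50, 500):
        checks.append((mu, k, weighted_residual_sup(mu, tau, k, lam_max),
                       omega(mu, tau, k)))

for mu, k, sup_val, bound in checks:
    print(f"mu={mu:<4} k={k:<4} sup={sup_val:.6f} bound={bound:.6f}")
```

For large k the two columns are close, which suggests that the decay k^{-μ} of the bias bound cannot be improved by a sharper constant alone.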

Next, we establish bounds for the variance term. By assumption, the singular values satisfy

λ_{j} \approx j^{- 2 p} .

Note that for small values of

λ_{j}

we have

Q_{k}^{2} (λ_{j}) \leq (τ k)^{2}

\forall

j > m^{'}

and for large values of

λ_{j}

(

λ_{j} \approx λ_{1}

)

Q_{k}^{2} (λ_{j}) \leq λ_{j}^{- 2}

\forall

j < m^{'}
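Both pointwise bounds on Q_{k} hold for every eigenvalue; the index m^{'} only marks where the smaller of the two changes. A minimal numerical sketch, assuming the relaxed filter (1 - (1 - τ λ)^{k}) / λ, the model spectrum λ_{j} = j^{-2p}, and hypothetical values of p, τ and k:

```python
import numpy as np

def q_landweber(lam, tau, k):
    """Q_k(lam) = (1 - (1 - tau*lam)^k) / lam for the relaxed iteration."""
    return (1.0 - (1.0 - tau * lam) ** k) / lam

p, tau, k = 1.0, 0.5, 200                    # hypothetical values
j = np.arange(1, 2001, dtype=float)
lam = j ** (-2.0 * p)                        # assumed spectrum lam_j = j^(-2p)
q = q_landweber(lam, tau, k)

# Index at which the two bounds 1/lam_j and tau*k coincide.
crossover = (tau * k) ** (1.0 / (2.0 * p))
print("crossover index:", crossover)
print("max of q / min(1/lam, tau*k):", np.max(q / np.minimum(1.0 / lam, tau * k)))
```

The bound 1 / λ follows from 0 ≤ (1 - τ λ)^{k} ≤ 1, and the bound τ k follows from Bernoulli's inequality (1 - τ λ)^{k} ≥ 1 - k τ λ.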

Consequently,

\begin{matrix} n T r (Q_{k}^{2} (A_{m} A_{m}^{*}) A_{m} A_{m}^{*}) = \sum_{j = 1}^{m} Q_{k}^{2} (λ_{j}) λ_{j} & \leq \sum_{j = 1}^{m^{'}} λ_{j}^{- 1} + \sum_{j > m^{'}} (τ k)^{2} λ_{j} \end{matrix}

\begin{matrix} \leq \int_{0}^{m^{'}} s^{2 p} d s + (τ k)^{2} \int_{m^{'}}^{\infty} s^{- 2 p} d s . \end{matrix}

This suggests choosing

m^{'} \approx c (τ k)^{1 / 2 p}

for

p > 1 / 2

, where

c = (\frac{2 p + 1}{2 p - 1})^{1 / 4 p}

. Hence we have,

\begin{matrix} I E ∥ R_{α} A_{m} \tilde{f} - f_{k (m)} ∥^{2} & = σ^{2} T r (Q_{k}^{2} (A_{m} A_{m}^{*}) A_{m} A_{m}^{*}) \end{matrix}

\begin{matrix} \leq \frac{c^{2 p + 1}}{2 p + 1} \frac{σ^{2}}{n} (τ k)^{(2 p + 1) / 2 p} . \end{matrix}
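The choice of the cutoff index can be recovered by equating the two contributions to the trace bound. The following is a short sketch, taking the tail integral over (m^{'}, \infty) and assuming p > 1/2 so that it converges.

```latex
\begin{aligned}
\frac{{m'}^{2p+1}}{2p+1} = (τ k)^{2} \, \frac{{m'}^{-2p+1}}{2p-1}
&\;\Longleftrightarrow\;
{m'}^{4p} = \frac{2p+1}{2p-1} \, (τ k)^{2}
\;\Longleftrightarrow\;
m' = \Big( \frac{2p+1}{2p-1} \Big)^{1/4p} (τ k)^{1/2p} , \\
\frac{{m'}^{2p+1}}{2p+1}
&= \frac{1}{2p+1} \Big( \frac{2p+1}{2p-1} \Big)^{(2p+1)/4p} (τ k)^{(2p+1)/2p} .
\end{aligned}
```

The constant in the last line is the constant c_{2} of Corollary 5.1.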

Finally, this implies

I E ∥ {\tilde{f}}_{m} - f_{k (m)} ∥^{2} \leq 2 c_{1} k^{- 2 μ} + 2 c_{2} \frac{σ^{2}}{n} (τ k)^{(2 p + 1) / 2 p},

where

c_{1} = ρ^{2} (\frac{μ}{τ e})^{2 μ}

and

c_{2} = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p}

□

Remark 5.2. Note that the above inequality is satisfied if the dimension of the set is such that

d_{m_{0}} \approx n^{\frac{1}{4 μ p + 2 p + 1}}

. Here the optimal choice of the regularization sequence depends on

p

and

μ

. The optimal rates are of order

I E ∥ \tilde{f} - f_{k (m)} ∥^{2} = O (n^{- \frac{4 μ p}{4 μ p + 2 p + 1}})

. Analogous results are obtained in the ill-posed problem literature, see for example [5] , where typically in a Hilbert scale setting optimal rates are of order

O (n^{- \frac{2 s}{2 s + 2 p + 1}})

, with

s = 2 μ p .
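The exponents in this remark follow from balancing the bias term k^{-2μ} against the variance term k^{(2p+1)/2p} / n. The sketch below does this bookkeeping with exact rational arithmetic; the function names are hypothetical and all constants are ignored.

```python
from fractions import Fraction

def landweber_rate_exponents(mu, p):
    """Exponents of n obtained by solving k^(-2*mu) = k^((2p+1)/(2p)) / n.
    Returns the exponents of the stopping index, the cutoff m', and the risk."""
    mu, p = Fraction(mu), Fraction(p)
    denom = 4 * mu * p + 2 * p + 1
    k_exp = 2 * p / denom            # k*   ~ n^k_exp
    m_exp = k_exp / (2 * p)          # m'   ~ (k*)^(1/(2p)) = n^(1/denom)
    risk_exp = -2 * mu * k_exp       # risk ~ (k*)^(-2*mu)
    return k_exp, m_exp, risk_exp

def hilbert_scale_exponent(s, p):
    """Exponent of the Hilbert scale rate n^(-2s / (2s + 2p + 1))."""
    s, p = Fraction(s), Fraction(p)
    return -2 * s / (2 * s + 2 * p + 1)

mu, p = Fraction(1), Fraction(1)     # hypothetical smoothness and decay indices
k_exp, m_exp, risk_exp = landweber_rate_exponents(mu, p)
print(k_exp, m_exp, risk_exp, hilbert_scale_exponent(2 * mu * p, p))
```

For μ = p = 1 the stopping index grows like n^{2/7}, the cutoff like n^{1/7}, and the risk decays like n^{-4/7}, in agreement with the Hilbert scale rate for s = 2 μ p = 2.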

We are ready to state our main result for the Landweber iteration, which bounds the mean squared error of the selected estimate

{\hat{f}}_{\hat{k}}

basically by the smallest mean squared error among the estimates

f_{k}

plus a remainder term of order

1 / n

. The result follows from Theorem 4.1 .

Corollary 5.3. Let

τ = 1 / (2 ∥ A_{m} ∥^{2}) < 1 / λ_{1}

. Next assume

\hat{k}

as in 4.2 and

d_{m_{0}}

as in 4.1 . If

y \in ℛ (A_{m})

then for any

f \in F_{m}

and any

k

, the following inequality holds true

\begin{matrix} I E ∥ {\tilde{f}}_{m} - {\hat{f}}_{\hat{k}} ∥^{2} & \leq \frac{1}{(1 - ν)} {inf}_{k \in K} [C (1 + ν) ∥ {\tilde{f}}_{m} - f_{k} ∥^{2} + \frac{2 r σ^{2} (1 + L_{k}) (c (τ k)^{\frac{2 p + 1}{2 p}} + τ k)}{n}] \end{matrix}

\begin{matrix} + \frac{4 σ^{2}}{n} \sum_{k} \frac{τ k}{d} [\sqrt{d r L_{k} [c (τ k)^{1 / 2 p} + 1]} + 1] e^{- \sqrt{d r L_{k} [c (τ k)^{1 / 2 p} + 1]}}, \end{matrix}


for some

C > 0

and

c = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p}

Proof. For fixed

λ_{j}

and

m

we have that the trace and the spectral radius are bounded by the following expressions

\begin{matrix} T r (R_{k}^{t} R_{k}) = \sum_{j = 1}^{m} Q_{k}^{2} (λ_{j}) λ_{j} \leq \sum_{j = 1}^{m^{'}} λ_{j}^{- 1} + \sum_{j > m^{'}} (τ k)^{2} λ_{j} \end{matrix}

(5.5)

and

\begin{matrix} ρ^{2} (R_{k}^{t} R_{k}) = {max}_{j \leq m} Q_{k}^{2} (λ_{j}) λ_{j} \leq {max}_{j \leq m^{'}} λ_{j}^{- 1} + {max}_{j > m^{'}} (τ k)^{2} λ_{j} . \end{matrix}

(5.6)

Balancing both terms in 5.5 and 5.6 gives the optimal choice of the trace and the spectral radius, respectively. Thus, we have

T r (R_{k}^{t} R_{k}) \approx \frac{1}{2 p + 1} {(\frac{2 p + 1}{2 p - 1})}^{\frac{(2 p + 1)}{4 p}} \frac{(τ k)^{\frac{2 p + 1}{2 p}}}{n}

and

ρ^{2} (R_{k}^{t} R_{k}) \approx \frac{τ k}{n}

Note that the penalization term is roughly proportional to

\frac{1}{n} [\frac{1}{2 p + 1} {(\frac{2 p + 1}{2 p - 1})}^{\frac{(2 p + 1)}{4 p}} (τ k)^{\frac{2 p + 1}{2 p}} + τ k]

On the other hand

\frac{T r (R_{α}^{t} R_{α})}{ρ^{2} (R_{α})} = \frac{1}{2 p + 1} {(\frac{2 p + 1}{2 p - 1})}^{(2 p + 1) / 4 p} (τ k)^{1 / 2 p} .

The result then follows directly from Theorem 4.1 . □

5.3 Nonlinear multistep iterative process

Many approximate methods widely used in practice are nonlinear. We cite an important example of a nonlinear approximate method: a nonlinear multistep iterative process, whose residual polynomial is

1 - λ Q_{k} (λ) = \prod_{i = 1}^{k} (1 - τ_{i k}^{- 1} λ)

with

τ_{i k} = τ_{i k} (f_{0}, A, y) > 0, 0 < τ_{1 k} \leq τ_{2 k} \dots \leq τ_{k k} \leq λ_{1}

. Then for

λ > 0

Q_{k} (λ)

has the following representation

Q_{k} (λ) = λ^{- 1} [1 - \prod_{i = 1}^{k} (1 - τ_{i k}^{- 1} λ)] .
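The product representation is easy to evaluate numerically. The sketch below uses hypothetical values for the parameters τ_{i k} (in the actual process they depend on f_{0}, A and y) and checks two consequences of the formula: Q_{k} (λ) tends to the sum of the reciprocals τ_{i k}^{-1} as λ tends to 0, and Q_{k} (τ_{1 k}) = 1 / τ_{1 k}.

```python
import numpy as np

def residual_poly(lam, taus):
    """r_k(lam) = prod_i (1 - lam / tau_i), with taus = (tau_1k, ..., tau_kk)."""
    lam = np.asarray(lam, dtype=float)
    return np.prod(1.0 - lam[..., None] / np.asarray(taus, dtype=float), axis=-1)

def q_multistep(lam, taus):
    """Q_k(lam) = (1 - r_k(lam)) / lam for lam > 0."""
    lam = np.asarray(lam, dtype=float)
    return (1.0 - residual_poly(lam, taus)) / lam

taus = np.array([0.2, 0.5, 0.9, 1.0])    # hypothetical tau_ik, sorted increasingly
s_k = float(np.sum(1.0 / taus))          # sum of the reciprocals tau_ik^{-1}
lam = np.linspace(1e-6, taus[0], 1000)   # grid on (0, tau_1k]
q = q_multistep(lam, taus)

print("sum of reciprocals:", s_k)
print("Q_k near 0        :", q[0])
print("Q_k at tau_1k     :", q[-1])
```

On (0, τ_{1 k}] every factor of the product lies in [0, 1], so 1 - r_{k} (λ) is at most λ times the sum of the reciprocals, and Q_{k} never exceeds its limiting value at 0.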

The following corollary is established.

Corollary 5.4. Let

τ_{i k} = τ_{i k} (f_{0}, A, y) > 0,

with

0 < τ_{1 k} \leq τ_{2 k} \dots \leq τ_{k k} \leq λ_{1}

. If

y \in ℛ (A_{m})

, then the nonlinear multistep iterative process is an order optimal regularization method, i.e.,

I E ∥ {\tilde{f}}_{m} - f_{k (m)} ∥^{2} \leq 2 c_{1} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{- 2 μ} + 2 c_{2} \frac{σ^{2}}{n} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{(2 p + 1) / 2 p},

where

c_{1} = ρ^{2} μ^{2 μ} (μ + 1)^{- 2}

and

c_{2} = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p} .

Proof. As before, we investigate the behavior of the bias and the variance.

In the relation

λ^{μ} (1 - λ Q_{k} (λ)) = λ^{μ} \prod_{i = 1}^{k} (1 - τ_{i k}^{- 1} λ)

the least upper bound cannot be attained at the points

λ = 0

and

λ = τ_{1 k}

, since the function vanishes at these points but is not identically zero.

On the other hand

\begin{matrix} [1 - λ Q_{k} (λ)]^{'} [λ Q_{k} (λ) - 1]^{- 1} = \sum_{i = 1}^{k} \frac{1}{τ_{i k} - λ} \end{matrix}

(5.7)
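Identity 5.7 is the logarithmic derivative of the residual polynomial. A short derivation, writing r_{k} (λ) for 1 - λ Q_{k} (λ) as in the Landweber case:

```latex
\begin{aligned}
r_{k}(λ) &= 1 - λ Q_{k}(λ) = \prod_{i=1}^{k} \Big( 1 - \frac{λ}{τ_{i k}} \Big) , \\
\frac{r_{k}'(λ)}{r_{k}(λ)}
  &= \sum_{i=1}^{k} \frac{-1/τ_{i k}}{1 - λ/τ_{i k}}
   = - \sum_{i=1}^{k} \frac{1}{τ_{i k} - λ} ,
   \qquad 0 \leq λ < τ_{1 k} , \\
r_{k}'(λ) \, \big[ - r_{k}(λ) \big]^{-1}
  &= \sum_{i=1}^{k} \frac{1}{τ_{i k} - λ} .
\end{aligned}
```

Each summand 1 / (τ_{i k} - λ) is positive and increasing in λ on [0, τ_{1 k}), because τ_{1 k} is the smallest of the τ_{i k}.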

Since the right-hand side of 5.7 is nondecreasing as a function of

λ

on the half-open interval

[0, τ_{1 k})

, the estimates of Theorem 3.5 are valid.

Thus, for

0 \leq λ \leq τ_{1 k}

, we have

\begin{matrix} {sup}_{0 \leq λ \leq τ_{1 k}} | Q_{k} (λ) | & = Q_{k} (0) = \sum_{i = 1}^{k} τ_{i_{k}}^{- 1} \end{matrix}


and

\begin{matrix} {sup}_{0 \leq λ \leq τ_{1 k}} λ^{μ} | 1 - λ Q_{k} (λ) | & < μ^{μ} (μ + 1)^{- 1} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{- μ} \end{matrix}


Note that

ω_{μ} (k) = (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{- μ} .

Thus, the bias is bounded by

∥ {\tilde{f}}_{m} - R_{α} A_{m} \tilde{f} ∥^{2} \leq c_{1} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{- 2 μ},

where

c_{1} = ρ^{2} μ^{2 μ} (μ + 1)^{- 2}

On the other hand, it is not difficult to see that

\begin{matrix} T r (Q_{k}^{2} (A_{m} A_{m}^{*}) A_{m} A_{m}^{*}) & \leq \frac{1}{n} [\sum_{1 \leq j \leq m^{'}} j^{2 p} + (\sum_{i = 1}^{k} τ_{i k}^{- 1})^{2} \sum_{j > m^{'}} j^{- 2 p}] \leq \frac{1}{n} [\frac{{m^{'}}^{2 p + 1}}{2 p + 1} + (\sum_{i = 1}^{k} τ_{i k}^{- 1})^{2} \frac{{m^{'}}^{- 2 p + 1}}{2 p - 1}] \end{matrix}


This suggests choosing

m^{'} \approx c (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{1 / 2 p}

with

c = (\frac{2 p + 1}{2 p - 1})^{1 / 4 p}

Thus, the variance term is bounded by

I E ∥ R_{α} A_{m} \tilde{f} - f_{k (m)} ∥^{2} = σ^{2} T r (Q_{k}^{2} (A_{m}^{*} A_{m}) A_{m}^{*} A_{m}) \leq c_{2} \frac{σ^{2}}{n} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{(2 p + 1) / 2 p}

where

c_{2} = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p}

Finally we have

I E ∥ {\tilde{f}}_{m} - f_{k (m)} ∥^{2} \leq 2 c_{1} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{- 2 μ} + 2 c_{2} \frac{σ^{2}}{n} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{(2 p + 1) / 2 p}

Balancing the bias and variance terms gives the optimal rate

I E ∥ {\tilde{f}}_{m} - f_{k (m)} ∥^{2} = O (n^{- \frac{4 μ p}{4 μ p + 2 p + 1}}) .

□
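The balancing step at the end of the proof can be written out explicitly. In the sketch below S_{k} abbreviates the sum of the reciprocals τ_{i k}^{-1}; this shorthand is introduced here only for readability, and constants are suppressed.

```latex
\begin{aligned}
S_{k} &:= \sum_{i=1}^{k} τ_{i k}^{-1} , \\
S_{k}^{-2μ} \asymp \frac{1}{n} \, S_{k}^{(2p+1)/2p}
&\;\Longleftrightarrow\;
S_{k}^{(4μp + 2p + 1)/2p} \asymp n
\;\Longleftrightarrow\;
S_{k} \asymp n^{2p/(4μp + 2p + 1)} , \\
S_{k}^{-2μ} &\asymp n^{-4μp/(4μp + 2p + 1)} .
\end{aligned}
```

The quantity S_{k} thus plays the role that τ k plays for the Landweber iteration.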

We have the following result.

Corollary 5.5. Let

τ_{i k}

be as in Corollary 5.4 . Next assume

\hat{k}

as in 4.2 and

d_{m_{0}}

as in 4.1 . If

y \in ℛ (A_{m})

then for any

f \in F_{m}

and any

k

, the following inequality holds true

\begin{matrix} I E ∥ {\tilde{f}}_{m} - {\hat{f}}_{\hat{k}} ∥^{2} & \leq \frac{1}{(1 - ν)} {inf}_{k \in K} [C (1 + ν) ∥ {\tilde{f}}_{m} - f_{k} ∥^{2} + \frac{2 r σ^{2} (1 + L_{k}) (c (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{\frac{2 p + 1}{2 p}} + \sum_{i = 1}^{k} τ_{i_{k}}^{- 1})}{n}] \end{matrix}

\begin{matrix} + \frac{4 σ^{2}}{n} \sum_{k} \frac{\sum_{i = 1}^{k} τ_{i_{k}}^{- 1}}{d} [\sqrt{d r L_{k} [c (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{1 / 2 p} + 1]} + 1] e^{- \sqrt{d r L_{k} [c (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{1 / 2 p} + 1]}}, \end{matrix}


for some

C > 0

and

c = \frac{1}{2 p + 1} (\frac{2 p + 1}{2 p - 1})^{(2 p + 1) / 4 p}

Proof. First observe that $T r (R_{k}^{t} R_{k}) \approx \frac{1}{2 p + 1} {(\frac{2 p + 1}{2 p - 1})}^{\frac{(2 p + 1)}{4 p}} \frac{(\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{\frac{2 p + 1}{2 p}}}{n}$ and $ρ^{2} (R_{k}^{t} R_{k}) \approx \frac{\sum_{i = 1}^{k} τ_{i_{k}}^{- 1}}{n}$. Consequently, $\frac{T r (R_{k}^{t} R_{k})}{ρ^{2} (R_{k})} \approx \frac{1}{2 p + 1} {(\frac{2 p + 1}{2 p - 1})}^{\frac{(2 p + 1)}{4 p}} (\sum_{i = 1}^{k} τ_{i_{k}}^{- 1})^{\frac{1}{2 p}}$. Note that neither $n ρ^{2} (R_{k})$ nor $T r (R_{k}^{t} R_{k}) / ρ^{2} (R_{k})$ depends on n. The proof then follows directly from Theorem 4.1 .
□

References

Barron A. et al., (1999). “Risk Bounds for Model Selection via Penalization”. Probab. Theory and Related Fields, 113, pp. 467-493.
Birgé L., and Massart P., (1998). “Minimum Contrast Estimators on Sieves: Exponential Bounds and Rates of Convergence”. Bernoulli, Vol. 4, N3, pp. 329-395.
Bousquet O., (2002). “Concentration Inequalities for Sub-Additive Functions Using the Entropy Method”.
Burger M., (2001). “A level set method for inverse problems”. Inverse Problems, 17, pp. 1327-1356.
Cavalier L., Golubev G., Picard D., and Tsybakov A., (2002). “Oracle inequalities for inverse problems”. Ann. Statist., Vol. 30, N3, pp. 843-874.
Deuflhard P., Engl H., and Scherzer O., (1998). “A convergence analysis of iterative methods for the solution of nonlinear ill-posed problems under affinely invariant conditions”. Inverse Problems, Vol. 14, pp. 1081-1106.
Dey A.K. et al., (1996). “Cross-Validation for Parameter Selection in Inverse Estimation Problems”. Scand. J. Statist., Vol. 23, N4, pp. 609-620.
Engl H., (1993). “Regularization methods for the stable solution of inverse problems”. Surveys on Mathematics for Industry, 3, pp. 71-143.
Engl H., Hanke M., and Neubauer A., (1996). “Regularization of Inverse Problems”. Kluwer Academic Publishers.
Engl H., and Grever W., (1994). “Using the L-curve for Determining Optimal Regularization Parameters”. Numer. Math., Vol. 69, pp. 25-31.
Gilyazov S.F., and Gol'dman N.L., (2000). “Regularization of Ill-Posed Problems by Iteration Methods”. Kluwer Academic Publishers.
Cohen A., Hoffmann M., and Reiss M. “Adaptive Wavelet Galerkin Methods for Linear Inverse Problems”. SIAM J. Numer. Anal., Vol. 42, N4, pp. 1479-1501.
Kilmer M.E., and O'Leary D.P., (2001). “Choosing Regularization Parameters in Iterative Methods for Ill-Posed Problems”. SIAM J. Matrix Anal. Appl., Vol. 22, N4, pp. 1204-1221.
Lamm P.K., (1999). “Some Recent Developments and Open Problems in Solution Methods for Mathematical Inverse Problems”. Department of Mathematics, Michigan State University, USA.
Ledoux M., and Talagrand, (1996). “Deviation Inequalities for Product Measures”. ESAIM: Probability and Statistics, 1, pp. 63-87.
Loubes J.-M., and Ludeña C., (2004). “Penalized Estimators for Nonlinear Inverse Problems”.
Loubes J.-M., and Van De Geer S., (2002). “Adaptive estimation in regression, using soft thresholding type penalties”. Statistica Neerlandica, 56, pp. 453-478.
Ludeña C., and Rios, (2003). “Penalized Model Selection for Ill-posed Linear Problems”.
Mathé P., and Pereverzev S.V., (2003). “Discretization Strategy for Linear Ill-posed Problems in Variable Hilbert Scales”. Inverse Problems, Vol. 19, N6, pp. 1263-1279.
Morozov V.A., (1966). “On the Solution of Functional Equations by the Method of Regularization”. Soviet Math. Dokl., 7, pp. 414-417.
O'Sullivan F., (1986). “A Statistical Perspective on Ill-Posed Inverse Problems”. Statistical Science, Vol. 1, N4, pp. 502-527.
Pereverzev S., and Schock E., (2003). “On the adaptive selection of the parameter in regularization of ill-posed problems”.
Tikhonov A., and Arsenin V., (1977). “Solutions of Ill-Posed Problems”. Wiley, New York.

Escuela de Matematicas, Facultad de Ciencias, UCV, Av. Los Ilustres, Los Chaguaramos, Codigo Postal 1020-A, Caracas Venezuela. Telf.: (58)212-6051481.
Departamento de Matematicas, IVIC, Carretera Panamericana KM. 11, Aptdo. 21827, Codigo Postal 1020-A, Caracas Venezuela. Telf.: (58)212-5041412.
E-mail address : afermin@euler.ciens.ucv.ve E-mail address : cludena@ivic.ve

Ana K. Fermín

Carenne Ludeña