A stochastic approximation algorithm with multiplicative step size adaptation

In particular, if

P (ξ_{1} > 0) = P (ξ_{1} < 0) = 1 / 2

and

P (ξ_{1} = x) = 0

for any

x \in R

, it is established that for

u d < 1

, convergence takes place, and for

u d > 1

, divergence. Due to the multiplicative rule of updating of

γ_{t}

, it is natural to expect that

{x_{t}}

converges rapidly, like a geometric progression (if convergence takes place); the limit value, however, need not coincide with, but only approximates, one of the zeros of

φ

. By adjusting the parameters

u

and

d

, one can reach the necessary precision of approximation; higher precision is obtained at the expense of a lower convergence rate.

Key words: stochastic approximation, accelerated convergence algorithms, step size adaptation.

AMS subject classification: 62L20 (Stochastic approximation), 90C15 (Stochastic programming), 93B30 (System identification)

1 Introduction

Consider the problem of finding a zero of a function

φ : R \to R

. If there are several zeros, it is required to find at least one of them. It is supposed that the function can be measured at any point, with some random error. The standard algorithm of stochastic approximation consists in calculating successive approximations of the required value,

x_{0}

x_{1}

x_{2}, \dots

, according to the rule

\begin{matrix} x_{t} = x_{t - 1} - γ_{t - 1} y_{t}, t = 1, 2, \dots, \end{matrix}

(1)

where

\begin{matrix} y_{t} = φ (x_{t - 1}) + ξ_{t} \end{matrix}

(2)

is the value of

φ

measured at

x_{t - 1}

ξ_{t}

is the measurement error;

γ_{0}

γ_{1}

γ_{2}, \dots

is the sequence of step sizes of the algorithm. Usually it is assumed that the step sizes are positive real numbers satisfying the relations

\sum γ_{t} = \infty

\sum γ_{t}^{2} < \infty

. Then, under some additional assumptions on

φ

and

ξ_{t}

, the algorithm a.s. converges to a zero point of

φ

(see, e.g., [1, 2]). In practice, however, the convergence rate of this algorithm may prove unsatisfactory; therefore, when solving practical tasks, various modifications of the algorithm are used. Heuristic algorithms are widely used in which the step size is random rather than deterministic and is corrected in the course of the algorithm according to the current data [3, 6, 9, 11]. In particular, one idea is to decrease the step size if the sequence of increments

x_{t} - x_{t - 1}

changes the sign often enough, indicating that the current value

x_{t}

is close to the set of zeros of

φ

, and hence, the measurement error

ξ_{t}

of the function is big enough with respect to the function itself

φ (x_{t - 1})

. Otherwise, one should increase the step size or leave it unchanged. Thus, Kesten, in the theoretical work [7], considered an algorithm using ( 1 ), ( 2 ), and the following rule for modifying

γ_{t}

\begin{matrix} γ_{t} = γ (s_{t}), \quad s_{t} = \begin{cases} s_{t - 1} & \text{if } y_{t - 1} y_{t} > 0, \\ s_{t - 1} + 1 & \text{if } y_{t - 1} y_{t} \leq 0, \end{cases} \quad t = 2, 3, \dots, \end{matrix}

(3)

where

s_{0} = 0

s_{1} = 1

;

γ (0)

γ (1)

γ (2), \dots

is a sequence of positive numbers satisfying the relations

\sum γ (m) = \infty

\sum γ^{2} (m) < \infty

. Thus, the step size cannot increase in the course of algorithm; it can only decrease or remain unchanged. It is supposed that there is a unique zero of

φ

. Kesten proved that

x_{t}

a.s. converges to this zero point. A multidimensional version of this algorithm is considered in [8] .
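Kesten's rule ( 3 ) can be illustrated by the following minimal Python sketch. The function name `kesten_sa`, the callable interface for the noise, and the particular sequence γ(m) used in the usage note are illustrative assumptions and are not taken from [7].

```python
def kesten_sa(phi, noise, gamma, x0, n_steps):
    """Iterate (1), (2) with Kesten's step size rule (3).

    phi     -- function whose zero is sought
    noise   -- callable returning one measurement error xi_t per call
               (hypothetical interface chosen for this sketch)
    gamma   -- callable m -> gamma(m), a positive sequence
    Returns the lists [x_0, ..., x_n] and [s_0, ..., s_n].
    """
    xs = [x0]
    ss = [0, 1]                      # s_0 = 0, s_1 = 1
    y_prev = None
    for t in range(1, n_steps + 1):
        y = phi(xs[t - 1]) + noise()                 # measurement (2)
        xs.append(xs[t - 1] - gamma(ss[t - 1]) * y)  # update (1), gamma_{t-1} = gamma(s_{t-1})
        if t >= 2:
            # rule (3): the index s_t grows only when consecutive measurements
            # do not have the same sign
            ss.append(ss[t - 1] if y_prev * y > 0 else ss[t - 1] + 1)
        y_prev = y
    return xs, ss
```

For instance, `kesten_sa(lambda x: x, lambda: 0.0, lambda m: 0.5 / (m + 1), 1.0, 50)` runs the noise-free case; since the index s_t never decreases, the step size γ(s_t) never increases when γ(m) is a decreasing sequence.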

There are also heuristic procedures (in particular, in artificial neural networks), where at each moment

t

the step size is multiplied by a positive constant less than 1, if the measurement data indicate that

x_{t}

is close enough to the zero set of

φ

, and by a constant greater than 1 otherwise [4, 5, 9, 10]. Rules of this kind ensure a sufficiently high convergence rate; however, the step size converges like a geometric progression, and therefore

\sum γ_{t} < \infty

, which means that the limit of

{x_{t}}

need not be a zero point of

φ

, but instead, the sequence may "get stuck" on its way to the set of zeros of

φ

. Nevertheless, such a procedure may be justified if it gives a value close enough to one of the zeros of

φ

. In the present paper, a stochastic approximation algorithm utilizing this rule of step size modification is considered. Namely, the rule ( 1 ), ( 2 ), jointly with the following rule

\begin{matrix} γ_{t} = \begin{cases} \min \{u γ_{t - 1}, \bar{g}\} & \text{if } y_{t - 1} y_{t} > 0, \\ d γ_{t - 1} & \text{if } y_{t - 1} y_{t} \leq 0, \end{cases} \quad t = 2, 3, \dots, \end{matrix}

(4)

is used. Here

0 < d < 1 < u

0 < γ_{0}

γ_{1} \leq \bar{g}

\bar{g}

is a positive constant.
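The process ( 1 ), ( 2 ), ( 4 ) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `run_multiplicative_sa` and the way the noise is supplied (a callable returning one value per call) are assumptions made for this example.

```python
def run_multiplicative_sa(phi, noise, x0, gamma0, gamma1, u, d, g_bar, n_steps):
    """Iterate (1), (2) with the multiplicative step size rule (4).

    Requires 0 < d < 1 < u and 0 < gamma0, gamma1 <= g_bar.
    noise is a callable returning the measurement error xi_t
    (hypothetical interface chosen for this sketch).
    Returns the lists [x_0, ..., x_n] and [gamma_0, ..., gamma_n].
    """
    xs = [x0]
    gammas = [gamma0, gamma1]        # gamma_0 and gamma_1 are given
    y_prev = None
    for t in range(1, n_steps + 1):
        y = phi(xs[t - 1]) + noise()             # measurement (2)
        xs.append(xs[t - 1] - gammas[t - 1] * y)  # update (1)
        if t >= 2:
            if y_prev * y > 0:
                # same sign twice in a row: enlarge the step, capped at g_bar
                gammas.append(min(u * gammas[t - 1], g_bar))
            else:
                # sign change (or a zero measurement): shrink the step
                gammas.append(d * gammas[t - 1])
        y_prev = y
    return xs, gammas
```

A noisy run can be obtained, for example, by passing `noise=lambda: rng.gauss(0.0, 1.0)` with `rng = random.Random(0)` from the standard library; the Gaussian distribution here is only one possible choice of measurement error.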

Let us point out the main differences between ( 4 ) and Kesten's rule ( 3 ). First, according to ( 4 ),

γ_{t}

can both decrease and increase. Second, in Kesten's algorithm one always has

\sum γ_{t} = \infty

. On the other hand, it looks likely that in the case of convergence of the algorithm ( 1 ), ( 2 ), ( 4 ),

γ_{t}

converges like a geometric progression (this conjecture will be justified in Section 3); therefore the limit of the algorithm may not be a zero point of

φ

. Suppose that

{ξ_{t}}

is a sequence of i.i.d. random variables with zero mean and, in addition,

P (ξ_{t} > 0) = P (ξ_{t} < 0)

. Under some additional assumptions on

φ

ξ_{t}

, and

\bar{g}

, stated below, the process defined by ( 1 ), ( 2 ), ( 4 ) a.s. diverges if

u d > 1

, and converges if

u d < 1

, moreover the limit of

{x_{t}}

belongs to

U (\frac{ln u}{- ln d})

. Here

U (λ)

0 < λ < 1

, is a monotone decreasing family of sets of real numbers; moreover, every set

U (λ)

contains the set

Z

of zeros of

φ

, and

\partial (U (λ), Z) \to 0

as

λ \to 1^{-}

. (Here by definition

\partial (A, B) = {sup}_{x \in A} {inf}_{y \in B} | x - y |

for any two sets of real numbers

A

and

B

.) This statement is a consequence of the main theorem, which will be stated in section 2 and proved in section 3. Thus, by adjusting the parameters

u

and

d

(for example, fixing

u

and letting

d \to 1 / u - 0

), one can reach the necessary precision of the algorithm; higher precision is obtained at the expense of a lower convergence rate.
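The one-sided distance ∂(A, B) used above is not symmetric in its arguments. For finite sets it can be computed directly, as in the following small sketch (the function name `directed_distance` is hypothetical, and the restriction to finite non-empty sets is an assumption of the example).

```python
def directed_distance(A, B):
    """sup over x in A of inf over y in B of |x - y|, for finite non-empty A, B."""
    return max(min(abs(x - y) for y in B) for x in A)
```

If B is contained in A, then ∂(B, A) = 0, while ∂(A, B) measures how far A extends beyond B; this is the sense in which the sets U(λ) shrink to Z.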

2 Definition of the algorithm and statement of the main result

Consider the algorithm given by ( 1 ), ( 2 ), ( 4 ). The rule ( 4 ) means that at each instant

t

, step size is multiplied by

u

or by

d

, if the result of multiplication is less than

\bar{g}

; otherwise, step size is set to be

\bar{g}

. Thus, the maximal possible value of step size equals

\bar{g}

The rule ( 4 ) can be written in the form

\begin{matrix} \begin{matrix} ln {\tilde{γ}}_{t} & = & ln γ_{t - 1} + ln u \cdot I (y_{t - 1} y_{t} > 0) + ln d \cdot I (y_{t - 1} y_{t} \leq 0), \\ ln γ_{t} & = & min {ln {\tilde{γ}}_{t}, ln \bar{g}} . \end{matrix} \end{matrix}

(5)
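As a quick check that ( 5 ) is just ( 4 ) written in logarithms, the two forms of a single step size update can be compared numerically. The function names below are hypothetical; the flag `same_sign` stands for the event y_{t-1} y_t > 0.

```python
import math

def update_gamma(gamma_prev, same_sign, u, d, g_bar):
    """One step of rule (4)."""
    return min(u * gamma_prev, g_bar) if same_sign else d * gamma_prev

def update_gamma_log(gamma_prev, same_sign, u, d, g_bar):
    """One step of rule (5), carried out on ln(gamma)."""
    log_tilde = math.log(gamma_prev) + (math.log(u) if same_sign else math.log(d))
    return math.exp(min(log_tilde, math.log(g_bar)))
```

The cap in ( 5 ) is applied in both branches; this changes nothing in the second branch, because d < 1 and γ_{t-1} ≤ \bar{g} imply d γ_{t-1} < \bar{g}.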

Let us make the following assumptions:

A1 Denote $ℱ_{t}$ , $t = 0, 1, 2, \dots$ the $σ$ -algebra generated by $x_{i}$ , $γ_{i}$ , and $ξ_{i}$ , $0 \leq i \leq t$ ; then $ξ_{t + 1}$ does not depend on $ℱ_{t}$ .
A2 The values $ξ_{t}$ are identically distributed, with zero mean and finite variance: $E ξ_{t} = 0$ , $\operatorname{Var} ξ_{t} = : S < + \infty$ .
A3 (a) There exists $L > 0$ such that for any interval $I \subset [- L, L]$ , $P (ξ_{1} \in I) > 0$ ; (b) $P (ξ_{1} = 0) = 0$ .
A4 $φ \in C^{1} (R)$ and ${sup}_{x} | φ^{'} (x) | = : M < \infty$ .
A5 $\bar{g} < 2 / M$ .
A6 There exists $R > 0$ such that
- (a) $x φ (x) > 0$ for $| x | \geq R$ , and
- (b) ${inf}_{| x | \geq R} φ^{2} (x) > \frac{\bar{g} M S}{2 - \bar{g} M}$ .

Remark 1 From A4 and A6 (a) it follows that the set

Z

is non-empty and is contained in

(- R, R)

Remark 2 Note that assumptions A4–A6 guarantee convergence of the deterministic counterpart of algorithm ( 1 ), ( 2 ), ( 4 ) (that is, of the algorithm with

ξ_{t} \equiv 0

). Moreover, under these conditions, any deterministic algorithm

x_{t} = x_{t - 1} - γ_{t - 1} φ (x_{t - 1})

converges, whatever the sequence

{γ_{t}}

satisfying

γ_{t} \leq \bar{g}

Introduce the functions:

\begin{matrix} k_{+} (z) : = {lim}_{ε \to 0^{+}} sup {P ((φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) > 0), | φ_{1} - z | < ε, | φ_{2} - z | < ε}, \end{matrix}

(6)

\begin{matrix} k_{-} (z) : = {lim}_{ε \to 0^{+}} inf {P ((φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) > 0), | φ_{1} - z | < ε, | φ_{2} - z | < ε}; \end{matrix}

(7)

one has

k_{+} (z) \geq 1 / 2

0 \leq k_{\pm} (z) \leq 1

{lim}_{z \to \infty} k_{\pm} (z) = 1

. Further, define the sets of real numbers

\begin{matrix} V_{\pm}^{(a)} : = {x : k_{\pm} (φ (x)) < a}, V_{\pm}^{[a]} : = {x : k_{\pm} (φ (x)) \leq a}; \end{matrix}

(8)

obviously,

V_{+}^{(a)} \subset V_{-}^{(a)}

V_{\pm}^{(a)} \subset V_{\pm}^{[a]}

for any

a

Note that

V_{+}^{(a)}

is open. Indeed, let

x \in V_{+}^{(a)}

, then there exists

ε > 0

such that

sup {P ((φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) > 0), | φ_{1} - φ (x) | < ε, | φ_{2} - φ (x) | < ε} = : c < a .

Then for

x^{'}

close enough to

x

one has

| φ (x^{'}) - φ (x) | < ε / 2

, hence

sup {P ((φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) > 0), | φ_{1} - φ (x^{'}) | < ε / 2, | φ_{2} - φ (x^{'}) | < ε / 2} \leq c < a .

This implies that

k_{+} (φ (x^{'})) < a

, hence

x^{'} \in V_{+}^{(a)}

Denote also

\begin{matrix} k : = \frac{ln (1 / d)}{ln (u / d)} . \end{matrix}

(9)

Denote by

Z

the set of zeros of

φ

, i.e.,

Z : = {x : φ (x) = 0}

. Suppose that

x \in V_{+}^{(k)}

x_{t - 2} \in (x - ε, x + ε) \subset V_{+}^{(k)}

, and

γ_{t - 2} < ε

, where

ε

is a small positive number. Then, with a probability close to 1,

x_{t - 1}

also belongs to a small (possibly larger) neighborhood of

x

contained in

V_{+}^{(k)}

, and taking into account ( 6 ) and ( 8 ), one gets

\begin{matrix} P (y_{t - 1} y_{t} > 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) = \end{matrix}

\begin{matrix} = P ((φ (x_{t - 2}) + ξ_{t - 1}) (φ (x_{t - 1}) + ξ_{t}) > 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) < k . \end{matrix}

Then, using ( 5 ) and ( 9 ), one obtains

\begin{matrix} E [ln γ_{t} - ln γ_{t - 1} | | x_{t - 2} - x | < ε, γ_{t - 2} < ε] \leq \end{matrix}

\begin{matrix} ln u \cdot P (y_{t - 1} y_{t} > 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) + ln d \cdot P (y_{t - 1} y_{t} \leq 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) \end{matrix}

\begin{matrix} < ln u \cdot k + ln d \cdot (1 - k) = 0 . \end{matrix}

Thus, in a sense, the set

V_{+}^{(k)}

can be regarded to be a domain of decrease of step size: if several consecutive values of

x_{t}

belong to

V_{+}^{(k)}

and are close enough to each other, and if the first term of the sequence of corresponding step sizes

γ_{t}

is small enough, then the sequence of their mean values

E γ_{t}

decreases.
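The role of the threshold k in the argument above can be checked numerically: if p denotes the probability that two consecutive measurements have the same sign, the expected one-step change of ln γ_t (ignoring the cap at \bar{g}) is ln u · p + ln d · (1 - p), and it vanishes exactly at p = k. The function names in this sketch are illustrative.

```python
import math

def threshold_k(u, d):
    """k = ln(1/d) / ln(u/d), as in (9)."""
    return math.log(1.0 / d) / math.log(u / d)

def expected_log_drift(p, u, d):
    """Expected change of ln(gamma) in one step when P(y_{t-1} y_t > 0) = p.

    The cap at g_bar in (5) is ignored, which corresponds to small step sizes.
    """
    return math.log(u) * p + math.log(d) * (1.0 - p)
```

With, say, u = 2 and d = 0.4, the drift is negative for p below `threshold_k(2, 0.4)` and positive above it, in agreement with the two displayed chains of inequalities.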

Now, suppose that

x \in R \ V_{-}^{[k]}

x_{t - 2} \in (x - ε, x + ε) \subset R \ V_{-}^{[k]}

, and that

γ_{t - 2} < ε

. Analogously, for

ε

small enough, one has

P (y_{t - 1} y_{t} > 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) > k,

and then, using again ( 5 ) and ( 9 ) and taking into account that for

ε < \bar{g} / u^{2}

{\tilde{γ}}_{t} = γ_{t}

, one obtains

\begin{matrix} E [ln γ_{t} - ln γ_{t - 1} | | x_{t - 2} - x | < ε, γ_{t - 2} < ε] = \end{matrix}

\begin{matrix} ln u \cdot P (y_{t - 1} y_{t} > 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) + ln d \cdot P (y_{t - 1} y_{t} \leq 0 | | x_{t - 2} - x | < ε, γ_{t - 2} < ε) \end{matrix}

\begin{matrix} > ln u \cdot k + ln d \cdot (1 - k) = 0 . \end{matrix}

Thus, the set

R \ V_{-}^{[k]}

can be regarded as a domain of increase of step size: if several consecutive values of

x_{t}

belong to

R \ V_{-}^{[k]}

and are close enough to each other, and if the first of the corresponding values of

γ_{t}

is small enough, then the sequence of their mean values

E γ_{t}

increases.

Note that if

k > k_{+} (0)

then, by virtue of ( 8 ),

Z \subset V_{+}^{(k)}

, that is, all the zeros of

φ

belong to the region of decrease of step size. On the other hand, if

k < {inf}_{z} k_{-} (z)

then

V_{-}^{[k]} = \emptyset

, which means that the region of increase of step size coincides with

R

It seems likely that in the first case the algorithm can converge, and in the second one, cannot. This conjecture is confirmed by the following theorem, which is the main result of the paper.

Theorem Let the assumptions A1–A6 be satisfied; consider the process

{x_{t}, γ_{t}}

defined by ( 1 ), ( 2 ), ( 4 ). Recall that

k = \frac{ln (1 / d)}{ln (u / d)}

. Then (a) If

k > k_{+} (0)

then

{x_{t}}

a.s. converges to a point from

V_{-}^{[k]}

(b) If

k < {inf}_{z} k_{-} (z)

then

{x_{t}}

a.s. diverges.

Suppose that

P (ξ_{1} = x) = 0

for any real

x

and that

P (ξ_{1} > 0) = P (ξ_{1} < 0)

Then the function

k (\cdot) : = k_{+} (\cdot)

coincides with

k_{-} (\cdot)

, is continuous, and is given by

k (z) = P ((z + ξ_{1}) (z + ξ_{2}) > 0);

z = 0

is the unique minimum of

k (\cdot)

, and

k (0) = {inf}_{z} k (z) = 1 / 2

. After simple algebra, one can rewrite the hypotheses of the theorem in the form (a)

u d < 1

, (b)

u d > 1

. Denote

U (λ) : = V^{[\frac{1}{1 + λ}]} = {x : k (φ (x)) \leq \frac{1}{1 + λ}}

;

U (λ)

0 < λ < 1

is a monotone decreasing family of sets containing

Z

and tending to

Z

λ \to 1^{-}
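To make the sets U(λ) concrete, consider an illustrative special case that is not treated in the paper: Gaussian noise ξ_t ~ N(0, σ²) and φ(x) = x. Since ξ_1 and ξ_2 are independent, k(z) = p² + (1 - p)² with p = P(z + ξ_1 > 0), and U(λ) is a symmetric interval [-r, r]. The sketch below (function names hypothetical) evaluates k and finds r by bisection.

```python
import math

def k_gauss(z, sigma=1.0):
    """k(z) = P((z + xi_1)(z + xi_2) > 0) for independent N(0, sigma^2) noise."""
    p = 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))  # P(z + xi > 0)
    return p * p + (1.0 - p) * (1.0 - p)

def u_radius(lam, sigma=1.0, upper=50.0, iters=200):
    """Radius r with U(lam) = [-r, r] when phi(x) = x (assumed example), 0 < lam < 1.

    Uses bisection on [0, upper]; k_gauss is increasing in |z|.
    """
    level = 1.0 / (1.0 + lam)
    lower = 0.0
    for _ in range(iters):
        mid = 0.5 * (lower + upper)
        if k_gauss(mid, sigma) <= level:
            lower = mid
        else:
            upper = mid
    return lower
```

In this example the radius shrinks as λ increases towards 1, which illustrates how U(λ) tends to Z = {0}.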

Thus, one arrives at the following corollary.

Corollary Let, in addition to assumptions A1–A6,

P (ξ_{1} = x) = 0

for any

x \in R

, and

P (ξ_{1} > 0) = P (ξ_{1} < 0) = 1 / 2

. Consider the process defined by ( 1 ), ( 2 ), ( 4 ). Then there exists a monotone decreasing family of sets

U (λ)

0 < λ < 1

such that

U (λ) \supset Z

\partial (U (λ), Z) \to 0

λ \to 1^{-}

, and (a) if

u d < 1

then

{x_{t}}

a.s. converges to a point from

U (\frac{ln u}{- ln d})

; (b) if

u d > 1

then

{x_{t}}

a.s. diverges.
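The parameter regimes of the corollary are easy to tabulate. The following sketch (the function name `corollary_regime` and its return format are illustrative) classifies a pair (u, d) and, in the convergent case, returns λ = ln u / (-ln d); note that 1 / (1 + λ) coincides with k from ( 9 ).

```python
import math

def corollary_regime(u, d):
    """Classify (u, d) with 0 < d < 1 < u according to the corollary.

    Returns (regime, lam), where lam = ln u / (-ln d) in the convergent case
    and None otherwise. The case u * d == 1 is not covered by the corollary.
    """
    if u * d < 1.0:
        return "converges", math.log(u) / (-math.log(d))
    if u * d > 1.0:
        return "diverges", None
    return "undetermined", None
```

For example, with u fixed, moving d closer to 1/u from below pushes λ towards 1 and hence shrinks the limit set U(λ), at the price of slower convergence.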

Remark 3 The theorem does not give any information about the behavior of the algorithm for the values

u

d

such that

{inf}_{z} k_{-} (z) \leq \frac{ln (1 / d)}{ln (u / d)} \leq k_{+} (0) .

In particular, under the hypotheses of the corollary, the case

u d = 1

remains unexplored. These issues will be addressed elsewhere.

3 Proof of theorem

First we prove 10 auxiliary lemmas and then, based on them, we prove the theorem.

Here all statements about random variables are supposed to be true almost surely.

In the sequel, we shall mainly designate random values by Greek letters, and real numbers and functions from

R \to R

, by Latin ones; the letters

t

i

j

s

will denote integer non-negative numbers. The function

φ

and the random values

x_{t}

y_{t}

are exceptions; also, traditional notation

ε

δ

for small positive numbers will be used.

Lemma 1 If

\sum_{t} γ_{t} < \infty

then the sequence

{x_{t}}

converges.

Proof. Note that without loss of generality one can assume that

x_{0}

is bounded. Indeed, replacing

x_{0}

by

{\tilde{x}}_{0} = x_{0} \cdot I (| x_{0} | < X)

changes the process only with probability

P (| x_{0} | > X)

. By taking

X

large enough, one can make this probability arbitrarily small.

Let

C > 0

; define the stopping time

τ_{C} = inf {t : \sum_{i = 0}^{t} γ_{i} > C}

and introduce the new process

x_{t}^{C}

γ_{t}^{C}

\begin{matrix} x_{t}^{C} = x_{t}, \quad γ_{t}^{C} = γ_{t} \quad \text{for } t < τ_{C}, \text{ and} \end{matrix}

\begin{matrix} x_{t}^{C} = x_{τ_{C}}, \quad γ_{t}^{C} = 0 \quad \text{for } t \geq τ_{C} . \end{matrix}

First, let us prove that the sequence

{x_{t}^{C}}

is bounded. Designate

M_{R} : = {sup}_{| x | \geq R} \frac{φ (x)}{x}

; from A4 it follows that

M_{R} < \infty

. One has

\begin{matrix} | x_{t}^{C} | \leq | x_{t - 1}^{C} - γ_{t - 1}^{C} φ (x_{t - 1}^{C}) | + γ_{t - 1}^{C} | ξ_{t} | . \end{matrix}

(10)

Using that

γ_{t - 1}^{C} \leq C

and

| φ (x_{t - 1}^{C}) | \leq | φ (0) | + M | x_{t - 1}^{C} |

, one obtains

\begin{matrix} | x_{t}^{C} | \leq | x_{t - 1}^{C} | (1 + C M) + γ_{t - 1}^{C} (| φ (0) | + | ξ_{t} |) . \end{matrix}

(11)

If

γ_{t - 1}^{C} \leq 2 / M_{R}

, an even more precise estimate for

x_{t}^{C}

can be obtained. We shall distinguish between two cases: (i)

| x_{t - 1}^{C} | \leq R

and (ii)

| x_{t - 1}^{C} | > R

In case (i), designating

\bar{b} : = {sup}_{| x | \leq R} | φ (x) |

, one has

\begin{matrix} | x_{t - 1}^{C} - γ_{t - 1}^{C} φ (x_{t - 1}^{C}) | \leq | x_{t - 1}^{C} | + γ_{t - 1}^{C} \bar{b} . \end{matrix}

(12)

In the case (ii) one has

0 \leq γ_{t - 1}^{C} \frac{φ (x_{t - 1}^{C})}{x_{t - 1}^{C}} \leq \frac{2}{M_{R}} M_{R} = 2,

hence

\begin{matrix} | x_{t - 1}^{C} - γ_{t - 1}^{C} φ (x_{t - 1}^{C}) | \leq | x_{t - 1}^{C} | . \end{matrix}

(13)

Thus, in both cases (i) and (ii), from ( 10 ), ( 12 ), and ( 13 ) one gets

\begin{matrix} | x_{t}^{C} | \leq | x_{t - 1}^{C} | + γ_{t - 1}^{C} (\bar{b} + | ξ_{t} |) . \end{matrix}

(14)

The overall number of values of

t

such that

γ_{t - 1}^{C} > 2 / M_{R}

is less than

C M_{R} / 2

; therefore, using ( 11 ) and ( 14 ), one concludes that

\begin{matrix} | x_{t}^{C} | \leq (| x_{0} | + \sum_{i = 1}^{t} γ_{i - 1}^{C} (\bar{b} + | φ (0) | + | ξ_{i} |)) \cdot (1 + C M)^{C M_{R} / 2} . \end{matrix}

(15)

Denote

c_{0} : = \bar{b} + | φ (0) | + E | ξ_{1} |

and

ζ_{t} : = | ξ_{t} | - E | ξ_{t} |

; using that

\sum_{1}^{\infty} γ_{i - 1}^{C} \leq C

one gets

\begin{matrix} | x_{t}^{C} | \leq (| x_{0} | + C c_{0} + \sum_{i = 1}^{t} γ_{i - 1}^{C} ζ_{i}) \cdot (1 + C M)^{C M_{R} / 2} . \end{matrix}

(16)

Using that

\sum_{1}^{\infty} E (γ_{t - 1}^{C} ζ_{t})^{2} = E ζ_{1}^{2} \cdot \sum_{1}^{\infty} E (γ_{t - 1}^{C})^{2} < \infty

, one obtains that the martingale

\sum_{1}^{t} γ_{i - 1}^{C} ζ_{i}

is bounded; the value

x_{0}

is also bounded, so, by ( 16 ), one concludes that the sequence

{x_{t}^{C}}

is bounded.

Now, let us show that

{x_{t}^{C}}

converges. From the definition of

x_{t}^{C}

and

γ_{t}^{C}

it follows that

x_{t}^{C} = x_{0} - \sum_{1}^{t} γ_{i - 1}^{C} φ (x_{i - 1}^{C}) - \sum_{1}^{t} γ_{i - 1}^{C} ξ_{i} .

Using that the sequence

{φ (x_{i - 1}^{C})}

is bounded and that

\sum_{1}^{\infty} γ_{i - 1}^{C} \leq C

, one gets that the series

\sum_{1}^{\infty} γ_{i - 1}^{C} φ (x_{i - 1}^{C})

converges. Further, one has

\sum_{1}^{\infty} E (γ_{t - 1}^{C} ξ_{t})^{2} = S \cdot \sum_{1}^{\infty} E (γ_{t - 1}^{C})^{2} < \infty,

hence the martingale

\sum_{1}^{t} γ_{i - 1}^{C} ξ_{i}

converges. This implies that

{x_{t}^{C}}

also converges.

Define the events

A_{C} = {\sum_{t} γ_{t} \leq C}

and

A_{\infty} = {\sum_{t} γ_{t} < \infty}

. One has

A_{\infty} = \cup_{C} A_{C}

. If

\sum_{t} γ_{t} \leq C

then

x_{t}^{C} = x_{t}

for any

t

; this means that

I (A_{C}) \cdot (x_{t}^{C} - x_{t}) = 0

for any

t

and

C

. The sequence

{I (A_{C}) x_{t}^{C}}

converges, therefore the sequence

{I (A_{C}) x_{t}}

also converges, and passing to the limit

C \to \infty

one obtains that

{I (A_{\infty}) x_{t}}

converges. This means exactly that if

\sum_{t} γ_{t} < \infty

then

{x_{t}}

converges.

□

Lemma 2 If

{lim}_{t \to \infty} x_{t} = x

then

x \in V_{-}^{[k]}

Proof. Note that, using A3 (a), it is easy to show that there exists

δ_{0} > 0

such that

P (ξ_{1} \notin [x - L / 2, x + L / 2]) > δ_{0}

, whatever

x \in R

Next, for any

x \notin V_{-}^{[k]}

there exist

w (x) > 0

and

0 < ε (x) < L / 4

such that the following holds: for any two random variables

φ_{1}

and

φ_{2}

satisfying the relations

| φ_{l} - φ (x) | \leq ε (x)

l = 1, 2

one has

P ((φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) > 0) > \frac{ln (1 / d) + w (x)}{ln u + ln (1 / d)} .

Choose a countable set of intervals

U_{i} = (φ (x_{i}) - ε (x_{i}), φ (x_{i}) + ε (x_{i}))

covering the set

φ (R \ V_{-}^{[k]})

, and denote

w_{i} : = w (x_{i})

. Fix

i

and

s \in {0, 1, 2, \dots}

, and define the auxiliary process

x_{t}^{(i s)}

γ_{t}^{(i s)}

by the following formulas: if

t < s

then

x_{t}^{(i s)} = x_{t}

, and if

t \geq s

then

\begin{matrix} x_{t}^{(i s)} = {\begin{matrix} x_{t - 1}^{(i s)} - γ_{t - 1}^{(i s)} y_{t}^{(i s)} & if φ (x_{t - 1}^{(i s)} - γ_{t - 1}^{(i s)} y_{t}^{(i s)}) \in U_{i}, \\ x_{i} & elsewhere; \end{matrix} \end{matrix}

(17)

\begin{matrix} y_{t}^{(i s)} = φ (x_{t - 1}^{(i s)}) + ξ_{t}, \end{matrix}

(18)

\begin{matrix} γ_{t}^{(i s)} = {\begin{matrix} min {u γ_{t - 1}^{(i s)}, \bar{g}} & if & y_{t - 1}^{(i s)} y_{t}^{(i s)} > 0, \\ d γ_{t - 1}^{(i s)} & if & y_{t - 1}^{(i s)} y_{t}^{(i s)} \leq 0 . \end{matrix} \end{matrix}

(19)

So, as

t \geq s

φ (x_{t}^{(i s)})

is forced to be contained in

U_{i}

For

t \geq s + 2

, using that

y_{t - 1}^{(i s)} = φ (x_{t - 2}^{(i s)}) + ξ_{t - 1}

y_{t}^{(i s)} = φ (x_{t - 1}^{(i s)}) + ξ_{t}

φ (x_{t - 2}^{(i s)}) \in U_{i}

, one obtains that

P (y_{t - 1}^{(i s)} y_{t}^{(i s)} > 0) > \frac{ln (1 / d) + w_{i}}{ln u + ln (1 / d)}

and

P (y_{t - 1}^{(i s)} y_{t}^{(i s)} \leq 0) < \frac{ln u - w_{i}}{ln u + ln (1 / d)},

hence

E [ln u \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} > 0) + ln d \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} \leq 0)] >

> ln u \cdot \frac{ln (1 / d) + w_{i}}{ln u + ln (1 / d)} + ln d \cdot \frac{ln u - w_{i}}{ln u + ln (1 / d)} = w_{i} .

Consider variables

φ_{1} = f_{1} (ξ_{1}, ξ_{2})

and

φ_{2} = f_{2} (ξ_{1}, ξ_{2})

providing a solution of the (deterministic) minimization problem:

(φ_{1} + ξ_{1}) (φ_{2} + ξ_{2}) \to min,

subject to

\begin{matrix} | φ_{1} - φ (x_{i}) | \leq ε (x_{i}) \end{matrix}

\begin{matrix} | φ_{2} - φ (x_{i}) | \leq ε (x_{i}), \end{matrix}


and denote

Y_{t - 1}^{1} = f_{1} (ξ_{t - 1}, ξ_{t}) + ξ_{t - 1}

Y_{t}^{2} = f_{2} (ξ_{t - 1}, ξ_{t}) + ξ_{t}

η_{t} = ln u \cdot I (Y_{t - 1}^{1} Y_{t}^{2} > 0) + ln d \cdot I (Y_{t - 1}^{1} Y_{t}^{2} \leq 0)

. One has (i)

η_{t} \leq ln u \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} > 0) + ln d \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} \leq 0)

; (ii)

η_{t}

are identically distributed, and

E η_{t} \geq w_{i}

; (iii) the set of random variables

{η_{t}, t even, t \geq s + 2}

as well as the set

{η_{t}, t odd, t \geq s + 2}

, are mutually independent.

From (ii)–(iii) it follows that almost surely

\sum_{t} η_{t} = + \infty

, and from (i) it follows that

\sum_{t} [ln u \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} > 0) + ln d \cdot I (y_{t - 1}^{(i s)} y_{t}^{(i s)} \leq 0)] = + \infty,

so, by virtue of ( 19 ),

γ_{t}^{(i s)}

does not go to zero.

Thus, there exists a random value

χ > 0

such that for infinitely many values of

t

γ_{t}^{(i s)} \geq χ

Define a sequence of stopping times

τ_{0}

τ_{1}

τ_{2}, \dots

inductively, letting

τ_{0} = 0

and

τ_{j} = inf {t > τ_{j - 1} : γ_{t}^{(i s)} \geq χ}

for

j \geq 1

. The events

B_{j} = {| ξ_{τ_{j} + 1} + φ (x_{i}) | > L / 2}

happen with probability more than

δ_{0}

(recall the remark made at the beginning of the proof), and every event

B_{j}

j \geq 2

does not depend on the set of events

{B_{1}, \dots, B_{j - 1}}

. Therefore, for infinitely many values of

j

B_{j}

takes place, i.e.,

| ξ_{τ_{j} + 1} + φ (x_{i}) | > L / 2

, and hence, taking into account that

| y_{τ_{j} + 1} | \geq | ξ_{τ_{j} + 1} + φ (x_{i}) | - | φ (x_{τ_{j}}) - φ (x_{i}) |

and

| φ (x_{τ_{j}}) - φ (x_{i}) | < ε (x_{i}) < L / 4

, for these values of

j

one has

| y_{τ_{j} + 1} | \geq L / 4

. Thus, one concludes that

\begin{matrix} for infinitely many values of j, | γ_{τ_{j}} y_{τ_{j} + 1} | \geq χ L / 4 . \end{matrix}

(20)

Suppose that

x_{t}

converges to a point from

R \ V_{-}^{[k]}

, then for some

i

and

s

one has

φ (x_{t}) \in U_{i}

for

t \geq s

, hence the process

x_{t}^{(i s)}

γ_{t}^{(i s)}

coincides with

x_{t}

γ_{t}

, and therefore

γ_{t} y_{t + 1} \to 0

t \to \infty

. The last relation contradicts ( 20 ), thus Lemma 2 is proved.

□

Lemma 3 Let

\sum_{t} γ_{t} = \infty

. Then for any open set

O

containing

Z

there exists a positive constant

g = g (O)

such that either (i) for some

t

x_{t} \in O

, or (ii) for some

t

| x_{t} | < R

and

γ_{t} > g

Proof. Designate by

f

the primitive of

φ

such that

{inf}_{x} f (x) = 0

. Define the stopping time

τ = τ (O, g) : = inf {t : either (i) x_{t} \in O, or (ii) | x_{t} | < R and γ_{t} \geq g} .

The value of

g \in (0, \bar{g})

will be specified below.

Consider the sequence

E_{t} = E [f (x_{t}) I (t < τ)]

. Introducing shorthand notation

f (x_{t}) = : f_{t}

I (t < τ) = : I_{t}

f^{'} (x_{t}) = : f_{t}^{'} = φ_{t}

, and using that

I_{t} \leq I_{t - 1}

, one gets

\begin{matrix} E_{t} - E_{t - 1} = E [f_{t} I_{t} - f_{t - 1} I_{t - 1}] \leq E [(f_{t} - f_{t - 1}) I_{t - 1}] . \end{matrix}

(21)

Next, we utilize the Taylor decomposition

f_{t} = f (x_{t - 1} - γ_{t - 1} y_{t}) = f_{t - 1} - f_{t - 1}^{'} γ_{t - 1} y_{t} + \frac{1}{2} f^{''} (x^{'}) γ_{t - 1}^{2} y_{t}^{2},

x^{'}

being some point between

x_{t - 1}

and

x_{t}

. Substituting

y_{t} = φ_{t - 1} + ξ_{t}

and recalling that

f_{t - 1}^{'} = φ_{t - 1}

and

f^{''} (x^{'}) = φ^{'} (x^{'}) \leq M

, one obtains

\begin{matrix} f_{t} - f_{t - 1} \leq - γ_{t - 1} φ_{t - 1} (φ_{t - 1} + ξ_{t}) + \frac{M}{2} γ_{t - 1}^{2} (φ_{t - 1} + ξ_{t})^{2} . \end{matrix}

(22)

Using ( 21 ) and ( 22 ) and taking into account that each of the values

γ_{t - 1}

φ_{t - 1}

I_{t - 1}

is mutually independent with

ξ_{t}

(see A1), one gets

\begin{matrix} \begin{matrix} E_{t} - E_{t - 1} \leq E [(- γ_{t - 1} φ_{t - 1}^{2} - γ_{t - 1} φ_{t - 1} ξ_{t} + \frac{M}{2} γ_{t - 1}^{2} φ_{t - 1}^{2} + M γ_{t - 1}^{2} φ_{t - 1} ξ_{t} + \frac{M}{2} γ_{t - 1}^{2} ξ_{t}^{2}) I_{t - 1}] = \\ = E [(- φ_{t - 1}^{2} + \frac{M}{2} γ_{t - 1} φ_{t - 1}^{2} + \frac{M}{2} γ_{t - 1} S) γ_{t - 1} I_{t - 1}] = \\ = E [(- φ_{t - 1}^{2} (1 - M γ_{t - 1} / 2) + M γ_{t - 1} S / 2) γ_{t - 1} I_{t - 1}] . \end{matrix} \end{matrix}

(23)

If

I_{t - 1} = 1

then either (i)

x_{t - 1} \in [- R, R] \ O

and

γ_{t - 1} < g

, or (ii)

| x_{t - 1} | \geq R

In the case (i) one has

\begin{matrix} - φ_{t - 1}^{2} (1 - M γ_{t - 1} / 2) + M γ_{t - 1} S / 2 \leq - c_{0}^{2} (1 - M g / 2) + M g S / 2 = : - c_{g}^{'}, \end{matrix}

(24)

where

c_{0} : = inf {| φ (x) | : x \in [- R, R] \ O}

; obviously,

c_{0} > 0

. Let us fix a

g \in (0, \bar{g})

such that

c_{g}^{'} > 0

In the case (ii), designating

b_{0} : = {inf}_{| x | \geq R} φ^{2} (x)

, one has

\begin{matrix} - φ_{t - 1}^{2} (1 - M γ_{t - 1} / 2) + M γ_{t - 1} S / 2 \leq - b_{0} (1 - M \bar{g} / 2) + M \bar{g} S / 2 = : - c^{''} . \end{matrix}

(25)

Using A6, one gets that

c^{''} > 0

Denote

c = min {c_{g}^{'}, c^{''}}

. The relations ( 24 ) and ( 25 ) imply that if

I_{t - 1} = 1

then

- φ_{t - 1}^{2} (1 - M γ_{t - 1} / 2) + M γ_{t - 1} S / 2 \leq - c < 0

, hence, by virtue of ( 23 ),

\begin{matrix} E_{t} - E_{t - 1} \leq - c \cdot E [γ_{t - 1} I_{t - 1}] . \end{matrix}

(26)

Summing up both sides of ( 26 ) over

t = 1, \dots, s

and denoting

I_{\infty} = I (τ = \infty) = {min}_{t} I_{t}

, one obtains

E_{s} - E_{0} \leq - c \cdot E [\sum_{i = 0}^{s - 1} γ_{i} \cdot I_{\infty}] .

One has

E_{s} \geq 0

, and

x_{0}

is bounded, hence

E_{0} < \infty

. Thus, for arbitrary

s

E [\sum_{i = 0}^{s - 1} γ_{i} \cdot I_{\infty}] \leq \frac{E_{0}}{c} < \infty .

This implies that a.s. either

\sum_{0}^{\infty} γ_{i} < \infty

, or

τ < \infty

. Lemma 3 is proved.

□

Denote

c_{1} : = 1 - M \bar{g} / 2

. Recall that

f

is the primitive of

φ

such that

{inf}_{x} f (x) = 0

; the assumption A6 implies that

{lim}_{x \to \pm \infty} f (x) = + \infty

. Denote

H : = {sup}_{| x | \leq R} f (x)

. Denote also

c_{3} : = \bar{g} \cdot sup {| φ (x) | : f (x) \leq H} + 1

z^{l} : = inf {x : f (x) \leq H} - c_{3}

z^{r} : = sup {x : f (x) \leq H} + c_{3}

c_{2} : = inf {| φ (x) | : x \in [z^{l}, z^{r}] \ O}

, and

K : = sup {| φ (x) | : x \in [z^{l}, z^{r}]}

. Obviously,

c_{1} > 0

and

K \geq c_{2} > 0

Fix an open set

O

containing

Z

. Let

g > 0

0 < w < 1

. We shall say that a (finite or infinite) deterministic sequence

{z_{0}, z_{1}, z_{2}, \dots}

is

(g, w)

-admissible if

| z_{0} | \leq R

and there exist deterministic sequences

{q_{t}},

{h_{t}}

such that 1)

| h_{t} | \leq w

; 2) if

{z_{0}, z_{1}, \dots, z_{t}} \subset [z^{l}, z^{r}] \ O

then

g d^{2} \leq q_{s} \leq \bar{g}

s = 0, 1, \dots, t

; 3)

z_{t} = z_{t - 1} - q_{t - 1} φ (z_{t - 1}) - h_{t}

t = 1, 2, \dots

Proposition 1 There exist constants

t_{0}

and

w

such that any

(g, w)

-admissible sequence

{z_{t}, t = 0, 1, \dots, t_{0}}

has non-empty intersection with

O

Proof. Let

w : = min {1, g d^{2} c_{2}^{2} c_{1} / (2 K)}

. Designate

\tilde{t} = inf {t : z_{t} \in O}

;

\tilde{t}

takes values from

{0, 1, \dots, t_{0}, + \infty}

. We shall use shorthand notation

f_{t} : = f (z_{t})

f_{t}^{'} = φ_{t} : = φ (z_{t})

. One has

\begin{matrix} f_{t} = f (z_{t - 1} - q_{t - 1} φ_{t - 1} - h_{t}) = f (z_{t - 1} - q_{t - 1} φ_{t - 1}) - f^{'} (\tilde{z}) \cdot h_{t}, \end{matrix}

(27)

where

\tilde{z}

is a point between

z_{t - 1} - q_{t - 1} φ_{t - 1}

and

z_{t - 1} - q_{t - 1} φ_{t - 1} - h_{t}

Next, one has

\begin{matrix} f (z_{t - 1} - q_{t - 1} φ_{t - 1}) = f_{t - 1} - f_{t - 1}^{'} q_{t - 1} φ_{t - 1} + \frac{1}{2} f^{''} (\hat{z}) q_{t - 1}^{2} φ_{t - 1}^{2}, \end{matrix}

(28)

where

\hat{z}

is a point between

z_{t - 1}

and

z_{t - 1} - q_{t - 1} φ_{t - 1}

We are going to prove by induction that

\begin{matrix} if 0 \leq s \leq \tilde{t} then f_{s} \leq H - s \cdot g d^{2} c_{2}^{2} c_{1} / 2 . \end{matrix}

(29)

For

s = 0

, ( 29 ) follows from the condition

| z_{0} | \leq R

and the definition of

H

Now, let

1 \leq t \leq \tilde{t}

; suppose that formula ( 29 ) is true for

0 \leq s \leq t - 1

and prove it for

s = t

. For

0 \leq s \leq t - 1

, one has

f (z_{s}) \leq H

z_{s} \notin O

, therefore

z_{s} \in [z^{l}, z^{r}] \ O

; hence, by virtue of 2),

g d^{2} \leq q_{s} \leq \bar{g}

for

0 \leq s \leq t - 1

. One has

f (z_{t - 1}) \leq H

| q_{t - 1} φ_{t - 1} | \leq \bar{g} \cdot sup {| φ (x) | : f (x) \leq H}

, and

| h_{t} | \leq w \leq 1

, hence

| q_{t - 1} φ_{t - 1} | \leq c_{3}

| q_{t - 1} φ_{t - 1} + h_{t} | \leq c_{3}

, and so,

z_{t - 1} - q_{t - 1} φ_{t - 1} \in [z^{l}, z^{r}]

z_{t - 1} - q_{t - 1} φ_{t - 1} - h_{t} \in [z^{l}, z^{r}]

, thus

\tilde{z}

also belongs to

[z^{l}, z^{r}]

. This implies that

| φ (\tilde{z}) | = | f^{'} (\tilde{z}) | \leq K

. Then, combining ( 27 ) and ( 28 ) and using that

| h_{t} | \leq w

and

| f^{''} (\hat{z}) | = | φ^{'} (\hat{z}) | \leq M

, one obtains

\begin{matrix} f_{t} \leq f_{t - 1} - q_{t - 1} φ_{t - 1}^{2} (1 - \frac{1}{2} q_{t - 1} M) + w K . \end{matrix}

(30)

One has

z_{t - 1} \in [z^{l}, z^{r}] \ O

, hence

| φ (z_{t - 1}) | = | φ_{t - 1} | \geq c_{2}

. Using also that

q_{t - 1} \geq g d^{2}

1 - \frac{1}{2} q_{t - 1} M \geq c_{1}

, and

w K \leq g d^{2} c_{2}^{2} c_{1} / 2

, one gets from ( 30 ) that

f_{t} \leq f_{t - 1} - g d^{2} c_{2}^{2} c_{1} / 2,

and using the induction hypothesis, one concludes that

f_{t} \leq H - t \cdot g d^{2} c_{2}^{2} c_{1} / 2 .

Formula ( 29 ) is proved.

Let

t_{0} : = ⌊ 2 H / (g d^{2} c_{2}^{2} c_{1}) ⌋ + 1

; here

⌊ z ⌋

stands for the integral part of

z

Then, taking into account that

f_{s} \geq 0

, from ( 29 ) one concludes that

\tilde{t} < t_{0}

, thus Proposition 1 is proved.

□
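The constants of Proposition 1 are explicit, so they can be evaluated for given values of g, d, c_1, c_2, K, and H. The sketch below is only an arithmetic illustration; the function name `proposition1_constants` is hypothetical, and the input values in the usage note are arbitrary.

```python
import math

def proposition1_constants(g, d, c1, c2, K, H):
    """Return (w, t0) as chosen in the proof of Proposition 1.

    decrement = g d^2 c2^2 c1 / 2 is the guaranteed decrease of f per step
    in (29); w = min{1, decrement / K} and t0 = floor(H / decrement) + 1.
    """
    decrement = g * d ** 2 * c2 ** 2 * c1 / 2.0
    w = min(1.0, decrement / K)
    t0 = math.floor(H / decrement) + 1
    return w, t0
```

For instance, g = 0.125, d = 0.5, c_1 = 0.5, c_2 = 1, K = 2, H = 1 give a per-step decrease of 1/128, so that f, which starts at most at H and stays non-negative, cannot keep decreasing for t_0 = 129 steps without the sequence entering O.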

Proposition 2 If

γ_{t - 1} < 1 / (3 M)

| ξ_{t} | < c_{2}

| ξ_{t + 1} | < c_{2}

x_{t - 1}

and

x_{t}

belong to

[z^{l}, z^{r}] \ O

, then

γ_{t + 1} \geq γ_{t}

Proof. Using notation

φ_{t} : = φ (x_{t})

, one gets

φ_{t} = φ (x_{t - 1} - γ_{t - 1} (φ_{t - 1} + ξ_{t})) = φ_{t - 1} - φ^{'} (\tilde{x}) \cdot γ_{t - 1} (φ_{t - 1} + ξ_{t}),

where

\tilde{x}

is a point between

x_{t - 1}

and

x_{t}

. Therefore,

φ_{t - 1} φ_{t} = φ_{t - 1}^{2} \cdot [1 - φ^{'} (\tilde{x}) γ_{t - 1} \cdot (1 + ξ_{t} / φ_{t - 1})] .

Using that

| φ^{'} (\tilde{x}) | \leq M

,

γ_{t - 1} < 1 / (3 M)

,

| ξ_{t} | < c_{2}

, and

| φ_{t - 1} | \geq c_{2}

, one obtains

1 - φ^{'} (\tilde{x}) γ_{t - 1} \cdot (1 + ξ_{t} / φ_{t - 1}) \geq 1 / 3

, hence

φ_{t - 1} φ_{t} > 0

. Further, using that

| ξ_{t} | < c_{2}

,

| ξ_{t + 1} | < c_{2}

,

| φ_{t - 1} | \geq c_{2}

, and

| φ_{t} | \geq c_{2}

, one gets

y_{t} y_{t + 1} = φ_{t - 1} φ_{t} \cdot (1 + ξ_{t} / φ_{t - 1}) (1 + ξ_{t + 1} / φ_{t}) > 0 .

This implies that

γ_{t + 1} = min {u γ_{t}, \bar{g}} \geq γ_{t}

.

□
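The sign test behind Proposition 2 can be illustrated by a short sketch. It assumes that the adaptation rule has the form used in the proof: the step is multiplied by u (and capped at \bar{g}) when two consecutive measurements have the same sign, and multiplied by d otherwise. The function name `next_step`, the handling of a zero product, and the numerical values are assumptions made for illustration only.

```python
def next_step(gamma, y_prev, y_curr, u, d, g_bar):
    """Multiplicative step-size update (sketch).

    Same-sign consecutive measurements enlarge the step (capped at g_bar);
    otherwise the step is shrunk. Treating y_prev * y_curr == 0 as a
    'shrink' case is an assumption made for this illustration.
    """
    if y_prev * y_curr > 0:
        return min(u * gamma, g_bar)
    return d * gamma


# Hypothetical parameter values, chosen only for illustration.
u, d, g_bar = 1.5, 0.5, 1.0
grown = next_step(0.2, 0.7, 0.3, u, d, g_bar)     # same sign: step is multiplied by u
capped = next_step(0.9, 0.7, 0.3, u, d, g_bar)    # same sign: u * 0.9 exceeds g_bar, so capped
shrunk = next_step(0.2, 0.7, -0.3, u, d, g_bar)   # sign change: step is multiplied by d
```

Since the current step never exceeds \bar{g} and u > 1, the first branch cannot decrease the step, which is the conclusion of Proposition 2.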

Lemma 4 For any open set

O

, containing

Z

, and any

g > 0

there exists

δ = δ (O, g) > 0

such that

if | x_{0} | \leq R, γ_{0} \geq g then P (for some t, x_{t} \in O) \geq δ .

Proof. Without loss of generality suppose that

g < 1 / (3 M)

. Define the event

A : = {| ξ_{i} | < min {c_{2}, w / \bar{g}}, i = 1, 2, \dots, t_{0}},

where

w

and

t_{0}

are the same as in the proof of Proposition 1:

w = min {1, g d^{2} c_{2}^{2} c_{1} / (2 K)}

,

t_{0} = ⌊ 2 H / (g d^{2} c_{2}^{2} c_{1}) ⌋ + 1

. Denote

δ : = P (A) = (P (| ξ_{1} | < min {c_{2}, w / \bar{g}}))^{t_{0}};

by virtue of A3 (a),

δ > 0

. Let us show that for any elementary event

ω \in A

, the sequence

{z_{t} = x_{t} (ω), t = 0, 1, \dots, t_{0}}

is

(g, w)

-admissible.

One has

| z_{0} | = | x_{0} (ω) | < R

. Further, one has

z_{t} = z_{t - 1} - q_{t - 1} φ (z_{t - 1}) - h_{t}

, with

q_{t - 1} = γ_{t - 1} (ω)

and

h_{t} = γ_{t - 1} (ω) ξ_{t} (ω)

, and using that

γ_{t - 1} (ω) \leq \bar{g}

and

| ξ_{t} (ω) | < w / \bar{g}

, one gets

| h_{t} | \leq w

. Thus, conditions 1) and 3) are verified.

Now, let

{z_{0}, z_{1}, \dots, z_{t}} \subset [z^{l}, z^{r}] \setminus O

,

t \leq t_{0}

. Let

s_{0} \in {0, 1, 2, \dots, t}

be the minimal value such that

q_{s_{0}} = min {q_{0}, q_{1}, \dots, q_{t}}

. If

s_{0} = 0

then

min {q_{0}, q_{1}, \dots, q_{t}} = q_{0} = γ_{0} (ω) \geq g \geq g d^{2}

. If

s_{0} = 1

then

min {q_{0}, q_{1}, \dots, q_{t}} = q_{1} = γ_{1} (ω) \geq g d \geq g d^{2}

. If

s_{0} \geq 2

then

γ_{s_{0} - 2} (ω) \geq 1 / (3 M)

; otherwise, using that

| ξ_{s_{0} - 1} | < c_{2}

,

| ξ_{s_{0}} | < c_{2}

, and that

x_{s_{0} - 2} (ω)

and

x_{s_{0} - 1} (ω)

belong to

[z^{l}, z^{r}] \setminus O

, and applying Proposition 2, one would conclude that

γ_{s_{0}} (ω) \geq γ_{s_{0} - 1} (ω)

, which contradicts the definition of

s_{0}

. Thus,

γ_{s_{0}} (ω) \geq 1 / (3 M) \cdot d^{2} \geq g d^{2}

, and therefore,

min {q_{0}, q_{1}, \dots, q_{t}} = γ_{s_{0}} (ω) \geq g d^{2}

. So, the condition 2) is also verified.

Now, applying Proposition 1 to the

(g, w)

-admissible sequence

{z_{t}}

, one concludes that there exists a non-negative

τ \leq t_{0}

such that

z_{τ} = x_{τ} (ω) \in O

. This implies that

P (for some t, x_{t} \in O) \geq P (A) = δ .

□
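To get a feeling for the size of δ in Lemma 4, one can evaluate t_{0}, w, and δ for concrete constants. The sketch below is only illustrative: the values of H, g, d, c_{1}, c_{2}, K, \bar{g} are hypothetical, the function names are assumptions, and the noise is assumed to be uniform on [-1, 1], so that P(|ξ_{1}| < m) = min {m, 1}.

```python
import math


def lemma4_constants(H, g, d, c1, c2, K, g_bar, prob_abs_below):
    """Return (t0, w, delta) as defined in the proof of Lemma 4.

    prob_abs_below(m) must return P(|xi_1| < m); its form depends on the
    noise distribution and is supplied by the caller (an assumption here).
    """
    rate = g * d**2 * c2**2 * c1          # f decreases by at least rate / 2 per step
    t0 = math.floor(2 * H / rate) + 1
    w = min(1.0, rate / (2 * K))
    delta = prob_abs_below(min(c2, w / g_bar)) ** t0
    return t0, w, delta


def uniform_prob(m):
    """P(|xi| < m) for xi uniform on [-1, 1] (hypothetical noise model)."""
    return min(m, 1.0)


# Hypothetical constants, all powers of two so that the arithmetic is exact.
t0, w, delta = lemma4_constants(
    H=2**-7, g=0.125, d=0.5, c1=0.5, c2=0.5, K=2.0, g_bar=1.0,
    prob_abs_below=uniform_prob,
)
```

The resulting bound is positive but can be extremely small; only its positivity is used in the argument.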

Lemma 5 If

\sum_{t} γ_{t} = \infty

then for any open set

O

containing

Z

there exists

t

such that

x_{t} \in O

.

Proof. Let us fix an open set

O \supset Z

, and denote

δ = δ (O, g (O))

. Combining Lemma 3 and Lemma 4, one concludes that for any

O \supset Z

there exists

δ > 0

such that, whatever the initial conditions

x_{0}

,

γ_{0}

,

γ_{1}

,

P (for some t, x_{t} \in O | \sum_{t} γ_{t} = \infty) > δ .

Then one can choose a measurable integer-valued function

n (\cdot, \cdot, \cdot)

defined on

R \times (0, \bar{g}] \times (0, \bar{g}]

such that for

ν = n (x_{0}, γ_{0}, γ_{1})

one will have

P (for some t \leq ν, x_{t} \in O | \sum_{t} γ_{t} = \infty) > δ / 2

. Denote

\bar{p} = sup P (for all t, x_{t} \notin O | \sum_{t} γ_{t} = \infty),

the supremum being taken over all the initial conditions

x_{0}

,

γ_{0}

,

γ_{1}

. Fix

x_{0}

,

γ_{0}

,

γ_{1}

; then

\begin{matrix} P (for all t, x_{t} \notin O | \sum_{t} γ_{t} = \infty) = \\ = P (for all t > ν, x_{t} \notin O | for all t \leq ν, x_{t} \notin O and \sum_{t} γ_{t} = \infty) \cdot \\ \cdot P (for all t \leq ν, x_{t} \notin O | \sum_{t} γ_{t} = \infty) \leq \bar{p} (1 - δ / 2) . \end{matrix}

(31)

Taking supremum of the left hand side of ( 31 ) over all

(x_{0}, γ_{0}, γ_{1}) \in R \times (0, \bar{g}] \times (0, \bar{g}]

, one obtains

\bar{p} \leq \bar{p} (1 - δ / 2)

, hence

\bar{p} = 0

. Lemma 5 is proved.

□

Denote

O_{*} = {x : | φ (x) | < L / 2}

.

Lemma 6 For any open bounded sets

O

and

O_{1}

such that

\bar{O} \subset O_{1} \subset O_{*}

and for any

w > 0

there exists

δ = δ (O, O_{1}, w) > 0

such that

if x_{0} \in O then P (for some n, x_{n} \in O_{1} and γ_{n} < w) \geq δ .

Proof. Denote

n = ⌊ \frac{ln \bar{g} - ln w}{ln (1 / d)} ⌋ + 2

. Denote also

ɛ = min {\frac{L}{2}, \frac{\partial (O, R \setminus O_{1})}{n \bar{g}}},

where

\partial (A, B) : = {sup}_{x \in A} {inf}_{y \in B} | x - y |

for arbitrary sets of real numbers

A

and

B

. Using assumption A3 (a), one obtains that there exists

δ_{1} > 0

such that for any

x \in O_{1}

and for any integer

t

P ((- 1)^{t - 1} φ (x) < (- 1)^{t} ξ_{1} < (- 1)^{t - 1} φ (x) + ɛ) \geq δ_{1} .

This implies that if

x_{0} \in O

then

P (0 < (- 1)^{t} y_{t} < ɛ, dist (x_{t - 1}, O) < (t - 1) \bar{g} ɛ, t = 1, 2, \dots, n + 1) \geq δ_{1}^{n + 1} .

Denoting

δ = δ_{1}^{n + 1}

, one concludes that the following statements (i) and (ii) hold with probability at least

δ

: (i)

dist (x_{n}, O) < n \bar{g} ɛ \leq \partial (O, R \setminus O_{1})

, hence

x_{n} \in O_{1}

; (ii) for

t = 2, 3, \dots, n + 1

, one has

y_{t - 1} y_{t} < 0

, hence

γ_{t} = d γ_{t - 1}

, therefore

γ_{n} = d^{n - 1} γ_{1} \leq d^{n - 1} \bar{g} < w

. Lemma 6 is proved.

□
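The choice of n in the proof of Lemma 6 guarantees that d^{n - 1} \bar{g} < w. A quick numerical check, with hypothetical values of \bar{g}, w, and d (the function name `shrink_steps` is also an assumption), might look as follows.

```python
import math


def shrink_steps(g_bar, w, d):
    """n = floor((ln g_bar - ln w) / ln(1/d)) + 2, as in the proof of Lemma 6."""
    return math.floor((math.log(g_bar) - math.log(w)) / math.log(1.0 / d)) + 2


# Hypothetical values with 0 < w < g_bar and 0 < d < 1.
g_bar, w, d = 1.0, 0.01, 0.5
n = shrink_steps(g_bar, w, d)

# After n - 1 consecutive sign changes the step is at most d**(n - 1) * g_bar.
final_step_bound = d ** (n - 1) * g_bar
```

For these values n - 1 is the smallest number of consecutive shrinkings that brings a step starting at \bar{g} below w.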

Lemma 7 If

\sum_{t} γ_{t} = \infty

,

O

is an open set containing

Z

, and

w > 0

, then for some

t

one has

x_{t} \in O

and

γ_{t} < w

.

Proof. Without loss of generality, suppose that

O

is bounded and

O \subset O_{*}

. Choose an open set

O_{1}

such that

Z \subset O_{1}

and

{\bar{O}}_{1} \subset O

; applying Lemmas 5 and 6, one gets that for

δ = δ (O_{1}, O, w)

and for arbitrary initial conditions,

P (for some t, x_{t} \in O and γ_{t} < w) > δ .

Repeating the argument of Lemma 5, one concludes that there exists

t

such that

x_{t} \in O

and

γ_{t} < w

.

□

From now on we suppose that

k > k_{+} (0)

. Choose

k^{'}

such that

k_{+} (0) < k^{'} < k

; using A3 (b), one obtains that for some

ɛ_{0} > 0

,

P (ξ_{1} ξ_{2} > 0, or | ξ_{1} | < ɛ_{0}, or | ξ_{2} | < ɛ_{0}) \leq k^{'}

. Denote

O_{0} = {x : | φ (x) | < ɛ_{0}}

and

τ = inf {t : x_{t} \notin O_{0}}

. Without loss of generality, suppose that

O_{0}

is bounded.

Lemma 8 Suppose that

k > k_{+} (0)

, then there exist a constant

b > 0

and a monotone decreasing function

p (\cdot)

such that

{lim}_{a \to + \infty} p (a) = 0

and

if γ_{0} < w then P (ln γ_{t} < ln v - b t for all t < τ) > 1 - p (v / w) .

Proof. Define the sequences

{ρ_{t}}

and

{σ_{t}}

by

\begin{matrix} ρ_{t} = ln u \cdot I (ξ_{t - 1} ξ_{t} > 0, or | ξ_{t - 1} | < ɛ_{0}, or | ξ_{t} | < ɛ_{0}) + \\ + ln d \cdot I (ξ_{t - 1} ξ_{t} \leq 0 \& | ξ_{t - 1} | \geq ɛ_{0} \& | ξ_{t} | \geq ɛ_{0}), \end{matrix}

σ_{t} = ln w + \sum_{i = 1}^{t} ρ_{i} .

Using ( 5 ) and definition of

τ

, one obtains that for all

t < τ

one has

ln γ_{t} \leq σ_{t}

. The variables

ρ_{t}

are identically distributed, take the values

ln u

and

ln d

, and

\begin{matrix} E ρ_{t} = ln u \cdot P (ξ_{t - 1} ξ_{t} > 0, or | ξ_{t - 1} | < ɛ_{0}, or | ξ_{t} | < ɛ_{0}) + \\ + ln d \cdot P (ξ_{t - 1} ξ_{t} \leq 0 \& | ξ_{t - 1} | \geq ɛ_{0} \& | ξ_{t} | \geq ɛ_{0}) \leq \\ \leq ln u \cdot k^{'} + ln d \cdot (1 - k^{'}) < ln u \cdot k + ln d \cdot (1 - k) = 0 . \end{matrix}

Moreover, the variables in the set

{ρ_{t}, t even}

, as well as the variables in the set

{ρ_{t}, t odd}

, are independent.

Denote

b = - E ρ_{t} / 2

. One has

P (ln γ_{t} < ln v - b t for all t < τ) \geq P (σ_{t} < ln v - b t for all t) =

= P (\sum_{i = 1}^{t} (ρ_{i} + 2 b) < ln v - ln w + b t for all t) \geq 1 - p (v / w),

where

p (a) = p_{1} (a) + p_{2} (a)

,

p_{1} (a) = P ({\sum_{1 \leq i \leq t}}^{'} (ρ_{i} + 2 b) \geq \frac{ln a}{2} + \frac{b}{2} t for some t),

p_{2} (a) = P ({\sum_{1 \leq i \leq t}}^{''} (ρ_{i} + 2 b) \geq \frac{ln a}{2} + \frac{b}{2} t for some t);

the sum

\sum^{'}

(respectively,

\sum^{''}

) is taken over the even (respectively, odd) values of

i

. Both

\sum^{'}

and

\sum^{''}

are sums of i.i.d. random variables with zero mean, hence, by the strong law of large numbers, both

p_{1} (a)

and

p_{2} (a)

tend to zero as

a \to + \infty

. Lemma 8 is proved.

□
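The sign of the drift E ρ_{t} is what drives Lemma 8. The following sketch computes the value of k solving ln u \cdot k + ln d \cdot (1 - k) = 0 and evaluates the bound ln u \cdot k^{'} + ln d \cdot (1 - k^{'}) for a smaller k^{'}. The function names and the numerical values of u, d, and k^{'} are illustrative assumptions.

```python
import math


def critical_k(u, d):
    """Solve k * ln(u) + (1 - k) * ln(d) = 0 for k, assuming 0 < d < 1 < u."""
    return math.log(1.0 / d) / (math.log(u) + math.log(1.0 / d))


def drift_bound(u, d, k_prime):
    """Upper bound ln(u) * k' + ln(d) * (1 - k') on E rho_t."""
    return math.log(u) * k_prime + math.log(d) * (1.0 - k_prime)


u, d = 1.5, 0.5            # hypothetical parameters with u * d < 1
k = critical_k(u, d)       # about 0.63
k_prime = 0.55             # any value below k
bound = drift_bound(u, d, k_prime)   # negative, so b = -E rho_t / 2 > 0
```

Note that k exceeds 1/2 exactly when u d < 1, which is the case in this example.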

Define the stopping times

τ_{v} = inf {t : x_{t} \notin O_{0} or ln γ_{t} \geq ln v - b t}

. Recall that

f

is the primitive of

φ

such that

{inf}_{x} f (x) = 0

. Fix an open set

O^{'}

such that

Z \subset O^{'} \subset O_{0}

and

{sup}_{x \in O^{'}} f (x) < {inf}_{x \notin O_{0}} f (x)

, and denote

δ = {inf}_{x \notin O_{0}} f (x) - {sup}_{x \in O^{'}} f (x)

.

Lemma 9 Let

k > k_{+} (0)

,

x_{0} \in O^{'}

, and

γ_{0} < w

, then

P (τ_{v} < \infty) \leq K v^{2} + p (v / w);

here

K

is a positive constant, and

p (\cdot)

satisfies the statement of Lemma 8.

Proof. We shall use the shorthand notation of Lemma 3:

f_{t} : = f (x_{t})

and

φ_{t} : = φ (x_{t})

. According to ( 22 ), one has

f_{t} - f_{t - 1} \leq - γ_{t - 1} φ_{t - 1} (φ_{t - 1} + ξ_{t}) + \frac{M}{2} γ_{t - 1}^{2} (φ_{t - 1} + ξ_{t})^{2} \leq

\leq - γ_{t - 1} φ_{t - 1} ξ_{t} + M γ_{t - 1}^{2} (φ_{t - 1}^{2} + ξ_{t}^{2}) .

This implies that

f_{t} - f_{1} \leq Q_{t}^{'} + Q_{t}^{''}

, with

Q_{t}^{'} = | \sum_{i = 2}^{t} γ_{i - 1} φ_{i - 1} ξ_{i} |, Q_{t}^{''} = M \sum_{i = 2}^{t} γ_{i - 1}^{2} (φ_{i - 1}^{2} + ξ_{i}^{2}) .

Using Lemma 8, one gets

P (τ_{v} < \infty) \leq p (v / w) + P^{'} + P^{''},

where

P^{'} = P (Q_{τ_{v}}^{'} \geq δ / 2) and P^{''} = P (Q_{τ_{v}}^{''} \geq δ / 2) .

According to the Chebyshev inequality,

P^{'} \leq \frac{4}{δ^{2}} E {Q^{'}}_{τ_{v}}^{2} = \frac{4}{δ^{2}} \sum_{i, j = 2}^{\infty} E_{i j},

where

E_{i j} = E [γ_{i - 1} φ_{i - 1} ξ_{i} I (i - 1 < τ_{v}) \cdot γ_{j - 1} φ_{j - 1} ξ_{j} I (j - 1 < τ_{v})] .

Using that the values

γ_{i}

,

φ_{i}

,

ξ_{i}

, and

I (i < τ_{v})

are

ℱ_{i}

-measurable, and using assumptions A1 and A2, one obtains that for

i \neq j

one has

E_{i j} = 0

, and for

i = j

one has

E_{i i} = E [γ_{i - 1}^{2} φ_{i - 1}^{2} I (i - 1 < τ_{v}) \cdot ξ_{i}^{2}] \leq v^{2} e^{- 2 b i} {sup}_{x \in O_{0}} φ^{2} (x) \cdot S .

Therefore,

P^{'} \leq \frac{4}{δ^{2}} \sum_{i = 2}^{\infty} E_{i i} \leq \frac{4 v^{2} S}{δ^{2}} \frac{e^{- 4 b}}{1 - e^{- 2 b}} {sup}_{x \in O_{0}} φ^{2} (x) .

Similarly,

P^{''} \leq \frac{2}{δ} E Q_{τ_{v}}^{''} = \frac{2 M}{δ} \sum_{i = 2}^{\infty} E [γ_{i - 1}^{2} (φ_{i - 1}^{2} + ξ_{i}^{2}) I (i - 1 < τ_{v})] \leq

\leq \frac{2 M v^{2}}{δ} \sum_{i = 2}^{\infty} e^{- 2 b i} ({sup}_{x \in O_{0}} φ^{2} (x) + S) = \frac{2 M v^{2}}{δ} \frac{e^{- 4 b}}{1 - e^{- 2 b}} ({sup}_{x \in O_{0}} φ^{2} (x) + S) .

Taking

K = [\frac{4 S}{δ^{2}} {sup}_{x \in O_{0}} φ^{2} (x) + \frac{2 M}{δ} ({sup}_{x \in O_{0}} φ^{2} (x) + S)] \frac{e^{- 4 b}}{1 - e^{- 2 b}},

one gets that

P^{'} + P^{''} \leq K v^{2}

. Lemma 9 is proved.

□
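The factor e^{- 4 b} / (1 - e^{- 2 b}) in the constant K is the sum of the geometric series \sum_{i \geq 2} e^{- 2 b i}. The sketch below checks this numerically and assembles K from the formula in the proof. All numerical values (S, M, δ, b, and the supremum of φ^{2} over O_{0}) are hypothetical, and the function names are assumptions.

```python
import math


def tail_factor(b):
    """Closed form of sum_{i >= 2} exp(-2 * b * i) for b > 0."""
    return math.exp(-4 * b) / (1 - math.exp(-2 * b))


def lemma9_constant(S, M, delta, phi_sq_sup, b):
    """K as given at the end of the proof of Lemma 9."""
    return (4 * S / delta**2 * phi_sq_sup
            + 2 * M / delta * (phi_sq_sup + S)) * tail_factor(b)


b = 0.1
# Truncated series; the neglected remainder is of order exp(-400).
partial = sum(math.exp(-2 * b * i) for i in range(2, 2000))

# Hypothetical constants, for illustration only.
K = lemma9_constant(S=1.0, M=1.0, delta=0.5, phi_sq_sup=0.25, b=b)
```

The factor grows without bound as b tends to zero, so a weaker drift in Lemma 8 leads to a larger constant K.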

Lemma 10 If

k > k_{+} (0)

then

\sum_{t} γ_{t} < \infty

.

Proof. From the definition of

τ_{v}

one easily sees that if

τ_{v} = \infty

for some

v > 0

, then

\sum_{t} γ_{t} < \infty

. This implies that for any

v > 0

\begin{matrix} P (\sum γ_{t} = \infty) \leq P (τ_{v} < \infty) . \end{matrix}

(32)

Further, by virtue of Lemma 9, if

x_{0} \in O^{'}

and

γ_{0} < w

then

\begin{matrix} P (τ_{\sqrt{w}} < \infty) \leq K w + p (1 / \sqrt{w}) . \end{matrix}

(33)

Combining ( 32 ) and ( 33 ), one gets that for any

w > 0

\begin{matrix} P (\sum γ_{t} = \infty | x_{0} \in O^{'} and γ_{0} < w) \leq K w + p (1 / \sqrt{w}) . \end{matrix}

(34)

Define the event

A_{w} = {for some t, x_{t} \in O^{'} and γ_{t} < w}

, then by virtue of ( 34 ),

\begin{matrix} P (\sum γ_{t} = \infty | A_{w}) \leq K w + p (1 / \sqrt{w}) . \end{matrix}

(35)

Denote by

{\bar{A}}_{w}

the complementary event,

{\bar{A}}_{w} = {for any t, x_{t} \notin O^{'} or γ_{t} \geq w}

. By virtue of Lemma 7,

\begin{matrix} P (\sum γ_{t} = \infty \& {\bar{A}}_{w}) = 0 . \end{matrix}

(36)

Using ( 35 ) and ( 36 ), one gets

\begin{matrix} P (\sum γ_{t} = \infty) = P (\sum γ_{t} = \infty \& A_{w}) + P (\sum γ_{t} = \infty \& {\bar{A}}_{w}) \leq \\ \leq (K w + p (1 / \sqrt{w})) \cdot P (A_{w}) . \end{matrix}

Taking into account that

w

can be chosen arbitrarily small and that

K w + p (1 / \sqrt{w}) \to 0

as

w \to 0^{+}

, one concludes that

P (\sum_{t} γ_{t} = \infty) = 0

.

□

Now, we are in a position to prove the theorem. Suppose that

k < {inf}_{z} k_{-} (z)

, then

V_{-}^{[k]} = \emptyset

, and by Lemma 2,

{x_{t}}

diverges. So statement (b) of the Theorem is proved.

On the other hand, according to Lemma 10, if

k > k_{+} (0)

then

\sum_{t} γ_{t} < \infty

, and by Lemmas 1 and 2, the sequence

{x_{t}}

converges to a point from

V_{-}^{[k]}

. Thus, statement (a) of the Theorem is also established.
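The behaviour described by the Theorem can be explored numerically. The sketch below simulates the recursion x_{t} = x_{t - 1} - γ_{t - 1} y_{t}, y_{t} = φ (x_{t - 1}) + ξ_{t}, with a multiplicative step update. Several ingredients are assumptions made for illustration and are not taken from the text: the test function φ (x) = x, uniform noise on [-1, 1], the parameter values, the treatment of y_{t - 1} y_{t} = 0 as a sign change, and all function names.

```python
import random


def simulate(phi, x0, gamma0, gamma1, u, d, g_bar, steps, seed=0):
    """Run the recursion with multiplicative step-size adaptation (sketch).

    Returns the lists of iterates x_0..x_steps and step sizes
    gamma_0..gamma_steps. Noise is uniform on [-1, 1] (an assumption).
    """
    rng = random.Random(seed)
    xs, gammas = [x0], [gamma0, gamma1]
    y_prev = None
    for t in range(1, steps + 1):
        y = phi(xs[-1]) + rng.uniform(-1.0, 1.0)   # noisy measurement y_t
        xs.append(xs[-1] - gammas[t - 1] * y)      # x_t = x_{t-1} - gamma_{t-1} * y_t
        if t >= 2:
            # gamma_t is chosen from the signs of y_{t-1} and y_t.
            if y_prev * y > 0:
                gammas.append(min(u * gammas[t - 1], g_bar))
            else:
                gammas.append(d * gammas[t - 1])
        y_prev = y
    return xs, gammas


# Hypothetical setting with u * d < 1.
xs, gammas = simulate(lambda x: x, x0=2.0, gamma0=0.5, gamma1=0.5,
                      u=1.5, d=0.5, g_bar=1.0, steps=500)
```

Plotting ln γ_{t} against t for u d < 1 and for u d > 1 is a simple way to compare the two regimes; no particular numerical outcome is claimed here.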

Acknowledgements This work was partially supported by the R&D Unit CEOC (Center for Research in Optimization and Control). The second author (PC) also gratefully acknowledges the financial support by the Portuguese program PRODEP `Medida 5 Acção 5.3 Formação Avançada de Docentes do Ensino Superior Concurso nr. 2/5.3/PRODEP/2001'.

References

[1] Harold J. Kushner and G. George Yin, Stochastic approximation algorithms and applications, Applications of Mathematics 35, Springer, Berlin (1997).
[2] M. Nevel'son and R. Has'minskii, Stochastic approximation and recursive estimation, translated from the Russian by Israel Program for Scientific Translations, translation edited by B. Silver, Translations of Mathematical Monographs, Vol. 47, American Mathematical Society, Providence, R.I. (1976).
[3] Y. Fang and T. J. Sejnowski, Faster learning for dynamic recurrent backpropagation, Neural Computation, 2 (1990), pp. 270–273.
[4] Fernando M. Silva and Luis B. Almeida, Speeding up backpropagation, in Advanced Neural Computers, R. Eckmiller, ed., Elsevier Science Publishers, Amsterdam (1990), pp. 151–158.
[5] L. B. Almeida, T. Langlois, J. D. Amaral, and A. Plakhov, Parameter adaptation in stochastic optimization, in Online Learning in Neural Networks, D. Saad, ed., (1998), pp. 111–134.
[6] P. J. Werbos, Neurocontrol and supervised learning: an overview and evaluation, in Handbook of Intelligent Control (Neural, Fuzzy, and Adaptive approaches), D. A. White and D. A. Sofge, eds., Van Nostrand Reinhold, New York (1992), pp. 65–89.
[7] H. Kesten, Accelerated stochastic approximation, Ann. Math. Stat., 29 (1958), pp. 41–59.
[8] B. Delyon and A. Juditsky, Accelerated stochastic approximation, SIAM J. Optim., 3 (1993), pp. 868–881.
[9] R. Salomon and J. L. van Hemmen, Accelerating Backpropagation through Dynamic Self-Adaptation, Neural Networks, 9:4 (1996), Elsevier Science Publishers, pp. 589–601.
[10] Roberto Battiti, Accelerated Backpropagation Learning: Two Optimization Methods, Complex Systems, Inc., (1989), pp. 331–342.
[11] Marcus Frean, A “thermal” perceptron learning rule, Neural Computation, 4:6 (1992), The MIT Press, Cambridge–Massachusetts, pp. 946–957.

Alexander Plakhov (plakhov@mat.ua.pt) and Pedro Cruz (jpedro@mat.ua.pt), Department of Mathematics, University of Aveiro, Portugal