
Consider a discrete time Markov process

(X, Y) = (X_{n}, Y_{n})_{n \geq 0}

with values¹ in

R \times R

, where the signal component

X

is a Markov process itself and the observation component

Y

satisfies

\begin{matrix} P (Y_{n} \in A | F_{[0, n - 1]}^{X} \lor F_{[1, n - 1]}^{Y}) = P (Y_{n} \in A | F_{n - 1}^{X}), P - a . s ., \end{matrix}

(1.1)

for any real Borel set

A \in B (R)

. We assume that all the random variables are defined on a complete probability space

(Ω, F, P)

and use the generic notation

F_{[ℓ, m]}^{Z} = σ \{Z_{ℓ}, \ldots, Z_{m}\}

for

ℓ < m

and

F_{m}^{Z} = F_{[m, m]}^{Z}

for brevity. Let

π_{n} (d x)

be the regular conditional distribution of

X_{n}

given

F_{[1, n]}^{Y}

P (X_{n} \in A | F_{[1, n]}^{Y}) = \int_{A} π_{n} (d x), A \in B (R), P - a . s .

and assume that

Y_{0} \equiv 0

, so that the a priori knowledge about

X_{0}

is given only through its distribution

ν

. The measure valued random process

π = (π_{n})_{n \geq 0}

satisfies the well known recursive Bayes formula (the nonlinear filtering equation):

\begin{matrix} π_{n} (d x) = \frac{\int_{R} λ (u, d x) γ (u, Y_{n}) π_{n - 1} (d u)}{\int_{R} γ (v, Y_{n}) π_{n - 1} (d v)}, π_{0} (d x) = ν (d x), \end{matrix}

(1.2)

where

λ (u, d x)

is the transition kernel of

X

and

γ (x, y)

is the probability density of the conditional distribution 1.1 with respect to some

σ

-finite measure

M (d y)

(assumed to exist)

P (Y_{n} \in A | F_{n - 1}^{X}) = \int_{A} γ (X_{n - 1}, y) M (d y) .
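For intuition, the recursion 1.2 can be sketched on a finite state space, where the integrals become sums. This is only an illustration under assumptions not made in the text: the state space is taken to be {0, ..., d-1}, and the names `filter_step`, `trans` and `lik` are hypothetical.

```python
def filter_step(pi_prev, y, trans, lik):
    """One step of recursion (1.2) on a finite state space {0, ..., d-1}.

    pi_prev : list of floats, the previous filter pi_{n-1}
    y       : the new observation Y_n
    trans   : trans[u][x] plays the role of lambda(u, dx)
    lik     : lik(u, y) plays the role of gamma(u, y)
    """
    d = len(pi_prev)
    # gamma(u, Y_n) * pi_{n-1}(u): note that Y_n depends on X_{n-1}
    weights = [lik(u, y) * pi_prev[u] for u in range(d)]
    denom = sum(weights)  # denominator of (1.2)
    if denom == 0.0:
        raise ZeroDivisionError("the denominator of (1.2) vanished")
    # propagate the reweighted measure through the transition kernel
    return [sum(trans[u][x] * weights[u] for u in range(d)) / denom
            for x in range(d)]


# Illustrative two-state example (all numbers are made up).
trans = [[0.9, 0.1], [0.2, 0.8]]
lik = lambda u, y: 0.8 if y == u else 0.2
pi_1 = filter_step([0.5, 0.5], 1, trans, lik)  # approximately [0.34, 0.66]
```

Starting the same function from a different initial vector gives the finite-state analogue of the "wrong" filter discussed next.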

Let

\bar{ν}

be a probability measure on

R

. Assuming that

ν ≪ \bar{ν}

, the recursion 1.2 remains

P

-a.s. well defined if started from

π_{0} : = \bar{ν}

, in which case its solution is denoted by

{\bar{π}}_{n}

and is referred to as the "wrong" filter, to emphasize its deviation from the exact conditional distribution

π_{n} (d x)

. The filter is stable if for any bounded Borel

f

\begin{matrix} E | π_{n} (f) - {\bar{π}}_{n} (f) | : = E \left| \int_{R} f (x) π_{n} (d x) - \int_{R} f (x) {\bar{π}}_{n} (d x) \right| \xrightarrow[n \to \infty]{} 0 . \end{matrix}

(1.3)

Verifying 1.3 in terms of the properties of

(X, Y)

is a challenging problem. Most of the known positive results are obtained by treating 1.2 as a random dynamical system, which typically requires rather strong assumptions and leads to the stronger exponential convergence of the total variation norm:

\begin{matrix} \limsup_{n \to \infty} \frac{1}{n} \log ∥ π_{n} - {\bar{π}}_{n} ∥_{T V} < 0, P - a . s . \end{matrix}

(1.4)

The case of ergodic signals with compact state space is relatively well understood (see e.g. [2] , [6] ), while nonergodic or noncompact cases remain mysterious (some results can be found in [1] , [5] , [10] ). Moreover, it is well known that both 1.3 and 1.4 may fail, even when the signal

X

is ergodic and its state space is finite (see the discussion and references in [3] ).

For a bounded Borel function

f

let

η_{n} (f) = E (f (Y_{n}) | F_{[1, n - 1]}^{Y}),

which is the optimal predicting estimate of

f (Y_{n})

, given the past

F_{[1, n - 1]}^{Y}

. It is not hard to see that

\begin{matrix} η_{n} (f) = \int_{R} f (y) \int_{R} γ (u, y) π_{n - 1} (d u) M (d y) . \end{matrix}

(1.5)
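One way to verify 1.5 is to condition first on the larger σ-algebra and then use 1.1 together with the definition of the density γ; the following is a sketch of that computation, with Fubini's theorem justified by boundedness of f:

```latex
\begin{aligned}
\eta_{n}(f)
 &= E\Big( E\big( f(Y_{n}) \mid F_{[0,n-1]}^{X} \lor F_{[1,n-1]}^{Y} \big) \,\Big|\, F_{[1,n-1]}^{Y} \Big) \\
 &= E\Big( \int_{R} f(y)\, \gamma(X_{n-1}, y)\, M(dy) \,\Big|\, F_{[1,n-1]}^{Y} \Big) \\
 &= \int_{R} f(y) \int_{R} \gamma(u, y)\, \pi_{n-1}(du)\, M(dy).
\end{aligned}
```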

Analogously

\begin{matrix} {\bar{η}}_{n} (f) = \int_{R} f (y) \int_{R} γ (u, y) {\bar{π}}_{n - 1} (d u) M (d y) \end{matrix}

(1.6)

is defined. In this note it is shown that

\begin{matrix} {lim}_{n \to \infty} E | η_{n} (f) - {\bar{η}}_{n} (f) | = 0, \end{matrix}

(1.7)

under very general conditions, in particular even when 1.3 fails. A similar phenomenon is reported in [7] , where the continuous time setting is addressed. This is briefly discussed in Section 3 , following the proof of 1.7 in the next section.

¹ The two-dimensional state space is chosen for the sake of clarity: the extension to a more abstract setting is immediate.

2 The main result

Hereafter we identify

(Ω, F)

with the measurable space

(R^{\infty} \times R^{\infty}, B (R^{\infty} \times R^{\infty}))

and denote by

P

and

\bar{P}

the probability measures, under which the canonical process

(X, Y)

has the given transition law and

X_{0}

has distribution

ν

and

\bar{ν}

respectively.

E

and

\bar{E}

denote the expectations with respect to

P

and

\bar{P}

. As before

π = (π_{n})_{n \geq 0}

and

\bar{π} = ({\bar{π}}_{n})_{n \geq 0}

stand for the solutions of 1.2 , started from

ν

and

\bar{ν}

respectively.

Being the regular conditional distribution under

\bar{P}

, the process

\bar{π}

is well defined as the solution of 1.2 under

\bar{P}

. It is not immediately clear whether iterating 1.2 subject to

{\bar{π}}_{0} = \bar{ν}

makes sense under

P

. If

ν ≪ \bar{ν}

is assumed, then due to the Markov structure of

(X, Y)

, we have

P ≪ \bar{P}

and

\begin{matrix} \frac{d P}{d \bar{P}} (x, y) = \frac{d ν}{d \bar{ν}} (x_{0}), \bar{P} - a . s . \end{matrix}

(2.1)

Since

\begin{matrix} \bar{P} (\int_{R} γ (v, Y_{n}) {\bar{π}}_{n - 1} (d v) = 0) = \bar{E} \bar{P} (\int_{R} γ (v, Y_{n}) {\bar{π}}_{n - 1} (d v) = 0 | F_{[1, n - 1]}^{Y} \lor F_{n - 1}^{X}) = \bar{E} \int_{R} \mathbf{1}_{\{\int_{R} γ (v, y) {\bar{π}}_{n - 1} (d v) = 0\}} \int_{R} γ (u, y) {\bar{π}}_{n - 1} (d u) M (d y) = 0 \end{matrix}

(2.2)

we have

P (\int_{R} γ (v, Y_{n}) {\bar{π}}_{n - 1} (d v) = 0) = 0 .

The latter means that the denominator of 1.2 , solved subject to

{\bar{π}}_{0} = \bar{ν}

, does not vanish for any

n \geq 1

P

-a.s., which allows us to define

{\bar{π}}_{n}

under

P

. Let

P_{n}^{Y}

and

{\bar{P}}_{n}^{Y}

be the restrictions of

P

and

\bar{P}

to

F_{[1, n]}^{Y}

, then

P_{n}^{Y} ≪ {\bar{P}}_{n}^{Y}

and

\frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}} (Y) = \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) .
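The density formula can be checked directly from 2.1 and the definition of conditional expectation; a short sketch, for an arbitrary event C in F_{[1, n]}^{Y} (the letter C is used here only for this check):

```latex
P_{n}^{Y}(C) = P(C)
 = \bar{E}\, \mathbf{1}_{C}\, \frac{d\nu}{d\bar{\nu}}(X_{0})
 = \bar{E}\, \mathbf{1}_{C}\, \bar{E}\Big( \frac{d\nu}{d\bar{\nu}}(X_{0}) \,\Big|\, F_{[1,n]}^{Y} \Big),
 \qquad C \in F_{[1,n]}^{Y}.
```

Since the last conditional expectation is F_{[1, n]}^{Y}-measurable, it is a version of the Radon-Nikodym derivative of P_{n}^{Y} with respect to {\bar{P}}_{n}^{Y}.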

With

A = \{\int_{R} γ (v, Y_{n}) π_{n - 1} (d v) = 0\}, \quad B = \{\frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}} (Y) > 0\},

the Lebesgue decomposition implies

P_{n}^{Y} (A) = \bar{E} \mathbf{1}_{A} \frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}} + P (A \cap B^{c}) \geq \bar{E} \mathbf{1}_{A \cap B} \frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}} .

Similarly to 2.2 we have

P_{n}^{Y} (A) = 0

and thus since

\frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}}

is positive on

B

,

\bar{E} \mathbf{1}_{A \cap B} \frac{d P_{n}^{Y}}{d {\bar{P}}_{n}^{Y}} = 0 ⟹ \bar{P} (A \cap B) = 0 .

The latter means that on the set

\begin{matrix} \{\bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) > 0\} \end{matrix}

(2.3)

the denominator of 1.2 , started from

π_{0} = ν

, doesn't vanish

\bar{P}

-a.s. So under

\bar{P}

we define

π_{n}

to be the solution of 1.2

on the set 2.3 and zero elsewhere.

Theorem 2.1. Assume

ν ≪ \bar{ν}

, then 1.7 holds for any bounded Borel

f

.

Proof. Recall the definitions 1.5 and 1.6 of

η_{n} (f)

and

{\bar{η}}_{n} (f)

. For

A \in F_{[1, n - 1]}^{Y}

\begin{matrix} \bar{E} \mathbf{1}_{A} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) η_{n} (f) = \bar{E} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) \mathbf{1}_{A} η_{n} (f) | F_{[1, n - 1]}^{Y}) = \end{matrix}

\begin{matrix} \bar{E} \frac{d ν}{d \bar{ν}} (X_{0}) \mathbf{1}_{A} η_{n} (f) = E \mathbf{1}_{A} η_{n} (f) = E \mathbf{1}_{A} E (f (Y_{n}) | F_{[1, n - 1]}^{Y}) = E E (\mathbf{1}_{A} f (Y_{n}) | F_{[1, n - 1]}^{Y}) = \end{matrix}

\begin{matrix} E \mathbf{1}_{A} f (Y_{n}) = \bar{E} \frac{d ν}{d \bar{ν}} (X_{0}) \mathbf{1}_{A} f (Y_{n}) = \bar{E} \mathbf{1}_{A} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) f (Y_{n}) | F_{[1, n - 1]}^{Y}) . \end{matrix}

By arbitrariness of

A

, the latter implies

\bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) η_{n} (f) = \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) f (Y_{n}) | F_{[1, n - 1]}^{Y}), \bar{P} - a . s .

Hence (recall the definition of

π_{n}

and the consequent meaning of

η_{n} (f)

under

\bar{P}

)

\begin{matrix} E | η_{n} (f) - {\bar{η}}_{n} (f) | = \bar{E} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) | η_{n} (f) - {\bar{η}}_{n} (f) | = \end{matrix}

\begin{matrix} \bar{E} | \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) η_{n} (f) - \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) \bar{E} (f (Y_{n}) | F_{[1, n - 1]}^{Y}) | = \end{matrix}

\begin{matrix} \bar{E} | \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) f (Y_{n}) | F_{[1, n - 1]}^{Y}) - \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) \bar{E} (f (Y_{n}) | F_{[1, n - 1]}^{Y}) | = \end{matrix}

\begin{matrix} \bar{E} | \bar{E} (\bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) f (Y_{n}) | F_{[1, n - 1]}^{Y}) - \bar{E} (\bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) f (Y_{n}) | F_{[1, n - 1]}^{Y}) | \leq \end{matrix}

\begin{matrix} ∥ f ∥_{\infty} \bar{E} | \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) - \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) | . \end{matrix}

(2.4)


The sequence

Z_{n} = \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y})

is a uniformly integrable martingale and so converges

\bar{P}

-a.s., say, to

Z_{\infty}

. Since

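Z_{n} \geq 0

,

\bar{E} Z_{n} \equiv 1 = \bar{E} Z_{\infty}

and

Z_{n} \to Z_{\infty}

\bar{P}

-a.s., Scheffé's theorem gives

\bar{E} | Z_{n} - Z_{\infty} | \to 0

. Consequently

\bar{E} | Z_{n} - Z_{n - 1} | \leq \bar{E} | Z_{n} - Z_{\infty} | + \bar{E} | Z_{n - 1} - Z_{\infty} | \to 0

, so the right-hand side of 2.4 vanishes as n tends to infinity, which implies 1.7 . □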
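The statement of Theorem 2.1 can be explored numerically on a toy model. The sketch below is only an illustration under assumptions not made in the text: finite state and observation alphabets, made-up transition and likelihood tables, and hypothetical function names. It tracks the prediction gap along a single simulated path, whereas 1.7 concerns its expectation.

```python
import random


def filter_step(pi_prev, y, trans, lik):
    """Finite-state version of (1.2); lik[u][y] stands in for gamma(u, y)."""
    d = len(pi_prev)
    w = [lik[u][y] * pi_prev[u] for u in range(d)]
    denom = sum(w)
    return [sum(trans[u][x] * w[u] for u in range(d)) / denom for x in range(d)]


def predict(pi_prev, f, lik):
    """eta_n(f) = sum_y f(y) sum_u gamma(u, y) pi_{n-1}(u), as in (1.5)."""
    d = len(pi_prev)
    return sum(f[y] * sum(lik[u][y] * pi_prev[u] for u in range(d))
               for y in range(len(f)))


def prediction_gaps(nu, nu_bar, trans, lik, f, n_steps, seed=0):
    """|eta_n(f) - bar eta_n(f)| for n = 1..n_steps along one path of (X, Y)."""
    rng = random.Random(seed)
    d = len(nu)
    x = rng.choices(range(d), weights=nu)[0]             # X_0 ~ nu
    pi, pi_bar, gaps = list(nu), list(nu_bar), []
    for _ in range(n_steps):
        # here pi = pi_{n-1} and pi_bar = bar pi_{n-1}
        gaps.append(abs(predict(pi, f, lik) - predict(pi_bar, f, lik)))
        y = rng.choices(range(len(f)), weights=lik[x])[0]  # Y_n given X_{n-1}
        pi = filter_step(pi, y, trans, lik)
        pi_bar = filter_step(pi_bar, y, trans, lik)
        x = rng.choices(range(d), weights=trans[x])[0]     # X_n given X_{n-1}
    return gaps


# Made-up example: nu is a point mass, nu_bar is uniform, so nu << nu_bar.
trans = [[0.9, 0.1], [0.2, 0.8]]
lik = [[0.8, 0.2], [0.3, 0.7]]
gaps = prediction_gaps([1.0, 0.0], [0.5, 0.5], trans, lik, f=[0.0, 1.0], n_steps=50)
print(gaps[0], gaps[-1])
```

All likelihoods in this example are strictly positive, so the denominators never vanish; with zero entries the absolute continuity discussion preceding the theorem becomes relevant.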

3 Additive observation noise

Suppose that

Y_{n} = h (X_{n - 1}) + ξ_{n}

, where

h

is a bounded Borel function and

ξ

is an i.i.d. sequence, independent of

X

and with

E | ξ_{1} | < \infty

.

Proposition 3.1. Assume

ν ≪ \bar{ν}

, then

\begin{matrix} {lim}_{n \to \infty} E | π_{n} (h) - {\bar{π}}_{n} (h) | = 0 . \end{matrix}

(3.1)

Remark 3.2. The stability 3.1 resembles the result, obtained in [7] in the continuous time setting, where information theory arguments were used. Let

X = (X_{t})_{t \geq 0}

be a continuous time Markov process and

Y_{t} = \int_{0}^{t} h (X_{s}) d s + W_{t}

with a Wiener process

W

, independent of

X

. Assume that

ν ≪ \bar{ν}

and

E \int_{0}^{T} h^{2} (X_{s}) d s < \infty, E \int_{0}^{T} h^{2} ({\bar{X}}_{s}) d s < \infty, \forall T > 0,

then (Theorem 3.1 in [7] )

E \int_{0}^{\infty} {(π_{t} (h) - {\bar{π}}_{t} (h))}^{2} d t \leq 2 D (ν ∥ \bar{ν}),

where

D (ν ∥ \bar{ν}) = \int_{R} log \frac{d ν}{d \bar{ν}} (x) ν (d x)

is the relative entropy between

ν

and

\bar{ν}

.

Remark 3.3. Of course 1.3 may still fail for

f \neq h

. This can be observed in Blackwell's counterexample [4] , which has already been brought up in several related contexts (see [9] , [8] , [3] ). Suppose

X

takes values in

\{1, 2, 3, 4\}

and

Y_{n} = \mathbf{1}_{\{X_{n - 1} = 1\}} + \mathbf{1}_{\{X_{n - 1} = 3\}}

.

Then it is not hard to see that the stationary distribution of the filtering process

{\bar{π}}_{n}

has equiprobable atoms at eight vectors with entries depending explicitly on

\bar{ν}

(see the details in [3] ). In other words, the stationary distribution of the process

\bar{π}

is determined by

\bar{ν}

and in particular one may find a constant

C > 0

, so that

∥ π_{n} - {\bar{π}}_{n} ∥ \geq C

for all

n \geq 1

. However

| η_{n} (f) - {\bar{η}}_{n} (f) | \equiv 0

for all

n \geq 1

, since

Y = (Y_{n})_{n \geq 1}

are independent.

Proof of Proposition 3.1. For unbounded

f

, instead of 2.4 we have

\begin{matrix} E | η_{n} (f) - {\bar{η}}_{n} (f) | \leq \bar{E} | f (Y_{n}) | | \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) - \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) | \leq \end{matrix}

\begin{matrix} C \bar{E} | Z_{n} - Z_{n - 1} | + \bar{E} | f (Y_{n}) | \mathbf{1}_{\{| f (Y_{n}) | \geq C\}} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n]}^{Y}) + \end{matrix}

\begin{matrix} \bar{E} | f (Y_{n}) | \mathbf{1}_{\{| f (Y_{n}) | \geq C\}} \bar{E} (\frac{d ν}{d \bar{ν}} (X_{0}) | F_{[1, n - 1]}^{Y}) = \end{matrix}

\begin{matrix} C \bar{E} | Z_{n} - Z_{n - 1} | + E | f (Y_{n}) | \mathbf{1}_{\{| f (Y_{n}) | \geq C\}} + \bar{E} | f (Y_{n}) | \mathbf{1}_{\{| f (Y_{n}) | \geq C\}} Z_{n - 1} . \end{matrix}

(3.2)

For particular

f (y) = y

, we have

E | Y_{n} | \mathbf{1}_{\{| Y_{n} | \geq C\}} = E | h (X_{n - 1}) + ξ_{n} | \mathbf{1}_{\{| h (X_{n - 1}) + ξ_{n} | \geq C\}} \leq E | h (X_{n - 1}) + ξ_{n} | \mathbf{1}_{\{| h (X_{n - 1}) | \geq C / 2\}} + E | h (X_{n - 1}) + ξ_{n} | \mathbf{1}_{\{| ξ_{n} | \geq C / 2\}}

and hence

\sup_{n \geq 1} E | Y_{n} | \mathbf{1}_{\{| Y_{n} | \geq C\}} \xrightarrow[C \to \infty]{} 0,

since

h

is bounded. Similarly, using independence of

X

and

ξ

\begin{matrix} \bar{E} | Y_{n} | \mathbf{1}_{\{| Y_{n} | \geq C\}} Z_{n - 1} = \bar{E} | h (X_{n - 1}) + ξ_{n} | \mathbf{1}_{\{| h (X_{n - 1}) + ξ_{n} | \geq C\}} Z_{n - 1} \leq \end{matrix}

\begin{matrix} \bar{E} | h (X_{n - 1}) | \mathbf{1}_{\{| h (X_{n - 1}) | \geq C / 2\}} Z_{n - 1} + \bar{E} | ξ_{n} | \mathbf{1}_{\{| h (X_{n - 1}) | \geq C / 2\}} Z_{n - 1} + \end{matrix}

\begin{matrix} \bar{E} | h (X_{n - 1}) | \mathbf{1}_{\{| ξ_{n} | \geq C / 2\}} Z_{n - 1} + \bar{E} | ξ_{n} | \mathbf{1}_{\{| ξ_{n} | \geq C / 2\}} Z_{n - 1} \leq \end{matrix}

\begin{matrix} (∥ h ∥_{\infty} + \bar{E} | ξ_{1} |) \bar{E} \mathbf{1}_{\{| h (X_{n - 1}) | \geq C / 2\}} Z_{n - 1} + ∥ h ∥_{\infty} P (| ξ_{1} | \geq C / 2) + \bar{E} | ξ_{1} | \mathbf{1}_{\{| ξ_{1} | \geq C / 2\}}, \end{matrix}

and thus by dominated convergence

\limsup_{n \to \infty} \bar{E} | Y_{n} | \mathbf{1}_{\{| Y_{n} | \geq C\}} Z_{n - 1} \xrightarrow[C \to \infty]{} 0 .

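The claim now follows from 3.2 , since for

f (y) = y

we have

η_{n} (f) = π_{n - 1} (h) + E ξ_{1}

and

{\bar{η}}_{n} (f) = {\bar{π}}_{n - 1} (h) + E ξ_{1}

, so that the constant cancels in the difference. □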

References

[1] R. Atar, Exponential stability for nonlinear filtering of diffusion processes in a noncompact domain. Ann. Probab. 26 (1998), no. 4, 1552–1574.
[2] R. Atar, O. Zeitouni, Exponential stability for nonlinear filtering. Ann. Inst. H. Poincaré Probab. Statist. 33 (1997), no. 6, 697–725.
[3] P. Baxendale, P. Chigansky, R. Liptser, Asymptotic stability of the Wonham filter: ergodic and nonergodic signals. SIAM J. Control Optim. 43 (2004), no. 2, 643–669.
[4] D. Blackwell, The entropy of functions of finite-state Markov chains. Transactions of the First Prague Conference on Information Theory, Prague, 1956, pp. 13–20. Publishing House of the Czechoslovak Academy of Sciences, Prague, 1957.
[5] A. Budhiraja, D. Ocone, Exponential stability in discrete-time filtering for non-ergodic signals. Stochastic Process. Appl. 82 (1999), no. 2, 245–257.
[6] P. Chigansky, R. Liptser, Stability of nonlinear filters in nonmixing case. Ann. Appl. Probab. 14 (2004), no. 4, 2038–2056.
[7] J. M. C. Clark, D. Ocone, C. Coumarbatch, Relative entropy and error bounds for filtering of Markov processes. Math. Control Signals Systems 12 (1999), no. 4, 346–360.
[8] B. Delyon, O. Zeitouni, Lyapunov exponents for filtering problem, in Applied Stochastic Analysis, M. H. A. Davis and R. J. Elliott, eds., Gordon & Breach, New York, 1991, pp. 511–521.
[9] T. Kaijser, A limit theorem for partially observed Markov chains. Ann. Probab. 3 (1975), 677–696.
[10] F. LeGland, N. Oudjane, A robustification approach to stability and to uniform particle approximation of nonlinear filters: the example of pseudo-mixing signals. Stochastic Process. Appl. 106 (2003), no. 2, 279–316.

Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
E-mail address: pavel.chigansky@weizmann.ac.il

Department of Electrical Engineering Systems, Tel Aviv University, 69978 Tel Aviv, Israel
E-mail address: liptser@eng.tau.ac.il

P. Chigansky

R. Liptser