Research supported by a grant from the Israel Science Foundation .
<ph f="cmbx">What is always stable in nonlinear filtering?</ph>

P. Chigansky

R. Liptser

Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel E-mail address : pavel.chigansky@weizmann.ac.il Department of Electrical Engineering Systems, Tel Aviv University, 69978 Tel Aviv, Israel E-mail address : liptser@eng.tau.ac.il

1 Introduction

Consider a discrete time Markov process ( X , Y ) = ( X n , Y n ) n 0   with values1 in R × R   , where the signal component X   is a Markov process itself and the observation component Y   satisfies
P ( Y n A | F [ 0 , n 1 ] X F [ 1 , n 1 ] Y ) = P ( Y n A | F n 1 X ) , P a . s . , (1.1)
for any real Borel set A B ( R )   . We assume that all the random variables are defined on a complete probability space ( Ω , F , P )   and use the generic notation F [ , m ] Z = σ { Z , . . . , Z m }   for < m   and F m Z = F [ m , m ] Z   for brevity. Let π n ( d x )   be the regular conditional distribution of X n   given F [ 1 , n ] Y   :
P ( X n A | F [ 1 , n ] Y ) = A π n ( d x ) , A B ( R ) , P a . s .   and assume that Y 0 0   , so that the a priori knowledge about X 0   is given only through its distribution ν   . The measure valued random process π = ( π n ) n 0   satisfies the well known recursive Bayes formula (the nonlinear filtering equation):
π n ( d x ) = R λ ( u , d x ) γ ( u , Y n ) π n 1 ( d u ) R γ ( v , Y n ) π n 1 ( d v ) , π 0 ( d x ) = ν ( d x ) , (1.2)
where λ ( u , d x )   is the transition kernel of X   and γ ( x , y )   is the probability density of the conditional distribution  1.1 with respect to some σ   -finite measure M ( d x )   (assumed to exist) P ( Y n A | F n 1 X ) = A γ ( X n 1 , y ) M ( d y ) .   Let ν ¯   be a probability measure on R   . Assuming that ν ν ¯   , the recursion  1.2 remains P   -a.s. well defined if started from π 0 : = ν ¯   , in which case its solution is denoted by π ¯ n   and is referred as the ”wrong” filtering to emphasize its deviation from the exact conditional distribution π n ( d x )   . The filter is stable if for any bounded Borel f  
E | π n ( f ) π ¯ n ( f ) | : = E | R f ( x ) π n ( d x ) R f ( x ) π ¯ n ( d x ) | n 0 . (1.3)
Verifying  1.3 in terms of the properties of ( X , Y )   is a challenging problem. Most of the known positive results are obtained by treating  1.2 as a random dynamical system, which typically requires rather strong assumptions and leads to the stronger exponential convergence of the total variation norm:
l i m n 1 n log π n π ¯ n T V < 0 , P a . s . (1.4)
The case of ergodic signals with compact state space is relatively well understood (see e.g. [2, [6), while nonergodic or noncompact cases remain mysterious (some results can be found in [1, [5, [10). Moreover it is well known that both  1.3 or  1.4 may fail, even when the signal X   is ergodic and its state space is finite (see the discussion and references in [3).
For a bounded Borel function f   let η n ( f ) = E ( f ( Y n ) | F [ 1 , n 1 ] Y ) ,   which is the optimal predicting estimate of f ( Y n )   , given the past F [ 1 , n 1 ] Y   . It is not hard to see that
η n ( f ) = R f ( y ) R γ ( u , y ) π n 1 ( d u ) M ( d y ) . (1.5)
Analogously
η ¯ n ( f ) = R f ( y ) R γ ( u , y ) π ¯ n 1 ( d u ) M ( d y ) (1.6)
is defined. In this note it is shown that
lim n E | η n ( f ) η ¯ n ( f ) | = 0 , (1.7)
under very general conditions, e.g. even when  1.3 fails! A similar phenomenon is reported in [7, where the continuous time setting is addressed. This is briefly discussed in Section  3 , following the proof of  1.7 in the next section.

1 the tow dimensional state space is chosen for the sake of clarity: the extension to more abstract setting is immediate

2 The main result

Hereafter we identify ( Ω , F )   with the measurable space ( R × R , B ( R × R ) )   and denote by P   and P ¯   the probability measures, under which the canonical process ( X , Y )   has the given transition law and X 0   has distribution ν   or ν ¯   respectively. E   and E ¯   denote the expectations with respect to P   and P ¯   . As before π = ( π n ) n 0   and π ¯ = ( π ¯ n ) n 0   stand for the solutions of  1.2 , started from ν   and ν ¯   respectively.
Being the regular conditional distribution under P ¯   , the process π ¯   is well defined as the solution of  1.2 under P ¯   . It is not immediately clear whether iterating  1.2 subject to π ¯ 0 = ν ¯   makes sense under P   . If ν ν ¯   is assumed, then due to the Markov structure of ( X , Y )   , P P ¯   and
d P d P ¯ ( x , y ) = d ν d ν ¯ ( x 0 ) , P ¯ a . s . (2.1)
Since
P ¯ ( R γ ( v , Y n ) π ¯ n 1 ( d v ) = 0 ) = E ¯ P ¯ ( R γ ( v , Y n ) π ¯ n 1 ( d v ) = 0 | F [ 1 , n 1 ] Y F n 1 X ) = E ¯ R 1 { R γ ( v , y ) π ¯ n 1 ( d v ) = 0 } R γ ( u , y ) π ¯ n 1 ( d u ) M ( d y ) = 0 (2.2)
we have P ( R γ ( v , Y n ) π ¯ n 1 ( d v ) = 0 ) = 0 .   The latter means that the denominator of  1.2 , solved subject to π ¯ 0 = ν ¯   , does not vanish for any n 1   P   -a.s., which allows to define π ¯ n   under P   .
Let P n Y   and P ¯ n Y   be the restrictions of P   and P ¯   to F [ 1 , n ] Y   , then P n Y P ¯ n Y   and d P n Y d P ¯ n Y ( Y ) = E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y ) .   With A = { R γ ( v , Y n ) π n 1 ( d v ) = 0 } , B = { d P n Y d P ¯ n Y ( Y ) > 0 } ,   the Lebesgue decomposition implies P n Y ( A ) = E ¯ 1 A d P n Y d P ¯ n Y + P ( A B c ) E ¯ 1 A B d P n Y d P ¯ n Y .   Similarly to  2.2 we have P n Y ( A ) = 0   and thus since d P n Y d P ¯ n Y   is positive on B   E ¯ 1 A B d P n Y d P ¯ n Y = 0 P ¯ ( A B ) = 0 .   The latter means that on the set
{ E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y ) > 0 } (2.3)
the denominator of  1.2 , started from π 0 = ν   , doesn't vanish P ¯   -a.s. So under P ¯   we define π n   to be the solution of  1.2    on the set  2.3 and zero elsewhere.
Theorem 2.1. Assume ν ν ¯   , then  1.7 holds for any bounded Borel f   .
  • Proof. Recall the definitions  1.5 and  1.6 of η n ( f )   and η ¯ n ( f )   . For A F [ 1 , n 1 ] Y  
    E ¯ 1 A E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) η n ( f ) = E ¯ E ¯ ( d ν d ν ¯ ( X 0 ) 1 A η n ( f ) | F [ 1 , n 1 ] Y ) =
    E ¯ d ν d ν ¯ ( X 0 ) 1 A η n ( f ) = E 1 A η n ( f ) = E 1 A E ( f ( Y n ) | F [ 1 , n 1 ] Y ) = E E ( 1 A f ( Y n ) | F [ 1 , n 1 ] Y ) =
    E 1 A f ( Y n ) = E d ν d ν ¯ ( X 0 ) 1 A f ( Y n ) = E ¯ 1 A E ¯ ( d ν d ν ¯ ( X 0 ) f ( Y n ) | F [ 1 , n 1 ] Y )
    By arbitrariness of A   , the latter implies E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) η n ( f ) = E ¯ ( d ν d ν ¯ ( X 0 ) f ( Y n ) | F [ 1 , n 1 ] Y ) , P ¯ a . s .   Hence (recall the definition of π n   and the consequent meaning of η n ( f )   under P ¯   )
    E | η n ( f ) η ¯ n ( f ) | = E ¯ E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) | η n ( f ) η ¯ n ( f ) | =
    E ¯ | E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) η n ( f ) E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) E ¯ ( f ( Y n ) | F [ 1 , n 1 ] Y ) | =
    E ¯ | E ¯ ( d ν d ν ¯ ( X 0 ) f ( Y n ) | F [ 1 , n 1 ] Y ) E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) E ¯ ( f ( Y n ) | F [ 1 , n 1 ] Y ) | =
    E ¯ | E ¯ ( E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y ) f ( Y n ) | F [ 1 , n 1 ] Y ) E ¯ ( E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) f ( Y n ) | F [ 1 , n 1 ] Y ) |
    f E ¯ | E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y ) E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) | . (2.4)
    The sequence Z n = E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y )   is a uniformly integrable martingale and so converges P ¯   -a.s., say, to Z   . Since Z n 0   , E ¯ Z n 1 = E ¯ Z   and Z n Z   , P ¯   -a.s. by Scheffe theorem E ¯ | Z n Z | 0   , which implies  1.7 .

3 Additive observation noise

Suppose that Y n = h ( X n 1 ) + ξ n   , where h   is a bounded Borel function and ξ   is an i.i.d. sequence, independent of X   and with E | ξ 1 | <   .
Proposition 3.1. Assume ν ν ¯   , then
lim n E | π n ( h ) π ¯ n ( h ) | = 0 . (3.1)
Remark 3.2. The stability  3.1 resembles the result, obtained in [7in the continuous time setting, where information theory arguments were used. Let X = ( X t ) t 0   be a continuous time Markov process and Y t = 0 t h ( X s ) d s + W t   with a Wiener process W   , independent of X   . Assume that ν ν ¯   and E 0 T h 2 ( X s ) d s < , E 0 T h 2 ( X ¯ s ) d s < , T > 0 ,   then (Theorem 3.1 in [7) E 0 ( π t ( h ) π ¯ t ( h ) ) 2 d t 2 D ( ν ν ¯ ) ,   where D ( ν ν ¯ ) = R log d ν d ν ¯ ( x ) ν ( d x )   is the relative entropy between ν   and ν ¯   .
Remark 3.3. Of course  1.3 may still fail for f h   . This can be observed in the Blackwell's counterexample [4, which was already brought up in several related contexts (see [9, [8, [3). Suppose X   takes values in { 1 , 2 , 3 , 4 }   and Y n = 1 { X n 1 = 1 } + 1 { X n 1 = 3 }   .
Then it is not hard to see that the stationary distribution of the filtering process π ¯ n   has equiprobable atoms at eight vectors with entries depending explicitly on ν ¯   (see the details in [3). In other words, the stationary distribution of the process π ¯   is determined by ν ¯   and in particular one may find a constant C > 0   , so that π n π ¯ n C   for all n 1   .
However η n ( f ) η ¯ n ( f ) 0   , n 1   , since Y = ( Y n ) n 1   are independent.
  • Proof. For unbounded f   , instead of  2.4 we have
    E | η n ( f ) η ¯ n ( f ) | E ¯ | f ( Y n ) | | E ¯ ( d ν d ν ¯ ( X ¯ 0 ) | F [ 1 , n ] Y ) E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) |
    C E ¯ | Z n Z n 1 | + E ¯ | f ( Y n ) | 1 { | f ( Y n ) | C } E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n ] Y ) + (3.2)
    E ¯ | f ( Y n ) | 1 { | f ( Y n ) | C } E ¯ ( d ν d ν ¯ ( X 0 ) | F [ 1 , n 1 ] Y ) =
    C E ¯ | Z n Z n 1 | + E | f ( Y n ) | 1 { | f ( Y n ) | C } + E ¯ | f ( Y n ) | 1 { | f ( Y n ) | C } Z n 1 .
    For particular f ( y ) = y   , we have
    E | Y n | 1 { | Y n | C } = E | h ( X n 1 ) + ξ n | 1 { | h ( X n 1 ) + ξ n | C } E | h ( X n 1 ) + ξ n | 1 { | h ( X n 1 ) | C / 2 } + E | h ( X n 1 ) + ξ n | 1 { | ξ n | C / 2 }  
    and hence sup n 1 E | Y n | 1 { | Y n | C } n 0 ,   since h   is bounded. Similarly, using independence of X   and ξ  
    E ¯ | Y n | 1 { | Y n | C } Z n 1 = E ¯ | h ( X n 1 ) + ξ n | 1 { | h ( X n 1 ) + ξ n | C } Z n 1
    E ¯ | h ( X n 1 ) | 1 { | h ( X n 1 ) | C / 2 } Z n 1 + E ¯ | ξ n | 1 { | h ( X n 1 ) | C / 2 } Z n 1 +
    E ¯ | h ( X n 1 ) | 1 { | ξ n | C / 2 } Z n 1 + E ¯ | ξ n | 1 { | ξ n | C / 2 } Z n 1
    ( h + E ¯ | ξ 1 | ) E ¯ 1 { | h ( X n 1 ) | C / 2 } Z n 1 + h P ( | ξ 1 | C / 2 ) + E ¯ | ξ 1 | 1 { | ξ 1 | C / 2 } ,
    and thus by dominated convergence l i m n E ¯ | Y n | 1 { | Y n | C } Z n 1 C 0 .   The claim now follows from  3.2 , since E ( Y n | F [ 1 , n 1 ] Y ) = π n 1 ( h )   .
References

  1. R. Atar, Exponential stability for nonlinear filtering of diffusion processes in a noncompact domain. Ann. Probab. 26 (1998), no. 4, 1552–1574
  2. R. Atar, O.Zeitouni, Exponential stability for nonlinear filtering. Ann. Inst. H. Poincare Probab. Statist. 33 (1997), no. 6, 697–725
  3. P. Baxendale, P. Chigansky, R. Liptser, Asymptotic stability of the Wonham filter: ergodic and nonergodic signals, SIAM J. Control Optim. 43 (2004), no. 2, 643–669
  4. D. Blackwell, The entropy of functions of finite-state Markov chains. 1957 Transactions of the first Prague conference on information theory, Prague, 1956 pp. 13–20 Publishing House of the Czechoslovak Academy of Sciences, Prague
  5. A. Budhiraja, D. Ocone, Exponential stability in discrete-time filtering for non-ergodic signals. Stochastic Process. Appl. 82 (1999), no. 2, 245–257
  6. P. Chigansky, R. Liptser, Stability of nonlinear filters in nonmixing case. Ann. Appl. Probab. 14 (2004), no. 4, 2038–2056
  7. J.M.C.Clark, D.Ocone, C.Coumarbatch, Relative entropy and error bounds for filtering of Markov processes, Math. Control Signals Systems 12 (1999), no. 4, 346–360
  8. B. Delyon, O. Zeitouni, Lyapunov exponents for filtering problem, in Applied Stochastic Analysis, Davis, M. H. A. and Elliot R. J. eds., Gordon & Breach, New York, 1991, pp. 511-521.
  9. T. Kaijser, A limit theorem for partially observed Markov chains, Ann. Probab. 3 (1975), pp. 677-696.
  10. F. LeGland, N. Oudjane, A robustification approach to stability and to uniform particle approximation of nonlinear filters: the example of pseudo-mixing signals. Stochastic Process. Appl. 106 (2003), no. 2, 279–316

Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel E-mail address : pavel.chigansky@weizmann.ac.il Department of Electrical Engineering Systems, Tel Aviv University, 69978 Tel Aviv, Israel E-mail address : liptser@eng.tau.ac.il