<ph f="cmbx">Quantitative concentration inequalities for empirical measures on non-compact spaces</ph>

François Bolley, Arnaud Guillin,

Cédric Villani

ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07 E-mail address : fbolley@umpa.ens-lyon.fr CEREMADE, Universite Paris Dauphine E-mail address : guillin@ceremade.dauphine.fr ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07 E-mail address : cvillani@umpa.ens-lyon.fr

Contents

Introduction

Large stochastic particle systems constitute a popular way to perform numerical simulations in many contexts, either because they are used in some physical model (as in e.g. stellar or granular media) or as an approximation of a continuous model (as in e.g. vortex simulation for Euler equation, see [21,Chapter5for instance). For such systems one may wish to establish concentration estimates showing that the behavior of the system is sharply stabilized as the number N   of particles goes to infinity. It is natural to search for these estimates in the setting of large (or moderate) deviations, since one wishes to make sure that the numerical method has a very small probability to give wrong results. From a physical perspective, concentration estimates may be useful to establish the validity of a continuous approximation such as a mean-field limit.
When one is interested in the asymptotic behavior of just one, or a few observables (such as the mean position...), there are efficient methods, based for instance on concentration of measure theory. As a good example, Malrieu [19recently applied tools from the fields of Logarithmic Sobolev inequalities, optimal transportation and concentration of measure, to prove very neat bounds like
sup φ L i p 1 P [ | 1 N i = 1 N φ ( X t i ) φ d μ t | > ɛ ] 2 e λ N ɛ 2 . (0.1)
Here ( X t i ) 1 i N   stand for the positions of particles (in phase space) at time t   , ɛ   is a given error, P   stands for the probability, μ t   is a probability measure governing the limit behavior of the system, and λ > 0   is a positive constant depending on the particular system he is considering (a simple instance of McKean-Vlasov model used in particular in the modelling of granular media). Moreover, φ L i p : = sup x y | f ( x ) f ( y ) | d ( x , y ) ,   where d   is the distance in phase space (say the Euclidean norm | |   in R d   ).
This approach can lead to nice bounds, but has the drawback to be limited to a finite number of observables. Of course, one may apply  0.1 to many functions φ   , and obtain something like
P [ k = 1 1 k 2 | 1 N i = 1 N φ k ( X t i ) φ k d μ t | > ɛ ] C e N λ ɛ 2 , (0.2)
where ( φ k ) k N   is an arbitrarily chosen dense family in the set of all 1   -Lipschitz functions converging to 0 at infinity. If we denote by δ x   the Dirac mass at point x   , and by μ ^ t N : = 1 N i = 1 N δ X t i   the empirical measure associated with the system (this is a random probability measure), then estimate  0.2 can be interpreted as a bound on how close μ ^ t N   is to μ t   . Indeed,
d ( μ , ν ) : = k = 1 1 k 2 | φ k d ( μ ν ) | (0.3)
defines a distance on probability measures, associated with a topology which is at least as strong as the weak convergence of measures (convergence against bounded continuous test functions). However, this point of view is deceiving: for practical purposes, the distance d   can hardly be estimated, and in any case  0.2 does not contain more information than  0.1 :
it is only useful if one considers a finite number of observables.
Sanov's large deviation principle [12,Theorem6.2.10provides a more satisfactory tool to estimate the distance between the empirical measure and its limit. Roughly speaking, it implies, for independent variables X t i   , an estimate of the form P [ d i s t ( μ ^ t N , μ ) ɛ ] e N α ( ɛ ) as N ,   where
α ( ɛ ) : = inf { H ( ν | μ ) ; d i s t ( ν , μ ) ɛ } (0.4)
and H   is the relative H   functional: H ( ν | μ ) = d ν d μ log d ν d μ d μ   (to be interpreted as +   if ν   is not absolutely continuous with respect to μ   ). Since H   behaves in many ways like a square distance, one can hope that α ( ɛ ) c o n s t . ɛ 2   . Here “ d i s t   ” may be any distance which is continuous with respect to the weak topology, a condition which might cause trouble on a non-compact phase space.
Yet Sanov's theorem is not the final answer either: it is actually asymptotic, and only implies a bound like limsup 1 N log P [ d i s t ( μ ^ t N , μ ) ɛ ] α ( ɛ ) ,   which, unlike  0.1 , does not contain any explicit estimate for a given N   . Fortunately, there are known techniques to obtain quantitative upper bounds for such theorems, see in particular [12,Exercise 4.5.5. Since these techniques are devised for compact phase spaces, a further truncation will be necessary to treat more general situations.
In this paper, we shall show how to combine these ideas with recent results about measure concentration and transportation distances, in order to derive in a systematic way estimates that are explicit, deal with the empirical measure as a whole, apply to non-compact phase spaces, and can be used to study some particle systems arising in practical problems.
Typical estimates will be of the form
P [ sup φ L i p 1 ( 1 N i = 1 N φ ( X t i ) φ d μ t ) > ɛ ] C e λ N ɛ 2 . (0.5)
As a price to pay, the constant C   in the right-hand side will be much larger than the one in  0.1 .
Here is a possible application of  0.5 in a numerical perspective. Suppose your system has a limit invariant measure μ = lim μ t   as t   , and you wish to numerically plot itsdensity f   . For that, you run your particle simulation for a long time t = T   , and plot, say,
f ~ t ( x ) : = 1 N i = 1 N ζ α ( x X t i ) , (0.6)
where ζ α = α d ζ ( x / α )   is a smooth approximation of a Dirac mass as α 0   (as usual, ζ   is a nonnegative smooth radial function on R d   with compact support and unit integral).
With the help of estimates such as  0.5 , it is often possible to compute bounds on, say, P [ f ~ T f L > ɛ ]   in terms of N   , ɛ   , T   and α   . In this way one can “guarantee” that all details of the invariant measure are captured by the stochastic system. While this problem is too general to be treated abstractly, we shall show on some concrete model examples how to derive such bounds for the same kind of systems that was considered by Malrieu. In the next section, we shall explain about our main tools and results; the rest of the paper will be devoted to the proofs. Some auxiliary estimates of general interest are postponed in Appendix.

1 Tools and main results

1.1 Wasserstein distances

To measure distances between probability measures, we shall use transportation distances, also called Wasserstein distances. They can be defined in an abstract Polish space X   as follows: given p   in [ 1 , + )   , d   a lower semi-continuous distance on X   , and μ   and ν   two Borel probability measures on X   , the Wasserstein distance of order p   between μ   and ν   is W p ( μ , ν ) : = inf π Π ( μ , ν ) ( d ( x , y ) p d π ( x , y ) ) 1 / p   where π   runs over the set Π ( μ , ν )   of all joint probability measures on the product space X × X   with marginals μ   and ν   ; it is easy to check [29,Theorem7.3that W p   is a distance on the set P p ( X )   of Borel probability measures μ   on X   such that d ( x 0 , x ) p d μ ( x ) < +   .
For this choice of distance, in view of Sanov's theorem, a very natural class of inequalities is the family of so-called transportation inequalities, or Talagrand inequalities (see [17for instance): by definition, given p 1   and λ > 0   , a probability measure μ   on X   satisfies T p ( λ )   if the inequality W p ( ν , μ ) 2 λ H ( ν | μ )   holds for any probability measure ν   . We shall say that μ   satisfies a T p   inequality if it satisfies T p ( λ )   for some λ > 0   . By Jensen's inequality, these inequalities become stronger as p   becomes larger; so the weakest of all is T 1   . Some variants introduced in [8will also be considered.
Of course T p   is not a very explicit condition, and a priori it is not clear how to check that a given probability measure satisfies it. It has been proven [7, 14, 8that T 1   is equivalent to the existence of a square-exponential moment: in other words, a reference measure μ   satisfies T 1   if and only if there is α > 0   such that e α d ( x , y ) 2 d μ ( x ) < +   for some (and thus any) y X   . If that condition is satisfied, then one can find explicitly some λ   such that T 1 ( λ )   holds true: see for instance [8.
This criterion makes T 1   a rather convenient inequality to use. Another popular inequality is T 2   , which appears naturally in many situations where a lot of structure is available, and which has good tensorization properties in many dimensions. Up to now, T 2   inequalities have not been so well characterized: it is known that they are implied by a Logarithmic Sobolev inequality [23, 6, 30, and that they imply a Poincaré, or spectral gap, inequality [23, 6.
See [11for an attempt to a criterion for T 2   . In any case, contrary to the case p = 1   , there is no hope to obtain T 2   inequalities from just integrability or decay estimates.
In this paper, we shall mainly focus on the case p = 1   , which is much more flexible.

1.2 Metric entropy

When X   is a compact space, the minimum number m ( X , r )   of balls of radius r   needed to cover X   is called the metric entropy of X   . This quantity plays an important role in quantitative variants of Sanov's Theorem [12,Exercise4.5.5. In the present paper, to fix ideas we shall always be working in the particular Euclidean space R d   , which of course is not compact; and we shall reduce to the compact case by truncating everything to balls of finite radius R   . This particular choice will influence the results through the function m ( P p ( B R ) , r )   , where B R   is the ball of radius R   centered at some point, say the origin, and P p ( B R )   is the space of probability measures on B R   , metrized by W p   .

1.3 Sanov-type theorems

The core of our estimates is based on variants of Sanov's Theorem, all dealing with independent random variables. Let μ   be a given probability measure on R d   , and let ( X i ) i = 1 , . . . , N   be a sample of independent variables, all distributed according to μ   ; let also μ ^ N : = 1 N i = 1 N δ X i   be the associated empirical measure. In our first main result we assume a T p   inequality for the measure μ   , and deduce from that an upper bound in W p   distance:
Theorem 1.1. Let p [ 1 , 2 ]   and let μ   be a probability measure on R d   satisfying a T p ( λ )   inequality. Then, for any d > d   and λ < λ   , there exists some constant N 0   , depending on λ , d   and some square-exponential moment of μ   , such that for any ɛ > 0   and N N 0 max ( ɛ ( d + 2 ) , 1 )   ,
P [ W p ( μ , μ ^ N ) > ɛ ] e γ p λ 2 N ɛ 2 , (1.1)
where γ p = { 1 if 1 p < 2 3 2 2 if p = 2 .  
Compared to Sanov's Theorem, this result is more restrictive in the sense that it requires some extra assumptions on the reference measure μ   , but under these hypotheses we are able to replace a result which was only asymptotic by a pointwise upper bound on the error probability, together with a lower bound on the required size of the sample.
In view of the Kantorovich-Rubinstein duality formula
W 1 ( μ , ν ) = sup { f d ( μ ν ) ; f L i p 1 } , (1.2)
Theorem  1.1 implies concentration inequalities such as P [ sup f ; f L i p 1 ( 1 N k = 1 N f ( X i ) f d μ ) > ɛ ] e λ 2 N ɛ 2   for λ < λ   , and N   sufficiently large, under the assumption that μ   satisfies a T 1   inequality, or equivalently admits a finite square-exponential moment. Those types of inequalities are of interest in non-parametric statistics and choice models [22.
Remark 1.2. The sole inequality T 1 ( λ )   implies that for all 1-Lipschitz function f   , P [ 1 N k = 1 N f ( X i ) f d μ > ɛ ] e λ 2 N ɛ 2 ,   and it is easy to see that the coefficient λ   in this inequality is the best possible.
While the quantity controlled in Theorem  1.1 is much stronger, the estimate is weakened only in that λ   is replaced by some λ > λ   (arbitrarily close to λ   ) and that N   has to be large enough. In fact, a variant of the proof below would yield estimates such as P [ W p ( μ , μ ^ N ) > ɛ ] C ( ɛ ) e γ λ 2 N ɛ 2 ,   where now there is no restriction on N   , but C ( ɛ )   is a larger constant, explicitly computable from the proof.
Remark 1.3. As pointed out to us by M. Ledoux, there is another way to concentration estimates on the empirical measure when d = p = 1   . Indeed, in this specific case, W 1 ( μ ^ N , μ ) = 1 N i = 1 N H ( X i ) F L 1 ( R )   where H = 1 [ 0 , + )   stands for the Heaviside function on R   and F   denotes the repartition function of μ   , so that P [ W 1 ( μ ^ N , μ ) ɛ ] = P [ 1 N i = 1 N F i L 1 > ɛ ]   where F i : = H ( X i ) F ( 1 i N )   are centered L 1 ( R )   -valued independent identically distributed random variables.
But, according to [1,Exercise3.8.14, a centered L 1 ( R )   -valued random variable Y   satisfies a Central Limit Theorem if and only if R ( E [ Y 2 ( t ) ] ) 1 / 2 d t < + ,   a condition which for the random variables F i   's can be written
R F ( t ) ( 1 F ( t ) ) d t < + . (1.3)
Condition  1.3 in turn holds true as soon as (for instance) R | x | 2 + δ d μ ( x )   is finite for some positive δ   . Then we may apply a quantitative version of the Central Limit Theorem for random varaiables in the Banach space L 1 ( R )   . See [16and [18for related works.
Remark 1.4. Theorem  1.1 applies if N   is at least as large as ɛ r   for some r > d + 2   ; we do not know whether d + 2   here is optimal.
For the applications that we shall treat, in which the tails of the probability distributions will be decaying very fast, Theorem  1.1 will be sufficient. However, it is worthwile pointing out that the technique works under much broader assumptions: weaker estimatescan be proven for probability measures that do not decay fast enough to admit finite square-exponential moments. Here below are some such results using only polynomial moment estimates:
Theorem 1.5. Let q 1   and let μ   be a probability measure on R d   such that R d | x | q d μ ( x ) < + .   Then (i) For any p [ 1 , q / 2 )   , δ ( 0 , q / p 2 )   and d > d   , there exists a constant N 0   such that P [ W p ( μ , μ ^ N ) > ɛ ] ɛ q N q 2 p + δ 2   for any ɛ > 0   and N N 0 max ( ɛ q 2 p + d q p , ɛ d d )   ; (ii) For any p [ q / 2 , q )   , δ ( 0 , q / p 1 )   and d > d   there exists a constant N 0   such that P [ W p ( μ , μ ^ N ) > ɛ ] ɛ q N 1 q p + δ   for any ɛ > 0   and N N 0 max ( ɛ q 2 p + d q p , ɛ d d )   .
Here are also some variants under alternative “regularity” assumptions:
Theorem 1.6.
  • (i) Let p 1   ; assume that α : = e α | x | d μ   is finite for some α > 0   . Then, for all d > d   , there exist some constants K   and N 0   , depending only on d   , α   and α   , such that P [ W p ( μ , μ ^ N ) > ɛ ] e K N 1 / p min ( ɛ , ɛ 2 )   for any ɛ > 0   and N N 0 max ( ɛ ( 2 p + d ) , 1 )   .
  • (ii) Suppose that μ   satifies T 1   and a Poincaré inequality, then for all a < 2   there exists some constants K   and N 0   such that
    P [ W 2 ( μ , μ ^ N ) > ɛ ] e K N min ( ɛ 2 , ɛ a ) (1.4)
    for any ɛ > 0   and N N 0 max ( ɛ ( 4 + d ) , 1 )   .
  • (iii) Let p > 2   and let μ   be a probability measure on R d   satisfying T p ( λ )   .
    Then for all λ < λ   and d > d   there exists some constant N 0   , depending on μ   only through λ   and some square-exponential moment, such that
    P [ W p ( μ , μ ^ N ) > ɛ ] min ( e λ 2 N ɛ 2 + e ( N ɛ d + 2 ) 2 / d , 2 e λ 4 N 2 / p ɛ 2 ) (1.5)
    for any ɛ > 0   and N N 0 max ( ɛ ( d + 2 ) , 1 )   .

1.4 Interacting systems of particles

We now consider a system of N   interacting particles whose time-evolution is governed by the system of coupled stochastic differential equations
d X t i = 2 d B t i V ( X t i ) d t 1 N j = 1 N W ( X t i X t j ) d t , i = 1 , . . . , N . (1.6)
Here X t i   is the position at time t   of particule number i   , the B i   's are N   independent Brownian motions, and V   and W   are smooth potentials, sufficiently nice that  1.6 can be solved globally in time. We shall always assume that W   (which can be interpreted as an interaction potential) is a symmetric function, that is W ( z ) = W ( z )   for all z R d   .
Equation  1.6 is a particularly simple instance of coupled system; in the case when V   is quadratic and W   has cubic growth, it was used as a simple mean-field kinetic model for granular media (see e.g. [19). While many of our results could be extended to more general systems, that particular one will be quite enough for our exposition.
To this system of particles is naturally associated the empirical measure, defined for each time t 0   by
μ ^ t N : = i = 1 N δ X t i . (1.7)
Under suitable assumptions on the potentials V   and W   , it is a classical result that, if the initial positions of the particle system are distributed chaotically (for instance, if they are identically distributed, independent random variables), then the empirical measure μ ^ t N   converges as N   to a solution of the nonlinear partial differential equation
μ t t = Δ μ t + ( μ t ( V + W * μ t ) ) , (1.8)
where   stands for the divergence operator. Equation  1.8 is a simple instance of McKean-Vlasov equation. This convergence result is part of the by now well-developed theory of propagation of chaos, and was studied by Sznitman for pedagogical reasons [27, in the case of potentials that grow at most quadratically at infinity. Later, Benachour, Roynette, Talay and Vallois [2, 3considered the case where the interaction potential grows faster than quadratically. As far as the limit equation  1.8 is concerned, a discussion of its use in the modelling of granular media in kinetic theory was performed by Benedetto, Caglioti, Carrillo and Pulvirenti [4, 5, while the asymptotic behavior in large time was studied by Carrillo, McCann and Villani [9, 10with the help of Wasserstein distances and entropy inequality methods. Then Malrieu [19presented a detailed study of both limits t   and N   by probabilistic methods, and established estimates of the type of  0.1 under adequate convexity assumptions on V   and W   (see also [29,Problem 15).
As announced before, we shall now give some estimates on the convergence at the level of the law itself. To fix ideas, we assume that V   and W   have locally bounded Hessian matrices satisfying
{ (i) D 2 V ( x ) β I , γ I D 2 W ( x ) γ I , x R d , (ii) | V ( x ) | = O ( e a | x | 2 ) for any a > 0 . (1.9)
Under these assumptions, we shall derive the following bounds.
Theorem 1.7. Let μ 0   be a probability measure on R d   , admitting a finite square-exponential moment: α 0 > 0 ; M α 0 : = e α 0 | x | 2 d μ 0 ( x ) < + .   Let ( X 0 i ) 1 i N   be N   independent random variables with common law μ 0   . Let ( X t i )   be the solution of  1.6 with initial value ( X 0 1 , X 0 N )   , where V   and W   are assumed to satisfy  1.9 ; and let μ t   be the solution of  1.8 with initial value μ 0   .
Let also μ ^ t N   be the empirical measure associated with the ( X t i ) 1 i N   . Then, for all T 0   , there exists some constant K = K ( T )   such that, for any d > d   , there exists some constants N 0   and C   such that for all ɛ > 0   N N 0 max ( ɛ ( d + 2 ) , 1 ) P [ sup 0 t T W 1 ( μ ^ t N , μ t ) > ɛ ] C ( 1 + T ɛ 2 ) exp ( K N ɛ 2 ) .  
Note that in the above theorem we have proven not only that for all t   , the empirical measure is close to the limit measure, but also that the probability of observing any significant deviation during a whole time period [ 0 , T ]   is small.
The fact that μ ^ t N   is very close to the deterministic measure μ t   implies the propagation of chaos: two particles drawn from the system behave independently of each other as N   (see Sznitman [27for more details). But we can also directly study correlations between particles and find more precise estimates: for that purpose it is convenient to consider the empirical measure on pairs of particles, defined as μ ^ t N , 2 : = 1 N ( N 1 ) i j δ ( X t i , X t j ) .   By a simple adaptation of the computations appearing in the proof of Theorem  1.7 , one can prove
Theorem 1.8. With the same notation and assumptions as in Theorem  1.7 , for all T 0   and d > d   , there exists some constants K > 0   and N 0   such that for all ɛ > 0   N N 0 max ( ɛ ( d + 2 ) , 1 ) P [ W 1 ( μ ^ t N , 2 , μ t μ t ) > ɛ ] exp ( K N ɛ 2 ) .  
(Here W 1   stands for the Wasserstein distance or order 1   on P 1 ( R d × R d )   .) Of course, one may similarly consider the problem of drawing k   particles with k 2   .
Theorems  1.7 and  1.8 use Theorem  1.1 as a crucial ingredient, which is why a strong integrability assumption is imposed on μ 0   . Note however that, under stronger assumptions on the behaviour at infinity of V   or W   , as the existence of some β R   , B , ɛ > 0   such as D 2 V ( x ) ( B | x | ɛ + β ) I , , x R d ,   it can be proven that any square exponential moment for μ t   becomes instantaneously finite for t > 0   . Note also that, by using Theorem  1.5 , one can obtain weaker but still relevant results of concentration of the empirical measure under just polynomial moment assumptions on μ 0   , provided that V   does not grow too fast at infinity. To limit the size of this paper, we shall not go further into such considerations.

1.5 Uniform in time estimates

In the “uniformly convex case” when β > 0 , β + 2 γ > 0   , it can be proven [19, 9, 10that μ t   converges exponentially fast, as t   , to some equilibrium measure μ   . In that case, it is natural to expect that the empirical measure is a good approximation of μ   as N   and t   , uniformly in time. This is what we shall indeed prove:
Theorem 1.9. With the same notation and assumptions as in Theorem  1.7 , suppose that β > 0 , β + 2 γ > 0   . Then there exists some constant K > 0   such that for any d > d   , there exists some constants C   and N 0   such that for all ɛ > 0   N N 0 max ( ɛ ( d + 2 ) , 1 ) sup t 0 P [ W 1 ( μ ^ t N , μ t ) > ɛ ] C ( 1 + ɛ 2 ) exp ( K N ɛ 2 )   As a consequence, there are constants T 0   , ɛ 0   (depending on the initial datum) and K = K / 4   such that, under the same conditions on N   and ɛ   , sup t T 0 log ( ɛ 0 / ɛ ) P [ W 1 ( μ ^ t N , μ ) > ɛ ] C ( 1 + ɛ 2 ) exp ( K N ɛ 2 ) .  
Remark 1.10. In view of the results in [9, it is natural to expect that a similar conclusion holds true when V = 0   and W   is convex enough. Propositions  3.1 and  3.8 below extend to that case, but it seems trickier to adapt the proof of Proposition  3.8 .
We conclude with an application to the numerical reconstruction of the invariant measure.
Theorem 1.11. With the same notation and assumptions as in Theorem  1.9 , consider the mollified empirical measure  0.6 . Then one can choose α = O ( ɛ )   in such a way that
N N 0 max ( ɛ ( d + 2 ) , 1 ) sup t T 0 log ( ɛ 0 / ɛ ) P [ f ~ t f L > ɛ ] C ( 1 + ɛ ( 2 d + 4 ) ) exp ( K N ɛ 2 d + 4 ) .  
These results are effective: all the constants therein can be estimated explicitly in terms of the data.

1.6 Strategy and plan

The strategy is rather systematic. First, we shall establish Sanov-type bounds for independent variables in R d   (not depending on time), resulting in concentration results such as Theorems  1.1 to  1.6 . This will be achieved along the ideas in [12,Exercices4.5.5and 6.2.19(see also [25,Section5), by first truncating to a compact ball, and then covering the set of probability measures on this ball by a finite number of small balls (in the space of probability measures); the most tricky part will actually lie in the optimization of parameters.
With such results in hand, we will start the study of the particle system by introducing the nonlinear partial differential equation  1.8 . For this equation, the Cauchy problem can be solved in a satisfactory way, in particular existence and uniqueness of a solution, which for t > 0   is reasonably smooth, can be shown under various assumptions on V   and W   (see e.g. [9, 10). Other regularity estimates such as the decay at infinity, or the smoothness in time, can be established; also the convergence to equilibrium in large time can sometimes be proven.
Next, following the presentation by Sznitman [27, we introduce a family of independent processes ( Y t i ) 1 i N   , governed by the stochastic differential equation
{ d Y t i = 2 d B t i V ( Y t i ) d t W * μ t ( Y t i ) d t , Y 0 i = X 0 i . (1.10)
As a consequence of Itô's formula, the law ν t   of each Y t i   is a solution of the linear partial differential equation ν t t = Δ ν t + ( ( V + W * μ t ) ν t ) , ν 0 = μ 0 .   But this linear equation is also solved by μ t   , and a uniqueness theorem implies that actually ν t = μ t   , for all t 0   . See [2, 3for related questions on the stochastic differential equation  1.10 .
For each given t   , the independence of the variables Y t i   and the good decay of μ t   will imply a strong concentration of the empirical measure ν ^ t N : = 1 N i = 1 N δ Y t i .   To go further, we shall establish a more precise information, such as a control on P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > ɛ ] .   Such bounds will be obtained by combining the estimate of concentration at fixed time t   with some estimates of regularity of ν ^ t N   (and μ t   ) in t   , obtained via basic tools of stochastic differential calculus (in particular Doob's inequality).
Finally, we can show by a Gronwall-type argument that the control of the distance of μ ^ t N   to μ t   reduces to the control of the distance of ν ^ t N   to μ t   : for instance,
P [ sup 0 t T W 1 ( μ ^ t N , μ t ) > ɛ ] P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > C ɛ ] (1.11)
for some constant C   . We shall also show how a variant of this computation provides estimates of the type of those in Theorem  1.9 , and how to get data reconstruction estimates as in Theorem  1.11 .

1.7 Remarks and further developments

The results in this paper confirm what seems to be a rather general rule about Wasserstein distances: results in distance W 1   are very robust and can be used in rather hard problems, with no particular structure; on the contrary, results in distance W 2   are stronger, but usually require much more structure and/or assumptions. For instance, in the study of the equation  1.8 , the distance W 2   works beautifully, and this might be explained by the fact that  1.8 has the structure of a gradient flow with respect to the W 2   distance [9, 10. In the problem considered by Malrieu [19, W 2   is also well-adapted, but leads him to impose strong assumptions on the initial datum μ 0   , such as the existence of a Logarithmic Sobolev inequality for μ 0   , considered as a reference measure. As a general rule, in a context of geometric inequalities with more or less subtle isoperimetric content, related to Brenier's transportation mapping theorem, W 2   is also the most natural distance to use [29. On the contrary, here we are considering quite a rough problem (concentration for the law of a random probability measure, driven by a stochastic differential equation with coupling) and we wish to impose only natural integrability conditions; then the distance W 1   is much more convenient.
Further developments could be considered. For instance, one may desire to prove some deviation inequalities for dependent sequences, say Markov chains, as both Sanov's theorem and transportation inequality can be established under appropriate ergodicity and integrability conditions.
Considering again the problem of the particle system, in a numerical context, one may wish to take into account the numerical errors associated with the time-discretization of the dynamics (say an implicit Euler scheme). For concentration estimates in one observable, a beautiful study of these issues was performed by Malrieu [20. For concentration estimates on the whole empirical measure, to our knowledge the study remains to be done. Also errors due to the boundedness of the phase space actually used in the simulation might be taken into account, etc.
At a more technical level, it would be desirable to relax the assumption of boundedness of D 2 W   in Theorem  1.7 , so as to allow for instance the interesting case of cubic interaction.
This is much more technical and will be considered in a separate work.
Another issue of interest would be to consider concentration of the empirical measure on path space, i.e. μ ^ [ 0 , T ] N : = 1 N i = 1 N δ ( X t i ) 0 t T ,   where T   is a fixed time length. Here μ ^ [ 0 , T ] N   is a random measure on C ( [ 0 , T ] ; R d )   and we would like to show that it is close to the law of the trajectories of the nonlinear stochastic differential equation
d Y t = 2 d B t V ( Y t ) d t ( W * μ t ) ( Y t ) d t , (1.12)
where the initial datum Y 0   is drawn randomly according to μ 0   . This will imply a quantitative information on the whole trajectory of a given particle in the system.
When one wishes to adapt the general method to this question, a problem immediately occurs: not only is C ( [ 0 , T ] ; R d )   not compact, but also balls with finite radius in thisspace are not compact either (of course, this is true even if the phase space of particles is compact). One may remedy to this problem by embedding C ( [ 0 , T ] ; B R )   into a space such as L 2 ( [ 0 , T ] ; B R )   , equipped with the weak topology; but we do not know of any “natural” metric on that space. There is (at least) another way out: we know from classical stochastic processes theory that integral trajectories of differential equations driven by white noise are typically Hölder- α   for any α < 1 / 2   . This suggests a natural strategy: choose any fixed α ( 0 , 1 / 2 )   and work in the space α ( [ 0 , T ] ; R d )   , equipped with the norm w α : = sup 0 t T | w ( t ) | + sup s t | w ( t ) w ( s ) | | t s | α .   For any R > 0   , the ball of radius R   and center 0 (the zero function) in α   is compact, and one may estimate its metric entropy. Then one can hope to perform all estimates by using the norm α   ; for instance, establish a bound on, say, a square-exponential moment on the law of Y t   : E exp ( β ( Y t ) 0 t T α 2 ) < + .   Again, to avoid expanding the size of the present paper too much, these issues will be addressed separately.

2 The case of independent variables

In this section we consider the case where we are given N   independent variables X i R d   , distributed according to a certain law μ   . There is no time dependence at this stage. We shall first examine the case when the law μ   has very fast decay (Theorem  1.1 ), then variants in which it decays in a slower way (Theorem  1.5 and  1.6 ).

2.1 Proof of Theorem  1.1 

The proof splits into three steps: (1) Truncation to a compact ball B R   of radius R   , (2) covering of P ( B R )   by small balls of radius r   and Sanov's argument, and (3) optimization of the parameters.
Step 1: Truncation. Let R > 0   , to be chosen later on, and let B R   stand for the ball of radius R   and center 0 (say) in R d   . Let 1 B R   stand for the indicator function of B R   . We truncate μ   into a probability measure μ R   on the ball B R   :
μ R = 1 B R μ μ [ B R ] .   We wish to bound the quantity P [ W p ( μ ^ N , μ ) > ɛ ]   in terms of μ R   and the associated empirical measure. For this purpose, consider independent variables ( X k ) 1 k N   drawn according to μ   , and ( Y k ) 1 k N   drawn according to μ R   , independent of each other; then define X R k : = { X k if | X k | R Y k if | X k | > R .   Since X 1   and X R 1   are distributed according to μ   and μ R   respectively, we have, by definition of Wasserstein distance,
W p p ( μ , μ R ) E | X 1 X R 1 | p = E ( | X 1 Y 1 | p 1 | X 1 | > R ) 2 p E ( | X 1 | p 1 | X 1 | > R ) = 2 p { | x | > R } | x | p d μ ( x ) .  
But μ   satisfies a T p ( λ )   inequality for some p 1   , hence a fortiori a T 1 ( λ )   inequality, so E α : = R d e α | x | 2 d μ ( x ) < +   for some α > 0   (any α < λ / 2   would do). If R   is large enough (say, R p / ( 2 α )   ), then the function r r p e α r 2   is nonincreasing for r R   , and then W p p ( μ , μ R ) 2 p ( R p e α R 2 ) { | x | > R } e α | x | 2 d μ ( x ) .   We conclude that
W p p ( μ , μ R ) 2 p E α R p e α R 2 ( α < λ / 2 , R p / 2 α ) . (2.1)
On the other hand, the empirical measures μ ^ N : = 1 N k = 1 N δ X k , μ ^ R N : = 1 N k = 1 N δ X R k   satisfy W p p ( μ ^ R N , μ ^ N ) 1 N k = 1 N | X R k X k | p 1 N k = 1 N Z k ,   where Z k : = 2 p | X k | p 1 | X k | > R   ( k = 1 , , N )   . Then, for any p [ 1 , 2 ]   , we can introduce parameters ɛ   and θ > 0   , and use Chebyshev's exponential inequality and the independence of the variables Z k   to obtain
P [ W p ( μ ^ R N , μ ^ N ) > ɛ ] P [ 1 N k = 1 N Z k > ɛ p ]
= P [ exp k = 1 N θ ( Z k ɛ p ) > 1 ]
E ( exp k = 1 N θ ( Z k ɛ p ) )
= exp ( N [ θ ɛ p log E exp ( θ Z 1 ) ] ) . (2.2)
In the case when p < 2 , for any α 1 < α < λ 2   , there exists some constant R 0 = R 0 ( α 1 , p )   such that 2 p θ r p α 1 r 2 + C ,   for all θ > 0   and r R 0 θ 1 2 p   , whence E exp ( θ Z 1 ) E exp ( α 1 | X 1 | 2 1 | X k | > R ) 1 + E α e ( α 1 α ) R 2 .   As a consequence,
P [ W p ( μ ^ R N , μ ^ N ) > ɛ ] exp ( N [ θ ɛ p E α e ( α 1 α ) R 2 ] ) . (2.3)
From  2.1 ,  2.3 and the triangular inequality for W p   ,
P [ W p ( μ , μ ^ N ) > ɛ ] P [ W p ( μ , μ R ) + W p ( μ R , μ ^ R N ) + W p ( μ ^ R N , μ ^ N ) > ɛ ]
P [ W p ( μ R , μ ^ R N ) > η ɛ 2 E α 1 / p R e α p R 2 ] + P [ W p ( μ ^ R N , μ ^ N ) > ( 1 η ) ɛ ]
P [ W p ( μ R , μ ^ R N ) > η ɛ 2 E α 1 / p R e α p R 2 ]
+ exp ( N ( θ ( 1 η ) p ɛ p E α e ( α 1 α ) R 2 ) ) . (2.4)
This estimate was established for any given p [ 1 , 2 )   , η ( 0 , 1 )   , ɛ , θ > 0   , α 1 < α < λ 2   and R max ( p / 2 α , R 0 θ 1 2 p )   , where R 0   is a constant depending only on α 1   and p   .
In the case when p = 2 , we let Z k : = | Y k X k | 2 1 | X k | > R   ( k = 1 , , N )   , and starting from inequality  2.2 again, we choose α 1 < α   and then θ : = α 1 / 2   : by definition of Z 1   and μ R   ,
E ( exp ( α 1 2 Z 1 ) ) = R 2 d exp ( α 1 2 | y x | 2 1 | x | R ) d μ ( x ) d μ R ( y )
= μ [ B R ] + 1 μ [ B R ] | y | R | x | R exp ( α 1 2 | y x | 2 ) d μ ( x ) d μ ( y )
1 + ( 1 E α e α R 2 ) 1 | y | R e α 1 | y | 2 d μ ( y ) | x | R e α 1 | x | 2 d μ ( x )
1 + 2 E α 2 e ( α 1 α ) R 2
for R   large enough, from which
P [ W 2 ( μ ^ R N , μ ^ N ) > ɛ ] exp ( N [ α 1 2 ɛ 2 2 E α 2 e ( α 1 α ) R 2 ] ) . (2.5)
To sum up, in the case p = 2   equation  2.4 writes
P [ W 2 ( μ , μ ^ N ) > ɛ ) ] P [ W 2 ( μ R , μ ^ R N ) > η ɛ 2 E α 1 / 2 R e α 2 R 2 ] + exp ( N ( α 1 2 ( 1 η ) 2 ɛ 2 2 E α 2 e ( α 1 α ) R 2 ) ) . (2.6)
So, apart from some error terms, for all p [ 1 , 2 ]   we have reduced the initial problem to establishing the result only for the probability law μ R   , whose support lies in the compact set B R   .
We end up this truncation procedure by proving that μ R   satisfies some modified T p   inequality. Let indeed ν   be a probability measure on B R   , absolutely continuous with respect to μ   (and hence with respect to μ R   ); then, when R   is larger than some constant depending only on E α   , we can write
H ( ν | μ R ) H ( ν | μ ) = B R log d ν d μ R d ν B R log d ν d μ d ν = log μ [ B R ]
log ( 1 E α e α R 2 )
2 E α e α R 2 . (2.7)
But μ   satisfies a T p ( λ )   inequality, so H ( ν | μ ) λ 2 W p 2 ( μ , ν ) λ 2 ( W p ( μ R , ν ) W p ( μ R , μ ) ) 2   by triangular inequality. Combining this with  2.7 , we obtain H ( ν | μ R ) λ 2 ( W p ( μ R , ν ) W p ( μ R , μ ) ) 2 2 E α e α R 2   From this, inequality  2.1 and the elementary inequality
a ( 0 , 1 ) C a > 0 ; x , y R , ( x y ) 2 ( 1 a ) x 2 C a y 2 , (2.8)
we deduce that for any λ 1 < λ   there exists some constant K   such that
H ( ν | μ R ) λ 1 2 W p ( μ R , ν ) 2 K R 2 e α R 2 . (2.9)
Step 2: Covering by small balls. In this second step we derive quantitative estimates on μ ^ R N   . Let φ   be a bounded continuous function on R d   , and let   be a Borel set in P ( B R )   (equipped with the weak topology of convergence against bounded continuous test functions). By Chebyshev's exponential inequality and the independence of the variables X R k   ,
P [ μ ^ R N ] exp ( N inf ν B R φ d ν ) E ( e N B R φ d μ ^ R N )
= exp ( N inf ν [ B R φ d ν 1 N log E ( e N B R φ d μ ^ R N ) ] )
= exp ( N inf ν [ B R φ d ν 1 N log E ( e k = 1 N φ ( X R k ) ) ] )
= exp ( N inf ν [ B R φ d ν log B R e φ d μ R ] ) .
As φ   is arbitrary, we can pass to the supremum and find P [ μ ^ R N ] exp ( N sup φ C b ( R d ) inf ν [ B R φ d ν log B R e φ d μ R ] ) .   Now we note that the quantity φ d ν log e φ d μ R   is linear in ν   and convex lower semi-continuous (with respect to the topology of uniform convergence) in φ   ; if we further assume that   is convex and compact, then (for instance) Sion's min-max theorem [26,Theorem4.2'ensures that sup φ C b ( R d ) inf ν [ B R φ d ν log e φ d μ R ] = inf ν sup φ C b ( R d ) [ B R φ d ν log e φ d μ R ] .   By the dual formulation of the H   functional [12,Lemma6.2.13, we conclude that
P [ μ ^ R N ] exp ( N inf ν H ( ν | μ R ) ) . (2.10)
Now, let δ > 0   and let A   be a measurable subset of P ( B R )   . We cover the latter with N A   balls ( B i ) 1 i N A   with radius δ / 2   in W p   metric. Each of these balls is convex and compact, and it is included in the δ   -thickening of A   in W p   metric, defined as A δ : = { ν P ( B R ) ; ν a A , W p ( ν , ν a ) δ } .   So, by  2.10 we get
P [ μ ^ R N A ] P [ μ ^ R N N A i = 1 B i ]
i = 1 N A P ( μ ^ R N B i )
i = 1 N A exp ( N inf ν B i H ( ν | μ R ) )
N A exp ( N inf ν A δ H ( ν | μ R ) ) . (2.11)
We now apply this estimate with A : = { ν P ( B R ) ; W p ( ν , μ R ) η ɛ 2 E α 1 / p R e α p R 2 } .   From  2.9 we have, for any ν A δ   , H ( ν | μ R ) λ 1 2 W p ( ν , μ R ) 2 K R 2 e α R 2 λ 1 2 ρ 2 K R 2 e α R 2 ,   where ρ : = max ( η ɛ 2 E α 1 / p R e α p R 2 δ , 0 ) .   Combining this with  2.11 , we conclude that
P [ W p ( μ R , μ ^ R N ) η ɛ 2 E α 1 / p R e α p R 2 ] N A exp ( N [ λ 1 2 ρ 2 K R 2 e α R 2 ] ) . (2.12)
Now, given any λ 2 < λ 1   , it follows from  2.8 that there exist δ 1   , η 1   and K 1   , depending on α , λ 1 , λ 2   , such that
λ 1 2 ρ 2 K R 2 e α R 2 λ 2 2 ɛ 2 K 1 R 2 e α R 2 (2.13)
where δ : = δ 1 ɛ   and η : = η 1   .
Though this inequality holds independently of p   , we shall use it only in the case when p < 2   . In the case p = 2   , on the other hand, we note that for any η ( 0 , 1 )   ,
λ 1 2 ρ 2 K R 2 e α R 2 λ 2 2 η 2 ɛ 2 K 1 R 2 e α R 2 (2.14)
where δ : = δ 1 ɛ   . Finally, we bound N A   by means of Theorem  A.1 in Appendix  A : there exists some constant C   (only depending on d   ) such that for all R > 0   and δ > 0   the set P ( B R )   can be covered by ( C R δ 1 ) ( C R δ ) d   balls of radius δ   in W p   metric, where a b   stands for max ( a , b )   . In particular, given δ = δ 1 ɛ   , we can choose
N A ( K 2 R ɛ 1 ) ( K 2 R ɛ ) d (2.15)
balls of radius δ   , for some constant K 2   depending on λ 1   and λ 2   (via δ 1   ) but neither on ɛ   nor on R   . (The purpose of the 1 in ( K 2 R / ɛ 1 )   is to make sure that the estimate is also valid when ɛ > R   .) Combining  2.4 ,  2.12 ,  2.13 and  2.15 , we find that, given p [ 1 , 2 )   , λ 2 < λ   and α 1 < α < λ 2   , there exist some constants K 1   , K 2   , K 3   and R 1   such that for all ɛ , ζ > 0   and R R 1 max ( 1 , ζ 1 2 p )   ,
P [ W p ( μ , μ ^ N ) > ɛ ] ( K 2 R ɛ 1 ) K 2 ( R ɛ ) d exp ( N [ λ 2 ɛ 2 2 K 1 R 2 e α R 2 ] ) + exp ( N ( K 3 ζ ɛ p K 4 e ( α 1 α ) R 2 ) ) (2.16)
for some constant K 4 = K 4 ( θ , α 1 )   . In the case when p = 2 , we obtain similarly
P [ W 2 ( μ , μ ^ N ) > ɛ ] ( K 2 R ɛ 1 ) K 2 ( R ɛ ) d exp ( N [ λ 2 2 η 2 ɛ 2 K 1 R 2 e α R 2 ] ) + exp ( N ( α 1 2 ( 1 η ) 2 ɛ 2 K 4 e ( α 1 α ) R 2 ) ) (2.17)
for any η ( 0 , 1 )   and R R 1   .
These estimates are not really appealing (!), but they are rather precise and general. In the rest of the section we shall show that an adequate choice of R   leads to a simplified expression.
Step 3: Choice of the parameters.
We first consider the case when p [ 1 , 2 ) . Let λ < λ 2   , α < α   and d 1 > d   . We claim that P [ W p ( μ , μ ^ N ) > ɛ ] exp ( λ 2 N ɛ 2 ) + exp ( α N ɛ 2 )   as soon as
R 2 R 2 max ( 1 , ɛ 2 , log ( 1 ɛ 2 ) ) , N ɛ d 1 + 2 K 5 R d 1 (2.18)
for some constants R 2   and K 5   depending on μ   only through λ , α   and E α   .
Indeed, on one hand K 2 ( R ɛ ) d log ( K 2 R ɛ ) K 6 ( R ɛ ) d 1   for some constant K 6   , on the other hand K 1 R 2 e α R 2 e α 1 R 2   for R   large enough, and then K 6 ( R ɛ ) d 1 N [ λ 2 ɛ 2 2 e α 1 R 2 ] N λ ɛ 2 2   for R 2 / log ( 1 ɛ 2 )   and N ɛ d 1 + 2 / R d 1   large enough; this is enough to bound the first term in the right-hand side of  2.16 if moreover R / ɛ   is large enough.
Moreover, letting α 2 ( α , α 1 )   , we can choose ζ   in such a way that K 3 ζ = ɛ 2 p   , so that exp ( N ( K 3 ζ ɛ p K 4 e ( α 1 α ) R 2 ) ) = exp ( N ( α 2 ɛ 2 K 4 e ( α 1 α ) R 2 ) ) ,   which in the end can be bounded by exp ( N α ɛ 2 )   if R   and R 2 / log ( 1 ɛ 2 )   are large enough. With this one can get a bound on the right-hand side of  2.16 .
Now let us check that conditions  2.18 can indeed be fulfilled. Clearly, the first condition holds true for all ɛ ( 0 , 1 )   and R 2 R 3 log ( K 6 ɛ 2 )   , where R 3   and K 6   are positive constants.
Then, we can choose R : = ( N K 5 ɛ d 1 + 2 ) 1 / d 1   so that the second condition holds as an equality. This choice is admissible as soon as ( N K 5 ɛ d 1 + 2 ) 2 / d 1 R 3 log ( K 5 ɛ 2 )   and this, in turn, holds true as soon as
N K 7 ɛ ( d + 2 ) , (2.19)
where d   is such that d > d   , and K 7   is large enough.
If ɛ 1   , then we can choose R 2 = R 2 ɛ 2   , i.e. R = R 2 ɛ   , and then the second inequality in  2.18 will be true as soon as N   is large enough.
To sum up: Given d > d   , λ < λ   and α < α   , there exists some constant N 0   , depending on d   and depending on μ   only through λ , α   and E α   , such that for all ɛ > 0   , P [ W p ( μ , μ ^ N ) > ɛ ] exp ( λ 2 N ɛ 2 ) + exp ( α N ɛ 2 )   as soon as N N 0 max ( ɛ ( d + 2 ) , 1 )   . Then we note that, given K < min ( λ 2 , α )   , the inequality exp ( λ 2 N ɛ 2 ) + exp ( α N ɛ 2 ) exp ( K N ɛ 2 )   holds if condition  2.19 is satisfied for some K 7   large enough. To conclude the proof of Theorem  1.1 in the case when p [ 1 , 2 )   , it is sufficient to choose λ < λ   , α < λ / 2   .
Now, in the case when p = 2 , given λ 3 < λ 2   and α 2 < α 1   , conditions  2.18 imply P [ W 2 ( μ , μ ^ N ) > ɛ ] exp ( λ 3 2 η 2 N ɛ 2 ) + exp ( α 2 2 ( 1 η ) 2 N ɛ 2 ) .   Then we let α 2 : = λ 3 2   and η : = 2 1   , so that λ 3 2 η 2 = α 2 2 ( 1 η ) 2 .   Then P [ W 2 ( μ , μ ^ N ) > ɛ ] 2 exp ( ( 3 2 2 ) λ 3 2 N ɛ 2 ) ;   for λ < λ   , the above quantity is bounded by exp ( ( 3 2 2 ) λ 2 N ɛ 2 )   as soon as  2.19 is enforced with K 7   large enough. This concludes the argument.

2.2 Proof of Theorem  1.5 

It is very similar to the proof of Theorem  1.1 , so we shall only explain where the differences lie. Obviously, the main difficulty will consist in the control of tails.
We first let p [ 1 , q )   , α [ 1 , q p )   and R > 0   , and introduce M q : = R d | x | q d μ ( x ) .   Then  2.1 may be replaced by
W p p ( μ , μ R ) 2 p M q R p q , (2.20)
and  2.2 by
P [ W p ( μ ^ R N , μ ^ N ) > ɛ ] C N α ¯ α R α p q ( ɛ p C R p q ) α (2.21)
for some constant C   depending on α   and M q   .
Let us establish for instance  2.21 . Introduce Z k = | Y k X k | p 1 | X k | > R ( 1 k N ) .   By Chebychev's inequality,
P [ W p ( μ ^ R N , μ ^ N ) > ɛ ] P [ 1 N k = 1 N Z k > ɛ p ] = P [ 1 N k = 1 N ( Z k E Z k ) > ɛ p E Z 1 ] E | k = 1 N ( Z k E Z k ) | α ( N ( ɛ p E Z 1 ) ) α  
provided that ɛ p > E Z 1   . But, since the random variables ( Z k E Z k ) k   are independent and identically distributed, with zero mean, there exists some constant C   depending on α   such that E | k = 1 N ( Z k E Z k ) | α C N α ¯ E | Z 1 E Z 1 | α   where α ¯ : = max ( α / 2 , 1 )   . This inequality is a consequence of Rosenthal's inequality in the case when α 2   , but also holds true if α [ 1 , 2 )   (see for instance [24,pp. 62and 82).
Then, on one hand, E Z 1 = E | Y 1 X 1 | p 1 | X 1 | > R 2 p M q R p q ,   while on the other hand,
E | Z 1 E Z 1 | α = E | | Y 1 X 1 | p 1 | X 1 | > R E | Y 1 X 1 | p 1 | X 1 | > R | α C E | Y 1 X 1 | α p 1 | X 1 | > R C M q R α p q  
with C   standing for various constants. Collecting these two estimates, we conclude to the validity of  2.21 for R q p ɛ p   large enough.
Then  2.20 and  2.21 together ensure that
P [ W p ( μ , μ ^ N ) > ɛ ] P [ W p ( μ R , μ ^ R N ) > η ɛ 2 M q 1 / p R 1 q / p ] + C N α ¯ α R α p q ( ( 1 η ) p ɛ p C R p q ) α (2.22)
for any ɛ ( 0 , 1 )   , η > 0   and R q p ɛ p ( 1 η ) p   large enough.
Since μ R   is supported in B R   , the Csiszár-Kullback-Pinsker inequality and Kantorovich-Rubinstein formulation of the W 1   distance together ensure that it satisfies a T 1 ( R 2 )   inequality (see e.g. [8,ParticularCase 5with p = 1   ). This estimate also extends to any W p   distance, not as a penalized T p   inequality as in  2.9 , but rather as
W p 2 p ( ν , μ R ) 2 2 p 1 R 2 p H ( ν | μ R ) (2.23)
(see again [8,ParticularCase5).
From  2.22 and  2.23 we deduce (as in  2.17 ) that
P [ W p ( μ ^ N , μ ) > ɛ ] ( K 1 R δ ) K 1 ( R δ ) d exp ( N ρ 2 p 2 2 p 1 R 2 p ) + C N α ¯ α R α p q ( ( 1 η ) p ɛ p C R p q ) α (2.24)
for any δ   , where now ρ : = ( η ɛ 2 M 1 / p R 1 q / p δ ) + .   Letting η 1 < η   and d > d   , and choosing δ = δ 0 ɛ   , we deduce P [ ( W p ( μ ^ N , μ ) > ɛ ] exp ( ( R ɛ ) d η 1 2 p 2 2 p 1 N ɛ 2 p R 2 p + K 1 2 2 p 1 N R 2 q ) + C N α ¯ α R α p q ( ( 1 η 1 ) p ɛ p C R p q ) α   for R q p ɛ p ( 1 η 1 ) p   large enough, and then
P [ W p ( μ ^ N , μ ) > ɛ ] exp ( η 2 2 p 2 2 p 1 N ɛ 2 p R 2 p ) + C N α ¯ α R α p q ( 1 η 2 ) α p ɛ α p (2.25)
for η 2 < η 1   , provided that the conditions
R R 1 ɛ p q p , N K 2 ( R ɛ ) 2 p + d (2.26)
hold for some R 1   and K 2   .
Given any choice of R   as a product of powers of N   and ɛ   , the first term in the right-hand side of  2.25 will always be smaller than the second one, if N   goes to infinity while ɛ   is kept fixed; thus we can choose R   minimizing the second term under the above conditions. Then the second condition in  2.26 will be fulfilled as an equality:
R = K 3 ɛ N 1 2 p + d .   As for the first condition in  2.26 , it can be rewritten as N N 0 ɛ q 2 p + d q p ,   and then, by  2.25 , P [ W p ( μ ^ N , μ ) > ɛ ] exp ( K 5 N d 2 p + d ) + K 6 ɛ q N α ¯ α + α p q 2 p + d .   Hence
P [ W p ( μ ^ N , μ ) > ɛ ] ɛ q N α ¯ α (2.27)
for all ɛ ( 0 , 1 )   and N   larger than some constant and, given d > d   , for all ɛ 1   and N M ɛ d d   where M   is large enough.
In the first case when p q / 2   , any admissible α   belongs to [ 1 , q / p ) [ 1 , 2 ]   , so α ¯ = 1   . If δ ( 0 , q / p 1 )   , we get from  2.27 , with α = q / p δ   , that P [ W p ( μ ^ N , μ ) > ɛ ] ɛ q N 1 q / p + δ   for all ɛ > 0   and N N 0 max ( ɛ q 2 p + d q p , ɛ d d ) .   In the second case when p < q / 2   , we only consider admissible α   's in [ 2 , q / p ) [ 1 , q / p )   , so that α ¯ α = α / 2   . Choosing δ ( 0 , q / p 2 )   , we get from  2.27  P [ W p ( μ ^ N , μ ) > ɛ ] ɛ q N q / 2 p + δ / 2   under the same conditions on N   as before. This concludes the argument.

2.3 Proof of Theorem  1.6 

It is again based on the same principles as the proofs of Theorems  1.1 and  1.5 , with the help of functional inequalities investigated in [8and [11. We skip the argument, which the reader can easily reconstruct by following the same lines as above.

2.4 Data reconstruction estimates

Finally, we show how the above concentration estimates imply data reconstruction estimates. This is a rather general estimate, which is treated here along the lines of [25,Section5and [29,Problem 10.
Proposition 2.1. Let μ   be a probability measure on R d   , with density f   with respect to Lebesgue measure. Let X 1 , , X N   be random points in R d   , and let ζ   be a Lipschitz, nonnegative kernel with unit integral. Define the random measure μ ^   and the random function f ^ ζ , α   by μ ^ : = 1 N i = 1 N δ X i , f ^ ζ , α ( x ) : = 1 N i = 1 N ζ α ( x X i ) , ζ α ( x ) = 1 α d ζ ( x α ) .   Then,
sup x R d | f ^ ζ , α ( x ) f ( x ) | ζ L i p α d + 1 W 1 ( μ ^ , μ ) + δ ( α ) , (2.28)
where δ   stands for the modulus of continuity of f   , defined as δ ( ɛ ) : = sup | x y | ɛ | f ( x ) f ( y ) | .   As a consequence, if f   is Lipschitz, then there exist some constants a , K > 0   , only depending on d   , f L i p   and ζ L i p   , such that
P [ f ^ ζ , a ɛ f L > ɛ ] P [ W 1 ( μ ^ , μ ) > K ɛ d + 2 ] (2.29)
for all ɛ > 0   .
  • Proof. First,
    | μ * ζ α ( x ) f ( x ) | = | R d ζ α ( x y ) ( f ( y ) f ( x ) ) d y |
    R d ζ α ( x y ) | f ( y ) f ( x ) | d y .
    Since ζ α ( x y )   is supported in { | x y | α }   , and ζ α   is a probability density, we deduce
    | μ * ζ α ( x ) f ( x ) | δ ( α ) . (2.30)
    Now, if x   is some point in R d   , then, thanks to the Kantorovich-Rubinstein dual formulation  1.2 ,
    | f ^ ζ , α μ * ζ α | ( x ) = | R d ζ α ( x y ) d [ μ ^ μ ] ( y ) |
    ζ α ( x ) L i p W 1 ( μ ^ , μ )
    = ζ L i p α d + 1 W 1 ( μ ^ , μ ) .
    To conclude the proof of  2.28 , it suffices to combine this bound with  2.30 .
    Now, let L : = max ( f L i p , ζ L i p )   , and α : = ɛ / ( 2 L )   . The bound  2.28 turns into f ^ ζ , α f L L ( W 1 ( μ ^ , μ ) α d + 1 + α ) ( ( 2 L ) d + 1 L ɛ d + 1 ) W 1 ( μ ^ , μ ) + ɛ 2 .   In particular, P [ f ^ ζ , α f L > ɛ ] P [ W 1 ( μ ^ , μ ) > ɛ d + 2 ( 2 L ) d + 2 ] ,   which is estimate  2.29 .
Remark 2.2. Estimate  2.29 , combined with Theorem  1.1 or Theorem  1.5 , yields simple quantitative (non-asymptotic) deviation inequalities for empirical distribution functions in supremum norm. We refer to Gao [15for a recent study of deviation inequalities for empirical distribution functions, both in moderate and large deviations regimes.

3 PDE estimates

Now we start the study of our model system for interacting particles. The first step towards our proof of Theorem  1.7 consists in deriving suitable a priori estimates on the solution to the nonlinear limit partial differential equation  1.8 . In this section, we recall some estimates which have already been established by various authors, and derive some new ones. All estimates will be effective.

3.1 Notation

In the sequel, μ 0   is a probability measure, taken as an initial datum for equation  1.8 , and various regularity assumptions will later be made on μ 0   . Assumptions  1.9 will always be made on V   and W   , even if they are not recalled explicitly; we shall only mention additional regularity assumptions, when used in our estimates. Moreover, we shall write
Γ : = max ( | γ | , | γ | ) . (3.1)
The notation μ t   will always stand for the solution (unique under our assumptions) of  1.8 .
We also write e ( t ) : = R d | x | 2 d μ t ( x )   for the (kinetic) energy associated with μ t   , and M α ( t ) : = R d e α | x | 2 d μ t ( x )   for the square exponential moment of order α   .
The scalar product between two vectors v , w R d   will be denoted by v w   . The symbols C   and K   will often be used to denote various positive constants; in general what will matter is an upper bound on constants denoted C   , and a lower bound on constants denoted K   .
The space C k   is the space of k   times differentiable continuous functions.

3.2 Decay at infinity

In this subsection, we prove the propagation of strong decay estimates at infinity:
Proposition 3.1. With the conventions of Subsection  3.1 , let η ¯   be γ   if γ < 0   , and an arbitrary negative number otherwise. Let a : = 2 ( β + η ¯ ) , G ¯ : = 2 d + | V ( 0 ) | 2 2 | η ¯ | .   Then (i) e ( t ) e a t [ e ( 0 ) + G ¯ e a t 1 a ]   ; (ii) For any α 0 > 0   there is a continuous positive function α ( t )   such that α ( 0 ) = α 0   and
M α 0 ( 0 ) < + M α ( t ) ( t ) < + . (3.2)
(iii) Moreover, in the “uniformly convex case” when β > 0   and β + γ > 0   , then there is α > 0   such that sup t 0 e ( t ) < + , sup t 0 M α ( t ) < + .  
Corollary 3.2. If μ 0   admits a finite square exponential moment, then μ t   satisfies T 1 ( λ t )   , for some function λ t > 0   , bounded below on any interval [ 0 , T ]   ( T <   ).
  • Proof. We start with (i). For simplicity we shall pretend that μ t   is a smoothly differentiable function of t   , with rapid decay, so that all computations based on integrating equation  1.8 against | x | 2   are justified. These assumptions are not a priori satisfied, but the resulting bounds can easily be rigorously justified with standard but tedious approximation arguments.
    With that in mind, we compute e ( t ) = 2 d 2 R d ( x V ( x ) + x W * μ t ( x ) ) d μ t ( x )   with 2 R d x V ( x ) d μ t ( x ) 2 β R d | x | 2 d μ t ( x ) 2 V ( 0 ) R d x d μ t ( x ) .   Since W   is an odd function, we have
    2 R d x W * μ t ( x ) d μ t ( x ) = 2 x W ( x y ) d μ t ( y ) d μ t ( x )
    = ( x y ) W ( x y ) d μ t ( y ) d μ t ( x )
    γ | x y | 2 d μ t ( y ) d μ t ( x )
    = 2 γ [ | x | 2 d μ t ( x ) | x d μ t ( x ) | 2 ] .
    If γ < 0   , then
    e ( t ) 2 d 2 ( γ + β ) e ( t ) + 2 γ | x d μ t ( x ) + V ( 0 ) 2 | γ | | 2 + | V ( 0 ) | 2 2 | γ |
    2 d 2 ( γ + β ) e ( t ) + | V ( 0 ) | 2 2 | γ | ,
    and if γ 0   , then for any η ¯ < 0  
    e ( t ) 2 d 2 ( η ¯ + β ) e ( t ) 2 γ ( | x | 2 d μ t ( x ) | x d μ t ( x ) | 2 ) + | V ( 0 ) | 2 2 | η ¯ |
    2 d 2 ( η ¯ + β ) e ( t ) + | V ( 0 ) | 2 2 | η ¯ |
    This leads to e ( t ) G ¯ a e ( t ) ,   and the conclusion follows easily by Gronwall's lemma.
    We now turn to (ii). Let α   be some arbitrary nonnegative C 1   function on R +   . By using the equation  1.8 , we compute d d t e α ( t ) | x | 2 d μ t ( x ) = [ 2 d α + 4 α 2 | x | 2 2 α x V ( x ) 2 α x W * μ t ( x ) + α ( t ) | x | 2 ] e α | x | 2 d μ t ( x ) .   Since D 2 V ( x ) β I   for all x R d   , we can write
    x V ( x ) x V ( 0 ) β | x | 2 β | x | 2 + | V ( 0 ) | | x | ( δ β ) | x | 2 + C 4 δ (3.3)
    for any δ > 0   and x R d   .
    Next, our assumptions on W   imply W ( 0 ) = 0   , and γ I D 2 W ( x ) γ I   , so x W ( x ) γ | x | 2 and | x D 2 W ( z ) y | Γ | x | | y |   for all x , y , z R d   , with Γ   defined by  3.1 . Hence, by Taylor's formula,
    x W * μ t ( x ) = R d x W ( x y ) d μ t ( y )
    = x W ( x ) + R d 0 1 x D 2 W ( x s y ) y d μ t ( y ) d s
    γ | x | 2 + Γ | x | R d | y | d μ t ( y )
    ( γ + Γ η ) | x | 2 + Γ 4 η e ( t ) , (3.4)
    where η   is any positive number.
    From  3.3 and  3.4 we obtain
    d d t ( M α ( t ) ( t ) ) R d [ A ( t ) + B ( t ) | x | 2 ] e α ( t ) | x | 2 d μ t ( x ) (3.5)
    where A ( t ) = C α ( t ) ( 1 + e ( t ) ) , B ( t ) = α ( t ) + 4 α ( t ) 2 + b α ( t ) ,   and C   is a finite constant, while b = 2 ( γ + β δ Γ η )   .
    We now choose α ( t )   in such a way that B ( t ) 0   , i.e.
    α ( t ) + 4 α 2 ( t ) + b α ( t ) = 0 , α ( 0 ) = α 0 .   This integrates to α ( t ) = e b t ( 1 α 0 + 4 1 e b t b ) 1 ( = ( 1 α 0 + 4 t ) 1 if b = 0 ) .   Obviously α   is a continuous positive function, and our estimates imply d d t ( M α ( t ) ( t ) ) A ( t ) M α ( t ) ( t ) .   We conclude by using Gronwall's lemma that M α ( t ) ( t ) exp ( 0 t A ( s ) d s ) M α 0 ( 0 ) .   Next, the estimate (iii) for e ( t )   is an easy consequence of our explicit estimates when β > 0 , β + γ > 0   (in the case when γ 0   and β > 0   , we choose η ¯ ( 0 , β )   ).
    As for the estimate about M α ( t )   , it will result from a slightly more precise computation.
    From  3.5 , we have
    d d t R d e α | x | 2 d μ t ( x ) R d [ A ( t ) + B | x | 2 ] e α | x | 2 d μ t ( x ) (3.6)
    where A   is bounded on R +   by some constant a   , and B = 2 α [ 2 α ( β + γ δ Γ η ) ] .   Since β + γ > 0   , for any fixed α   in ( 0 , β + γ 2 )   we can choose δ , η > 0   such that B < 0   .
    Letting R 2 = a / B   and G = B > 0   , equation  3.6 becomes
    d d t R d e α | x | 2 d μ t ( x ) G R d ( R 2 | x | 2 ) e α | x | 2 d μ t ( x ) . (3.7)
    Let p > 1   . The formula
    | x | > p R ( R 2 | x | 2 ) e α | x | 2 d μ t ( x ) R 2 ( 1 p 2 ) | x | > p R e α | x | 2 d μ t ( x )
    = R 2 ( 1 p 2 ) [ R d e α | x | 2 d μ t ( x ) | x | p R e α | x | 2 d μ t ( x ) ]
    leads to R d ( R 2 | x | 2 ) e α | x | 2 d μ t ( x ) | x | p R ( R 2 p 2 | x | 2 ) e α | x | 2 d μ t ( x ) + R 2 ( 1 p 2 ) M α .   by decomposing the integral on the sets { | x | p R }   and { | x | > p R }   . From  3.7 we deduce ( M α ) ( t ) + ω 1 M α ( t ) ω 2   where ω 1   and ω 2   are positive constants. It follows that M α ( t )   remains bounded on R +   if M α ( 0 ) < +   , and this concludes the argument.

3.3 Time-regularity

Now we study the time-regularity of μ t   .
Proposition 3.3. With the conventions of Subsection  3.1 , for any T < +   there exists a constant C ( T )   such that
s , t [ 0 , T ] , W 1 ( μ t , μ s ) C ( T ) | t s | 1 / 2 . (3.8)
Remark 3.4. The exponent 1 / 2   is natural in small time if no regularity assumption is made on μ 0   ; it can be improved if t , s   are assumed to be bounded below by some t 0 > 0   . Also, in view of the results of convergence to equilibrium recalled later on, the constant C ( T )   might be chosen independent of T   if β > 0 , β + 2 γ > 0   .
Remark 3.5. A stochastic proof of  3.8 is possible, via the study of continuity estimates for Y t   , which in any case will be useful later on. But here we prefer to present an analytical proof, to stress the fact that estimates in this section are purely analytical statements.
  • Proof. Let L   be the linear operator Δ ( V + ( W * μ t ) )   , and let e t L   be the associated semigroup: from our assumptions and estimates it follows that it is well-defined, at least for initial data which admit a finite square exponential moment. Of course μ t = e t L μ 0   . It follows that
    W 2 ( μ s , μ t ) = W 2 ( μ s , e ( t s ) L μ s ) = W 2 ( R d δ y d μ s ( y ) , R d e ( t s ) L δ y d μ s ( y ) )
    R d W 2 ( δ y , e ( t s ) L δ y ) d μ s ( y ) .
    Our goal is to bound this by O ( t s )   . In view of Proposition  3.1 , it is sufficient to prove that for all a > 0   , W 2 2 ( δ y , e ( t s ) L δ y ) = O ( t s ) O ( e a | y | 2 ) .   This estimate is rather easy, since the left-hand side is just the variance of the solution of a linear diffusion equation, starting with a Dirac mass at y   as initial datum. Without loss of generality, we assume s = 0   , and write μ ~ t : = e t L δ y   . For simplicity we write the computations in a sketchy way, but they are not hard to justify.
    Since the initial datum is δ y   , its square exponential moment M ~ α   of order α   is e α | y | 2   .
    With an argument similar to the proof of Proposition  3.1 (ii), one can show that 0 t T e α | x | 2 d μ ~ t ( x ) C ( T ) ( 1 + M ~ α ) C ( T ) e α | y | 2 .   Now, since | V | ( x ) = O ( e a | x | 2 )   , a < α   , | W * μ t |   grows at most polynomially, and μ ~ t   admits a square exponential moment of order α   , we easily obtain d d t x d μ ~ t = ( V + W * μ t ) d μ ~ t = O ( e a | x | 2 ) d μ ~ t = O ( e α | y | 2 ) ;   d d t | x | 2 2 d μ ~ t = d x ( V + W * μ t ) d μ ~ t = O ( e α | y | 2 ) .   From these estimates we deduce that the time-derivative of the variance V ( μ ~ t ) : = | x | 2 d μ ~ t ( x d μ ~ t ) 2   is bounded by O ( e b | y | 2 )   for any b > 0   . Since μ ~ 0   has zero variance, it follows that the variance of μ ~ t   is O ( t e b | y | 2 )   , which was our goal.

3.4 Regularity in phase space

Regularity estimates will be useful for Theorem  1.11 . Equation  1.8 is a (weakly nonlinear) parabolic equation, for which regularization effects can be studied by standard tools. Some limits to the strength of the regularization are imposed by the regularity of V   . So as not to be bothered by these nonessential considerations, we shall assume strong regularity conditions on V   here. Then in Appendix  B we shall prove the following estimates:
Proposition 3.6. With the conventions of Subsection  3.1 , assume in addition that V   has all its derivatives growing at most polynomially at infinity.
Then, for each k 0   and for all t 0 > 0   , T > t 0   there is a finite constant C ( t 0 , T )   , only depending on t 0 , T , k   and a square exponential moment of the initial measure μ 0   , such that the density f t   of μ t   is of class C k   , with sup t 0 t T f t C k C ( t 0 , T ) .   If moreover β > 0   , β + γ > 0   , then C ( t 0 , T )   can be chosen to be independent of T   for any fixed t 0   .
Remark 3.7. For regular initial data and under some adequate assumptions on V   and W   , some regularity estimates on f t / f   , where f   is the limit density in large time, are established in [10,Lemma 6.7. These estimates allow a much more precise uniform decay, but are limited to just one derivative. Here there will be no need for them.

3.5 Asymptotic behavior

In the “uniformly convex” case when β + γ > 0   , the measure μ t   converges to a definite limit μ   as t   . This was investigated in [19, 9, 10. The following statement is a simple variant of [9,Theorems 2.1and5.1.
Proposition 3.8. With the conventions of Subsection  3.1 , assuming that β > 0 , β + 2 γ > 0   , there exists a probability measure μ   such that W 2 ( μ t , μ ) C e λ t , λ > 0 .   Here the constants C   and λ   only depend on the initial datum μ 0   .

4 The limit empirical measure

Consider the random time-dependent measure
ν ^ t N : = 1 N i = 1 N δ Y t i , (4.1)
where ( Y t i ) t 0   , 1 i N   , are N   independent processes solving the same stochastic differential equation d Y t i = 2 d B t i [ ( V + W * μ t ) ] ( Y t i ) d t ,   and such that the law of Y 0 i   is μ 0   . As we already mentioned, for each t   and i   , Y t i   is distributed according to the law μ t   . We call ν ^ t N   the “limit empirical measure” because it is expected to be a rather accurate description, in some well-chosen sense, of the empirical measure μ ^ t N   as N   .
Our estimates on μ t   , and the fact that ν ^ t N   is the empirical measure for independent processes, are sufficient to imply good properties of concentration of ν ^ t N   around its mean μ t   , as N   , for each t   . But later on we shall use some estimates about the time-dependent measure (even to obtain a result of concentration for μ ^ t N   with fixed t   ). To get such results, we shall study the time-regularity of ν ^ t N   . Our final goal in this section is the following
Proposition 4.1. With the conventions of Subsection  3.1 , for any T 0   there are constants C = C ( T )   and a = a ( T ) > 0   such that the limit empirical measure  4.1 satisfies Δ [ 0 , T ] , ɛ > 0 , P [ sup t 0 s , t t 0 + Δ W 1 ( ν ^ s N , ν ^ t N ) > ɛ ] exp ( N ( a ɛ 2 C Δ ) ) .  
To prove Proposition  4.1 , we shall use a bit of classical stochastic calculus tools.

4.1 SDE estimates

In this subsection we establish the following estimates of time regularity for the stochastic process Y t   : For all T > 0   , there exist positive constants a   and C   such that, for all s , t , t 0 , Δ [ 0 , T ]   ,
(i) E | Y t Y s | 2 C | t s |
(ii) E | Y t Y s | 4 C | t s | 2
(iii) E [ sup t 0 s t t 0 + Δ exp ( a | Y t Y s | 2 ) ] 1 + C Δ .
  • Proof. We start with (i). We use Itô's formula to write a stochastic equation on the process ( | Y t Y s | 2 ) t s   :
    | Y t Y s | 2 = M s , t + 2 d ( t s ) 2 s t ( V ( Y u ) + W * μ u ( Y u ) ) ( Y u Y s ) d u ,   where M s , t   , viewed as a process depending on t   , is a martingale with zero expectation.
    Hence
    E | Y t Y s | 2 = 2 d ( t s ) 2 s t E ( V ( Y u ) + W * μ u ( Y u ) ) ( Y u Y s ) d u . (4.2)
    On one hand
    E | ( V ( Y u ) + W * μ u ( Y u ) ) ( Y u Y s ) | 2 4 ( E | V ( Y u ) | 2 + E | W * μ u ( Y u ) | 2 ) ( E | Y u | 2 + E | Y s | 2 ) . (4.3)
    On the other hand, by Proposition  3.1 , μ u   has a finite square exponential moment, uniformly bounded for u [ 0 , T ]   . More precisely, there exist α > 0   and M < +   such that e α | x | 2 d μ u ( x ) M   for all u T   . Since by assumption | W ( z ) | L | z |   and | V ( x ) | = O ( e α | x | 2 )   , we deduce sup s u T ( E ( V ( Y u ) + W * μ u ( Y u ) ) ( Y u Y s ) ) < + .   In view of  4.2 , it follows that there exists a constant C = C ( T )   such that E | Y t Y s | 2 ( 2 d + C ) ( t s ) .   This concludes the proof of (i).
    To establish (ii), we perform a very similar computation. For given s   , let Z s , t : = ( | Y t Y s | 4 ) t s   . Another application of Itô's formula yields
    E Z s , t = 4 ( 2 + d ) s t E | Y u Y s | 2 d u 4 s t E | Y u Y s | 2 ( Y u Y s ) ( V ( Y u ) + W * μ u ( Y u ) ) d u .  
    On one hand, from (i), s t E | Y u Y s | 2 d u 2 C s t ( u s ) d s = C ( t s ) 2 .   On the other hand
    s t E | Y u Y s | 2 ( Y u Y s ) ( V ( Y u ) + W * μ u ( Y u ) ) d u ( s t E Z s , u d u ) 3 / 4 ( s t E | V ( Y u ) + W * μ u ( Y u ) | 4 d u ) 1 / 4 (4.4)
    by Hölder's inequality. But again, since the measures μ t   admit a bounded square exponential moment, E | V ( Y u ) + W * μ u ( Y u ) | 4   is bounded on [ 0 , T ]   . We conclude that
    E Z s , t C ( ( t s ) 2 + ( t s ) 1 / 4 ( s t E Z s , u d u ) 3 / 4 ) . (4.5)
    Then, with C   standing again for various constants which are independent of s   and t   , E Z s , u C ( E | Y u | 4 + E | Y s | 4 ) 2 C sup 0 t T | x | 4 d μ u ( x ) C ;   so, from  4.5 , E Z s , t C ( ( t s ) 2 + ( t s ) 1 / 4 ( t s ) 3 / 4 ) C ( t s ) ,   and by  4.5 again we successively obtain E Z s , t C ( t s ) 7 / 4 ,   and finally E Z s , t C ( t s ) 2 .   This concludes the proof of (ii).
    We finally turn to the proof of (iii). Without real loss of generality, we set t 0 = 0   . We shall proceed as in the proof of Proposition  3.1 , and prove the existence of some constant C   and some continuous positive function a   on R +   such that
    E ( sup 0 s t Δ T exp ( a ( t ) | Y t Y s | 2 ) ) 1 + C Δ . (4.6)
    Let a ( t )   be a smooth function, and Z s , t : = e a ( t ) | Y t Y s | 2 .   By Itô's formula,
    Z s , t = 1 + M s , t + s t [ 2 a ( u ) ( d + 2 a | Y u Y s | 2 ( V + W * μ u ) ( Y u ) ( Y u Y s ) ) + a ( u ) | Y u Y s | 2 ] Z s , u d u  
    where M s , t : = s t a ( u ) ( Y u Y s ) Z u d B u .   For each s   , M s , t   , viewed as a stochastic process in t   , is a martingale.
    By Young's inequality, for any b > 0   , 2 ( V + W * μ u ) ( Y u ) ( Y u Y s ) b | Y u Y s | 2 + 1 b | V + W * μ u | 2 ( Y u ) .   So, by letting A u : = a ( u ) [ 2 d + 1 b | V ( Y u ) + W * μ u ( Y u ) | 2 ]   and B ( u ) : = a ( u ) + 4 a 2 ( u ) + b a ( u )   we obtain Z s , t 1 + M s , t + s t [ A u + B ( u ) | Y u Y s | 2 ] Z s , u d u .   We choose a   in such a way that the function B   is identically zero, that is a ( u ) = e b u ( 1 a ( 0 ) + 4 1 e b u b ) 1 ,   where a ( 0 )   is to be fixed later. Then Z s , t 1 + M s , t + s t A u Z s , u d u   from which it is clear that
    E sup s t Δ Z s , t 1 + E sup s t Δ M s , t + s Δ E A u Z s , u d u . (4.7)
    By Cauchy-Schwarz and Doob's inequalities,
    ( E sup s t Δ M s , t ) 2 E | sup s t Δ M s , t | 2 2 sup s t Δ E | M s , t | 2 . (4.8)
    Also, by Itô's formula and the Cauchy-Schwarz inequality again,
    E | M s , t | 2 = s t a ( u ) 2 E | Y u Y s | 2 Z s , u 2 d u 1 2 s t a ( u ) 2 ( E | Y u Y s | 4 ) 1 / 2 ( E Z s , u 4 ) 1 / 2 d u . (4.9)
    In view of (ii), there exists a constant C   such that
    E | Y u Y s | 4 C ( u s ) 2 . (4.10)
    Furthermore,
    E Z s , u 4 = E exp 4 a ( u ) | Y u Y s | 2 ( E exp 16 a ( u ) | Y u | 2 ) 1 / 2 ( E exp 16 a ( u ) | Y s | 2 ) 1 / 2 . (4.11)
    Recall from Proposition  3.1 that there exist constants M   and α > 0   such that sup s u Δ e α | y | 2 d μ u ( y ) M .   If we choose a ( 0 ) α / 16   , the decreasing property of a   will ensure that a ( u ) α / 16   for all u [ 0 , Δ ]   , and E exp 16 a ( u ) | Y u | 2 ( = e 16 a ( u ) | y | 2 d μ u ( y ) ) M .   Then, from  4.11 , sup s u Δ E Z s , u 4 M .   Now, from  4.9 and  4.10 we deduce sup s t Δ E | M s , t | 2 C ( t s ) 2 .   Combining this with  4.8 , we conclude that E sup s t Δ M s , t C Δ .   In the same way, we can prove that E ( A t Z s , t )   is bounded for t [ s , Δ ]   by bounding E Z s , t 2   and E A t 2   . This concludes the proof of  4.6 , and therefore of (iii) above.

4.2 Time-regularity of the limit empirical measure

We are now ready to prove Proposition  4.1 .
On one hand W 1 ( ν ^ s N , ν ^ t N ) 1 N i = 1 N | Y t i Y s i | ,   so
P [ sup 0 s t Δ W 1 ( ν ^ s N , ν ^ t N ) > ɛ ] P [ 1 N i = 1 N V i > ɛ ] (4.12)
where V i : = sup 0 s t Δ | Y t i Y s i | .   By Chebyshev's exponential inequality and the independence of the ( Y t i Y s i )   , P [ 1 N i = 1 N V i > ɛ ] exp ( N sup ζ 0 [ ɛ ζ log E exp ( ζ V 1 ) ] ) .   But, for any given ζ   and ω 0   , E exp ( ζ V 1 ) E exp ( ζ ( ω 2 + ( V 1 ) 2 2 ω ) ) exp ζ ω 2 E exp ζ 2 ω ( V 1 ) 2 .   Let ω = ζ 2 a   , so that ζ 2 ω = a   . Then, from estimate (iii) in Subsection  4.1 , E exp ζ 2 ω ( V 1 ) 2 1 + C Δ ,   uniformly in s   and Δ   . Hence, for any ζ > 0   , E exp ( ζ V 1 ) E exp ζ 2 4 a ( 1 + C Δ ) .   Consequently,
P [ 1 N i = 1 N V i > ɛ ] exp ( N sup ζ 0 [ ɛ ζ ζ 2 4 log ( 1 + C Δ ) ] )
= exp ( N [ a ɛ 2 log ( 1 + C Δ ) ] )
exp ( N [ a ɛ 2 C Δ ] ) .
The proof of Proposition  4.1 follows by  4.12 .

5 Coupling

We now (as is classical) reduce the proof of convergence for μ ^ t N   to a proof of convergence for the empirical measure ν ^ t N   constructed on the auxiliary independent system ( Y t i )   . The final goal of this section is the following estimate.
Proposition 5.1. With the conventions of Subsection  3.1 , W 1 ( μ ^ t N , μ t ) Γ 0 t e α ( t s ) W 1 ( ν ^ s N , μ s ) d s + W 1 ( ν ^ t N , μ t ) ,   where Γ   is defined by  3.1 , and α : = β + 2 min ( γ , 0 )   .
  • Proof. For the sake of simplicity we give a slightly sketchy proof. We couple the stochastic systems ( X t i )   and ( Y t i )   by assuming that (i) X 0 i = Y 0 i   and (ii) both systems are driven by the same Brownian processes B t i   . In particular, for each i { 1 , , N }   , the process X t i Y t i   satisfies the equation
    d ( X t i Y t i ) = ( V ( X t i ) V ( Y t i ) ) d t ( W * μ ^ t N ( X t i ) W * μ t ( Y t i ) ) d t . (5.1)
    From  5.1 we deduce
    1 2 d d t | X t i Y t i | 2 = ( V ( X t i ) V ( Y t i ) ) ( X t i Y t i ) ( W * μ ^ t N ( X t i ) W * μ t ( Y t i ) ) ( X t i Y t i ) . (5.2)
    Our convexity assumption on V   implies ( V ( X t i ) V ( Y t i ) ) ( X t i Y t i ) β | X t i Y t i | 2 ;   so the main issue consists in the treatment of the quantity W * μ ^ t N ( X t i ) W * μ t ( Y t i )   appearing in the right-hand side of  5.2 . There are (at least) two options here. The first one consists in writing
    W * μ ^ t N ( X t i ) W * μ t ( Y t i ) = ( W * μ ^ t N W * μ t ) ( X t i ) + ( W * μ t ( X t i ) W * μ t ( Y t i ) ) ; (5.3)
    while the second one consists in forcing the introduction of ν ^ t N   as follows:
    W * μ ^ t N ( X t i ) W * μ t ( Y t i ) = 1 N j = 1 N [ W ( X t i X t j ) W ( Y t i Y t j ) ] ( W * ν ^ t N W * μ t ) ( Y t i ) . (5.4)
    Both options are interesting and lead to slightly different computations. Since both lines of computations might be useful in other contexts, we shall sketch them one after the other. The second option leads to better bounds, but at the price of more complications (in particular, we shall need to sum over the index i   at an early stage).
    First option: We start as in  5.3 . In view of our assumption on D 2 W   , the Lipschitz norm of W ( X t i )   is bounded by Γ   . Therefore, by the Kantorovich-Rubinstein dual formulation  1.2 , | W * ( μ ^ t N μ t ) ( X t i ) | = | R d W ( X t i y ) d ( μ ^ t N μ t ) ( y ) | Γ W 1 ( μ ^ t N , μ t ) ,   and then our assumptions on V   and W   imply 1 2 d d t | X t i Y t i | 2 ( γ + β ) | X t i Y t i | 2 + Γ W 1 ( μ ^ t N , μ t ) | X t i Y t i | .   In other words, | X t i Y t i |   satisfies the differential inequality d d t | X t i Y t i | + ( β + γ ) | X t i Y t i | Γ W 1 ( μ ^ t N , μ t )   ( X t i   and Y t i   separately are not Lipschitz functions of t   , but their difference is). Hence, by Gronwall's lemma, | X t i Y t i | Γ 0 t e ( β + γ ) ( t s ) W 1 ( μ ^ s N , μ s ) d s .   Now we sum over i   ; by convexity of the distance W 1   and triangular inequality, we obtain
    W 1 ( μ ^ t N , ν ^ t N ) 1 N i = 1 N | X t i Y t i | Γ 0 t e ( β + γ ) ( t s ) W 1 ( μ ^ s N , μ s ) d s
    Γ 0 t e ( β + γ ) ( t s ) [ W 1 ( μ ^ s N , ν ^ s N ) + W 1 ( ν ^ s N , μ s ) ] d s .
    By using Gronwall's lemma again, we deduce W 1 ( μ ^ t N , ν ^ t N ) Γ 0 t e ( β + γ Γ ) ( t s ) W 1 ( ν ^ s N , μ s ) d s .   By applying the triangular inequality for W 1   , we conclude to the validity of Proposition  5.1 , only with α   replaced by the (a priori smaller) quantity β + γ Γ   .
    Second option: Now we start with  5.4 . This time we sum over i   right from the beginning:
    1 2 d d t i = 1 N | X t i Y t i | 2 = i = 1 N ( V ( X t i ) V ( Y t i ) ) ( X t i Y t i ) 1 N i , j = 1 N ( A t i j + B t i j )   where A t i j = ( W ( X t i X t j ) W ( Y t i Y t j ) ) ( X t i Y t i )   and B t i j = ( W ( Y t i Y t j ) W * μ t ( Y t i ) ) ( X t i Y t i ) .   Since W   is an odd function and D 2 W ( x ) γ I   for all x R d   , we have
    A t i j + A t j i = ( W ( X t i X t j ) W ( Y t i Y t j ) ) ( ( X t i X t j ) ( Y t i Y t j ) ) γ | ( X t i X t j ) ( Y t i Y t j ) | 2 ,  
    whence i , j = 1 N A t i j γ 2 i , j = 1 N | ( X t i X t j ) ( Y t i Y t j ) | 2 2 N γ i = 1 N | X t i Y t i | 2   where γ = min ( γ , 0 )   .
    Then j = 1 N B t i j = ( X t i Y t i ) ( W * ν ^ t N ( Y t i ) W * μ t ( Y t i ) ) .   Our assumption on D 2 W   implies that the Lipschitz norm of W ( Y t i )   is bounded by Γ   ; so, by the Kantorovich-Rubinstein dual formulation  1.2 , | W * ( ν ^ t N μ t ) ( Y t i ) | = | R d W ( Y t i y ) d ( ν ^ t N μ t ) ( y ) | Γ W 1 ( ν ^ t N , μ t ) .   Collecting all terms we finally obtain 1 2 d d t i = 1 N | X t i Y t i | 2 ( β + 2 γ ) i = 1 N | X t i Y t i | 2 + Γ i = 1 N | X t i Y t i | W 1 ( ν ^ t N , μ t ) .   Then, since i = 1 N | X t i Y t i | ( N i = 1 N | X t i Y t i | 2 ) 1 / 2   , the function y ( t ) : = ( 1 N i = 1 N | X t i Y t i | 2 ) 1 / 2   satisfies the differential inequality y ( t ) + ( β + 2 γ ) y ( t ) Γ W 1 ( ν ^ t N , μ t ) ,   so that ( 1 N i = 1 N | X t i Y t i | 2 ) 1 / 2 Γ 0 t e ( β + 2 γ ) ( t s ) W 1 ( ν ^ s N , μ s ) d s .   The conclusion follows by triangular inequality again since W 1 ( μ ^ t N , ν ^ t N ) W 2 ( μ ^ t N , ν ^ t N ) ( 1 N i = 1 N | X t i Y t i | 2 ) 1 / 2 .  
Remark 5.2. Not only does the “second option” in the proof lead to better bounds, it also provides an estimate of the distance between μ ^   and ν ^   in the W 2   distance, which is stronger than the W 1   distance. However, we do not take any advantage of this refinement.

6 Conclusion

In this section, we paste together all the estimates established in the previous sections, so as to prove Theorems  1.7 to  1.11 .

6.1 Concentration estimates

We start with the proof of Theorem  1.7 . By C   we shall denote various constants depending on T   , on our assumptions on V   and W   , and also on e α | x | 2 d μ 0 ( x )   , for some α > 0   .
From Proposition  5.1 , sup 0 t T W 1 ( μ ^ t N , μ t ) ( Γ e | α | T + 1 ) sup 0 t T W 1 ( ν ^ t N , μ t ) d s .   In particular, there is a constant C   such that
P [ sup 0 t T W 1 ( μ ^ t N , μ t ) > ɛ ] P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > ɛ ~ ] , ɛ ~ = ɛ C . (6.1)
From Corollary  3.2 and Theorem  1.1 we know that sup 0 t T P [ W 1 ( ν ^ t N , μ t ) > ɛ ~ ] e K N ɛ ~ 2   for all t [ 0 , T ]   , N N 0 max ( ɛ ~ ( d + 2 ) , 1 )   ( d > d   ). The issue now is to “exchange” sup   and P   in this estimate. As we shall see, this is authorized by the continuity estimates on ν ^ t N   and μ t   .
Let Δ > 0   (to be fixed later on), and let M   be the integer part of T / Δ + 1   . We decompose the interval [ 0 , T ]   as [ 0 , T ] = [ 0 , Δ ] [ Δ , 2 Δ ] [ ( M 1 ) Δ , T ] M 1 h = 0 [ h Δ , ( h + 1 ) Δ ] .   Proposition  3.3 guarantees that, if Δ a ɛ ~ 2   for some a   small enough, then
h Δ t ( h + 1 ) Δ W 1 ( μ t , μ h Δ ) ɛ ~ 2 . (6.2)
Then, by triangular inequality and  6.2 , P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > ɛ ~ ]   P [ sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , μ t ) > ɛ ~ ]  
P [ sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , ν h Δ N ) + sup h = 0 , . . . , M 1 W 1 ( ν ^ h Δ N , μ h Δ ) + sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( μ h Δ , μ t ) > ɛ ~ ) ]  
P [ sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , ν ^ h Δ N ) + sup h = 0 , . . . , M 1 W 1 ( ν ^ h Δ N , μ h Δ ) > ɛ ~ 2 ] ,   which can be bounded by P [ sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , ν ^ h Δ N ) > ɛ ~ 4 ] + P [ sup h = 0 , . . . , M 1 W 1 ( ν ^ h Δ N , μ h Δ ) > ɛ ~ 4 ] .   By Corollary  3.2 and Theorem  1.1 , there exist some constants C   and N 0   such that P [ W 1 ( ν ^ h Δ N , μ h Δ ) ɛ ~ 4 ] exp ( C N ɛ ~ 2 )   for all h = 0 , . . . , M 1   , and N N 0 max ( ɛ ~ ( d + 2 ) , 1 )   . Hence
P [ sup h = 0 , . . . , M 1 W 1 ( ν ^ h Δ N , μ h Δ ) > ɛ ~ 4 ] h = 0 M 1 P [ W 1 ( ν ^ h Δ N , μ h Δ ) > ɛ ~ 4 ] M exp ( C N ɛ ~ 2 ) . (6.3)
On the other hand, from Proposition  4.1 we deduce P [ sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , ν ^ h Δ N ) > ɛ ~ 4 ] exp ( N ( a 4 ɛ ~ 2 C Δ ) )   for all h = 0 , . . . , M 1   and ɛ ~ > 0   , so
P [ sup h = 0 , . . . , M 1 sup h Δ t ( h + 1 ) Δ W 1 ( ν ^ t N , ν ^ h Δ N ) > ɛ ~ 4 ] M exp ( N ( a 4 ɛ ~ 2 C Δ ) ) . (6.4)
We can assume that Δ a 8 C ɛ ~ 2   , and M C T / ɛ ~ 2 + 1   ; then we can bound the right-hand side of  6.4 by
M exp ( a 8 N ɛ ~ 2 ) C ( 1 + T ɛ ~ 2 ) exp ( a 8 N ɛ ~ 2 ) (6.5)
From  6.3 and  6.5 we deduce that, for Δ   small enough (depending on ɛ   !),
P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > ɛ ~ ] 2 C ( 1 + T ɛ ~ 2 ) exp ( K N ɛ ~ 2 ) (6.6)
for N N 0 max ( ɛ ~ ( d + 2 ) , 1 )   . So we deduce from  6.6 that P [ sup 0 t T W 1 ( ν ^ t N , μ t ) > ɛ ~ ] exp ( log ( C ( T ɛ ~ 2 + 1 ) ) K N ɛ ~ 2 ) ,   where again C , K   stand for various positive constants, and N max ( N 0 ɛ ( d + 2 ) , 1 )   . This concludes the proof of Theorem  1.7 .

6.2 Uniform in time estimates

Now, we shall focus on the case when β > 0 , β + 2 γ > 0   is positive, and derive Theorem  1.9 by a slightly refined estimate.
Let us start again from the bound W 1 ( μ ^ t N , μ t ) Γ 0 t e α ( t s ) W 1 ( ν ^ s N , μ s ) d s + W 1 ( ν ^ t N , μ t )   where α : = β + 2 min ( γ , 0 )   is positive. Let Δ > 0   (to be fixed later on), and k   be the integer part of t / Δ   . If W 1 ( μ ^ t N , μ t )   is larger than ɛ   , then { either W 1 ( ν ^ t N , μ t ) ɛ 2 or j { 0 , , k } ; j Δ ( j + 1 ) Δ e α ( t s ) W 1 ( ν ^ s N , μ s ) d s ɛ 2 k + 2 j Γ .   Indeed, ( ɛ / 2 ) + j k ( ɛ / 2 k + 2 j ) ɛ   . As a consequence, { either W 1 ( ν ^ t N , μ t ) > ɛ 2 or j { 0 , , k } ; sup j Δ s ( j + 1 ) Δ W 1 ( ν ^ s N , μ s ) > ɛ α e α [ t ( j + 1 ) Δ ] 2 k + 2 j Γ .   Since, for t [ j Δ , ( j + 1 ) Δ ]   , e α [ t ( j + 1 ) Δ ] 2 k + 2 j e α ( k j 1 ) Δ 2 k j + 2 = ( 1 4 e α Δ ) ( e α Δ 2 ) k j ,   we conclude to the existence of a constant C   such that
P [ W 1 ( μ ^ t N , μ t ) > ɛ ] P [ W 1 ( ν ^ t N , μ t ) > ɛ 2 ] + j = 0 k P [ sup j Δ s ( j + 1 ) Δ W 1 ( ν ^ s N , μ s ) > C ɛ ( e α Δ 2 ) k j ] . (6.7)
We already know that the first term in the right-hand side in  6.7 is bounded by e λ N ɛ 2   for some constant λ > 0   , and so we focus on the other terms.
In the proof of Theorem  1.7 , we have established that there are constant C   and λ   , depending on Δ   and on bounds on square exponential moments for μ 0   , such that
P [ sup 0 s Δ W 1 ( ν ^ s N , μ s ) > δ ] C ( 1 + Δ δ 2 ) e λ N δ 2 . (6.8)
Proposition  3.1 guarantees that these square exponential bounds also hold true for μ t   , uniformly in t   . Thus we can apply  6.8 with μ j Δ   taken as initial datum, and get
P [ sup j Δ s ( j + 1 ) Δ W 1 ( ν ^ s N , μ s ) > δ ] C e λ N δ 2 , (6.9)
as soon as N N 0 max ( δ ( d + 2 ) , 1 )   .
We now use  6.9 to bound the sum appearing in the right-hand side of  6.7 . Choose Δ   large enough that θ : = e α Δ 2 > 1 .   Applying  6.9 with δ   replaced by C θ j k ɛ   , we can bound the sum in the right-hand side of  6.7 by C j = 0 k exp ( K θ 2 ( k j ) N ɛ 2 )   for N N 0 max ( ɛ ( d + 2 ) , 1 )   , where C   , K   and N 0   are again positive constants. Since again θ   is larger than 1, there is a constant a > 0   such that θ 2 ( k j ) a ( k j )   , so the sum above is bounded by C ( e K N ɛ 2 + = 1 e K N ɛ 2 ) C ( e K N ɛ 2 + e K N ɛ 2 1 e K N ɛ 2 ) .   If N 0   is large enough, our assumption N N 0 max ( ɛ ( d + 2 ) , 1 )   implies that e K N ɛ 2   is always less than 1 / 2   , so that the above sum can be bounded by just C e K N ɛ 2   . This concludes the proof of the first point of Theorem  1.9 .
The second point is proved by writing
W 1 ( μ ^ t N , μ ) W 1 ( μ ^ t N , μ t ) + W 1 ( μ t , μ )
W 1 ( μ ^ t N , μ t ) + C e λ t
successively by the triangular inequality for Wasserstein distance and use of Proposition  3.8 . Then the result follows from the uniform estimate obtained above.

6.3 Data reconstruction

We finally consider Theorem  1.11 . Proposition  3.6 ensures that, as t   , f t   is uniformly bounded in C k   , where k   is arbitrarily large. Since f t   converges to f   as t   , we deduce that f   is Lipschitz. Then Theorem  1.9 and Proposition  2.1 together imply Theorem  1.11 .

A Metric entropy of a probability space

We now prove the covering result used in Section  2.1 , as a particular case of a more general estimate. Let E   be a Polish space, we look for an upper bound on the number N p ( E , δ ) : = m ( P ( E ) , δ )   of balls of radius δ   in Wasserstein distance W p   needed to cover the space P ( E )   of probability measures on E   . We use the same strategy as in [12,Exercise 6.2.19, where the Lévy distance is used instead of the Wasserstein distance.
Theorem A.1. Let ( E , d )   be a Polish space with finite diameter D   . For any r > 0   , define N ( E , r )   as the minimal number of balls needed to cover E   by balls of radius r   . Then there exists a numerical constant C   such that for all p 1   and δ ( 0 , D )   , the space P ( E )   can be covered by N p ( E , δ )   balls of radius δ   in W p   distance, with
N p ( E , δ ) ( C D δ ) p N ( E , δ 2 ) . (A.1)
Remark A.2. The W p   distance between any two probability measures on E   is at most D   , so, for all δ D   , we have the trivial estimate N p ( E , δ ) = 1   .
  • Proof. Let r > 0   , and let { x j } 1 j N ( E , r )   be such that E   is covered by the balls B ( x j , r )   with centers x j E   and radius r   . For simplicity we shall write N = N ( E , r )   .
    In a first step we prove that for any μ P ( E )   there exist nonnegative real numbers ( β j ) 1 j N   , with j = 1 N β j = 1   , such that W p ( μ , μ ~ ) r , μ ~ : = j = 1 N β j δ x j .   For this we first replace the balls B ( x j , r )   's by the sets B ~ j   's defined by j , B ~ j = B ( x j , r ) \ k j 1 B ( x k , r ) ,   so that E   is partitioned into the B ~ j   's. Next define β j = μ [ B ~ j ] .   It is easy to check that the required properties are fulfilled. Indeed, we may transport μ   onto μ ~ = j = 1 N β j δ x j   by sending all x   's in B ~ j   onto x j   , for each j = 1 , . . . , N   : the cost of this transport is bounded by j = 1 N r p μ ( B ~ j ) = r p   . In the second step we introduce an integer K   (whose value will be made more precise later on), and consider the set C K : = { j = 1 N α j δ x j ; ( α j ) 1 j N A K } P ( E ) ,   where A K   is the set of all N   -tuples ( α j ) 1 j N   , such that each α j   is of the form k j / K   , k j N   , and j = 1 N α j = 1   .
    Given a probability measure μ ~ = i = 1 N β i δ x i   (where ( β i ) i   does not necessarily belong to A K   ), there exists μ   in C K   such that
    W p ( μ , μ ~ ) D ( N K ) 1 / p . (A.2)
    To prove  A.2 , we define n j   as the integer part [ K β j ]   of K β j   and J   as the first integer such that j = 1 J ( n j + 1 ) + j = J + 1 N n j = K .   Since j = 1 N β j = 1   , it is clear that J N   . Then we define a measure μ C K   by μ = j = 1 N α j δ x j   , where α j = { n j + 1 K for j = 1 , . . . , J n j K for j = J + 1 , . . . , N .   Let us bound the distance between μ   and μ   . For that we gradually define a transport plan between μ ~   and μ   in the following way: first of all, at each point x i   , the mass n i / K   stays in place. Then, the remaining masses β i n i / K   are redistributed as follows: all the remaining mass at x 1 , , x   is brought to x 1   , together with possibly a bit of mass at x + 1   , until a total mass 1 / K   has been added at location x 1   (for   large enough). If J 2   , then we again bring mass from x + 1 ,   , until another mass 1 / K   has been added at x 2   . We carry on until all the mass at x J   has been used, thus building a transport plan ( π i j ) 1 i , j N   which sends μ ~   onto μ   , in such a way that π i i n i K   for all i   . Hence, j i π i j β i π i i = β i n i K 1 K ,   and this plan yields an upper bound on the Wasserstein distance:
    W p p ( μ ~ , μ ) i , j = 1 N d ( x i , x j ) p π i j = i = 1 N j i d ( x i , x j ) p π i j N D p K .   To summarize the first two steps: for any μ   in P ( E )   there exists μ C K   such that W p ( μ , μ ) r + D ( N K ) 1 / p .   In other words, the family ( B ( μ , r + D ( N / K ) 1 / p ) ) μ C K   covers P ( E )   .
    In the third step we choose some suitable K   and r   for a given δ   .
    We first choose K   in such a way that r   and D ( N / K ) 1 / p   have the same order of magnitude, for instance K = [ N ( D r ) p ] + 1 .   Then r + D ( N / K ) 1 / p 2 r ,   and the balls B ( μ , r + D ( N / K ) 1 / p )   have radius at most δ   if r = δ 2 .   Now K   and r   are fixed, N = N ( E , δ / 2 )   , and we just have to estimate the cardinality C K   of C K   . For this we first note that C K = ( K + N 1 ) ! ( K 1 ) ! N ! = ( K + N 1 ) . . . K N !   Without loss of generality, we have assumed δ < D   , so K > N   . Then K < < K + N 1 < 2 K   , and hence C K ( 2 K ) N N ! ( 2 K e N ) N .   Since N 1   and 2 D δ   , we can write K N ( 2 D δ ) p + 1 2 N ( 2 D δ ) p ,   and we deduce C K ( C D δ ) p N ( E , δ 2 )   with C = 2 ( 4 e ) 1 / p 8 e   .
    Consequently, we have covered P ( E )   by the ( C D δ ) p N ( E , δ 2 )   balls ( B ( μ , δ ) ) μ C K   with radius δ   . This concludes the argument.
In the particular case when E   is the Euclidean ball B R   of radius R   in R d   , we have
N ( B R , r ) k ( R r ) d (A.3)
for some constant k   . To see this, one may for instance consider the balls with center in the lattice r d Z d   in R d   . Then Theorem  A.1 yields the bound N p ( B R , δ ) ( C R δ ) p k ( R δ ) d ,   which is used in the present paper.

B Regularity estimates on the limit PDE

In this appendix we study solutions to the limit equation
t ρ = Δ ρ + . ( ρ ( V + W * ρ ) ) , t 0 , x R d (B.1)
and establish the regularity results stated in Proposition  3.6 . Following the method in [13, we shall measure the regularity in terms of L 2   -Sobolev spaces H s ( R d ) = { u L 2 ( R d ) ; α u L 2 ( R d ) , α N d , | α | s } ( s N ) .   Our main result is as follows.
Theorem B.1. Let V   and W   such that all their partial derivatives α V   and α W   are continuous and grow at most polynomially at infinity, for any multi-index α N d   with | α | s + 1   . Let a , E > 0   and let ρ 0   be a probability density such that R d e a | x | 2 d ρ 0 ( x ) E .   Then, there exists a continuous function f : ( 0 , + ) ( 0 , + )   , only depending on d   , s   , V   , W   , a   and E   , such that any classical solution ρ = ρ ( t , x )   to  B.1 , starting from ρ 0   , satisfies ρ ( t , ) H s ( R d ) f ( t ) .  
  • Proof. For the sake of simplicity we only give a formal proof, which can be turned rigorous by means of regularization arguments.
    Let then ρ = ( ρ ( t , . ) ) t 0   be a solution of t ρ = Δ ρ + . ( ρ ( V + W * ρ ) ) , t 0 , x R d ;   we rewrite the equation as t ρ = i = 1 d i i ρ + i [ ρ i φ ] ,   where i = e i   if e i   is the i   -th vector of the canonical base of R d   , and φ ( t , x ) = V ( x ) + W * ρ ( t , x ) .   Let α N d   be given. By integration by parts and Cauchy-Schwarz inequality,
    1 2 d d t R d | α ρ | 2 = R d α ρ t ( α ρ ) = R d α ρ α ( t ρ ) = i = 1 d R d α ρ α ( i i ρ + i [ ρ i φ ] ) = i = 1 d R d | α + e i ρ | 2 + α + e i ρ α [ ρ i φ ] i = 1 d R d | α + e i ρ | 2 + i d [ R d | α + e i ρ | 2 ] 1 / 2 [ β α C α , β R d | α β + e i φ β ρ | 2 ] 1 / 2 1 2 i = 1 d R d | α + e i ρ | 2 + β α C α , β R d | α β + e i φ β ρ | 2 .  
    By summing over α N d   with | α | s   , we find d d t | α | s R d | α ρ | 2 | α | s i = 1 d R d | α + e i ρ | 2 + | α | s β α C α , β R d | α β + e i φ β ρ | 2 .   Given T > 0   , by Proposition  3.1 there exist constants a ^   and E ^   , depending only on d   , a   , E   and T   , such that
    e a ^ | x | 2 d ρ ( t , x ) E ^ (B.2)
    for all t [ 0 , T ]   . In particular, it follows from our assumptions on the derivatives of V   and W   that all | α β + e i φ | 2   terms are bounded by some polynomial in | x |   , uniformly in t [ 0 , T ]   .
    Let x : = 1 + | x | 2   . For k , s 0   , we introduce the weighted norms u H k s : = ( | α | s R d x k | α u ( x ) | 2 d x ) 1 / 2   and u L k 1 : = R d x k | u ( x ) | d x .   Then for any s N   and T 0   there exist k   and C 0   such that
    0 t T d d t u H s 2 u H s + 1 2 + C u H k s 2 . (B.3)
    We shall prove later on the following interpolation lemma:
    Lemma B.2. Given d 1   , s N   an k 0   , there exist nonnegative constants C ( d , s , k )   and h ( d , s , k )   , and θ ( d , s ) ( 0 , 1 )   such that for all u L 1 ( R d ) H s + 1 ( R d )   , u H k s C ( d , s , k ) u L h ( d , s , k ) 1 1 θ ( d , s ) u H s + 1 θ ( d , s ) .  
    Then, again from  B.2 , all u L h ( d , s , k ) 1 ( t )   norms are bounded on [ 0 , T ]   , so from  B.3 and Lemma  B.2 there exists some constants C   such that d d t u H s 2 u H s + 1 2 + C u H s + 1 2 θ 1 2 u H s + 1 2 + C C u H s 2 / θ + C .   In other words A ( t ) = u H s 2 ( t )   satisfies on [ 0 , T ]   the differential inequality
    A ( t ) + c A ( t ) p C (B.4)
    for some constants c , C 0   and p = 1 / θ > 1   depending only on d   , a   , E   , s   and T   .
    Let us distinguish two cases. If A ( 0 ) 1   , then we only use the inequality A ( t ) C   to make sure that A ( t ) A ( 0 ) + C t 1 + C T   for any t [ 0 , T ]   .
    If on the other hand A ( 0 ) 1   , we deduce from  B.4 that A ( t ) + c A ( t ) p C A ( t ) ,   as long as A ( t ) 1   , so that D ( t ) : = A ( t ) 1 p   satisfies the inequality D ( t ) + ( p 1 ) C D ( t ) ( p 1 ) c   which integrates to D ( t ) D ( 0 ) e ( 1 p ) C t + c C ( 1 e ( p 1 ) C t ) c C ( 1 e ( p 1 ) t ) .   As a consequence, as long as A ( t ) 1   , we have A ( t ) ( c / C ) 1 / 1 p ( 1 e ( p 1 ) t ) 1 / ( 1 p ) .   In the end, we have obtained an a priori bound on A ( t ) = | α ρ | 2 ( t )   for t ( 0 , T ]   , depending only on d , s , a , E   and T   , but not on the initial value A ( 0 )   . Then the proof can be concluded by an approximation argument.
  • Proof of Lemma  B.2 . We proceed by induction on s   .
    In the first step we prove the result for s = 0   . Given d 1   and a ( 0 , 1 ]   , we write R d x k | u ( x ) | 2 d x = R d x k | u ( x ) | a | u ( x ) | 2 a d x ,   so, by Hölder's inequality, u L k 2 2 u L k a 1 a u L 2 a 1 a 2 a   (with 2 a 1 a =   if a = 1   ). Then by Sobolev embedding, u L k 2 2 C ( d , a ) u L k a 1 a u H 1 2 a ,   where a = 1   if d = 1   , a   is arbitrary in ( 0 , 1 )   if d = 2   , and a = 4 d + 2   if d 3   , that is, u L k 2 C ( d ) u L k a 1 1 θ ( d ) u H 1 θ ( d )   where θ ( 1 ) = 1 2   , any θ ( 2 ) ( 1 2 , 1 )   for d = 2   , and θ ( d ) = d d + 2   for d 3   .
    In the second step we let s 1   and assume by induction that there exist some constants C ( d , s 1 , k ) , h ( d , s 1 , k ) 0   and θ ( d , s 1 ) ( 0 , 1 )   such that for all u L 1 ( R d ) H s ( R d )   :
    u H k s 1 C ( d , s 1 , k ) u L h ( d , s 1 , k ) 1 1 θ ( d , s 1 ) u H s θ ( d , s 1 ) .   Let then u L 1 ( R d ) H s + 1 ( R d )   .
    Given α N d   with | α | = j   and 1 j s   , we split α   into α = α 1 + α 2   with | α 2 | = 1   , and integrate by parts:
    α u L k 2 2 k α 1 u L 2 k 2 2 α u L 2 + α 1 u L 2 k 2 α + α 2 u L 2 ( k + 1 ) α 1 u L 2 k 2 sup | α | j + 1 α u L 2 ,  
    whence
    sup | α | = j α u L k 2 2 ( k + 1 ) sup | α | = j 1 α u L 2 k 2 sup | α | j + 1 α u L 2 ( k + 1 ) sup | α | s 1 α u L 2 k 2 sup | α | s + 1 α u L 2 .  
    Since this holds for any 1 j s   we obtain sup 1 | α | s α u L k 2 2 ( k + 1 ) sup | α | s 1 α u L 2 k 2 sup | α | s + 1 α u L 2 .   Moreover u L k 2 2 u L 2 k 2 u L 2 sup | α | s 1 α u L 2 k 2 sup | α | s + 1 α u L 2 ,   so that finally u H k s 2 ( k + 1 ) u H 2 k s 1 u H s + 1 .   Then, by induction hypothesis, u H k s 2 ( k + 1 ) C ( d , s 1 , 2 k ) u L h ( d , s 1 , 2 k ) 1 1 θ ( d , s 1 ) u H s θ ( d , s 1 ) u H s + 1 ,   whence u H k s C ( d , k , s ) u L h ( d , s , k ) 1 1 θ ( d , s ) u H s + 1 θ ( d , s )   where θ ( d , s ) = 1 2 θ ( d , s 1 ) ( 0 , 1 )   and h ( d , s , k ) = h ( d , s 1 , 2 k ) 0   . This concludes the argument.
Acknowledgments: The authors thank M. Ledoux for his relevant comments and his interest during the preparation of this work, as well as providing Reference [16.
References

  1. Araujo, A. and Gine, E. The central limit theorem for real and Banach valued random variables John Wiley & Sons, New York, (1980).
  2. Benachour, S., Roynette, B., Talay, D., and Vallois, P. Nonlinear self-stabilizing processes. I. Existence, invariant probability, propagation of chaos. Stochastic Process. Appl. 75, 2 (1998), 173–201.
  3. Benachour, S., Roynette, B., and Vallois, P. Nonlinear self-stabilizing processes. II. Convergence to invariant probability. Stochastic Process. Appl. 75, 2 (1998), 203–224.
  4. Benedetto, D., Caglioti, E., Carrillo, J. A., and Pulvirenti, M. A non-Maxwellian steady distribution for one-dimensional granular media. J. Statist. Phys. 91, 5-6 (1998), 979–990.
  5. Benedetto, D., Caglioti, E., and Pulvirenti, M. A kinetic equation for granular media. RAIRO Modél. Math. Anal. Numér. 31, 5 (1997), 615–641.
  6. Bobkov, S., Gentil, I., and Ledoux, M. Hypercontractivity of Hamilton-Jacobi equations. J. Math. Pures Appl. 80, 7 (2001), 669–696.
  7. Bobkov, S., and Gotze, F. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163 (1999), 1–28.
  8. Bolley, F., and Villani, C. Weighted Csiszár-Kullback-Pinsker inequalities and applications to transportation inequalities. To appear in Ann. Fac. Sci. Toulouse. Available online via http://www.umpa.ens-lyon.fr/~cvillani/cv.html#publicationlist, 2004.
  9. Carrillo, J. A., McCann, R. J., and Villani, C. Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoamericana 19, 3 (2003), 971–1018.
  10. Carrillo, J. A., McCann, R. J., and Villani, C. Contractions in the 2-Wasserstein length space and thermalization of granular media. Preprint, 2004.
  11. Cattiaux, P., and Guillin, A. Talagrand's like quadratic transportation cost inequalities. Available online via http://www.ceremade.dauphine.fr/~guillin/index3.html. Preprint, 2004.
  12. Dembo, A., and Zeitouni, O. Large Deviations Techniques And Applications, second ed. Springer Verlag, New York, 1998.
  13. Desvillettes, L., and Villani, C. On the spatially homogeneous Landau equation for hard potentials. I. Existence, uniqueness and smoothness. Comm. Partial Differential Equations 25, 1-2 (2000), 179–259.
  14. Djellout, H., Guillin, A., and Wu, L. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. Ann. Probab. 32, 3B (2004), 2702–2732.
  15. Gao, F. Moderate deviations and large deviations for kernel density estimators. J. Theor. Prob., 16 (2003), 401–418.
  16. Gine, E. and Zinn, J. Empirical processes indexed by Lipschitz functions Ann. Probab.14 , 4 (1986), 1329–1338.
  17. Ledoux, M. The concentration of measure phenomenon, vol. 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, 2001.
  18. Ledoux, M. and Talagrand, M., Probability in Banach spaces. Springer-Verlag, Berlin, 1991.
  19. Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDE's. Stochastic Process. Appl. 95, 1 (2001), 109–132.
  20. Malrieu, F. Convergence to equilibrium for granular media equations and their Euler schemes. Ann. Appl. Probab. 13, 2 (2003), 540–560.
  21. Marchioro, C., and Pulvirenti, M. Mathematical theory of incompressible nonviscuous fluids. Springer-Verlag, New York, 1994.
  22. Massart, P. Saint-Flour Lecture Notes. Available at http://www.math.u-psud.fr/~massart, 2003.
  23. Otto, F., and Villani, C. Generalization of an inequality by Talagrand, and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173 (2000), 361–400.
  24. Petrov, V. V. Limit theorems of probability theory. The Clarendon Press Oxford University Press, New York, 1995.
  25. Schochet, S. The point-vortex method for periodic weak solutions of the 2-D Euler equations. Comm. Pure Appl. Math. 49, 9 (1996), 911–965.
  26. Sion, M. On general minimax theorems. Pac. J. Math. 8 (1958), 171–176.
  27. Sznitman, A.-S. Topics in propagation of chaos. In École d'Été de Probabilités de Saint-Flour XIX—1989, vol. 1464 of Lecture Notes in Math. Springer, Berlin, 1991.
  28. Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6 (1996), 587–600.
  29. Villani, C. Topics in optimal transportation. Grad. Stud. Math. (58), American Mathematical Society, Providence, 2003.
  30. Wang, F.-Y. Probability distance inequalities on Riemannian manifolds and path spaces. J. Funct. Anal. 206, 1 (2004), 167–190.

ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07 E-mail address : fbolley@umpa.ens-lyon.fr CEREMADE, Universite Paris Dauphine E-mail address : guillin@ceremade.dauphine.fr ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07 E-mail address : cvillani@umpa.ens-lyon.fr