
Introduction

Large stochastic particle systems constitute a popular way to perform numerical simulations in many contexts, either because they are used in some physical model (as in e.g. stellar or granular media) or as an approximation of a continuous model (as in e.g. vortex simulation for the Euler equation, see [21, Chapter 5] for instance). For such systems one may wish to establish concentration estimates showing that the behavior of the system is sharply stabilized as the number

N

of particles goes to infinity. It is natural to search for these estimates in the setting of large (or moderate) deviations, since one wishes to make sure that the numerical method has a very small probability to give wrong results. From a physical perspective, concentration estimates may be useful to establish the validity of a continuous approximation such as a mean-field limit.

When one is interested in the asymptotic behavior of just one, or a few observables (such as the mean position...), there are efficient methods, based for instance on concentration of measure theory. As a good example, Malrieu [19] recently applied tools from the fields of Logarithmic Sobolev inequalities, optimal transportation and concentration of measure, to prove very neat bounds like

\begin{matrix} {sup}_{∥ φ ∥_{L i p} \leq 1} P [| \frac{1}{N} \sum_{i = 1}^{N} φ (X_{t}^{i}) - \int φ d μ_{t} | > ɛ] \leq 2 e^{- λ N ɛ^{2}} . \end{matrix}

(0.1)

Here

(X_{t}^{i})_{1 \leq i \leq N}

stand for the positions of particles (in phase space) at time

t

,

ɛ

is a given error,

P

stands for the probability,

μ_{t}

is a probability measure governing the limit behavior of the system, and

λ > 0

is a positive constant depending on the particular system he is considering (a simple instance of McKean-Vlasov model used in particular in the modelling of granular media). Moreover,

∥ φ ∥_{L i p} : = {sup}_{x \neq y} \frac{| φ (x) - φ (y) |}{d (x, y)},

where

d

is the distance in phase space (say the Euclidean norm

| \cdot |

in

R^{d}

).

This approach can lead to nice bounds, but has the drawback of being limited to a finite number of observables. Of course, one may apply (0.1) to many functions

φ

, and obtain something like

\begin{matrix} P [\sum_{k = 1}^{\infty} \frac{1}{k^{2}} | \frac{1}{N} \sum_{i = 1}^{N} φ_{k} (X_{t}^{i}) - \int φ_{k} d μ_{t} | > ɛ] \leq C e^{- N λ ɛ^{2}}, \end{matrix}

(0.2)

where

(φ_{k})_{k \in N}

is an arbitrarily chosen dense family in the set of all

1

-Lipschitz functions converging to 0 at infinity. If we denote by

δ_{x}

the Dirac mass at point

x

, and by

{\hat{μ}}_{t}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{X_{t}^{i}}

the empirical measure associated with the system (this is a random probability measure), then estimate 0.2 can be interpreted as a bound on how close

{\hat{μ}}_{t}^{N}

is to

μ_{t}

. Indeed,

\begin{matrix} d (μ, ν) : = \sum_{k = 1}^{\infty} \frac{1}{k^{2}} | \int φ_{k} d (μ - ν) | \end{matrix}

(0.3)

defines a distance on probability measures, associated with a topology which is at least as strong as the weak convergence of measures (convergence against bounded continuous test functions). However, this point of view is deceiving: for practical purposes, the distance

d

can hardly be estimated, and in any case 0.2 does not contain more information than 0.1 :

it is only useful if one considers a finite number of observables.

Sanov's large deviation principle [12, Theorem 6.2.10] provides a more satisfactory tool to estimate the distance between the empirical measure and its limit. Roughly speaking, it implies, for independent variables

X_{t}^{i}

, an estimate of the form

P [d i s t ({\hat{μ}}_{t}^{N}, μ) \geq ɛ] ≃ e^{- N α (ɛ)} as N \to \infty,

where

\begin{matrix} α (ɛ) : = inf {H (ν | μ); d i s t (ν, μ) \geq ɛ} \end{matrix}

(0.4)

and

H

is the relative

H

functional:

H (ν | μ) = \int \frac{d ν}{d μ} log \frac{d ν}{d μ} d μ

(to be interpreted as

+ \infty

if

ν

is not absolutely continuous with respect to

μ

). Since

H

behaves in many ways like a square distance, one can hope that

α (ɛ) \geq c o n s t . ɛ^{2}

. Here “

d i s t

” may be any distance which is continuous with respect to the weak topology, a condition which might cause trouble on a non-compact phase space.

Yet Sanov's theorem is not the final answer either: it is actually asymptotic, and only implies a bound like

limsup \frac{1}{N} log P [d i s t ({\hat{μ}}_{t}^{N}, μ) \geq ɛ] \leq - α (ɛ),

which, unlike 0.1 , does not contain any explicit estimate for a given

N

. Fortunately, there are known techniques to obtain quantitative upper bounds for such theorems, see in particular [12,Exercise 4.5.5] . Since these techniques are devised for compact phase spaces, a further truncation will be necessary to treat more general situations.

In this paper, we shall show how to combine these ideas with recent results about measure concentration and transportation distances, in order to derive in a systematic way estimates that are explicit, deal with the empirical measure as a whole, apply to non-compact phase spaces, and can be used to study some particle systems arising in practical problems.

Typical estimates will be of the form

\begin{matrix} P [{sup}_{∥ φ ∥_{L i p} \leq 1} (\frac{1}{N} \sum_{i = 1}^{N} φ (X_{t}^{i}) - \int φ d μ_{t}) > ɛ] \leq C e^{- λ N ɛ^{2}} . \end{matrix}

(0.5)

As a price to pay, the constant

C

in the right-hand side will be much larger than the one in 0.1 .

Here is a possible application of 0.5 in a numerical perspective. Suppose your system has a limit invariant measure

μ_{\infty} = {lim}_{t \to \infty} μ_{t}

, and you wish to numerically plot its density

f_{\infty}

. For that, you run your particle simulation for a long time

t = T

, and plot, say,

\begin{matrix} {\tilde{f}}_{t} (x) : = \frac{1}{N} \sum_{i = 1}^{N} ζ_{α} (x - X_{t}^{i}), \end{matrix}

(0.6)

where

ζ_{α} (x) = α^{- d} ζ (x / α)

is a smooth approximation of a Dirac mass as

α \to 0

(as usual,

ζ

is a nonnegative smooth radial function on

R^{d}

with compact support and unit integral).
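The mollified empirical measure (0.6) can be sketched numerically as follows. This is only an illustration under stated assumptions: the dimension is taken to be d = 1, the mollifier ζ is chosen to be the standard bump exp(-1/(1-u^2)) normalized by a Riemann sum, and the names `bump` and `mollified_density` are hypothetical helpers that do not come from the paper.

```python
import numpy as np

def bump(u):
    """Unnormalized smooth bump exp(-1/(1-u^2)), supported on (-1, 1)."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

# Normalizing constant so that zeta = bump / _Z has unit integral.
# A plain Riemann sum is enough here because the bump vanishes at both endpoints.
_grid = np.linspace(-1.0, 1.0, 20001)
_Z = float(np.sum(bump(_grid)) * (_grid[1] - _grid[0]))

def mollified_density(x, particles, alpha):
    """Evaluate (1/N) * sum_i zeta_alpha(x - X^i) in dimension d = 1,
    where zeta_alpha(y) = zeta(y / alpha) / alpha (assumed choice of mollifier)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    particles = np.asarray(particles, dtype=float)
    scaled = (x[:, None] - particles[None, :]) / alpha
    return bump(scaled).sum(axis=1) / (_Z * alpha * particles.size)
```

For instance, `mollified_density(np.linspace(-3, 3, 601), samples, 0.2)` returns the values of the smoothed density on a grid, for any one-dimensional array `samples` of particle positions.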

With the help of estimates such as 0.5 , it is often possible to compute bounds on, say,

P [∥ {\tilde{f}}_{T} - f_{\infty} ∥_{L^{\infty}} > ɛ]

in terms of

N

,

ɛ

,

T

and

α

. In this way one can “guarantee” that all details of the invariant measure are captured by the stochastic system. While this problem is too general to be treated abstractly, we shall show on some concrete model examples how to derive such bounds for the same kind of systems as those considered by Malrieu. In the next section, we shall explain our main tools and results; the rest of the paper will be devoted to the proofs. Some auxiliary estimates of general interest are postponed to the Appendix.

1 Tools and main results

1.1 Wasserstein distances

To measure distances between probability measures, we shall use transportation distances, also called Wasserstein distances. They can be defined in an abstract Polish space

X

as follows: given

p

in

[1, + \infty)

,

d

a lower semi-continuous distance on

X

, and

μ

and

ν

two Borel probability measures on

X

, the Wasserstein distance of order

p

between

μ

and

ν

is

W_{p} (μ, ν) : = {inf}_{π \in Π (μ, ν)} {(\int \int d (x, y)^{p} d π (x, y))}^{1 / p}

where

π

runs over the set

Π (μ, ν)

of all joint probability measures on the product space

X \times X

with marginals

μ

and

ν

; it is easy to check [29, Theorem 7.3] that

W_{p}

is a distance on the set

P_{p} (X)

of Borel probability measures

μ

on

X

such that

\int d (x_{0}, x)^{p} d μ (x) < + \infty

for some (and thus any)

x_{0} \in X

.
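As a concrete instance of this definition, here is a minimal sketch for the special case of two empirical measures on the real line with the same number of atoms. In that case (an assumption of the sketch, not the general setting of the paper) the monotone coupling that pairs sorted samples is optimal for every p ≥ 1, so no optimization over couplings is needed. The function name is hypothetical.

```python
import numpy as np

def wasserstein_1d_equal_samples(xs, ys, p=1.0):
    """W_p between (1/N) sum delta_{x_i} and (1/N) sum delta_{y_i} on the real line.

    Assumes both samples have the same size N; the optimal coupling then
    matches the k-th smallest x with the k-th smallest y.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch requires samples of equal size")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

A translation of all atoms by a constant c gives distance |c| for every p, which is a quick sanity check of the formula.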

For this choice of distance, in view of Sanov's theorem, a very natural class of inequalities is the family of so-called transportation inequalities, or Talagrand inequalities (see [17] for instance): by definition, given

p \geq 1

and

λ > 0

, a probability measure

μ

on

X

satisfies

T_{p} (λ)

if the inequality

W_{p} (ν, μ) \leq \sqrt{\frac{2}{λ} H (ν | μ)}

holds for any probability measure

ν

. We shall say that

μ

satisfies a

T_{p}

inequality if it satisfies

T_{p} (λ)

for some

λ > 0

. By Jensen's inequality, these inequalities become stronger as

p

becomes larger; so the weakest of all is

T_{1}

. Some variants introduced in [8] will also be considered.

Of course

T_{p}

is not a very explicit condition, and a priori it is not clear how to check that a given probability measure satisfies it. It has been proven [7, 14, 8] that

T_{1}

is equivalent to the existence of a square-exponential moment: in other words, a reference measure

μ

satisfies

T_{1}

if and only if there is

α > 0

such that

\int e^{α d (x, y)^{2}} d μ (x) < + \infty

for some (and thus any)

y \in X

. If that condition is satisfied, then one can find explicitly some

λ

such that

T_{1} (λ)

holds true: see for instance [8] .

This criterion makes

T_{1}

a rather convenient inequality to use. Another popular inequality is

T_{2}

, which appears naturally in many situations where a lot of structure is available, and which has good tensorization properties in many dimensions. Up to now,

T_{2}

inequalities have not been so well characterized: it is known that they are implied by a Logarithmic Sobolev inequality [23, 6, 30] , and that they imply a Poincaré, or spectral gap, inequality [23, 6] .

See [11] for an attempt at a criterion for

T_{2}

. In any case, contrary to the case

p = 1

, there is no hope to obtain

T_{2}

inequalities from just integrability or decay estimates.

In this paper, we shall mainly focus on the case

p = 1

, which is much more flexible.

1.2 Metric entropy

When

X

is a compact space, the minimum number

m (X, r)

of balls of radius

r

needed to cover

X

is called the metric entropy of

X

. This quantity plays an important role in quantitative variants of Sanov's Theorem [12, Exercise 4.5.5] . In the present paper, to fix ideas we shall always be working in the particular Euclidean space

R^{d}

, which of course is not compact; and we shall reduce to the compact case by truncating everything to balls of finite radius

R

. This particular choice will influence the results through the function

m (P_{p} (B_{R}), r)

, where

B_{R}

is the ball of radius

R

centered at some point, say the origin, and

P_{p} (B_{R})

is the space of probability measures on

B_{R}

, metrized by

W_{p}

.

1.3 Sanov-type theorems

The core of our estimates is based on variants of Sanov's Theorem, all dealing with independent random variables. Let

μ

be a given probability measure on

R^{d}

, and let

(X^{i})_{i = 1, . . ., N}

be a sample of independent variables, all distributed according to

μ

; let also

{\hat{μ}}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{X^{i}}

be the associated empirical measure. In our first main result we assume a

T_{p}

inequality for the measure

μ

, and deduce from that an upper bound in

W_{p}

distance:

Theorem 1.1. Let

p \in [1, 2]

and let

μ

be a probability measure on

R^{d}

satisfying a

T_{p} (λ)

inequality. Then, for any

d^{'} > d

and

λ^{'} < λ

, there exists some constant

N_{0}

, depending on

λ^{'}, d^{'}

and some square-exponential moment of

μ

, such that for any

ɛ > 0

and

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1)

,

\begin{matrix} P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq e^{- γ_{p} \frac{λ^{'}}{2} N ɛ^{2}}, \end{matrix}

(1.1)

where

γ_{p} = \begin{cases} 1 & \text{if } 1 \leq p < 2 \\ 3 - 2 \sqrt{2} & \text{if } p = 2 . \end{cases}

Compared to Sanov's Theorem, this result is more restrictive in the sense that it requires some extra assumptions on the reference measure

μ

, but under these hypotheses we are able to replace a result which was only asymptotic by a pointwise upper bound on the error probability, together with a lower bound on the required size of the sample.

In view of the Kantorovich-Rubinstein duality formula

\begin{matrix} W_{1} (μ, ν) = sup {\int f d (μ - ν); ∥ f ∥_{L i p} \leq 1}, \end{matrix}

(1.2)

Theorem 1.1 implies concentration inequalities such as

P [{sup}_{f; ∥ f ∥_{L i p} \leq 1} (\frac{1}{N} \sum_{i = 1}^{N} f (X^{i}) - \int f d μ) > ɛ] \leq e^{- \frac{λ^{'}}{2} N ɛ^{2}}

for

λ^{'} < λ

, and

N

sufficiently large, under the assumption that

μ

satisfies a

T_{1}

inequality, or equivalently admits a finite square-exponential moment. Those types of inequalities are of interest in non-parametric statistics and choice models [22] .

Remark 1.2. The sole inequality

T_{1} (λ)

implies that, for every 1-Lipschitz function

f

,

P [\frac{1}{N} \sum_{i = 1}^{N} f (X^{i}) - \int f d μ > ɛ] \leq e^{- \frac{λ}{2} N ɛ^{2}},

and it is easy to see that the coefficient

λ

in this inequality is the best possible.
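The sharpness of the coefficient can be illustrated on one explicit case. The sketch below assumes that μ is the standard Gaussian on the real line, which satisfies T_1(1) (a consequence of Talagrand's inequality), and takes the 1-Lipschitz function f(x) = x. The empirical mean is then Gaussian with variance 1/N, so the left-hand side is known in closed form. The function names are hypothetical.

```python
import math

def gaussian_mean_tail(n, eps):
    """Exact P[(1/n) * sum_i X_i > eps] for i.i.d. standard Gaussian X_i."""
    return 0.5 * math.erfc(eps * math.sqrt(n / 2.0))

def t1_bound(n, eps, lam=1.0):
    """Right-hand side exp(-lam * n * eps^2 / 2) of the bound in Remark 1.2."""
    return math.exp(-0.5 * lam * n * eps ** 2)

def log_ratio(n, eps, lam=1.0):
    """Ratio log(exact tail) / log(bound); it tends to 1 as n * eps^2 grows."""
    return math.log(gaussian_mean_tail(n, eps)) / math.log(t1_bound(n, eps, lam))
```

The exact tail never exceeds the bound, and the ratio of logarithms approaches 1 for large N ɛ^2, which is why no coefficient larger than λ = 1 could work in this example.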

While the quantity controlled in Theorem 1.1 is much stronger, the estimate is weakened only in that

λ

is replaced by some

λ^{'} < λ

(arbitrarily close to

λ

) and that

N

has to be large enough. In fact, a variant of the proof below would yield estimates such as

P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq C (ɛ) e^{- γ_{p} \frac{λ^{'}}{2} N ɛ^{2}},

where now there is no restriction on

N

, but

C (ɛ)

is a larger constant, explicitly computable from the proof.

Remark 1.3. As pointed out to us by M. Ledoux, there is another way to obtain concentration estimates on the empirical measure when

d = p = 1

. Indeed, in this specific case,

W_{1} ({\hat{μ}}^{N}, μ) = ∥ \frac{1}{N} \sum_{i = 1}^{N} H (\cdot - X_{i}) - F ∥_{L^{1} (R)}

where

H = 1_{[0, + \infty)}

stands for the Heaviside function on

R

and

F

denotes the cumulative distribution function of

μ

, so that

P [W_{1} ({\hat{μ}}^{N}, μ) \geq ɛ] = P [∥ \frac{1}{N} \sum_{i = 1}^{N} F_{i} ∥_{L^{1}} \geq ɛ]

where

F_{i} : = H (\cdot - X_{i}) - F (1 \leq i \leq N)

are centered

L^{1} (R)

-valued independent identically distributed random variables.

But, according to [1, Exercise 3.8.14] , a centered

L^{1} (R)

-valued random variable

Y

satisfies a Central Limit Theorem if and only if

\int_{R} {(E [Y^{2} (t)])}^{1 / 2} d t < + \infty,

a condition which for the random variables

F_{i}

's can be written

\begin{matrix} \int_{R} \sqrt{F (t) (1 - F (t))} d t < + \infty . \end{matrix}

(1.3)

Condition 1.3 in turn holds true as soon as (for instance)

\int_{R} | x |^{2 + δ} d μ (x)

is finite for some positive

δ

. Then we may apply a quantitative version of the Central Limit Theorem for random variables in the Banach space

L^{1} (R)

. See [16] and [18] for related works.
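The identity of Remark 1.3 between W_1 and the L^1 norm of the difference of distribution functions can be evaluated exactly in simple cases. The sketch below assumes, purely for illustration, that μ is the uniform law on [0, 1], so that F(t) = t there, and that all sample points lie in [0, 1]. The helper names are hypothetical.

```python
def _int_abs(c, a, b):
    """Exact integral of |c - t| over [a, b], assuming a <= b."""
    if c <= a:
        return 0.5 * ((b - c) ** 2 - (a - c) ** 2)
    if c >= b:
        return 0.5 * ((c - a) ** 2 - (c - b) ** 2)
    return 0.5 * ((c - a) ** 2 + (b - c) ** 2)

def w1_to_uniform(samples):
    """W_1 between the empirical measure of `samples` and Uniform[0, 1].

    Uses W_1 = integral of |F_N(t) - F(t)| dt, where the empirical
    distribution function F_N equals k/N between the k-th and (k+1)-th
    order statistics and F(t) = t on [0, 1].
    """
    xs = sorted(float(x) for x in samples)
    if not xs or xs[0] < 0.0 or xs[-1] > 1.0:
        raise ValueError("this sketch requires a nonempty sample in [0, 1]")
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    return sum(_int_abs(k / n, knots[k], knots[k + 1]) for k in range(n + 1))
```

For a single point at 1/2 the function returns 1/4, which agrees with the expected distance E|U - 1/2| for a uniform variable U.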

Remark 1.4. Theorem 1.1 applies if

N

is at least as large as

ɛ^{- r}

for some

r > d + 2

; we do not know whether

d + 2

here is optimal.

For the applications that we shall treat, in which the tails of the probability distributions will be decaying very fast, Theorem 1.1 will be sufficient. However, it is worthwhile pointing out that the technique works under much broader assumptions: weaker estimates can be proven for probability measures that do not decay fast enough to admit finite square-exponential moments. Below are some such results using only polynomial moment estimates:

Theorem 1.5. Let

q \geq 1

and let

μ

be a probability measure on

R^{d}

such that

\int_{R^{d}} | x |^{q} d μ (x) < + \infty .

Then (i) For any

p \in [1, q / 2)

,

δ \in (0, q / p - 2)

and

d^{'} > d

, there exists a constant

N_{0}

such that

P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq ɛ^{- q} N^{- \frac{q}{2 p} + \frac{δ}{2}}

for any

ɛ > 0

and

N \geq N_{0} max (ɛ^{- q \frac{2 p + d^{'}}{q - p}}, ɛ^{d^{'} - d})

; (ii) For any

p \in [q / 2, q)

,

δ \in (0, q / p - 1)

and

d^{'} > d

there exists a constant

N_{0}

such that

P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq ɛ^{- q} N^{1 - \frac{q}{p} + δ}

for any

ɛ > 0

and

N \geq N_{0} max (ɛ^{- q \frac{2 p + d^{'}}{q - p}}, ɛ^{d^{'} - d})

.

Here are also some variants under alternative “regularity” assumptions:

Theorem 1.6.

(i) Let $p \geq 1$ ; assume that $ℰ_{α} : = \int e^{α | x |} d μ$ is finite for some $α > 0$ . Then, for all $d^{'} > d$ , there exist some constants $K$ and $N_{0}$ , depending only on $d$ , $α$ and $ℰ_{α}$ , such that $P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq e^{- K N^{1 / p} min (ɛ, ɛ^{2})}$ for any $ɛ > 0$ and $N \geq N_{0} max (ɛ^{- (2 p + d^{'})}, 1)$ .

(ii) Suppose that

μ

satisfies

T_{1}

and a Poincaré inequality. Then, for all

a < 2

there exist some constants

K

and

N_{0}

such that

\begin{matrix} P [W_{2} (μ, {\hat{μ}}^{N}) > ɛ] \leq e^{- K N min (ɛ^{2}, ɛ^{a})} \end{matrix}

(1.4)

for any

ɛ > 0

and

N \geq N_{0} max (ɛ^{- (4 + d^{'})}, 1)

.

(iii) Let

p > 2

and let

μ

be a probability measure on

R^{d}

satisfying

T_{p} (λ)

.

Then for all

λ^{'} < λ

and

d^{'} > d

there exists some constant

N_{0}

, depending on

μ

only through

λ

and some square-exponential moment, such that

\begin{matrix} P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq min (e^{- \frac{λ^{'}}{2} N ɛ^{2}} + e^{- (N ɛ^{d^{'} + 2})^{2 / d^{'}}}, 2 e^{- \frac{λ^{'}}{4} N^{2 / p} ɛ^{2}}) \end{matrix}

(1.5)

for any

ɛ > 0

and

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1)

.

1.4 Interacting systems of particles

We now consider a system of

N

interacting particles whose time-evolution is governed by the system of coupled stochastic differential equations

\begin{matrix} d X_{t}^{i} = \sqrt{2} d B_{t}^{i} - \nabla V (X_{t}^{i}) d t - \frac{1}{N} \sum_{j = 1}^{N} \nabla W (X_{t}^{i} - X_{t}^{j}) d t, i = 1, . . ., N . \end{matrix}

(1.6)

Here

X_{t}^{i}

is the position at time

t

of particle number

i

, the

B^{i}

's are

N

independent Brownian motions, and

V

and

W

are smooth potentials, sufficiently nice that 1.6 can be solved globally in time. We shall always assume that

W

(which can be interpreted as an interaction potential) is a symmetric function, that is

W (- z) = W (z)

for all

z \in R^{d}

Equation 1.6 is a particularly simple instance of coupled system; in the case when

V

is quadratic and

W

has cubic growth, it was used as a simple mean-field kinetic model for granular media (see e.g. [19] ). While many of our results could be extended to more general systems, that particular one will be quite enough for our exposition.

To this system of particles is naturally associated the empirical measure, defined for each time

t \geq 0

by

\begin{matrix} {\hat{μ}}_{t}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{X_{t}^{i}} . \end{matrix}

(1.7)

Under suitable assumptions on the potentials

V

and

W

, it is a classical result that, if the initial positions of the particle system are distributed chaotically (for instance, if they are identically distributed, independent random variables), then the empirical measure

{\hat{μ}}_{t}^{N}

converges as

N \to \infty

to a solution of the nonlinear partial differential equation

\begin{matrix} \frac{\partial μ_{t}}{\partial t} = Δ μ_{t} + \nabla \cdot (μ_{t} \nabla (V + W * μ_{t})), \end{matrix}

(1.8)

where

\nabla \cdot

stands for the divergence operator. Equation (1.8) is a simple instance of a McKean-Vlasov equation. This convergence result is part of the by now well-developed theory of propagation of chaos, and was studied by Sznitman for pedagogical reasons [27] , in the case of potentials that grow at most quadratically at infinity. Later, Benachour, Roynette, Talay and Vallois [2, 3] considered the case where the interaction potential grows faster than quadratically. As far as the limit equation (1.8) is concerned, its use in the modelling of granular media in kinetic theory was discussed by Benedetto, Caglioti, Carrillo and Pulvirenti [4, 5] , while the large-time asymptotic behavior was studied by Carrillo, McCann and Villani [9, 10] with the help of Wasserstein distances and entropy inequality methods. Then Malrieu [19] presented a detailed study of both limits

t \to \infty

and

N \to \infty

by probabilistic methods, and established estimates of the type of 0.1 under adequate convexity assumptions on

V

and

W

(see also [29,Problem 15] ).

As announced before, we shall now give some estimates on the convergence at the level of the law itself. To fix ideas, we assume that

V

and

W

have locally bounded Hessian matrices satisfying

\begin{matrix} (i) & D^{2} V (x) \geq β I, \quad γ I \leq D^{2} W (x) \leq γ^{'} I, \quad \forall x \in R^{d}, \\ (ii) & | \nabla V (x) | = O (e^{a | x |^{2}}) \text{ for any } a > 0 . \end{matrix}

(1.9)

Under these assumptions, we shall derive the following bounds.

Theorem 1.7. Let

μ_{0}

be a probability measure on

R^{d}

, admitting a finite square-exponential moment:

\exists α_{0} > 0; M_{α_{0}} : = \int e^{α_{0} | x |^{2}} d μ_{0} (x) < + \infty .

Let

(X_{0}^{i})_{1 \leq i \leq N}

be

N

independent random variables with common law

μ_{0}

. Let

(X_{t}^{i})

be the solution of 1.6 with initial value

(X_{0}^{1}, \dots X_{0}^{N})

, where

V

and

W

are assumed to satisfy 1.9 ; and let

μ_{t}

be the solution of 1.8 with initial value

μ_{0}

. Let also

{\hat{μ}}_{t}^{N}

be the empirical measure associated with the

(X_{t}^{i})_{1 \leq i \leq N}

. Then, for all

T \geq 0

, there exists some constant

K = K (T)

such that, for any

d^{'} > d

, there exist some constants

N_{0}

and

C

such that for all

ɛ > 0

,

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1) ⟹ P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) > ɛ] \leq C (1 + T ɛ^{- 2}) exp (- K N ɛ^{2}) .

Note that in the above theorem we have proven not only that for all

t

, the empirical measure is close to the limit measure, but also that the probability of observing any significant deviation during a whole time period

[0, T]

is small.

The fact that

{\hat{μ}}_{t}^{N}

is very close to the deterministic measure

μ_{t}

implies the propagation of chaos: two particles drawn from the system behave independently of each other as

N \to \infty

(see Sznitman [27] for more details). But we can also directly study correlations between particles and find more precise estimates: for that purpose it is convenient to consider the empirical measure on pairs of particles, defined as

{\hat{μ}}_{t}^{N, 2} : = \frac{1}{N (N - 1)} \sum_{i \neq j} δ_{(X_{t}^{i}, X_{t}^{j})} .

By a simple adaptation of the computations appearing in the proof of Theorem 1.7 , one can prove

Theorem 1.8. With the same notation and assumptions as in Theorem 1.7 , for all

T \geq 0

and

d^{'} > d

, there exist some constants

K > 0

and

N_{0}

such that for all

ɛ > 0

,

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1) ⟹ P [W_{1} ({\hat{μ}}_{t}^{N, 2}, μ_{t} \otimes μ_{t}) > ɛ] \leq exp (- K N ɛ^{2}) .

(Here

W_{1}

stands for the Wasserstein distance of order

1

on

P_{1} (R^{d} \times R^{d})

.) Of course, one may similarly consider the problem of drawing

k

particles with

k \geq 2

.

Theorems 1.7 and 1.8 use Theorem 1.1 as a crucial ingredient, which is why a strong integrability assumption is imposed on

μ_{0}

. Note however that, under stronger assumptions on the behaviour at infinity of

V

or

W

, such as the existence of some

β \in R

and

B, ɛ > 0

such that

D^{2} V (x) \geq (B | x |^{ɛ} + β) I, \quad \forall x \in R^{d},

it can be proven that any square exponential moment for

μ_{t}

becomes instantaneously finite for

t > 0

. Note also that, by using Theorem 1.5 , one can obtain weaker but still relevant results of concentration of the empirical measure under just polynomial moment assumptions on

μ_{0}

, provided that

\nabla V

does not grow too fast at infinity. To limit the size of this paper, we shall not go further into such considerations.

1.5 Uniform in time estimates

In the “uniformly convex case” when

β > 0, β + 2 γ > 0

, it can be proven [19, 9, 10] that

μ_{t}

converges exponentially fast, as

t \to \infty

, to some equilibrium measure

μ_{\infty}

. In that case, it is natural to expect that the empirical measure is a good approximation of

μ_{\infty}

as

N \to \infty

and

t \to \infty

, uniformly in time. This is what we shall indeed prove:

Theorem 1.9. With the same notation and assumptions as in Theorem 1.7 , suppose that

β > 0, β + 2 γ > 0

. Then there exists some constant

K > 0

such that for any

d^{'} > d

, there exist some constants

C

and

N_{0}

such that for all

ɛ > 0

,

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1) ⟹ {sup}_{t \geq 0} P [W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) > ɛ] \leq C (1 + ɛ^{- 2}) exp (- K N ɛ^{2}) .

As a consequence, there are constants

T_{0}

,

ɛ_{0}

(depending on the initial datum) and

K^{'} = K / 4

such that, under the same conditions on

N

and

ɛ

,

{sup}_{t \geq T_{0} log (ɛ_{0} / ɛ)} P [W_{1} ({\hat{μ}}_{t}^{N}, μ_{\infty}) > ɛ] \leq C (1 + ɛ^{- 2}) exp (- K^{'} N ɛ^{2}) .

Remark 1.10. In view of the results in [9] , it is natural to expect that a similar conclusion holds true when

V = 0

and

W

is convex enough. Propositions 3.1 and 3.8 below extend to that case, but it seems trickier to adapt the proof of Proposition 3.8 .

We conclude with an application to the numerical reconstruction of the invariant measure.

Theorem 1.11. With the same notation and assumptions as in Theorem 1.9 , consider the mollified empirical measure 0.6 . Then one can choose

α = O (ɛ)

in such a way that

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1) ⟹ {sup}_{t \geq T_{0} log (ɛ_{0} / ɛ)} P [∥ {\tilde{f}}_{t} - f_{\infty} ∥_{L^{\infty}} > ɛ] \leq C (1 + ɛ^{- (2 d + 4)}) exp (- K^{'} N ɛ^{2 d + 4}) .

These results are effective: all the constants therein can be estimated explicitly in terms of the data.

1.6 Strategy and plan

The strategy is rather systematic. First, we shall establish Sanov-type bounds for independent variables in

R^{d}

(not depending on time), resulting in concentration results such as Theorems 1.1 to 1.6 . This will be achieved along the lines of [12, Exercises 4.5.5 and 6.2.19] (see also [25, Section 5] ), by first truncating to a compact ball, and then covering the set of probability measures on this ball by a finite number of small balls (in the space of probability measures); the trickiest part will actually lie in the optimization of parameters.

With such results in hand, we will start the study of the particle system by introducing the nonlinear partial differential equation 1.8 . For this equation, the Cauchy problem can be solved in a satisfactory way, in particular existence and uniqueness of a solution, which for

t > 0

is reasonably smooth, can be shown under various assumptions on

V

and

W

(see e.g. [9, 10] ). Other regularity estimates such as the decay at infinity, or the smoothness in time, can be established; also the convergence to equilibrium in large time can sometimes be proven.

Next, following the presentation by Sznitman [27] , we introduce a family of independent processes

(Y_{t}^{i})_{1 \leq i \leq N}

, governed by the stochastic differential equation

\begin{matrix} d Y_{t}^{i} & = & \sqrt{2} d B_{t}^{i} - \nabla V (Y_{t}^{i}) d t - (\nabla W * μ_{t}) (Y_{t}^{i}) d t, \\ Y_{0}^{i} & = & X_{0}^{i} . \end{matrix}

(1.10)

As a consequence of Itô's formula, the law

ν_{t}

of each

Y_{t}^{i}

is a solution of the linear partial differential equation

\frac{\partial ν_{t}}{\partial t} = Δ ν_{t} + \nabla \cdot (\nabla (V + W * μ_{t}) ν_{t}), ν_{0} = μ_{0} .

But this linear equation is also solved by

μ_{t}

, and a uniqueness theorem implies that actually

ν_{t} = μ_{t}

, for all

t \geq 0

. See [2, 3] for related questions on the stochastic differential equation 1.10 .

For each given

t

, the independence of the variables

Y_{t}^{i}

and the good decay of

μ_{t}

will imply a strong concentration of the empirical measure

{\hat{ν}}_{t}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{Y_{t}^{i}} .

To go further, we shall establish more precise information, such as a control on

P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > ɛ] .

Such bounds will be obtained by combining the estimate of concentration at fixed time

t

with some estimates of regularity of

{\hat{ν}}_{t}^{N}

(and

μ_{t}

) in

t

, obtained via basic tools of stochastic differential calculus (in particular Doob's inequality).

Finally, we can show by a Gronwall-type argument that the control of the distance of

{\hat{μ}}_{t}^{N}

to

μ_{t}

reduces to the control of the distance of

{\hat{ν}}_{t}^{N}

to

μ_{t}

: for instance,

\begin{matrix} P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) > ɛ] \leq P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > C ɛ] \end{matrix}

(1.11)

for some constant

C

. We shall also show how a variant of this computation provides estimates of the type of those in Theorem 1.9 , and how to get data reconstruction estimates as in Theorem 1.11 .

1.7 Remarks and further developments

The results in this paper confirm what seems to be a rather general rule about Wasserstein distances: results in distance

W_{1}

are very robust and can be used in rather hard problems, with no particular structure; on the contrary, results in distance

W_{2}

are stronger, but usually require much more structure and/or assumptions. For instance, in the study of the equation 1.8 , the distance

W_{2}

works beautifully, and this might be explained by the fact that 1.8 has the structure of a gradient flow with respect to the

W_{2}

distance [9, 10] . In the problem considered by Malrieu [19] ,

W_{2}

is also well-adapted, but leads him to impose strong assumptions on the initial datum

μ_{0}

, such as the existence of a Logarithmic Sobolev inequality for

μ_{0}

, considered as a reference measure. As a general rule, in a context of geometric inequalities with more or less subtle isoperimetric content, related to Brenier's transportation mapping theorem,

W_{2}

is also the most natural distance to use [29] . On the contrary, here we are considering quite a rough problem (concentration for the law of a random probability measure, driven by a stochastic differential equation with coupling) and we wish to impose only natural integrability conditions; then the distance

W_{1}

is much more convenient.

Further developments could be considered. For instance, one may desire to prove some deviation inequalities for dependent sequences, say Markov chains, as both Sanov's theorem and transportation inequalities can be established under appropriate ergodicity and integrability conditions.

Considering again the problem of the particle system, in a numerical context, one may wish to take into account the numerical errors associated with the time-discretization of the dynamics (say an implicit Euler scheme). For concentration estimates in one observable, a beautiful study of these issues was performed by Malrieu [20] . For concentration estimates on the whole empirical measure, to our knowledge the study remains to be done. Also errors due to the boundedness of the phase space actually used in the simulation might be taken into account, etc.

At a more technical level, it would be desirable to relax the assumption of boundedness of

D^{2} W

in Theorem 1.7 , so as to allow for instance the interesting case of cubic interaction.

This is much more technical and will be considered in a separate work.

Another issue of interest would be to consider concentration of the empirical measure on path space, i.e.

{\hat{μ}}_{[0, T]}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{(X_{t}^{i})_{0 \leq t \leq T}},

where

T

is a fixed time length. Here

{\hat{μ}}_{[0, T]}^{N}

is a random measure on

C ([0, T]; R^{d})

and we would like to show that it is close to the law of the trajectories of the nonlinear stochastic differential equation

\begin{matrix} d Y_{t} = \sqrt{2} d B_{t} - \nabla V (Y_{t}) d t - (\nabla W * μ_{t}) (Y_{t}) d t, \end{matrix}

(1.12)

where the initial datum

Y_{0}

is drawn randomly according to

μ_{0}

. This would provide quantitative information on the whole trajectory of a given particle in the system.

When one wishes to adapt the general method to this question, a problem immediately occurs: not only is

C ([0, T]; R^{d})

not compact, but also balls with finite radius in this space are not compact either (of course, this is true even if the phase space of particles is compact). One may remedy this problem by embedding

C ([0, T]; B_{R})

into a space such as

L^{2} ([0, T]; B_{R})

, equipped with the weak topology; but we do not know of any “natural” metric on that space. There is (at least) another way out: we know from classical stochastic processes theory that integral trajectories of differential equations driven by white noise are typically Hölder-

α

for any

α < 1 / 2

. This suggests a natural strategy: choose any fixed

α \in (0, 1 / 2)

and work in the space

ℋ^{α} ([0, T]; R^{d})

, equipped with the norm

∥ w ∥_{ℋ^{α}} : = {sup}_{0 \leq t \leq T} | w (t) | + {sup}_{s \neq t} \frac{| w (t) - w (s) |}{| t - s |^{α}} .

For any

R > 0

, the ball of radius

R

and center 0 (the zero function) in

ℋ^{α}

is compact, and one may estimate its metric entropy. Then one can hope to perform all estimates by using the norm

ℋ^{α}

; for instance, establish a bound on, say, a square-exponential moment on the law of

Y_{t}

, of the form

E exp (β ∥ (Y_{t})_{0 \leq t \leq T} ∥_{ℋ^{α}}^{2}) < + \infty .

Again, to avoid expanding the size of the present paper too much, these issues will be addressed separately.
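The Hölder norm introduced above can be evaluated on a sampled trajectory. The following is a minimal sketch, assuming a scalar path (d = 1) observed at finitely many distinct times; the function name `holder_norm` and the test path w(t) = t are illustrative choices, not part of the original argument. Restricting both suprema to the sample points only gives a lower bound on the true norm.

```python
def holder_norm(times, values, alpha):
    """Discrete Hoelder-alpha norm of a scalar path sampled at distinct times.

    Returns max |w(t_i)| + max_{i != j} |w(t_i) - w(t_j)| / |t_i - t_j|**alpha,
    i.e. the two suprema of the definition restricted to the sample points.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples of equal length")
    sup_part = max(abs(v) for v in values)
    seminorm = 0.0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            gap = abs(times[j] - times[i])
            seminorm = max(seminorm, abs(values[j] - values[i]) / gap ** alpha)
    return sup_part + seminorm


# Hypothetical test path w(t) = t on [0, 1], sampled on a uniform grid.
n = 64
grid = [i / n for i in range(n + 1)]
linear_norm = holder_norm(grid, grid, 0.5)

# A constant path has zero Hoelder seminorm, so only the sup part remains.
constant_norm = holder_norm([0.0, 1.0], [3.0, 3.0], 0.3)
```

For the linear path with exponent 1/2, both suprema equal 1, which is what the sketch returns on the grid.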

2 The case of independent variables

In this section we consider the case where we are given

N

independent variables

X^{i} \in R^{d}

, distributed according to a certain law

μ

. There is no time dependence at this stage. We shall first examine the case when the law

μ

has very fast decay (Theorem 1.1 ), then variants in which it decays in a slower way (Theorem 1.5 and 1.6 ).

2.1 Proof of Theorem 1.1

The proof splits into three steps: (1) Truncation to a compact ball

B_{R}

of radius

R

, (2) covering of

P (B_{R})

by small balls of radius

r

and Sanov's argument, and (3) optimization of the parameters.

Step 1: Truncation. Let

R > 0

, to be chosen later on, and let

B_{R}

stand for the ball of radius

R

and center 0 (say) in

R^{d}

. Let

1_{B_{R}}

stand for the indicator function of

B_{R}

. We truncate

μ

into a probability measure

μ_{R}

on the ball

B_{R}

:

μ_{R} = \frac{1_{B_{R}} μ}{μ [B_{R}]} .

We wish to bound the quantity

P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ]

in terms of

μ_{R}

and the associated empirical measure. For this purpose, consider independent variables

(X^{k})_{1 \leq k \leq N}

drawn according to

μ

, and

(Y^{k})_{1 \leq k \leq N}

drawn according to

μ_{R}

, independent of each other; then define

X_{R}^{k} : = \begin{cases} X^{k} & \text{if } | X^{k} | \leq R, \\ Y^{k} & \text{if } | X^{k} | > R . \end{cases}

Since

X^{1}

and

X_{R}^{1}

are distributed according to

μ

and

μ_{R}

respectively, we have, by definition of Wasserstein distance,

W_{p}^{p} (μ, μ_{R}) \leq E | X^{1} - X_{R}^{1} |^{p} = E (| X^{1} - Y^{1} |^{p} 1_{| X^{1} | > R}) \leq 2^{p} E (| X^{1} |^{p} 1_{| X^{1} | > R}) = 2^{p} \int_{{| x | > R}} | x |^{p} d μ (x) .

But

μ

satisfies a

T_{p} (λ)

inequality for some

p \geq 1

, hence a fortiori a

T_{1} (λ)

inequality, so

E_{α} : = \int_{R^{d}} e^{α | x |^{2}} d μ (x) < + \infty

for some

α > 0

(any

α < λ / 2

would do). If

R

is large enough (say,

R \geq \sqrt{p / (2 α)}

), then the function

r \mapsto \frac{r^{p}}{e^{α r^{2}}}

is nonincreasing for

r \geq R

, and then

W_{p}^{p} (μ, μ_{R}) \leq 2^{p} (\frac{R^{p}}{e^{α R^{2}}}) \int_{{| x | > R}} e^{α | x |^{2}} d μ (x) .

We conclude that

\begin{matrix} W_{p}^{p} (μ, μ_{R}) \leq 2^{p} E_{α} R^{p} e^{- α R^{2}} \qquad (α < λ / 2, \; R \geq \sqrt{p / (2 α)}) . \end{matrix}

(2.1)
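The tail-moment inequality behind estimate 2.1 can be checked numerically in a simple case. The sketch below assumes the standard Gaussian law on the real line (d = 1, which satisfies a T_2(1) inequality, so that any α < 1/2 is allowed) and p = 1; the closed forms used for the tail moment and for E_α are specific to this hypothetical example, and the function names are illustrative.

```python
import math


def gaussian_tail_first_moment(radius):
    """Integral of |x| over {|x| > radius} for the standard Gaussian law."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * radius * radius)


def truncation_bound(radius, alpha, p=1):
    """Right-hand side of 2.1 for N(0, 1): 2^p E_alpha R^p exp(-alpha R^2)."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2) for the standard Gaussian")
    e_alpha = 1.0 / math.sqrt(1.0 - 2.0 * alpha)  # E exp(alpha X^2) for N(0, 1)
    return 2.0 ** p * e_alpha * radius ** p * math.exp(-alpha * radius ** 2)


alpha, p = 0.25, 1
r_min = math.sqrt(p / (2.0 * alpha))  # threshold R >= sqrt(p / (2 alpha))
checks = []
for k in range(8):
    radius = r_min + 0.5 * k
    lhs = 2.0 ** p * gaussian_tail_first_moment(radius)  # upper bound on W_1(mu, mu_R)
    rhs = truncation_bound(radius, alpha, p)
    checks.append((radius, lhs, rhs))
```

Each triple in `checks` stores a radius, the exact tail quantity and the bound from 2.1; in this example the bound is far from sharp, since it trades the Gaussian decay rate 1/2 for the smaller rate α.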

On the other hand, the empirical measures

{\hat{μ}}^{N} : = \frac{1}{N} \sum_{k = 1}^{N} δ_{X^{k}}, {\hat{μ}}_{R}^{N} : = \frac{1}{N} \sum_{k = 1}^{N} δ_{X_{R}^{k}}

satisfy

W_{p}^{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) \leq \frac{1}{N} \sum_{k = 1}^{N} | X_{R}^{k} - X^{k} |^{p} \leq \frac{1}{N} \sum_{k = 1}^{N} Z^{k},

where

Z^{k} : = 2^{p} | X^{k} |^{p} 1_{| X^{k} | > R}

(k = 1, \dots, N)

. Then, for any

p \in [1, 2]

, we can introduce parameters

ɛ

and

θ > 0

, and use Chebyshev's exponential inequality and the independence of the variables

Z^{k}

to obtain

\begin{matrix} P [W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] & \leq & P [\frac{1}{N} \sum_{k = 1}^{N} Z^{k} > ɛ^{p}] \end{matrix}

\begin{matrix} = & P [exp \sum_{k = 1}^{N} θ (Z^{k} - ɛ^{p}) > 1] \end{matrix}

\begin{matrix} \leq & E (exp \sum_{k = 1}^{N} θ (Z^{k} - ɛ^{p})) \end{matrix}

\begin{matrix} = & exp (- N [θ ɛ^{p} - log E exp (θ Z_{1})]) . \end{matrix}

(2.2)

In the case when

p < 2

, for any

α_{1} < α < \frac{λ}{2}

, there exists some constant

R_{0} = R_{0} (α_{1}, p)

such that

2^{p} θ r^{p} \leq α_{1} r^{2} + C,

for all

θ > 0

and

r \geq R_{0} θ^{\frac{1}{2 - p}}

, whence

E exp (θ Z_{1}) \leq E exp (α_{1} | X_{1} |^{2} 1_{| X_{1} | > R}) \leq 1 + E_{α} e^{(α_{1} - α) R^{2}} .

As a consequence,

\begin{matrix} P [W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] \leq exp (- N [θ ɛ^{p} - E_{α} e^{(α_{1} - α) R^{2}}]) . \end{matrix}

(2.3)

From 2.1, 2.3 and the triangle inequality for

W_{p}

, we obtain

\begin{matrix} P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] & \leq & P [W_{p} (μ, μ_{R}) + W_{p} (μ_{R}, {\hat{μ}}_{R}^{N}) + W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] \end{matrix}

\begin{matrix} \leq & P [W_{p} (μ_{R}, {\hat{μ}}_{R}^{N}) > η ɛ - 2 E_{α}^{1 / p} R e^{- \frac{α}{p} R^{2}}] + P [W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > (1 - η) ɛ] \end{matrix}

\begin{matrix} \leq & P [W_{p} (μ_{R}, {\hat{μ}}_{R}^{N}) > η ɛ - 2 E_{α}^{1 / p} R e^{- \frac{α}{p} R^{2}}] \end{matrix}

\begin{matrix} + exp (- N (θ (1 - η)^{p} ɛ^{p} - E_{α} e^{(α_{1} - α) R^{2}})) . \end{matrix}

(2.4)

This estimate was established for any given

p \in [1, 2)

,

η \in (0, 1)

,

ɛ, θ > 0

,

α_{1} < α < \frac{λ}{2}

and

R \geq max (\sqrt{p / (2 α)}, R_{0} θ^{\frac{1}{2 - p}})

, where

R_{0}

is a constant depending only on

α_{1}

and

p .

In the case when

p = 2

, we let

Z^{k} : = | Y^{k} - X^{k} |^{2} 1_{| X^{k} | > R}

(k = 1, \dots, N)

, and starting from inequality 2.2 again, we choose

α_{1} < α

and then

θ : = α_{1} / 2

: by definition of

Z_{1}

and

μ_{R}

\begin{matrix} E (exp (\frac{α_{1}}{2} Z_{1})) & = & \int_{R^{2 d}} exp (\frac{α_{1}}{2} | y - x |^{2} 1_{| x | \geq R}) d μ (x) d μ_{R} (y) \end{matrix}

\begin{matrix} = & μ [B_{R}] + \frac{1}{μ [B_{R}]} \int_{| y | \leq R} \int_{| x | \geq R} exp (\frac{α_{1}}{2} | y - x |^{2}) d μ (x) d μ (y) \end{matrix}

\begin{matrix} \leq & 1 + (1 - E_{α} e^{- α R^{2}})^{- 1} \int_{| y | \leq R} e^{α_{1} | y |^{2}} d μ (y) \int_{| x | \geq R} e^{α_{1} | x |^{2}} d μ (x) \end{matrix}

\begin{matrix} \leq & 1 + 2 E_{α}^{2} e^{(α_{1} - α) R^{2}} \end{matrix}

for

R

large enough, from which

\begin{matrix} P [W_{2} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] \leq exp (- N [\frac{α_{1}}{2} ɛ^{2} - 2 E_{α}^{2} e^{(α_{1} - α) R^{2}}]) . \end{matrix}

(2.5)

To sum up, in the case

p = 2

, estimate 2.4 reads

\begin{matrix} P [W_{2} (μ, {\hat{μ}}^{N}) > ɛ] \leq P [W_{2} (μ_{R}, {\hat{μ}}_{R}^{N}) > η ɛ - 2 E_{α}^{1 / 2} R e^{- \frac{α}{2} R^{2}}] + exp (- N (\frac{α_{1}}{2} (1 - η)^{2} ɛ^{2} - 2 E_{α}^{2} e^{(α_{1} - α) R^{2}})) . \end{matrix}

(2.6)

So, apart from some error terms, for all

p \in [1, 2]

we have reduced the initial problem to establishing the result only for the probability law

μ_{R}

, whose support lies in the compact set

B_{R} .

We end this truncation procedure by proving that

μ_{R}

satisfies some modified

T_{p}

inequality. Let indeed

ν

be a probability measure on

B_{R}

, absolutely continuous with respect to

μ

(and hence with respect to

μ_{R}

); then, when

R

is larger than some constant depending only on

E_{α}

, we can write

\begin{matrix} H (ν | μ_{R}) - H (ν | μ) = \int_{B_{R}} log \frac{d ν}{d μ_{R}} d ν - \int_{B_{R}} log \frac{d ν}{d μ} d ν = log μ [B_{R}] \end{matrix}

\begin{matrix} \geq log (1 - E_{α} e^{- α R^{2}}) \end{matrix}

\begin{matrix} \geq - 2 E_{α} e^{- α R^{2}} . \end{matrix}

(2.7)

But

μ

satisfies a

T_{p} (λ)

inequality, so

H (ν | μ) \geq \frac{λ}{2} W_{p}^{2} (μ, ν) \geq \frac{λ}{2} {(W_{p} (μ_{R}, ν) - W_{p} (μ_{R}, μ))}^{2}

by the triangle inequality. Combining this with 2.7 , we obtain

H (ν | μ_{R}) \geq \frac{λ}{2} {(W_{p} (μ_{R}, ν) - W_{p} (μ_{R}, μ))}^{2} - 2 E_{α} e^{- α R^{2}} .

From this, inequality 2.1 and the elementary inequality

\begin{matrix} \forall a \in (0, 1) \exists C_{a} > 0; \forall x, y \in R, (x - y)^{2} \geq (1 - a) x^{2} - C_{a} y^{2}, \end{matrix}

(2.8)

we deduce that for any

λ_{1} < λ

there exists some constant

K

such that

\begin{matrix} H (ν | μ_{R}) \geq \frac{λ_{1}}{2} W_{p} (μ_{R}, ν)^{2} - K R^{2} e^{- α R^{2}} . \end{matrix}

(2.9)

Step 2: Covering by small balls. In this second step we derive quantitative estimates on

{\hat{μ}}_{R}^{N}

. Let

φ

be a bounded continuous function on

R^{d}

, and let

ℬ

be a Borel set in

P (B_{R})

(equipped with the weak topology of convergence against bounded continuous test functions). By Chebyshev's exponential inequality and the independence of the variables

X_{R}^{k}

\begin{matrix} P [{\hat{μ}}_{R}^{N} \in ℬ] & \leq & exp (- N {inf}_{ν \in ℬ} \int_{B_{R}} φ d ν) E (e^{N \int_{B_{R}} φ d {\hat{μ}}_{R}^{N}}) \end{matrix}

\begin{matrix} = & exp (- N {inf}_{ν \in ℬ} [\int_{B_{R}} φ d ν - \frac{1}{N} log E (e^{N \int_{B_{R}} φ d {\hat{μ}}_{R}^{N}})]) \end{matrix}

\begin{matrix} = & exp (- N {inf}_{ν \in ℬ} [\int_{B_{R}} φ d ν - \frac{1}{N} log E (e^{\sum_{k = 1}^{N} φ (X_{R}^{k})})]) \end{matrix}

\begin{matrix} = & exp (- N {inf}_{ν \in ℬ} [\int_{B_{R}} φ d ν - log \int_{B_{R}} e^{φ} d μ_{R}]) . \end{matrix}

Since

φ

is arbitrary, we can pass to the supremum and find

P [{\hat{μ}}_{R}^{N} \in ℬ] \leq exp (- N {sup}_{φ \in C_{b} (R^{d})} {inf}_{ν \in ℬ} [\int_{B_{R}} φ d ν - log \int_{B_{R}} e^{φ} d μ_{R}]) .

Now we note that the quantity

\int φ d ν - log \int e^{φ} d μ_{R}

is linear in

ν

and concave upper semi-continuous (with respect to the topology of uniform convergence) in

φ

; if we further assume that

ℬ

is convex and compact, then (for instance) Sion's min-max theorem [26, Theorem 4.2'] ensures that

{sup}_{φ \in C_{b} (R^{d})} {inf}_{ν \in ℬ} [\int_{B_{R}} φ d ν - log \int e^{φ} d μ_{R}] = {inf}_{ν \in ℬ} {sup}_{φ \in C_{b} (R^{d})} [\int_{B_{R}} φ d ν - log \int e^{φ} d μ_{R}] .

By the dual formulation of the

H

functional [12, Lemma 6.2.13] , we conclude that

\begin{matrix} P [{\hat{μ}}_{R}^{N} \in ℬ] \leq exp (- N {inf}_{ν \in ℬ} H (ν | μ_{R})) . \end{matrix}

(2.10)

Now, let

δ > 0

and let

A

be a measurable subset of

P (B_{R})

. We cover the latter with

N^{A}

balls

(B_{i})_{1 \leq i \leq N^{A}}

with radius

δ / 2

in

W_{p}

metric. Each of these balls is convex and compact, and it is included in the

δ

-thickening of

A

in

W_{p}

metric, defined as

A_{δ} : = {ν \in P (B_{R}); \exists ν_{a} \in A, W_{p} (ν, ν_{a}) \leq δ} .

So, by 2.10 we get

\begin{matrix} P [{\hat{μ}}_{R}^{N} \in A] & \leq & P [{\hat{μ}}_{R}^{N} \in \bigcup_{i = 1}^{N^{A}} B_{i}] \end{matrix}

\begin{matrix} \leq & \sum_{i = 1}^{N^{A}} P ({\hat{μ}}_{R}^{N} \in B_{i}) \end{matrix}

\begin{matrix} \leq & \sum_{i = 1}^{N^{A}} exp (- N {inf}_{ν \in B_{i}} H (ν | μ_{R})) \end{matrix}

\begin{matrix} \leq & N^{A} exp (- N {inf}_{ν \in A_{δ}} H (ν | μ_{R})) . \end{matrix}

(2.11)

We now apply this estimate with

A : = {ν \in P (B_{R}); W_{p} (ν, μ_{R}) \geq η ɛ - 2 E_{α}^{1 / p} R e^{- \frac{α}{p} R^{2}}} .

From 2.9 we have, for any

ν \in A_{δ}

H (ν | μ_{R}) \geq \frac{λ_{1}}{2} W_{p} (ν, μ_{R})^{2} - K R^{2} e^{- α R^{2}} \geq \frac{λ_{1}}{2} ρ^{2} - K R^{2} e^{- α R^{2}},

where

ρ : = max (η ɛ - 2 E_{α}^{1 / p} R e^{- \frac{α}{p} R^{2}} - δ, 0) .

Combining this with 2.11 , we conclude that

\begin{matrix} P [W_{p} (μ_{R}, {\hat{μ}}_{R}^{N}) \geq η ɛ - 2 E_{α}^{1 / p} R e^{- \frac{α}{p} R^{2}}] \leq N^{A} exp (- N [\frac{λ_{1}}{2} ρ^{2} - K R^{2} e^{- α R^{2}}]) . \end{matrix}

(2.12)

Now, given any

λ_{2} < λ_{1}

, it follows from 2.8 that there exist

δ_{1}

,

η_{1}

and

K_{1}

, depending on

α, λ_{1}, λ_{2}

, such that

\begin{matrix} \frac{λ_{1}}{2} ρ^{2} - K R^{2} e^{- α R^{2}} \geq \frac{λ_{2}}{2} ɛ^{2} - K_{1} R^{2} e^{- α R^{2}} \end{matrix}

(2.13)

where

δ : = δ_{1} ɛ

and

η : = η_{1} .

Though this inequality holds independently of

p

, we shall use it only in the case when

p < 2

. In the case

p = 2

, on the other hand, we note that for any

η \in (0, 1)

\begin{matrix} \frac{λ_{1}}{2} ρ^{2} - K R^{2} e^{- α R^{2}} \geq \frac{λ_{2}}{2} η^{2} ɛ^{2} - K_{1} R^{2} e^{- α R^{2}} \end{matrix}

(2.14)

where

δ : = δ_{1} ɛ

. Finally, we bound

N^{A}

by means of Theorem A.1 in Appendix A : there exists some constant

C

(only depending on

d

) such that for all

R > 0

and

δ > 0

the set

P (B_{R})

can be covered by

{(C \frac{R}{δ} \lor 1)}^{{(C \frac{R}{δ})}^{d}}

balls of radius

δ

in

W_{p}

metric, where

a \lor b

stands for

max (a, b)

. In particular, given

δ = δ_{1} ɛ

, we can choose

\begin{matrix} N^{A} \leq {(K_{2} \frac{R}{ɛ} \lor 1)}^{{(K_{2} \frac{R}{ɛ})}^{d}} \end{matrix}

(2.15)

balls of radius

δ

, for some constant

K_{2}

depending on

λ_{1}

and

λ_{2}

(via

δ_{1}

) but neither on

ɛ

nor on

R

. (The purpose of the 1 in

(K_{2} R / ɛ \lor 1)

is to make sure that the estimate is also valid when

ɛ > R

.) Combining 2.4 , 2.12 , 2.13 and 2.15 , we find that, given

p \in [1, 2)

,

λ_{2} < λ

and

α_{1} < α < \frac{λ}{2}

, there exist some constants

K_{1}

,

K_{2}

,

K_{3}

and

R_{1}

such that for all

ɛ, ζ > 0

and

R \geq R_{1} max (1, ζ^{\frac{1}{2 - p}})

\begin{matrix} P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq {(K_{2} \frac{R}{ɛ} \lor 1)}^{K_{2} {(\frac{R}{ɛ})}^{d}} exp (- N [\frac{λ_{2} ɛ^{2}}{2} - K_{1} R^{2} e^{- α R^{2}}]) + exp (- N (K_{3} ζ ɛ^{p} - K_{4} e^{(α_{1} - α) R^{2}})) \end{matrix}

(2.16)

for some constant

K_{4} = K_{4} (θ, α_{1})

. In the case when

p = 2

, we obtain similarly

\begin{matrix} P [W_{2} (μ, {\hat{μ}}^{N}) > ɛ] \leq {(K_{2} \frac{R}{ɛ} \lor 1)}^{K_{2} {(\frac{R}{ɛ})}^{d}} exp (- N [\frac{λ_{2}}{2} η^{2} ɛ^{2} - K_{1} R^{2} e^{- α R^{2}}]) + exp (- N (\frac{α_{1}}{2} (1 - η)^{2} ɛ^{2} - K_{4} e^{(α_{1} - α) R^{2}})) \end{matrix}

(2.17)

for any

η \in (0, 1)

and

R \geq R_{1} .

These estimates are not really appealing (!), but they are rather precise and general. In the rest of the section we shall show that an adequate choice of

R

leads to a simplified expression.
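To get a feeling for the entropy factor in 2.15, one can evaluate the logarithm of the covering bound for given values of R, δ and d. This is only a sketch: the constant C of Theorem A.1 is not specified in the text, so the value passed as `c_const` below is a hypothetical placeholder, and the function name is illustrative.

```python
import math


def log_covering_bound(radius, delta, dimension, c_const):
    """Logarithm of (C R / delta v 1) ** ((C R / delta) ** d).

    c_const stands for the unspecified constant C of the covering theorem;
    any numerical value used here is a placeholder, not a value from the text.
    """
    ratio = c_const * radius / delta
    return ratio ** dimension * math.log(max(ratio, 1.0))


# Hypothetical values: C = 2, R = 1, delta = 1/2, d = 1, so C R / delta = 4.
log_count_fine = log_covering_bound(1.0, 0.5, 1, 2.0)

# When delta exceeds C R the base of the power is 1: a single ball suffices.
log_count_coarse = log_covering_bound(1.0, 4.0, 3, 2.0)
```

Comparing `log_covering_bound(R, delta_1 * eps, d, C)` with N λ_2 ɛ² / 2 shows directly when the exponential decay in 2.16 dominates the number of balls.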

Step 3: Choice of the parameters.

We first consider the case when

p \in [1, 2)

. Let

λ^{'} < λ_{2}

,

α^{'} < α

and

d_{1} > d

. We claim that

P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq exp (- \frac{λ^{'}}{2} N ɛ^{2}) + exp (- α^{'} N ɛ^{2})

as soon as

\begin{matrix} R^{2} \geq R_{2} max (1, ɛ^{2}, log (\frac{1}{ɛ^{2}})), N ɛ^{d_{1} + 2} \geq K_{5} R^{d_{1}} \end{matrix}

(2.18)

for some constants

R_{2}

and

K_{5}

depending on

μ

only through

λ, α

and

E_{α} .

Indeed, on the one hand

K_{2} {(\frac{R}{ɛ})}^{d} log (K_{2} \frac{R}{ɛ}) \leq K_{6} {(\frac{R}{ɛ})}^{d_{1}}

for some constant

K_{6}

, on the other hand

K_{1} R^{2} e^{- α R^{2}} \leq e^{- α_{1} R^{2}}

for

R

large enough, and then

K_{6} {(\frac{R}{ɛ})}^{d_{1}} - N [\frac{λ_{2} ɛ^{2}}{2} - e^{- α_{1} R^{2}}] \leq - N \frac{λ^{'} ɛ^{2}}{2}

for

R^{2} / log (\frac{1}{ɛ^{2}})

and

N ɛ^{d_{1} + 2} / R^{d_{1}}

large enough; this is enough to bound the first term in the right-hand side of 2.16 if moreover

R / ɛ

is large enough.

Moreover, letting

α_{2} \in (α^{'}, α_{1})

, we can choose

ζ

in such a way that

K_{3} ζ = α_{2} ɛ^{2 - p}

, so that

exp (- N (K_{3} ζ ɛ^{p} - K_{4} e^{(α_{1} - α) R^{2}})) = exp (- N (α_{2} ɛ^{2} - K_{4} e^{(α_{1} - α) R^{2}})),

which in the end can be bounded by

exp (- N α^{'} ɛ^{2})

if

R

and

R^{2} / log (\frac{1}{ɛ^{2}})

are large enough. With this one can get a bound on the right-hand side of 2.16 .

Now let us check that conditions 2.18 can indeed be fulfilled. Clearly, the first condition holds true for all

ɛ \in (0, 1)

and

R^{2} \geq R_{3} log (\frac{K_{6}}{ɛ^{2}})

, where

R_{3}

and

K_{6}

are positive constants.

Then, we can choose

R : = {(\frac{N}{K_{5}} ɛ^{d_{1} + 2})}^{1 / d_{1}}

so that the second condition holds as an equality. This choice is admissible as soon as

{(\frac{N}{K_{5}} ɛ^{d_{1} + 2})}^{2 / d_{1}} \geq R_{3} log (\frac{K_{5}}{ɛ^{2}})

and this, in turn, holds true as soon as

\begin{matrix} N \geq K_{7} ɛ^{- (d^{'} + 2)}, \end{matrix}

(2.19)

where

d^{'}

is such that

d^{'} > d

, and

K_{7}

is large enough.

If

ɛ \geq 1

, then we can choose

R^{2} = R_{2} ɛ^{2}

, i.e.

R = \sqrt{R_{2}} ɛ

, and then the second inequality in 2.18 will be true as soon as

N

is large enough.

To sum up: Given

d^{'} > d

,

λ^{'} < λ

and

α^{'} < α

, there exists some constant

N_{0}

, depending on

d^{'}

and depending on

μ

only through

λ, α

and

E_{α}

, such that for all

ɛ > 0

P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq exp (- \frac{λ^{'}}{2} N ɛ^{2}) + exp (- α^{'} N ɛ^{2})

as soon as

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1)

. Then we note that, given

K < min (\frac{λ^{'}}{2}, α^{'})

, the inequality

exp (- \frac{λ^{'}}{2} N ɛ^{2}) + exp (- α^{'} N ɛ^{2}) \leq exp (- K N ɛ^{2})

holds if condition 2.19 is satisfied for some

K_{7}

large enough. To conclude the proof of Theorem 1.1 in the case when

p \in [1, 2)

, it is sufficient to choose

λ^{'} < λ

and

α < λ / 2 .

Now, in the case when

p = 2

, given

λ_{3} < λ_{2}

and

α_{2} < α_{1}

, conditions 2.18 imply

P [W_{2} (μ, {\hat{μ}}^{N}) > ɛ] \leq exp (- \frac{λ_{3}}{2} η^{2} N ɛ^{2}) + exp (- \frac{α_{2}}{2} (1 - η)^{2} N ɛ^{2}) .

Then we let

α_{2} : = \frac{λ_{3}}{2}

and

η : = \sqrt{2} - 1

, so that

\frac{λ_{3}}{2} η^{2} = \frac{α_{2}}{2} (1 - η)^{2} .

Then

P [W_{2} (μ, {\hat{μ}}^{N}) > ɛ] \leq 2 exp (- (3 - 2 \sqrt{2}) \frac{λ_{3}}{2} N ɛ^{2});

for

λ^{'} < λ

, the above quantity is bounded by

exp (- (3 - 2 \sqrt{2}) \frac{λ^{'}}{2} N ɛ^{2})

as soon as 2.19 is enforced with

K_{7}

large enough. This concludes the argument.
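The choice η = √2 - 1 made above can be double-checked numerically: with α_2 = λ_3 / 2 it equalizes the two exponential rates, and since the first rate increases with η while the second decreases, this is also where the smaller of the two rates is largest. The sketch below assumes the hypothetical value λ_3 = 1 (only the ratio of the rates matters) and uses a plain grid search; the names are illustrative.

```python
import math


def exponents(lam3, eta):
    """The two decay rates in the p = 2 bound, with alpha_2 = lam3 / 2."""
    alpha2 = lam3 / 2.0
    return lam3 / 2.0 * eta ** 2, alpha2 / 2.0 * (1.0 - eta) ** 2


lam3 = 1.0  # hypothetical value, chosen only for illustration
eta_star = math.sqrt(2.0) - 1.0
rate_a, rate_b = exponents(lam3, eta_star)

# Grid search over eta in (0, 1): maximize the smaller of the two rates.
grid = [k / 1000.0 for k in range(1, 1000)]
eta_best = max(grid, key=lambda eta: min(exponents(lam3, eta)))
```

Both rates equal (3 - 2√2) λ_3 / 2 at `eta_star`, which is the constant appearing in the final bound.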

2.2 Proof of Theorem 1.5

It is very similar to the proof of Theorem 1.1 , so we shall only explain where the differences lie. Obviously, the main difficulty will consist in the control of tails.

We first let

p \in [1, q)

,

α \in [1, \frac{q}{p})

and

R > 0

, and introduce

M_{q} : = \int_{R^{d}} | x |^{q} d μ (x) .

Then 2.1 may be replaced by

\begin{matrix} W_{p}^{p} (μ, μ_{R}) \leq 2^{p} M_{q} R^{p - q}, \end{matrix}

(2.20)

and 2.2 by

\begin{matrix} P [W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] \leq C N^{\bar{α} - α} \frac{R^{α p - q}}{(ɛ^{p} - C R^{p - q})^{α}} \end{matrix}

(2.21)

for some constant

C

depending on

α

and

M_{q} .

Let us establish for instance 2.21 . Introduce

Z_{k} = | Y_{k} - X_{k} |^{p} 1_{| X_{k} | > R} (1 \leq k \leq N) .

By Chebyshev's inequality,

P [W_{p} ({\hat{μ}}_{R}^{N}, {\hat{μ}}^{N}) > ɛ] \leq P [\frac{1}{N} \sum_{k = 1}^{N} Z_{k} > ɛ^{p}] = P [\frac{1}{N} \sum_{k = 1}^{N} (Z_{k} - E Z_{k}) > ɛ^{p} - E Z_{1}] \leq \frac{E {| \sum_{k = 1}^{N} (Z_{k} - E Z_{k}) |}^{α}}{(N (ɛ^{p} - E Z_{1}))^{α}}

provided that

ɛ^{p} > E Z_{1}

. But, since the random variables

(Z_{k} - E Z_{k})_{k}

are independent and identically distributed, with zero mean, there exists some constant

C

depending on

α

such that

E {| \sum_{k = 1}^{N} (Z_{k} - E Z_{k}) |}^{α} \leq C N^{\bar{α}} E | Z_{1} - E Z_{1} |^{α}

where

\bar{α} : = max (α / 2, 1)

. This inequality is a consequence of Rosenthal's inequality in the case when

α \geq 2

, but also holds true if

α \in [1, 2)

(see for instance [24, pp. 62 and 82] ).

Then, on one hand,

E Z_{1} = E | Y_{1} - X_{1} |^{p} 1_{| X_{1} | > R} \leq 2^{p} M_{q} R^{p - q},

while on the other hand,

E | Z_{1} - E Z_{1} |^{α} = E {| | Y_{1} - X_{1} |^{p} 1_{| X_{1} | > R} - E | Y_{1} - X_{1} |^{p} 1_{| X_{1} | > R} |}^{α} \leq C E | Y_{1} - X_{1} |^{α p} 1_{| X_{1} | > R} \leq C M_{q} R^{α p - q}

with

C

standing for various constants. Collecting these two estimates, we conclude that 2.21 holds for

R^{q - p} ɛ^{p}

large enough.

Then 2.20 and 2.21 together ensure that

\begin{matrix} P [W_{p} (μ, {\hat{μ}}^{N}) > ɛ] \leq P [W_{p} (μ_{R}, {\hat{μ}}_{R}^{N}) > η ɛ - 2 M_{q}^{1 / p} R^{1 - q / p}] + C N^{\bar{α} - α} \frac{R^{α p - q}}{((1 - η)^{p} ɛ^{p} - C R^{p - q})^{α}} \end{matrix}

(2.22)

for any

ɛ \in (0, 1)

,

η > 0

and

R^{q - p} ɛ^{p} (1 - η)^{p}

large enough.

Since

μ_{R}

is supported in

B_{R}

, the Csiszár-Kullback-Pinsker inequality and Kantorovich-Rubinstein formulation of the

W_{1}

distance together ensure that it satisfies a

T_{1} (R^{- 2})

inequality (see e.g. [8, Particular Case 5] with

p = 1

). This estimate also extends to any

W_{p}

distance, not as a penalized

T_{p}

inequality as in 2.9 , but rather as

\begin{matrix} W_{p}^{2 p} (ν, μ_{R}) \leq 2^{2 p - 1} R^{2 p} H (ν | μ_{R}) \end{matrix}

(2.23)

(see again [8, Particular Case 5] ).

From 2.22 and 2.23 we deduce (as in 2.17 ) that

\begin{matrix} P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq {(K_{1} \frac{R}{δ})}^{K_{1} {(\frac{R}{δ})}^{d}} exp (- \frac{N ρ^{2 p}}{2^{2 p - 1} R^{2 p}}) + C N^{\bar{α} - α} \frac{R^{α p - q}}{((1 - η)^{p} ɛ^{p} - C R^{p - q})^{α}} \end{matrix}

(2.24)

for any

δ

, where now

ρ : = {(η ɛ - 2 M_{q}^{1 / p} R^{1 - q / p} - δ)}^{+} .

Letting

η_{1} < η

and

d^{'} > d

, and choosing

δ = δ_{0} ɛ

, we deduce

P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq exp ({(\frac{R}{ɛ})}^{d^{'}} - \frac{η_{1}^{2 p}}{2^{2 p - 1}} \frac{N ɛ^{2 p}}{R^{2 p}} + \frac{K_{1}}{2^{2 p - 1}} \frac{N}{R^{2 q}}) + C N^{\bar{α} - α} \frac{R^{α p - q}}{((1 - η_{1})^{p} ɛ^{p} - C R^{p - q})^{α}}

for

R^{q - p} ɛ^{p} (1 - η_{1})^{p}

large enough, and then

\begin{matrix} P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq exp (- \frac{η_{2}^{2 p}}{2^{2 p - 1}} \frac{N ɛ^{2 p}}{R^{2 p}}) + C N^{\bar{α} - α} \frac{R^{α p - q}}{(1 - η_{2})^{α p} ɛ^{α p}} \end{matrix}

(2.25)

for

η_{2} < η_{1}

, provided that the conditions

\begin{matrix} R \geq R_{1} ɛ^{- \frac{p}{q - p}}, N \geq K_{2} {(\frac{R}{ɛ})}^{2 p + d^{'}} \end{matrix}

(2.26)

hold for some

R_{1}

and

K_{2} .

Given any choice of

R

as a product of powers of

N

and

ɛ

, the first term in the right-hand side of 2.25 will always be smaller than the second one, if

N

goes to infinity while

ɛ

is kept fixed; thus we can choose

R

minimizing the second term under the above conditions. Then the second condition in 2.26 will be fulfilled as an equality:

R = K_{3} ɛ N^{\frac{1}{2 p + d^{'}}} .

As for the first condition in 2.26 , it can be rewritten as

N \geq N_{0} ɛ^{- q \frac{2 p + d^{'}}{q - p}},

and then, by 2.25 ,

P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq exp (- K_{5} N^{\frac{d^{'}}{2 p + d^{'}}}) + K_{6} ɛ^{- q} N^{\bar{α} - α + \frac{α p - q}{2 p + d^{'}}} .

Hence

\begin{matrix} P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq ɛ^{- q} N^{\bar{α} - α} \end{matrix}

(2.27)

for all

ɛ \in (0, 1)

and

N

larger than some constant and, given

d^{'} > d

, for all

ɛ \geq 1

and

N \geq M ɛ^{d^{'} - d}

where

M

is large enough.

In the first case when

p \geq q / 2

, any admissible

α

belongs to

[1, q / p) \subset [1, 2]

, so

\bar{α} = 1

. If

δ \in (0, q / p - 1)

, we get from 2.27 , with

α = q / p - δ

, that

P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq ɛ^{- q} N^{1 - q / p + δ}

for all

ɛ > 0

and

N \geq N_{0} max (ɛ^{- q \frac{2 p + d^{'}}{q - p}}, ɛ^{d^{'} - d}) .

In the second case when

p < q / 2

, we only consider admissible

α

's in

[2, q / p) \subset [1, q / p)

, so that

\bar{α} - α = - α / 2

. Choosing

δ \in (0, q / p - 2)

, we get from 2.27

P [W_{p} ({\hat{μ}}^{N}, μ) > ɛ] \leq ɛ^{- q} N^{- q / 2 p + δ / 2}

under the same conditions on

N

as before. This concludes the argument.
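The two cases above can be summarized by a small helper returning the exponent of N in the final polynomial bound, starting from the general expression with the exponent max(α/2, 1) - α and the choice α = q/p - δ. This is a sketch under the stated constraints on p, q and δ; the function name is illustrative and not part of the original text.

```python
def polynomial_rate_exponent(p, q, delta):
    """Exponent of N in the bound eps**(-q) * N**exponent of 2.27.

    Uses alpha = q / p - delta and alpha_bar = max(alpha / 2, 1), with
    delta restricted as in the two cases p >= q / 2 and p < q / 2.
    """
    if not 1 <= p < q:
        raise ValueError("need 1 <= p < q")
    upper = q / p - 1.0 if p >= q / 2.0 else q / p - 2.0
    if not 0.0 < delta < upper:
        raise ValueError("delta outside the admissible range")
    alpha = q / p - delta
    alpha_bar = max(alpha / 2.0, 1.0)
    return alpha_bar - alpha


# First case (p >= q / 2): the exponent equals 1 - q / p + delta.
slow_case = polynomial_rate_exponent(1.0, 1.5, 0.1)

# Second case (p < q / 2): the exponent equals -q / (2 p) + delta / 2.
fast_case = polynomial_rate_exponent(1.0, 4.0, 0.5)
```

In both cases the exponent is negative, so the bound decays polynomially in N, with a faster rate when more moments are available.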

2.3 Proof of Theorem 1.6

It is again based on the same principles as the proofs of Theorems 1.1 and 1.5 , with the help of functional inequalities investigated in [8] and [11] . We skip the argument, which the reader can easily reconstruct by following the same lines as above.

2.4 Data reconstruction estimates

Finally, we show how the above concentration estimates imply data reconstruction estimates. This is a rather general estimate, which is treated here along the lines of [25, Section 5] and [29, Problem 10] .

Proposition 2.1. Let

μ

be a probability measure on

R^{d}

, with density

f

with respect to Lebesgue measure. Let

X_{1}, \dots, X_{N}

be random points in

R^{d}

, and let

ζ

be a Lipschitz, nonnegative kernel with unit integral. Define the random measure

\hat{μ}

and the random function

{\hat{f}}_{ζ, α}

by

\hat{μ} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{X_{i}}, {\hat{f}}_{ζ, α} (x) : = \frac{1}{N} \sum_{i = 1}^{N} ζ_{α} (x - X_{i}), ζ_{α} (x) = \frac{1}{α^{d}} ζ (\frac{x}{α}) .

Then,

\begin{matrix} {sup}_{x \in R^{d}} | {\hat{f}}_{ζ, α} (x) - f (x) | \leq \frac{∥ ζ ∥_{L i p}}{α^{d + 1}} W_{1} (\hat{μ}, μ) + δ (α), \end{matrix}

(2.28)

where

δ

stands for the modulus of continuity of

f

, defined as

δ (ɛ) : = {sup}_{| x - y | \leq ɛ} | f (x) - f (y) | .

As a consequence, if

f

is Lipschitz, then there exist some constants

a, K > 0

, only depending on

d

,

∥ f ∥_{L i p}

and

∥ ζ ∥_{L i p}

, such that

\begin{matrix} P [∥ {\hat{f}}_{ζ, a ɛ} - f ∥_{L^{\infty}} > ɛ] \leq P [W_{1} (\hat{μ}, μ) > K ɛ^{d + 2}] \end{matrix}

(2.29)

for all

ɛ > 0 .

Proof. First,

\begin{matrix} | μ * ζ_{α} (x) - f (x) | & = & | \int_{R^{d}} ζ_{α} (x - y) (f (y) - f (x)) d y | \end{matrix}

\begin{matrix} & \leq & \int_{R^{d}} ζ_{α} (x - y) | f (y) - f (x) | d y . \end{matrix}

Since

ζ_{α} (x - y)

is supported in

{| x - y | \leq α}

, and

ζ_{α}

is a probability density, we deduce

\begin{matrix} | μ * ζ_{α} (x) - f (x) | \leq δ (α) . \end{matrix}

(2.30)

Now, if

x

is some point in

R^{d}

, then, thanks to the Kantorovich-Rubinstein dual formulation 1.2 ,

\begin{matrix} | {\hat{f}}_{ζ, α} - μ * ζ_{α} | (x) & = & | \int_{R^{d}} ζ_{α} (x - y) d [\hat{μ} - μ] (y) | \end{matrix}

\begin{matrix} & \leq & ∥ ζ_{α} (x - \cdot) ∥_{L i p} W_{1} (\hat{μ}, μ) \end{matrix}

\begin{matrix} & = & \frac{∥ ζ ∥_{L i p}}{α^{d + 1}} W_{1} (\hat{μ}, μ) . \end{matrix}

To conclude the proof of 2.28 , it suffices to combine this bound with 2.30 .

Now, let

L : = max (∥ f ∥_{L i p}, ∥ ζ ∥_{L i p})

, and

α : = ɛ / (2 L)

. The bound 2.28 turns into

∥ {\hat{f}}_{ζ, α} - f ∥_{L^{\infty}} \leq L (\frac{W_{1} (\hat{μ}, μ)}{α^{d + 1}} + α) \leq (\frac{(2 L)^{d + 1} L}{ɛ^{d + 1}}) W_{1} (\hat{μ}, μ) + \frac{ɛ}{2} .

In particular,

P [∥ {\hat{f}}_{ζ, α} - f ∥_{L^{\infty}} > ɛ] \leq P [W_{1} (\hat{μ}, μ) > \frac{ɛ^{d + 2}}{(2 L)^{d + 2}}],

which is estimate 2.29 . □
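The Kantorovich-Rubinstein step of the proof can be illustrated on the real line. In the sketch below, which is only an illustration, the law μ is replaced by a second empirical measure (so that W_1 is computable exactly by sorting), the kernel ζ is the triangular kernel with Lipschitz constant 1, and d = 1; all function names and parameter values are hypothetical choices. The inequality being checked is the one with prefactor ∥ζ∥_Lip / α^{d+1}.

```python
import random


def triangular_kernel(x):
    """Nonnegative kernel with unit integral, support [-1, 1], Lipschitz constant 1."""
    return max(1.0 - abs(x), 0.0)


def kernel_estimate(x, sample, alpha):
    """Kernel estimator in dimension 1: average of zeta((x - X_i) / alpha) / alpha."""
    total = sum(triangular_kernel((x - xi) / alpha) for xi in sample)
    return total / (alpha * len(sample))


def w1_equal_size(sample_a, sample_b):
    """W_1 distance between two empirical measures on the line with equally many atoms."""
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(u - v) for u, v in pairs) / len(sample_a)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]
alpha = 0.3

# Largest gap between the two smoothed densities on a grid of [-4, 4].
grid = [-4.0 + 0.01 * k for k in range(801)]
sup_gap = max(
    abs(kernel_estimate(x, xs, alpha) - kernel_estimate(x, ys, alpha)) for x in grid
)

# Kantorovich-Rubinstein bound with Lipschitz constant 1 / alpha**(d + 1), d = 1.
kr_bound = w1_equal_size(xs, ys) / alpha ** 2
```

The comparison `sup_gap <= kr_bound` holds for any two samples of equal size, since it is a deterministic consequence of the dual formulation; the random seed only fixes the example.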

Remark 2.2. Estimate 2.29 , combined with Theorem 1.1 or Theorem 1.5 , yields simple quantitative (non-asymptotic) deviation inequalities for empirical distribution functions in supremum norm. We refer to Gao [15] for a recent study of deviation inequalities for empirical distribution functions, both in moderate and large deviations regimes.

3 PDE estimates

Now we start the study of our model system for interacting particles. The first step towards our proof of Theorem 1.7 consists in deriving suitable a priori estimates on the solution to the nonlinear limit partial differential equation 1.8 . In this section, we recall some estimates which have already been established by various authors, and derive some new ones. All estimates will be effective.

3.1 Notation

In the sequel,

μ_{0}

is a probability measure, taken as an initial datum for equation 1.8 , and various regularity assumptions will later be made on

μ_{0}

. Assumptions 1.9 will always be made on

V

and

W

, even if they are not recalled explicitly; we shall only mention additional regularity assumptions, when used in our estimates. Moreover, we shall write

\begin{matrix} Γ : = max (| γ |, | γ^{'} |) . \end{matrix}

(3.1)

The notation

μ_{t}

will always stand for the solution (unique under our assumptions) of 1.8 .

We also write

e (t) : = \int_{R^{d}} | x |^{2} d μ_{t} (x)

for the (kinetic) energy associated with

μ_{t}

, and

M_{α} (t) : = \int_{R^{d}} e^{α | x |^{2}} d μ_{t} (x)

for the square exponential moment of order

α .

The scalar product between two vectors

v, w \in R^{d}

will be denoted by

v \cdot w

. The symbols

C

and

K

will often be used to denote various positive constants; in general what will matter is an upper bound on constants denoted

C

, and a lower bound on constants denoted

K .

The space

C^{k}

is the space of

k

times continuously differentiable functions.

3.2 Decay at infinity

In this subsection, we prove the propagation of strong decay estimates at infinity:

Proposition 3.1. With the conventions of Subsection 3.1 , let

\bar{η}

be

γ

if

γ < 0

, and an arbitrary negative number otherwise. Let

a : = 2 (β + \bar{η}), \bar{G} : = 2 d + \frac{| \nabla V (0) |^{2}}{2 | \bar{η} |} .

Then (i)

e (t) \leq e^{- a t} [e (0) + \bar{G} \frac{e^{a t} - 1}{a}]

; (ii) For any

α_{0} > 0

there is a continuous positive function

α (t)

such that

α (0) = α_{0}

and

\begin{matrix} M_{α_{0}} (0) < + \infty ⟹ M_{α (t)} (t) < + \infty . \end{matrix}

(3.2)

(iii) Moreover, in the “uniformly convex case” when

β > 0

and

β + γ > 0

, then there is

α > 0

such that

{sup}_{t \geq 0} e (t) < + \infty, {sup}_{t \geq 0} M_{α} (t) < + \infty .

Corollary 3.2. If

μ_{0}

admits a finite square exponential moment, then

μ_{t}

satisfies

T_{1} (λ_{t})

, for some function

λ_{t} > 0

, bounded below on any interval

[0, T]

(

T < \infty

).

Proof. We start with (i). For simplicity we shall pretend that

μ_{t}

is a smoothly differentiable function of

t

, with rapid decay, so that all computations based on integrating equation 1.8 against

| x |^{2}

are justified. These assumptions are not a priori satisfied, but the resulting bounds can easily be rigorously justified with standard but tedious approximation arguments.

With that in mind, we compute

e^{'} (t) = 2 d - 2 \int_{R^{d}} (x \cdot \nabla V (x) + x \cdot \nabla W * μ_{t} (x)) d μ_{t} (x)

with

- 2 \int_{R^{d}} x \cdot \nabla V (x) d μ_{t} (x) \leq - 2 β \int_{R^{d}} | x |^{2} d μ_{t} (x) - 2 \nabla V (0) \cdot \int_{R^{d}} x d μ_{t} (x) .

Since

\nabla W

is an odd function, we have

\begin{matrix} - 2 \int_{R^{d}} x \cdot \nabla W * μ_{t} (x) d μ_{t} (x) & = & - 2 \int \int x \cdot \nabla W (x - y) d μ_{t} (y) d μ_{t} (x) \end{matrix}

\begin{matrix} = & - \int \int (x - y) \cdot \nabla W (x - y) d μ_{t} (y) d μ_{t} (x) \end{matrix}

\begin{matrix} \leq & - γ \int \int | x - y |^{2} d μ_{t} (y) d μ_{t} (x) \end{matrix}

\begin{matrix} = & - 2 γ [\int | x |^{2} d μ_{t} (x) - {| \int x d μ_{t} (x) |}^{2}] . \end{matrix}

If

γ < 0

, then

\begin{matrix} e^{'} (t) & \leq & 2 d - 2 (γ + β) e (t) + 2 γ {| \int x d μ_{t} (x) + \frac{\nabla V (0)}{2 | γ |} |}^{2} + \frac{| \nabla V (0) |^{2}}{2 | γ |} \end{matrix}

\begin{matrix} \leq & 2 d - 2 (γ + β) e (t) + \frac{| \nabla V (0) |^{2}}{2 | γ |}, \end{matrix}

and if

γ \geq 0

, then for any

\bar{η} < 0

\begin{matrix} e^{'} (t) & \leq & 2 d - 2 (\bar{η} + β) e (t) - 2 γ (\int | x |^{2} d μ_{t} (x) - {| \int x d μ_{t} (x) |}^{2}) + \frac{| \nabla V (0) |^{2}}{2 | \bar{η} |} \end{matrix}

\begin{matrix} \leq & 2 d - 2 (\bar{η} + β) e (t) + \frac{| \nabla V (0) |^{2}}{2 | \bar{η} |} . \end{matrix}

This leads to

e^{'} (t) \leq \bar{G} - a e (t),

and the conclusion follows easily by Gronwall's lemma.
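The Gronwall step can be sanity-checked numerically: any function satisfying the differential inequality with some nonnegative slack stays below the bound stated in (i). The sketch below integrates such an equation with a classical Runge-Kutta scheme; the values of a, the constant playing the role of \bar{G}, the initial energy and the slack term are hypothetical, and the function names are illustrative.

```python
import math


def energy_bound(t, e0, a, g_bar):
    """Bound (i): exp(-a t) * (e0 + g_bar * (exp(a t) - 1) / a)."""
    if a == 0.0:
        return e0 + g_bar * t  # limit of the formula as a tends to 0
    return math.exp(-a * t) * (e0 + g_bar * (math.exp(a * t) - 1.0) / a)


def rk4_energy(e0, a, g_bar, slack, t_end, steps):
    """Integrate e' = g_bar - a e - slack(t) by classical RK4, with slack >= 0."""
    def rhs(s, y):
        return g_bar - a * y - slack(s)

    h = t_end / steps
    t, e = 0.0, e0
    path = [(t, e)]
    for _ in range(steps):
        k1 = rhs(t, e)
        k2 = rhs(t + h / 2.0, e + h * k1 / 2.0)
        k3 = rhs(t + h / 2.0, e + h * k2 / 2.0)
        k4 = rhs(t + h, e + h * k3)
        e += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
        path.append((t, e))
    return path


# Hypothetical constants and a nonnegative slack term.
a, g_bar, e0 = 1.0, 3.0, 0.5
path = rk4_energy(e0, a, g_bar, lambda s: 0.5 * (1.0 + math.sin(s)), 5.0, 5000)
worst_gap = max(e - energy_bound(t, e0, a, g_bar) for t, e in path)
```

When a > 0 the bound converges to g_bar / a as t grows, which is the mechanism behind the uniform-in-time estimate of part (iii).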

We now turn to (ii). Let

α

be some arbitrary nonnegative

C^{1}

function on

R_{+}

. By using the equation 1.8 , we compute

\frac{d}{d t} \int e^{α (t) | x |^{2}} d μ_{t} (x) = \int [2 d α + 4 α^{2} | x |^{2} - 2 α x \cdot \nabla V (x) - 2 α x \cdot \nabla W * μ_{t} (x) + α^{'} (t) | x |^{2}] e^{α | x |^{2}} d μ_{t} (x) .

Since

D^{2} V (x) \geq β I

for all

x \in R^{d}

, we can write

\begin{matrix} - x \cdot \nabla V (x) \leq - x \cdot \nabla V (0) - β | x |^{2} \leq - β | x |^{2} + | \nabla V (0) | | x | \leq (δ - β) | x |^{2} + \frac{C}{4 δ} \end{matrix}

(3.3)

for any

δ > 0

and

x \in R^{d} .

Next, our assumptions on

W

imply

\nabla W (0) = 0

, and

γ I \leq D^{2} W (x) \leq γ^{'} I

, so

x \cdot \nabla W (x) \geq γ | x |^{2} and | x \cdot D^{2} W (z) y | \leq Γ | x | | y |

for all

x, y, z \in R^{d}

, with

Γ

defined by 3.1 . Hence, by Taylor's formula,

\begin{matrix} - x \cdot \nabla W * μ_{t} (x) & = & - \int_{R^{d}} x \cdot \nabla W (x - y) d μ_{t} (y) \end{matrix}

\begin{matrix} = & - x \cdot \nabla W (x) + \int_{R^{d}} \int_{0}^{1} x \cdot D^{2} W (x - s y) y d μ_{t} (y) d s \end{matrix}

\begin{matrix} \leq & - γ | x |^{2} + Γ | x | \int_{R^{d}} | y | d μ_{t} (y) \end{matrix}

\begin{matrix} \leq & (- γ + Γ η) | x |^{2} + \frac{Γ}{4 η} e (t), \end{matrix}

(3.4)

where

η

is any positive number.

From 3.3 and 3.4 we obtain

\begin{matrix} \frac{d}{d t} (M_{α (t)} (t)) \leq \int_{R^{d}} [A (t) + B (t) | x |^{2}] e^{α (t) | x |^{2}} d μ_{t} (x) \end{matrix}

(3.5)

where

A (t) = C α (t) (1 + e (t)), B (t) = α^{'} (t) + 4 α (t)^{2} + b α (t),

and

C

is a finite constant, while

b = - 2 (γ + β - δ - Γ η) .

We now choose

α (t)

in such a way that

B (t) \equiv 0

, i.e.

α^{'} (t) + 4 α^{2} (t) + b α (t) = 0, α (0) = α_{0} .

This integrates to

α (t) = e^{- b t} {(\frac{1}{α_{0}} + 4 \frac{1 - e^{- b t}}{b})}^{- 1} (= {(\frac{1}{α_{0}} + 4 t)}^{- 1} if b = 0) .

Obviously

α

is a continuous positive function, and our estimates imply

\frac{d}{d t} (M_{α (t)} (t)) \leq A (t) M_{α (t)} (t) .

We conclude by using Gronwall's lemma that

M_{α (t)} (t) \leq exp (\int_{0}^{t} A (s) d s) M_{α_{0}} (0) .
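As a sanity check on the choice of α(t), the closed-form expression can be verified numerically against the Riccati equation α' + 4α² + bα = 0. The sketch below is only an illustration: the function names, the initial value 0.3 and the sample values of b are assumptions, and the derivative is approximated by a central finite difference.

```python
import math

def alpha(t, alpha0, b):
    """Closed-form solution of alpha' + 4*alpha**2 + b*alpha = 0 with alpha(0) = alpha0 > 0."""
    if b == 0.0:
        return 1.0 / (1.0 / alpha0 + 4.0 * t)
    return math.exp(-b * t) / (1.0 / alpha0 + 4.0 * (1.0 - math.exp(-b * t)) / b)

def riccati_residual(t, alpha0, b, h=1e-5):
    """Residual alpha' + 4*alpha^2 + b*alpha, with alpha' from a central difference."""
    d_alpha = (alpha(t + h, alpha0, b) - alpha(t - h, alpha0, b)) / (2.0 * h)
    a = alpha(t, alpha0, b)
    return d_alpha + 4.0 * a * a + b * a

# Sample values of b of both signs, plus the degenerate case b = 0.
for b in (-1.5, 0.0, 2.0):
    for t in (0.1, 1.0, 5.0):
        print(b, t, alpha(t, 0.3, b), riccati_residual(t, 0.3, b))
```

The residual should vanish up to discretization error, and α(t) stays positive for every sign of b, as claimed.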

Next, the estimate (iii) for

e (t)

is an easy consequence of our explicit estimates when

β > 0, β + γ > 0

(in the case when

γ \geq 0

and

β > 0

, we choose

\bar{η} \in (- β, 0)

). As for the estimate about

M_{α} (t)

, it will result from a slightly more precise computation.

From 3.5 , we have

\begin{matrix} \frac{d}{d t} \int_{R^{d}} e^{α | x |^{2}} d μ_{t} (x) \leq \int_{R^{d}} [A (t) + B | x |^{2}] e^{α | x |^{2}} d μ_{t} (x) \end{matrix}

(3.6)

where

A

is bounded on

R_{+}

by some constant

a

, and

B = 2 α [2 α - (β + γ - δ - Γ η)] .

Since

β + γ > 0

, for any fixed

α \in (0, \frac{β + γ}{2})

we can choose

δ, η > 0

such that

B < 0

. Letting

R^{2} = - a / B

and

G = - B > 0

, equation 3.6 becomes

\begin{matrix} \frac{d}{d t} \int_{R^{d}} e^{α | x |^{2}} d μ_{t} (x) \leq G \int_{R^{d}} (R^{2} - | x |^{2}) e^{α | x |^{2}} d μ_{t} (x) . \end{matrix}

(3.7)

Let

p > 1

. The formula

\begin{matrix} \int_{| x | > p R} (R^{2} - | x |^{2}) e^{α | x |^{2}} d μ_{t} (x) & \leq & R^{2} (1 - p^{2}) \int_{| x | > p R} e^{α | x |^{2}} d μ_{t} (x) \end{matrix}

\begin{matrix} = & R^{2} (1 - p^{2}) [\int_{R^{d}} e^{α | x |^{2}} d μ_{t} (x) - \int_{| x | \leq p R} e^{α | x |^{2}} d μ_{t} (x)] \end{matrix}

leads to

\int_{R^{d}} (R^{2} - | x |^{2}) e^{α | x |^{2}} d μ_{t} (x) \leq \int_{| x | \leq p R} (R^{2} p^{2} - | x |^{2}) e^{α | x |^{2}} d μ_{t} (x) + R^{2} (1 - p^{2}) M_{α}

by decomposing the integral on the sets

\{| x | \leq p R\}

and

\{| x | > p R\}

. From 3.7 we deduce

(M_{α})^{'} (t) + ω_{1} M_{α} (t) \leq ω_{2}

where

ω_{1}

and

ω_{2}

are positive constants. It follows that

M_{α} (t)

remains bounded on

R_{+}

if

M_{α} (0) < + \infty

, and this concludes the argument. □

3.3 Time-regularity

Now we study the time-regularity of

μ_{t}

Proposition 3.3. With the conventions of Subsection 3.1 , for any

T < + \infty

there exists a constant

C (T)

such that

\begin{matrix} \forall s, t \in [0, T], W_{1} (μ_{t}, μ_{s}) \leq C (T) | t - s |^{1 / 2} . \end{matrix}

(3.8)

Remark 3.4. The exponent

1 / 2

is natural in small time if no regularity assumption is made on

μ_{0}

; it can be improved if

t, s

are assumed to be bounded below by some

t_{0} > 0

. Also, in view of the results of convergence to equilibrium recalled later on, the constant

C (T)

might be chosen independent of

T

if

β > 0, β + 2 γ > 0

.

Remark 3.5. A stochastic proof of 3.8 is possible, via the study of continuity estimates for

Y_{t}

, which in any case will be useful later on. But here we prefer to present an analytical proof, to stress the fact that estimates in this section are purely analytical statements.

Proof. Let

L

be the linear operator

- Δ - \nabla \cdot (\, \cdot \, (\nabla V + \nabla (W * μ_{t})))

, and let

e^{- t L}

be the associated semigroup: from our assumptions and estimates it follows that it is well-defined, at least for initial data which admit a finite square exponential moment. Of course

μ_{t} = e^{- t L} μ_{0}

. It follows that

\begin{matrix} W_{2} (μ_{s}, μ_{t}) = W_{2} (μ_{s}, e^{- (t - s) L} μ_{s}) & = W_{2} (\int_{R^{d}} δ_{y} d μ_{s} (y), \int_{R^{d}} e^{- (t - s) L} δ_{y} d μ_{s} (y)) \end{matrix}

\begin{matrix} \leq \int_{R^{d}} W_{2} (δ_{y}, e^{- (t - s) L} δ_{y}) d μ_{s} (y) . \end{matrix}


Our goal is to bound this by

O (\sqrt{t - s})

. In view of Proposition 3.1 , it is sufficient to prove that for all

a > 0

W_{2}^{2} (δ_{y}, e^{- (t - s) L} δ_{y}) = O (t - s) O (e^{a | y |^{2}}) .

This estimate is rather easy, since the left-hand side is just the variance of the solution of a linear diffusion equation, starting with a Dirac mass at

y

as initial datum. Without loss of generality, we assume

s = 0

, and write

{\tilde{μ}}_{t} : = e^{- t L} δ_{y}

. For simplicity we write the computations in a sketchy way, but they are not hard to justify.

Since the initial datum is

δ_{y}

, its square exponential moment

{\tilde{M}}_{α}

of order

α

is

e^{α | y |^{2}}

.

With an argument similar to the proof of Proposition 3.1 (ii), one can show that

0 \leq t \leq T ⟹ \int e^{α | x |^{2}} d {\tilde{μ}}_{t} (x) \leq C (T) (1 + {\tilde{M}}_{α}) \leq C (T) e^{α | y |^{2}} .

Now, since

| \nabla V | (x) = O (e^{a | x |^{2}})

for any

a < α

,

| \nabla W * μ_{t} |

grows at most polynomially, and

{\tilde{μ}}_{t}

admits a square exponential moment of order

α

, we easily obtain

\frac{d}{d t} \int x d {\tilde{μ}}_{t} = - \int \nabla (V + W * μ_{t}) d {\tilde{μ}}_{t} = \int O (e^{a | x |^{2}}) d {\tilde{μ}}_{t} = O (e^{α | y |^{2}});

\frac{d}{d t} \int \frac{| x |^{2}}{2} d {\tilde{μ}}_{t} = d - \int x \cdot \nabla (V + W * μ_{t}) d {\tilde{μ}}_{t} = O (e^{α | y |^{2}}) .

From these estimates we deduce that the time-derivative of the variance

V ({\tilde{μ}}_{t}) : = \int | x |^{2} d {\tilde{μ}}_{t} - (\int x d {\tilde{μ}}_{t})^{2}

is bounded by

O (e^{b | y |^{2}})

for any

b > 0

. Since

{\tilde{μ}}_{0}

has zero variance, it follows that the variance of

{\tilde{μ}}_{t}

is

O (t e^{b | y |^{2}})

, which was our goal. □

3.4 Regularity in phase space

Regularity estimates will be useful for Theorem 1.11 . Equation 1.8 is a (weakly nonlinear) parabolic equation, for which regularization effects can be studied by standard tools. Some limits to the strength of the regularization are imposed by the regularity of

V

. So as not to be bothered by these nonessential considerations, we shall assume strong regularity conditions on

V

here. Then in Appendix B we shall prove the following estimates:

Proposition 3.6. With the conventions of Subsection 3.1 , assume in addition that

V

has all its derivatives growing at most polynomially at infinity.

Then, for each

k \geq 0

and for all

t_{0} > 0

and

T > t_{0}

there is a finite constant

C (t_{0}, T)

, only depending on

t_{0}, T, k

and a square exponential moment of the initial measure

μ_{0}

, such that the density

f_{t}

of

μ_{t}

is of class

C^{k}

, with

{sup}_{t_{0} \leq t \leq T} ∥ f_{t} ∥_{C^{k}} \leq C (t_{0}, T) .

If moreover

β > 0, β + γ > 0

, then

C (t_{0}, T)

can be chosen to be independent of

T

for any fixed

t_{0}

Remark 3.7. For regular initial data and under some adequate assumptions on

V

and

W

, some regularity estimates on

f_{t} / f_{\infty}

, where

f_{\infty}

is the limit density in large time, are established in [10, Lemma 6.7]. These estimates allow a much more precise uniform decay, but are limited to just one derivative. Here there will be no need for them.

3.5 Asymptotic behavior

In the “uniformly convex” case when

β + γ > 0

, the measure

μ_{t}

converges to a definite limit

μ_{\infty}

as

t \to \infty

. This was investigated in [19, 9, 10]. The following statement is a simple variant of [9, Theorems 2.1 and 5.1].

Proposition 3.8. With the conventions of Subsection 3.1 , assuming that

β > 0, β + 2 γ > 0

, there exists a probability measure

μ_{\infty}

such that

W_{2} (μ_{t}, μ_{\infty}) \leq C e^{- λ t}, λ > 0 .

Here the constants

C

and

λ

only depend on the initial datum

μ_{0}

4 The limit empirical measure

Consider the random time-dependent measure

\begin{matrix} {\hat{ν}}_{t}^{N} : = \frac{1}{N} \sum_{i = 1}^{N} δ_{Y_{t}^{i}}, \end{matrix}

(4.1)

where

(Y_{t}^{i})_{t \geq 0}, 1 \leq i \leq N

, are

N

independent processes solving the same stochastic differential equation

d Y_{t}^{i} = \sqrt{2} d B_{t}^{i} - [\nabla (V + W * μ_{t})] (Y_{t}^{i}) d t,

and such that the law of

Y_{0}^{i}

is

μ_{0}

. As we already mentioned, for each

t

and

i

Y_{t}^{i}

is distributed according to the law

μ_{t}

. We call

{\hat{ν}}_{t}^{N}

the “limit empirical measure” because it is expected to be a rather accurate description, in some well-chosen sense, of the empirical measure

{\hat{μ}}_{t}^{N}

as

N \to \infty

.

Our estimates on

μ_{t}

, and the fact that

{\hat{ν}}_{t}^{N}

is the empirical measure for independent processes, are sufficient to imply good properties of concentration of

{\hat{ν}}_{t}^{N}

around its mean

μ_{t}

, as

N \to \infty

, for each

t

. But later on we shall use some estimates about the time-dependent measure (even to obtain a result of concentration for

{\hat{μ}}_{t}^{N}

with fixed

t

). To get such results, we shall study the time-regularity of

{\hat{ν}}_{t}^{N}

. Our final goal in this section is the following

Proposition 4.1. With the conventions of Subsection 3.1 , for any

T \geq 0

there are constants

C = C (T)

and

a = a (T) > 0

such that the limit empirical measure 4.1 satisfies

\forall Δ \in [0, T], \forall ɛ > 0, P [{sup}_{t_{0} \leq s, t \leq t_{0} + Δ} W_{1} ({\hat{ν}}_{s}^{N}, {\hat{ν}}_{t}^{N}) > ɛ] \leq exp (- N (a ɛ^{2} - C Δ)) .

To prove Proposition 4.1 , we shall use some classical tools of stochastic calculus.

4.1 SDE estimates

In this subsection we establish the following estimates of time regularity for the stochastic process

Y_{t}

: For all

T > 0

, there exist positive constants

a

and

C

such that, for all

s, t, t_{0}, Δ \in [0, T]

\begin{matrix} (i) & E | Y_{t} - Y_{s} |^{2} \leq C | t - s | \end{matrix}


\begin{matrix} (ii) & E | Y_{t} - Y_{s} |^{4} \leq C | t - s |^{2} \end{matrix}


\begin{matrix} (iii) & E [{sup}_{t_{0} \leq s \leq t \leq t_{0} + Δ} exp (a | Y_{t} - Y_{s} |^{2})] \leq 1 + C Δ . \end{matrix}

Proof. We start with (i). We use Itô's formula to write a stochastic equation on the process

(| Y_{t} - Y_{s} |^{2})_{t \geq s}

| Y_{t} - Y_{s} |^{2} = M_{s, t} + 2 d (t - s) - 2 \int_{s}^{t} (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) \cdot (Y_{u} - Y_{s}) d u,

where

M_{s, t}

, viewed as a process depending on

t

, is a martingale with zero expectation.

Hence

\begin{matrix} E | Y_{t} - Y_{s} |^{2} = 2 d (t - s) - 2 \int_{s}^{t} E (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) \cdot (Y_{u} - Y_{s}) d u . \end{matrix}

(4.2)

On one hand

\begin{matrix} E {| (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) \cdot (Y_{u} - Y_{s}) |}^{2} \leq 4 (E | \nabla V (Y_{u}) |^{2} + E | \nabla W * μ_{u} (Y_{u}) |^{2}) (E | Y_{u} |^{2} + E | Y_{s} |^{2}) . \end{matrix}

(4.3)

On the other hand, by Proposition 3.1 ,

μ_{u}

has a finite square exponential moment, uniformly bounded for

u \in [0, T]

. More precisely, there exist

α > 0

and

M < + \infty

such that

\int e^{α | x |^{2}} d μ_{u} (x) \leq M

for all

u \leq T

. Since by assumption

| \nabla W (z) | \leq L | z |

and

| \nabla V (x) | = O (e^{α | x |^{2}})

, we deduce

{sup}_{s \leq u \leq T} (E (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) \cdot (Y_{u} - Y_{s})) < + \infty .

In view of 4.2 , it follows that there exists a constant

C = C (T)

such that

E | Y_{t} - Y_{s} |^{2} \leq (2 d + C) (t - s) .

This concludes the proof of (i).

To establish (ii), we perform a very similar computation. For given

s

, let

Z_{s, t} : = (| Y_{t} - Y_{s} |^{4})_{t \geq s}

. Another application of Itô's formula yields

E Z_{s, t} = 4 (2 + d) \int_{s}^{t} E | Y_{u} - Y_{s} |^{2} d u - 4 \int_{s}^{t} E | Y_{u} - Y_{s} |^{2} (Y_{u} - Y_{s}) \cdot (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) d u .

On one hand, from (i),

\int_{s}^{t} E | Y_{u} - Y_{s} |^{2} d u \leq 2 C \int_{s}^{t} (u - s) d u = C (t - s)^{2} .

On the other hand

\begin{matrix} \int_{s}^{t} E | Y_{u} - Y_{s} |^{2} (Y_{u} - Y_{s}) \cdot (\nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u})) d u \leq {(\int_{s}^{t} E Z_{s, u} d u)}^{3 / 4} {(\int_{s}^{t} E {| \nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u}) |}^{4} d u)}^{1 / 4} \end{matrix}

(4.4)

by Hölder's inequality. But again, since the measures

μ_{t}

admit a bounded square exponential moment,

E | \nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u}) |^{4}

is bounded on

[0, T]

. We conclude that

\begin{matrix} E Z_{s, t} \leq C ((t - s)^{2} + (t - s)^{1 / 4} {(\int_{s}^{t} E Z_{s, u} d u)}^{3 / 4}) . \end{matrix}

(4.5)

Then, with

C

standing again for various constants which are independent of

s

and

t

,

E Z_{s, u} \leq C (E | Y_{u} |^{4} + E | Y_{s} |^{4}) \leq 2 C {sup}_{0 \leq u \leq T} \int | x |^{4} d μ_{u} (x) \leq C;

so, from 4.5 ,

E Z_{s, t} \leq C ((t - s)^{2} + (t - s)^{1 / 4} (t - s)^{3 / 4}) \leq C (t - s),

and by 4.5 again we successively obtain

E Z_{s, t} \leq C (t - s)^{7 / 4},

and finally

E Z_{s, t} \leq C (t - s)^{2} .

This concludes the proof of (ii).
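The successive improvements of the exponent can be traced mechanically. If E Z_{s,u} ≤ C (u − s)^p on the interval, then the time integral is at most C (t − s)^{p+1}, and inserting this into 4.5 gives the exponent min(2, 1/4 + 3(p + 1)/4) on a bounded time interval. The short Python sketch below iterates this map starting from p = 0 (the uniform bound); the function names are illustrative only.

```python
from fractions import Fraction

def next_exponent(p):
    """One pass through (4.5): a bound of order (t-s)^p improves to order
    (t-s)^min(2, 1/4 + 3(p+1)/4) on a bounded time interval."""
    return min(Fraction(2), Fraction(1, 4) + Fraction(3, 4) * (p + 1))

def bootstrap(p0=Fraction(0), max_iter=10):
    """Iterate next_exponent from p0 until the exponent 2 is reached."""
    exps = [p0]
    while exps[-1] < 2 and len(exps) <= max_iter:
        exps.append(next_exponent(exps[-1]))
    return exps

print([str(p) for p in bootstrap()])  # ['0', '1', '7/4', '2']
```

Three passes suffice, which matches the chain of exponents 1, 7/4, 2 in the proof.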

We finally turn to the proof of (iii). Without real loss of generality, we set

t_{0} = 0

. We shall proceed as in the proof of Proposition 3.1 , and prove the existence of some constant

C

and some continuous positive function

a

on

R_{+}

such that

\begin{matrix} E ({sup}_{0 \leq s \leq t \leq Δ \leq T} exp (a (t) | Y_{t} - Y_{s} |^{2})) \leq 1 + C Δ . \end{matrix}

(4.6)

Let

a (t)

be a smooth function, and

Z_{s, t} : = e^{a (t) | Y_{t} - Y_{s} |^{2}} .

By Itô's formula,

Z_{s, t} = 1 + M_{s, t} + \int_{s}^{t} [2 a (u) (d + 2 a | Y_{u} - Y_{s} |^{2} - (\nabla V + \nabla W * μ_{u}) (Y_{u}) \cdot (Y_{u} - Y_{s})) + a^{'} (u) | Y_{u} - Y_{s} |^{2}] Z_{s, u} d u

where

M_{s, t} : = \int_{s}^{t} a (u) (Y_{u} - Y_{s}) Z_{s, u} d B_{u} .

For each

s

M_{s, t}

, viewed as a stochastic process in

t

, is a martingale.

By Young's inequality, for any

b > 0

- 2 (\nabla V + \nabla W * μ_{u}) (Y_{u}) \cdot (Y_{u} - Y_{s}) \leq b {| Y_{u} - Y_{s} |}^{2} + \frac{1}{b} {| \nabla V + \nabla W * μ_{u} |}^{2} (Y_{u}) .

So, by letting

A_{u} : = a (u) [2 d + \frac{1}{b} {| \nabla V (Y_{u}) + \nabla W * μ_{u} (Y_{u}) |}^{2}]

and

B (u) : = a^{'} (u) + 4 a^{2} (u) + b a (u)

we obtain

Z_{s, t} \leq 1 + M_{s, t} + \int_{s}^{t} [A_{u} + B (u) | Y_{u} - Y_{s} |^{2}] Z_{s, u} d u .

We choose

a

in such a way that the function

B

is identically zero, that is

a (u) = e^{- b u} {(\frac{1}{a (0)} + 4 \frac{1 - e^{- b u}}{b})}^{- 1},

where

a (0)

is to be fixed later. Then

Z_{s, t} \leq 1 + M_{s, t} + \int_{s}^{t} A_{u} Z_{s, u} d u

from which it is clear that

\begin{matrix} E {sup}_{s \leq t \leq Δ} Z_{s, t} \leq 1 + E {sup}_{s \leq t \leq Δ} M_{s, t} + \int_{s}^{Δ} E A_{u} Z_{s, u} d u . \end{matrix}

(4.7)

By Cauchy-Schwarz and Doob's inequalities,

\begin{matrix} {(E {sup}_{s \leq t \leq Δ} M_{s, t})}^{2} \leq E {| {sup}_{s \leq t \leq Δ} M_{s, t} |}^{2} \leq 2 {sup}_{s \leq t \leq Δ} E | M_{s, t} |^{2} . \end{matrix}

(4.8)

Also, by Itô's formula and the Cauchy-Schwarz inequality again,

\begin{matrix} E | M_{s, t} |^{2} = \int_{s}^{t} a (u)^{2} E | Y_{u} - Y_{s} |^{2} Z_{s, u}^{2} d u \leq \frac{1}{2} \int_{s}^{t} a (u)^{2} {(E | Y_{u} - Y_{s} |^{4})}^{1 / 2} {(E Z_{s, u}^{4})}^{1 / 2} d u . \end{matrix}

(4.9)

In view of (ii), there exists a constant

C

such that

\begin{matrix} E | Y_{u} - Y_{s} |^{4} \leq C (u - s)^{2} . \end{matrix}

(4.10)

Furthermore,

\begin{matrix} E Z_{s, u}^{4} = E exp 4 a (u) | Y_{u} - Y_{s} |^{2} \leq {(E exp 16 a (u) | Y_{u} |^{2})}^{1 / 2} {(E exp 16 a (u) | Y_{s} |^{2})}^{1 / 2} . \end{matrix}

(4.11)

Recall from Proposition 3.1 that there exist constants

M

and

α > 0

such that

{sup}_{s \leq u \leq Δ} \int e^{α | y |^{2}} d μ_{u} (y) \leq M .

If we choose

a (0) \leq α / 16

, the decreasing property of

a

will ensure that

a (u) \leq α / 16

for all

u \in [0, Δ]

, and

E exp 16 a (u) | Y_{u} |^{2} (= \int e^{16 a (u) | y |^{2}} d μ_{u} (y)) \leq M .

Then, from 4.11 ,

{sup}_{s \leq u \leq Δ} E Z_{s, u}^{4} \leq M .

Now, from 4.9 and 4.10 we deduce

{sup}_{s \leq t \leq Δ} E | M_{s, t} |^{2} \leq C Δ^{2} .

Combining this with 4.8 , we conclude that

E {sup}_{s \leq t \leq Δ} M_{s, t} \leq C Δ .

In the same way, we can prove that

E (A_{t} Z_{s, t})

is bounded for

t \in [s, Δ]

by bounding

E Z_{s, t}^{2}

and

E A_{t}^{2}

. This concludes the proof of 4.6 , and therefore of (iii) above. □

4.2 Time-regularity of the limit empirical measure

We are now ready to prove Proposition 4.1 .

On one hand

W_{1} ({\hat{ν}}_{s}^{N}, {\hat{ν}}_{t}^{N}) \leq \frac{1}{N} \sum_{i = 1}^{N} | Y_{t}^{i} - Y_{s}^{i} |,

so

\begin{matrix} P [{sup}_{0 \leq s \leq t \leq Δ} W_{1} ({\hat{ν}}_{s}^{N}, {\hat{ν}}_{t}^{N}) > ɛ] \leq P [\frac{1}{N} \sum_{i = 1}^{N} V^{i} > ɛ] \end{matrix}

(4.12)

where

V^{i} : = {sup}_{0 \leq s \leq t \leq Δ} | Y_{t}^{i} - Y_{s}^{i} | .

By Chebyshev's exponential inequality and the independence of the

(Y_{t}^{i} - Y_{s}^{i})

P [\frac{1}{N} \sum_{i = 1}^{N} V^{i} > ɛ] \leq exp (- N {sup}_{ζ \geq 0} [ɛ ζ - log E exp (ζ V^{1})]) .

But, for any given

ζ

and

ω > 0

,

E exp (ζ V^{1}) \leq E exp (ζ (\frac{ω^{2} + (V^{1})^{2}}{2 ω})) \leq exp \frac{ζ ω}{2} E exp \frac{ζ}{2 ω} (V^{1})^{2} .

Let

ω = \frac{ζ}{2 a}

, so that

\frac{ζ}{2 ω} = a

. Then, from estimate (iii) in Subsection 4.1 ,

E exp \frac{ζ}{2 ω} (V^{1})^{2} \leq 1 + C Δ,

uniformly in

s

and

Δ

. Hence, for any

ζ > 0

E exp (ζ V^{1}) \leq exp (\frac{ζ^{2}}{4 a}) (1 + C Δ) .

Consequently,

\begin{matrix} P [\frac{1}{N} \sum_{i = 1}^{N} V^{i} > ɛ] & \leq & exp (- N {sup}_{ζ \geq 0} [ɛ ζ - \frac{ζ^{2}}{4 a} - log (1 + C Δ)]) \end{matrix}

\begin{matrix} = & exp (- N [a ɛ^{2} - log (1 + C Δ)]) \end{matrix}

\begin{matrix} \leq & exp (- N [a ɛ^{2} - C Δ]) . \end{matrix}

The proof of Proposition 4.1 follows by 4.12 .
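The optimization over ζ is a Legendre transform of a quadratic: the supremum of ɛζ − ζ²/(4a) over ζ ≥ 0 is attained at ζ = 2aɛ and equals aɛ², and log(1 + CΔ) ≤ CΔ gives the final form of the exponent. The following Python sketch checks both facts on a grid. The numerical values of `a`, `C`, `Delta`, `eps` are arbitrary illustrative choices.

```python
import math

def rate(eps, a, C, Delta, zetas):
    """Grid approximation of sup_{zeta >= 0} [eps*zeta - zeta^2/(4a)] - log(1 + C*Delta)."""
    return max(eps * z - z * z / (4.0 * a) for z in zetas) - math.log1p(C * Delta)

# Illustrative values only (hypothetical, not taken from the paper).
a, C, Delta, eps = 0.7, 2.0, 0.05, 0.4
zetas = [k * 1e-3 for k in range(0, 5001)]   # grid on [0, 5]; the maximizer is 2*a*eps = 0.56
closed_form = a * eps ** 2 - math.log1p(C * Delta)
weaker_form = a * eps ** 2 - C * Delta       # uses log(1 + x) <= x
print(rate(eps, a, C, Delta, zetas), closed_form, weaker_form)
```

The grid value agrees with the closed form, and the weaker exponent aɛ² − CΔ is never larger than the sharp one, so replacing log(1 + CΔ) by CΔ only loosens the bound.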

5 Coupling

We now (as is classical) reduce the proof of convergence for

{\hat{μ}}_{t}^{N}

to a proof of convergence for the empirical measure

{\hat{ν}}_{t}^{N}

constructed on the auxiliary independent system

(Y_{t}^{i})

. The final goal of this section is the following estimate.

Proposition 5.1. With the conventions of Subsection 3.1 ,

W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) \leq Γ \int_{0}^{t} e^{- α (t - s)} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) d s + W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}),

where

Γ

is defined by 3.1 , and

α : = β + 2 min (γ, 0)

Proof. For the sake of simplicity we give a slightly sketchy proof. We couple the stochastic systems

(X_{t}^{i})

and

(Y_{t}^{i})

by assuming that (i)

X_{0}^{i} = Y_{0}^{i}

and (ii) both systems are driven by the same Brownian processes

B_{t}^{i}

. In particular, for each

i \in \{1, \dots, N\}

, the process

X_{t}^{i} - Y_{t}^{i}

satisfies the equation

\begin{matrix} d (X_{t}^{i} - Y_{t}^{i}) = - (\nabla V (X_{t}^{i}) - \nabla V (Y_{t}^{i})) d t - (\nabla W * {\hat{μ}}_{t}^{N} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i})) d t . \end{matrix}

(5.1)

From 5.1 we deduce

\begin{matrix} \frac{1}{2} \frac{d}{d t} | X_{t}^{i} - Y_{t}^{i} |^{2} = - (\nabla V (X_{t}^{i}) - \nabla V (Y_{t}^{i})) \cdot (X_{t}^{i} - Y_{t}^{i}) - (\nabla W * {\hat{μ}}_{t}^{N} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i})) \cdot (X_{t}^{i} - Y_{t}^{i}) . \end{matrix}

(5.2)

Our convexity assumption on

V

implies

- (\nabla V (X_{t}^{i}) - \nabla V (Y_{t}^{i})) \cdot (X_{t}^{i} - Y_{t}^{i}) \leq - β | X_{t}^{i} - Y_{t}^{i} |^{2};

so the main issue consists in the treatment of the quantity

\nabla W * {\hat{μ}}_{t}^{N} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i})

appearing in the right-hand side of 5.2 . There are (at least) two options here. The first one consists in writing

\begin{matrix} \nabla W * {\hat{μ}}_{t}^{N} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i}) = (\nabla W * {\hat{μ}}_{t}^{N} - \nabla W * μ_{t}) (X_{t}^{i}) + (\nabla W * μ_{t} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i})); \end{matrix}

(5.3)

while the second one consists in forcing the introduction of

{\hat{ν}}_{t}^{N}

as follows:

\begin{matrix} \nabla W * {\hat{μ}}_{t}^{N} (X_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i}) = \frac{1}{N} \sum_{j = 1}^{N} [\nabla W (X_{t}^{i} - X_{t}^{j}) - \nabla W (Y_{t}^{i} - Y_{t}^{j})] + (\nabla W * {\hat{ν}}_{t}^{N} - \nabla W * μ_{t}) (Y_{t}^{i}) . \end{matrix}

(5.4)

Both options are interesting and lead to slightly different computations. Since both lines of computations might be useful in other contexts, we shall sketch them one after the other. The second option leads to better bounds, but at the price of more complications (in particular, we shall need to sum over the index

i

at an early stage).

First option: We start as in 5.3 . In view of our assumption on

D^{2} W

, the Lipschitz norm of

\nabla W (X_{t}^{i} - \cdot)

is bounded by

Γ

. Therefore, by the Kantorovich-Rubinstein dual formulation 1.2 ,

| \nabla W * ({\hat{μ}}_{t}^{N} - μ_{t}) (X_{t}^{i}) | = | \int_{R^{d}} \nabla W (X_{t}^{i} - y) d ({\hat{μ}}_{t}^{N} - μ_{t}) (y) | \leq Γ W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}),

and then our assumptions on

V

and

W

imply

\frac{1}{2} \frac{d}{d t} | X_{t}^{i} - Y_{t}^{i} |^{2} \leq - (γ + β) | X_{t}^{i} - Y_{t}^{i} |^{2} + Γ W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) | X_{t}^{i} - Y_{t}^{i} | .

In other words,

| X_{t}^{i} - Y_{t}^{i} |

satisfies the differential inequality

\frac{d}{d t} | X_{t}^{i} - Y_{t}^{i} | + (β + γ) | X_{t}^{i} - Y_{t}^{i} | \leq Γ W_{1} ({\hat{μ}}_{t}^{N}, μ_{t})

(

X_{t}^{i}

and

Y_{t}^{i}

separately are not Lipschitz functions of

t

, but their difference is). Hence, by Gronwall's lemma,

| X_{t}^{i} - Y_{t}^{i} | \leq Γ \int_{0}^{t} e^{- (β + γ) (t - s)} W_{1} ({\hat{μ}}_{s}^{N}, μ_{s}) d s .

Now we sum over

i

; by convexity of the distance

W_{1}

and triangular inequality, we obtain

\begin{matrix} W_{1} ({\hat{μ}}_{t}^{N}, {\hat{ν}}_{t}^{N}) & \leq & \frac{1}{N} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} | \leq Γ \int_{0}^{t} e^{- (β + γ) (t - s)} W_{1} ({\hat{μ}}_{s}^{N}, μ_{s}) d s \end{matrix}

\begin{matrix} \leq & Γ \int_{0}^{t} e^{- (β + γ) (t - s)} [W_{1} ({\hat{μ}}_{s}^{N}, {\hat{ν}}_{s}^{N}) + W_{1} ({\hat{ν}}_{s}^{N}, μ_{s})] d s . \end{matrix}

By using Gronwall's lemma again, we deduce

W_{1} ({\hat{μ}}_{t}^{N}, {\hat{ν}}_{t}^{N}) \leq Γ \int_{0}^{t} e^{- (β + γ - Γ) (t - s)} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) d s .

By applying the triangular inequality for

W_{1}

, we conclude that Proposition 5.1 holds, only with

α

replaced by the (a priori smaller) quantity

β + γ - Γ

Second option: Now we start with 5.4 . This time we sum over

i

right from the beginning:

\frac{1}{2} \frac{d}{d t} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2} = - \sum_{i = 1}^{N} (\nabla V (X_{t}^{i}) - \nabla V (Y_{t}^{i})) \cdot (X_{t}^{i} - Y_{t}^{i}) - \frac{1}{N} \sum_{i, j = 1}^{N} (A_{t}^{i j} + B_{t}^{i j})

where

A_{t}^{i j} = (\nabla W (X_{t}^{i} - X_{t}^{j}) - \nabla W (Y_{t}^{i} - Y_{t}^{j})) \cdot (X_{t}^{i} - Y_{t}^{i})

and

B_{t}^{i j} = (\nabla W (Y_{t}^{i} - Y_{t}^{j}) - \nabla W * μ_{t} (Y_{t}^{i})) \cdot (X_{t}^{i} - Y_{t}^{i}) .

Since

\nabla W

is an odd function and

D^{2} W (x) \geq γ I

for all

x \in R^{d}

, we have

A_{t}^{i j} + A_{t}^{j i} = (\nabla W (X_{t}^{i} - X_{t}^{j}) - \nabla W (Y_{t}^{i} - Y_{t}^{j})) \cdot ((X_{t}^{i} - X_{t}^{j}) - (Y_{t}^{i} - Y_{t}^{j})) \geq γ {| (X_{t}^{i} - X_{t}^{j}) - (Y_{t}^{i} - Y_{t}^{j}) |}^{2},

whence

- \sum_{i, j = 1}^{N} A_{t}^{i j} \leq - \frac{γ}{2} \sum_{i, j = 1}^{N} {| (X_{t}^{i} - X_{t}^{j}) - (Y_{t}^{i} - Y_{t}^{j}) |}^{2} \leq - 2 N γ^{-} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2}

where

γ^{-} = min (γ, 0)

Then

- \frac{1}{N} \sum_{j = 1}^{N} B_{t}^{i j} = - (X_{t}^{i} - Y_{t}^{i}) \cdot (\nabla W * {\hat{ν}}_{t}^{N} (Y_{t}^{i}) - \nabla W * μ_{t} (Y_{t}^{i})) .

Our assumption on

D^{2} W

implies that the Lipschitz norm of

\nabla W (Y_{t}^{i} - \cdot)

is bounded by

Γ

; so, by the Kantorovich-Rubinstein dual formulation 1.2 ,

| \nabla W * ({\hat{ν}}_{t}^{N} - μ_{t}) (Y_{t}^{i}) | = | \int_{R^{d}} \nabla W (Y_{t}^{i} - y) d ({\hat{ν}}_{t}^{N} - μ_{t}) (y) | \leq Γ W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) .

Collecting all terms we finally obtain

\frac{1}{2} \frac{d}{d t} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2} \leq - (β + 2 γ^{-}) \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2} + Γ \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} | W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) .

Then, since

\sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} | \leq {(N \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2})}^{1 / 2}

, the function

y (t) : = {(\frac{1}{N} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2})}^{1 / 2}

satisfies the differential inequality

y^{'} (t) + (β + 2 γ^{-}) y (t) \leq Γ W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}),

so that

{(\frac{1}{N} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2})}^{1 / 2} \leq Γ \int_{0}^{t} e^{- (β + 2 γ^{-}) (t - s)} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) d s .

The conclusion follows by triangular inequality again since

W_{1} ({\hat{μ}}_{t}^{N}, {\hat{ν}}_{t}^{N}) \leq W_{2} ({\hat{μ}}_{t}^{N}, {\hat{ν}}_{t}^{N}) \leq {(\frac{1}{N} \sum_{i = 1}^{N} | X_{t}^{i} - Y_{t}^{i} |^{2})}^{1 / 2} .

□
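The coupling argument can be illustrated on a toy case. The Python sketch below assumes d = 1 and quadratic potentials V(x) = βx²/2, W(x) = γx²/2, for which Γ = |γ|, the mean m_t of μ_t solves m' = −βm, and ∇W * μ_t(y) = γ(y − m_t). Both systems are discretized by an Euler-Maruyama scheme driven by the same Gaussian increments. Since x ↦ x is 1-Lipschitz, W_1(ν̂_s, μ_s) is at least |mean(Y_s) − m_s|, so the quantity `bound` computed here is a discrete lower estimate of the right-hand side of the coupling inequality; in this special case it still dominates the root-mean-square distance between the two systems. All names and parameter values are hypothetical.

```python
import math
import random

def simulate(N=100, beta=1.0, gamma=-0.3, T=2.0, steps=1000, m0=1.5, seed=0):
    """Euler-Maruyama for the coupled systems with V(x)=beta*x^2/2, W(x)=gamma*x^2/2, d=1.

    Returns (rms, bound): rms is the root-mean-square of X^i - Y^i at time T,
    bound is |gamma| * h * sum_k (1 - beta*h)^(n-1-k) * |mean(Y_k) - m_k|.
    Requires 0 < beta*h < 1 with h = T/steps.
    """
    rng = random.Random(seed)
    h = T / steps
    # Same initial data for both systems: i.i.d. samples from N(m0, 1).
    Y = [rng.gauss(m0, 1.0) for _ in range(N)]
    X = list(Y)
    bound = 0.0
    for n in range(steps):
        m = m0 * math.exp(-beta * n * h)   # mean of mu_t, solving m' = -beta*m
        x_bar = sum(X) / N
        y_bar = sum(Y) / N
        bound = (1.0 - beta * h) * bound + abs(gamma) * h * abs(y_bar - m)
        for i in range(N):
            noise = math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)  # shared Brownian increment
            X[i] += noise - h * (beta * X[i] + gamma * (X[i] - x_bar))
            Y[i] += noise - h * (beta * Y[i] + gamma * (Y[i] - m))
    rms = math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)) / N)
    return rms, bound

rms, bound = simulate()
print(rms, bound)
```

In this quadratic setting all differences X^i − Y^i coincide, which is why the comparison reduces to a scalar recursion; for general V and W only the inequality of Proposition 5.1 is available.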

Remark 5.2. Not only does the “second option” in the proof lead to better bounds, it also provides an estimate of the distance between

\hat{μ}

and

\hat{ν}

in the

W_{2}

distance, which is stronger than the

W_{1}

distance. However, we do not take any advantage of this refinement.

6 Conclusion

In this section, we paste together all the estimates established in the previous sections, so as to prove Theorems 1.7 to 1.11 .

6.1 Concentration estimates

We start with the proof of Theorem 1.7 . By

C

we shall denote various constants depending on

T

, on our assumptions on

V

and

W

, and also on

\int e^{α | x |^{2}} d μ_{0} (x)

, for some

α > 0

From Proposition 5.1 ,

{sup}_{0 \leq t \leq T} W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) \leq (Γ e^{| α | T} + 1) {sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) .

In particular, there is a constant

C

such that

\begin{matrix} P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) > ɛ] \leq P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}], \tilde{ɛ} = \frac{ɛ}{C} . \end{matrix}

(6.1)

From Corollary 3.2 and Theorem 1.1 we know that

{sup}_{0 \leq t \leq T} P [W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}] \leq e^{- K N {\tilde{ɛ}}^{2}}

for all

t \in [0, T]

and

N \geq N_{0} max ({\tilde{ɛ}}^{- (d^{'} + 2)}, 1)

(

d^{'} > d

). The issue now is to “exchange”

sup

and

P

in this estimate. As we shall see, this is authorized by the continuity estimates on

{\hat{ν}}_{t}^{N}

and

μ_{t}

Let

Δ > 0

(to be fixed later on), and let

M

be the integer part of

T / Δ + 1

We decompose the interval

[0, T]

as

[0, T] = [0, Δ] \cup [Δ, 2 Δ] \cup \dots \cup [(M - 1) Δ, T] \subset \bigcup_{h = 0}^{M - 1} [h Δ, (h + 1) Δ] .

Proposition 3.3 guarantees that, if

Δ \leq a {\tilde{ɛ}}^{2}

for some

a

small enough, then

\begin{matrix} h Δ \leq t \leq (h + 1) Δ ⟹ W_{1} (μ_{t}, μ_{h Δ}) \leq \frac{\tilde{ɛ}}{2} . \end{matrix}

(6.2)

Then, by triangular inequality and 6.2 ,

P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}]

\leq P [{sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}]

\leq P [{sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, {\hat{ν}}_{h Δ}^{N}) + {sup}_{h = 0, . . ., M - 1} W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) + {sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} (μ_{h Δ}, μ_{t}) > \tilde{ɛ}]

\leq P [{sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, {\hat{ν}}_{h Δ}^{N}) + {sup}_{h = 0, . . ., M - 1} W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) > \frac{\tilde{ɛ}}{2}],

which can be bounded by

P [{sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, {\hat{ν}}_{h Δ}^{N}) > \frac{\tilde{ɛ}}{4}] + P [{sup}_{h = 0, . . ., M - 1} W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) > \frac{\tilde{ɛ}}{4}] .

By Corollary 3.2 and Theorem 1.1 , there exist some constants

C

and

N_{0}

such that

P [W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) \geq \frac{\tilde{ɛ}}{4}] \leq exp (- C N {\tilde{ɛ}}^{2})

for all

h = 0, . . ., M - 1

, and

N \geq N_{0} max ({\tilde{ɛ}}^{- (d^{'} + 2)}, 1)

. Hence

\begin{matrix} P [{sup}_{h = 0, . . ., M - 1} W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) > \frac{\tilde{ɛ}}{4}] \leq \sum_{h = 0}^{M - 1} P [W_{1} ({\hat{ν}}_{h Δ}^{N}, μ_{h Δ}) > \frac{\tilde{ɛ}}{4}] \leq M exp (- C N {\tilde{ɛ}}^{2}) . \end{matrix}

(6.3)

On the other hand, from Proposition 4.1 we deduce

P [{sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, {\hat{ν}}_{h Δ}^{N}) > \frac{\tilde{ɛ}}{4}] \leq exp (- N (\frac{a}{4} {\tilde{ɛ}}^{2} - C Δ))

for all

h = 0, . . ., M - 1

and

\tilde{ɛ} > 0

, so

\begin{matrix} P [{sup}_{h = 0, . . ., M - 1} {sup}_{h Δ \leq t \leq (h + 1) Δ} W_{1} ({\hat{ν}}_{t}^{N}, {\hat{ν}}_{h Δ}^{N}) > \frac{\tilde{ɛ}}{4}] \leq M exp (- N (\frac{a}{4} {\tilde{ɛ}}^{2} - C Δ)) . \end{matrix}

(6.4)

We can assume that

Δ \leq \frac{a}{8 C} {\tilde{ɛ}}^{2}

, and

M \leq C T / {\tilde{ɛ}}^{2} + 1

; then we can bound the right-hand side of 6.4 by

\begin{matrix} M exp (- \frac{a}{8} N {\tilde{ɛ}}^{2}) \leq C (1 + \frac{T}{{\tilde{ɛ}}^{2}}) exp (- \frac{a}{8} N {\tilde{ɛ}}^{2}) \end{matrix}

(6.5)

From 6.3 and 6.5 we deduce that, for

Δ

small enough (depending on

ɛ

!),

\begin{matrix} P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}] \leq 2 C (1 + \frac{T}{{\tilde{ɛ}}^{2}}) exp (- K N {\tilde{ɛ}}^{2}) \end{matrix}

(6.6)

for

N \geq N_{0} max ({\tilde{ɛ}}^{- (d^{'} + 2)}, 1)

. So we deduce from 6.6 that

P [{sup}_{0 \leq t \leq T} W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \tilde{ɛ}] \leq exp (log (C (\frac{T}{{\tilde{ɛ}}^{2}} + 1)) - K N {\tilde{ɛ}}^{2}),

where again

C, K

stand for various positive constants, and

N \geq max (N_{0} ɛ^{- (d^{'} + 2)}, 1)

. This concludes the proof of Theorem 1.7 .

6.2 Uniform in time estimates

Now, we shall focus on the case when

β > 0, β + 2 γ > 0

, and derive Theorem 1.9 by a slightly refined estimate.

Let us start again from the bound

W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) \leq Γ \int_{0}^{t} e^{- α (t - s)} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) d s + W_{1} ({\hat{ν}}_{t}^{N}, μ_{t})

where

α : = β + 2 min (γ, 0)

is positive. Let

Δ > 0

(to be fixed later on), and

k

be the integer part of

t / Δ

. If

W_{1} ({\hat{μ}}_{t}^{N}, μ_{t})

is larger than

ɛ

, then

\left\{ \begin{matrix} either & W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) \geq \frac{ɛ}{2} \\ or & \exists j \in \{0, \dots, k\}; \int_{j Δ}^{(j + 1) Δ} e^{- α (t - s)} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) d s \geq \frac{ɛ}{2^{k + 2 - j} Γ} . \end{matrix} \right.

Indeed,

(ɛ / 2) + \sum_{j \leq k} (ɛ / 2^{k + 2 - j}) \leq ɛ

. As a consequence,

\left\{ \begin{matrix} either & W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \frac{ɛ}{2} \\ or & \exists j \in \{0, \dots, k\}; {sup}_{j Δ \leq s \leq (j + 1) Δ} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) > \frac{ɛ α e^{α [t - (j + 1) Δ]}}{2^{k + 2 - j} Γ} . \end{matrix} \right.

Since, for

t \in [k Δ, (k + 1) Δ]

,

\frac{e^{α [t - (j + 1) Δ]}}{2^{k + 2 - j}} \geq \frac{e^{α (k - j - 1) Δ}}{2^{k - j + 2}} = (\frac{1}{4 e^{α Δ}}) {(\frac{e^{α Δ}}{2})}^{k - j},

we deduce the existence of a constant

C

such that

\begin{matrix} P [W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) > ɛ] \leq P [W_{1} ({\hat{ν}}_{t}^{N}, μ_{t}) > \frac{ɛ}{2}] + \sum_{j = 0}^{k} P [{sup}_{j Δ \leq s \leq (j + 1) Δ} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) > C ɛ {(\frac{e^{α Δ}}{2})}^{k - j}] . \end{matrix}

(6.7)
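The bookkeeping behind 6.7 can be checked numerically. The Python sketch below (function names and sample parameter values are illustrative assumptions, not part of the paper) verifies that the thresholds ɛ/2 and ɛ/2^{k+2-j}, for j = 0, ..., k, add up to at most ɛ, and that the lower bound e^{α(k-j-1)Δ}/2^{k-j+2} factorizes as a power of e^{αΔ}/2.

```python
import math

def threshold_sum(eps: float, k: int) -> float:
    """eps/2 + sum_{j=0}^{k} eps / 2^(k+2-j): total of the thresholds in the splitting."""
    return eps / 2 + sum(eps / 2 ** (k + 2 - j) for j in range(k + 1))

def lower_bound_two_ways(alpha: float, delta: float, k: int, j: int):
    """Return both sides of
    e^{alpha (k-j-1) delta} / 2^{k-j+2} = (1 / (4 e^{alpha delta})) (e^{alpha delta} / 2)^{k-j}.
    """
    lhs = math.exp(alpha * (k - j - 1) * delta) / 2 ** (k - j + 2)
    rhs = (1 / (4 * math.exp(alpha * delta))) * (math.exp(alpha * delta) / 2) ** (k - j)
    return lhs, rhs

if __name__ == "__main__":
    # The thresholds sum to eps * (1 - 2^{-(k+2)}), which never exceeds eps.
    print(threshold_sum(1.0, 3))
    print(lower_bound_two_ways(0.7, 1.3, 5, 2))
```

The geometric weights 2^{-(k+2-j)} are what leave room for the factor (e^{αΔ}/2)^{k-j}, which grows with k - j once Δ is large enough.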

We already know that the first term in the right-hand side in 6.7 is bounded by

e^{- λ N ɛ^{2}}

for some constant

λ > 0

, and so we focus on the other terms.

In the proof of Theorem 1.7 , we have established that there are constants

C

and

λ

, depending on

Δ

and on bounds on square exponential moments for

μ_{0}

, such that

\begin{matrix} P [{sup}_{0 \leq s \leq Δ} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) > δ] \leq C (1 + \frac{Δ}{δ^{2}}) e^{- λ N δ^{2}} . \end{matrix}

(6.8)

Proposition 3.1 guarantees that these square exponential bounds also hold true for

μ_{t}

, uniformly in

t

. Thus we can apply 6.8 with

μ_{j Δ}

taken as initial datum, and get

\begin{matrix} P [{sup}_{j Δ \leq s \leq (j + 1) Δ} W_{1} ({\hat{ν}}_{s}^{N}, μ_{s}) > δ] \leq C e^{- λ N δ^{2}}, \end{matrix}

(6.9)

as soon as

N \geq N_{0} max (δ^{- (d^{'} + 2)}, 1)

We now use 6.9 to bound the sum appearing in the right-hand side of 6.7 . Choose

Δ

large enough that

θ : = \frac{e^{α Δ}}{2} > 1 .

Applying 6.9 with

δ

replaced by

C θ^{j - k} ɛ

, we can bound the sum in the right-hand side of 6.7 by

C \sum_{j = 0}^{k} exp (- K θ^{2 (k - j)} N ɛ^{2})

for

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1)

, where

C, K

and

N_{0}

are again positive constants. Since again

θ

is larger than 1, there is a constant

a > 0

such that

θ^{2 (k - j)} \geq a (k - j)

, so the sum above is bounded by

C (e^{- K N ɛ^{2}} + \sum_{ℓ = 1}^{\infty} e^{- K ℓ N ɛ^{2}}) \leq C (e^{- K N ɛ^{2}} + \frac{e^{- K N ɛ^{2}}}{1 - e^{- K N ɛ^{2}}}) .

If

N_{0}

is large enough, our assumption

N \geq N_{0} max (ɛ^{- (d^{'} + 2)}, 1)

implies that

e^{- K N ɛ^{2}}

is always less than

1 / 2

, so that the above sum can be bounded by just

C e^{- K N ɛ^{2}}

. This concludes the proof of the first point of Theorem 1.9 .
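The summation step above can be illustrated numerically. In the sketch below, x stands for the product K N ɛ^{2}, and the constant a is taken to be 2 log θ, which works because θ^{2m} = e^{2 m log θ} ≥ 1 + 2 m log θ. This specific choice of a and the function names are assumptions made for illustration; the text only asserts that some a > 0 exists and then absorbs it into K.

```python
import math

def exact_sum(x: float, theta: float, k: int) -> float:
    """sum_{j=0}^{k} exp(-x * theta^(2 (k - j))), where x plays the role of K * N * eps^2."""
    return sum(math.exp(-x * theta ** (2 * m)) for m in range(k + 1))

def geometric_bound(x: float, theta: float) -> float:
    """exp(-x) + sum_{l >= 1} exp(-a * l * x), with the assumed choice a = 2 * log(theta)."""
    a = 2 * math.log(theta)
    q = math.exp(-a * x)
    # The m = 0 term is kept as is; the other terms are dominated by a geometric series.
    return math.exp(-x) + q / (1 - q)

if __name__ == "__main__":
    for k in (0, 5, 50):
        print(k, exact_sum(1.0, 2.0, k), geometric_bound(1.0, 2.0))
```

The bound does not depend on k, which is the mechanism that makes the estimate uniform in time.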

The second point is proved by writing

\begin{matrix} W_{1} ({\hat{μ}}_{t}^{N}, μ_{\infty}) & \leq & W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) + W_{1} (μ_{t}, μ_{\infty}) \\ & \leq & W_{1} ({\hat{μ}}_{t}^{N}, μ_{t}) + C e^{- λ t} \end{matrix}

successively by the triangle inequality for the Wasserstein distance and by Proposition 3.8 . Then the result follows from the uniform estimate obtained above.

6.3 Data reconstruction

We finally consider Theorem 1.11 . Proposition 3.6 ensures that, as

t \to \infty

,

f_{t}

is uniformly bounded in

C^{k}

, where

k

is arbitrarily large. Since

f_{t}

converges to

f_{\infty}

as

t \to \infty

, we deduce that

f_{\infty}

is Lipschitz. Then Theorem 1.9 and Proposition 2.1 together imply Theorem 1.11 .

A Metric entropy of a probability space

We now prove the covering result used in Section 2.1 , as a particular case of a more general estimate. Let

E

be a Polish space. We look for an upper bound on the number

N_{p} (E, δ) : = m (P (E), δ)

of balls of radius

δ

in Wasserstein distance

W_{p}

needed to cover the space

P (E)

of probability measures on

E

. We use the same strategy as in [12,Exercise 6.2.19] , where the Lévy distance is used instead of the Wasserstein distance.

Theorem A.1. Let

(E, d)

be a Polish space with finite diameter

D

. For any

r > 0

, define

N (E, r)

as the minimal number of balls needed to cover

E

by balls of radius

r

. Then there exists a numerical constant

C

such that for all

p \geq 1

and

δ \in (0, D)

, the space

P (E)

can be covered by

N_{p} (E, δ)

balls of radius

δ

in

W_{p}

distance, with

\begin{matrix} N_{p} (E, δ) \leq {(\frac{C D}{δ})}^{p N (E, \frac{δ}{2})} . \end{matrix}

(A.1)

Remark A.2. The

W_{p}

distance between any two probability measures on

E

is at most

D

, so, for all

δ \geq D

, we have the trivial estimate

N_{p} (E, δ) = 1

Proof. Let

r > 0

, and let

\{x_{j}\}_{1 \leq j \leq N (E, r)}

be such that

E

is covered by the balls

B (x_{j}, r)

with centers

x_{j} \in E

and radius

r

. For simplicity we shall write

N = N (E, r)

In a first step we prove that for any

μ \in P (E)

there exist nonnegative real numbers

(β_{j})_{1 \leq j \leq N}

, with

\sum_{j = 1}^{N} β_{j} = 1

, such that

W_{p} (μ, \tilde{μ}) \leq r, \tilde{μ} : = \sum_{j = 1}^{N} β_{j} δ_{x_{j}} .

For this we first replace the balls

B (x_{j}, r)

's by the sets

{\tilde{B}}_{j}

's defined by

\forall j, {\tilde{B}}_{j} = B (x_{j}, r) \setminus \bigcup_{k \leq j - 1} B (x_{k}, r),

so that

E

is partitioned into the

{\tilde{B}}_{j}

's. Next define

β_{j} = μ [{\tilde{B}}_{j}] .

It is easy to check that the required properties are fulfilled. Indeed, we may transport

μ

onto

\tilde{μ} = \sum_{j = 1}^{N} β_{j} δ_{x_{j}}

by sending all

x

's in

{\tilde{B}}_{j}

onto

x_{j}

, for each

j = 1, . . ., N

: the cost of this transport is bounded by

\sum_{j = 1}^{N} r^{p} μ ({\tilde{B}}_{j}) = r^{p}

. In the second step we introduce an integer

K

(whose value will be made more precise later on), and consider the set

C_{K} : = \{ \sum_{j = 1}^{N} α_{j} δ_{x_{j}} ; (α_{j})_{1 \leq j \leq N} \in A_{K} \} \subset P (E),

where

A_{K}

is the set of all

N

-tuples

(α_{j})_{1 \leq j \leq N}

, such that each

α_{j}

is of the form

k_{j} / K

with

k_{j} \in N

, and

\sum_{j = 1}^{N} α_{j} = 1

Given a probability measure

\tilde{μ} = \sum_{i = 1}^{N} β_{i} δ_{x_{i}}

(where

(β_{i})_{i}

does not necessarily belong to

A_{K}

), there exists

μ^{'}

in

C_{K}

such that

\begin{matrix} W_{p} (μ^{'}, \tilde{μ}) \leq D {(\frac{N}{K})}^{1 / p} . \end{matrix}

(A.2)

To prove A.2 , we define

n_{j}

as the integer part

[K β_{j}]

of

K β_{j}

and

J

as the first integer such that

\sum_{j = 1}^{J} (n_{j} + 1) + \sum_{j = J + 1}^{N} n_{j} = K .

Since

\sum_{j = 1}^{N} β_{j} = 1

, it is clear that

J \leq N

. Then we define a measure

μ^{'} \in C_{K}

by

μ^{'} = \sum_{j = 1}^{N} α_{j} δ_{x_{j}}

, where

α_{j} = \begin{cases} \frac{n_{j} + 1}{K} & \text{for } j = 1, \dots, J \\ \frac{n_{j}}{K} & \text{for } j = J + 1, \dots, N . \end{cases}

Let us bound the distance between

\tilde{μ}

and

μ^{'}

. For that we gradually define a transport plan between

\tilde{μ}

and

μ^{'}

in the following way: first of all, at each point

x_{i}

, the mass

n_{i} / K

stays in place. Then, the remaining masses

β_{i} - n_{i} / K

are redistributed as follows: all the remaining mass at

x_{1}, \dots, x_{ℓ}

is brought to

x_{1}

, together with possibly a bit of mass at

x_{ℓ + 1}

, until a total mass

1 / K

has been added at location

x_{1}

(for

ℓ

large enough). If

J \geq 2

, then we again bring mass from

x_{ℓ + 1}, \dots

, until another mass

1 / K

has been added at

x_{2}

. We carry on until all the mass at

x_{J}

has been used, thus building a transport plan

(π_{i j})_{1 \leq i, j \leq N}

which sends

\tilde{μ}

onto

μ^{'}

, in such a way that

π_{i i} \geq \frac{n_{i}}{K}

for all

i

. Hence,

\sum_{j \neq i} π_{i j} = β_{i} - π_{i i} \leq β_{i} - \frac{n_{i}}{K} \leq \frac{1}{K},

and this plan yields an upper bound on the Wasserstein distance:

W_{p}^{p} (\tilde{μ}, μ^{'}) \leq \sum_{i, j = 1}^{N} d (x_{i}, x_{j})^{p} π_{i j} = \sum_{i = 1}^{N} \sum_{j \neq i} d (x_{i}, x_{j})^{p} π_{i j} \leq N \frac{D^{p}}{K} .
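The rounding of the weights in this second step can be written out explicitly. The following Python sketch uses exact rational arithmetic; the function names are illustrative assumptions. It computes n_j as the integer part of K β_j, takes J = K - Σ n_j (the number of atoms that receive an extra mass 1/K), and returns the rounded weights α_j together with the resulting bound D (N/K)^{1/p} on the Wasserstein distance.

```python
from fractions import Fraction
from math import floor

def round_weights(beta, K):
    """Round weights beta (nonnegative, summing to 1) to multiples of 1/K.

    Returns (alpha, J) with alpha_j = (n_j + 1)/K for the first J indices and
    alpha_j = n_j/K for the others, where n_j = floor(K * beta_j).
    """
    n = [floor(K * b) for b in beta]
    J = K - sum(n)  # number of indices that get an extra mass 1/K
    alpha = [Fraction(n_j + (1 if j < J else 0), K) for j, n_j in enumerate(n)]
    return alpha, J

def distance_bound(N: int, K: int, D: float, p: float) -> float:
    """Upper bound D * (N/K)^(1/p) on W_p between the original and rounded measures."""
    return D * (N / K) ** (1 / p)

if __name__ == "__main__":
    beta = [Fraction(3, 10), Fraction(3, 10), Fraction(2, 5)]
    alpha, J = round_weights(beta, 4)
    print(alpha, J, distance_bound(len(beta), 4, 1.0, 2))
```

Each rounded weight differs from the original one by at most 1/K, so at most a mass 1/K leaves each atom, and the total transported mass is at most N/K.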

To summarize the first two steps: for any

μ

in

P (E)

there exists

μ^{'} \in C_{K}

such that

W_{p} (μ, μ^{'}) \leq r + D {(\frac{N}{K})}^{1 / p} .

In other words, the family

{(B (μ^{'}, r + D (N / K)^{1 / p}))}_{μ^{'} \in C_{K}}

covers

P (E)

In the third step we choose some suitable

K

and

r

for a given

δ

We first choose

K

in such a way that

r

and

D (N / K)^{1 / p}

have the same order of magnitude, for instance

K = [N {(\frac{D}{r})}^{p}] + 1 .

Then

r + D (N / K)^{1 / p} \leq 2 r,

and the balls

B (μ^{'}, r + D (N / K)^{1 / p})

have radius at most

δ

if we choose

r = \frac{δ}{2} .

Now

K

and

r

are fixed,

N = N (E, δ / 2)

, and we just have to estimate the cardinality

♯ C_{K}

of

C_{K}

. For this we first note that

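♯ C_{K} \leq ♯ A_{K} = \frac{(K + N - 1)!}{K! (N - 1)!}

(the number of

N

-tuples of nonnegative integers summing to

K

). Without loss of generality, we have assumed

δ < D

, so

K > N

and therefore

\frac{(K + N - 1)!}{K! (N - 1)!} = \frac{N}{K} \frac{(K + N - 1)!}{(K - 1)! N!} \leq \frac{(K + N - 1) . . . K}{N!}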

. Then

K < \cdot \cdot \cdot < K + N - 1 < 2 K

, and hence

♯ C_{K} \leq \frac{(2 K)^{N}}{N!} \leq {(\frac{2 K e}{N})}^{N} .

Since

N \geq 1

and

2 D \geq δ

, we can write

K \leq N {(\frac{2 D}{δ})}^{p} + 1 \leq 2 N {(\frac{2 D}{δ})}^{p},

and we deduce

♯ C_{K} \leq {(C \frac{D}{δ})}^{p N (E, \frac{δ}{2})}

with

C = 2 (4 e)^{1 / p} \leq 8 e

Consequently, we have covered

P (E)

by the

{(C \frac{D}{δ})}^{p N (E, \frac{δ}{2})}

balls

(B (μ^{'}, δ))_{μ^{'} \in C_{K}}

with radius

δ

. This concludes the argument.

□
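The counting in the third step can be checked on concrete numbers. The Python sketch below is an illustration under stated assumptions: the function name and sample values are hypothetical, and the set A_K is counted by the standard stars-and-bars formula, giving the binomial coefficient (K+N-1 choose N-1). Since K ≥ N, this count is at most (K+N-1)!/((K-1)! N!), the quantity bounded in the proof. Logarithms are compared to avoid overflow.

```python
import math

def covering_count(N: int, D: float, delta: float, p: float):
    """Return (K, log #A_K, log of the bound (C D / delta)^(p N)) for r = delta / 2."""
    r = delta / 2
    K = math.floor(N * (D / r) ** p) + 1
    # Stars and bars: number of N-tuples of nonnegative integers summing to K.
    log_count = math.log(math.comb(K + N - 1, N - 1))
    C = 2 * (4 * math.e) ** (1 / p)
    log_bound = p * N * math.log(C * D / delta)
    return K, log_count, log_bound

if __name__ == "__main__":
    print(covering_count(5, 1.0, 0.5, 2))
```

Here N plays the role of N(E, δ/2), which in practice has to be supplied by a separate covering estimate for E itself.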

In the particular case when

E

is the Euclidean ball

B_{R}

of radius

R

in

R^{d}

, we have

\begin{matrix} N (B_{R}, r) \leq k {(\frac{R}{r})}^{d} \end{matrix}

(A.3)

for some constant

k

. To see this, one may for instance consider the balls with center in the lattice

\frac{r}{\sqrt{d}} Z^{d}

in

R^{d}

. Then Theorem A.1 yields the bound

N_{p} (B_{R}, δ) \leq {(C \frac{R}{δ})}^{p k {(\frac{R}{δ})}^{d}},

which is used in the present paper.
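The lattice argument behind A.3 can be tried out in low dimension. The sketch below is a simplified illustration, not the paper's construction in full: the helper names are hypothetical, and for simplicity the centers are all lattice points within distance R + r of the origin, so some of them may lie slightly outside B_R. Every point of R^{d} is within Euclidean distance r/2 of the lattice (r/√d) Z^{d}, so balls of radius r around these centers cover B_R.

```python
import itertools
import math
import random

def lattice_centers(R: float, r: float, d: int):
    """Points of the lattice (r / sqrt(d)) Z^d lying within distance R + r of the origin."""
    h = r / math.sqrt(d)
    m = math.ceil((R + r) / h)
    centers = []
    for idx in itertools.product(range(-m, m + 1), repeat=d):
        x = tuple(i * h for i in idx)
        if math.hypot(*x) <= R + r:
            centers.append(x)
    return centers

def is_covered(point, centers, r: float) -> bool:
    """True if the point lies in at least one ball of radius r around a center."""
    return any(math.dist(point, c) <= r for c in centers)

if __name__ == "__main__":
    rng = random.Random(0)
    centers = lattice_centers(1.0, 0.25, 2)
    sample = [(rng.uniform(-0.7, 0.7), rng.uniform(-0.7, 0.7)) for _ in range(100)]
    print(len(centers), all(is_covered(x, centers, 0.25) for x in sample))
```

The number of centers scales like (R/r)^{d} up to a dimensional constant, which is the content of A.3.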

B Regularity estimates on the limit PDE

In this appendix we study solutions to the limit equation

\begin{matrix} \partial_{t} ρ = Δ ρ + \nabla . (ρ (V + W * ρ)), t \geq 0, x \in R^{d} \end{matrix}

(B.1)

and establish the regularity results stated in Proposition 3.6 . Following the method in [13] , we shall measure the regularity in terms of

L^{2}

-Sobolev spaces

H^{s} (R^{d}) = \{ u \in L^{2} (R^{d}) ; \partial^{α} u \in L^{2} (R^{d}), α \in N^{d}, | α | \leq s \} (s \in N) .

Our main result is as follows.

Theorem B.1. Let

V

and

W

be such that all their partial derivatives

\partial^{α} V

and

\partial^{α} W

are continuous and grow at most polynomially at infinity, for any multi-index

α \in N^{d}

with

| α | \leq s + 1

. Let

a, E > 0

and let

ρ_{0}

be a probability density such that

\int_{R^{d}} e^{a | x |^{2}} d ρ_{0} (x) \leq E .

Then, there exists a continuous function

f : (0, + \infty) \to (0, + \infty)

, only depending on

d, s, V, W, a

and

E

, such that any classical solution

ρ = ρ (t, x)

to B.1 , starting from

ρ_{0}

, satisfies

{∥ ρ (t, \cdot) ∥}_{H^{s} (R^{d})} \leq f (t) .

Proof. For the sake of simplicity we only give a formal proof, which can be turned rigorous by means of regularization arguments.

Let then

ρ = (ρ (t, .))_{t \geq 0}

be a solution of

\partial_{t} ρ = Δ ρ + \nabla . (ρ (V + W * ρ)), t \geq 0, x \in R^{d};

we rewrite the equation as

\partial_{t} ρ = \sum_{i = 1}^{d} ( \partial_{i i} ρ + \partial_{i} [ρ \partial_{i} φ] ),

where

\partial_{i} = \partial^{e_{i}}

,

e_{i}

is the

i

-th vector of the canonical basis of

R^{d}

, and

φ (t, x) = V (x) + W * ρ (t, x) .

Let

α \in N^{d}

be given. By integration by parts and Cauchy-Schwarz inequality,

\begin{matrix} \frac{1}{2} \frac{d}{d t} \int_{R^{d}} {| \partial^{α} ρ |}^{2} & = & \int_{R^{d}} \partial^{α} ρ \partial_{t} (\partial^{α} ρ) = \int_{R^{d}} \partial^{α} ρ \partial^{α} (\partial_{t} ρ) \\
& = & \sum_{i = 1}^{d} \int_{R^{d}} \partial^{α} ρ \partial^{α} (\partial_{i i} ρ + \partial_{i} [ρ \partial_{i} φ]) \\
& = & - \sum_{i = 1}^{d} ( \int_{R^{d}} {| \partial^{α + e_{i}} ρ |}^{2} + \int_{R^{d}} \partial^{α + e_{i}} ρ \partial^{α} [ρ \partial_{i} φ] ) \\
& \leq & - \sum_{i = 1}^{d} \int_{R^{d}} {| \partial^{α + e_{i}} ρ |}^{2} + \sum_{i = 1}^{d} {[\int_{R^{d}} {| \partial^{α + e_{i}} ρ |}^{2}]}^{1 / 2} {[\sum_{β \leq α} C_{α, β} \int_{R^{d}} {| \partial^{α - β + e_{i}} φ \partial^{β} ρ |}^{2}]}^{1 / 2} \\
& \leq & - \frac{1}{2} \sum_{i = 1}^{d} \int_{R^{d}} {| \partial^{α + e_{i}} ρ |}^{2} + \sum_{β \leq α} C_{α, β} \int_{R^{d}} {| \partial^{α - β + e_{i}} φ \partial^{β} ρ |}^{2} . \end{matrix}

By summing over

α \in N^{d}

with

| α | \leq s

, we find

\frac{d}{d t} \sum_{| α | \leq s} \int_{R^{d}} {| \partial^{α} ρ |}^{2} \leq - \sum_{| α | \leq s} \sum_{i = 1}^{d} \int_{R^{d}} {| \partial^{α + e_{i}} ρ |}^{2} + \sum_{| α | \leq s} \sum_{β \leq α} C_{α, β} \int_{R^{d}} {| \partial^{α - β + e_{i}} φ \partial^{β} ρ |}^{2} .

Given

T > 0

, by Proposition 3.1 there exist constants

\hat{a}

and

\hat{E}

, depending only on

d, a, E

and

T

, such that

\begin{matrix} \int e^{\hat{a} | x |^{2}} d ρ (t, x) \leq \hat{E} \end{matrix}

(B.2)

for all

t \in [0, T]

. In particular, it follows from our assumptions on the derivatives of

V

and

W

that all

{| \partial^{α - β + e_{i}} φ |}^{2}

terms are bounded by some polynomial in

| x |

, uniformly in

t \in [0, T]

Let

〈 x 〉 : = \sqrt{1 + | x |^{2}}

. For

k, s \geq 0

, we introduce the weighted norms

∥ u ∥_{H_{k}^{s}} : = {(\sum_{| α | \leq s} \int_{R^{d}} 〈 x 〉^{k} | \partial^{α} u (x) |^{2} d x)}^{1 / 2}

and

∥ u ∥_{L_{k}^{1}} : = \int_{R^{d}} 〈 x 〉^{k} | u (x) | d x .

Then for any

s \in N

and

T \geq 0

there exist

k

and

C \geq 0

such that

\begin{matrix} 0 \leq t \leq T ⟹ \frac{d}{d t} ∥ u ∥_{H^{s}}^{2} \leq - ∥ u ∥_{H^{s + 1}}^{2} + C ∥ u ∥_{H_{k}^{s}}^{2} . \end{matrix}

(B.3)

We shall prove later on the following interpolation lemma:

Lemma B.2. Given

d \geq 1, s \in N, k \geq 0

, there exist nonnegative constants

C (d, s, k)

and

h (d, s, k)

, and

θ (d, s) \in (0, 1)

such that for all

u \in L_{\infty}^{1} (R^{d}) \cap H^{s + 1} (R^{d})

∥ u ∥_{H_{k}^{s}} \leq C (d, s, k) ∥ u ∥_{L_{h (d, s, k)}^{1}}^{1 - θ (d, s)} ∥ u ∥_{H^{s + 1}}^{θ (d, s)} .

Then, again from B.2 , all

∥ u ∥_{L_{h (d, s, k)}^{1}} (t)

norms are bounded on

[0, T]

, so from B.3 and Lemma B.2 there exist some constants

c, C > 0

such that

\frac{d}{d t} ∥ u ∥_{H^{s}}^{2} \leq - ∥ u ∥_{H^{s + 1}}^{2} + C ∥ u ∥_{H^{s + 1}}^{2 θ} \leq - \frac{1}{2} ∥ u ∥_{H^{s + 1}}^{2} + C \leq - c ∥ u ∥_{H^{s}}^{2 / θ} + C .

In other words

A (t) = ∥ u ∥_{H^{s}}^{2} (t)

satisfies on

[0, T]

the differential inequality

\begin{matrix} A^{'} (t) + c A (t)^{p} \leq C \end{matrix}

(B.4)

for some constants

c, C \geq 0

and

p = 1 / θ > 1

depending only on

d, a, E, s

and

T

Let us distinguish two cases. If

A (0) \leq 1

, then we only use the inequality

A^{'} (t) \leq C

to make sure that

A (t) \leq A (0) + C t \leq 1 + C T

for any

t \in [0, T]

If on the other hand

A (0) \geq 1

, we deduce from B.4 that

A^{'} (t) + c A (t)^{p} \leq C A (t),

as long as

A (t) \geq 1

, so that

D (t) : = A (t)^{1 - p}

satisfies the inequality

D^{'} (t) + (p - 1) C D (t) \geq (p - 1) c

which integrates to

D (t) \geq D (0) e^{- (p - 1) C t} + \frac{c}{C} (1 - e^{- (p - 1) C t}) \geq \frac{c}{C} (1 - e^{- (p - 1) C t}) .

As a consequence, as long as

A (t) \geq 1

, we have

A (t) \leq (c / C)^{1 / (1 - p)} (1 - e^{- (p - 1) C t})^{1 / (1 - p)} .

In the end, we have obtained an a priori bound on

A (t) = ∥ ρ (t, \cdot) ∥_{H^{s}}^{2}

for

t \in (0, T]

, depending only on

d, s, a, E

and

T

, but not on the initial value

A (0)

. Then the proof can be concluded by an approximation argument. □
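The key feature of B.4, a bound on A(t) for t > 0 that does not depend on A(0), can be illustrated in the special case p = 2, where the extremal equation A' = C - c A^{2} has an explicit solution. The Python sketch below is an illustration under assumptions: the function names and the sample constants c = 1, C = 2 are hypothetical, and the closed form a coth(a c t + s), with a = sqrt(C/c), is specific to p = 2. It compares this solution with the bound ((c/C)(1 - e^{-(p-1)Ct}))^{1/(1-p)} obtained by integrating the inequality for D(t) = A(t)^{1-p}.

```python
import math

def a_priori_bound(t: float, c: float, C: float, p: float) -> float:
    """((c/C) * (1 - exp(-(p-1) C t)))^(1/(1-p)); it does not involve A(0)."""
    return ((c / C) * (1.0 - math.exp(-(p - 1.0) * C * t))) ** (1.0 / (1.0 - p))

def riccati_solution(t: float, A0: float, c: float, C: float) -> float:
    """Exact solution of A' = C - c A^2 (case p = 2) with A(0) = A0 > sqrt(C/c)."""
    a = math.sqrt(C / c)
    s = 0.5 * math.log((A0 + a) / (A0 - a))  # arccoth(A0 / a)
    x = a * c * t + s
    return a * math.cosh(x) / math.sinh(x)

if __name__ == "__main__":
    for A0 in (2.0, 1e3, 1e6):
        print(A0, riccati_solution(0.1, A0, 1.0, 2.0), a_priori_bound(0.1, 1.0, 2.0, 2.0))
```

With these constants the solution stays above 1, so the assumption A(t) ≥ 1 used in the second case of the proof holds for all times, and the bound blows up only as t tends to 0.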

Proof of Lemma B.2 . We proceed by induction on $s$ .
In the first step we prove the result for $s = 0$ . Given $d \geq 1$ and $a \in (0, 1]$ , we write $\int_{R^{d}} 〈 x 〉^{k} | u (x) |^{2} d x = \int_{R^{d}} 〈 x 〉^{k} | u (x) |^{a} | u (x) |^{2 - a} d x,$ so, by Hölder's inequality, $∥ u ∥_{L_{k}^{2}}^{2} \leq ∥ u ∥_{L_{\frac{k}{a}}^{1}}^{a} ∥ u ∥_{L^{\frac{2 - a}{1 - a}}}^{2 - a}$ (with $\frac{2 - a}{1 - a} = \infty$ if $a = 1$ ). Then by Sobolev embedding, $∥ u ∥_{L_{k}^{2}}^{2} \leq C (d, a) ∥ u ∥_{L_{\frac{k}{a}}^{1}}^{a} ∥ u ∥_{H^{1}}^{2 - a},$ where $a = 1$ if $d = 1$ , $a$ is arbitrary in $(0, 1)$ if $d = 2$ , and $a = \frac{4}{d + 2}$ if $d \geq 3$ , that is, $∥ u ∥_{L_{k}^{2}} \leq C (d) ∥ u ∥_{L_{\frac{k}{a}}^{1}}^{1 - θ (d)} ∥ u ∥_{H^{1}}^{θ (d)}$ where $θ (1) = \frac{1}{2}$ , any $θ (2) \in (\frac{1}{2}, 1)$ for $d = 2$ , and $θ (d) = \frac{d}{d + 2}$ for $d \geq 3$ .
In the second step we let $s \geq 1$ and assume by induction that there exist some constants $C (d, s - 1, k), h (d, s - 1, k) \geq 0$ and $θ (d, s - 1) \in (0, 1)$ such that for all $u \in L_{\infty}^{1} (R^{d}) \cap H^{s} (R^{d})$ :
$∥ u ∥_{H_{k}^{s - 1}} \leq C (d, s - 1, k) ∥ u ∥_{L_{h (d, s - 1, k)}^{1}}^{1 - θ (d, s - 1)} ∥ u ∥_{H^{s}}^{θ (d, s - 1)} .$ Let then $u \in L_{\infty}^{1} (R^{d}) \cap H^{s + 1} (R^{d})$ .
Given $α \in N^{d}$ with $| α | = j$ and $1 \leq j \leq s$ , we split $α$ into $α = α_{1} + α_{2}$ with $| α_{2} | = 1$ , and integrate by parts:
$∥ \partial^{α} u ∥_{L_{k}^{2}}^{2} \leq k ∥ \partial^{α_{1}} u ∥_{L_{2 k - 2}^{2}} ∥ \partial^{α} u ∥_{L^{2}} + ∥ \partial^{α_{1}} u ∥_{L_{2 k}^{2}} ∥ \partial^{α + α_{2}} u ∥_{L^{2}} \leq (k + 1) ∥ \partial^{α_{1}} u ∥_{L_{2 k}^{2}} {sup}_{| α | \leq j + 1} ∥ \partial^{α} u ∥_{L^{2}},$
whence
${sup}_{| α | = j} ∥ \partial^{α} u ∥_{L_{k}^{2}}^{2} \leq (k + 1) {sup}_{| α | = j - 1} ∥ \partial^{α} u ∥_{L_{2 k}^{2}} {sup}_{| α | \leq j + 1} ∥ \partial^{α} u ∥_{L^{2}} \leq (k + 1) {sup}_{| α | \leq s - 1} ∥ \partial^{α} u ∥_{L_{2 k}^{2}} {sup}_{| α | \leq s + 1} ∥ \partial^{α} u ∥_{L^{2}} .$
Since this holds for any $1 \leq j \leq s$ we obtain ${sup}_{1 \leq | α | \leq s} ∥ \partial^{α} u ∥_{L_{k}^{2}}^{2} \leq (k + 1) {sup}_{| α | \leq s - 1} ∥ \partial^{α} u ∥_{L_{2 k}^{2}} {sup}_{| α | \leq s + 1} ∥ \partial^{α} u ∥_{L^{2}} .$ Moreover $∥ u ∥_{L_{k}^{2}}^{2} \leq ∥ u ∥_{L_{2 k}^{2}} ∥ u ∥_{L^{2}} \leq {sup}_{| α | \leq s - 1} ∥ \partial^{α} u ∥_{L_{2 k}^{2}} {sup}_{| α | \leq s + 1} ∥ \partial^{α} u ∥_{L^{2}},$ so that finally $∥ u ∥_{H_{k}^{s}}^{2} \leq (k + 1) ∥ u ∥_{H_{2 k}^{s - 1}} ∥ u ∥_{H^{s + 1}} .$ Then, by induction hypothesis, $∥ u ∥_{H_{k}^{s}}^{2} \leq (k + 1) C (d, s - 1, 2 k) ∥ u ∥_{L_{h (d, s - 1, 2 k)}^{1}}^{1 - θ (d, s - 1)} ∥ u ∥_{H^{s}}^{θ (d, s - 1)} ∥ u ∥_{H^{s + 1}},$ whence $∥ u ∥_{H_{k}^{s}} \leq C (d, k, s) ∥ u ∥_{L_{h (d, s, k)}^{1}}^{1 - θ (d, s)} ∥ u ∥_{H^{s + 1}}^{θ (d, s)}$ where $θ (d, s) = \frac{1}{2 - θ (d, s - 1)} \in (0, 1)$ and $h (d, s, k) = h (d, s - 1, 2 k) \geq 0$ . This concludes the argument.
□
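The recursion for the interpolation exponent can be unrolled explicitly. The following Python sketch (the function name is an illustrative assumption) starts from θ(1, 0) = 1/2 or θ(d, 0) = d/(d+2) for d ≥ 3 and iterates θ(d, s) = 1/(2 - θ(d, s-1)) in exact arithmetic. The case d = 2 is left out because the lemma only constrains θ(2, 0) to lie in (1/2, 1).

```python
from fractions import Fraction

def theta(d: int, s: int) -> Fraction:
    """Exponent theta(d, s) of Lemma B.2 for d = 1 or d >= 3, via the recursion of the proof."""
    if d == 2:
        raise ValueError("theta(2, 0) is not a single value in the lemma")
    t = Fraction(1, 2) if d == 1 else Fraction(d, d + 2)
    for _ in range(s):
        t = 1 / (2 - t)  # theta(d, s) = 1 / (2 - theta(d, s - 1))
    return t

if __name__ == "__main__":
    print([theta(1, s) for s in range(4)])
    print([theta(3, s) for s in range(4)])
```

The exponent stays in (0, 1) and increases towards 1 with s, so the exponent p = 1/θ in B.4 remains strictly larger than 1.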

Acknowledgments: The authors thank M. Ledoux for his relevant comments and his interest during the preparation of this work, as well as for providing Reference [16] .

References

Araujo, A., and Giné, E. The central limit theorem for real and Banach valued random variables. John Wiley & Sons, New York, 1980.
Benachour, S., Roynette, B., Talay, D., and Vallois, P. Nonlinear self-stabilizing processes. I. Existence, invariant probability, propagation of chaos. Stochastic Process. Appl. 75, 2 (1998), 173–201.
Benachour, S., Roynette, B., and Vallois, P. Nonlinear self-stabilizing processes. II. Convergence to invariant probability. Stochastic Process. Appl. 75, 2 (1998), 203–224.
Benedetto, D., Caglioti, E., Carrillo, J. A., and Pulvirenti, M. A non-Maxwellian steady distribution for one-dimensional granular media. J. Statist. Phys. 91, 5-6 (1998), 979–990.
Benedetto, D., Caglioti, E., and Pulvirenti, M. A kinetic equation for granular media. RAIRO Modél. Math. Anal. Numér. 31, 5 (1997), 615–641.
Bobkov, S., Gentil, I., and Ledoux, M. Hypercontractivity of Hamilton-Jacobi equations. J. Math. Pures Appl. 80, 7 (2001), 669–696.
Bobkov, S., and Götze, F. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163 (1999), 1–28.
Bolley, F., and Villani, C. Weighted Csiszár-Kullback-Pinsker inequalities and applications to transportation inequalities. To appear in Ann. Fac. Sci. Toulouse. Available online via http://www.umpa.ens-lyon.fr/~cvillani/cv.html#publicationlist, 2004.
Carrillo, J. A., McCann, R. J., and Villani, C. Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoamericana 19, 3 (2003), 971–1018.
Carrillo, J. A., McCann, R. J., and Villani, C. Contractions in the 2-Wasserstein length space and thermalization of granular media. Preprint, 2004.
Cattiaux, P., and Guillin, A. Talagrand's like quadratic transportation cost inequalities. Available online via http://www.ceremade.dauphine.fr/~guillin/index3.html. Preprint, 2004.
Dembo, A., and Zeitouni, O. Large Deviations Techniques And Applications, second ed. Springer Verlag, New York, 1998.
Desvillettes, L., and Villani, C. On the spatially homogeneous Landau equation for hard potentials. I. Existence, uniqueness and smoothness. Comm. Partial Differential Equations 25, 1-2 (2000), 179–259.
Djellout, H., Guillin, A., and Wu, L. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. Ann. Probab. 32, 3B (2004), 2702–2732.
Gao, F. Moderate deviations and large deviations for kernel density estimators. J. Theor. Prob. 16 (2003), 401–418.
Giné, E., and Zinn, J. Empirical processes indexed by Lipschitz functions. Ann. Probab. 14, 4 (1986), 1329–1338.
Ledoux, M. The concentration of measure phenomenon, vol. 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, 2001.
Ledoux, M., and Talagrand, M. Probability in Banach spaces. Springer-Verlag, Berlin, 1991.
Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDE's. Stochastic Process. Appl. 95, 1 (2001), 109–132.
Malrieu, F. Convergence to equilibrium for granular media equations and their Euler schemes. Ann. Appl. Probab. 13, 2 (2003), 540–560.
Marchioro, C., and Pulvirenti, M. Mathematical theory of incompressible nonviscous fluids. Springer-Verlag, New York, 1994.
Massart, P. Saint-Flour Lecture Notes. Available at http://www.math.u-psud.fr/~massart, 2003.
Otto, F., and Villani, C. Generalization of an inequality by Talagrand, and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173 (2000), 361–400.
Petrov, V. V. Limit theorems of probability theory. The Clarendon Press Oxford University Press, New York, 1995.
Schochet, S. The point-vortex method for periodic weak solutions of the 2-D Euler equations. Comm. Pure Appl. Math. 49, 9 (1996), 911–965.
Sion, M. On general minimax theorems. Pac. J. Math. 8 (1958), 171–176.
Sznitman, A.-S. Topics in propagation of chaos. In École d'Été de Probabilités de Saint-Flour XIX—1989, vol. 1464 of Lecture Notes in Math. Springer, Berlin, 1991.
Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6 (1996), 587–600.
Villani, C. Topics in optimal transportation. Grad. Stud. Math. (58), American Mathematical Society, Providence, 2003.
Wang, F.-Y. Probability distance inequalities on Riemannian manifolds and path spaces. J. Funct. Anal. 206, 1 (2004), 167–190.

ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07. E-mail address: fbolley@umpa.ens-lyon.fr

CEREMADE, Universite Paris Dauphine. E-mail address: guillin@ceremade.dauphine.fr

ENS Lyon, Umpa, 46 allee d'Italie, F-69364 Lyon Cedex 07. E-mail address: cvillani@umpa.ens-lyon.fr

François Bolley, Arnaud Guillin,

Cédric Villani