Continuum limit of the nonlocal p-Laplacian evolution problem on random inhomogeneous graphs

In this paper we study numerical approximations of the evolution problem for the nonlocal $p$-Laplacian operator with homogeneous Neumann boundary conditions on inhomogeneous random convergent graph sequences. More precisely, for networks on convergent inhomogeneous random graph sequences (generated first by deterministic and then by random node sequences), we establish their continuum limits and provide rates of convergence of the solutions of the discrete models to their continuum counterparts as the number of vertices grows. Our bounds reveal the role of the different parameters, in particular that of $p$ and the geometry/regularity of the data.


Problem statement
Our main goal in this paper is to study numerical approximations on random inhomogeneous graphs to a nonlocal nonlinear diffusion problem involving the nonlocal $p$-Laplacian operator with homogeneous Neumann boundary conditions. More precisely, the nonlocal $p$-Laplacian evolution problem with Neumann boundary conditions that we deal with is
$$\begin{cases} \dfrac{\partial}{\partial t} u(x,t) = -\Delta^K_p(u(x,t)), & x \in \Omega,\ t > 0, \\ u(x,0) = g(x), & x \in \Omega, \end{cases} \tag{P}$$
where
$$\Delta^K_p(u(x,t)) = -\int_\Omega K(x,y)\,|u(y,t)-u(x,t)|^{p-2}\,(u(y,t)-u(x,t))\,dy,$$
with $\Omega \subset \mathbb{R}$ a compact domain, and without loss of generality $\Omega = [0,1]$. The kernel $K \in L^\infty(\Omega^2)$ is a symmetric and nonnegative mapping. Throughout the paper, we will assume that $p \in ]1,+\infty[$. Existence and uniqueness of a strong solution to (P) in the space $L^p(\Omega)$ were shown in [15, Theorem 3.1] (relying on arguments from [2]).
Interest in this operator has grown steadily over the last few years, as it appears naturally in the study of nonlocal diffusion processes. It arises in a number of applications such as continuum mechanics, phase transition phenomena, population dynamics, image processing and game theory (see [1,2,14,17] and the references therein). On the other hand, there has recently been strong interest in adapting and applying discretized versions of PDEs such as (P) to data defined on arbitrary graphs and networks. Given the discrete nature of data in practice, graphs constitute a natural structure suited to their representation. The demand for such methods is motivated by existing and potential future applications, such as in machine learning and mathematical image processing (see among other references [10,11,13,8]). Indeed, any kind of data can be represented by a graph in an abstract form in which the vertices are associated to the data and the edges correspond to relationships within the data. These practical considerations naturally lead to a discrete time and space approximation of (P).
To do this, fix $n \in \mathbb{N}^*$. Let $G_n = (V(G_n), E(G_n))$ be a sequence of simple graphs, i.e. undirected graphs without loops or parallel edges, where $V(G_n)$ stands for the set of nodes and $E(G_n) \subset V(G_n) \times V(G_n)$ denotes the set of edges.
Next, we consider the fully discrete counterpart of (P) on a graph $G_n$ using the forward Euler scheme. For that, let us consider a partition (not necessarily uniform) $\{\tau_h\}_{h=1}^N$, $N \in \mathbb{N}^*$, of the time interval $[0,T]$ of maximal size $\tau$, and denote the resulting fully discrete problem by $(P^d_{n,\tau})$. Thus, $(P^d_{n,\tau})$ induces a discrete diffusion process parametrized by the structure of the graph, whose adjacency matrix captures the (nonlocal) interactions. As such, it can be viewed as a discrete approximation of a continuous problem such as (P).
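The display defining $(P^d_{n,\tau})$ is not reproduced in this text, so the following is only a minimal sketch of what one forward Euler step of a graph $p$-Laplacian diffusion can look like. It assumes the update $u_i \leftarrow u_i + \tau \frac{1}{n}\sum_j w_{ij}|u_j-u_i|^{p-2}(u_j-u_i)$ with a symmetric weight matrix; the function name `p_laplacian_euler_step` and its argument names are hypothetical.

```python
import math

def p_laplacian_euler_step(u, weights, p, tau):
    """Return the node values after one explicit Euler step of size tau.

    Assumed update rule (not taken verbatim from the paper):
        u_i <- u_i + tau * (1/n) * sum_j w_ij * |u_j - u_i|^(p-2) * (u_j - u_i)

    u       : list of n node values
    weights : n x n symmetric list of lists with nonnegative entries
    p       : exponent of the p-Laplacian, p > 1
    """
    if p <= 1:
        raise ValueError("p must be > 1")
    n = len(u)
    new_u = []
    for i in range(n):
        flux = 0.0
        for j in range(n):
            d = u[j] - u[i]
            # |d|^(p-2) * d rewritten as sign(d) * |d|^(p-1), which is 0 at d = 0
            flux += weights[i][j] * math.copysign(abs(d) ** (p - 1), d)
        new_u.append(u[i] + tau * flux / n)
    return new_u

# Complete graph on 3 nodes (no self-loops), p = 3
w = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
u0 = [0.0, 1.0, 2.0]
u1 = p_laplacian_euler_step(u0, w, p=3, tau=0.1)
```

With symmetric weights the pairwise fluxes cancel, so this update leaves the sum of the node values unchanged, which is the discrete analogue of mass conservation under homogeneous Neumann conditions.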
Several questions then naturally arise:
• Does the discrete problem $(P^d_{n,\tau})$ have a continuum limit as $n \to +\infty$, and if so, in what sense?
• What is the rate of convergence to this limit? Is this limit consistent with the unique strong solution of (P)?
• Which parameters are involved in this rate, and how do they influence it?
This paper provides answers to these questions for graphs drawn from a random model. The 'classical' random graph models, in particular dense graphs, are 'homogeneous' in the sense that the node degrees tend to be concentrated around a typical value, and all vertices are exactly equivalent in the definition of the model. Furthermore, in a typical realization, most vertices are in some sense similar to most others. In contrast, many graphs arising in real-world applications do not have this property and are inhomogeneous. One reason is that the vertices may have been 'born' at different times, with old and new vertices having very different properties. In particular, in many examples the degree distribution follows a power law. Thus, there has been a lot of recent interest in defining and studying networks in 'inhomogeneous' random graph models (see Section 2 for further details). This is why we adopt this graph model to study the continuum limit of the discrete $p$-Laplacian approximation.

Contributions and relation to prior work
In [21] and earlier [22], the author studied convergence of discrete approximations of a nonlinear heat equation governed by a Lipschitz continuous potential, first on deterministic graphs and then on random ones, both being dense, without discretization of time. This last result cannot be applied to the $p$-Laplacian, which requires much more sophisticated arguments. Moreover, the results in [22] are asymptotic by nature, as they essentially rely on the central limit theorem.
In [15], we provided a rigorous justification of the continuum limit (P) for the discrete p-Laplacian on deterministic dense graphs. The analysis of the continuum limit in [15] uses ideas from the theory of dense graph limits [19,6,18], which for every convergent family of dense graphs defines the limiting object, a measurable symmetric and bounded function K. This function is called a graphon. It captures the connectivity of G n for large n. In [15], for convergent sequences of deterministic dense graphs {G n } n∈N , it was shown that with the kernel in (P) taken to be the graphon associated to {G n } n∈N , the solution of (P) is well-approximated by those of the totally discrete problems (P d n,τ ) for large n and small discretization time step τ . However, the analysis in [15] does not cover networks on inhomogeneous graphs nor does it deal with random graph models. The latter have many important applications. The main contribution of our paper is to bridge this gap by focusing on evolution systems on inhomogeneous random graphs.
Combining tools from evolution equations, random graph theory and deviation inequalities, we establish nonasymptotic rates of convergence of the discrete solution to its continuum limit with high probability. More precisely, we start by considering the case of random graph models generated by a deterministic sequence of nodes. We prove nonasymptotic error bounds that hold with high probability. These results serve as a basis to deal with the totally random graph model, i.e., where both the nodes and edges are random. In turn, this shows convergence of solutions of the discrete model to the solution of the continuum problem as the number of vertices $n$ grows. To get the corresponding convergence rate, we additionally assume that the kernel $K$ and the initial data $g$ belong to the very large class of Lipschitz spaces $\mathrm{Lip}(s, L^q(\Omega^2))$ and $\mathrm{Lip}(s', L^q(\Omega))$. Roughly speaking, $\mathrm{Lip}(s, L^q(\Omega^2))$ contains functions with $s$ "derivatives" in $L^q(\Omega^2)$. These spaces contain in particular functions of bounded variation and those of fractal structure for appropriate values of $s$ (see Appendix A for a brief introduction to these functional spaces). Using in addition arguments from approximation theory on these spaces, we get convergence rates that reveal the role of the value of $p$ and the regularity of the graphon $K$ and the initial data $g$, both on the rate and on the probability of success. In particular, we isolate four different regimes where the rate exhibits different scalings.

Paper organization
The rest of the paper is organized as follows. In Section 2, we give the definition of the inhomogeneous random model that we deal with throughout the paper and specify the assumptions needed to get our results. We finish the section by giving an example for which our assumptions are verified. Section 3 is devoted to the main result of the paper. We begin our analysis by treating random graph sequences generated by deterministic nodes in Section 3.1. Then, in Section 3.2 we consider the general model defined previously in Section 2. After getting the convergence of the discrete model to its continuum limit and identifying the corresponding rate, in Section 3.3, we discuss the different regimes of the convergence rate as a function of the problem parameters. Some technical material is deferred to Appendix A.

Notations
For a graph G = (V (G), E(G)), two vertices i, j ∈ V (G) are adjacent, if they are connected by an edge. Let G n = (V (G n ), E(G n )), n ∈ N * , be a sequence of inhomogeneous, finite, and simple graphs.
For a given vector $u = (u_1, \cdots, u_n)^\top \in \mathbb{R}^n$, we define the norm $\|u\|_{p,n} = \big(\frac{1}{n}\sum_{i=1}^n |u_i|^p\big)^{1/p}$. For an integer $n \in \mathbb{N}^*$, we denote $[n] = \{1, \cdots, n\}$. For any set $S$, $\overline{S}$ is its closure and $|S|$ is its cardinality or its Lebesgue measure (to be understood from the context). $\chi_S$ is the characteristic function of the set $S$ (it takes the value 1 on $S$ and 0 otherwise). $C(0,T;L^p(\Omega))$ denotes the space of uniformly time continuous functions with values in $L^p(\Omega)$. For $d \in \{1,2\}$, $\mathrm{Lip}(s, L^q(\Omega^d))$ is the Lipschitz space which consists of functions with, roughly speaking, $s$ "derivatives" in $L^q(\Omega^d)$ [9, Ch. 2, Section 9]. Only values $s \in ]0,1]$ are of interest to us. See Section A.2 for further details on these spaces and approximation theoretic results on them.
2 The random inhomogeneous graph model

The graph model
We start with the description of the model of inhomogeneous random graphs that will be used throughout. This random graph model is motivated by the construction of inhomogeneous random graphs in [3,4,5].
Definition 2.1. Fix $n \in \mathbb{N}^*$ and let $K$ be a symmetric measurable function on $\Omega^2$. Generate the graph $G_n = (V(G_n), E(G_n)) \overset{\text{def}}{=} G_{q_n}(n, K)$ as follows:
1) Generate $n$ independent and identically distributed (i.i.d.) random variables $(X_1, \cdots, X_n) \overset{\text{def}}{=} X$ from the uniform distribution on $\Omega$. Let $\{X_{(i)}\}_{i=1}^n$ be the order statistics of the random vector $X$, i.e. $X_{(i)}$ is the $i$-th smallest value.
2) Conditionally on X, join each pair (i, j) ∈ [n] 2 of vertices independently, with probability where and where q n is non-negative and uniformly bounded in n.
A graph G qn (n, K) generated according to this procedure is called a K-random inhomogeneous graph generated by a random sequence X.
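As an illustration of the two-step construction in Definition 2.1, here is a minimal sampling sketch. Since the display giving the edge probability is not legible in this text, the code assumes that edge $(i,j)$ is present with probability $\min(1, q_n \cdot k_{ij})$, where $k_{ij}$ approximates the average of $K$ over the cell delimited by consecutive order statistics and is taken here, as a further simplification, to be the value of $K$ at the cell midpoints (with the convention $X_{(0)} = 0$). The name `sample_k_random_graph` is hypothetical.

```python
import random

def sample_k_random_graph(n, kernel, q_n, rng):
    """Sample ordered node positions and edge weights lambda_ij in {0, 1/q_n}.

    Assumptions (not taken verbatim from the paper):
      * edge (i, j) is present with probability min(1, q_n * kernel(m_i, m_j));
      * m_i is the midpoint of the i-th cell ]X_(i-1), X_(i)], with X_(0) = 0.
    """
    xs = sorted(rng.random() for _ in range(n))       # order statistics X_(1..n)
    left = [0.0] + xs[:-1]                            # left endpoints of the cells
    mids = [(a + b) / 2.0 for a, b in zip(left, xs)]
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                     # simple graph: no loops
            prob = min(1.0, q_n * kernel(mids[i], mids[j]))
            if rng.random() < prob:
                # weight 1/q_n, so that q_n * lambda_ij is a Bernoulli variable
                lam[i][j] = lam[j][i] = 1.0 / q_n
    return xs, lam

xs, lam = sample_k_random_graph(
    6, lambda x, y: 1.0 - max(x, y), 0.5, random.Random(0)
)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the kernel in the usage line is an arbitrary symmetric example.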
At this stage, the following important remark is in order.
Remark 2.1. In the context of numerical analysis, we are interested not only in error bounds for the discrete problem but, more importantly, in the (nonasymptotic) rate of convergence. This is why we focus specifically on this graph model and not on the original inhomogeneous random model defined in [3,4], which uses a different edge probability in place of (1). Our error bounds for the discrete problem $(P^d_{n,\tau})$ also cover the latter model; more specifically, the first statements of Theorem 3.1 and Theorem 3.2 hold. However, with that model, even our convergence claim (not to mention the rate) for the discrete scheme does not hold unless the kernel $K$ and the initial data $g$ are additionally assumed to be almost everywhere continuous.
We denote by $x = (x_1, \cdots, x_n)$ the realization of $X$. To lighten the notation, we also denote (4) As the realization of the random vector $X$ is fixed, we define In the rest of the paper, the following random variables will be useful. Let $\lambda_{ij}$, $(i,j) \in [n]^2$, $i \neq j$, be i.i.d. random variables such that $q_n \lambda_{ij}$ follows a Bernoulli distribution with parameter $q_n \hat{K}^x_{nij}$. We consider the i.i.d. random variables $\Upsilon_{ij}$ such that the distribution of $q_n \Upsilon_{ij}$ conditionally on is the expectation operator (here with respect to the distribution of $X$).
We now formulate our assumptions on the graph sequence $\{G_{q_n}(n, K)\}_{n \in \mathbb{N}}$.
Assumption 2.1. We suppose that $q_n$ and $K$ are such that the following hold:
(A.1) $G_{q_n}(n, K)$ converges almost surely and its limit is the graphon $K \in L^\infty(\Omega^2)$;
(A.2) $\inf_{n \geq 1} q_n > 0$ and $\sup_{n \geq 1} q_n < +\infty$.

Example
Although we shall give a general result throughout the paper, it may help to bear in mind one particular example of the general class of models we shall study. This example is inspired by the so-called almost dense (or non uniform) random graphs (see [4,Section 3.4]).
is a symmetric measurable function. Choose the parameter $q_n = n^{-g(n)}$ such that $g(n)\log(n) = O(1)$. Then, assumptions (A.1) and (A.2) are in force.
Proof. Since the graphon $K \in L^\infty(\Omega^2)$, the arguments used to prove [4, Lemma 3.5 and Lemma 3.8], which were designed for the graph model described in Remark 2.1, can be adapted to our graph model with (1) to show that the sequence of random graphs $G_{q_n}(n, K)$ indeed converges almost surely to the graphon $K$ in the metric $d_{\mathrm{sup}}$ (see [4, Section 2.1] for details about this metric). This shows (A.1). As for (A.2), write $q_n = \exp(-g(n)\log(n))$. Since $g(n)\log(n) = O(1)$, this exponent is bounded, say $|g(n)\log(n)| \leq c$, hence $e^{-c} \leq q_n \leq e^{c}$ and (A.2) is verified.
Observe that taking the trivial choice q n = O(1), one recovers the dense random graph model extensively studied in [19,7].
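To make the role of the condition $g(n)\log(n) = O(1)$ concrete, the short sketch below evaluates $q_n = n^{-g(n)}$ for the illustrative choice $g(n) = c/\log(n)$ (the constant `c` and the function name `q_n` are assumptions made for this example). In that case $q_n = e^{-c}$ for every $n$, so $q_n$ stays bounded away from 0 and from $+\infty$.

```python
import math

def q_n(n, c):
    """q_n = n^{-g(n)} for the illustrative choice g(n) = c / log(n), n >= 2."""
    g = c / math.log(n)
    return n ** (-g)

# q_n does not depend on n for this choice of g: it equals exp(-c)
values = [q_n(n, 2.0) for n in (10, 1000, 10**6)]
```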

Consistency of the nonlocal p-Laplacian on random inhomogeneous graphs
Having defined the structure of the network, we are now in position to state our main error bounds between the discrete dynamics and their continuous counterparts. First, in Section 3.1, we assume that $X$ is deterministic. Capitalizing on this result, we will then deal with the totally random model (i.e., generated by random nodes) in Section 3.2 by a simple marginalization argument.

Networks on graphs generated by deterministic nodes
We define the parameter $\delta(n)$ as the maximal size of the spacings between the ordered values $x_{(1)} \leq \cdots \leq x_{(n)}$. Next, we consider a system of difference equations on $G_{q_n}(n, K)$, denoted $(P^{d,d}_n)$. Recall from Section 2 that $\lambda_{ij}$ are the i.i.d. random variables such that $q_n \lambda_{ij}$ follows the Bernoulli distribution with parameter $q_n \hat{K}^x_{nij}$. Before turning to our convergence result, we pause here to make the following two important observations.
Remark 3.1. Coming back to Definition 2.1, one can easily check that $G_{q_n}(n, K)$ is actually a product probability space. Rigorously speaking, if we take a random event $\omega$ from this space, problem $(P^{d,d}_n)$ must be written using $\lambda_{ij}(\omega)$ instead of $\lambda_{ij}$, and likewise for all other random variables. For notational simplicity, we drop $\omega$. But it is important to keep in mind that the evolution equations we write involving random variables must be understood in this sense.
Remark 3.2. As the reader may have remarked, the sum on the right-hand side of $(P^{d,d}_n)$ is divided by $n$ rather than being a weighted sum.
We are now in position to tackle our main goal: comparing the solutions of the discrete and continuous problems and establishing our rate of convergence. Since the two solutions do not live on the same spaces, it is reasonable to introduce an intermediate model, namely the continuous extension of the discrete problem. Using the vector $U^h = (u^h_1, u^h_2, \cdots, u^h_n)^\top$ whose components uniquely solve the previous system $(P^{d,d}_n)$, we define a piecewise linear interpolation $\check{u}_n$ and a piecewise approximation $\bar{u}_n$. Then, $\check{u}_n$ uniquely solves the corresponding evolution problem. Toward our goal of establishing error bounds, we need an intermediate discrete problem for the $p$-Laplacian.
We denote this problem by $(\hat{P}^d_n)$. The discrete problem $(\hat{P}^d_n)$ can also be viewed as a discrete $p$-Laplacian evolution problem over a complete weighted graph on $n$ vertices (recall that a complete graph is a simple undirected graph in which each pair of vertices is connected by an edge). Similarly to before, we define from its solution a linear interpolation $\check{v}_n$ and a piecewise-constant approximation $\bar{v}_n$, together with a piecewise-constant extension of the weights. Then, by construction, $\check{v}_n(x, t)$ uniquely solves the associated continuous problem. The first main result of the paper is the following theorem.
is a symmetric and measurable mapping, and $g \in L^\infty(\Omega)$. Let $u$ and $U^h$ denote the unique solutions to (P) and $(P^{d,d}_n)$, respectively. Let $\check{u}_n$ be the continuous extension of $U^h$ given in (7). Then, the following hold: with probability at least $1 - n^{-C \min(q_n^{2p-1}, q_n^{p})\beta}$.
Before proceeding to the proof, some remarks are in order.

Remark 3.3.
(i) By Lemma A.2, it is clear that the first term in the bounds (12)-(13) can be replaced by
(ii) The constant in (12) depends on $p$ and the data via $\|g\|_{L^\infty(\Omega)}$ and $\|K\|_{L^\infty(\Omega^2)}$. For the bound (13), it also depends on $(q, s, s')$.
(iii) One may wonder whether the functional space assumption made on $g$ and $K$ in claim (ii) is reasonable or even makes sense. The answer is affirmative. Indeed, Lipschitz spaces are rich enough to include functions with discontinuities and even fractal structure. For instance, from [18], one can show that the graphon corresponding to the nearest neighbour graphs, which are very popular in practice (e.g. in image processing [11,10]
To prove Theorem 3.1, we first show the following key lemma.
Under the assumptions of Theorem 3.1, for T > 0, there exists a positive constant C, such that for any β > 0 (the constant C 3 is given in the proof ).
Proof of Lemma 3.1. For 1 < p < +∞, we define the function Observe that v n (·, t) and ǔ n (·, t) are both constants over Ω x ni . Similarly, v n (·, t) and ū n (·, t) are also constants over the cell Ω x ni . We therefore use the shorthand notations for the vector-valued functions ū n (t) = (ū ni (t)) i∈[n] def = (ū n (x i , t)) i∈[n] and v n (t) = (v n (t)) i∈[n] def = (v n (x i , t)) i∈[n] , and likewise for ǔ n (t) and v n (t). Let us denote ξ n (t) = ǔ n (t) − v n (t) and ξ n (t) = ū n (t) − v n (t). By subtracting both sides of (P n ) from those of ( ∧ P n ), evaluated at the cell Ω x ni , we obtain where For notational convenience, we denote α ij (t) We multiply both sides of (15) by 1 n Ψ(ξ ni (t)) and sum over i to obtain We estimate the first term on the right-hand side of (17) using the Hölder inequality, to get Now, using the fact that 0 ≤ λ ij ≤ 1/q n , ∀(i, j) ∈ [n] 2 and applying [15, Corollary B.1] to the function Ψ between a = v nj (t) − v ni (t) and b = ū nj (t) − ū ni (t) (without loss of generality, we suppose that b > a), we get where η n (t) is an intermediate value between a and b. Using the fact that g ∈ L ∞ (Ω) and the construction of ū n (·), we deduce from [15, Theorem 3.1(ii)] that for t ∈ [0, T ] Inserting (20) into (19), and then using the Hölder and triangle inequalities, it follows that Using the triangle inequality combined with the result of [15, Lemma 5.2], we have Putting together (18), (21) and (22), we have d dt ξ n (t) p p,n ≤ Z n (t) p,n ξ n (t) p−1 p,n + 2C 2 (p − 1)/q n C ′′ τ + ξ n (t) p,n ξ n (t) Then, from (23) via Gronwall's inequality in its differential form (see, e.g., [ Since we suppose that q n verifies Assumption (A.2), then exp 2C
We are now ready to prove our main result.
Since by construction ∧ K n is a bounded mapping, we bound the first term on the right-hand side of (25) using [15, Theorem 5.1] to get Inequality (12) then follows by combining (26) with (14).

Networks on graphs generated by random nodes
Let us now turn to the totally random graph model. Consider the system of difference equations $(P^{r,d}_n)$ on the totally random graph $G_{q_n}(n, K)$. As we have done before, we consider the continuous extension $\check{u}_n$ of the solution vector and a piecewise approximation $\bar{u}_n$. Then, we have
$$\begin{cases} \dfrac{\partial}{\partial t} \check{u}_n(x,t) = -\Delta^{\Gamma_n}_p(\bar{u}_n(x,t)), & x \in \Omega,\ t > 0, \\ \check{u}_n(x,0) = g_n(x), & x \in \Omega, \end{cases} \tag{$P^r_n$}$$
where the random variable $\Gamma_n$ is such that $\Gamma_n(x, y) = \Upsilon_{ij}$ for $(x, y) \in \Omega^X_{nij}$.
Conditionally on a realization $x = (x_1, \cdots, x_n)$ of the random vector $X$, problem $(P^{r,d}_n)$ can be rewritten on $G_{q_n}(n, K)$ in the form $(P^d_n)$. By capitalizing on the results obtained for the case where $\{G_{q_n}(n, K)\}_{n \in \mathbb{N}}$ was generated by the deterministic sequence $x$, we get the following result.
The dependence of the constant C in the parameters is similar to Remark 3.3(ii).
As a preparatory step to prove Theorem 3.2, the following lemma is instrumental. It establishes that the spacings between the $n$ uniformly distributed nodes are $O(\log(n)/n)$ with high probability.
Lemma 3.2. Consider the sequence of random spacings $(X_{(1)}, X_{(2)} - X_{(1)}, \cdots, 1 - X_{(n)})$, where we recall that $\{X_{(i)}\}_{i=1}^n$ are the order statistics of $X$. Let $t \in ]0, e[$. Then, for any $i \in [n]$, the $i$-th spacing is smaller than $t\log(n)/n$ with probability at least $1 - n^{-t}$.
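The following sketch gives a quick numerical sanity check of this type of statement. It relies on the classical fact that a single spacing of $n$ i.i.d. uniform points on $[0,1]$ exceeds $\varepsilon$ with probability $(1-\varepsilon)^n$, and compares this exact tail at $\varepsilon = t\log(n)/n$ with $n^{-t}$. The helper names `spacing_tail` and `max_spacing` are hypothetical, and the simulation of the largest gap is only indicative.

```python
import math
import random

def spacing_tail(n, t):
    """Return (eps, exact tail (1 - eps)^n, bound n^(-t)) for eps = t*log(n)/n."""
    eps = t * math.log(n) / n
    return eps, (1.0 - eps) ** n, n ** (-t)

def max_spacing(n, rng):
    """Largest gap between consecutive points of {0, X_(1), ..., X_(n), 1}."""
    pts = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return max(b - a for a, b in zip(pts, pts[1:]))

for n in (10, 100, 1000):
    eps, exact, bound = spacing_tail(n, t=2.0)
    print(n, eps, exact, bound, max_spacing(n, random.Random(n)))
```

The restriction $t < e$ guarantees $\varepsilon < 1$, because $\log(n)/n \leq 1/e$ for every $n \geq 1$.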
Proof of Lemma 3.2. Since the $X_i$ are i.i.d. uniform random variables on $\Omega$, we have, by virtue of [23, Theorem 1.6.7], that the random variables $\delta_i$, $i \in [n]$, have the same distribution as the random variables $Z_i / \sum_{k=1}^{n+1} Z_k$, where $Z_1, \cdots, Z_{n+1}$ are i.i.d. standard exponential random variables. In addition, invoking [23, Lemma 1.6.6], we know that $S_{n+1} \overset{\text{def}}{=} \sum_{k=1}^{n+1} Z_k$ is a Gamma random variable with parameters $(1, n+1)$ (thus having the density $f_{S_{n+1}}(s) = e^{-s} s^n / n!$, $s \geq 0$). Now, combining these two observations, we obtain by straightforward integral calculations that for any $\varepsilon \in [0, 1[$
$$\begin{aligned} \mathbb{P}(\delta_i \geq \varepsilon) &= \mathbb{P}\big(Z_i \geq \varepsilon S_{n+1}\big) = \mathbb{P}\big((1-\varepsilon) Z_i \geq \varepsilon (S_{n+1} - Z_i)\big) \\ &= \mathbb{P}\Big(Z_{n+1} \geq \tfrac{\varepsilon}{1-\varepsilon} S_n\Big) = \int_0^{+\infty} e^{-\frac{\varepsilon s}{1-\varepsilon}} \frac{e^{-s} s^{n-1}}{(n-1)!}\, ds = (1-\varepsilon)^n. \end{aligned}$$
The equality of the second line stems from an equality in distribution, since $S_{n+1} - Z_i$ has the same distribution as $S_n$ and $Z_i$ has the same distribution as $Z_{n+1}$, and from the fact that $Z_i$ and $S_{n+1} - Z_i$ are independent. Taking $\varepsilon = t\log(n)/n \in ]0, 1[$, and using the standard inequality $\log(1-u) \leq -u$ for $u \in [0, 1[$, we get
$$\mathbb{P}\Big(\delta_i \geq \frac{t\log(n)}{n}\Big) = \exp\big(n \log(1-\varepsilon)\big) \leq \exp(-n\varepsilon) = n^{-t}.$$
Proof of Theorem 3.2. The idea of the proof is to take the conditional probability with respect to a fixed realization $x = (x_1, \cdots, x_n)$ of the random vector $X$, then use the bound in Theorem 3.1, which is independent of $x$, and finally integrate with respect to the uniform density on $\Omega^n$. Hence, the desired result (30) follows from the fact that the estimate obtained in (12) is uniform in the random choice of $x$.
(ii) In view of (27), we can argue that Taking κ = max(C(p, q, s), C(p, q, s ′ )) t log(n) n θ , for t ∈]0, e[, applying Lemma 3.2, and using a union bound we deduce that the events simultaneously hold with probability at least 1 − 2n −t . Denote the events , with C the largest constants among the one in claim (i) and max(C(p, q, s), C(p, q, s ′ )). Using again a union bound, we get which leads to the desired claim.

Rate regimes
A close inspection of the error bound in (31) (Theorem 3.2) reveals three contributions:
• Spatial discretization: the first contribution is materialized in the first term (see Remark 3.3). This term represents the spatial discretization error when approximating the continuous evolution equation (P) on the random inhomogeneous graph model $G_{q_n}(n, K)$ generated according to Definition 2.1 with the graphon $K$.
• Data approximation: the second term is $O\big((\log(n)/n)^{\theta}\big)$, which captures the error of discretizing the initial data $g$ and the graphon $K$. The presence of the error on $K$ is clearly tied to the nonlocal nature of the evolution equation on graphs. This approximation error depends on the regularity of $g$ and $K$, and the latter encodes the geometry/structure of the underlying graphs. The more regular $g$ and $K$ are, the faster the convergence rate.
• Time discretization: the last term, which is $O(\tau)$, is classical and corresponds to the time discretization error.
At this stage, one may wonder which of the first two terms dominates, or in other words, what are the different regimes exhibited by the convergence rate as a function of the problem parameters $(p, q, s, s')$. This is quite important, as it reveals which nonlocal $p$-Laplacian evolution problems are harder or easier to discretize by highlighting the role of each parameter, for instance that of $p$ and the impact of nonlocality (i.e. graphon structure).
Toward this goal, we first make the error measure in (31) independent of $p$, and we choose to quantify the error in the classical $L^2(\Omega)$ norm. Thanks to Lemma A.2 and Lemma A.3, as well as boundedness of the solutions, it is not difficult to obtain a bound (35) on $\|u - \check{u}_n\|_{C(0,T;L^2(\Omega))}$ which holds with probability at least $1 - \big(n^{-C\min\{q_n^{2p-1}, q_n^{p}\}\beta} + 2n^{-t}\big)$. To make the rest of the discussion more concrete and also guarantee the convergence of the sequence $\{G_{q_n}(n, K)\}_{n \in \mathbb{N}}$ to the graphon $K$, we will work under the assumptions of the example in Section 2.2, i.e. $q_n = n^{-g(n)}$ with $g(n) \leq c/\log(n)$ for some $c > 0$. Observe that $q_n \in ]0, 1]$ and $q_n \geq e^{-c}$, so that, since $p > 1$, the negative powers of $q_n$ appearing in the bound, such as $q_n^{-(1-1/p)}$, are bounded by constants. Thus, the second term in (35) reads $O\big(n^{-\min(p/4, 1/2)}\big)$.
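Without loss of generality, we also suppose that $s = s'$ and $q \leq p$ so that $\theta = sq/p \in ]0, q/p] \subset ]0, 1]$. In this setting, (35) reads
$$\|u - \check{u}_n\|_{C(0,T;L^2(\Omega))} = O\left(\left(\frac{\log(n)}{n}\right)^{\min(1/p,\, 1/2,\, sq/p)\,\min(p/2,\, 1)}\right).$$
The term depending on $n$ then exhibits four different regimes as a function of $p$, $s$ and $q$ (see Figure 1). Indeed, it is straightforward to see that it scales as
$$\begin{cases} (\log(n)/n)^{1/p} & \text{if } p \in [2, +\infty[ \text{ and } sq \geq 1, \\ (\log(n)/n)^{sq/p} & \text{if } p \in [2, +\infty[ \text{ and } sq < 1, \\ (\log(n)/n)^{p/4} & \text{if } p \in ]1, 2[ \text{ and } sq \geq p/2, \\ (\log(n)/n)^{sq/2} & \text{if } p \in ]1, 2[ \text{ and } sq < p/2. \end{cases}$$
In particular, the convergence rate shows a transition phenomenon at $p = 2$. The rate increases with $p$ for $p \in ]2, +\infty[$ while it decreases with $p$ for $p \in ]1, 2]$ and $sq \in [p/2, p]$. As expected, the dependence of the rate on the initial data $g$ and graphon $K$ is more prominent as they become irregular, i.e. for smaller values of $sq$. For small $sq$ and $p \in ]1, 2]$, the rate is independent of $p$.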
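The regimes can be explored numerically with the small sketch below, which evaluates the exponent $\min(1/p, 1/2, sq/p)\cdot\min(p/2, 1)$ of $\log(n)/n$ read off the bound above and reports which case is active. The function names `rate_exponent` and `regime` are hypothetical, and the code assumes $s = s'$, $s \in ]0,1]$ and $q \leq p$ as in the text.

```python
def rate_exponent(p, s, q):
    """Exponent of log(n)/n in the L^2 error bound: min(1/p, 1/2, sq/p) * min(p/2, 1)."""
    if p <= 1:
        raise ValueError("p must be > 1")
    theta = s * q / p
    return min(1.0 / p, 0.5, theta) * min(p / 2.0, 1.0)

def regime(p, s, q):
    """Name the active case as the closed-form exponent it corresponds to."""
    sq = s * q
    if p >= 2:
        return "1/p" if sq >= 1 else "sq/p"
    return "p/4" if sq >= p / 2.0 else "sq/2"

for p, s, q in [(4, 1.0, 2), (4, 0.25, 2), (1.5, 1.0, 1.5), (1.5, 0.2, 1)]:
    print(p, s, q, regime(p, s, q), rate_exponent(p, s, q))
```

A larger exponent means a faster decay of the bound, so the output makes the transition at $p = 2$ visible: the exponent $p/4$ grows with $p$ below 2, while $1/p$ shrinks above it.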

A.1 A key deviation result
The following lemma establishes a key deviation inequality for sup t∈[0,T ] Z n (t) p,n where Z n (·) is the random process defined in (16).
Lemma A.1. Let $Z_n(\cdot)$ be the random process defined in (16). Then, we have
(i) For $p \in ]1, +\infty[$, $T > 0$, there exists a positive constant $C$, such that for any $\beta > 0$ P sup where $C_3$ is a constant that will be made explicit in the proof.
(ii) For $p \in [2, +\infty[$, suppose that there exists a positive constant $C$, such that for Then,
To prove this lemma, we need the following deviation inequalities, which we include for the reader's convenience.
Rosenthal's inequality [16]. Let n be a positive integer, γ ≥ 2 and U 1 , . . . , U n be n zero mean independent random variables such that sup Bernstein's inequality [20]. Let n be a positive integer and U 1 , . . . , U n be n zero mean independent random variables such that there exists a positive constant M satisfying sup Then, for any υ > 0, Proof of Lemma A.1. (i) Let us recall that q n λ ij are i.i.d random variables following the Bernoulli distribution with parameter q n ∧ K x nij . For the sake of simplicity, set, for It remains to bound E (Y ni ). We distinguish the case when p ≥ 2 and p ∈]1, 2[.
• p ≥ 2. Using the Rosenthal inequality with the independent according to j centered random variables U nij We have Taking p = 2, we get Since α ij and q n ∧ K x nij are both bounded and p being greater than 2, there exists C 2 > 0, such that, • p ∈]1, 2[. With the same steps as above, since p ∈ [1, 2[, applying the Jensen inequality first for the concave function x → x p/2 and second for the convex function x → x 2 , we have Therefore, we have again Thus, for any p > 1, we get Hence, setting W ni = Y ni − E (Y ni ) and λ = ε p − C 3 max q −(p−1) n , q −p/2 n 1 n p/2 , we have Let ε > 0 such that λ > 0. Observe that the random variables {W ni } n i=1 are independent, centred, and obey: Y ni ≤ C 4 , since α ij and q n ∧ K x nij are both bounded.
Replacing the exponent "p" in inequality (37), by "2p" which is greater than 2, we obtain We are then in position to apply the Bernstein inequality to {W ni } n i=1 according to the index i, whence we get, after some elementary algebra n , q p n nλ 2 n −p + λ .
Taking λ = β log(n) n > n −p , for p > 1, we have after straightforward calculations For this choice of λ, observe that For p ∈ [2, +∞[, applying the Jensen inequality twice, we have

A.2 Approximation theoretic results
In an effort to make this paper more self-contained we briefly recall some results on functional spaces and approximation theory that our work relies on. But before this, we state the following classical lemma which is useful throughout the paper.
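Definition A.1. For $F \in L^q(\Omega^d)$, $q \in [1, +\infty]$, we define the (first-order) $L^q(\Omega^d)$ modulus of smoothness by
$$\omega(F, h)_q = \sup_{z \in \mathbb{R}^d,\, |z| < h} \left( \int_{x,\, x+z \in \Omega^d} |F(x+z) - F(x)|^q \, dx \right)^{1/q}.$$
The Lipschitz spaces $\mathrm{Lip}(s, L^q(\Omega^d))$ consist of all functions $F$ for which
$$|F|_{\mathrm{Lip}(s, L^q(\Omega^d))} = \sup_{h > 0} h^{-s}\, \omega(F, h)_q < +\infty.$$
We restrict ourselves to values $s \in ]0, 1]$ as for $s > 1$, only constant functions are in $\mathrm{Lip}(s, L^q(\Omega^d))$. It is easy to see that $|F|_{\mathrm{Lip}(s, L^q(\Omega^d))}$ is a semi-norm. $\mathrm{Lip}(s, L^q(\Omega^d))$ is endowed with the norm $\|F\|_{\mathrm{Lip}(s, L^q(\Omega^d))} \overset{\text{def}}{=} \|F\|_{L^q(\Omega^d)} + |F|_{\mathrm{Lip}(s, L^q(\Omega^d))}$.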
Clearly, F n is nothing but the orthogonal projection of F on the n 2 -dimensional subspace of L q (Ω 2 ) defined as Span χ Ω nij : (i, j) ∈ [n] 2 .
An immediate consequence is the following result. Lip(s,L q (Ω d )) δ sq/p otherwise, where we used (42) (resp. Lemma A.3) and Lemma A.4 in the first (resp. second) case.