Computing homogenized coefficients via multiscale representation and hierarchical hybrid grids

We present an efficient method for the computation of homogenized coefficients of divergence-form operators with random coefficients. The approach is based on a multiscale representation of the homogenized coefficients. We then implement the method numerically using a finite-element method with hierarchical hybrid grids, which are semi-structured grids (unstructured at the coarse scale, fully structured at finer scales) allowing for significant gains in memory usage and execution time. Finally, we demonstrate the efficiency of our approach on two- and three-dimensional examples, for piecewise-constant coefficients with corner discontinuities. For moderate ellipticity contrast and for a precision of a few percentage points, our method computes the homogenized coefficients on a laptop computer in a few seconds in two dimensions, and in a few minutes in three dimensions.


Statement of the main results
The goal of this paper is to define, study, and implement an efficient approach to the calculation of homogenized coefficients for divergence-form operators with random coefficients. That is, we consider operators of the form ∇ · a∇, where a = (a(x))_{x ∈ ℝ^d} is a random coefficient field on ℝ^d taking values in the set of symmetric positive definite matrices. We assume that this random coefficient field is uniformly elliptic, ℤ^d-stationary, and of unit range of dependence; see Section 2.1 for precise statements. Under these assumptions, there exists a homogenized matrix ā such that the large-scale properties of the heterogeneous operator ∇ · a(x)∇ resemble those of the homogeneous operator ∇ · ā∇. We define a multiscale method that allows us to compute the homogenized matrix efficiently, and we rigorously identify its rate of convergence. We then explain how to implement the algorithm in practice, using the notion of hierarchical hybrid grids, and demonstrate its performance on examples.
For these numerical examples, we consider coefficient fields that are piecewise constant on a square tiling, in two dimensions, or on a cubic tiling, in three dimensions. This class of examples is particularly challenging from a computational perspective. Indeed, solutions develop singularities at the corners of the tiling which are essentially the worst possible in the class of (isotropic) coefficient fields with fixed ellipticity contrast (see Section 5.1). Despite this, for moderate ellipticity contrast and for a precision of a few percentage points, our algorithm runs on a laptop computer and outputs a satisfactory approximation of the homogenized matrix within a few seconds in two dimensions, and within a few minutes in three dimensions. Our code is written in the Julia language and is freely available online, see the link in (6.1).
The method we explore here was introduced in [52] in the context of discrete finite-difference equations. The main idea is to decompose the homogenized matrix into a series of terms, each of which accounts for a different length scale. The terms associated with short length scales naturally involve very small boundary layers and low computational effort. Those associated with larger length scales are a priori more demanding, but they only appear as small correction terms in the decomposition, and can thus be computed on much smaller sample domains. Overall, this second effect more than compensates for the increase in computational effort, so that the majority of the computational time and memory is spent on the shortest length scales. A prominent feature of the method is that minimal effort is spent on the calculation of boundary layers. An additional benefit is that the method can be refined on the fly: if some calculations have already been performed and one realizes that more precision is necessary, then one does not need to throw these past calculations away and restart from scratch.
We now describe this method more precisely. We fix ξ ∈ ℝ^d of unit norm, and introduce the quantities that will allow us to compute ξ · āξ. We let φ_{-1} ∈ H^{-1}_loc(ℝ^d) be

φ_{-1}(x) := ∇ · (a(x)ξ), (1.1)

and for each n ∈ ℕ, we define inductively φ_n ∈ H^1_loc(ℝ^d) to be the unique stationary solution to

(2^{-n} − ∇ · a∇) φ_n = 2^{-n} φ_{n-1} in ℝ^d. (1.2)

We also give ourselves a bump function χ ∈ C_c^∞(ℝ^d) with compact support in the unit ball B(0, 1) and such that ∫_{ℝ^d} χ = 1.
For every r ≥ 1, we set

χ_r(x) := r^{-d} χ(r^{-1} x). (1.4)

The following theorem is our main theoretical result. Recall that we assume the law of the coefficient field (a(x))_{x ∈ ℝ^d} to be invariant under translations by vectors of ℤ^d. If we made the stronger hypothesis that the law is invariant under translations by any vector of ℝ^d, then we could replace each average against a smooth mask in (1.6) by an average over the cube (−r, r)^d. However, under our current weaker assumption of invariance under translations by vectors of ℤ^d only, this replacement works only if we make sure that the side length of the box is an integer. In other words, we would need to know the identity of the underlying lattice of periods (which without loss of generality was fixed here to be ℤ^d) and to make sure that the domain over which we take the average contains an integer number of fundamental cells. In contrast, the formulation in Theorem 1.1 does not require that we identify the lattice of periods.
A result comparable to Theorem 1.1 was proved in [52] in the context of discrete finite-difference equations. Besides the adaptation to the continuous setting, there are two main differences between the present result and the one obtained in [52]. The first is that the quantities on the right side of (1.6) are averages against a smooth mask, while only box averages could be handled in [52]. The second and most important difference is that Theorem 1.1 gives an exponential tail estimate for the probability in (1.7), while the result in [52] was limited to a variance estimate. We expect the estimate (1.7) to be sharp, in the sense that we do not expect that the exponent of the exponential on the right side of (1.7) can be raised to a power larger than 1 uniformly over ε > 0.

The implementation of the method proposed in Theorem 1.1 requires the accurate calculation of ∇φ_0 and of φ_0, …, φ_k in L^2 over progressively smaller and smaller balls B(0, r_0), …, B(0, r_k). As stated in (1.2), the equation satisfied by φ_n is posed over the full space ℝ^d. In practice, we can approximate these problems by selecting a sufficiently large constant c_bl ("bl" for "boundary layer"), and then solving for φ̃_n ∈ H^1_0(B(0, r_n + c_bl (1 + n) 2^{n/2})) the corresponding equation with null Dirichlet boundary condition on ∂B(0, r_n + c_bl (1 + n) 2^{n/2}), and where we have set φ̃_{-1} := φ_{-1}. The error in this approximation decays exponentially fast as we increase c_bl (this can be proved using the fact that the Green function of the operator 2^{-n} − ∇ · a∇ decays like exp(−c 2^{-n/2} |x|)). As a rule of thumb, one should think of choosing c_bl of the order of √|a|, where |a| is a measure of the typical size of the eigenvalues of a(x) (or, to be more specific, one can take c_bl to be of the order of √Λ). The additional multiplicative factor of (1 + n) allows for a progressive increase of the boundary layer as we increase n and aim for finer and finer approximations of ā.
A simple error analysis suggests that the optimal choice for the size of the boundary layer should be an affine function of n; we chose it to be a multiple of (1 + n) for simplicity, but more refined choices can save some computations.
For simplicity, we implemented the version of the method described in Theorem 1.1 with ε = 0. Strictly speaking, this case is not covered by Theorem 1.1, but a modification of the arguments presented below would in this case yield (1.7) with 2^{-k/2} replaced by 2^{-(1-ε)k/2}, for arbitrary ε > 0. (The constant on the right side would then depend on ε.) The main power of the method comes from the fact that it splits the problem of calculating ξ · āξ into multiple scales. Heuristically, the term φ_n (or φ̃_n) is meant to capture information related to length scales of the order of 2^{n/2}. When n is small, the elliptic problem (1.8) is well-conditioned and has a very small boundary layer, of essentially unit size. As n is increased, the elliptic problems in (1.8) become less well-conditioned and involve larger boundary layers. Yet, this is more than compensated by the fact that the domain of interest is rapidly shrinking. In practice, the main part of the computational effort is spent on calculating φ̃_0.
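To see why the shortest length scales dominate the overall cost, a back-of-the-envelope budget is already instructive. The sketch below is a hypothetical illustration only: the radii r 2^{-n/2} and the boundary-layer width c_bl (1 + n) 2^{n/2} follow the discussion above, but the numerical values of r, c_bl, k and d are made up and are not the tuned parameters used in the paper.

```python
# Back-of-the-envelope cost model for the multiscale method.
# All numbers (r, c_bl, k, d) are hypothetical illustrations.

def scale_costs(r=1000.0, c_bl=3.0, k=5, d=3):
    """Degrees of freedom per scale, modeled as (domain radius)^d.

    The scale-n problem is posed on a ball of radius r * 2**(-n/2),
    padded by a boundary layer of width c_bl * (1 + n) * 2**(n/2).
    """
    costs = []
    for n in range(k + 1):
        radius = r * 2 ** (-n / 2) + c_bl * (1 + n) * 2 ** (n / 2)
        costs.append(radius ** d)
    return costs

costs = scale_costs()
fractions = [c / sum(costs) for c in costs]
# With these (made-up) parameters, scale 0 accounts for the majority
# of the total cost, and the per-scale cost is decreasing in n.
print(f"scale-0 share of total cost: {fractions[0]:.2f}")
```

The qualitative conclusion is robust to the precise values: as long as the sampled radius r dominates the boundary-layer width, the scale-0 problem is the largest one by a wide margin.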
Compared with the discrete setting of finite-difference operators investigated in [52], the case of continuous differential operators considered here poses crucial new challenges from a computational perspective. With applications such as those in materials science in mind, it is natural to consider piecewise-constant coefficient fields. We choose to focus more specifically on the case when the coefficient field is constant over each unit square or cube of the form z + [0, 1)^d, where z ∈ ℤ^d. At least in dimension d = 2, this class is essentially the most difficult possible, in the sense that solutions then have the worst possible regularity properties, given the ellipticity contrast; see Section 5.1 for a more precise discussion. As a consequence of the roughness of the solutions, a "coarsest possible" discretization of the coefficient field into finite elements with constant coefficients would yield wildly incorrect results. To wit, the algorithm as proposed here would run just fine, but it would compute the homogenized matrix associated with the particular finite-element discretization that is chosen; if the discretization is coarse, then this homogenized matrix will be far from the homogenized matrix of the continuous operator.
To remedy this problem, we thus need to rely on much finer discretizations of the coefficient field. Our method for doing so is strongly inspired by the idea of hierarchical hybrid grids developed in [14,15]. In a nutshell, the starting point is the observation that numerical schemes on fully structured grids with constant coefficients are highly efficient, both from the point of view of time and of memory usage. Unfortunately, the problem we wish to solve is not of this type, since the coefficients vary across the domain. The idea then is to deploy a hybrid representation of the problem, using an unstructured coarse grid to represent the variations of the coefficient field on the one hand, and then proceeding to refine each coarse element in a "fully structured" manner. This allows for very significant gains in memory usage, which is otherwise a fundamental bottleneck, and also in execution time.
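The memory gain afforded by this hybrid representation can be made concrete with a small bookkeeping sketch. The sizes below are hypothetical illustrations, and the actual data layout of [14, 15] is more involved; the point is only that a coefficient that is constant per coarse element needs to be stored once per coarse element, rather than once per fine cell of the structured refinement.

```python
# Memory bookkeeping sketch for a hierarchical hybrid grid (HHG).
# Hypothetical sizes, for illustration only.

def fine_cells(n_coarse, levels, d=3):
    """Number of fine cells after `levels` uniform refinements of each
    of `n_coarse` coarse cells (each refinement splits a cell into
    2^d children)."""
    return n_coarse * (2 ** d) ** levels

def coefficient_storage(n_coarse, levels, d=3):
    """Stored coefficient entries: one symmetric d-by-d matrix per
    coarse cell (HHG layout) versus one per fine cell (fully
    unstructured layout)."""
    entries = d * (d + 1) // 2  # independent entries of a symmetric matrix
    return {
        "hhg": n_coarse * entries,
        "unstructured": fine_cells(n_coarse, levels, d) * entries,
    }

s = coefficient_storage(n_coarse=10_000, levels=4, d=3)
print(s["unstructured"] // s["hhg"])  # (2^3)^4 = 4096-fold saving
```

The same structure also speeds up execution: on the structured refinement of an element with constant coefficient, the local stiffness contributions can be generated from a single reference stencil instead of being assembled and stored cell by cell.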
We did not make any effort to fine-tune the parameters of the method presented in Theorem 1.1. We indicate here some possible directions for doing so. First, the successive powers of 2 in (1.2) can be replaced by powers of any other real number larger than 1, up to suitable modifications of the expression in (1.6). Second, for the radii r_n appearing in (1.6), we simply followed the prescription of the theoretical result with ε = 0, that is, r_n = r 2^{-n/2}. A more refined method would consist in evaluating the fluctuations of the integral averages on the fly, and adaptively tuning r_n so that the fluctuations of each average fall below a threshold of the order of 2^{-k/2}. Finally, the requirements for accuracy are different for φ_0, which needs to be controlled in H^1, and for the subsequent φ_n's, which only need to be controlled in L^2. We did not try to exploit this feature either, and used approximations of the same quality for all terms.
Although we did not explore this possibility, we point out that the required computations can be performed in parallel in a straightforward way. For instance, if one has access to N processors, then instead of averaging over a single realization, one can average N quantities computed from versions of φ̃_0 obtained on independent realizations of the coefficient field. These computations can obviously be performed without any communication between processors. (If one is given a very large snapshot of a single environment, then effectively independent realizations can be obtained by considering sufficiently distant subregions of the large sample.) See also [39] for more refined techniques allowing for the parallelization of finite-element methods with hierarchical hybrid grids.

For simplicity, we assume here that the coefficient field is uniformly elliptic and has a finite range of dependence. However, we expect the results presented here to hold in much greater generality. In particular, we expect that a result comparable with (1.7), but possibly with a more slowly decaying function of k on the right side, should hold whenever the local statistics of the coefficient field satisfy a central limit theorem. For more strongly correlated coefficient fields, the method is still of interest, but the choice of r_n in (1.5) and the term 2^{-k/2} in (1.7) would have to be suitably modified. (This makes the development of a more adaptive algorithm particularly appealing, since such an algorithm could automatically select the optimal scalings without supervision.) Also, in view of [8, 23], we expect that the results can be generalized to the case of perforated domains of percolation type.
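The embarrassingly parallel structure of this averaging can be sketched in a few lines. In the toy code below, `estimate_sigma0` is a hypothetical stand-in for the quantity computed from one independent realization of the coefficient field (in practice each worker would run a full finite-element solve, and one would use processes or MPI rather than threads); the point is only that the workers need no communication.

```python
# Embarrassingly parallel averaging over independent realizations.
# `estimate_sigma0` is a hypothetical stand-in for the per-realization
# estimator; toy values, for illustration only.
from concurrent.futures import ThreadPoolExecutor
import random

def estimate_sigma0(seed):
    # One independent realization -> one sample of the estimator.
    rng = random.Random(seed)
    return 2.0 + rng.gauss(0.0, 0.05)  # toy: true value 2.0, small noise

def parallel_average(n_workers=4, n_samples=64):
    # No communication is needed between workers: each sample comes
    # from its own independent realization of the environment.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        samples = list(pool.map(estimate_sigma0, range(n_samples)))
    return sum(samples) / len(samples)

print(f"averaged estimate: {parallel_average():.3f}")
```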

Related works
Over the last decade, an intensive research effort has been devoted to the development of a quantitative theory of stochastic homogenization. The multiscale representation of the homogenized coefficients forming the basis of our method is inspired by the "renormalization" approach to quantitative stochastic homogenization, as developed in [9-12]; see also [53] for a gentle introduction to this line of research and [13] for a monograph. A related approach based on the parabolic flow was put forward in [36], see also Chapter 9 of [13]; it provides the most convenient statement for us to build upon here. A different approach based on concentration inequalities was put forward in [33-35, 37, 38, 47], inspired by earlier insights from statistical mechanics [54, 55].
It has been observed long ago that inappropriate boundary conditions for "approximate cell problems" can cause important "resonance errors", and initial attempts at bypassing the problem involved the notion of oversampling [26, 41, 42, 60]. A powerful approach, based on the introduction of a small zero-order term in the equation, has been studied in [18, 27, 30-34, 50]. The method we propose here combines this idea with a multiscale decomposition, which enables us to take fuller advantage of it. We refer to [52] for a detailed comparison between the single-scale and the multiscale approaches. As is shown in [1], the benefits of the multiscale approach can be seen even in the setting of periodic coefficient fields, if we operate under the constraint that the lattice of periods is unknown.
One alternative method for computing homogenized coefficients, based on the idea of an "embedded corrector problem", is proposed in [21, 22]. The numerical examples there concern well-separated spherical inclusions. This allows for fairly different approaches to practical calculations from what is pursued in the present paper (and also produces solutions that are more regular than in our examples with corner discontinuities).
For coefficient fields very similar to those we investigate numerically here, the standard representative volume method was combined with a tensor-based discretization scheme in [43] to compute homogenized matrices in two dimensions. The authors of [43] state that their numerical approximation method displays an empirical rate of convergence in L^2 of order h^s with s ≃ 3/2, where h measures the size of a discretized element. We believe that this is an artifact of pre-asymptotic effects and moderate ellipticity contrast. Indeed, for any ε > 0, solutions can develop singularities that fail to be in H^{1+ε}, provided that the ellipticity contrast is sufficiently large, and standard finite-element methods then approximate these singular solutions in L^2 at a rate no better than h^{1+ε}. In fact, for coefficient fields arranged in a checkerboard-type pattern in two dimensions, as considered in [43] and in the present paper, one can identify exactly the optimal exponent of regularity in terms of the ellipticity contrast: solutions are H^s-regular if and only if s < 1 + α, where α is given in (5.5), as proved in [57] and recalled in Section 5.1 below. In particular, an asymptotic rate of convergence in L^2 of order h^{3/2} can only be obtained for values of the ellipticity contrast Λ below 3 + 2√2. We also refer to the right frame of Figure 6 for an illustration of pre-asymptotic effects and slow rates of convergence, for Λ = 90.
Several techniques have been explored to reduce the size of the fluctuations of estimators for the homogenized matrix. In particular, control variate techniques and the selection of special realizations of the coefficient field, called "quasi-random structures", have been explored, see [19,45] for surveys. The latter approach, inspired by [59,61] and, in the context of the homogenization of elliptic operators, advocated for in [46], has recently received a spectacular theoretical foundation in [28]. We would find it very interesting to investigate how these techniques can be combined with those discussed in the present paper.
In a different direction, several works have considered the question of designing and effectively computing certain expansions of the homogenized matrix, in situations where the random medium can be seen as a small perturbation of a reference medium. The most typical scenario is that of a homogeneous medium with a small density of inclusions [48,58]. We refer to [3-7, 16, 25, 44, 51, 56] for works in this area.
To conclude this introduction, we mention that the homogenized matrix can also be of use as part of a modified scheme of multigrid type for computing solutions of elliptic equations with rapidly oscillating coefficients. In short, the idea is to use the homogenized operator when operating on the coarser grids [52].

Organization of the paper
In Section 2, we lay down the notation and make our standing assumptions more precise. We also clarify the meaning of being a stationary solution to (1.2), and recall the definition of the homogenized matrix. We next prove a general multiscale representation of the homogenized matrix in Section 3. By "general", we mean that the finite-range dependence assumption on the coefficient field is not actually used there; assuming ergodicity instead would be sufficient. This is no longer the case in Section 4, where we rely heavily on the finite-range dependence assumption to obtain sharp quantitative estimates on the different terms appearing in the multiscale decomposition. This allows us to conclude the proof of Theorem 1.1. In Section 5, we explain how to design a finite-element multigrid algorithm using the structure of hierarchical hybrid grids. Finally, we present our numerical results in Section 6. Our code is freely available in the GitHub repository indicated in (6.1).

Precise statement of the assumptions
We fix a constant Λ ∈ [1, ∞) and an integer d ≥ 2 throughout the paper. We denote by Ω the set of measurable mappings a from ℝ^d to the set of d-by-d symmetric matrices which satisfy, for almost every x ∈ ℝ^d and every ξ ∈ ℝ^d,

|ξ|^2 ≤ ξ · a(x)ξ ≤ Λ |ξ|^2. (2.1)

For each Borel set U ⊆ ℝ^d, we denote by ℱ_U the σ-algebra generated by the mappings

a ↦ ∫_U φ(x) ξ · a(x)ξ dx, ξ ∈ ℝ^d, φ ∈ C_c^∞(U),

where C_c^∞(U) denotes the set of smooth functions with compact support in U. We also use the shorthand ℱ := ℱ_{ℝ^d}. For each y ∈ ℝ^d, we denote by τ_y : Ω → Ω the action of translation by y on Ω, which is such that, for every x ∈ ℝ^d,

(τ_y a)(x) = a(x + y). (2.2)
Translations can also operate on events, that is, for every E ∈ ℱ and y ∈ ℝ^d, we set τ_y E := {τ_y a : a ∈ E}. We assume that we are given a probability measure P on (Ω, ℱ) that, in addition to (2.1), satisfies the following properties:
- stationarity with respect to ℤ^d-translations: for every z ∈ ℤ^d and E ∈ ℱ, we have P[τ_z E] = P[E];
- unit range of dependence: whenever two Borel sets U, V ⊆ ℝ^d are at distance at least 1 from one another, the σ-algebras ℱ_U and ℱ_V are independent.
If the latter condition were satisfied with the constant 1 replaced by another fixed number, then we could reduce to the present setting by scaling. Similarly, if stationarity were known to hold along some other lattice of ℝ^d, then we could use a change of coordinates to reduce to our current assumption of ℤ^d-stationarity.

General notation and function spaces
For every Borel measurable set U ⊆ ℝ^d, we denote by |U| the Lebesgue measure of U. Whenever |U| ∈ (0, ∞), we set, for every f ∈ L^1(U),

⨍_U f := |U|^{-1} ∫_U f.

For each p ∈ [1, ∞), we define the rescaled norm of a measurable function f as

‖f‖_{L^p(U)} := ( ⨍_U |f|^p )^{1/p}.

For each ℓ ∈ ℕ ∖ {0}, we denote by H^ℓ(U) the classical Sobolev space, with rescaled norm

‖f‖_{H^ℓ(U)} := Σ_{k=0}^{ℓ} |U|^{k/d} ‖∇^k f‖_{L^2(U)}.

We denote by H^ℓ_0(U) the closure in H^ℓ(U) of the space C_c^∞(U) of smooth functions with compact support in U, and by H^{-ℓ}(U) the dual space to H^ℓ_0(U), equipped with the rescaled norm

‖f‖_{H^{-ℓ}(U)} := sup { ⨍_U f g : g ∈ H^ℓ_0(U), ‖g‖_{H^ℓ(U)} ≤ 1 }.

In the expression above, we used the notation ⨍_U f g to denote the duality pairing between H^{-ℓ}(U) and H^ℓ_0(U), normalized in such a way that whenever f and g are smooth, the evaluation of this duality pairing coincides with the value of the integral ⨍_U f g.
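One convenience of such volume-normalized norms, worth recording even though it is not stated explicitly above, is their invariance under dilations. The following short computation is a sketch under the definition of the rescaled L^p norm just given, writing rU := {ry : y ∈ U} and f_r(x) := f(x/r):

```latex
\|f_r\|_{\underline{L}^p(rU)}^p
  = \frac{1}{|rU|} \int_{rU} |f(x/r)|^p \,\mathrm{d}x
  = \frac{1}{r^d |U|} \int_{U} |f(y)|^p \, r^d \,\mathrm{d}y
  = \|f\|_{\underline{L}^p(U)}^p .
```

Thanks to this invariance, estimates phrased in rescaled norms carry the same constants on balls of every radius, which is convenient when working across the many length scales of the decomposition.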

Notation for random variables
In order to have concise means to express the size of random variables at our disposal, we write, for each real random variable X and each s, θ > 0,

X ≤ O_s(θ) ⟺ E[ exp( (θ^{-1} X_+)^s ) ] ≤ 2,

where X_+ := max(X, 0). We also write

X = O_s(θ) ⟺ X ≤ O_s(θ) and −X ≤ O_s(θ).
The notation is homogeneous: we have X ≤ O_s(θ) if and only if θ^{-1} X ≤ O_s(1). Informally, the statement X ≤ O_s(1) means that the right tail of the law of X decays like exp(−x^s). The following lemma makes this precise; see Lemma A.1 of [13] for a proof.
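For the reader's convenience, here is the standard Chernoff-type computation behind this informal statement (a sketch along the lines of Lemma A.1 of [13], using the convention that X ≤ O_s(θ) means E[exp((θ^{-1}X_+)^s)] ≤ 2):

```latex
\text{If } X \le \mathcal{O}_s(\theta), \text{ then for every } x \ge 0,
\quad
\mathbb{P}[X \ge \theta x]
  \le \mathbb{P}\!\left[\exp\!\big((\theta^{-1}X_+)^s\big) \ge e^{x^s}\right]
  \le e^{-x^s}\, \mathbb{E}\!\left[\exp\!\big((\theta^{-1}X_+)^s\big)\right]
  \le 2\, e^{-x^s},
```

by Markov's inequality, so X ≤ O_s(θ) indeed encodes a stretched-exponential right tail with exponent s and scale θ.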
The notion of O_s-bounded random variables is stable under averaging, as the next lemma shows (see Lem. A.4 of [13] for a proof).
Lemma 2.2. Let μ be a measure over an arbitrary measurable space E, let θ : E → (0, ∞) be a measurable function, and let (X(e))_{e ∈ E} be a jointly measurable family of nonnegative random variables such that, for every e ∈ E, X(e) ≤ O_s(θ(e)). Then

∫_E X dμ ≤ O_s( C ∫_E θ dμ ),

for a constant C depending only on s.

The key mechanism by which we will witness stochastic cancellations is by appealing to the following lemma.
Lemma 2.3. For every s ∈ (1, 2], there exists a constant C(s) < ∞ such that the following holds. Let r ≥ 1 and Z ⊆ ℤ^d, let θ : Z → (0, ∞), and for each z ∈ Z, let X(z) be an ℱ(z + (−r, r)^d)-measurable centered random variable such that X(z) = O_s(θ(z)). We have

Σ_{z ∈ Z} X(z) = O_s( C r^{d/2} ( Σ_{z ∈ Z} θ(z)^2 )^{1/2} ). (2.7)

This lemma is a consequence of Lemmas A.7 and A.11 from [13]. Notice that when specializing Lemma 2.3 to the case when θ(z) ≡ θ does not depend on z, and denoting by N the cardinality of Z, we can rewrite the right side of (2.7) as O_s( C θ (r^d N)^{1/2} ).

Definition of the homogenized matrix
We now introduce notions related to stationary random fields and solutions, and recall the definition of the homogenized coefficients in terms of correctors. A stationary random field is a measurable mapping F : ℝ^d × Ω → ℝ^m (for some m ∈ ℕ) such that for every x ∈ ℝ^d, z ∈ ℤ^d and a ∈ Ω,

F(x + z, a) = F(x, τ_z a).

We may also simply say that the mapping x ↦ F(x) is stationary. For instance, the mapping x ↦ a(x) itself is stationary, so the terminology is consistent with the definition in (2.2). As is standard with random objects, most of the time we do not display that a random field depends explicitly on a, and simply write F(x) in place of F(x, a). For every stationary F ∈ L^1_loc, we have

lim_{R → ∞} ⨍_{(−R, R)^d} F = E[ ∫_{[0,1]^d} F ]. (2.8)

That this limit exists and equals the expectation on the right side follows from the ergodic theorem, see [2]. For every p ∈ [1, ∞], we write

‖F‖_{ℒ^p} := E[ ‖F‖^p_{L^p([0,1]^d)} ]^{1/p}.

In the case p = ∞, the right side is interpreted as the essential supremum of the random variable ‖F‖_{L^∞([0,1]^d)}. We also define

ℋ^1 := { F stationary : F, ∇F ∈ ℒ^2 },

equipped with the norm ‖F‖_{ℋ^1} := ‖F‖_{ℒ^2} + ‖∇F‖_{ℒ^2}, and we denote by ℒ^2_pot the completion in ℒ^2 of the set of gradients of elements of ℋ^1. Any element of ℒ^2_pot can be represented as the gradient of some function φ ∈ H^1_loc(ℝ^d), and such a function is defined uniquely up to a constant. However, due to the indeterminacy of this constant, the function φ itself may fail to be a stationary field, that is, we do not necessarily have φ ∈ ℋ^1. We will always write elements of ℒ^2_pot in the form ∇φ, bearing this caveat in mind. The functions φ_n (n ∈ ℕ) are defined as elements of ℋ^1. The equation (1.2) is interpreted, for n = 0, as: for every ψ ∈ ℋ^1,

E[ ∫_{[0,1]^d} ( ψ φ_0 + ∇ψ · a∇φ_0 ) ] = −E[ ∫_{[0,1]^d} ∇ψ · aξ ],

and, for every n ≥ 1, as: for every ψ ∈ ℋ^1,

E[ ∫_{[0,1]^d} ( 2^{-n} ψ φ_n + ∇ψ · a∇φ_n ) ] = E[ ∫_{[0,1]^d} 2^{-n} ψ φ_{n-1} ]. (2.9)

For each f ∈ ℒ^2, we define the norm dual to the ℋ^1 norm by setting

‖f‖_{ℋ^{-1}} := sup { E[ ∫_{[0,1]^d} f g ] : g ∈ ℋ^1, ‖g‖_{ℋ^1} ≤ 1 }, (2.10)

and we denote by ℋ^{-1} the completion of ℒ^2 with respect to this norm. An example of an element of ℋ^{-1} is φ_{-1}, see (1.1). By definition, for each g ∈ ℋ^1, the mapping f ↦ E[∫_{[0,1]^d} f g] extends to a continuous linear functional over ℋ^{-1}. Abusing notation slightly, we keep the same notation for the extension.
By an integration by parts, we see that (2.9) also makes sense for n = 0, provided that the right side is understood in this extended sense (which is the canonical duality pairing between the spaces ℋ^1 and ℋ^{-1}). Using the identity (2.8) and stationarity, one can check the following integration by parts formula: for every f ∈ ℋ^1 and g ∈ (ℋ^1)^d, we have

E[ ∫_{[0,1]^d} ∇f · g ] = −E[ ∫_{[0,1]^d} f ∇ · g ].

If we only assume g ∈ (ℒ^2)^d, then this formula allows us to interpret ∇ · g as an element of ℋ^{-1}; similarly, if f ∈ ℒ^2, then we can interpret ∇f as an element of (ℋ^{-1})^d. The gradient of the corrector in the direction of ξ ∈ ℝ^d is the unique ∇φ^{(ξ)} ∈ ℒ^2_pot that is a weak solution of

−∇ · a(ξ + ∇φ^{(ξ)}) = 0. (2.11)

This equation is interpreted as: for every ψ ∈ ℋ^1,

E[ ∫_{[0,1]^d} ∇ψ · a(ξ + ∇φ^{(ξ)}) ] = 0. (2.12)

The existence of ∇φ^{(ξ)} can be obtained by considering, for every μ ∈ (0, 1], the approximation φ^{(ξ)}_μ ∈ ℋ^1 solving

(μ − ∇ · a∇) φ^{(ξ)}_μ = ∇ · (aξ). (2.13)

It is indeed straightforward to verify that ∇φ^{(ξ)}_μ is bounded in ℒ^2 uniformly over μ ∈ (0, 1], and that any weak limit must be a solution of (2.12). Moreover, the weak convergence of ∇φ^{(ξ)}_μ to ∇φ^{(ξ)} in ℒ^2 as μ tends to 0 can be improved to strong convergence, see e.g. (8.5) of [52]. That is, we have

lim_{μ → 0} ‖∇φ^{(ξ)}_μ − ∇φ^{(ξ)}‖_{ℒ^2} = 0. (2.14)

By definition, the homogenized matrix ā is such that, for every ξ ∈ ℝ^d,

ξ · āξ = E[ ∫_{[0,1]^d} (ξ + ∇φ^{(ξ)}) · a(ξ + ∇φ^{(ξ)}) ]. (2.15)

For the remainder of the paper, we will keep the unit vector ξ ∈ ℝ^d fixed, and drop it from the notation: in particular, we now simply write φ_μ in place of φ^{(ξ)}_μ.
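None of this machinery is needed in dimension one, where the homogenized coefficient is explicit: ā = (E[a^{-1}])^{-1}, the harmonic mean of a, with corrector gradient φ′ = ā/a − 1. This makes a convenient sanity check for any implementation. The sketch below (hypothetical values, written in Python for illustration, while the code accompanying the paper is in Julia) verifies both facts for a two-phase piecewise-constant coefficient.

```python
# Sanity check in dimension one, where homogenization is explicit:
# the homogenized coefficient is the harmonic mean
# abar = (E[1/a])^(-1), and the corrector gradient is
# phi'(x) = abar / a(x) - 1.  Hypothetical values, for illustration.

def homogenized_1d(cells):
    """Harmonic mean of per-cell coefficients (unit-size cells)."""
    return len(cells) / sum(1.0 / a for a in cells)

# A two-phase coefficient in 1d with contrast Lambda = 4:
cells = [1.0, 4.0] * 500  # 1000 unit cells, half of each phase
abar = homogenized_1d(cells)
# Explicitly: abar = 2 * Lambda / (1 + Lambda) = 1.6 here.
assert abs(abar - 2 * 4.0 / (1 + 4.0)) < 1e-12

# The corrector gradient has zero mean, as membership in L^2_pot requires:
grad_phi = [abar / a - 1 for a in cells]
print(abar, sum(grad_phi) / len(cells))
```

Note that the harmonic mean lies strictly below the arithmetic mean (2.5 here) as soon as the coefficient is non-constant, a general feature of the variational characterization of ā.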

Multiscale representation
In this section, we give a multiscale representation of the homogenized matrix. That is, we rewrite a as the sum of a first term taking the form of an average of very local objects, and correction terms that involve progressively larger and larger length scales. This increase of the relevant length scale means that producing one relevant sample for the calculation of the correction term becomes progressively more difficult. Yet, the actual size of these correction terms becomes smaller and smaller, and thus fewer samples need to be averaged out in order to approximate the expected value of the quantity up to a given accuracy. Moreover, this beneficial effect more than compensates for the increase in computational effort required to obtain a single sample, and this is the main reason for the efficiency of the approach presented here.
Proposition 3.1 (Multiscale representation). The limit defining the remainder term in (3.1) exists and is finite. Moreover, the multiscale representation (3.2) of ξ · āξ holds.

Remark 3.2. The summand indexed by n = 0 on the right side of (3.2) involves the pairing of φ_{-1} with φ_0, which is interpreted as the duality pairing between ℋ^{-1} and ℋ^1, as explained below (2.10). All the other terms on the right side of (3.2) involve functions in ℒ^2 and are thus understood as in (2.8).
Remark 3.3. One can show in great generality (using only the ergodicity of the coefficient field instead of the short-range dependence assumption) that the remainder term in (3.2) tends to zero as the number of scales tends to infinity. We do not provide the argument for this fact here. The interested reader can reconstruct it from the quantitative analysis of this term provided in the next section; see also Theorem 5.1 of [52].
Proof of Proposition 3.1. By (2.12), we have that, for every ψ ∈ ℋ^1,

E[ ∫_{[0,1]^d} ∇ψ · a(ξ + ∇φ) ] = 0. (3.3)

Using also (2.15), we deduce that

ξ · āξ = E[ ∫_{[0,1]^d} ξ · a(ξ + ∇φ) ]. (3.4)

For each λ > 0, we define the resolvent operator R_λ := (λ − ∇ · a∇)^{-1}. The function u = R_λ f is interpreted as the unique element of ℋ^1 such that, for every ψ ∈ ℋ^1,

E[ ∫_{[0,1]^d} ( λ ψ u + ∇ψ · a∇u ) ] = E[ ∫_{[0,1]^d} ψ f ],

the right side of this identity being understood as explained below (2.10). For every λ, μ > 0, we have the resolvent formula

R_μ = R_λ + (λ − μ) R_λ R_μ. (3.5)

In particular, we have R_λ R_μ = R_μ R_λ. Moreover, the operator R_λ is self-adjoint, in the sense that for every f, g ∈ ℋ^{-1},

E[ ∫_{[0,1]^d} f R_λ g ] = E[ ∫_{[0,1]^d} g R_λ f ]. (3.6)

By (3.3) and (2.14), we have

ξ · āξ = lim_{μ → 0} E[ ∫_{[0,1]^d} ξ · a(ξ + ∇φ_μ) ],

and we recall from (2.13) that φ_μ = R_μ φ_{-1}. The completion of the proof will follow from this identity and a repeated application of the resolvent formula.
To start with, given any family of numbers λ, λ_0, …, λ_k ∈ (0, ∞), an inductive argument based on the identity (3.5) yields that

R_λ = Σ_{n=0}^{k} (λ_0 − λ) ⋯ (λ_{n-1} − λ) R_{λ_0} ⋯ R_{λ_n} + (λ_0 − λ) ⋯ (λ_k − λ) R_{λ_0} ⋯ R_{λ_k} R_λ. (3.7)

For the summand with n = 0, the product (λ_0 − λ) ⋯ (λ_{n-1} − λ) appearing above is interpreted as being 1. Note that, for every n ∈ ℕ, we have

φ_n = 2^{-n} R_{2^{-n}} φ_{n-1}. (3.8)

We define recursively the sequence of scales at which the resolvents are evaluated, and for each μ ∈ (0, 2^{-k}), we apply the formula (3.7) with each scale 2^{-q}, q ∈ {0, …, k}, repeated twice. For the summand in (3.7) with n replaced by 2q + 1 (with q ∈ {0, …, k}), we use the commutation between resolvents and the symmetry (3.6) to rewrite its contribution as an average involving the function φ_q only. By the same reasoning, we can rewrite the contribution of the summand in (3.7) with n replaced by 2q (with q ∈ {0, …, k}) as an average involving φ_{q-1} and φ_q. Summing over all indices, and using a similar reasoning for the remainder term, we obtain an expansion of the right side of the identity displayed at the end of the previous paragraph over scales. Note that each term in this expansion is a scalar multiple of the corresponding μ-independent quantity, and that this scalar tends to 1 as μ tends to 0. Hence, the left side and each summand in the sum indexed by q on the right side converge as μ tends to 0. It follows that the limit as μ → 0 of the remainder term is well-defined and finite, and using also (3.4), that the formula (3.2) holds.
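For the reader's convenience, the resolvent formula (3.5) used repeatedly above can be verified directly. The following sketch uses R_λ := (λ − ∇ · a∇)^{-1} and the fact that the resolvents commute with functions of ∇ · a∇:

```latex
(\mu - \nabla \cdot a \nabla)\bigl(R_\lambda + (\lambda - \mu) R_\lambda R_\mu\bigr)
  = (\mu - \nabla \cdot a \nabla) R_\lambda
    + (\lambda - \mu) R_\lambda (\mu - \nabla \cdot a \nabla) R_\mu \\
  = (\mu - \nabla \cdot a \nabla) R_\lambda + (\lambda - \mu) R_\lambda
  = (\lambda - \nabla \cdot a \nabla) R_\lambda
  = \mathrm{Id},
```

so that R_λ + (λ − μ) R_λ R_μ is a right inverse of (μ − ∇ · a∇), that is, it equals R_μ. The iterated expansion (3.7) follows by applying this identity repeatedly to the trailing resolvent.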

Quantitative estimates
Recall that we have fixed a vector ξ ∈ ℝ^d of unit norm throughout the paper. The proof of Theorem 1.1 relies on estimates on the solution u of the initial-value problem

∂_t u − ∇ · a∇u = 0 in (0, ∞) × ℝ^d,  u(0, ·) = ∇ · (aξ) on ℝ^d. (4.1)

The study of this problem was initiated in [50], where suboptimal estimates were derived. The sharp exponent of decay in time was obtained in [37], with polynomial moments controlled. With a very different proof, the stochastic integrability of this result was improved to almost Gaussian tails in [36]. A variation of this argument is exposed in Chapter 9 of [13].

Theorem 4.1. (1) For every s ∈ (0, 2), there exists C(s, Λ, d) < ∞ such that for every t ≥ 1 and x ∈ ℝ^d,

u(t, x) = O_s( C t^{-(1/2 + d/4)} ).

(2) For every β ∈ (0, 1/2 + d/4), there exist s(β, d) > 2 and C(β, Λ, d) < ∞ such that for every t ≥ 1 and x ∈ ℝ^d,

u(t, x) = O_s( C t^{-β} ). (4.2)

We will only use the first part of Theorem 4.1 once, in the course of the proof of Proposition 4.5, in the form of the L^2 estimate

E[ u(t, x)^2 ]^{1/2} ≤ C t^{-(1/2 + d/4)}. (4.3)

In order to conclude with exponentially decaying tails as in the statement of Theorem 1.1, it is crucial to be able to choose an exponent s ≥ 2 in the estimate on the size of u (and for convenience, we will in fact choose s > 2); this is the main motivation for stating the second part of Theorem 4.1. The first part of Theorem 4.1 matches the results found in [36] and Theorem 9.1 of [13]. In order to obtain the second part of the statement as a consequence, we can use the following basic deterministic estimate, a proof of which can be found in Lemma 9.2 of [13].

Lemma 4.2. There exists a constant C(Λ, d) < ∞ such that for every t > 0 and x ∈ ℝ^d, |u(t, x)| ≤ C t^{-1/2}.
Proof of part (2) of Theorem 4.1. Let s > 2 and α ∈ (0, 2) be exponents that will be fixed in terms of β and the dimension d in the course of the proof. By the first part of the theorem, there exists a constant C(α, Λ, d) < ∞ such that for every t ≥ 1 and x ∈ ℝ^d,

u(t, x) = O_α( C t^{-(1/2 + d/4)} ).

Explicitly, this means that

E[ exp( ( C^{-1} t^{1/2 + d/4} |u(t, x)| )^α ) ] ≤ 2.

It follows from Lemma 4.2 that the random variable |u(t, x)| is bounded, uniformly over t ≥ 1 and x ∈ ℝ^d. Hence, for a constant C(α, s, Λ, d) < ∞,

u(t, x) = O_s( C t^{-β} ).

This is (4.2) with β given by

β = (α/s) (1/2 + d/4).

Since s > 2 and α < 2 can be chosen arbitrarily close to 2, any exponent β ∈ (0, 1/2 + d/4) can be represented in this way.
We also record the following useful lemma, which allows us to localize the dependency of u on the coefficient field; see Lemma 9.4 of [13] for a proof. In rough terms, for every t ≥ 1, x ∈ ℝ^d and R ≥ 2, it provides an ℱ(B(x, R))-measurable random variable u′(t, x, R) that approximates u(t, x), in the sense of (4.4) and (4.5), with an error that is super-exponentially small in R^2/t.

The function φ_n can be represented as a time integral of the function u(t, ·), and the main contribution to this integral comes from times t ≃ 2^n. The previous results concerning the function u can thus be translated into information on φ_n.

Proposition 4.4. (1) There exists C(Λ, d) < ∞ such that for every n ∈ ℕ and x ∈ ℝ^d,

E[ φ_n(x)^2 ]^{1/2} ≤ C 2^{-n(1/2 + d/4)}. (4.6)

(2) There exist C(Λ, d) < ∞ and, for each R ∈ [2, ∞), n ∈ ℕ and x ∈ ℝ^d, an ℱ(B(x, R))-measurable random variable φ_n^R(x) such that the approximation errors between φ_n(x) and φ_n^R(x) stated in (4.7) and (4.8) are small whenever R is large compared with 2^{n/2}.

(3) For every ε > 0, there exist s(ε, d) > 2 and C(ε, Λ, d) < ∞ such that for every n ∈ ℕ and x ∈ ℝ^d,

φ_n(x) = O_s( C 2^{-n(1-ε)(1/2 + d/4)} ). (4.9)

Proof. We decompose the proof into three steps.
Step 1. Transferring information on u(t, ·) onto information on the φ_n's relies on the observation that, for every λ > 0 and f ∈ ℒ^2,

λ R_λ f = ∫_0^∞ λ e^{-λt} P_t f dt, (4.10)

where P_t := exp(t ∇ · a∇) is the semigroup associated with the evolution operator ∂_t − ∇ · a∇. This identity can be extended to the case f = φ_{-1} = ∇ · (aξ), and we thus have in particular that

φ_0 = ∫_0^∞ e^{-t} u(t, ·) dt. (4.11)

Since integrals of the form of (4.10) or (4.11) will be iterated multiple times, it is convenient to rewrite them using probabilistic notation. That is, denoting by T_λ an exponential random variable of parameter λ which is independent of any other quantity in the problem, and by ⟨·⟩ the expectation over this random variable only, we can rewrite (4.10) in the form

λ R_λ f = ⟨ P_{T_λ} f ⟩. (4.12)

For each m ∈ ℕ, we let T_m be an exponential random variable of parameter 2^{-m}, with T_0, T_1, … independent of one another and of any other quantity in the problem, and we set

T := T_0 + ⋯ + T_n. (4.13)

We keep denoting by ⟨·⟩ the expectation over these random variables. Recalling (3.8) and using the semigroup property of (P_t)_{t ≥ 0}, we deduce that for every n ∈ ℕ,

φ_n = ⟨ u(T, ·) ⟩. (4.14)

In view of this relation, we can now transfer information about u onto φ_n provided that we have some information on the typical behavior of T. As a useful guide for the intuition, we remark that

⟨T⟩ = Σ_{m=0}^{n} 2^m = 2^{n+1} − 1,

so heuristically, we hope that any bound on u(t, ·) transfers into a bound on φ_n after the substitution of t by 2^n.
Step 2. We prove (4.9). In view of Theorem 4.1 and (4.14), we need to know that is rarely much smaller than 2 . The following result is shown in (5.12) of [52]: for every > 0, there exists a constant ( ) < ∞ such that for every ∈ N,

To control the behavior of this term on the event ∈ [0, 1], we use that the 's are nonnegative and independent to write, for every 1, This and Lemma 4.2 imply that This is largely sufficient to complete the proof of the estimate in (4.9). The proof of (4.6) is similar, only simpler, using Lemma 4.2 in place of Theorem 4.1.
Step 3. We prove (4.7). Recalling the notation ′ ( , , ) introduced in Lemma 4.3, we define, for each ∈ [2, ∞) and ∈ R , the ℱ( ( , ))-measurable random variable We need an upper bound for the probability of the event that is large. We obtain this by writing, for every 0, We now decompose the error into and analyze each of these terms in turn. By Lemma 4.3 and (4.18), we have Moreover, by Lemma 4.2 and (4.18), we have Combining these estimates yields (4.7). The proof of (4.8) is similar, except that we appeal to (4.5) instead of (4.4).
We denote For every ∈ N, we have that ∈ ℒ 1 , and by Proposition 3.1, for as in (3.1), We next estimate the size of the remainder term .
Proposition 4.5 (Remainder estimate). There exists (Λ, ) < ∞ such that for every ∈ N, Proof. We keep denoting by the random variable defined in (4.13), and we let ′ be an independent copy of . We also let ( ) be an independent exponential random variable of parameter , and denote the expectation with respect to these random variables by . We recall that the introduction of these random variables allows for convenient representations such as those in (4.12) and (4.14). Combining these representations with the definition of in (3.1), we obtain that and thus In order to estimate the integral over ∈ [0, 1], we use independence to get that and then proceed as in (4.16) and (4.17). For the remaining part, we use (4.3) and the triangle inequality to deduce that Integrating in and then appealing to (4.15), we obtain the announced estimate.
In the next proposition, we replace the global averages For = 0, the same estimate holds with the additional restriction 1.
Proof. We first record the following elementary observation: for every , > 0 and random variables and , we have Indeed, this follows from We decompose the rest of the proof into three steps.
Step 2. We reformulate (4.21) into an equivalent form that will be more convenient for the analysis. For every ∈ N, we denote We show that it suffices to prove Proposition 4.6 for √ ′ and with (4.21) replaced by For every 1, the mapping ↦ E[ ( )] is Z^d-periodic, and by (4.9), it is uniformly bounded by 2 − (1+ 2 − ) . In the case = 0, the mapping ↦ E[ 0 ( )] is Z^d-periodic and in L^2([0, 1]^d), as follows from (4.6) and the fact that 0 ∈ ℋ 1 . Hence, for every ∈ N, there exists a constant ( , Λ, ) < ∞ such that for every ∈ R^d and 1, See for instance Exercise 3.7 of [13] for a proof. As a consequence, the statements (4.21) and (4.25) are equivalent, up to an adjustment of the constant .
Step 3. Without loss of generality, it suffices to prove (4.25) for = 0. For each ′ , we define By (4.7) and (4.8), there exists (Λ, ) < ∞ such that for every ∈ N and ′ , In this step, we leave aside the case = 0 and show that there exists ( , Λ, ) < ∞ such that for every 1, ′ and > 0, We denote the cube of side length centered at the origin by By (4.26) and the same argument as in Step 1 of this proof, we may assume that 2 . We decompose the left side of (4.27) into two parts. By (4.9), (4.22) and Lemma 2.2, there exists > 1 such that, for each ∈ Z^d ,

By Lemma 2.3, we obtain that
We conclude that (4.27) holds by observing that, since 2 ,
Step 4. In this step, we show that there exists ( , Λ, ) < ∞ such that for every 1, ′ and > 0, The argument is similar to that of the previous step, only simpler, using only (4.26) and not requiring any appeal to (4.9).
Step 5. We complete the proof. It is clear from (4.26) that We thus decompose ( ) into

Applying (4.27) to the first term, (4.28) to each of the summands, and summing the result, we obtain (4.25) for 1. In the case = 0, the same reasoning applies, using also the deterministic estimate on ∇ 0 provided in (4.24), and the estimate (4.8) to localize the dependency of this term on the coefficient field.
We have now obtained almost optimal information on the behavior of when tested against the heat kernel. Since we want to understand the behavior of this field when tested against an arbitrary mask, we now upgrade this information into an H^{−ℓ} estimate, using the following lemma, which is a rescaled version of Remark D.6 from [13]. In order to keep the presentation of the argument as simple as possible, we only state this lemma for L^2-based Sobolev spaces with integer-valued regularity exponents. Recall the definitions of the rescaled H^ℓ and H^{−ℓ} norms in (2.5) and (2.6).

Lemma 4.8 (Sobolev norm for
). For every > 0, there exist ( , ) > 1 and, for every ℓ ∈ N satisfying ℓ > 2 , a constant ( , ℓ, Λ, ) < ∞ such that for every ∈ N and Proof. For convenience, we set := − ffl R , and use the notation ′ introduced in (4.23). We first consider the case ∈ N ∖ {0}. For ′ , by Proposition 4.6, we have for every ∈ R that For ′ , we have instead that where we used that ℓ > 2 and ′ for the second equality. We then obtain the result by an application of Lemma 4.7. The case = 0 is obtained in the same way, except that we treat the integral over ∈ [0, 1] separately using the gradient estimate in (4.24).
We are now ready to complete the proof of Theorem 1.1.
An application of Lemma 4.8 thus yields, for each > 0, that there exists ( , ) > 1 and a constant ( , ‖ ‖ ℓ (R ) , Λ, ) < ∞ such that for every ∈ N and 1, We fix ∈ (︀ 0, −1 2 )︀ , ∈ N, and recall from (1.5) the notation Substituting with in the previous display, we obtain that We select := 4 > 0, so that In view of Lemma 2.1 and of the fact that > 1, this implies the existence of a constant (︀ , ‖ ‖ ℓ (R ) , Λ, )︀ < ∞ such that for every ∈ N and 0, A similar but simpler argument shows that Combining these estimates with (4.19) and Proposition 4.5 completes the proof of Theorem 1.1.

Hierarchical hybrid grids
In this section, we explain our strategy for the numerical approximation of solutions of elliptic equations. For definiteness, given a coefficient field a( ) and a domain ⊆ R^d, we consider the problem of computing an approximation of the solution of the equation (5.1). Since generalizations such as the addition of lower-order terms or non-zero boundary conditions pose no particular additional difficulty, we will not discuss them further. In the first subsection, we explain why highly refined discrete approximations of the continuous equation are necessary. We then explain efficient ways to compute these highly refined approximations. Our approach is in line with the earlier work [14, 15] and based on hierarchical hybrid grids. That is, we start from an unstructured coarse mesh and refine it in a self-similar way a number of times; we then exploit this piecewise-structured hierarchical construction extensively at every step of the algorithm (assembly of the finite-element matrix, matrix-vector products, restriction and interpolation operators).

Roughness of solutions
In many practical instances, the heterogeneity of the coefficient field is due to the fact that the material of interest is a composite of several different types of substances: see for instance the library of images at [24]. In view of this, we focus on the case of piecewise-constant coefficient fields.
In this case, the discontinuities of the coefficient field compound the difficulties inherent to solving equations with rapidly oscillating coefficients. In order to measure the extent of these difficulties, consider the problem of approximating the solution, where the coefficient field a( ) is given by We start from a coarse mesh made of eight triangles of equal size (two triangles in each of the translates of [0, 1]^2). We then refine a given mesh by subdividing each triangle into four smaller triangles, adding a new vertex at the midpoint of each edge. We consider multiple iterations of this refinement procedure, and for each level of refinement, we compute the associated finite-element solution, using piecewise affine elements. The approximation of the solution minus the affine function ( 1 , 2 ) ↦ 1 + 2 after five levels of refinement is shown in the left frame of Figure 1. The rough behavior of the solution near the origin is clearly visible. We also display the value of the solution along the line 1 = 2 in the middle frame of Figure 1; see below around (5.5) for the prediction of the exponent 0.4 . . . appearing there. The right frame of Figure 1 displays the relative error, measured in H^1 and in L^2 respectively, compared with the true solution. In order for the relative error to be below 10% in the H^1 norm, it is necessary to use at least six levels of refinement. At six levels of refinement, the linear system that needs to be solved already involves 2^15 unknowns.
We can understand the roughness of the solution theoretically in a precise way. We consider more generally the situation when the coefficient field is given by for some Λ ∈ [1, ∞). A blow-up analysis near the origin suggests to look for solutions in the unit ball B(0, 1) of the form ( ), where r ≥ 0 and θ ∈ [0, 2π) are the standard polar coordinates: x1 = r cos θ and x2 = r sin θ.
Denoting we find that the smallest exponent α > 0 such that r^α f(θ) is a solution in B(0, 1) for some function f is given by Moreover, r^α f(θ) is indeed a solution of the equation in B(0, 1) when f is the unique minimizer of the variational problem above. The value of this exponent was computed in [57]: it is Notice that the function r^α f(θ) belongs to H^{1+α−ε}(B(0, 1)) for every ε > 0, but does not belong to H^{1+α}(B(0, 1)). We therefore expect a finite-element scheme with elements of size h to provide an approximation in H^1 at a precision of the order of h^α (and at a precision of the order of h^{1+α} in L^2). The particular case we investigated numerically corresponds to Λ = 9, which gives In this sense, coefficient fields that are piecewise constant on a checkerboard structure are worst possible from the point of view of regularity (and therefore of difficulty of numerical approximation). For general coefficient fields satisfying (5.7) but not necessarily being a multiple of the identity matrix at each point, it was shown in [57] that the smallest possible exponent for Hölder regularity is Λ^{−1/2}. An explicit coefficient field satisfying (5.7) and admitting a solution of the form r^{Λ^{−1/2}} f(θ) was first given in [49]. This exponent governs the rate of convergence of the finite-element approximation as the mesh is successively refined: for instance, for an ellipticity contrast of Λ = 100, we cannot hope for an asymptotic convergence rate better than h^{0.1} in general, and no better than h^{0.127...} in the case of the coefficient field in (5.4). The situation is even worse in dimension d = 3, at least from a theoretical point of view. Indeed, to the best of our knowledge, it is an open question to show that when d = 3 (or for any d ≥ 3), the regularity exponent can be bounded from below by a negative power of the ellipticity contrast.
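The numerical values quoted in this paper (0.4 . . . for Λ = 9 here, and the rates 0.1337 . . . and 0.127 . . . appearing in Section 6) are all consistent with the closed-form expression α(Λ) = (4/π) arctan(Λ^{−1/2}) for the four-quadrant checkerboard. The short check below (plain Python; the closed form itself is our reading of the numbers in the text rather than a formula quoted from [57]) evaluates it for the contrasts considered here:

```python
import math

def singular_exponent(contrast):
    # Corner-singularity exponent for the four-quadrant checkerboard
    # with ellipticity contrast `contrast` (assumed closed form,
    # consistent with the exponents 0.409..., 0.1337..., 0.127...
    # appearing in the text).
    return (4.0 / math.pi) * math.atan(contrast ** -0.5)

alpha_9 = singular_exponent(9)      # the Lambda = 9 example of Figure 1
alpha_90 = singular_exponent(90)    # contrast 9 / 0.1, the high-contrast test
alpha_100 = singular_exponent(100)  # contrast 100
```

As the contrast grows, the exponent behaves like (4/π) Λ^{−1/2}, which quantifies the deterioration of the convergence rate of the finite-element approximation.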

Number of unknowns
We are ultimately interested in solving elliptic equations with random coefficients. In order to calculate the homogenized matrix, we will need to average over large domains, so as to tame the fluctuations of the coefficient field. As a toy example, consider the problem of calculating the standard average of the coefficient field, denoted by ffl R a above, see (2.8). By the scaling of the central limit theorem, in order to measure this quantity within a precision ε > 0, we need to average over at least ε^{−2} unit cells. Similarly, as was shown in Proposition 1.1 of [52], it is impossible to compute an approximation of the homogenized matrix at precision ε if one observes only o(ε^{−2}) unit cells (the statement in [52] is written for finite-difference equations, but the proof applies essentially verbatim to the continuous setting).
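The ε^{−2} scaling can be observed directly on the toy example of averaging an i.i.d. field of cell values. The following sketch (plain Python, with the illustrative values 1 and 9 for the two phases, as in the examples of Section 6) shows that the standard deviation of the spatial average halves when the number of cells is quadrupled:

```python
import random, statistics

def average_error_std(n_cells, trials=2000, seed=1):
    # Each unit cell carries the value 1 or 9 with probability 1/2,
    # so the spatial average fluctuates around 5.
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        avg = sum(rng.choice((1.0, 9.0)) for _ in range(n_cells)) / n_cells
        errors.append(avg - 5.0)
    return statistics.pstdev(errors)

s_100 = average_error_std(100)
s_400 = average_error_std(400)
# CLT scaling: four times as many cells, half the standard deviation,
# so precision eps requires of the order of eps^(-2) cells.
```

The ratio s_100 / s_400 is close to 2, in agreement with the central-limit scaling invoked above.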
Roughly speaking, if we want to compute a within a precision of, say, 10%, we are bound to have to examine at least of the order of 10^2 unit cells. In two dimensions, if the mesh we use is refined six times as described in the previous subsection, this means that we must be facing problems involving of the order of 2^15 × 10^2 ≃ 3 × 10^6 unknowns. Notice that each further refinement of the mesh multiplies this number by 4, and that reducing the size of the fluctuations by a factor of 2 also multiplies this number by 4. Finally, this rough estimate hides multiplicative constants that may be large. (On the other hand, the random coefficient fields we investigate numerically in the next section are not made of a systematic periodic repetition of the worst-case coefficient field in (5.3), and this will mitigate the difficulty somewhat.)

Motivations for hybrid methods
The upshot of the previous subsections is that we ought to be able to solve elliptic problems with many degrees of freedom. As is well-known, the numerical approximation of elliptic equations in domains with simple geometry and with constant coefficients can be performed very efficiently using a variety of techniques, including the geometric multigrid method (see [29] for several benchmarks). Indeed, for equations with constant coefficients, stencil-based operations can replace the need to assemble and store the finite-element matrix. Moreover, the data can be organized locally in agreement with the underlying geometry and accessed in a consistent way, resulting in few integer operations and highly optimized usage of the processor cache memory.
For more complex geometries or varying coefficients, completely unstructured approaches can be used instead. In this case, the problem of storing the finite-element matrix in memory becomes a major limitation. Moreover, data access becomes highly unpredictable and requires more integer operations, two factors that cause a dramatic drop in performance [14,15].
Following [14,15], we seek to remedy this problem by using a hybrid approach. The idea, called hierarchical hybrid grids in [14,15], is to proceed as in the completely unstructured case on the coarse mesh, but then to rely on structured techniques within each constant-coefficient patch. This approach has multiple advantages. Firstly, we only need to assemble and store the finite-element matrix associated with the coarsest mesh. Similarly, we do not need to store the full computational grid in memory. This results in large gains in memory usage, which is otherwise the main limiting factor on the computing architectures we use. Moreover, we store a vector of the finite-element space in a two-dimensional array indexed first by the identity of the coarse element and then by the position within it. This allows us to obtain efficiency gains similar to those observed in the completely structured case, in particular regarding fast matrix-vector multiplications, and the restriction and interpolation operators in the multigrid method.

Hierarchical hybrid grids
We now explain how to implement this approach more precisely. We also refer to [40] for a more thorough discussion, as well as [14,15].
We start with some definitions. We say that = { 1 , . . . , } is a simplicial partition of the set ⊆ R^d if the following three conditions hold: (1) for every ∈ {1, . . . , }, the set is a simplex in R^d (i.e. the convex envelope of a set of ( + 1) points: a triangle in dimension = 2 and a tetrahedron in dimension = 3); (2) for every ≠ , the interiors of and are disjoint; (3) the union of the simplices is the closure of the set .

Figure 2. Left: the standard simplex after two levels of refinement. Right: an unstructured coarse mesh, and the image of the twice-refined standard simplex through the affine mapping for one particular coarse element.

For
convenience, we often drop the word "simplicial" and simply say "partition" instead of "simplicial partition". A partition can be represented as a list of nodes {n } ⊆ R^d and a list of ( + 1)-tuples of indices that define the identity of the corner points of every simplex in the partition. We say that two partitions 1 and 2 are nested, and write 1 ⊑ 2 , if for every ∈ 1 , there exists ∈ 2 such that ⊆ . We denote by the standard simplex, that is, the convex envelope of the nodes e 0 , . . . , e , where (e 1 , . . . , e ) is the canonical basis of R^d and e 0 is the null vector. Given a partition of the standard simplex and a partition of an arbitrary domain, we say that a partition ℎ is the locally uniform partition associated with this pair if, for every ∈ , there exists an affine mapping : R^d → R^d such that the image of the partition of the standard simplex under is { ∈ ℎ : ⊆ }. See Figure 2 for an illustration. Notice that the mapping appearing above must map the standard simplex onto . Such an affine mapping is entirely specified by prescribing which nodes of the standard simplex are sent to which nodes of . Note that the locally uniform partition ℎ is completely specified by the knowledge of the coarse partition and of the partition of the standard simplex. This allows for vast memory gains when storing the partition, since only the reference simplex is meshed finely, while the global partition remains coarse. In addition, as discussed below, this format will be very convenient for a variety of operations, including the implementation of the restriction and interpolation operators in the multigrid method.
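The representation of a partition as a node list plus index tuples, and the midpoint refinement used throughout this section, can be sketched in a few lines (plain Python with hypothetical helper names; the actual implementation is the Julia code linked in Section 6):

```python
def refine(nodes, tris):
    # One level of midpoint refinement in dimension 2: each triangle is
    # split into four, with one new node at the midpoint of each edge
    # (shared midpoints are created only once).
    nodes = list(nodes)
    tris_new = []
    midpoint = {}
    def mid(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint:
            (x1, y1), (x2, y2) = nodes[i], nodes[j]
            nodes.append(((x1 + x2) / 2, (y1 + y2) / 2))
            midpoint[key] = len(nodes) - 1
        return midpoint[key]
    for a, b, c in tris:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        tris_new += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return nodes, tris_new

# standard simplex: convex envelope of e0 = (0,0), e1 = (1,0), e2 = (0,1)
nodes, tris = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(0, 1, 2)]
for _ in range(2):
    nodes, tris = refine(nodes, tris)
# after k refinements: 4^k triangles and (2^k + 1)(2^k + 2)/2 nodes
```

Since only the reference simplex is refined in this way, the memory footprint of the global mesh is that of the coarse partition alone, as described above.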

Assembly of the finite-element matrix
We proceed to define the finite-element matrix, and then describe how to store it efficiently using the structure of locally uniform partitions, under the assumption that the coefficient field is constant on each coarse element.
Let be a partition of the domain ⊆ R^d . We think of this partition as being relatively coarse, having a level of detail just sufficient to resolve the variations of the coefficient field. For clarity of exposition, we start by considering the case in which this coarse partition is not refined further. Denote by {n } ⊆ R^d the nodes of the partition . We look for an approximation of the solution of (5.1) in the finite-dimensional space Denoting by x the vector encoding the finite-element approximation of (5.1) in the basis formed by we identify x as the solution of the problem Notice that the size of the vectors and of the symmetric matrix appearing above is the number of nodes in the interior of ; this is how the null Dirichlet boundary condition is enforced. where ∈ R^{ ×( +1)} is the canonical matrix representing the linear mapping We denote the local shape functions associated with the standard simplex, for every ∈ {0, . . . , }, and call them reference shape functions. In dimension = 2, these reference shape functions are ↦ 1 − 1 − 2 , ↦ 1 , and ↦ 2 . We also denote by : ↦ + the unique affine mapping that sends the nodes e 0 , . . . , e of the standard simplex to the nodes n 0 , . . . , n , so that in particular the standard simplex is mapped onto (see Fig. 2). Notice that the shape functions pull back to the reference shape functions under this mapping, and their gradients transform accordingly (5.11). By this change of variables, for every , ∈ {0, . . . , }, we can rewrite the integral on the right side of (5.9) as an integral over the standard simplex. In the case when the partition is sufficiently fine that a( ) is constant, equal to a ( ) , when varies in , we set (5.13), and the previous display becomes (5.14). We can expand this expression into The mapping is clearly surjective, but it is not a bijection: indeed, the nodes that belong to the boundary of multiple coarse elements are represented multiple times.
On the other hand, every node that belongs to the interior of a simplex of the coarse partition has a unique representation indexed by a coarse element and a position within it. As the partition of the reference simplex becomes finer and finer, the approximation of the total number of nodes by | | times the number of nodes per refined reference simplex therefore becomes more and more accurate. (The notation | | stands for the number of elements in the coarse partition.) For each coarse element and each local index, we denote the restriction to the element of the corresponding basis function. The contribution of the coarse element to the finite-element matrix can be represented by a local matrix such that (5.9) holds for every pair of local indices. The relation (5.10) still holds, where now the canonical matrix represents the local-to-global mapping. Moreover, these local matrices can be constructed directly in a straightforward manner, without having to construct the fine partition. Finally, in the practical cases we have in mind, the local matrices are highly regular and have only of the order of as many non-zero entries as there are local nodes. The amount of memory required to store this data is proportional to

If we were to ignore the locally uniform structure of the fine partition , the cost of storing its finite-element matrix would be proportional to the total number of fine nodes instead. Recalling that this number is comparable to | | times the number of nodes per refined reference simplex, we see that the semi-structured approach indeed results in a significant gain in memory usage.
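To make the change of variables of (5.12)-(5.14) concrete, here is a minimal sketch (plain Python, 2d, piecewise-affine elements; the helper names are ours) of the element matrix computed on the reference simplex from a constant coefficient matrix and the linear part of the affine map:

```python
def matmul(X, Y):
    # product of two 2x2 matrices stored as nested lists
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def effective_matrix(B, a):
    # c = |det B| B^{-1} a B^{-T}: pulls the element integral of a
    # constant-coefficient element back to the reference simplex
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    Binv = [[B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det, B[0][0] / det]]
    BinvT = [[Binv[0][0], Binv[1][0]], [Binv[0][1], Binv[1][1]]]
    c = matmul(matmul(Binv, a), BinvT)
    s = abs(det)
    return [[s * c[i][j] for j in range(2)] for i in range(2)]

# gradients of the reference shape functions 1 - x1 - x2, x1, x2
GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]

def local_stiffness(c):
    # A_ij = |K| grad_i . c grad_j, with |K| = 1/2 for the 2d reference simplex
    A = [[0.0] * 3 for _ in range(3)]
    for i, (gx, gy) in enumerate(GRADS):
        for j, (hx, hy) in enumerate(GRADS):
            A[i][j] = 0.5 * (gx * (c[0][0] * hx + c[0][1] * hy)
                             + gy * (c[1][0] * hx + c[1][1] * hy))
    return A

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = local_stiffness(effective_matrix(I2, I2))  # identity map, a = Id
```

Each row of the local matrix sums to zero, since the reference shape function gradients sum to zero; only this small matrix needs to be stored per coarse element.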

Matrix-vector product
Pursuing the setting of the previous section, we now discuss how to store vectors and perform matrix-vector operations with the finite-element matrix, which we recall is represented in memory by the per-element coefficient matrices and the reference element matrices. As discussed above, where is the matrix representing the linear mapping in (5.17). In view of (5.18), instead of representing finite-element vectors as one-dimensional global vectors, we encode them in a two-dimensional array, with one row per node of the refined reference simplex and one column per coarse element. That is, we represent each x by an array whose columns are the local restrictions of x to the coarse elements, for some enumeration of the (unstructured) set of coarse elements. Naturally, the entries that are associated with nodes that belong to multiple coarse elements are repeated in this representation; this parallels the observation that the mapping defined in the previous subsection is surjective but not bijective.
The application of the matrix to a vector can then be evaluated in two steps: first, we compute, for every element of the coarse partition, the product of the local element matrix with the local part of the vector. In the actual implementation of the second step, which reconciles the duplicated entries, we do not need to construct the matrices explicitly. Instead, we implement this formula by identifying the nodes that are found at the interface between two or more elements of the coarse partition. In order to do so, we distinguish between different types of interfaces, according to whether they are found on faces, edges, or vertices. (Naturally, face-type interfaces are only relevant in dimension = 3.) For a more precise description of this aspect, we refer to [14,15,40].
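A stripped-down version of this two-step product (plain Python, with our own minimal data layout rather than the paper's, and a 1d two-element mesh as the usage example) looks as follows:

```python
# Element-local products followed by a gather/sum/scatter over shared
# nodes. `elems` lists, for each coarse element, the global ids of its
# local nodes; entries duplicated at interfaces are reconciled by the
# summation into the global vector y.

def matvec(local_mats, elems, x_global):
    n = len(x_global)
    y = [0.0] * n
    for A, ids in zip(local_mats, elems):
        xloc = [x_global[i] for i in ids]  # gather the local part of x
        for r, gi in enumerate(ids):       # local product, then scatter
            y[gi] += sum(A[r][c] * xloc[c] for c in range(len(ids)))
    return y

# two 1d elements sharing node 1, each with local matrix [[1,-1],[-1,1]]
A_loc = [[1.0, -1.0], [-1.0, 1.0]]
y = matvec([A_loc, A_loc], [(0, 1), (1, 2)], [0.0, 1.0, 0.0])
# assembled matrix is tridiag(-1, 2, -1); applied to (0, 1, 0) it gives (-1, 2, -1)
```

The global matrix is never formed: only the local matrices and the element-to-node index lists are needed, which is the source of the memory savings described above.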

Multigrid method
The geometric multigrid method is a technique for the numerical approximation of elliptic problems [20]. It uses a sequence of nested partitions ⊑ · · · ⊑ 0 , as well as restriction and interpolation operators, which allow us to transfer a function defined on a given grid to a function defined on a coarser or finer grid, respectively.
The setting of locally uniform partitions is particularly conducive to efficient implementations of the geometric multigrid method. Indeed, we first give ourselves a sequence of nested partitions of the reference element. These nested partitions are constructed as follows: we fix the coarsest one to be the trivial partition, and then inductively construct each partition from the previous one by adding new nodes at the midpoint of each edge of each element and, in dimension = 2, by replacing each triangle with a partition of this triangle made of 4 triangles, or, in dimension = 3, by replacing each tetrahedron with a partition of this tetrahedron made of 8 tetrahedra [17]. The nested partitions we use to implement the geometric multigrid method are then the corresponding locally uniform partitions. Recall that we denote by ( ) the finite-element space associated with the partition , see (5.8). We start by defining interpolation and restriction operators associated with the nested partitions of the standard simplex.
For each < , we define the interpolation operator on the reference element to be the canonical injection. The restriction operator can then be taken as the transpose of the interpolation operator, up to a normalization constant (see [20], Def. 6.3.1). Similarly, we define the interpolation operator ℐ : ( ) → ( +1 ) to be the canonical injection. Recall that we represent a given vector x ∈ ( ) as a two-dimensional array whose columns are the local restrictions of x, where the number of rows is the number of vertices of the corresponding partition of the standard simplex, and we write ( ) instead of to emphasize the dependency of this operator on . In this representation, we can evaluate the interpolation operator very simply by setting, for every ∈ {1, . . . , | |}, Up to a normalization constant, we wish to use the transpose of ℐ as our restriction operator. In view of the format in which we store elements of ( +1 ), this is not entirely straightforward to compute, since it involves some amount of communication between vertices belonging to different elements of the coarse partition. We now explain how to perform this computation efficiently by reducing it to the same calculation as that arising in matrix-vector multiplication, see (5.19). Given a load vector b and an element x +1 ∈ ( +1 ), we aim to compute the residual We use the same data format to store the load vector, that is, we represent it by a family of columns indexed by the coarse elements. For the model problem (5.1), this means that we set, for every ∈ {1, . . . , }, where again we write ( +1, ) instead of to make the dependency on more explicit. Recall that the vector x +1 in (5.20) is stored in memory as an array whose columns are given by ( +1) x +1 . Using also (5.10), we obtain that Moreover, one can verify that ( +1) ℐ = ℐ +1 ( ) .
We thus conclude that Each term r ( ) is relatively easy to compute, since ℐ +1 is an operator of moderate dimension. We have now reached a situation analogous to that of the previous section: the remaining problem is that it is not true in general that r ( ) = ( ) r . This can be remedied by proceeding as in (5.19).
For the smoothing steps in the multigrid method, we use a few steps of conjugate gradient descent. Finally, we use a direct solver for the coarse-grid problem. In our numerical experiments, the above-described implementation of the geometric multigrid method showed robust convergence behavior.
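The interpolation/transposed-restriction pairing can be checked on the simplest possible example: in 1d with piecewise-affine elements, the canonical injection P of a coarse grid into a once-refined grid satisfies the Galerkin identity P^T A_fine P = A_coarse exactly. A short sketch (plain Python, hypothetical helper names, zero Dirichlet boundary conditions):

```python
def prolongation(nc):
    # Canonical injection of P1 functions on a coarse grid with nc
    # interior nodes into the once-refined grid (nf = 2*nc + 1 interior
    # nodes): coarse node j sits at fine node 2j+1, and the in-between
    # fine nodes receive the average of their neighbors.
    nf = 2 * nc + 1
    P = [[0.0] * nc for _ in range(nf)]
    for j in range(nc):
        P[2 * j + 1][j] = 1.0
        P[2 * j][j] = 0.5
        P[2 * j + 2][j] = 0.5
    return P

def stiffness(n, h):
    # 1d P1 stiffness matrix (1/h) tridiag(-1, 2, -1)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h
        if i:
            A[i][i - 1] = A[i - 1][i] = -1.0 / h
    return A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

nc, hf = 3, 1.0 / 8                      # coarse h = 1/4, fine h = 1/8
P = prolongation(nc)
Pt = [list(col) for col in zip(*P)]      # restriction = transpose of P
Ac = matmul(matmul(Pt, stiffness(2 * nc + 1, hf)), P)
# Galerkin property: P^T A_fine P equals the coarse stiffness matrix
```

In the hierarchical hybrid grid setting, the same transpose is applied element by element on the reference simplex, with the interface reconciliation of (5.19) handling the duplicated nodes.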

Numerical tests
In this section, we report on numerical results for the method presented in this paper. The code was written in the Julia language, and is available at this address: https://github.com/haampie/Homogenization.jl.
In all the examples we consider, the coefficient field is Z^d-stationary, where d ∈ {2, 3} is the dimension. For convenience, we replace averages against the mask in (1.6) and (1.7) by averages over a cube. While, strictly speaking, this situation is not covered by Theorem 1.1, it is not difficult to show that the statement is still correct in this case (in fact, the argument is then somewhat simpler). For simplicity, we also fix = 0 in (1.5). As discussed below (1.8), it is not difficult to modify the proof and cover this case as well, at the cost of an arbitrarily small loss of exponent in (1.7). We also slightly modify the definition of the approximations in (1.8), by using a square or a cube instead of a ball for the domain: that is, for every ∈ {0, . . . , }, we set with a null Dirichlet boundary condition on the boundary of the cube. The estimator we wish to calculate, slightly modified from (1.6), is then defined accordingly. In order to obtain numerical approximations of these functions, we use the finite-element method with hierarchical hybrid grids presented in Section 5. In all the examples we consider, the coefficient field is piecewise constant on z + [0, 1)^d , for every z ∈ Z^d . We thus start from a coarse partition of the domain which consists, in dimension d = 2, of splitting each unit square into two triangles, or, in dimension d = 3, of splitting each unit cube into six tetrahedra. This provides us with a coarse partition of the domain, in the sense of Section 5. We then refine this partition iteratively by decomposing, in dimension d = 2, each triangle into four smaller triangles, or, in dimension d = 3, each tetrahedron into eight smaller tetrahedra (and we do so in practice by constructing a refined partition of the standard simplex iteratively, which provides us with an implicit fine partition of the whole domain via the notion of locally uniform partition, see Sect. 5.4).
We denote the number of iterative levels of refinement performed in this way by ref . This defines an approximation of the quantity defined in (6.3), which we denote by the same symbol with the additional arguments ( , ref ). Strictly speaking, this quantity also depends on the choice of the boundary layer size bl , but we keep this implicit in the notation. Theorem 1.1 bundles together an estimate for the mean error and an estimate for the standard deviation of our approximation. The approximation has been set up so that both quantities are of the same order. In addition to this error comes the error due to the finite-element discretization: for each fixed ref , the estimator ( , ref ) computes an approximation of the homogenized matrix of the discretized system with ref levels of refinement. While we did not prove this, it is clear that all the arguments used to prove Theorem 1.1 would remain valid for the discretized system, and thus the estimator approximates the homogenized matrix a ( ref ) of the discretized system with a mean error and a standard deviation that both decay at the rate given by Theorem 1.1 as tends to infinity. However, there is also a discrepancy between a ( ref ) and the homogenized matrix a of the continuous equation, which manifests itself in our algorithm in the fact that we do not have perfect access to the solutions of (6.2). Moreover, as explained in Section 5.1, the rate of convergence of approximate solutions in terms of ref can become arbitrarily slow as the ellipticity contrast gets large.
As said above, we consider coefficient fields that are piecewise constant on unit cubes; more precisely, we assume that for every z ∈ Z^d , we have for some family (b_z)_{z∈Z^d} . This family is random and constructed in the following way, given two parameters 0 < λ ≤ Λ < ∞: the random variables (b_z) are independent; the matrix b_z is diagonal; the diagonal entries of b_z are independent; and finally, for every ∈ {1, . . . , d}, each diagonal entry takes the value λ or Λ, each with probability 1/2. As discussed in Section 5.1, this example is particularly interesting since it is, in some sense, the coefficient field that allows for the most pathological singularities in the solutions for a given ellipticity ratio Λ/λ. Notice that, in order to demonstrate that our numerical code is not restricted to the case when a( ) is a multiple of the identity, we have dropped this restriction here (and it would not be difficult to accommodate matrices that are not diagonal). An additional very interesting feature of this class of examples is that it is one of the very rare cases where the homogenized matrix is known exactly: in dimension d = 2, it is given by a = √(λΛ) Id; see Exercise 2.10 of [13]. (No such simple formula is expected to exist in dimension d = 3, and in fact, we are not aware of any genuinely three-dimensional coefficient field for which the homogenized matrix is known exactly.)
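For the two-dimensional case, the exact formula above can be contrasted with the naive arithmetic average on a sample. The sketch below (plain Python, with the scalar values 1 and 9 used in the next subsection; the field in the text is matrix-valued, and the helper name is ours) illustrates the gap between the two:

```python
import math, random

def sample_field(L, lo=1.0, hi=9.0, seed=0):
    # Random checkerboard on an L x L block of unit cells: each cell
    # independently takes the value lo or hi with probability 1/2
    # (scalar sketch of the diagonal-matrix field described above).
    rng = random.Random(seed)
    return [[rng.choice((lo, hi)) for _ in range(L)] for _ in range(L)]

lo, hi = 1.0, 9.0
field = sample_field(64, lo, hi)
avg = sum(map(sum, field)) / 64 ** 2   # fluctuates around (lo + hi)/2 = 5
a_hom = math.sqrt(lo * hi)             # exact 2d homogenized value: 3
```

The spatial average concentrates around 5, while the homogenized coefficient equals 3: homogenization is genuinely nonlinear in the coefficient field, which is what makes the estimator of (6.3) necessary in the first place.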

Two-dimensional case, moderate contrast
We fix d = 2, λ = 1, and Λ = 9. We thus have in this case that a = √(λΛ) Id = 3 Id. Using the notation in (2.8), we also observe that the average of a is 5 Id, and therefore we expect that the estimator ( , ref ) converges to 2 as and ref tend to infinity. We fix the boundary layer constant bl := 4, and plot a histogram of the estimator for different values of and ref ; see Figure 3. Each histogram is obtained by sampling 200 realizations of the estimator. For each value of and ref , we also report the empirical mean and variance of the estimator. Notice that the estimator has a bias towards overestimating the value of a, which is consistent with the fact that the remainder term in the series expansion (3.2) is nonnegative, see Proposition 4.5 (the sign of the discretization error was not predicted theoretically).
From the results displayed in Figure 3, one can check that the quantity $\widehat{\sigma}^2(n = 4, n_{\mathrm{ref}} = 3)$ falls within the interval $[1.84, 2.02]$ with 95% probability. Taking for granted that we can estimate $\fint_{\mathbb{R}^d} a = 5\, \mathrm{Id}$ more easily, we obtain an estimate of $e \cdot \bar{a} e$ which falls within the interval $[2.98, 3.16]$ with 95% probability, the true value being $3$. This estimator thus produces a result within a relative error of 5% of the true value with 95% probability. It takes about 2 s to compute this quantity on a laptop computer with 16 GB of memory, using a single processor clocked at 2.40 GHz.
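The interval arithmetic behind these figures can be reproduced directly (Python, for illustration; the endpoints are read off Figure 3 as quoted above):

```python
# 95% interval for the estimator, as reported in the text
lo_s, hi_s = 1.84, 2.02
a_avg = 5.0                              # average coefficient, assumed known
# subtracting from the average flips the interval's endpoints
lo_a, hi_a = a_avg - hi_s, a_avg - lo_s  # interval for the homogenized value
rel_err = max(abs(lo_a - 3.0), abs(hi_a - 3.0)) / 3.0
print(lo_a, hi_a)                        # the interval around the true value 3
print(rel_err)                           # worst-case relative error, about 5%
```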
We next investigate more precisely the scalings of the standard deviation and of the mean error of $\widehat{\sigma}^2(n, n_{\mathrm{ref}})$. (By definition, the mean error is $|\mathbb{E}[\widehat{\sigma}^2(n, n_{\mathrm{ref}})] - 2|$ for this example.) On the left frame of Figure 4, we see that the variance decays like $2^{-dn} = 2^{-2n}$, as predicted by the theoretical results. On the right frame of Figure 4, we display the mean error as a function of $n$ and of the number of refinements. Our theoretical arguments predict that the mean error is the sum of a term of the order of $2^{-dn/2} = 2^{-n}$, of the discretization error, which depends on $n_{\mathrm{ref}}$, and of the boundary-layer error related to the choice of $c_{\mathrm{bl}}$. We display the dependency of the mean error on these parameters more precisely in Figure 5, for the value $n = 4$. We see on the left frame of Figure 5 that the choice of $c_{\mathrm{bl}} = 4$ is already sufficient to ensure that the boundary-layer error is negligible compared with the discretization error. On the right frame of Figure 5, we observe that the discretization error decays approximately like $h^{0.409}$, where $h$ is the element size, as predicted in the discussion around (5.6).
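An exponent such as the one quoted above can be recovered from measured errors by a least-squares fit in log–log coordinates. A minimal Python sketch on synthetic data (the error values below are made up so as to lie exactly on a power law with the quoted exponent; they are not the paper's measurements):

```python
import numpy as np

h = 2.0 ** -np.arange(1, 7)    # element sizes over successive refinement levels
err = 0.7 * h ** 0.409         # synthetic discretization errors, illustration only

# the slope of log(err) against log(h) estimates the convergence rate
slope, _ = np.polyfit(np.log(h), np.log(err), 1)
print(slope)
```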

Two-dimensional case, high contrast
We continue with the two-dimensional setting and keep $\Lambda = 9$, but we now progressively decrease $\lambda$ within the interval $[10^{-2}, 1]$. More precisely, we vary $\lambda$ in twenty logarithmically equally spaced steps between the values $1$ and $10^{-2}$. For these twenty values of $\lambda \in [10^{-2}, 1]$, the left frame of Figure 6 displays the mean and the standard deviation of $\widehat{\sigma}^2(n, n_{\mathrm{ref}})$, with the choices $n = 4$ and $n_{\mathrm{ref}} = 5$. The estimator captures the true value of the homogenized matrix quite well over a large span of values of $\lambda$, although relative errors become large as $\lambda$ approaches $10^{-2}$. This is in part due to the fact that the true homogenized matrix tends to zero as $\lambda$ is decreased to zero, so even an error that is constant in absolute value would translate into a relative error that blows up. A more fundamental reason for the increase of the error is that solutions become more and more singular, and thus accurate discretizations become more challenging. On the right frame of Figure 6, we plot the relative error in the mean, for $\lambda = 0.1$ and different values of $n_{\mathrm{ref}}$. We expect the asymptotic convergence rate to scale approximately like $h^{0.1337\ldots}$, where $h$ is the size of a finite-element cell. Despite the slow asymptotic rate, a faster pre-asymptotic regime brings the relative error within a few percentage points after five levels of refinement.
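The sweep over the lower ellipticity bound can be generated in one line (Python, for illustration):

```python
import numpy as np

# twenty logarithmically equally spaced values from 1 down to 1e-2
lams = np.logspace(0, -2, num=20)
print(lams[0], lams[-1])
```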

Three-dimensional case, small contrast
We now turn to the investigation of three-dimensional problems. To further make the case that the scaling of the discretization error is strongly affected by the ellipticity contrast, we start by investigating a regime of relatively small contrast: we fix $\lambda = 1$ and $\Lambda = 2$. As in the two-dimensional case, we plot a histogram of $\widehat{\sigma}^2(n, n_{\mathrm{ref}})$ for different values of $n$ and $n_{\mathrm{ref}}$; see Figure 7. Each histogram is obtained by combining 200 samples of the estimator.
As a rule of thumb, we expect the approximation $\bar{a} \simeq \fint_{\mathbb{R}^d} a$ to improve as we increase the dimension and reduce the contrast. This is confirmed by the numerical results, which suggest that, for the example considered here, the difference $\fint_{\mathbb{R}^d} a - \bar{a}$ is about 4% of the magnitude of the homogenized matrix $\bar{a}$ itself. We also see that the convergence of $\widehat{\sigma}^2(n, n_{\mathrm{ref}})$ is relatively rapid as $n_{\mathrm{ref}}$ increases. Finally, the variance decays roughly like $2^{-dn} = 2^{-3n}$, in agreement with the theoretical prediction.
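Taking the quoted 4% figure at face value, the average coefficient (1 + 2)/2 = 1.5 pins down an approximate value for the homogenized matrix. A back-of-the-envelope check (Python, our own illustration; the algebra dividing by 1.04 is ours, not taken from the paper):

```python
a_avg = (1.0 + 2.0) / 2   # average of the coefficient field for lam = 1, Lam = 2
gap_frac = 0.04           # (average - homogenized) is about 4% of the homogenized matrix
# if a_avg - a_bar = 0.04 * a_bar, then a_bar = a_avg / 1.04
a_bar_est = a_avg / (1 + gap_frac)
print(a_bar_est)          # roughly 1.44
```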

Three-dimensional case, moderate contrast
We now turn to more sizable values of the ellipticity contrast in three dimensions, fixing $\lambda = 1$ and $\Lambda = 9$. In the three-dimensional case, we are not aware of any analytic expression for the homogenized matrix. The numerical results we obtained, together with a naive extrapolation, suggest that $\fint_{\mathbb{R}^d} a - \bar{a} \simeq 1.35\, \mathrm{Id}$, and thus $\bar{a} \simeq 3.65\, \mathrm{Id}$.
Assuming that this is correct, a ±5% error interval for $\bar{a}$ is $[3.47, 3.83]$. An average of four samples of the quantity $5 - \widehat{\sigma}^2(n = 2, n_{\mathrm{ref}} = 3)$ falls within this interval with probability above 95%, and takes about 20 min to compute on a laptop computer with 16 GB of memory, using a single 2.40 GHz processor. A single sample of the quantity $5 - \widehat{\sigma}^2(n = 2, n_{\mathrm{ref}} = 4)$ falls within the smaller interval $[3.62, 3.80]$ with 95% probability, and takes about 38 min to compute on the same hardware. Moreover, the computational time can be significantly reduced by optimizing the boundary-layer size.
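The interval and the effect of averaging four samples can be checked with elementary arithmetic (Python, for illustration; the 1/√N factor is the standard Monte Carlo variance-reduction rate, not something specific to this method):

```python
import math

a_bar = 3.65                           # extrapolated homogenized value from the text
lo, hi = 0.95 * a_bar, 1.05 * a_bar    # +-5% error interval around a_bar
print(round(lo, 2), round(hi, 2))      # the interval [3.47, 3.83]

# averaging N independent samples shrinks the standard deviation by 1/sqrt(N)
N = 4
print(1 / math.sqrt(N))                # 0.5
```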