MDFEM: Multivariate decomposition finite element method for elliptic PDEs with lognormal diffusion coefficients using higher-order QMC and FEM

We introduce the multivariate decomposition finite element method for elliptic PDEs with lognormal diffusion coefficient $a=\exp(Z)$ where $Z$ is a Gaussian random field defined by an infinite series expansion $Z(\boldsymbol{y}) = \sum_{j\ge1} y_j\,\phi_j$ with $y_j\sim\mathcal{N}(0,1)$ and a given sequence of functions $\{\phi_j\}_{j\ge1}$. We use the MDFEM to approximate the expected value of a linear functional of the solution of the PDE which is an infinite-dimensional integral over the parameter space. The proposed algorithm uses the multivariate decomposition method (MDM) to compute the infinite-dimensional integral by a decomposition into finite-dimensional integrals, which we resolve using quasi-Monte Carlo (QMC) methods, and for which we use the finite element method (FEM) to solve different instances of the PDE. We develop higher-order quasi-Monte Carlo rules for integration over the finite-dimensional Euclidean space with respect to the Gaussian distribution by use of a truncation strategy. By linear transformations of interlaced polynomial lattice rules from the unit cube to a multivariate box of the Euclidean space we achieve higher-order convergence rates for functions belonging to a class of anchored Gaussian Sobolev spaces, taking into account the truncation error. Under appropriate conditions, the MDFEM achieves higher-order convergence rates in term of error versus cost, i.e., to achieve an accuracy of $O(\epsilon)$ the computational cost is $O(\epsilon^{-1/\lambda-d'/\lambda}) = O(\epsilon^{-(p^*+d'/\tau)/(1-p^*)})$ where $\epsilon^{-1/\lambda}$ and $\epsilon^{-d'/\lambda}$ are respectively the cost of the quasi-Monte Carlo cubature and the finite element approximations, with $d' = d \, (1+\delta')$ for some $\delta' \ge 0$ and $d$ the physical dimension, and $0


Introduction
In this paper we are concerned with the application of higher-order quasi-Monte Carlo (QMC) rules and multivariate decomposition methods (MDM) to elliptic PDEs with random diffusion coefficients. We focus on the lognormal diffusion coefficient, the logarithm of which is a Gaussian random field. The goal is to compute the expected value of some functional of the solution. This method was motivated by the need for new techniques for elliptic PDEs with smooth lognormal diffusion coefficients where the classical QMC approaches can only achieve first-order convergence, see, e.g., [14,15,17,18].
The MDFEM was already analysed in the case of a uniform diffusion coefficient in [28], but in this case higher-order QMC rules are readily available for integration over the unit cube. For lognormal diffusions we cope with a more challenging problem since the expectation is taken with respect to the Gaussian distribution over an unbounded domain. Consequently, existing higher-order QMC algorithms are not directly applicable. To solve this problem we propose using a truncation method recently developed in [7], see also [27]. By exploiting the fast decay of the Gaussian distribution toward infinity, the Euclidean domain is truncated and the resulting integral is transformed to the unit cube using a linear transformation where suitable higher-order QMC rules can be applied. The proposed algorithm allows us to achieve higher-order convergence for sufficiently smooth integrands.
Let D ⊂ R d be a bounded polygonal domain in R d , with typically d = 1, 2 or 3, with boundary ∂D. We consider the following elliptic Dirichlet problem −∇ · (a(x, y) ∇u(x, y)) = f (x), for x in D, u(x, y) = 0, for x on ∂D.
Here, the gradient operator ∇ is taken with respect to x and a : D × Ω N → R for some Ω ⊆ R. We consider the case when y = {y j } j≥1 is a sequence of parameters distributed on R N according to the product Gaussian measure µ = j≥1 N (0, 1), and the diffusion coefficient takes the form a(x, y) := exp (Z(x, y)) , with Z(x, y) := j≥1 y j φ j (x), y j ∈ Ω = R, y j ∼ N (0, 1), where {φ j } j≥1 is a suitable system of real-valued, bounded, and measurable functions. with ρ u (y u ) := j∈u ρ(y j ). We note that, depending on what is most natural, we write dµ(y u ) or ρ u (y u ) dy u with the understanding that the product probability measure is always on all the variables of the integral. The weak formulation of problem (1) is to find for a given y ∈ Ω N the solution u(·, y) ∈ V := The space V = H 1 0 (D) is equipped with the norm v V := ∇v L 2 (D) . Under some assumptions on the system {φ j } j≥1 we have existence, uniqueness and an a priori estimate of the solution of the weak formulation by means of the Lax-Milgram lemma. For this we need to show a lower bound and an upper bound on a(x, y) for x ∈ D, but since we have a lognormal field, it might be that a(x, y) is not bounded for certain values of y ∈ R N . However, under the following assumptions, which are standard, see, e.g., [1,2,18,21], we can claim lower and upper bounds to hold for µ almost every y. Let us first define, for some given space X and p ∈ [1, ∞), the space L p,ρ (R N ; X) of all strongly measurable mappings v : The following result is implied by [1,Section 2] and [18, Theorem 2, Proposition 3 and Corollary 6].
Proposition 1 shows that for µ almost every y ∈ R N there exists a unique solution to (5) where µ is the product Gaussian measure, and this stems from the measurability of Z and a = exp(Z) in L p,ρ (R N , L ∞ (D)), in particular for p = 1. To make the "for µ almost every y" more tangible we note that if we define the set Ξ := {y ∈ R N : sup j≥1 |y j | b j < ∞} ⊆ R N , then for any y ∈ Ξ Z(·, y) L ∞ (D) = sup x∈D j≥1 under condition (6). From here we can show the wanted results in Proposition 1: a max (y) < ∞ and a min (y) > 0 for all y ∈ Ξ. It can be shown, see, e.g., [1], that the set Ξ has full measure µ(Ξ) = 1 for {b j } j≥1 ∈ ℓ p * , hence "for µ almost every y".
To approximate the solution of (5) the standard approach is to truncate the infinite series (3) to s terms. Let us denote the random field truncated to the first s terms by Z s and the so-obtained truncated lognormal field by a s , i.e., a s (x, y) := a(x, y {1:s} ) = exp(Z(x, y {1:s} )), Z s (x, y) := Z(x, y {1:s} ) = s j=1 y j φ j (x), (12) and by u s the solution to (5) with a = a s , then it is shown, under the assumptions of Proposition 1, in [1,18] that Z s → Z in L ∞ (D), a s → a in L ∞ (D) and u s → u in V when s → ∞ for almost every y ∈ R N . We note that all statements of Proposition 1 also hold for the truncation to s terms, see, e.g., [18].
To compute E[G(u)], see (4), we cope with three computational challenges: the infinite number of variables of the integrand, the unboundedness of the integration domain, and the integrand involves solutions of a PDE. Let us discuss how to solve these problems in more detail.
First, to approximate the infinite-dimensional integral we will make use of the multivariate decomposition method which originated from the changing dimension method, see, e.g., [23,25,31]. The goal is to decompose the infinite-dimensional problem into multiple finite-dimensional ones. Let us illustrate the MDM for calculating the integral I(F ) = R N F (y) dµ(y) = lim s→∞ R s F (y 1 , . . . , y s , 0, 0, . . .) dµ(y 1 , . . . , y s ).
We use the anchored decomposition [26], which, for any finite s, states F (y 1 , . . . , y s , 0, 0, . . .) = u⊆{1:s} where y u takes the values y j for j ∈ u and 0 for those j / ∈ u. (Conditions and properties for F and the F u will be provided later, here we just want to illustrate the concept.) We note that F (y v ) is often written as F ([y v ; 0] v ) or F ((y v ; 0)) or F ((y v , 0 −v )) or F (y v , 0 −v ), but we use our convention to not clutter further expressions more than necessary. Thus, to obtain the projections F u we combine v-truncated evaluations of the function F in points where all components outside of v ⊆ u are set to zero. Taking the limit, we write Instead of just truncating to the first s dimensions, this decomposition allows us to be more selective and approximate the function F , or its integral I(F ) in our case, up to a certain accuracy by considering a set of important subsets U ǫ ⊂ N N , called the active set, as follows This form is interesting when a bounded linear operation on F , which would be costly in the number of "active variables", could be approximated on F Uǫ by wrapping the linear operation inside the sum where hopefully only sets u of small cardinality appear. Note that a plain truncation to the first s dimensions could be represented like this as well, but then the active set would contain all subsets of {1 : s}, including the set {1 : s} itself, and it would be more efficient to just use F (y 1 , . . . , y s , 0, . . .) in this case. The strength of the MDM is exactly that the active set can be chosen as to satisfy for a certain error request ǫ > 0 and appropriate cubature rules Q u,nu . By assessing the relative contribution of specific sets of variables, for a given desired error, the MDM will decide which u to include in the active set to approximate the infinite MDM sum. In Proposition 7 we will show that the sets in the active set have relatively low cardinalities, provided certain conditions on {φ j } j≥1 are satisfied, and hence only relatively low-dimensional problems remain, which can be solved at small cost. As a comparison, for a certain error request ǫ, the truncation strategy might need a truncation dimension of s = 1907, since y 1907 still has large enough influence, thus needing a 1907-dimensional cubature rule, while the analysis of the MDM approach might show it is sufficient to include the sets {1, 1907} and {2, 1907}, along with a lot of other small sets, but never surpassing more than 7-dimensional subproblems, see [10, Table 1] for examples. This is particularly useful when error bounds grow exponentially with the number of dimensions. Next, to compute integrals over the Euclidean space, we exploit the fast decay of the Gaussian distribution to truncate the unbounded domain to bounded boxes. We then use a linear transformation to map the truncated integral into the unit cube. Finally, we apply existing higher-order quasi-Monte Carlo rules, in particular we apply interlaced polynomial lattice rules, see, e.g., [13], to approximate the truncated integral. Interlaced polynomial lattice rules have been used before in the PDE context to achieve higher-order convergence but with uniform diffusion in [8]. In contrast, most existing QMC methods for integration with respect to the normal density, and hence most existing QMC based methods for estimating expected values in the PDE context with lognormal diffusion, map the integral to the unit cube by using the inverse of the Gaussian cumulative distribution function and then use randomly shifted lattice rules to approximate this integral, see, e.g., [24,29], and [14,15,17,18,22]. The aim of all these methods is to obtain dimension-independent convergence by making use of weighted function spaces. However, since using the inverse of the Gaussian cumulative distribution function might damage the smoothness of the integrand, they make use of first order QMC rules, namely, randomly shifted lattice rules, and limit themselves to first order convergence. The proposed QMC method in this paper avoids damaging the smoothness of the integrand by using a truncation and mapping strategy. As a result, it allows us to achieve higher-order convergence rates for sufficiently smooth integrands on the Euclidean space, since they retain their smoothness on the unit cube. We do not include weights in our function spaces and therefore the constants depend exponentially on the number of dimensions. This is by design, since by using the MDM we only need to tackle relatively low-dimensional integrals instead of approximating the truncated high-dimensional integral directly. The importance of the subproblems will determine which ones to include and how well to approximate them.
Lastly, for each variable y sampled by the QMC method the original stochastic PDE becomes a deterministic one. To solve each such problem we use the finite element method (FEM). We call the combination of the multivariate decomposition method (MDM) with the finite element method (FEM) the multivariate decomposition finite element method or MDFEM in short.
For the further discussion we need some notation and properties of the anchored decomposition. In general our multi-indices and variables are infinite-dimensional, e.g., ω ∈ N N and y ∈ R N . By a subscript u ⊆ N we mean to only retain those components in u and set the other components to zero. We extend this notation to sets and for a set A we define In the case that we want the components of a multi-index or vector to range over all values of a set A for the components in u, and be zero otherwise, we write y ∈ A N u , and in this case y u = y. When we only need the components y j for j ∈ u we then write, with a slight abuse of notation, y u ∈ A |u| or even y u ∈ A N u instead of y ∈ A N u . If there is possible confusion then we will indicate the meaning explicitly or use a more explicit notation. Note that, contrary to typical usage, we occasionally also write ω u ∈ N |u| 0 , with ω j ∈ N 0 = {0, 1, 2, . . .} for j ∈ u, which means that we are working with the indices in u, but also allow ω j = 0 for j ∈ u in this case.
We write ∂ ωu yu := ∂ |ωu| / j∈u ∂ ωj yj with |ω u | := j∈u ω j and by (∂ ωu yu F )(y) we mean the value of such a partial derivative at y. Whenever it is clear w.r.t. which variables the derivatives are taken then we just write F (ωu) or F (ω) . Lemma 1. Given the anchored decomposition of a function F (with anchor at zero) by (13) and (14), then F u depends only on the variables listed in u and satisfies F u (y u ) = 0 when ∃j ∈ u : y j = 0. (15) Furthermore, if F has continuous partial derivatives up to ∂ ωu yu for some ω u ∈ N |u| 0 then and (∂ ωu yu F u )(y u ) = 0 when ∃j ∈ u : y j = 0 and ω j = 0.
In the following the function F will actually be G(u) and the projections F u will be G(u u ). We remark that the properties of Lemma 1 will hold for the decompositions of both u and G(u). When we describe function spaces for the functions F u we will identify them with |u|-variate functions on R |u| .
The structure of this paper is as follows. Section 2 presents the main steps of the MDFEM. Section 3 introduces higher-order QMC rules for multivariate integration over the Euclidean space with respect to the Gaussian distribution based on a truncation strategy. A novel anchored Gaussian Sobolev function space is introduced and QMC rules are developed for that specific space. Section 4 discusses the parametric regularity of the solution of the PDE and shows that u u and G(u u ) live in a Bochner space based on the anchored Gaussian Sobolev space, and provides bounds on their norms. Section 5 states our assumptions on the higher-order convergence of the FE approximations. Section 6 presents the main contribution of this paper where the cost model, the construction and the complexity of the MDFEM algorithm are presented. Finally, a comparison with two existing methods, the QMCFEM [18] and MLQMCFEM [17], shows the benefit of the MDFEM.

Applying the MDM to PDEs
We now explain the ingredients of the MDFEM. Our main building block will be u(·, y v ). By u(·, y v ) we mean the weak solution of (1) with y = y v , and we call this the v-truncated solution. That is, for given where a(·, y v ) = exp(Z(·, y v )), This is not unlike the commonly used truncation strategy where one solves the problem (1) by truncating to the first s dimensions, i.e., solving the problem for one (relatively large) set v = {1 : s}, see (12). In contrast, the MDFEM will solve the problem for multiple (relatively small) sets v and combine those results to obtain the u-projected solution u u (which combines all v-truncated solutions u(·, · v ) for all v ⊆ u), see (20) below, to approximate the expected value of a functional G of the solution of (5) up to a given accuracy. A quantitative statement about the number of PDEs which need to be solved and the sizes of the sets u to achieve a certain error is given later in Proposition 7.
Using the truncated solutions we obtain a multivariate decomposition of the full solution u(·, y) by means of the anchored decomposition, cf. (13) and (14), with u(·, y v ) being v-truncated solutions. Note that the full solution consists of u-projected solutions which in their turn consist of v-truncated solutions. The infinite sum in (20) will be truncated to the active set U ǫ . Compared to the more standard solution method of truncating the problem to the first s dimensions, see (12), we remind the reader that, cf. (13), u s (·, y 1 , . . . , y s , 0, . . .) = u⊆{1:s} u u (·, y u ) and hence the limit in (20) can be read as u = lim s→∞ u s . We will show in Section 4 that the functions u u and G(u u ) belong to the spaces H α,0,ρ,|u| (R |u| ; V ) and H α,0,ρ,|u| (R |u| ), see Lemma 4, where the spaces H α,0,ρ,|u| (R |u| ) are reproducing kernel Hilbert spaces which we introduce in Section 3 and we formally identify R N u with R |u| after a relabelling of the components. In Remark 1 we explain that they form an orthogonal decomposition of an infinite-variate reproducing kernel Hilbert space. We will denote integration for F u ∈ H α,0,ρ,|u| (R |u| ) by Convergence of the MDM decomposition (20) and equivalence of the infinite-dimensional integral with is then guaranteed under the conditions from [12] in the setting of applying the MDM to our PDE, see Remarks 1 and 4 for further details, and the conditions we ask in Section 6. Therefore, the convergence statements from [1,18] for solving the s-truncated problem as in (12) will also hold in our case. Note that the active set U ǫ , which will be defined in (69), grows for ǫ → 0 and will include all subsets of {1 : s ǫ } for some s ǫ for which s ǫ → ∞ as ǫ → 0. (Indeed, for any finite set u there is an ǫ such that u ∈ U ǫ .) The above discussion was using the exact weak solution for each v-truncated problem which is typically not available and a numerical approximation will be computed using the FE method of which the approximation error will have to be taken into account in the error analysis. Let us define a family of finite-dimensional subspaces V h ⊂ V , where h > 0 is the mesh diameter, i.e., the largest element diameter over all elements of the FE mesh, and such that The finite element approximation of the weak formulation of the v-truncated problem (18) for a given y v is to find u h (·, y v ) ∈ V h such that the following equation holds We can now piece together the different parts of the MDFEM. To approximate (4) we make use of (22), replacing the integrals by cubature formulas and the solutions to the PDEs by FE approximations, and hence the MDFEM takes the form where U ǫ is the active set, Q u,nu are cubature rules with nodes y We emphasize that in calculating G(u hu u (·, y u )) we are solving 2 |u| PDEs, all with the same FE mesh, indicated by the FE mesh diameter h u for all the v-truncated solutions in the formula above.
The computational cost of the MDFEM algorithm is given as There are three sources of error in the MDFEM approximation (24): the truncation error from truncating the infinite sum, the cubature errors in approximating the integrals, and the FEM errors in approximating the solutions to the PDEs. They are gathered into two terms as follows A sufficient condition to achieve an approximation error of at most ǫ is that both of these terms are less than ǫ/2. This forces us to construct the active set U ǫ such that the first term is bounded by ǫ/2. For each u ∈ U ǫ the FE space V hu and the cubature rule Q u,nu are then chosen such that the second term is bounded by ǫ/2 while minimizing the computational cost (27). This will be the strategy we follow in Section 6.
3 Higher-order quasi-Monte Carlo rules for finite-dimensional integration with respect to the Gaussian distribution using truncation for anchored integrand functions In this section we consider quasi-Monte Carlo rules for approximating integrals over R s with respect to the Gaussian distribution. Particularly, we are interested in computing s-dimensional integrals of the form where, with a slight abuse of notation, in this section ρ is the product Gaussian distribution s j=1 ρ(y j ). In further usage we are calculating I u (F u ) = I s (F ), where I u is the integral w.r.t. ρ u (y u ) dy u , and the s dimensions here will be a relabelling of the variables in y u with the function F = F u = G(u hu u (·, y u )) given by the anchored decomposition for some u ⊂ N such that s = |u| < ∞. We will show in Section 4, see Lemma 4, that the functions G(u hu u (·, y u )) belong to the function spaces which we introduce below.
To approximate the integral I s (F ), we first truncate the Euclidean domain to a multidimensional bounded box, then use a linear mapping to transform the truncated integral into one over the unit cube, and finally approximate the integral over the unit cube using suitable cubature rules. More precisely, the truncated and transformed integrals have the following form The resulting integral (30) is then approximated using an n-point QMC rule of the form where {y (i) } n−1 i=0 are well chosen cubature points on the unit cube. We note that it would be possible to truncate the box differently for each dimension, defining a box [−T 1 , T 1 ] × · · · × [−T s , T s ] as was done, e.g., in [27]. We do not pursue such a strategy here since in the application of the MDM we are not immediately making use of the different importances of the dimensions for each projected F u .

Two reproducing kernel Hilbert spaces
We will introduce two reproducing kernel Hilbert spaces. The first function space is an anchored Sobolev space on the Euclidean space R s with Gaussian measure, cf. (29), for which we will take into account that our integrand function F is anchored. We will show in Section 4 that under appropriate conditions the functions G(u u ) belong to this first function space. The second function space is an unanchored Sobolev space on the unit cube to analyse our truncated and mapped integral, cf. the right hand side of (30). For this second function space we can then adjust techniques from [8] such that interlaced polynomial lattice rules will achieve higher-order convergence in approximating the integral. Contrary to most modern QMC results we do not introduce weights in the function spaces since we rely on the MDM to keep the number of dimensions limited. Therefore our error bounds will contain exponential factors in the number of dimensions. However, these weights will eventually show up when we handle the infinite-variate case, see Remark 1. We begin with introducing the univariate function space. For α ∈ N the space H α,0,ρ (R) consists of integrable functions over R with respect to the Gaussian distribution having absolutely continuous derivatives up to order α − 1 for any bounded interval and square integrable derivative of order α over R with respect to the Gaussian distribution and are anchored at 0. Note that this allows us to use the Lebesgue version of the Fundamental Theorem of Calculus, and hence the Taylor theorem with integral remainder up to order α. For F, G ∈ H α,0,ρ (R) the anchored inner product is defined as Note that, because we use the anchored decomposition for the MDM, we only consider functions such that F (0) = 0 and hence the term for τ = 0 which is normally there in the first sum is zero here. The associated norm is given by · Hα,0,ρ(R) := ·, ·

1/2
Hα,0,ρ(R) . We note that another common choice of a Gaussian Sobolev space is the unanchored Gaussian Sobolev space or Hermite space, see, e.g., [7,19,20]. Instead of anchoring the values of the function and its derivatives up to order α − 1 at 0 as in (32), they are integrated out against the Gaussian distribution over R. However, here we want to benefit from the anchored decomposition and therefore will use an anchored Sobolev space.
The anchored Gaussian Sobolev space for anchored functions H α,0,ρ (R) is a reproducing kernel Hilbert space with kernel where 1{X} is the indicator function on X. Such a kernel for α = 1 was given in [29, Section 3.3], but we note that in this case the weight function in the formula for the inner-product and the reproducing kernel cannot be taken the same as in the integral (29) for α = 1, see [29, Table 1]; here we are in fact interested in α ≥ 2, but see also Remark 2. A slightly different kernel for an anchored Sobolev space over the Euclidean space with higher order smoothness, although without taking any weight function into account, was given in [30,Section 11.5.1]. For completeness we provide the full derivation of the reproducing kernel for inner products like (32) for general ρ in Appendix A.2.
We define the multivariate space as the tensor product of the univariate spaces. The kernel of our space of anchored functions is then given by Note that this kernel itself also has the anchored property: if there is a j ∈ {1 : s} for which x j = 0 or y j = 0 then K α,0,ρ,s (x, y) = 0. The corresponding inner product is where, in the first line, −v = {1 : s}\v and (y v , 0 −v ) is a vector of s variables such that (y) j = y j for j ∈ v and 0 otherwise. The second line follows by our convention that y v is a vector of the appropriate size which takes the value zero outside of v. In many references, see, e.g., [4], such inner products are usually written using a double sum, which here then takes the following form is a vector of s variables such that the jth component equals τ j for j ∈ −v and α otherwise. The corresponding norm is given by · Hα,0,ρ,s(R s ) := ·, ·

1/2
Hα,0,ρ,s(R s ) . In Lemma 5 in Appendix A.2 we show how anchored functions in anchored spaces like H α,0,ρ,s (R s ) can be represented by a Taylor series with integral remainder.
In analysing the error of the MDM algorithm we will need to be able to get estimates on integrals of functions F u ∈ H α,0,ρ,|u| (R |u| ). We will make use of the following properties of the reproducing kernel.
and, for u ⊂ N, we have Hence for F u ∈ H α,0,ρ,|u| (R |u| ) we have Proof. For y > 0 we have .
The same bound holds for any y < 0. Thus, using with the maximum for α ∈ N achieved for α = 3. The bound on I u (F u ) follows by using the reproducing property of the kernel To show the last claim we write with 2 F 2 a generalized hypergeometric function. The sum over r will stay finite when integrating against the normal density, similar like above. For the second part we have which is finite when 2 ≤ α < ∞.
Remark 1. Our discussion so far is on finite-variate spaces H α,0,ρ,|u| (R |u| ) of anchored functions. We now show how they form an orthogonal decomposition for a space of s-variate functions (not just anchored functions) and we will then extend this to infinite-variate functions. The kernel for the s-variate weighted anchored Sobolev space can be expressed as, see, [26,Example 4.4], where {γ u } |u|<∞ is a sequence of positive weights, where for u = ∅ we set γ ∅ := 1. For u = ∅ we have the space of constant functions and we define K α,0,ρ,|∅| := 1 and f ∅ K α,0,ρ,|∅| := |f ∅ |. The weights γ u in (39) model the importance of our subspaces and we will show in Section 6 that they can be taken of product form γ u = j∈u γ j with γ j = √ 2 b j . In Section 6 we will eventually demand that {b j } j≥1 ∈ ℓ p * (N) for p * ∈ (0, 1). Hence we know that j≥1 γ j < ∞. By taking the limit for s → ∞ we obtain the reproducing kernel of the infinite-variate reproducing kernel Hilbert space H α,ρ,γ (R N ), see [12], and where F u and G u are obtained by the anchored decomposition of the functions F and G. Since our kernels are the tensor products of a univariate kernel, the subspaces are orthogonal and their intersection only contains the zero function for u = v, see [12]. Technically, the kernel K is a reproducing kernel only when x, y ∈ Y with Y := {y ∈ R N : K(y, y) < ∞}, but we can use [12 to show that µ(Y) = 1 for α ≥ 2 under the condition of product weights γ u = j∈u γ j with j≥1 γ j < ∞. This can be shown using (38) and the same reasoning as in the proof of Proposition 6. This means Y can be replaced by R N in the "almost everywhere" sense for the infinitevariate reproducing kernel Hilbert space and we can write R |u| as the domain for each subspace instead of the formal domain Y. Finally, this condition allows us to claim convergence of struncated functions to the infinite-variate function in the Hilbert space and to claim equality for the MDM form of the infinite-dimensional integral see again [12]. We refer to Remark 4 for the verification that our integrand function F = G(u), with u(·, y) being the solution of the PDE, belongs to H α,ρ,γ (R N ) under the studied conditions.

Unanchored Sobolev space on the unit cube
We need a function space over the unit cube to analyse the error for our cubature method of choice, which will be interlaced polynomial lattice rules. For this, let us define the unanchored Sobolev space over the unit cube H α,s ([0, 1] s ). This is the same space as was used in, e.g., [4]. For α ∈ N the univariate space H α ([0, 1]) consists of integrable functions over [0, 1] having absolutely continuous derivatives up to order α − 1 and square integrable derivative of order α. For the univariate case the unanchored inner product is with norm · Hα([0,1]) := ·, ·

1/2
Hα([0,1]) . Note that contrary to our Gaussian function space which is taking benefit of the functions being anchored here we must include the typical τ = 0 term in the first sum.
The multivariate space is the tensor product of the univariate spaces with inner product and norm · Hα,s([0,1] s ) := ·, ·

1/2
Hα,s([0,1] s ) . Let us denote integration over the unit cube for functions F ∈ H α,s ([0, 1] s ) by and a QMC rule using a point set P n = {y (0) , . . . , y (n−1) } over the unit cube by For this space we can construct interlaced polynomial lattice rules which achieve the almost optimal order of convergence. The full derivation of the following result is given in Appendix A.4.
. For any m ∈ N an interlaced polynomial lattice rule of order α with point set P n,α with n = 2 m points can be constructed with cost O(αsn log(n)) such that Proof. This result follows from combining Propositions 10 and 11 in the appendix. The construction cost follows from the algorithm for product weights, which we set all equal to 1, in [8]. To complete the analysis of mapping and truncating the integral (29) to (30) and using the result from Theorem 1 we need to show a bound on the norm in H α,s ([0, 1] s ) of the mapped function in terms of the norm in H α,0,ρ,s (R s ) of the original function. Since the proof is quite long it is presented in the appendix.
and I 0 (·) is the modified Bessel function of the first kind of order 0 with I 0 (1/2) ≈ 1.06348.

Higher-order quasi-Monte Carlo for integration over R s
We are now ready to state the error bound of the method described in the beginning of this section.
. . , y (n−1) } be the point set of an interlaced polynomial lattice rule of order α with n = 2 m points, with m ∈ N, according to Theorem 1. Then, for any λ ∈ [1, α) and by taking T = 2 + 2 λ ln(n), the QMC rule Q s,n , defined in (31), using the point set P n,α has an error bounded as where C α,λ,s is a constant, independent of F and n, defined below by (53).
Proof. The error splits into two terms The first term is the domain truncation error. Similar as in Proposition 2, using the reproducing property of the kernel K α,0,ρ,s , we obtain To obtain a bound on the truncation error we will bound the integral on the right hand side. We have We will estimate each of the above integrals. Using the same argument as in the proof of Proposition 2 we have for any T ≥ 2 T for y ≥ T , and in the last step filled in T = 2. Moreover, we have for T > 2, and with the substitution t = y/2 − 1, Inserting this into (48) yields Applying this inequality together with (36) and (47) to (46) we obtain with We move to the second term of the total error which is the cubature error. Using the result of Theorem 1 and Proposition 3 we have for any λ ∈ [1, α) where with C 1,α and C α,λ,s , respectively, defined in (43) and in (42). Combining (45), (49) and (51) leads to To balance the dominating terms in the square brackets we choose T = 2 + 2 λ ln(n) such that e −(T /2−1) 2 = n −λ . Hence, where in the second inequality we used 2 with C 3,α,s and C 4,α,λ,s , respectively, defined in (50) and (52).
Remark 2. A similar result as that of Theorem 2 can be shown for the anchored Gaussian Sobolev space with first order smoothness H α,0,ρ,s (R s ) with α = 1 using randomly digitally shifted polynomial lattice rules which achieve the optimal convergence rate in H α,s ([0, 1] s ) with α = 1, see [9]. However, in this case we have to be slightly careful as (38) does not hold for α = 1. We note however that we can change from y j ∼ N (0, 1) to y j ∼ N (0, c 2 ) with 0 < c < 1 in (3), in effect changing the variance of the normal distribution that we integrate against. To keep the law of the random field unchanged we will have to divide each y j by c in (3). If we keep the weight in the kernel unchanged then the integral (38) w.r.t. N (0, c 2 ) will be finite, see also [29, Table 1]. The effect of dividing y j by c can now be moved into the basis functions φ j and eventually ends up as multiplying each b j with c. Now take c = 1 − δ for arbitrarily small δ > 0. For condition (6) we then obtain κ c = κ/c ≤ (1 + δ) κ < ∞ if before we had κ < ∞. The same remark holds for condition (57) which asks κ < ln(2)/α and which we will need in the next section. We here obtain κ c ≤ (1 + δ) κ < ln(2)/α if before we had κ < ln(2)/α. Combining such randomized cubature rules with a suitable truncation of the Euclidean domain gives a similar convergence rate as in (44), however, of order λ ∈ [ 1 2 , 1). Remark 3. An alternative approach is to embed the function (F ρ) • T into the anchored Sobolev space over the unit cube and then use higher-order polynomial lattice rules, see, e.g., [6,28]. However, a non-trivial result similar to Proposition 3 is then needed to obtain an explicit formula for the embedding constant. One of the reasons we choose our approach of mapping to the unanchored Sobolev space is that then the technique of [8] to make use of interlaced polynomial lattice rules could be used with a construction cost of O(αsn log(n)) as given in Theorem 1. The construction cost of higher-order polynomial lattice rules on the other hand grows exponentially in n with respect to the smoothness, see [5]. We remark that this excessive construction cost could still be avoided by making use of yet another embedding of the anchored Sobolev space over the unit cube into the unanchored Sobolev space on the unit cube, see [11,Example 2.1]. In that case the interlaced polynomial lattice rules from Theorem 1 could then also be used.

Parametric regularity of the PDE solution
We next derive bounds for mixed derivatives of the solution u(·, y) with respect to y. For α ∈ N and u ⊂ N let us define the Bochner norm based on the spaces H α,0,ρ,|u| (R |u| ), with inner product (34), and V by Note that ∂ τu yu u u is a function which only depends on the variables in u and inside the integral we evaluate this function at y v with v ⊆ u setting all y j = 0 for j / ∈ v. The following result was also used in [28,Lemma 2] for the analysis of the MDFEM in the uniform case and allows us to use the regularity analysis on u(·, · u ) instead of on u u , since their norms in H α,0,ρ,|u| (R |u| ; V ) coincide. Note however that u(·, · u ) / ∈ H α,0,ρ,|u| (R |u| ; V ) since it does not satisfy the anchored properties, except maybe in exceptional cases.
Lemma 2. For α ∈ N and u ⊂ N, let u u be obtained by the anchored decomposition (20) and let u(·, · u ) be the u-truncated solution, cf. (18) Proof. This follows directly from property (16) of the anchored decomposition and (54).
For a given y u ∈ R N u and with v(·, y u ) ∈ V let us introduce the notation Note that · V,ay u depends on y u . It is easy to see that for every v(·, and additionally, when v(·, y u ) = u(·, y u ) ∈ V is the u-truncated solution, cf. (18), we obtain where f ∈ V * is the right hand side of the PDE and where we used (10) for y = y u . The following result is modified from [21, Proposition 3.1], see also [1,Theorem 4.1], and accounts for the truncation to an arbitrary set u ⊂ N.
The previous result can now be used to show a bound on the norm of u(·, · u ) and as a consequence also on the norm of u u . Lemma 3. Assume the sequence {b j } j≥1 satisfies the assumptions of Proposition 4 for a given α ∈ N, and, additionally that {b j } j≥1 ∈ ℓ p * (N) for some p * ∈ (0, 1]. Then it holds that Proof. Using the definition of the Bochner norm (54), but with the H α,0,ρ,|u| norm written as a double sum, cf. (35), we have :α} |u| s.t. τj=α for j∈v and τj <α for j / ∈v where we applied Proposition 4 with y u = (y v , 0 u\v ) ∈ R N v ⊆ R N u for each v ⊆ u. Now we estimate the sum in the last expression. Our strategy is similar to that in [18, proof of Theorem 13]. Note that for any Therefore, we have where we used that the integral in the second line can be interpreted as the cumulative standard normal distribution evaluated at 2κb j , and this can be bounded by exp(2(2κb j )/ √ 2π)/2, see, e.g., [14, p. 355]. Inserting (60) into (59) we obtain Since {b j } j≥1 ∈ ℓ p * (N) ⊆ ℓ 1 (N) ⊂ ℓ 2 (N) for p * ∈ (0, 1] the sum in the last expression is finite. Taking the square root on both sides finishes the proof. Combining Lemma 2 and Lemma 3 we obtain bounds for the norms of u u and G(u u ). Note that the arguments used to arrive at these bounds are based on the weak formulation with u(·, y u ) ∈ V . They remain valid for the approximation u hu u , in which we combine the FE approximations u hu (·, y v ) ∈ V hu for all v ⊆ u, since V hu ⊂ V , with constants independent of h u . Note that this requires us to use the same FE mesh diameter h u for all the v-truncated solutions which we use to calculate u hu u , see (25).
Lemma 4. Assume the sequence {b j } j≥1 satisfies the assumptions of Proposition 4 for a given α ∈ N, and, additionally that {b j } j≥1 ∈ ℓ p * (N) for some p * ∈ (0, 1]. Let G ∈ V * be a bounded linear functional. Then it holds that u u ∈ H α,0,ρ,|u| (R |u| ; V ) with with C ′ κ,α given in (58). Moreover, for V hu ⊂ V , we have u hu u ∈ H α,0,ρ,|u| (R |u| ; V ) and G(u hu u ) ∈ H α,0,ρ,|u| (R |u| ) with the same bounds on their norms as given for u u and G(u u ) up to a multiplicative constant independent of h u .

Finite element approximation error
For bounding the error we also need results on the FE approximation error, which is the error between the true solution u(·, y v ) ∈ V and the discretized solution u hu (·, y v ) ∈ V hu ⊂ V of the weak form, cf. (18) and (23), where (25) dictates which FE approximations we need to take. The FE approximation error depends on the spatial regularity of the solution which depends on the smoothness of the domain D, the right hand side of the PDE f and the spatial regularity of the diffusion coefficient a(·, y v ) = exp(Z(·, y v )).
Our assumptions are quite standard, see, e.g., [18,Proposition 15] and also [8, Theorem 2.4 and Theorem 2.5]. We will assume that, given there exists a t ∈ (0, ∞) such that and for which there exists a positive sequence {b j } j≥1 , with 0 < b j ≤ 1 for all j, satisfying condition (6) and (7), which will be further strengthened, see Theorem 3, as well as then, for all p ∈ [1, ∞) and τ < 2t there is a constant C > 0 such that, for any h v > 0, we have an FEM, using a continuous piecewise polynomial basis of total degree r ≥ ⌈τ /2⌉, for which Here C depends on the spatial regularity of D, a, f and G through (61)-(63), but is independent of h v and v. We note that to handle singularities, due to, e.g., reentrant corners for d = 2, one either has to change the norms to incorporate a weight function as in [17, Proposition 2.3 and Remark 2.4] or use local mesh refinement as, e.g., in [15]. We refer to [17] for the definitions of the norms and omit such details here.

MDFEM
In the next subsections we explain the cost model for the MDFEM algorithm and select the active set, cubature rules and finite element approximations based on a priori error estimates. The main complexity result will be presented in Theorem 3, after which we compare the MDFEM with the QMCFEM and MLQMCFEM algorithms.

Computational cost
The total cost of the MDFEM algorithm (24) is comprised of the costs of computing Q u,nu (G(u hu u )) for all u ∈ U(ǫ), where U(ǫ) is the "active set" determined to reach a given error request ǫ > 0. For each u ∈ U(ǫ) the cost of computing Q u,nu (G(u hu u )) is given by n u times the cost of computing G(u hu u ), see (27). Based on the decomposition (26) the cost of evaluating G(u hu u ) is bounded by 2 |u| times the cost of evaluating the FE approximation of the u-truncated solution, where we assume the cost of approximating u hu (·, y u ) to be dominating those of u hu (·, y v ) for all v ⊆ u since the stiffness matrix involves calculating (19) at a cost which we will estimate at O(|v|).
Technically this cost could be avoided by using a Gray code ordering of enumerating the sets v ⊆ u. Hence, we have For each y u , and hence for each y v obtained from this y u for v ⊆ u, the cost of evaluating the FE approximation u hu (·, y v ) is given by the cost of assembling the stiffness matrix plus the cost of solving the linear system. Due to the locality of the O(h −d u ) basis functions of V hu , the stiffness matrix is sparse and has O(h −d u ) nonzero entries. Each entry in turn needs at most O(|u|) operations to evaluate the diffusion parameter, cf. (19), which could be avoided, see the previous remark, but we will leave this cost in, since there are other terms more dominating, cf. (78) forthcoming. We assume cost of solving the linear system E.g., in [18,Section 10] it is assumed that the cost is nearly linear and that δ ′ > 0 can be chosen arbitrarily small, making use of [16,Corollary 17]. Thus, using (m + 1) ≤ 2m for m ∈ N, we have To simplify the further exposition, we will write £ u := 2 |u| |u| and d ′ := d (1 + δ ′ ). Therefore, the total computational cost of the MDFEM is

Error analysis
We will now give an a priori estimate of the total error of the MDFEM algorithm. We will use the higher-order convergence of our QMC based cubature rules from Section 3, together with the required parametric regularity results of the PDE solution from Section 4, and the convergence of the FE approximations from Section 5. For the convergence of the FE approximation we remind the reader that similar assumptions as given in (61)-(63), possibly extended with weighted norms or local mesh refinements, are needed to allow the convergence of (64) to hold. The next result holds for any choice of U(ǫ) and with the MDFEM algorithm Q ǫ making use of this active set. The actual choice of the active set, as well as the cubature rules and FE approximations, to reach a certain error request ǫ will be shown in the next sections based on the error bound in the next result.
Proposition 5. Let the set U(ǫ) ⊂ N N be given and assume that the conditions of Lemma 4 are satisfied for a given α ∈ N, and hence there is a sequence {b j } j≥1 ∈ ℓ p * (N) for some p * ∈ (0, 1], with 0 < b j ≤ 1 and κ < ln(2)/α. Let G ∈ V * be a bounded linear functional. Assume D, a, f and G have sufficient spatial regularity such that the application of G to the FE approximations of the truncated solutions converge like O(h τ ) as in (64). Let the cubature rules be defined as the transformed interlaced polynomial lattice rules with interlacing factor α as in Theorem 2, and hence α ≥ 2, and, with m u ∈ N, using n u = 2 mu ≥ 2 points. For n u = 0 and n u = 1 we take the zero approximation. Then the error of the MDFEM algorithm Q ǫ , see (24), based on the given set U(ǫ) can be bounded as for any λ ∈ [1, α), with the corresponding truncation points for the cubature rules chosen in accordance with this λ, i.e., T u = 2 + 2 λ ln(n u ), and where α 1 = α/2 + 1/4, C α,λ,|u| is given by (53), M u = M |u| with M given by (36) and γ u := j∈u γ j with and γ ∅ := 1. The first term in (67) corresponds to the truncation error, while the second term corresponds to the combined error due to the FE approximations and QMC cubatures. The hidden constant in (67) is independent of the choice of the active set U(ǫ) and the choices of n u and h u .
Proof. The error splits into three terms: The first term is the truncation error from truncating to only a finite number of decomposed elements. The second term stems from the spatial discretization of the FE approximations. The last term is the cubature error arising from using cubature rules to approximate the integrals.
1. Since we know that G(u u ) ∈ H α,0,ρ,|u| (R |u| ), see Lemma 4, we can make use of (37) in Proposition 2. Therefore, with {γ u } |u|<∞ a sequence of positive weights, the truncation error can be bounded as 2. For the error due to the FE approximations we have u∈U(ǫ) Due to the linearity of G we have Thus, using the assumption of an FE approximation error bound as in (64), we have with the hidden constant independent of h u and u but dependent on the spatial regularity conditions of D, a, f and G, see, e.g., (61)-(63). Hence the error incurred by the FE approximation can be bounded as with the hidden constant independent of h u .
To show that the above formula also holds for u = ∅ we recall from Remark 1 that we have F ∅ Hα,0,ρ,0 = |F ∅ | = |F (0)|. Thus, for n ∅ = 0 the absolute value of the cubature error is |F (0)| while, if for n ∅ ≥ 1 we set Q ∅,n ∅ (F ∅ ) = F ∅ , the error is zero for n ∅ ≥ 1. Hence for any n ∈ N 0 we have We have now bounded all three contributions to the error. For the truncation error and the cubature error we still want to choose the weights γ u . For the truncation error we obtain from Lemma 4 that (58), under the assumptions of {b j } j≥1 ∈ ℓ p * (N) for p * ∈ (0, 1]. For the cubature error we obtain, also from Lemma 4, that such a bound holds with u hu u in place of u u and hence we also have sup u∈U(ǫ) By choosing γ j = √ 2 b j , and with γ u = j∈u γ j , we have sup |u|<∞ γ −1 u b u 2 |u|/2 = 1.
Combining all three errors we obtain the claimed bound for the total error.
Remark 4. In the previous proof we made use of a supremum-norm over all subspaces of the infinite-variate space, namely sup |u|<∞ γ −1 u F u H α,0,ρ,|u| (R |u| ) . Since we know from Lemma 4 that F u H α,0,ρ,|u| (R |u| ) b u 2 |u|/2 , where b u = j∈u b j , we have chosen product weights γ u = j∈u γ j with γ j = √ 2 b j such that this supremum-norm is finite. This same choice of weights γ u also shows that our infinite-variate function F = |u|<∞ F u has finite norm in the infinite-variate reproducing kernel Hilbert space H α,ρ,γ (R N ) which we introduced in Remark 1, since which is finite when j≥1 b j < ∞ (using the technique as in the proof of Proposition 6). The summability of the b j is implied by our assumption in Proposition 5 which demands {b j } j≥1 ∈ ℓ p * (N) for some p * ∈ (0, 1].

Selection of the active set
Based on the expression of the truncation error in Proposition 5 we can choose the active set to reach a truncation error upper bounded by ǫ/2 up to multiplicative constants.
Proposition 6. Under the conditions of Proposition 5 with p * ∈ (0, 1), for which {b j } j≥1 ∈ ℓ p * (N), let the MDFEM active set be chosen by with γ j = √ 2 b j for all j as in (68) and with γ u = j∈u γ j . Then |u|<∞ (γ u M u ) p * < ∞, and the MDFEM truncation error is bounded as Proof. For the first claim we have the implications, with γ u = j∈u γ j and M u = M |u| , where we used ln(1 + x) ≤ x for x > −1 and the last line is true since {b j } j≥1 ∈ ℓ p * (N). For the second claim we have from Proposition 5 The following result states that both the cardinality of the active set, as well as the cardinalities of each of the individual sets in the active set, increase very slowly with decreasing ǫ. The (upper bound of the) cardinality of the active set also gets smaller for decreasing p * . Proposition 7. Given γ u = j∈u γ j with {γ j } ∈ ℓ p * (N) for some p * ∈ (0, 1) and U(ǫ) = U(ǫ, p * ) chosen as in (69), then for any ǫ > 0 it holds that Proof. See [28,31].

Selection of the cubature rules and FEMs
By the combined error of the FE approximations and the cubature errors from Proposition 5 we can now choose the mesh diameters h u for the FE approximations and the number of cubature points n u for the cubature formulas to balance the combined error and obtain an upper bound of ǫ/2 up to multiplicative constants using the method of Lagrange multipliers to minimize the associated cost. The maximum appearing in the bound of the next result will be dealt with later in Theorem 3.
and take the FE mesh diameters h u and the number of cubature points n u as the solution to an optimization problem (specified as (72) in the proof of this statement, with solutions (74) and (76) and with n u = 2 log 2 (⌊ku⌋) ∈ N 0 ). Then the combined error from the FE approximations and the cubature approximations of the MDFEM algorithm, using transformed polynomial lattice rules with interlacing factor α, is bounded as where α 1 = α/2 + 1/4, and the computational cost is bounded as Proof. From Proposition 5 we obtain u∈U(ǫ) Note that compared to Proposition 5 we have (possibly) increased the upper bound by multiplying the numerator by 2 λ while only multiplying the denominator by 2 λ for each u with n u ∈ N 0 when n u ≥ 1. This is such that the optimization problem we formulate next will give us an upper bound for this error. We will deal with the max term later in Theorem 3 and now put an upper bound of ǫ/2 on the sum over u. We are looking for positive real numbers k u and h u which are the solutions of the following minimization problem: Note that since we set n u = 2 log 2 (⌊ku⌋) ≤ k u , the objective function is an upper bound on the cost (66) while the constraint is an upper bound on the sum over u in the error bound (71) since max{1, 2 n u } = max{1, 2 1+log 2 (⌊ku⌋) } ≥ k u . See also, e.g., [23,Section 4.3] for a similar technique. Define a u := γ u 2 λ C α,λ,|u| |u| α1|u| . The Lagrangian is then given by with ξ the Lagrange multiplier. For each u ∈ U(ǫ) we obtain the equations From the second equation we obtain (after multiplying with h u ) which we insert into the first equation to obtain of which we substitute the first form into the constraint to obtain From (73) and (74) we also find where we have set .
Inserting this expression for k u into (75) we obtain the following expression for the Lagrange multiplier We require the sum K ǫ in (77) to be uniformly bounded while ǫ → 0, that is, we require Since γ u = 2 |u|/2 j∈u b j with {b j } j≥1 ∈ ℓ p * (N) and, both £ u and C α,λ,|u| are at most exponential in |u|, by applying [28, Lemma 1] the sum (78) is bounded if the following two conditions are satisfied: τ α 1 τ + λ (τ + d ′ ) < 1, or equivalently, and Note that it is required in Theorem 2 that λ ≥ 1, and together with (80) this restricts us to the case when p * is sufficiently small, that is, Making use of (73) and the first expression in (74), and then (75) and (77), the computational cost (66) is bounded like Using the stated conditions we know that K ǫ ≤ K < ∞. Hence we can write To optimize the speed of convergence we have to pick λ and τ as large as possible while satisfying (80). Hence we choose and we have λ ≥ 1 due to (81). By choosing the interlacing order of the interlaced polynomial lattice rule to be α = ⌊λ⌋ + 1 we satisfy α ≥ 2 and λ ∈ [1, α), and we also satisfy (79), since for any λ ≥ 0. This finishes the proof.
Remark 5. Since Proposition 8 is making use of interlaced polynomial lattice rules to achieve higher order convergence, i.e., λ ≥ 1, we end up with the condition 0 < p * ≤ (2 + d ′ /τ ) −1 there. Following the proof we see that if we make use of first order cubature rules with error bounds for 1 2 ≤ λ < 1, we obtain the condition p * ≤ (3/2+d ′ /(2τ )) −1 by using (80). In particular, we can use transformed randomly digitally shifted polynomial lattice rules from Remark 2 as cubature rules in the MDFEM algorithm. A full description and analysis of using such randomized cubature rules in the context of the MDFEM algorithm for the uniform case is given in [28]. We will write E ∆(ǫ) [|I(G(u)) − Q ǫ (G(u))| 2 ] to denote the total mean square error of the randomized MDFEM algorithm using randomly digitally shifted polynomial lattice rules. The expected value is taken over a set ∆(ǫ) := {∆ (u) } u∈U(ǫ) of independent random digital shifts, one for each subproblem u, and where each such shift ∆ (u) is uniformly distributed over [0, 1) |u| .

Main result
We are now able to analyze the complexity of the MDFEM algorithm. The analysis is under the conditions of Propositions 5-8 extended to the randomized setting using Remarks 2 and 5.
Theorem 3. Assume that for a given α ∈ N there is a sequence {b j } j≥1 ∈ ℓ p * (N) for some p * ∈ (0, 1), with 0 < b j ≤ 1 and Let G ∈ V * be a bounded linear functional. Assume that D, a, f and G have sufficient spatial regularity such that the application of G to the FE approximations of the truncated solutions converge like O(h τ u ) as in (64) and assume that the cost of solving the linear systems for the FEMs are O(h −d ′ u ) as in (65). For a given requested error ǫ > 0 take the active set U(ǫ) = U(ǫ, p * ) as in (69). Set Further, take the FE mesh diameters h u and the number of cubature points n u as the solution to the optimization problem (72) (see Proposition 8 and Remark 5, with solutions (74) and (76) and with n u = 2 log 2 (⌊ku⌋) ), such that the convergence of the cubature rules over the unit cube can be bounded by O(n −λ u ), see Theorems 1 and 2 and Remark 2. Then, for the MDFEM algorithm Q ǫ , given in (24), the following hold.
Finally we would like to compare the complexity of the MDFEM with the quasi-Monte Carlo finite element method (QMCFEM) [18] and with the multilevel quasi-Monte Carlo finite element method (MLQMCFEM) [17], both in the setting of using product weights and a wavelet expansion for the lognormal field.
The QMCFEM truncates the parameter vector y to some dimension s and then approximates the s-truncated PDE for different samples y (i) ∈ R s , i = 0, . . . , N − 1, obtained by a QMC method. Because s might be arbitrarily large, the QMCFEM requires QMC rules with convergence independent of the dimension of the integrand. Such QMC rules over the Euclidean space R s with the Gaussian distribution were developed in [24] by mapping randomly shifted lattice rules over the unit cube [0, 1] s to R s by the inverse of the normal cumulative distribution. However, since this mapping might damage the smoothness of the integrand, the convergence rate of these QMC rules was limited to first order (with respect to the number of QMC points N ). Particularly, the QMCFEM was analysed in [18,Section 10] under the same conditions as those of Theorem 3 (marked with the subscripts "PROD" for product weights and "wav" for wavelet expansion, and making use of "Gaussian weight functions" for the QMC rules) to achieve a root-mean-square error bound of the form where δ > 0 is a parameter that might be chosen arbitrarily small (but then increases the hidden constant towards infinity), N is the number of cubature points, h is the FE mesh diameter and s is the truncation dimension. (Note that [18, Section 10] considers the specific choice of a Gaussian random field with Matérn covariance for which we (optimistically) set p * = d/ν, with ν the smoothness parameter of the Matérn covariance function. Note additionally that in [3] it is proven that p * can be chosen arbitrarily close to d/ν in the case of a wavelet-type expansion.) The cost for the QMCFEM can be estimated by cost(Q QMCFEM ) N (s + h −d ′ ). Since we are interested in the asymptotic best possible rates, we will ignore the hidden constants and focus on the rate. This means we formally set δ = 0 in (83) as a proxy of being arbitrarily close to the optimal rate and thereby obtaining a slightly optimistic bound on the work. By balancing all three contributions to be ǫ/3 we find with a QMCFEM : A smaller exponent means less work. For both exponents, a QMCFEM as just defined, and a MDFEM as given in (82), we can recognize two parts: the first part is the convergence order of the QMC cubature w.r.t. its cost and the second part is the convergence order of the linear functional applied to the FE approximation w.r.t. the cost of the FE approximation.
Hence, for any p * such that 0 < p * ≤ (2 + d ′ /τ ) −1 the MDFEM performs better or similar compared to the QMCFEM, with the possibility of higher-order convergence for the MDFEM.
As a second comparison we look at the multilevel variant of the QMCFEM which is the MLQMCFEM algorithm given in [17]. Since the QMC rules for the MLQMCFEM are the same kind as those for the QMCFEM, they can achieve at most order 1 convergence and this requires 0 < p * < 2/3. The error bounds in [17, Theorem 6.3 and 6.5] take more complicated forms in whichξ there is the convergence of the QMC rule on each level, comparable with our λ, and η there is our δ ′ . They are written in terms of the dominating cost, being either ǫ −1/λ , with λ < 1 due to the QMC rules there, or ǫ −d ′ /τ with extra log factors depending on the situation. So also in this case it is clear that in the case 0 < p * ≤ (2 + d ′ /τ ) −1 < 1/2 < 2/3 if we take λ ≥ 1 for the MDFEM and we can take d ′ /τ ≤ 1/λ then the MDFEM performs better or similar compared to the MLQMCFEM, with the possibility of higher-order convergence for the MDFEM.
(∂ ωu yu F u )(y u ) = (∂ ωu yu F (· u ))(y u ) when ∀j ∈ u : ω j ≥ 1, i.e., for ω u ∈ N |u| . First we move the derivative operator inside the explicit form for F u , see (13), and realize that F (· v ) is a function which is constant in all variables not in v, and as such if we have an ω j ≥ 1 for which j / ∈ v then the derivative (∂ ωu yu F (· v )) is zero. Since ω u ∈ N |u| we have ω j ≥ 1 for all j ∈ u and thus the only remaining term for v ⊆ u is the one for which v = u.
To show the third property, (17), which is (∂ ωu yu F u )(y * u ) = 0 when ∃j ∈ u : y * j = 0 and ω j = 0, we first note that here ω u ∈ N |u| 0 , i.e., ω j is allowed to be zero for j ∈ u. Therefore, suppose there is a j ∈ u such that ω j = 0, then we are actually not taking the partial derivative w.r.t. the jth variable, and by the definition of the partial derivative this means we keep the jth variable fixed while we take the derivatives w.r.t. the variables in supp(ω u ). Then, in the case that ω j = 0 we can do the evaluation at y * j before taking the other partial derivatives, i.e., (∂ ωu yu F u )(y * u ) = (∂ ωu yu (F u | yj =y * j ))(y * u\{j} ). Since j ∈ u we can use the first property (15) to deduce that F u | yj =y * j = 0 when y * j = 0.

A.2 Derivation of anchored Sobolev kernels and Taylor representation
We provide the derivation of the reproducing kernel of a univariate anchored Sobolev space w.r.t. a positive weight function ρ for anchored functions. In particular this shows how to obtain the kernel given in (33) for the anchored Gaussian Sobolev space for anchored functions. For α ∈ N the space consists of functions that have absolutely continuous derivatives up to order α − 1 for any bounded interval and have square integrable derivative of order α w.r.t. the weight function ρ and are anchored at 0. The domain is the support of ρ. These spaces are meant to be used for functions F u obtained by the anchored decomposition (13).
In accordance with [26,Example 4.4] and [12], the only constant function in our univariate anchored space is the zero function, and we can use the tensor product of this univariate kernel to create a multivariate kernel which represents multivariate anchored functions. A reproducing kernel K for a reproducing kernel Hilbert space H(K) has the property that K(x, y) as a function of y is a function in H(K) for any x in the domain. To satisfy this property, the kernel for the anchored Sobolev space on the unit cube for anchored functions in [26,Example 4.2] and [6, Section 5.2], should be amended to have the sum over the derivatives from 1 to α − 1 where the "0 otherwise" appears. Then it agrees with the kernel given here.
Our proof uses similar arguments as in [32,Section 1.2] where the kernel of the anchored Sobolev space over the unit cube is derived, but we modify the techniques for the case when the inner product contains a positive weight function ρ in the L 2 inner product of the αth derivatives, and when the functions are anchored.
Proposition 9. The reproducing kernel of the anchored Sobolev space for anchored functions H α,0,ρ (R), with ρ(t) = ρ(−t), e.g., the anchored Gaussian Sobolev space, with inner product is given by where 1{X} is the indicator function on X.
Proof. By Taylor's theorem, and using F (0) = 0, we have where we have written both forms as this helps to interpret how our kernel will operate in connection to the integral over R in the inner product. By the reproducing property we require where the derivatives of K α,0,ρ are taken with respect to the first variable.
Comparing the two representations of F leads us to choose the kernel K α,0,ρ such that and Since K α,0,ρ (·, y) itself needs to be a function from the space H α,0,ρ (R) we require that K α,0,ρ (0, y) = 0 for any y and its Taylor expansion with respect to the first variable is given by Therefore, inserting (84) and (85) into (86) we find for y ≥ 0 that which can be written as Similarly, for y ≤ 0 we have Hence, under the assumption that ρ(t) = ρ(−t), the explicit formula for K α,0,ρ can be written as We have stated Proposition 9 in a form which is relevant to the paper, but in fact the proof is more general. As stated, we do not require ρ to be the Gaussian density. The condition ρ(t) = ρ(−t) in the proposition is only stated to obtain an easy final expression and could be removed. The resulting kernel is valid for other positive weight functions ρ and we assume the domain of the integrals are then truncated to the support of ρ. E.g., ρ could be the uniform density on the unit cube and we then recover the anchored Sobolev space for anchored functions on [0, 1]. If we want to drop the requirement that F (0) = 0 then we add the constant one to the kernel and include the term for τ = 0 in the inner product. Such a kernel for ρ ≡ 1 over R is given in [30,Section 11.5.1]. The statements are also easily generalizable for an arbitrary anchor point, similar as the kernels in [26,Example 4.2] and [6,Section 5.2].
We take the multivariate space to be the tensor product of the univariate spaces and therefore the kernel is obtained as the product, see, e.g., [26,Example 4.4]. For anchored functions in such a multivariate anchored Sobolev space (not necessarily just the Gaussian case as we use in this paper) we have the following representation as a Taylor series with integral remainder term.
with Ω the support of the positive weight function ρ, is obtained from an anchored decomposition (13) of a function F . Then we have the representation where F The functions F Proof. Functions in H α,0,ρ,|u| (Ω |u| ) satisfy all requirements to use the Taylor theorem with integral remainder up to order α in each direction successively, i.e., they have absolutely continuous derivatives up to order α − 1 for any bounded interval. Hence where we remark that we wrote ν u ∈ {0 : α} |u| , i.e., ν j is allowed to be zero also for j ∈ u. We can now make use of Lemma 1. In particular, property (17) implies that any term for which there is at least one j ∈ u for which ν j = 0 will vanish. This proves (87).
To obtain the expression for F (ωu) u (y u ) for ω u ∈ {0 : α} |u| we work in a similar way by applying the Taylor theorem with integral remainder term to F (ωu) u for each ω j < α. For ω j = α we cannot apply the Taylor theorem anymore, so we need to take care that in our expression we can just recover the αth derivatives, i.e., we should not integrate those components for which ω j = α. Therefore we introduce the set w := {j ∈ u : ω j = α}. As in the expression for F u we gather the indices for which we need the integral in the set v which is now modified to exclude those indices in the set w. This proves (88).
By property (16) we also know that F (νu) u (y u ) = F (νu) (y u ) for ν u ∈ {1 : α} |u| and any y u so also for y u = (t v , 0 u\v ) ∈ R N u or y u = (t v , y w , 0 z ) ∈ R N u as in (87) and (88) respectively.

A.3 Proof of Proposition 3: norm embedding after mapping
We deferred the proof of Proposition 3 due to its length. We first show two lemmas that we need for the proof. The following result is taken from [7, Lemma 3].
Lemma 6. For F : R s → R having mixed partial derivatives for all τ ∈ {0 : α} s we have for any τ ∈ {0 : α} s that where the sum is over all ω ∈ N s 0 for which 0 ≤ ω j ≤ τ j for all j ∈ {1, 2, . . . , s}, c(τ , ω) := , with H τ , for τ ∈ N 0 , the τ th normalized probabilistic Hermite polynomial given by with I 0 the modified Bessel function of the first kind of order 0.
We are now ready to give the proof of Proposition 3 which states that for F ∈ H α,0,ρ,s (R s ) with α ∈ N and T ≥ 1/(2 √ 2), the function (F ρ) • T : [0, 1] s → R s belongs to H α,s ([0, 1] s ) and with C 1,α defined in (43). The proof starts along the lines of [7, proof of Lemma 4] but we need some extra work to arrive at the norm in the space H α,0,ρ,s (R s ) for which we will make use of Lemma 7. We remind the reader that F (y 1 , . . . , y s ) is actually some relabelling of F u (y u ) with |u| = s coming from an anchored decomposition and that H α,0,ρ,s (R s ) is the anchored Gaussian Sobolev space specifically for such functions having the properties listed in Lemma 1.
To complete the proof we still need to show (93). We are going to use the Taylor representation from Lemma 5 for the derivatives of F = F u , where in our current exposition {1 : s} is a relabeling of u. Thus, using (88), and with the understanding that F = F u , ω = ω u ∈ {0 : α} |u| and y = y u ∈ R |u| , we have, with some slight abuse of notation, Note that inside the integral we evaluate the function ) and v, w, z are pairwise disjoint with u = v ∪ w ∪ z since the above definitions are equivalent to where we used that ω j ≤ ν j ≤ α for the set w.
A.4 Interlaced polynomial lattice rules for H α,s ([0, 1] s ) The aim of this section is to show that interlaced polynomial lattice rules can achieve the almost optimal order of convergence for integration in the space H α,s ([0, 1] s ) defined in Section 3.1.2.
Interlaced polynomial lattice rules were also used in the setting of PDEs with random diffusion coefficient, but for the uniform case, i.e., with integrals directly expressible over the unit cube, in [8]. Here we map our integrals over the full space into the unit cube by the strategy described in Section 3. But the unanchored Sobolev space here is different from the unanchored Sobolev space considered in [8]. We adjust the analysis of [8] and [13] to show that the fast componentby-component construction algorithm as in [8] can also construct optimal interlaced polynomial lattice rules for our space H α,s ([0, 1] s ). We remind the reader that the inner product of our space was already given in (40) and (41). As explained in the introduction, we do not consider weighted function spaces, since the MDM already takes care to limit the number of dimensions for each subproblem.
Interlaced polynomial lattice rules are a modification of polynomial lattice rules to achieve higher-order convergence for integration over the unit cube in classes of Walsh spaces and weighted unanchored Sobolev spaces, see, e.g., [8,13]. The aim is to approximate multivariate integrals over the s-dimensional unit cube We need to introduce some necessary definitions. For simplicity we restrict ourselves to polynomial lattice rules over the finite field Z 2 . Let Z 2 [χ] denote the set of all polynomials over Z 2 and Z 2 [χ −1 ] denote the set of all formal Laurent series over Z 2 . For any m ∈ N let us define a mapping ϑ m : In the following we will identify any integer k ∈ {0, . . . , 2 m − 1}, having binary expansion k = κ 0 + κ 1 2 + · · · + κ m−1 2 m−1 , with the polynomial k(χ) = κ 0 + κ 1 χ + · · · + κ m−1 χ m−1 ∈ Z 2 [χ] and vice versa.
A QMC rule using this point set is called a polynomial lattice rule with generating vector q and modulus p.
To analyse the error we will make use of the dual of the point set. First we need to define vectors which have a specified support. Therefore, define for an integer vector k the function supp(k) := {j : k j = 0} where the index j ranges over the dimensions of k. To range over all s-dimensional vectors with support on the set u we write k u ∈ N s u .
In order to state a bound on the worst-case error we still need to introduce a weight function which measures the importance of the kth Walsh basis functions and which will provide a link to the space H α,s ([0, 1] s ). For k ∈ N with binary expansion k = κ 1 2 m1−1 + κ 2 2 m2−1 + · · ·+ κ v 2 mv−1 such that m 1 > m 2 > · · · > m v > 0 define and µ α (0) = 0. For k ∈ N s 0 we define µ α (k) := s j=1 µ α (k j ). We can now state a first bound on the worst-case error for interlaced polynomial lattice rules in the space H α,s ([0, 1] s ).
Proof. The reproducing kernel for the space H α,s ([0, 1] s ) is well known and can be found, e.g., in [4]. For our unweighted tensor product space it is as follows where B τ is the Bernoulli polynomial of degree τ ∈ N. We already gave the inner product for this space in (40) and (41). We will expand the kernel in a double Walsh series, see, e.g., [4,13]. For k, ℓ ∈ N s 0 the (k, ℓ)th Walsh coefficient is defined by wal ℓ (y (i ′ ) ).

Using [4, Lemma 14 and Equation
from which the result follows.
We are now in a similar situation as [8,Equation (3.30)] where a function E d (q) is defined which is equal to the upper bound in Proposition 10 with "modified weights", which in our case would be γ v := (2 α(α−1) C α ) |uα(v)|/2 , and after which a fast component-by-component construction algorithm is presented. In [8] the weights of the function space γ uα(v) are also present in γ v , but in the unweighted setting here they are all 1. Hence we can pull out the modified weights using γ v ≤ 2 α(α−1)s/2 which holds for all v ⊆ {1 : αs} since C α < 1, see [4, Table 1 for q = 2]. The following proposition now follows immediately from [8, Theorem 3.9] by using γ v = 1 for all v. We note that using the actual weights γ v would improve the result, but would not change the complexity for the MDFEM so we prefer this simpler result.
Combining Propositions 10 and 11 for d = αs we obtain Theorem 1 in the main text.