Reduced Basis Greedy Selection Using Random Training Sets

Reduced bases have been introduced for the approximation of parametrized PDEs in applications where many online queries are required. Their numerical efficiency for such problems has been theoretically confirmed in Binev et al. (SIAM J. Math. Anal. 43 (2011) 1457–1472) and DeVore et al. (Constructive Approximation 37 (2013) 455–466), where it is shown that the reduced basis space V_n of dimension n, constructed by a certain greedy strategy, has approximation error similar to that of the optimal space associated to the Kolmogorov n-width of the solution manifold. The greedy construction of the reduced basis space is performed in an offline stage which requires at each step a maximization of the current error over the parameter space. For the purpose of numerical computation, this maximization is performed over a finite training set obtained through a discretization of the parameter domain. To guarantee a final approximation error ε for the space generated by the greedy algorithm requires in principle that the snapshots associated to this training set constitute an approximation net for the solution manifold with accuracy of order ε. Hence, the size of the training set is the ε covering number for ℳ, and this covering number typically behaves like exp(Cε^{−1/s}) for some C > 0 when the solution manifold has n-width decay O(n^{−s}). Thus, the sheer size of the training set prohibits implementation of the algorithm when ε is small. The main result of this paper shows that, if one is willing to accept results which hold with high probability, rather than with certainty, then for a large class of relevant problems one may replace the fine discretization by a random training set of size polynomial in ε^{−1}. Our proof of this fact is established by using inverse inequalities for polynomials in high dimensions.


Introduction
Complex systems are frequently described by parametrized PDEs that take the general form

    𝒫(u, y) = 0.    (1.1)

Here y = (y_j)_{j=1,...,d} is a vector of parameters ranging over some domain U ⊂ ℝ^d and u = u(y) is the corresponding solution, which is assumed to be uniquely defined in some Hilbert space V for every y ∈ U.
Throughout this paper, we denote by ‖ · ‖ and ⟨·, ·⟩ the norm and inner product of V. In what follows, we assume that the parameters have been rescaled so that U = [−1, 1]^d. Here d is typically large and in some cases d = ∞. We seek results that are immune to the size of d, i.e., are dimension independent.

Various reduced modeling approaches have been developed for the purpose of efficiently approximating the solution u(y) in the context of applications where the solution map

    y ↦ u(y),    (1.2)

needs to be queried for a large number of parameter values y ∈ U. This need occurs for example in optimal design or inverse problems where such parameters need to be optimized. The strategy consists in first constructing in some offline stage a linear space V_n of hopefully low dimension n, that provides a reduced map which approximates the solution map to the required target accuracy for all queries of u(y). The reduced map is then implemented in the online stage with greatly reduced computational cost, typically polynomial in n. As opposed to standard approximation spaces such as finite elements, the spaces V_n are specifically designed to approximate the image of the solution map, i.e., the elements in the parametrized family

    ℳ := {u(y) : y ∈ U} ⊂ V,

usually referred to as the solution manifold. The benchmark for such spaces is provided by the Kolmogorov n-width

    d_n(ℳ) := inf_{dim(W)=n} sup_{v∈ℳ} ‖v − P_W v‖,

which describes the performance of an optimal n-dimensional space. This optimal space is computationally out of reach and the above quantity should be viewed as a benchmark for more practical methods. Here P_W denotes the V-orthogonal projection onto W for any subspace W of V.

One approach for constructing a reduced space V_n, which comes with substantial theoretical footing and will be instrumental in our discussion, consists in proving that the solution map is analytic in the parameters and has a Taylor expansion

    u(y) = ∑_{ν∈ℱ} t_ν y^ν,    (1.6)

where y^ν := ∏_{j=1}^{d} y_j^{ν_j} and ℱ := {ν ∈ ℕ^d}. In the case of countably many parameters, ℱ is the set of finitely supported sequences ν = (ν_j)_{j≥1} with ν_j ∈ ℕ. One then proves that the sequence (t_ν)_{ν∈ℱ} of Taylor coefficients has some decay property. Two prototypical examples of results on decay are given in [1,7] for the elliptic equation

    −div(a∇u) = f,    (1.7)

where the diffusion coefficient function has the parametrized form

    a = a(y) = ā + ∑_{j≥1} y_j ψ_j,    (1.8)

for some given functions ā and (ψ_j)_{j≥1}. These results (see e.g. Thms. 1.1 and 1.2 in [1]) show that, under mild decay or summability conditions on the functions ψ_j, the sequence (‖t_ν‖)_{ν∈ℱ} belongs to ℓ^p for certain p < 1, with a bound on the ℓ^p quasi-norm. It then follows that for each n, there is a set Λ_n ⊂ ℱ with #(Λ_n) = n such that

    sup_{y∈U} ‖u(y) − ∑_{ν∈Λ_n} t_ν y^ν‖ ≤ C n^{−s},   s := 1/p − 1.    (1.9)

Similar results have been obtained for more general models of linear and nonlinear PDEs, see in particular [5,6], using orthogonal expansions into tensorized Legendre polynomials in place of Taylor series. Therefore the space V_n := span{t_ν : ν ∈ Λ_n} has dimension at most n with an a priori bound n^{−s} on its approximation error for all members of the solution manifold ℳ. One choice of Λ_n giving (1.9) is the set of indices corresponding to the n largest ‖t_ν‖. Further analysis by Cohen and Migliorati [8] shows that the same convergence estimate can be obtained by imposing in addition that the sets Λ_n are downward closed (or lower sets), i.e. having the property

    ν ∈ Λ_n and ν' ≤ ν ⟹ ν' ∈ Λ_n,    (1.10)

where ≤ is to be understood componentwise. We stress that the rate of decay n^{−s} in the bound (1.9) may be suboptimal compared to the actual rate of decay of the n-width d_n(ℳ).

The present paper is concerned with another prominent reduced modeling strategy known as the Reduced Basis Method (RBM). In this approach [10,11,14,15], particular snapshots

    u^i = u(y^i),   i = 1, 2, . . . , n,    (1.11)
are selected from the solution manifold and the space is defined by

    V_n := span{u^1, . . . , u^n}.

A certain greedy procedure was first proposed in [12] and analyzed in [3] for selecting these snapshots. It was shown in [2,9] that the approximation error provided by the resulting spaces V_n has the same rate of decay (polynomial or exponential) as that of the n-width d_n(ℳ). In this sense, the method leads to reduced models with optimal performance, in contrast to sparse polynomial expansions.
In its simplest (and idealized) form, the greedy procedure can be described as follows: at the initial step, one sets V_0 := {0}, and given that V_n has been produced after n steps, one selects the new snapshot by

    y^{n+1} := argmax_{y∈U} σ_n(y),   σ_n(y) := ‖u(y) − P_{V_n} u(y)‖,    (1.14)

then sets u^{n+1} := u(y^{n+1}) and V_{n+1} := V_n ⊕ span{u^{n+1}}. Each greedy step thus amounts to maximizing σ_n over the parameter domain U. While in this precise form the scheme cannot be realized in practice, an important modification of this greedy selection, known as the weak greedy algorithm, allows the selection to be done in a practically feasible manner while retaining the same performance guarantees, see Section 2 below. The optimization in the greedy algorithm is typically performed by replacing U at each step by a discrete training set Ũ ⊂ U. In order to retain the performance guarantees of the greedy algorithm, this discretization should in principle be chosen fine enough so that the solution map y ↦ u(y) is resolved up to the target accuracy ε > 0, that is, the discrete set

    ℳ̃ := {u(y) : y ∈ Ũ}    (1.16)

is an ε-approximation net for ℳ.
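The following sketch illustrates this selection process over a finite training set. It is only a schematic stand-in: the snapshots are represented as plain vectors, `solution_map` is a placeholder for the actual (finite element) solver, and the exact projection error is used in place of the computable surrogates discussed in Section 2; the toy map at the end is purely illustrative.

```python
# Greedy reduced basis selection over a finite training set (schematic).
import numpy as np

def greedy_reduced_basis(train_params, solution_map, n_max, tol):
    """Select snapshots u(y) maximizing the projection error over train_params."""
    snapshots = np.array([solution_map(y) for y in train_params])   # rows = u(y)
    basis = np.zeros((0, snapshots.shape[1]))                       # orthonormal basis of V_n
    for _ in range(n_max):
        # sigma_n(y) = || u(y) - P_{V_n} u(y) || for every training parameter
        residuals = snapshots - snapshots @ basis.T @ basis
        errors = np.linalg.norm(residuals, axis=1)
        if errors.max() <= tol:
            break
        new_vec = residuals[np.argmax(errors)]                      # greedy pick
        basis = np.vstack([basis, new_vec / np.linalg.norm(new_vec)])
    return basis

# Toy 2-parameter "solution map" into R^50 (purely illustrative).
rng = np.random.default_rng(0)
train = rng.uniform(-1.0, 1.0, size=(200, 2))
toy_map = lambda y: np.array([1.0 / (3.0 + y[0] + 0.5 * y[1] * k / 50.0) for k in range(50)])
V_n = greedy_reduced_basis(train, toy_map, n_max=10, tol=1e-8)
print(V_n.shape)
```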
Although performed in the offline stage, this discretization becomes computationally problematic when the parametric dimension d is either large or infinite, due to the prohibitive size of this net as ε → 0. For example, in the typical case when the Kolmogorov n-width of ℳ decays like O(n^{−s}) for some s > 0, we can invoke Carl's inequality [13] to obtain a sharp bound of order exp(cε^{−1/s}) for the cardinality of ℳ̃ and Ũ. This exponential growth drastically limits the possibility of using ε-approximation nets in practical applications when the number of involved parameters becomes large.
There is a preference toward using the greedy constructions over the Taylor expansion constructions because they guarantee error decay comparable to the decay of the Kolmogorov widths, while the Taylor polynomial constructions do not provide any such guarantee. Therefore, it is of interest to understand whether the apparent impediment of requiring such a fine discretization of the solution manifold can somehow be avoided or significantly mitigated. The main result of this paper is to prove that this is indeed the case, provided that one is willing to accept error guarantees that hold with high probability rather than with certainty. Our main result shows that a target accuracy ε can generally be met with high probability by searching over a randomly drawn training set Ũ whose size grows only polynomially in ε^{−1}, in contrast to ε-approximation nets.
The paper is organized as follows. In Section 2, we elaborate on the weak form of the greedy algorithm, which is used in numerical computation, and recall some known facts on its performance and complexity. In Section 3, we use properties of downward closed polynomial approximation to show how random sampling provides an approximate solution of the optimization problem arising at each step of the greedy algorithm. In Section 4, we formulate our modification of the greedy algorithm based on such random selection and then analyze its performance. Finally, we illustrate in Section 5 the validity of the randomized approach by some numerical tests performed in parametric dimension up to d = 64, for which the size of ε-approximation nets becomes computationally prohibitive.

Performance and complexity of reduced basis greedy algorithms
The greedy selection process described in the introduction is not practically feasible, due to at least three obstructions: (1) Given a parameter value y ∈ U, the snapshot u(y), and in particular the generators u^i = u(y^i) of the reduced spaces, cannot be exactly computed.
(2) For a given y ∈ U, the quantity σ_n(y) to be maximized cannot be exactly evaluated.
(3) The map y ↦ σ_n(y) is neither convex nor concave and is therefore difficult to maximize, even if it could be exactly evaluated.
The first obstruction can be handled when a numerical solver is available for computing an approximation u_h(y) of u(y) to any prescribed accuracy ε_h > 0, that is, such that

    ‖u(y) − u_h(y)‖ ≤ ε_h,   y ∈ U.    (2.2)

Here h > 0 is a space discretization parameter: typically, u_h belongs to a finite element space V_h of meshsize h and (possibly very large) dimension N_h. The selected reduced basis functions are now given by u^i = u_h(y^i) ∈ V_h and therefore the reduced basis space V_n is a subspace of V_h. Whenever the n-widths d_n(ℳ) decay much faster than the approximation order provided by V_h, the reduced space has typically much smaller dimension than V_h, that is, n ≪ N_h. This yields substantial computational savings when using the reduced basis discretization in the online stage.

Note that this numerical solver allows us in principle to also handle the second obstruction: we could now perform the greedy algorithm by maximizing at each step the quantity

    σ_{n,h}(y) := ‖u_h(y) − P_{V_n} u_h(y)‖,

which, in contrast to σ_n(y), can be exactly computed and satisfies

    σ_n(y) − ε_h ≤ σ_{n,h}(y) ≤ σ_n(y) + ε_h,   y ∈ U.    (2.5)

In other words, the greedy algorithm is applied on the approximate solution manifold

    ℳ_h := {u_h(y) : y ∈ U}.

While the quantity σ_{n,h}(y) can in principle be computed exactly, the complexity of this computation depends, at least in a linear manner, on the dimension N_h = dim(V_h), which is typically much higher than n. Substantial computational savings may still be obtained when maximizing instead an a posteriori estimator σ̄_{n,h}(y) of this quantity that satisfies

    c σ̄_{n,h}(y) ≤ σ_{n,h}(y) ≤ C σ̄_{n,h}(y),   y ∈ U,

for fixed constants 0 < c ≤ C. The computation of σ̄_{n,h}(y) for a given y ∈ U is based in particular on replacing the orthogonal projection P_{V_n} u_h(y) by a Galerkin projection. It, in turn, does not require the computation of u_h(y) and entails a computational cost that depends on the small dimension n, typically in a polynomial way, rather than on the large dimension N_h. We refer to [6] for the derivation of a residual-based estimator σ̄_{n,h}(y) having these properties in the case of elliptic PDEs with affine parameter dependence.
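As an illustration of the offline/online splitting that makes such an estimator cheap to evaluate, the following sketch implements a generic residual-based surrogate in the affine case A(y) = A_0 + ∑_j y_j A_j. It is an outline of this standard mechanism rather than the specific estimator of [6]; the matrices X, A_j, the vector f and the basis Z are small synthetic placeholders for the finite element objects, and the Galerkin projection replaces the orthogonal projection as described above.

```python
# Offline/online split for a residual-based error surrogate (schematic).
import numpy as np

def offline(X, A_list, f, Z):
    """Gram matrix of the Riesz representers of the residual pieces.
    X: N_h x N_h inner-product matrix, A_list: [A_0, ..., A_d],
    f: right-hand side, Z: N_h x n reduced basis matrix."""
    pieces = [f] + [A @ Z[:, k] for A in A_list for k in range(Z.shape[1])]
    P = np.column_stack(pieces)            # N_h x (1 + (d+1) n)
    R = np.linalg.solve(X, P)              # Riesz representers
    return P.T @ R                         # small Gram matrix, precomputed once

def online(G, A_red_list, f_red, y):
    """Solve the n x n Galerkin system and evaluate the residual dual norm;
    the cost no longer involves the large dimension N_h."""
    A_red = A_red_list[0] + sum(yj * Aj for yj, Aj in zip(y, A_red_list[1:]))
    c = np.linalg.solve(A_red, f_red)
    theta = np.concatenate([[1.0]] + [-w * c for w in np.concatenate([[1.0], y])])
    return np.sqrt(max(theta @ G @ theta, 0.0))

# Tiny synthetic usage (N_h = 60, n = 3, d = 2), purely illustrative.
rng = np.random.default_rng(0)
Nh, n, d = 60, 3, 2
X = np.eye(Nh)
A_list = [np.diag(2.0 + np.arange(Nh))] + [np.diag(rng.uniform(0.1, 0.5, Nh)) for _ in range(d)]
f = rng.standard_normal(Nh)
Z, _ = np.linalg.qr(rng.standard_normal((Nh, n)))
G = offline(X, A_list, f, Z)
A_red_list = [Z.T @ A @ Z for A in A_list]
print(online(G, A_red_list, Z.T @ f, y=np.array([0.3, -0.7])))
```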
Maximizing the a posteriori estimator σ̄_{n,h}(y) amounts to applying to ℳ_h a so-called weak greedy algorithm, where y^{n+1} now satisfies

    σ_{n,h}(y^{n+1}) ≥ γ max_{y∈U} σ_{n,h}(y),

with parameter γ := c/C ∈ ]0, 1[. For such an algorithm, it was proved in [2,9] that any polynomial or exponential rate of decay achieved by the Kolmogorov n-width d_n(ℳ_h) is retained by the error σ_n := dist(ℳ_h, V_n) of the algorithm. More precisely, the following holds for any compact set ℳ in a Hilbert space V, see [6].
Theorem 2.1. Consider the weak greedy algorithm with parameter γ ∈ ]0, 1[ applied to a compact set ℳ of a Hilbert space V, and let σ_n := dist(ℳ, V_n) denote the resulting error.
(i) If d_n(ℳ) ≤ C_0 max{1, n}^{−s} for all n ≥ 0 and some s > 0, then σ_n ≤ C_1 max{1, n}^{−s} for all n ≥ 0, where C_1 := 2^{4s+1} γ^{−2} C_0.
(ii) For any C_0, c_0 > 0 and α > 0, if d_n(ℳ) ≤ C_0 e^{−c_0 n^α} for all n ≥ 0, then σ_n ≤ C_1 e^{−c_1 n^α} for all n ≥ 0, where the constants C_1, c_1 > 0 depend only on C_0, c_0, α and γ.

Remark 2.2. If the same rates of d_n in the above theorem are only assumed within a limited range 0 ≤ n ≤ n*, then the same decay rates of σ_n are achieved for the same range 0 ≤ n ≤ n*, up to some minor changes in the expressions of the constants C_1 and c_1, independently of n*.

Remark 2.3. The reduced basis algorithm aims to construct an n-dimensional linear space that is tailored for the approximation of all solutions that constitute the solution manifold ℳ. Therefore, its performance is always bounded from below by the n-width d_n = d_n(ℳ). Assuming some rate of decay on d_n, at least polynomial, is thus strictly necessary if we want the reduced basis method to converge at such a rate. As discussed in the introduction, in the high-dimensional parametric context, such a rate can be rigorously established for certain instances of linear elliptic PDEs with affine parametrization of the diffusion function. A more general approach for proving such rates, applicable to nonlinear PDEs and non-affine parametrizations, is given in [6]. On the other hand, it is known that d_n(ℳ) has poor decay for certain categories of parametrized problems. This includes, in particular, transport dominated problems with a sharp transition locus that varies with the value of the parameter y. Such problems are thus intrinsically not well tailored to reduced basis methods.
The additional perturbations due to the numerical solver and the a posteriori error indicator can thus be incorporated in the analysis of the reduced basis algorithm. If ε > 0 is our final target accuracy, we set the space discretization parameter h so that ε_h = ε/2, where ε_h is the space discretization error bound in (2.2). We then apply the greedy selection on ℳ_h based on maximizing σ̄_{n,h}(y), until we are ensured that σ_{n,h}(y) ≤ ε/2 for all y ∈ U. The target accuracy σ_n(y) ≤ ε is thus met for all y ∈ U in view of (2.5).
Note that a decay rate d_n(ℳ) ≤ ρ_n for some decreasing sequence (ρ_n) implies a comparable rate d_n(ℳ_h) ≤ 2ρ_n for the range n ≤ n*, where n* is the largest value of n such that ρ_n ≥ ε_h. Therefore, using Remark 2.2 in conjunction with Theorem 2.1 applied to ℳ_h, we obtain an estimate on the number of greedy steps n(ε) that are necessary to reach the target accuracy ε.

Corollary 2.4. If d_n(ℳ) ≤ C_0 max{1, n}^{−s} for some s > 0, then the above greedy selection reaches the target accuracy ε after at most n(ε) ≤ C ε^{−1/s} steps, where C depends only on C_0, s and γ.
The difficulty in item (3) is the most problematic one, in particular when the parametric variable y is high-dimensional, and is the main motivation for the present work. Since the quantities σ_n(y), σ_{n,h}(y) and σ̄_{n,h}(y) may have many local maxima, continuous optimization techniques are not appropriate. A typically employed strategy is therefore to replace the continuous optimization over U by a discrete optimization over a training set Ũ ⊂ U of finite size. This amounts to applying the greedy or weak greedy algorithm to the discretized manifold ℳ̃ defined by (1.16) or, more practically, to its approximated version

    ℳ̃_h := {u_h(y) : y ∈ Ũ}.

On first intuition, the discretization should be sufficiently fine so that the manifold ℳ_h is resolved with accuracy of the same order as the target accuracy ε. Recall that if K is a compact set in some normed space, a finite set K̃ is called an ε-net of K if

    K ⊂ ⋃_{v∈K̃} B(v, ε),    (2.14)

that is, any u ∈ K is at distance at most ε from some v ∈ K̃. The perturbation of the greedy algorithm due to this discretization can be accounted for jointly with the previously discussed perturbations, namely finite element approximation and a posteriori error estimation. Assuming for example that ℳ̃_h is an ε/3-net of ℳ_h, we set the space discretization parameter h so that ε_h = ε/3. We then apply the greedy selection on ℳ̃_h based on maximizing σ̄_{n,h}(y) over Ũ, until we are ensured that σ_{n,h}(y) ≤ ε/3 for all y ∈ Ũ. By the covering property, we have σ_{n,h}(y) ≤ 2ε/3 for all y ∈ U, and therefore the target accuracy σ_n(y) ≤ ε is met for all y ∈ U. In addition, since d_n(ℳ̃_h) ≤ d_n(ℳ_h), the statement of Corollary 2.4 remains unchanged for this discretized algorithm.
The main problem with this approach is that the size of an ε-net of ℳ or ℳ_h becomes extremely large as ε → 0, especially in high parametric dimension.
A first natural strategy to generate such an ε-net would be to apply the solution map y ↦ u(y) to an ε-net for U in a suitable norm, relying on a stability estimate for this map. For example, in the simple case of the elliptic PDE (1.7) with parametrized coefficients (1.8), one can easily establish a stability estimate of the form

    ‖u(y) − u(ỹ)‖ ≤ C ‖y − ỹ‖_{ℓ^∞},   y, ỹ ∈ U,

under the minimal uniform ellipticity assumption ∑_{j≥1} |ψ_j| ≤ a_min − r for some r > 0, where a_min := min ā, with C depending on a_min and r. Thus, one possible ε-net of ℳ or ℳ_h in the V norm is induced by a C^{−1}ε-net Ũ of U in the ℓ^∞ norm. However, the size of such a net scales like (Cε^{−1})^d with the parametric dimension d, therefore suffering from the curse of dimensionality. In the case d = ∞, one would have to truncate the parametric expansion (1.8) for a given target accuracy ε, resulting in an active parametric dimension d(ε) < ∞. Assuming a polynomially decaying truncation error ‖∑_{j>d} |ψ_j|‖_{L^∞} ≲ d^{−b}, the growth of d(ε) as ε → 0 is in O(ε^{−1/b}), resulting in a training set of size scaling like

    (Cε^{−1})^{d(ε)},    (2.17)

which is extremely prohibitive. One sharper way to obtain an estimate independent of the parametric dimension is to use a fundamental result that relates covering numbers and n-widths. We define the entropy number ε_n := ε_n(ℳ) as the smallest value of ε > 0 such that there exists a covering of ℳ by 2^n balls of radius ε. Then, Carl's inequality [13] states that for any s > 0,

    (n + 1)^s ε_n ≤ C_s sup_{0≤m≤n} (m + 1)^s d_m(ℳ).    (2.19)
While this estimate is more favorable than (2.17), it is still prohibitive: it shows that an ε-net of ℳ has cardinality of order exp(cε^{−1/s}) when the n-widths decay like n^{−s}. Moreover, the construction of such an ε-net, as in the proof of Carl's inequality, necessitates the knowledge of the approximation spaces that achieve the n-width accuracy n^{−s}, which is precisely the objective of the greedy algorithm.
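The following lines give a rough feeling for these orders of magnitude; the constants (set to 1) and the polynomial comparison exponent are arbitrary and purely illustrative.

```python
# Size of an eps-net of cardinality ~ exp(eps^{-1/s}) versus a polynomial budget.
import math
for s in (1.0, 2.0):
    for eps in (1e-1, 1e-2, 1e-3):
        log10_net = eps ** (-1.0 / s) / math.log(10.0)    # log10 of exp(eps^{-1/s})
        log10_poly = 3.0 * math.log10(1.0 / eps)          # log10 of eps^{-3}, for scale
        print(f"s={s}, eps={eps:.0e}: log10(#net) ~ {log10_net:.1f}, "
              f"log10(poly budget) ~ {log10_poly:.1f}")
```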
The computational cost at each step of the offline stage is determined by the product of #(Ũ) and the cost of evaluating the error bound σ̄_{n,h}(y) for an individual y ∈ Ũ. Therefore, the prohibitive number of error bound evaluations is the limiting factor in practice and poses the main obstruction to the feasibility of certified reduced basis methods in the regime of polynomially decaying n-widths, and hence, in particular, in the context of high parametric dimension.
In what follows, we show that this obstruction can be circumvented by not searching for an ε-net of ℳ but rather defining Ũ by random sampling of U. This approach allows us to significantly reduce the size of the training sets used in the greedy algorithm, while still obtaining reduced bases with the same performance guarantees, now holding with high probability. In order to keep our arguments and notation as simple and clear as possible, we do not consider the issues of space discretization and error estimation, assuming that we have access to σ_n(y) for each individual y ∈ U. As just described, a corresponding finer analysis can incorporate the perturbations induced by using instead u_h(y), σ_{n,h}(y) or σ̄_{n,h}(y), with the same resulting overall performance.

Polynomial approximation
Let V_n := span{u^1, . . . , u^n} be the reduced basis space at the n-th step of the weak greedy algorithm. The next step of the greedy algorithm is to search over U to find a point y ∈ U where σ_n(y) (in practice σ̄_{n,h}(y)) is large, hopefully close to its maximum over U. In this section we show that random sampling gives a discrete set Ũ, of moderate size, on which the maximum of σ_n(y) can be compared with its maximum over all of U with high probability. To obtain a result of this type we use approximation by polynomials.
Recall that Λ ⊂ ℱ is said to be a downward closed set if whenever ν ∈ Λ and ν' ≤ ν, then ν' ∈ Λ, where ≤ is to be understood componentwise. To such a set Λ, we associate the multivariate polynomial space

    𝕍_Λ := { ∑_{ν∈Λ} c_ν y^ν : c_ν ∈ V }

of V-valued polynomials. We define, for s > 0, the class 𝒜^s of maps g from U to V with the following property: for every m ≥ 1, there exists a downward closed set Λ_m ⊂ ℱ with #(Λ_m) = m and a polynomial g_m ∈ 𝕍_{Λ_m} such that

    sup_{y∈U} ‖g(y) − g_m(y)‖ ≤ C m^{−s},

and we define ‖g‖_{𝒜^s} as the smallest constant C for which this holds. Several foundational results in parametric PDEs prove that the solution map y ↦ u(y) belongs to classes 𝒜^s, as already mentioned in our introduction. An important observation for us is that whenever g ∈ 𝒜^s and V_n is a finite dimensional subspace of V, then both P_{V_n} g and g − P_{V_n} g are also in 𝒜^s, with 𝒜^s norms not exceeding that of g. Indeed, for any downward closed set Λ ⊂ ℱ and an approximation g_Λ(y) = ∑_{ν∈Λ} c_ν y^ν of g, the polynomials ∑_{ν∈Λ} P_{V_n} c_ν y^ν and ∑_{ν∈Λ} (c_ν − P_{V_n} c_ν) y^ν belong to 𝕍_Λ and approximate P_{V_n} g and g − P_{V_n} g with the same accuracy, since orthogonal projections have norm one.

The next result shows that when a function belongs to the class 𝒜^s, its maximum over a random set of points Ũ is above a fixed fraction of its maximum over U with some controlled probability.

Lemma 3.1. Let g ∈ 𝒜^s and let m ≥ 1 be such that

    max_{y∈U} ‖g(y)‖ ≥ 4 ‖g‖_{𝒜^s} m^{−s+1}.

If Ũ is a set of N independent draws with respect to the uniform probability measure μ on U, then

    max_{y∈Ũ} ‖g(y)‖ ≥ (1/(8m)) max_{y∈U} ‖g(y)‖

holds with probability at least 1 − (1 − 3/(4m²))^N.

Proof. From the definition of 𝒜^s, there exists a downward closed set Λ with #(Λ) = m and a V-valued polynomial v ∈ 𝕍_Λ such that

    sup_{y∈U} ‖g(y) − v(y)‖ ≤ ‖g‖_{𝒜^s} m^{−s}.    (3.12)

We use the Legendre polynomials to represent v. We denote by (L_k)_{k≥0} the sequence of univariate Legendre polynomials normalized in L²([−1, 1], dt/2). Their multivariate counterparts L_ν(y) := ∏_j L_{ν_j}(y_j) are an orthonormal basis of L²(U, μ), where μ is the uniform probability measure on U. We write v in its Legendre expansion

    v(y) = ∑_{ν∈Λ} c_ν L_ν(y),

where the coefficients c_ν are elements of V. We next invoke a result from [4] which says that for any downward closed set Λ, one has

    max_{y∈U} ∑_{ν∈Λ} |L_ν(y)|² ≤ (#(Λ))².    (3.15)

Thus, it follows from the Cauchy–Schwarz inequality that for any y ∈ U,

    ‖v(y)‖ ≤ ( ∑_{ν∈Λ} |L_ν(y)|² )^{1/2} ( ∑_{ν∈Λ} ‖c_ν‖² )^{1/2} ≤ m ‖v‖_{L²(U,μ)},

where ‖v‖²_{L²(U,μ)} := ∫_U ‖v(y)‖² dμ(y) = ∑_{ν∈Λ} ‖c_ν‖². Now consider the set S := {y ∈ U : ‖v(y)‖ ≥ ½ ‖v‖_{L²(U,μ)}}. Splitting the integral over S and its complement gives

    ‖v‖²_{L²(U,μ)} ≤ ¼ ‖v‖²_{L²(U,μ)} (1 − μ(S)) + ‖v‖²_{L^∞(U)} μ(S).    (3.17)

Inserting the bound ‖v‖_{L^∞(U)} ≤ m ‖v‖_{L²(U,μ)} into (3.17) gives ¾ ≤ (m² − ¼) μ(S). In other words,

    μ(S) ≥ 3/(4m²).

Suppose now that Ũ is a set formed by N independent draws with respect to the uniform measure on U. The probability that none of these draws is in S is at most (1 − 3/(4m²))^N. So, with probability greater than 1 − (1 − 3/(4m²))^N, there is a point ỹ ∈ Ũ ∩ S, for which

    ‖v(ỹ)‖ ≥ ½ ‖v‖_{L²(U,μ)} ≥ (1/(2m)) ‖v‖_{L^∞(U)} ≥ (1/(2m)) ( max_{y∈U} ‖g(y)‖ − ‖g‖_{𝒜^s} m^{−s} ).

Accordingly, with at least the same probability, we have from (3.12)

    max_{y∈Ũ} ‖g(y)‖ ≥ ‖g(ỹ)‖ ≥ ‖v(ỹ)‖ − ‖g‖_{𝒜^s} m^{−s} ≥ (1/(2m)) ( max_{y∈U} ‖g(y)‖ − ‖g‖_{𝒜^s} m^{−s} ) − ‖g‖_{𝒜^s} m^{−s} ≥ (1/(8m)) max_{y∈U} ‖g(y)‖,

where the last inequality uses the assumption max_{y∈U} ‖g(y)‖ ≥ 4 ‖g‖_{𝒜^s} m^{−s+1} together with m ≥ 1, which proves the lemma.
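The inverse inequality (3.15) invoked in the proof can be checked numerically. The sketch below probes the quantity ∑_{ν∈Λ} |L_ν(y)|² at random points for a simplex-type lower set in dimension 3 (an arbitrary illustrative choice) and compares it with (#(Λ))².

```python
# Numerical check of (3.15) for a downward closed (lower) set.
import itertools
import numpy as np
from numpy.polynomial.legendre import legval

d, total_degree = 3, 4
Lambda = [nu for nu in itertools.product(range(total_degree + 1), repeat=d)
          if sum(nu) <= total_degree]                       # downward closed by construction

def L(k, t):
    """Univariate Legendre polynomial normalized in L^2([-1,1], dt/2)."""
    coeff = np.zeros(k + 1); coeff[k] = 1.0
    return np.sqrt(2 * k + 1) * legval(t, coeff)

rng = np.random.default_rng(1)
Y = rng.uniform(-1.0, 1.0, size=(100000, d))                # random probe of U
K = sum(np.prod([L(nu[j], Y[:, j]) for j in range(d)], axis=0) ** 2 for nu in Lambda)
print(K.max(), "<=", len(Lambda) ** 2)                      # observed max vs (#Lambda)^2
```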
Remark 3.2. Intuitively, the above proof relies on the fact that g is close to a polynomial v ∈ 𝕍_Λ, and that the L^∞ norm of v on a sufficiently fine discrete set Ũ is comparable to its L^∞ norm on the continuous domain U. A general line of research is to look for equivalences between discrete and continuous norms for a given n-dimensional space V_n of functions, thus of the form

    c ‖v‖^p_{L^p} ≤ (1/#(Ũ)) ∑_{ỹ∈Ũ} |v(ỹ)|^p ≤ C ‖v‖^p_{L^p},   v ∈ V_n.    (3.23)

Such results are referred to as Marcinkiewicz-type discretization theorems, see [16] for a recent survey. In several settings where V_n consists of algebraic or trigonometric polynomials, it is known that random sampling yields such inequalities with high probability at a sampling budget #(Ũ) that grows polynomially, and sometimes linearly, with n. Note that the inequality (3.15) is used in [4] to show that for L² norms and downward closed polynomial spaces in any dimension, the norm equivalence (3.23) holds with high probability for random samples of cardinality O((#(Λ))²) up to logarithmic factors.

Remark 3.3. The uniform measure μ may be replaced by the tensor product Chebyshev measure

    dρ(y) := ∏_j ( dy_j / (π (1 − y_j²)^{1/2}) ).    (3.24)

For this measure, the result from [4] invoked in the proof of Lemma 3.1 holds with (#(Λ))² replaced by (#(Λ))^θ, θ := ln 3/ln 2 < 2, when the Legendre polynomials are replaced by tensorized Chebyshev polynomials normalized in L²(U, ρ). As a consequence, the conclusion of Lemma 3.1 holds for N independent draws from ρ with m² replaced by m^θ in the probability bound, and with the threshold and the fraction modified accordingly.
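The norm equivalence (3.23) mentioned in Remark 3.2 can also be observed empirically: for a downward closed polynomial space and a random uniform sample whose size is a moderate multiple of (#(Λ))², the empirical Gram matrix of the tensorized Legendre basis is close to the identity. The set Λ and the oversampling factor below are arbitrary illustrative choices.

```python
# Empirical check of the L^2 norm equivalence (3.23) with random samples.
import itertools
import numpy as np
from numpy.polynomial.legendre import legval

d, total_degree = 2, 5
Lambda = [nu for nu in itertools.product(range(total_degree + 1), repeat=d)
          if sum(nu) <= total_degree]

def L(k, t):
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt(2 * k + 1) * legval(t, c)

rng = np.random.default_rng(2)
N = 10 * len(Lambda) ** 2                                   # a few times (#Lambda)^2
Y = rng.uniform(-1.0, 1.0, size=(N, d))
Phi = np.column_stack([np.prod([L(nu[j], Y[:, j]) for j in range(d)], axis=0)
                       for nu in Lambda])                   # design matrix
G = Phi.T @ Phi / N                                         # empirical Gram matrix
print(np.linalg.norm(G - np.eye(len(Lambda)), 2))           # well below 1 => norms equivalent
```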

The main result
We are now in a position to formulate our main result. We suppose that we are given an error tolerance ε and we wish to use a greedy algorithm to construct a space V_n such that, with high probability, say probability greater than 1 − δ, we have

    dist(ℳ, V_n) ≤ ε,    (4.1)

with n hopefully small and the offline complexity also acceptable. We assume that the solution map y ↦ u(y) belongs to 𝒜^s for some s > 2 and that we have an upper bound M_0 for ‖u‖_{𝒜^s}. This assumption is known to hold in a great variety of settings of parametric PDEs, see [6].
Given ε and the user prescribed failure probability δ, we first define m and n̄ as the smallest integers such that

    32 M_0 m^{−s+2} ≤ ε   and   2^{4s+2} m^s (2n̄)^{−s} ≤ 1.    (4.2)
We then define N as the smallest integer such that

    (1 − 3/(4m²))^N ≤ δ/(2n̄).    (4.3)

We consider the following greedy algorithm for finding a reduced basis. In the first step, we make N independent draws of the parameter y according to the uniform measure μ. This produces a set Ũ_0 of cardinality N. We then use u^1 := u(y^1), where y^1 is a maximizer of ‖u(y)‖ over Ũ_0, as the first reduced basis element, and set V_1 := span{u^1}. More generally, once V_n has been constructed, we draw a new set Ũ_n of N independent points, select y^{n+1} as a maximizer of σ_n(y) over Ũ_n, and set u^{n+1} := u(y^{n+1}) and V_{n+1} := V_n ⊕ span{u^{n+1}}. In an actual implementation, the quantity σ_n(y) is replaced by the surrogate σ̄_{n,h}(y) and the snapshot u(y^{n+1}) is only approximated as described in Section 2. However, we do not incorporate these facts in the analysis that follows in order to simplify the presentation. Let

    σ̂_n := max_{y∈Ũ_n} σ_n(y)   and   σ_n := max_{y∈U} σ_n(y) = dist(ℳ, V_n)

be the computed error and the true error for the approximation of ℳ by V_n. We terminate the algorithm at the smallest integer n ≤ 2n̄ for which σ̂_n ≤ ε/(8m). If this does not occur before 2n̄ steps, we then terminate after step n = 2n̄ has been completed. The space V_n that has been generated is the output of the algorithm.

Theorem 4.1. (i) With probability at least 1 − δ, the output V_n of the above algorithm satisfies dist(ℳ, V_n) ≤ ε. (ii) If, in addition, d_n(ℳ) ≤ C_0 max{1, n}^{−t} for some C_0 > 0 and t > 0, then, with the same probability, the algorithm terminates at a step

    n(ε) ≤ C ε^{−(1/t)(1 + 3/(s−2))},    (4.7)

and requires N(ε) error bound evaluations, where

    N(ε) ≤ C ε^{−(1/t)(1 + 3/(s−2)) − 2/(s−2)} ln(2n̄/δ).    (4.8)

The constants in the above bounds depend only on (s, t, C_0, M_0).
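A schematic implementation of this randomized greedy loop is sketched below. The parameters m, n̄ and N are computed from ε, δ, s and M_0 as in (4.2) and (4.3), the idealized error σ_n(y) is used in place of its computable surrogate, and `solution_map` together with the toy example at the end are placeholders; note that N can become very large for small ε, so the toy run uses a loose tolerance.

```python
# Randomized greedy reduced basis construction (schematic).
import math
import numpy as np

def parameters(eps, delta, s, M0):
    """Smallest m, nbar, N satisfying (4.2)-(4.3); requires s > 2."""
    m = max(1, math.ceil((32.0 * M0 / eps) ** (1.0 / (s - 2.0))))
    nbar = math.ceil(2.0 ** (3.0 + 2.0 / s) * m)     # equivalent to 2^{4s+2} m^s (2 nbar)^{-s} <= 1
    N = math.ceil(math.log(delta / (2.0 * nbar)) / math.log(1.0 - 0.75 / m ** 2))
    return m, nbar, N

def randomized_greedy(solution_map, d, eps, delta, s, M0, rng):
    m, nbar, N = parameters(eps, delta, s, M0)
    dim_V = np.atleast_1d(solution_map(np.zeros(d))).size   # ambient dimension (illustrative)
    basis = np.zeros((0, dim_V))
    for _ in range(2 * nbar):
        Y = rng.uniform(-1.0, 1.0, size=(N, d))             # fresh random training set U_n
        U = np.array([solution_map(y) for y in Y])
        residuals = U - U @ basis.T @ basis                  # u(y) - P_{V_n} u(y)
        errors = np.linalg.norm(residuals, axis=1)
        if errors.max() <= eps / (8.0 * m):                  # termination test on sigma_hat_n
            break
        v = residuals[np.argmax(errors)]
        basis = np.vstack([basis, v / np.linalg.norm(v)])
    return basis

# Toy usage with a smooth 4-parameter map into R^40 (purely illustrative).
rng = np.random.default_rng(0)
toy = lambda y: 1.0 / (3.0 + 0.3 * np.outer(np.linspace(0.0, 1.0, 40), y).sum(axis=1))
V = randomized_greedy(toy, d=4, eps=0.5, delta=0.1, s=4.0, M0=1.0, rng=rng)
print(V.shape)
```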

Remark 4.2.
Note that the assumption u ∈ 𝒜^s implies in particular that the Kolmogorov n-width of ℳ decays at least like n^{−s}, and therefore we may assume that t ≥ s in the above theorem, although this is not used in the proof.
Proof. We first show that, with probability greater than 1 − δ, the algorithm produces at each step n ≤ 2n̄ a snapshot u^{n+1} = u(y^{n+1}) which realizes a weak greedy algorithm, applied over all of U, with parameter γ := 1/(8m). Indeed, for any n, let g_n(y) := u(y) − P_{V_n} u(y) be the error function at the step after V_n is defined. As shown in the previous section, g_n ∈ 𝒜^s and ‖g_n‖_{𝒜^s} ≤ M_0. Since the algorithm has not terminated, we have

    max_{y∈U} σ_n(y) ≥ σ̂_n > ε/(8m) ≥ 4 M_0 m^{−s+1},

where the last inequality is the first condition in (4.2). Therefore, we can apply Lemma 3.1 to g_n and find that with probability greater than 1 − (1 − 3/(4m²))^N, and thus from (4.3) with probability greater than 1 − δ/(2n̄), we have

    σ̂_n = max_{y∈Ũ_n} σ_n(y) ≥ (1/(8m)) max_{y∈U} σ_n(y).    (4.10)

This means that with this probability the function u^{n+1} is a selection of the weak greedy algorithm with parameter γ. Since the draws are independent and there are at most 2n̄ sets Ũ_n, the union bound implies that with probability at least 1 − δ, the sequence u^1, u^2, . . . is a realization of the weak greedy algorithm with this parameter. For the remainder of the proof, we place ourselves in this favorable event.
Now consider the termination of the algorithm. If n < 2n̄, then

    σ_n ≤ 8m σ̂_n ≤ ε,    (4.11)

and so dist(ℳ, V_n) ≤ ε. We now check the case n = 2n̄. Since, by assumption, the solution map belongs to 𝒜^s, we know that

    d_n(ℳ) ≤ M_0 max{1, n}^{−s}.

From the estimates on the performance of the weak greedy algorithm given in Theorem 2.1, we thus know that

    dist(ℳ, V_{2n̄}) ≤ 2^{4s+1} (8m)² M_0 (2n̄)^{−s} ≤ 32 M_0 m^{−s+2} ≤ ε,

where we have used the product of the two conditions in (4.2). Hence at step n = 2n̄ we have dist(ℳ, V_n) ≤ ε. Therefore, we have completed the proof of (i).

We next prove (ii). So assume that d_n(ℳ) ≤ C_0 max{1, n}^{−t}, for some t > 0. Then, according to Theorem 2.1,

    dist(ℳ, V_n) ≤ 2^{4t+1} (8m)² C_0 n^{−t},   n ≥ 1.    (4.14)

It follows that the numerical algorithm will terminate at a step n(ε) with n(ε) ≤ n_0, where n_0 is the smallest integer that satisfies

    2^{4t+1} (8m)² C_0 n_0^{−t} ≤ ε/(8m).

Using the first condition in (4.2), this leads to the estimate (4.7) with multiplicative constant C := (2^{4t+10} C_0)^{1/t} (32 M_0)^{3/(t(s−2))}. Finally, to execute the algorithm, we need to draw n(ε) sets (Ũ_0, . . . , Ũ_{n(ε)−1}), each of them of size N. The total number of error bound evaluations is thus

    N(ε) = n(ε) N.

From the definition of N in (4.3), we derive that

    N ≤ 1 + (4m²/3) ln(2n̄/δ).

Using the first condition in the definition (4.2) of m, this leads to

    N ≤ C ε^{−2/(s−2)} ln(2n̄/δ),

where C depends on s and M_0. Combining this with (4.7), we obtain (4.8), which concludes the proof of (ii).

Remark 4.3. The above theorem can be improved by sampling according to the tensor product Chebyshev measure (3.24), in view of Remark 3.3. In this case, the requirement on the solution map is weakened to u ∈ 𝒜^s for some s > ln 3/ln 2, the quantity m² is replaced by m^θ, θ := ln 3/ln 2, in the definitions of m, n̄ and N, and the termination threshold is modified accordingly. With the exact same proof, we reach the same statement as Theorem 4.1, however with a smaller number of steps and of error bound evaluations.

Let us comment on the difference in performance between the above algorithm using random sampling and the greedy algorithm based on using an ε-net for the solution manifold as a training set. We aim at a target accuracy ε, and assume that the n-widths of the solution manifold decay like d_n(ℳ) ≤ C n^{−t}. Then, the approach based on an ε-net constructs a reduced basis space of optimal dimension n(ε) ∼ ε^{−1/t}, but the total number of error bound evaluations scales like the cardinality of the ε-net, which grows exponentially in ε^{−1/t}. In contrast, the total number of error bound evaluations in the randomized algorithm is given by (4.8): it is polynomial in ε^{−1} and only logarithmic in δ^{−1}, where δ is the probability of failure. In summary, while our approach allows for a dramatic reduction in the offline cost, it comes with a loss of optimality in the performance of the reduced basis spaces, since n(ε) scales like ε^{−1} with an exponent larger than 1/t. In particular, this affects the resulting online cost. Inspection of the proof of the main theorem reveals that this loss comes from the fact that the greedy selection from the random set can only be identified with a weak greedy algorithm with a parameter γ = 1/(8m), which, instead of being fixed, becomes small as m grows, or equivalently as ε decreases. Let us still observe that the above perturbation of 1/t by 3/(t(s−2)) becomes negligible as s gets larger. We have also seen that this perturbation can be reduced by sampling according to the Chebyshev measure, as explained in Remark 4.3. This leaves open the question of finding a sampling strategy for the training set which leads to reduced bases of optimal complexity n(ε) ∼ ε^{−1/t}, and where the number of error bound evaluations in the offline stage remains polynomial in ε^{−1}.

Numerical illustration
The results that we have obtained in the previous section can be rephrased in the following way: a polynomial rate of decay of the error achieved by the greedy algorithm, in terms of the reduced basis space dimension n, can be maintained when using a random training set of cardinality that scales polynomially in n. More precisely, in view of (4.7) and (4.8), a sufficient scaling of the training set cardinality is n^{β*} for some exponent β* depending only on s and t, up to logarithmic factors, independently of the parametric dimension d. Note that we obviously need a training set of cardinality at least n, since the n reduced basis elements are selected from it.

In this section, we illustrate these findings through the following numerical test: we consider the elliptic diffusion equation (1.7) with a diffusion coefficient of the form (1.8), where ψ_j := c_j χ_{D_j} and {D_1, . . . , D_d} is a uniform partition of the physical domain D into d = k × k squares of equal size. We take k = 8 and therefore d = 64, and take c_j = j^{−b}, for some b > 0. The effect of taking larger b is to raise the anisotropy of the parametric dependence. The results from [1] show that this is directly reflected by a rate n^{−t} of sparse polynomial approximation of the parameter-to-solution map, for any t < b − 1/2, and therefore by the rate of decay of the n-width d_n(ℳ). The effect of taking b closer to 0 is to make the problem more degenerate as y_1 gets close to −1.
We test the performance of the reduced basis spaces generated by the greedy algorithm using random training sets ℳ̃ of cardinality

    N = N(n) = ⌊n^β⌋,    (5.4)

for some β ≥ 1. Note that the case β = 1 amounts to selecting the reduced basis elements completely at random, since all the elements from ℳ̃ are then necessarily chosen. We expect the performance to improve as β becomes larger, since we then perform a particular selection of the reduced basis elements within ℳ̃ through the greedy algorithm. On the other hand, our theoretical results indicate that a fixed value β = β* should be sufficient to ensure that the algorithm performs almost as well as if the selection process took place on the whole of ℳ.
In our numerical test, we took ā − 1 = 10^{−2}, that is, ā = 1.01. As to the parametric dimension, we considered d = 16 and d = 64, corresponding to subdivisions of D into 4 × 4 and 8 × 8 subdomains, respectively. As to the decay of the c_j, we test the values b = 1 and b = 2. Finally, for the growth of the training sample size N(n), we test the values

    β = 1, 1.25, 1.5, 1.75, 2.    (5.5)

The error curves of the reduced basis approximation, averaged over 20 realizations of the random training sample, for the various choices of d, b and β, are displayed in Figures 1 and 2. The reduced basis approximation error has been computed by using a separate test sample of cardinality similar to that of the training sample.
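For concreteness, the following sketch sets up the ingredients of this test that do not require a PDE solver: the piecewise-constant parametric diffusion coefficient on the k × k partition and the random training sets of cardinality ⌊n^β⌋. The grid resolution, the subdomain ordering and the omitted finite element solver are illustrative choices.

```python
# Parametric diffusion coefficient a(x, y) = abar + sum_j y_j j^{-b} chi_{D_j}(x)
# on a k x k partition of the unit square, and random training sets (schematic).
import numpy as np

def diffusion_field(y, abar=1.01, b=2.0, k=8, pts=64):
    """Evaluate the piecewise-constant coefficient on a pts x pts grid."""
    x = (np.arange(pts) + 0.5) / pts
    ix = np.minimum((x * k).astype(int), k - 1)              # subdomain index along each axis
    cell = ix[:, None] * k + ix[None, :]                      # D_j index of each grid point
    j = cell + 1                                               # 1-based subdomain numbering
    return abar + y[cell.ravel()].reshape(pts, pts) * j.astype(float) ** (-b)

def random_training_set(n, beta, d, rng):
    """Random training parameters of cardinality floor(n^beta), as in (5.4)."""
    return rng.uniform(-1.0, 1.0, size=(int(np.floor(n ** beta)), d))

rng = np.random.default_rng(3)
Y_train = random_training_set(n=30, beta=2.0, d=64, rng=rng)
a = diffusion_field(Y_train[0])
print(Y_train.shape, a.min(), a.max())                        # a stays >= abar - 1 = 0.01
```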
As predicted by the theoretical approximation results, the convergence rate of the reduced basis method improves as b gets larger. The observed convergence rates (for the highest value of β) are closer to the value b than to b − 1/2, which is the rate that can be rigorously established from the polynomial approximation results. This reflects the fact that reduced basis approximations generally perform better than sparse polynomial approximations. We also note that, for the same value of b, the errors are smaller in the higher parametric dimension d = 64 than for d = 16. This apparent paradox can be explained: in both cases, the most active variables are the first ones (y_1, then y_2, . . .), yet they are associated to subdomains of smaller size in the high-dimensional case, and therefore have less impact on the variation of the solution with these variables.
As expected, we observe that the error curves improve as we increase the value of β, but this improvement stagnates at β = 2. This hints that the scaling N(n) = n² is practically sufficient to ensure, in this case, the optimal convergence behaviour that would be met with a very rich training set, for example an ε-net. The value β = 2 is much smaller than the value β* given by the theoretical analysis, which is thus too pessimistic.