Goal-oriented error estimation for parameter-dependent nonlinear problems

The main result of this paper gives a numerically efficient method to bound the error that is made when approximating the output of a nonlinear problem depending on an unknown parameter (described by a probability distribution). The class of nonlinear problems under consideration includes high-dimensional nonlinear problems with a nonlinear output function. A goal-oriented probabilistic bound is computed in two phases: an offline phase dedicated to the computation of a reduced model, during which the full nonlinear problem needs to be solved only a small number of times, and an online phase which approximates the output. This approach is applied to a toy model and to a nonlinear partial differential equation, more precisely the Burgers equation with unknown initial condition given by two probabilistic parameters. The savings in computational cost are evaluated and presented.


Introduction
Numerical simulation is a key component of numerous domains, for instance industry, environment, engineering and physics. In some cases time is the limiting factor, and the numerical simulation must be both very fast and accurate. For example, the control of the trajectory of a space satellite may require efficient real-time computations. Another example would be the iterative optimization algorithm used in numerical weather prediction, which requires numerous calls to a numerical atmosphere model, to be performed in a limited time. In both examples, the computing time is a key factor: it must be very short, either because the computation is repeated many times in a relatively short interval (many-query context) or because the result cannot wait (real-time context).
In this context, it is crucial to provide fast numerical evaluations for linear or nonlinear problems. The corresponding procedures are generally called "metamodelling", "model reduction" or "dimension reduction". They consist in replacing the existing model, called the "full" model, by a fast approximation. There exist both stochastic and deterministic approaches to building such approximations; among the stochastic ones, one can for instance mention polynomial-based metamodels. Some numerical experiments are presented in Section 4, first with a linear transport equation, then with the nonlinear Burgers partial differential equation. Section 5 contains some concluding remarks and Appendix A collects the proofs of some intermediate results.

Problem statement
Let P ⊂ R^d denote a parameter space, and let P be a probability distribution on P. Let X (resp. Y) be a vector space of dimension N (resp. S) endowed with a scalar product ⟨·, ·⟩_X (resp. ⟨·, ·⟩_Y). In the following, when there is no ambiguity, the dependence on the vector space will be omitted and the scalar product simply denoted by ⟨·, ·⟩. Let us consider a nonlinear function M : P × X → Y. Given a parameter µ ∈ P, we denote by u(µ) ∈ X a solution to the equation: M(µ, u(µ)) = 0, (2.1) and we define the output by s(µ) = ⟨ℓ, u(µ)⟩_X (2.2) for a given ℓ ∈ X.
We assume that for every µ ∈ P, equation (2.1) admits a unique solution in X, so that the application s : P → R is well-defined.
In a many-query context, that is, in a context requiring a potentially large number of evaluations of the output, it is common to call for model reduction. More precisely, let X̃ be a subspace of X, of dimension Ñ such that Ñ ≪ N. We consider ũ : P → X̃, an approximation (in a very wide sense of the term) of u : P → X. Let us define the approximate output s̃(µ) by s̃(µ) = ⟨ℓ, ũ(µ)⟩_X.
For the sake of efficiency, the computation of the approximate output can be split into two phases: • an offline phase, dedicated to the construction of the reduced model ũ, during which one has to solve the full dimensional problem (2.1) only for a reasonably small number of parameters µ_1, . . . , µ_κ; • an online phase, during which we evaluate the approximate output s̃(·) = ⟨ℓ, ũ(·)⟩_X for all queried µ.
In practice, for any µ ∈ P, the computational time of ũ(µ) is much smaller than that of u(µ); hence this splitting into offline and online phases can be interesting in terms of overall computing time: the offline phase can be computationally expensive, provided that the number of queries is large enough and/or the online phase per query is fast enough.
In this article, we will not focus on ways of constructing efficient offline-online approximation procedures for ũ(µ), as in, e.g., [11,16,21]. The assumptions on the approximation procedure in use are very mild (see Sect. 3.5 and more specifically Lem. 3.11). Under these mild assumptions, we propose hereafter a new procedure to compute efficiently, using an offline/online decomposition, a goal-oriented probabilistic error bound ε(µ; α) which generalizes the error bound described in [9] (see also [10] for further results in control theory).

Probabilistic nonlinear error bound
In this section, we aim at providing a goal-oriented probabilistic error bound on the output. In [9], the authors propose such an error bound in the linear context, that is, assuming that for any µ ∈ P, the operator M(µ, ·) : X → Y is affine (a linear operator plus a constant), and that the output is also linear. In the sequel we will call this case linear, as opposed to the nonlinear case where the model is not affine.
By accepting a small risk α ∈ (0, 1) that this bound could be violated, the authors avoid the use of (often pessimistic) Lipschitz bounds. In this section, we extend the results in [9] to the cases where the operator M(µ, ·) : X → Y is nonlinear. In Section 3.2, the output is assumed to be linear, then in Section 3.3, the output may be nonlinear.
To derive an error bound, it seems natural to consider the so-called residual r(µ) = M(µ, ũ(µ)). (3.1) In the sequel we explain why we need to define a new adjoint. To do so, we recall the computations of the linear case, in order to draw the parallel with the nonlinear case and motivate the need for a new adjoint definition. In the linear case, M(µ, u) = A(µ)u − b(µ), where b(µ) ∈ Y is a given vector and A(µ) : X → Y is a linear operator which is assumed to be invertible. In the following, A*(µ) denotes the (linear) adjoint of A(µ). We can define w(µ) ∈ Y as the solution of the so-called dual problem: A*(µ) w(µ) = ℓ, (3.2) where ℓ ∈ X is used in the definition of the linear output in (2.2). Let Φ = {φ_1, . . . , φ_S} denote any orthonormal basis of Y. We then have s̃(µ) − s(µ) = Σ_{i=1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y. (3.3) In order to adapt this procedure to the nonlinear context, we need to define a generalization of the adjoint for nonlinear operators that still allows (3.3). In Section 3.1, we will define M* : P × X × X × Y → X, and the associated adjoint problem for w(µ): M*(µ, u(µ), ũ(µ), w(µ)) = ℓ, (3.4) which generalizes (3.2).

Finite difference adjoint of an operator
The definition of the adjoint problem for nonlinear operators is not unique. In order to generalize (3.3) to nonlinear problems, we require a specific property, which leads us to choose the following definition: Definition 3.1 (Finite difference adjoint). We call finite difference adjoint any operator M* : P × X × X × Y → X, linear in the last variable, such that the following identity holds: ∀µ ∈ P, ∀x_1, x_2 ∈ X, ∀y ∈ Y, ⟨x_1 − x_2, M*(µ, x_1, x_2, y)⟩_X = ⟨M(µ, x_1) − M(µ, x_2), y⟩_Y. (3.5) Let us underline that previous definitions of the nonlinear adjoint, such as, e.g., the one offered by Definition 2.1 in [20], do not readily allow for this property. In our case the dependence on both x_1 and x_2 is crucial, and missing in previous definitions. Indeed, as we will see later on in equation (3.10), we do need a dependence on u and ũ, and not only on u − ũ as in the linear case. A similar trick can be found in [2] (p. 12), formalized below in Proposition 3.2, in which we propose a candidate formula for the finite difference adjoint.
Proposition 3.2. Assume that the operator M : P × X → Y is continuously Fréchet-differentiable with respect to the second variable. Let dM(µ, x) : X → Y denote the Fréchet-derivative of M with respect to x ∈ X at (µ, x), and let dM*(µ, x) : Y → X denote the (linear) adjoint of dM(µ, x). Then the operator defined by M*(µ, x_1, x_2, y) = ∫_0^1 dM*(µ, x_2 + t(x_1 − x_2)) y dt (3.6) is a finite difference adjoint in the sense of Definition 3.1. Proof of Proposition 3.2. The proof is postponed to Appendix A.1.
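Formula (3.6) can be checked numerically. The sketch below, written for an assumed toy model M(µ, x) = Ax + µ x³ (componentwise cube; not the paper's model), approximates the integral in (3.6) by Gauss-Legendre quadrature and verifies the defining identity (3.5):

```python
import numpy as np

# Toy nonlinear model: M(mu, x) = A x + mu * x**3 (componentwise cube),
# an illustrative stand-in only. Its Jacobian is dM(mu, x) = A + 3*mu*diag(x**2).
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))

def M(mu, x):
    return A @ x + mu * x**3

def dM(mu, x):
    return A + 3.0 * mu * np.diag(x**2)

def M_star(mu, x1, x2, y, n_quad=20):
    """Finite difference adjoint, formula (3.6):
    M*(mu, x1, x2, y) = int_0^1 dM(mu, x2 + t(x1 - x2))^T y dt,
    approximated by Gauss-Legendre quadrature mapped to (0, 1)."""
    t, w = np.polynomial.legendre.leggauss(n_quad)
    t, w = 0.5 * (t + 1.0), 0.5 * w
    out = np.zeros(N)
    for ti, wi in zip(t, w):
        out += wi * dM(mu, x2 + ti * (x1 - x2)).T @ y
    return out

# Check the defining identity (3.5):
# <x1 - x2, M*(mu, x1, x2, y)>_X = <M(mu, x1) - M(mu, x2), y>_Y
mu = 0.7
x1, x2, y = rng.standard_normal(N), rng.standard_normal(N), rng.standard_normal(N)
lhs = (x1 - x2) @ M_star(mu, x1, x2, y)
rhs = (M(mu, x1) - M(mu, x2)) @ y
assert abs(lhs - rhs) < 1e-8
```

For this cubic toy model the integrand is polynomial in t, so the quadrature is exact and the identity holds up to round-off.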
From now on, we will choose formula (3.6) for the adjoint of M. The following properties are immediate: 1. In the linear case M(µ, x) = A(µ)x − b(µ), formula (3.6) reduces to M*(µ, x_1, x_2, y) = A*(µ)y, so that the usual linear adjoint is recovered. 2. For all µ ∈ P, and for all x_1, x_2 ∈ X, M*(µ, x_1, x_2, ·) is linear.
Let us now consider the adjoint problem described by (3.4): find w(µ) solution of M*(µ, u(µ), ũ(µ), w(µ)) = ℓ. (3.7) This problem is linear. Let us assume that, for all µ ∈ P, it admits a solution. We then have the following lemma: Lemma 3.3. Assume that the operator M : P × X → Y is continuously Fréchet-differentiable with respect to the second variable, and let M* be defined by (3.6). Let s(µ) = ⟨ℓ, u(µ)⟩_X and s̃(µ) = ⟨ℓ, ũ(µ)⟩_X, let r(µ) be defined in (3.1), let w(µ) be a solution of (3.7) and let Φ = {φ_1, . . . , φ_S} denote any orthonormal basis of Y. Then it holds s̃(µ) − s(µ) = Σ_{i=1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y. (3.8) Proof of Lemma 3.3. Proposition 3.2 applies, and Item 2 after Proposition 3.2 states that M* is linear in its fourth argument; thus the adjoint problem described in (3.4) is linear. We assume that for all µ ∈ P it admits a solution w(µ).
Following the beginning of the proof of Theorem 1.1 in [9], we expand the residual in the basis Φ: r(µ) = Σ_{i=1}^S ⟨r(µ), φ_i⟩_Y φ_i. (3.9) Then: s̃(µ) − s(µ) = ⟨ℓ, ũ(µ) − u(µ)⟩_X. As w(µ) is solution of (3.4), we get: s̃(µ) − s(µ) = −⟨u(µ) − ũ(µ), M*(µ, u(µ), ũ(µ), w(µ))⟩_X. (3.10) Then, applying Identity (3.5), and using M(µ, u(µ)) = 0, we obtain: s̃(µ) − s(µ) = −⟨M(µ, u(µ)) − M(µ, ũ(µ)), w(µ)⟩_Y = ⟨r(µ), w(µ)⟩_Y. Finally, recalling (3.9), and as the basis Φ is orthonormal, we get (3.8).
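The error representation (3.8) can be verified end to end on a small example. The sketch below assumes an illustrative model M(µ, u) = Au + µu³ − b (not the paper's), solves the full problem by Newton iteration, takes a perturbed solution as a stand-in for a reduced approximation, solves the adjoint problem (3.7), and checks (3.8) in the canonical (orthonormal) basis:

```python
import numpy as np

# Assumed toy system M(mu, u) = A u + mu * u**3 - b = 0, with a well-conditioned A.
rng = np.random.default_rng(1)
N = 5
A = np.eye(N) * 4.0 + 0.1 * rng.standard_normal((N, N))
b = rng.standard_normal(N)
ell = rng.standard_normal(N)
mu = 0.3

def M(u):  return A @ u + mu * u**3 - b
def dM(u): return A + 3.0 * mu * np.diag(u**2)

# Full problem: solve M(u) = 0 by Newton iteration.
u = np.zeros(N)
for _ in range(50):
    u -= np.linalg.solve(dM(u), M(u))

# Crude stand-in for a reduced-basis approximation of u.
u_tilde = u + 0.01 * rng.standard_normal(N)

r = M(u_tilde)  # residual (3.1)

# Adjoint problem (3.7): integrate dM along the segment from u_tilde to u
# (Gauss-Legendre quadrature), then solve the transposed system for w.
t, wq = np.polynomial.legendre.leggauss(10)
t, wq = 0.5 * (t + 1.0), 0.5 * wq
Abar = sum(wi * dM(u_tilde + ti * (u - u_tilde)) for ti, wi in zip(t, wq))
w = np.linalg.solve(Abar.T, ell)

s, s_tilde = ell @ u, ell @ u_tilde
# (3.8) in the canonical basis reads: s_tilde - s = <r, w>
assert abs((s_tilde - s) - r @ w) < 1e-10
```

The identity holds to machine precision here because the quadrature is exact for this polynomial model and Newton drives M(µ, u(µ)) to round-off level.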

Probabilistic error bound for a nonlinear model with linear output
This section is devoted to the statement of our probabilistic error bound, in the context where the model is nonlinear and where the output is linear. We now introduce some notation necessary to the statement of our bound. Recall that Φ = {φ_1, . . . , φ_S} denotes any orthonormal basis of Y. Let K ≤ S be a "truncation index". For any i ∈ {1, . . . , K}, we define: β_i^min(Φ) = min_{µ∈P} ⟨w(µ), φ_i⟩_Y and β_i^max(Φ) = max_{µ∈P} ⟨w(µ), φ_i⟩_Y. The probabilistic error bound depends on the residual defined by (3.1): r(µ) = M(µ, ũ(µ)) − M(µ, u(µ)) = M(µ, ũ(µ)).
Our aim is to propose a probabilistic upper bound for |s̃(µ) − s(µ)|. For this, let us consider the right-hand term in (3.8): Σ_{i=1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y. In order to bound this term, it seems natural to use, for any µ ∈ P and any i, the quantities β_i^min(Φ) and β_i^max(Φ), since β_i^min(Φ) ≤ ⟨w(µ), φ_i⟩_Y ≤ β_i^max(Φ). We then introduce a truncation argument K. Our aim is then to bound the truncated sum: by construction, |Σ_{i=1}^K ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y| ≤ T_1(µ, K, Φ) := Σ_{i=1}^K max( |β_i^min(Φ) ⟨r(µ), φ_i⟩_Y|, |β_i^max(Φ) ⟨r(µ), φ_i⟩_Y| ). We refer to Section 3.5 for the computation of T_1(µ, K, Φ). Let us just underline that it will require, during the online phase, the computation of the K scalar products ⟨r(µ), φ_i⟩_Y, i = 1, . . . , K. Under additional assumptions (see Lem. 3.11), it is possible to compute each of these scalar products with a cost independent from the full dimension S. However, we still need to choose a truncation argument K independent from S to keep the cost of the online phase independent from S. Note also that during the offline phase, 2K optimization problems have to be solved for the computation of the β_i^min(Φ) and β_i^max(Φ). Thus if we increase K, we also increase the cost of the offline phase. To deal with the remainder of the sum, we define: T_2(K, Φ) = E_µ[ |Σ_{i=K+1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y| ], with E_µ denoting the expectation with respect to the probability distribution of µ.
Our main result is then: Theorem 3.4. Let α ∈ (0, 1). Under the assumptions of Lemma 3.3, P( |s̃(µ) − s(µ)| ≤ ε(µ; α) ) ≥ 1 − α, where the error bound ε(µ; α) is defined by ε(µ; α) = T_1(µ, K, Φ) + T_2(K, Φ)/α. Remark 3.5. The result of Theorem 3.4 is a generalization of Theorem 1.1 in [9] to nonlinear operators M.
The result of Theorem 3.4 is true for any orthonormal basis Φ of Y . For efficiency reasons, we would like to choose Φ so that the parameter-independent part T 2 (K, Φ) is the smallest possible, for a fixed truncation index K ∈ N * .
To our knowledge, minimizing T 2 (K, Φ) over orthonormal bases of Y is an optimization problem for which no efficient algorithm exists. However, we can minimize an upper bound of T 2 (K, Φ).
We define a self-adjoint, positive semi-definite operator G : Y → Y by: G y = E_µ[ ⟨r(µ), y⟩_Y r(µ) + ⟨w(µ), y⟩_Y w(µ) ], for y ∈ Y. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_S ≥ 0 be the eigenvalues of G. For i ∈ {1, 2, . . . , S}, let φ_i^G be a unit eigenvector of G associated with the ith eigenvalue, and let Φ^G = {φ_1^G, . . . , φ_S^G}. We recall the following result, which will be useful to prove our main result.
Lemma 3.6 (Thm. 1.2. in [9]). It holds T_2(K, Φ^G) ≤ (1/2) Σ_{i=K+1}^S λ_i. Lemma 3.6 suggests the heuristic choice Φ = Φ^G. Indeed, the study of the eigenvalues of G allows to bound T_2(K, Φ^G): e.g., if there exist 0 < ρ < 1 and C > 0 such that λ_i ≤ C ρ^i for all i, then T_2(K, Φ^G) ≤ C ρ^{K+1}/(2(1 − ρ)). In many applications, the eigenvalues of G are observed to decay quickly to zero.
We are now in position to prove our main result.
Proof of Theorem 3.4. We start from the result of Lemma 3.3: s̃(µ) − s(µ) = Σ_{i=1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y. Then, we can argue as in the proof of Theorem 1.1 in [9]. By construction of T_1(µ, K, Φ), one gets: |Σ_{i=1}^K ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y| ≤ T_1(µ, K, Φ). Thus, for any α ∈ (0, 1), P( |s̃(µ) − s(µ)| > ε(µ; α) ) ≤ P( |Σ_{i=K+1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y| > T_2(K, Φ)/α ), where in the last inequality, Lemma 3.3 has been used. Then, by the Markov inequality, using α ∈ (0, 1), and by definition of T_2(K, Φ), we get: P( |Σ_{i=K+1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y| > T_2(K, Φ)/α ) ≤ α. This concludes the proof of Theorem 3.4.
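The Markov-inequality step can be illustrated numerically: for a nonnegative variable with mean m, the probability of exceeding m/α is at most α. For the empirical distribution of any sample, the bound even holds deterministically, as the sketch below (with an arbitrary sample standing in for the tail remainder) shows:

```python
import numpy as np

# Markov inequality: for R >= 0 with mean m, P(R > m / alpha) <= alpha.
# Applied to an empirical sample, the bound holds deterministically:
# #{R_i > m/alpha} * (m/alpha) <= sum(R_i) = n * m, hence the rate is < alpha.
rng = np.random.default_rng(2)
alpha = 0.05
R = rng.exponential(scale=1.0, size=10000)  # stand-in for the tail remainder
m = R.mean()
violation_rate = np.mean(R > m / alpha)
assert violation_rate <= alpha
```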

Corollary: error bound for a nonlinear output
In this section we provide an extension of Theorem 3.4 to the context of a nonlinear output S(µ). To do so, we consider the following problem: Problem 3.7. Find v(µ) ∈ X such that H(µ, v(µ)) = 0, where H : P × X → Y is a (not necessarily linear with respect to the second argument) function, and consider the following output: S(µ) = f(v(µ)), where f is a (not necessarily linear) function from X to R.
In the context of this section, our main result is based on the following lemma. Lemma 3.8. Problem 3.7 can be written in the framework of a (not necessarily linear) model M : P × (X × R) → Y × R and of a linear output s(µ) = ⟨ℓ, u(µ)⟩ with ℓ ∈ X × R.
Proof of Lemma 3.8. It relies on a classical idea in the field of Data Assimilation (see, e.g., [29]), which consists in augmenting the state vector v(µ) with the output S(µ): we set u(µ) = (v(µ), S(µ)) ∈ X × R, where the first component of u(µ) corresponds to v(µ) and its last (real) component to S(µ). We then define M by M(µ, (v, z)) = (H(µ, v), z − f(v)) ∈ Y × R, and consider the following linear output: s(µ) = ⟨ℓ, u(µ)⟩, with ℓ = (0_X, 1) ∈ X × R. Problem 3.7 is then equivalent to: find u(µ) such that M(µ, u(µ)) = 0, with the output s(µ) = S(µ).
This concludes the proof of Lemma 3.8.
By combining Lemma 3.8 with Theorem 3.4, we get an error bound in the context of a nonlinear output S(µ). This gives a solution to Problem 3.7.
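A minimal sketch of this augmentation trick, with illustrative stand-ins for H (affine) and f (nonlinear), shows how the nonlinear output becomes a linear functional of the augmented state:

```python
import numpy as np

# State augmentation: a nonlinear output S(mu) = f(v(mu)) becomes linear in the
# augmented state u = (v, z). H and f below are illustrative stand-ins only.
N = 3
B = np.diag([2.0, 3.0, 4.0])

def H(mu, v): return B @ v - mu * np.ones(N)   # affine model, H(mu, v) = 0
def f(v):     return np.sum(np.exp(v))         # nonlinear output functional

def M_aug(mu, u):
    """Augmented model: M(mu, (v, z)) = (H(mu, v), z - f(v))."""
    v, z = u[:-1], u[-1]
    return np.concatenate([H(mu, v), [z - f(v)]])

ell = np.zeros(N + 1)
ell[-1] = 1.0                                  # linear output functional

mu = 0.5
v = np.linalg.solve(B, mu * np.ones(N))        # solve H(mu, v) = 0
u = np.concatenate([v, [f(v)]])                # augmented solution
assert np.allclose(M_aug(mu, u), 0.0)
assert np.isclose(ell @ u, f(v))               # s(mu) = <ell, u> = S(mu)
```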

Computation of the finite difference adjoint of M
Except in some particular cases, there exists no explicit formulation of the adjoint of M in the context of Proposition 3.2. To illustrate this point, let us consider the case where H is affine (with respect to the second argument). For the sake of simplicity, let us fix in this section X = R^N. Let B(µ) denote the matrix representation of the linear part of H(µ, ·) with respect to the canonical basis of X. Even in that case, as the output is nonlinear, the operator M is also nonlinear. Recall that we assume that M is continuously Fréchet-differentiable with respect to its second variable. We want to provide an explicit formulation for the adjoint of the operator M, starting from (3.6). We first consider dM(µ, ·). For u = (v, z) ∈ R^{N+1}, recall that: M(µ, u) = (H(µ, v), z − f(v)), which leads immediately to: dM(µ, u)(δv, δz) = (B(µ) δv, δz − ∇f(v) · δv), so that dM(µ, u) is the following matrix, defined by blocks: B(µ) (top left) and 0 (top right) on the first block row, −∇f(v)^⊤ and 1 on the second, where the top left block has size S × N, the top right block S × 1, the bottom left 1 × N (as f : R^N → R) and the bottom right lives in R. Then we have, for x, x̃ ∈ R^N, the only term of (3.6) requiring an integral, namely ∫_0^1 ∇f(x̃ + t(x − x̃)) dt, since the block B(µ) does not depend on the state. The above formula cannot be simplified in general: except in special cases, the integral over (0, 1) must therefore be computed numerically. In Section 4 we will consider both cases, analytical (Sect. 4.1) or numerical computation (Sect. 4.2).
Below we provide examples for which an explicit formulation for the integral ∫_0^1 ∇f(x̃ + t(x − x̃)) dt is available. Example 3.9 (Special case N = 1). In the special case where N = 1, we can change variable in the integral: ∫_0^1 f′(x̃ + t(x − x̃)) dt = (f(x) − f(x̃))/(x − x̃). Although this case is exceedingly simple (because for any numerical problem N > 1), this kind of simplification can happen in other cases, as we will see below.
Example 3.10 (Special cases f explicit). In some cases the above integral can also be explicitly computed.
We give a few nonlinear examples below.
For instance, for the exponential output f(v) = Σ_{i=1}^N e^{v_i}, the previous change of variable still applies componentwise, and it holds ∫_0^1 ∂_i f(x̃ + t(x − x̃)) dt = (e^{x_i} − e^{x̃_i})/(x_i − x̃_i), which can therefore be explicitly computed as a function of the coordinates of x and x̃.
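For the exponential output, the closed-form divided difference (e^{x_i} − e^{x̃_i})/(x_i − x̃_i) can be checked against numerical quadrature of the segment integral:

```python
import numpy as np

# Closed form vs. quadrature for int_0^1 exp(x2_i + t*(x1_i - x2_i)) dt,
# which equals (exp(x1_i) - exp(x2_i)) / (x1_i - x2_i) componentwise.
x1 = np.array([0.3, -1.0, 2.0])
x2 = np.array([1.1, 0.5, -0.2])

closed_form = (np.exp(x1) - np.exp(x2)) / (x1 - x2)

# Gauss-Legendre quadrature mapped from (-1, 1) to (0, 1).
t, w = np.polynomial.legendre.leggauss(30)
t, w = 0.5 * (t + 1.0), 0.5 * w
quadrature = sum(wi * np.exp(x2 + ti * (x1 - x2)) for ti, wi in zip(t, w))

assert np.allclose(closed_form, quadrature, atol=1e-10)
```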

Dual error bound in the context of a nonlinear output
Let us come back to our initial purpose, that is, the extension of our procedure to the context of a nonlinear output. The adjoint problem writes: M*(µ, u(µ), ũ(µ), w(µ)) = ℓ. In a general context, the existence of a solution to this problem is not trivial, and may fail. However, if the operator H is linear, even if the output is nonlinear, the adjoint problem writes equivalently: B*(µ) w_Y(µ) = g(µ) and w_R(µ) = 1, where w(µ) = (w_Y(µ), w_R(µ)) ∈ Y × R and g(µ) = ∫_0^1 ∇f(ũ(µ) + t(u(µ) − ũ(µ))) dt; the unicity of the solution is provided as soon as B(µ) is invertible. In other words, w(µ) is equal to: w(µ) = ( (B*(µ))^{-1} g(µ), 1 ).

Efficient bound evaluation in a many-query or real-time context
In practice, the error bound ε(µ; α) used in Theorem 3.4 cannot be directly evaluated, and one has to define a computable approximation ε̂(µ; α). Our approximation is justified and commented in [9], Section 1.3, and we recall it here for the sake of self-containedness. We end this section with Lemma 3.11, which gives sufficient conditions to ensure efficient computation of our online error bound.

Estimation of Φ^G.
We consider a finite subset of parameters Ξ ⊂ P, randomly sampled from the probability distribution P, and we estimate the linear operator G : Y → Y by the linear operator Ĝ : Y → Y defined as: Ĝ y = (1/#Ξ) Σ_{µ∈Ξ} [ ⟨r(µ), y⟩_Y r(µ) + ⟨w(µ), y⟩_Y w(µ) ], and we take as {φ_i}_{i=1,...,K} the unit eigenvectors of Ĝ associated with its K largest eigenvalues. The computation of these eigenvectors can be entirely processed during the offline phase (see [9], Sect. 1.3 for more details).
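A sketch of this offline estimation follows, assuming for illustration that G averages rank-one operators built from r(µ) and w(µ) (an assumption here; see [9] for the precise definition), and with placeholder functions standing in for the residual and dual solutions:

```python
import numpy as np

# Offline estimation of the truncated eigenbasis Phi^G from a sample Xi.
# r_of and w_of are placeholders for mu -> r(mu) and mu -> w(mu).
rng = np.random.default_rng(3)
S, n_sample, K = 50, 200, 10

def r_of(mu): return np.sin(np.arange(S) * mu) / (1 + np.arange(S))  # placeholder
def w_of(mu): return np.cos(np.arange(S) * mu) / (1 + np.arange(S))  # placeholder

Xi = rng.uniform(0.5, 1.0, size=n_sample)        # sample from P
G_hat = np.zeros((S, S))
for mu in Xi:
    r, w = r_of(mu), w_of(mu)
    G_hat += np.outer(r, r) + np.outer(w, w)     # empirical rank-one average
G_hat /= n_sample

eigvals, eigvecs = np.linalg.eigh(G_hat)         # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
Phi_G = eigvecs[:, order[:K]]                    # K dominant unit eigenvectors

assert lam[0] >= lam[-1] >= -1e-12               # G_hat is PSD up to round-off
assert np.allclose(Phi_G.T @ Phi_G, np.eye(K), atol=1e-10)
```

In practice one would also inspect the decay of `lam` to choose the truncation index K, in the spirit of Lemma 3.6.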

Computation of T_1(µ, K, Φ).
Recall that T_1(µ, K, Φ) = Σ_{i=1}^K max( |β_i^min(Φ) ⟨r(µ), φ_i⟩_Y|, |β_i^max(Φ) ⟨r(µ), φ_i⟩_Y| ). The β_i^min(Φ) and β_i^max(Φ) values can be approximated using a simple discrete minimization (i.e., replacing P by a discrete sample Ξ in the minimum/maximum problems defining them). In some cases, one can use a continuous optimization method to solve these minimum/maximum problems. It is clear that all these computations can be done during the offline phase.
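The discrete approximation can be sketched as follows, assuming (our reading of the definitions) that β_i^min and β_i^max are the min/max over P of ⟨w(µ), φ_i⟩_Y, with a placeholder dual solution:

```python
import numpy as np

# Discrete min/max approximation of beta_i^min and beta_i^max over a sample Xi.
# w_of is a placeholder for the parameter-to-dual-solution map mu -> w(mu).
rng = np.random.default_rng(4)
S, K = 20, 5
Phi = np.linalg.qr(rng.standard_normal((S, K)))[0]   # orthonormal columns

def w_of(mu): return np.cos(np.arange(S) * mu)       # placeholder

Xi = rng.uniform(0.5, 1.0, size=500)
coords = np.array([Phi.T @ w_of(mu) for mu in Xi])   # <w(mu), phi_i>, shape (500, K)
beta_min = coords.min(axis=0)
beta_max = coords.max(axis=0)
assert np.all(beta_min <= beta_max)
```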
We now discuss the computation of the K scalar products ⟨r(µ), φ_i⟩_Y for i = 1, . . . , K with an offline/online procedure. Recall that X̃ is a subspace of X, of dimension Ñ such that Ñ ≪ N.

Lemma 3.11. Assume that M admits a decomposition of the form (3.12)–(3.15): for v ∈ X̃, M(µ, v) = Σ_{j=1}^S m_j(µ, v) y_j (3.12), with m_j(µ, v) = Σ_{k=0}^{T_j} Q_{k,j}(v_1, . . . , v_N) h_k(µ) (3.13) and Q_{k,j}(v_1, . . . , v_N) = Σ_{α∈I_{j,k}} q_{j,k,α} Π_{l∈V_α} ϕ_l^α(v_l) (3.14), where #I ≤ M and #V_α ≤ L (3.15), and {ϕ_l}_{l∈{1,...,N}} is a set of functions from R to R. We set T = max_{j=1,...,S} T_j. Assume moreover that, for α ∈ I, l ∈ V_α, the cost of computing ϕ_l^α(v_l) is bounded by some positive constant R. Then, it is possible to compute each of the scalar products ⟨r(µ), φ_i⟩_Y for i = 1, . . . , K, with an offline/online procedure whose online phase has a cost of order T × M × L × R.
Proof of Lemma 3.11. The proof is postponed to Appendix A.2.
Let us now comment the result and the assumptions in Lemma 3.11 with a few remarks.
Remark 3.12. The decomposition (3.13) plays a role analogous to the "affine parameter dependence" that is commonly assumed in the literature on linear problems (see, e.g., [16], p. 1526). In case decomposition (3.13) does not hold but one still wants a complexity independent of the full dimension N, one may employ the empirical interpolation method (EIM) introduced in [1] to recover the required affine decomposition in the parameter µ.
Remark 3.13. Equations (3.14) and (3.15) mean that the functions {Q_{k,j}(v_1, . . . , v_N)}_{j,k} admit a sparse representation on {ϕ_l^α(v_l)}_{α∈I, l∈V_α} and that the interaction order is bounded by L. Let us emphasize that the result in Lemma 3.11 implies that the cost does not depend on the high dimension N. Therefore, if we assume that T, M, L and R ≪ S, then it is possible to compute the K scalar products with an offline/online procedure at a small cost (with respect to S). A bound for the cost in the particular case where ϕ_l^α(v_l) = v_l^{α_l} is provided in Remark A.1 in Appendix A.2.
Remark 3.14. Obtaining such a sparse representation for the {Q_{k,j}(v_1, . . . , v_N)}_{j,k} is not an obvious task and requires the use or development of a high-dimensional approximation tool. Assuming that the interaction order is bounded by L may also be questionable for some applications. Note that, in case these assumptions are not satisfied, it is still possible to work with the K scalar products themselves, without any approximation. In that case, the cost of the online phase is O(S), which is still better than the full problem, whose complexity is O(S^α) with α ≥ 2 in most cases.
In the context of nonlinear partial differential equations, DEIM, a discrete variant of EIM, was suggested and analyzed in [4]; it allows to reduce the computational complexity of the reduced-order model stemming from its dependence on the full dimension N of the nonlinear model. Contrary to DEIM, our approach is not specific to projection-based metamodels.

Approximation of T 2 (K, Φ).
A Monte-Carlo estimator of T_2(K, Φ) is used: T̂_2(K, Φ) = (1/#Ξ) Σ_{µ∈Ξ} | Σ_{i=K+1}^S ⟨r(µ), φ_i⟩_Y ⟨w(µ), φ_i⟩_Y |, where Ξ is a sample of P.
As this quantity is µ-independent, it can be computed once and for all during the offline phase. Of course this computation is a numerical estimation and, as such, can induce approximation errors. Such error analysis, which is related to the central limit theorem, is discussed in Section A of [9].

Computable error bound
We now rely on Theorem 3.4 and set: ε̂(µ; α) = T_1(µ, K, Φ) + T̂_2(K, Φ)/α, where Φ is the estimated basis of the previous subsections. It is an estimator for the error bound ε(µ; α) of Theorem 3.4. As before, the error associated with the approximation is analyzed and discussed in [9], Section A.

First experiments with a toy model
We now apply our error bound to a non-homogeneous linear transport equation with a nonlinear output, using the results of Section 3.3.

Toy model
Let u_e = u_e(t, x) be the solution of the linear transport equation, for all (t, x) ∈ (0, 1) × (0, 1), satisfying prescribed initial and boundary conditions. The parameter µ is chosen in P = [0.5, 1], and P is endowed with the uniform measure. We choose a number of timesteps N_t and a number of space points N_x, we set ∆t = 1/N_t and ∆x = 1/N_x, and we introduce the discrete unknown vector u = (u_i^n)_{i=0,...,N_x; n=0,...,N_t}. We note here that the considered PDE is a hyperbolic evolution equation, and that we perform the reduction on the space-time unknown u, of dimension N = (N_x + 1)(N_t + 1). This is different from reducing the space-discretized equation at each time step.
The u vector satisfies the discretized initial condition (4.1), the boundary condition: ∀n, u_0^n = 0, (4.2) and the first-order implicit upwind scheme relation (4.3). Let B(µ) (resp. φ) be the matrix (resp. the vector) such that (4.1), (4.2) and (4.3) are equivalent to the linear system B(µ) u = φ. We consider the different outputs of interest of Example 3.10 in Section 3.3 (square, exponential and triple exponential outputs). In the following, we take ∆t = 0.02 and ∆x = 0.05.
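Since the precise scheme (4.3) and the source term are not reproduced here, the sketch below assumes a standard homogeneous first-order implicit upwind discretization of u_t + µ u_x = 0 with zero inflow and an illustrative initial condition; it shows the space-time structure of the discrete unknown:

```python
import numpy as np

# Assumed first-order implicit upwind scheme for u_t + mu * u_x = 0 on (0,1)x(0,1),
# zero inflow at x = 0, illustrative initial condition u(0, x) = sin(pi x).
def solve_transport(mu, Nt=50, Nx=20):
    dt, dx = 1.0 / Nt, 1.0 / Nx
    c = mu * dt / dx
    x = np.linspace(0.0, 1.0, Nx + 1)
    u = np.sin(np.pi * x)
    u[0] = 0.0                                   # boundary condition u(t, 0) = 0
    hist = [u.copy()]
    # Implicit relation: (1 + c) u_i^{n+1} - c u_{i-1}^{n+1} = u_i^n,
    # a lower-bidiagonal system solved by forward substitution.
    for _ in range(Nt):
        new = np.empty_like(u)
        new[0] = 0.0
        for i in range(1, Nx + 1):
            new[i] = (u[i] + c * new[i - 1]) / (1.0 + c)
        u = new
        hist.append(u.copy())
    return np.array(hist)                        # space-time array, (Nt+1, Nx+1)

U = solve_transport(mu=0.75)
# Each update is a convex combination, so the scheme obeys a discrete maximum principle.
assert U.max() <= 1.0 + 1e-12
assert U.min() >= -1e-12
```

The reduction of the paper then acts on the flattened space-time vector `U.ravel()` of dimension (N_x + 1)(N_t + 1).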

Reduction
The approximation ũ of u is computed by using a "reduced basis" approach [16]. To be more specific, ũ is the solution of the reduced problem obtained by projecting the full linear system onto the range of an appropriate matrix Z found by Proper Orthogonal Decomposition (POD) (see [22] for instance). The Z matrix is the matrix of an orthogonal set of Ñ vectors in X = R^N, endowed with the Euclidean scalar product. The number Ñ is called the reduced basis size.
The Z matrix is computed using a POD snapshot set of size 70, and K = 20 retained φ_i^G vectors. We took a very low risk level α = 0.0001.
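The POD construction of Z can be sketched as follows: collect snapshots of the full solution for sampled parameters and keep the dominant left singular vectors (the snapshots below are synthetic placeholders with rapidly decaying modes):

```python
import numpy as np

def pod_basis(snapshots, n_basis):
    """snapshots: (N, n_snap) array of full-order solutions;
    returns Z, an (N, n_basis) matrix with orthonormal columns."""
    U, sing_vals, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :n_basis]

# Placeholder snapshot matrix (stand-in for 70 full solves u(mu_j)).
rng = np.random.default_rng(5)
N, n_snap = 100, 70
modes = rng.standard_normal((N, 5))
coeffs = rng.standard_normal((5, n_snap)) * (0.5 ** np.arange(5))[:, None]
snapshots = modes @ coeffs

Z = pod_basis(snapshots, n_basis=3)
assert Z.shape == (N, 3)
assert np.allclose(Z.T @ Z, np.eye(3), atol=1e-10)
```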

Results
In the following, the true error for a given parameter µ is defined as |s(µ) − s̃(µ)|, and the error bound as ε̂(µ; α).
Let us underline that in the present study the error is given with respect to the full discretized solution (and not to the theoretical solution of the partial differential equation).
In Figure 1, we plot, as functions of the reduced basis size, the averaged effectivity, defined by Eff_{N_test} = (1/N_test) Σ_{k=1}^{N_test} ε̂(µ_k; α)/|s(µ_k) − s̃(µ_k)|, as well as the standard deviation of the effectivity (S_{Eff,N_test}), with N_test = 200 the size of the random sample of parameter values µ_1, . . . , µ_{N_test}, for three different output cases (square, exponential and triple exponential). The graphs show that the averaged effectivity decreases almost linearly at the logarithmic scale as the size of the reduced basis increases. We see that our error bound becomes sharp (close to one) as the size of the reduced basis increases, despite the highly nonlinear output functions that have been chosen (yet, it seems almost unaffected by the degree of nonlinearity in the output). The standard deviation also decreases almost linearly at the logarithmic scale.

Nonlinear models
In this section, we illustrate the results of Section 3.2 on the discretized non-viscous Burgers equation, as an example of nonlinear model. There exist various results for the application of the certified reduced basis method to nonlinear problems (see, e.g., [8,17,24,27,28]). In these papers, the authors study error bounds specifically designed for (linearized or not) (viscous) Burgers equation, and linear or quadratic outputs. We apply here a space-time reduction procedure. We then show numerical results obtained for our probabilistic error bound, with a linear output.

Description of the model and output of interest
We are looking for u = u(t, x) satisfying the inviscid Burgers equation, with an initial condition parametrized by the parameter vector µ = (α, β), which belongs to [0, 1] × [0, 1] and follows a uniform law.
We discretize the above equation by using an upwind scheme. We choose a number of timesteps N_t and a number of space points N_x, we set ∆t = 1/N_t and ∆x = 1/N_x, and we look for (u_i^n)_{i,n}, where i = 0, . . . , N_x − 1 and n = 0, . . . , N_t − 1. The output functional of interest is given by the vector ℓ defined so that ⟨ℓ, u⟩ = (1/N_x) Σ_{i=0}^{N_x−1} u_i^{N_t−1}. In other words, the output of interest is the spatial average at final time.
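The paper solves the full problem with an implicit nonlinear scheme whose exact form is not reproduced in this excerpt; the sketch below uses an assumed explicit conservative upwind variant (valid for nonnegative data, and requiring smaller time steps than the implicit scheme of the experiments), with a hypothetical (α, β)-parametrized periodic initial condition:

```python
import numpy as np

# Explicit conservative upwind scheme for u_t + (u^2/2)_x = 0, periodic in x,
# assumed initial condition u0(x) = alpha + beta * sin(pi x)^2 (hypothetical).
def solve_burgers(alpha, beta, Nt=200, Nx=40):
    dt, dx = 1.0 / Nt, 1.0 / Nx
    x = np.arange(Nx) * dx
    u = alpha + beta * np.sin(np.pi * x) ** 2
    for _ in range(Nt):
        flux = 0.5 * u**2
        # upwinding for u >= 0: difference with the left neighbor (periodic wrap)
        u = u - dt / dx * (flux - np.roll(flux, 1))
    return x, u

x, u = solve_burgers(alpha=0.5, beta=0.3)
s = u.mean()   # output of interest: spatial average at final time
# The conservative periodic scheme preserves the spatial average exactly,
# so s equals the initial average alpha + beta/2 = 0.65 up to round-off.
assert np.isclose(s, 0.65)
```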

Reduction
As for the toy model, the reduction is performed on the full space-time state vector (u_i^n)_{i,n}. We also choose a Z matrix by a POD procedure, then define the reduced state vector (ũ_i^n)_{i,n} as the minimizer of the residual norm over Range(Z), where Range(Z) is the column space of Z and ||·||_2 denotes the Euclidean norm. We note here that we do not claim that this reduction is the state of the art for the Burgers equation, as model reduction by itself is not the goal of this paper. Table 1 gives the name and description of the various parameters used in the numerical code. Table 2 describes the various experiments that have been performed and provides references to the associated figures.

Numerical experiments
Let us define the averaged true error and the averaged error bound as e_{N_test} = (1/N_test) Σ_{k=1}^{N_test} |s(µ_k) − s̃(µ_k)| and ε̂_{N_test} = (1/N_test) Σ_{k=1}^{N_test} ε̂(µ_k; α), as well as the standard deviation of the true error (S_{e,N_test}) and the standard deviation of the error bound (S_{ε̂,N_test}), with N_test = 100 the size of the random sample of parameter values µ_1, . . . , µ_{N_test}.
These quantities are plotted in Figure 2 (resp. Figure 3), for a size of the POD truncated basis varying from 3 to 10, with N_t = 10, 20 and N_x = 40 (resp. N_x = 80), and the other parameters described in Table 2 (as before, the error is given with respect to the full discretized solution). Note that the averaged true error and the averaged error bound (at the logarithmic scale) decrease with the same slope as the size of the POD basis increases. The standard deviation of the true error and that of the error bound (at the logarithmic scale) also decrease with the same slope as the size of the POD basis increases. Moreover, the standard deviation has the same order of magnitude as the mean, both decreasing linearly with the size of the POD basis. To quantify the computing gain, we define and compute the following speed-up ratios. The first ratio, r_1 = (full problem computing time)/(online computing time), is suitable to study the computing gain for real-time problems.
Indeed for real-time problem the offline cost is not an issue, and one is really interested in the online acceleration.
On the contrary, for many-query problems, the total computing time is the quantity of interest, and we therefore define and compute the second speed-up ratio r_2 = (K × full problem computing time)/(offline computing time + K × online computing time), with K = 1000. The larger the speed-up ratios, the more efficient the use of a reduction procedure. In our experiments, the computing times were measured using the Matlab cputime function. We summarize in Table 3 the full, online and offline costs, as well as the speed-up ratios, for the various experiments described in Table 2. Note that the full problem is solved by an implicit nonlinear scheme, which probably explains why the time required to solve the full problem depends not only on N_x and N_t but on the interaction between both. Contrary to the full problem computation time, the online computation time remains low as the number of time and space discretization points increases, ranging from 14 to 19. Thus the speed-up ratio r_1 increases with N_x and N_t, starting around 12 and going up to 850. The offline computation time appears to be around five times larger than the full problem computation time. Although it becomes non-negligible as N_x and N_t increase, even exceeding the full problem computation time, the increase of the speed-up ratio r_2 still proves the efficiency of the reduction for solving many-query problems.

Conclusion
A class of nonlinear problems depending on a probabilistic parameter vector has been considered, and a numerically efficient method has been designed to bound the error made when approximating the output. This method is based on two phases: the offline phase requires computing the solution of a high-dimensional problem a small number of times, while the online phase relies on the solution of a reduced-order problem. This approach has been applied to a toy model and to a nonlinear partial differential equation, namely the Burgers equation parametrized by two probabilistic coefficients. An application of this numerical method to other mathematical problems is under investigation; more precisely, it could be fruitful to investigate the impact of this new result in control theory (as done in [10] for a linear problem). Perspectives in environmental modeling, among other domains where sensitivity analysis is crucial, are also worth considering. Other infinite-dimensional nonlinear problems could be considered as well, such as those described by a nonlinear partial differential equation where shocks may appear (which was not the case for the Burgers equation studied here).

Appendix A. Postponed proofs
A.1 Proof of Proposition 3.2. Proof. For all µ ∈ P, x, y ∈ X, z ∈ Y, we have, using (3.6): ⟨x − y, M*(µ, x, y, z)⟩_X = ⟨x − y, ∫_0^1 dM*(µ, y + t(x − y)) z dt⟩_X = ∫_0^1 ⟨dM(µ, y + t(x − y))(x − y), z⟩_Y dt = ⟨M(µ, x) − M(µ, y), z⟩_Y, where the last equality follows from the fundamental theorem of calculus applied to t ↦ M(µ, y + t(x − y)). As M*(µ, x, y, ·) is clearly linear, M* is a finite difference adjoint.

A.2 Proof of Lemma 3.11. Here we describe the offline/online procedure to compute ⟨M(µ, v), φ_i⟩_Y, where v ∈ X̃ and µ ∈ P are given. We make all the assumptions of Lemma 3.11 regarding the decomposition of M and of the m_j. Using the decomposition (3.12), we have ⟨M(µ, v), φ_i⟩_Y = Σ_{j=1}^S m_j(µ, v) ⟨y_j, φ_i⟩_Y. (A.4) Replacing (3.13) and (3.14) in (A.4) yields: Σ_{j=1}^S Σ_{k=0}^{T_j} Σ_{α∈I_{j,k}} q_{j,k,α} Π_{l∈V_α} ϕ_l^α(v_l) h_k(µ) ⟨y_j, φ_i⟩_Y.

Now we set:
q_{j,k,α} = 0 if α ∈ I \ I_{j,k} or if k > T_j, to get a sum over j, k ≤ T and α ∈ I. (A.6) During the online phase we are given µ and v. The grouped quantities Σ_{j=1}^S q_{j,k,α} ⟨y_j, φ_i⟩_Y are independent of µ and v, and can therefore be computed during the offline phase. Looking back at (A.6) and using notations (3.15), the total operation count for the online phase is (up to a multiplicative constant) T × M × L × R. This concludes the proof of Lemma 3.11.
Remark A.1. Let us consider the particular case (polynomial case) where, for any α ∈ I and any l ∈ V_α, ϕ_l^α(v_l) = v_l^{α_l}. Let us decompose v onto a basis {f_1, . . . , f_Ñ} of X̃ ⊂ X. First we write each f_k in the basis {x_1, . . . , x_N} of X, with coordinates f_{k,l}, so that we can write v_l = Σ_{k=1}^Ñ v̄_k f_{k,l}. Formula (3.14) requires v_l to the power α_l, so we use the multinomial formula to expand (Σ_{k=1}^Ñ v̄_k f_{k,l})^{α_l} as a sum of products Π_{k=1}^Ñ (v̄_k f_{k,l})^{β_k}.
The product Π_{k=1}^Ñ (v̄_k f_{k,l})^{β_k} costs (up to a multiplicative constant) β_1 + · · · + β_Ñ = α_l multiplications, so that the computation of v_l^{α_l} costs (up to a multiplicative constant) #B(Ñ, α_l) × α_l operations, where B(Ñ, α_l) denotes the set of multi-indices appearing in the multinomial expansion. The cost of computing v_l^{α_l} is then (up to a multiplicative constant) bounded by R′. Looking back at (A.6) and using notations (3.15), the total operation count for the online phase is bounded by: const. × T × M × L × R′.