Non-intrusive double-greedy parametric model reduction by interpolation of frequency-domain rational surrogates

We propose a model order reduction approach for non-intrusive surrogate modeling of parametric dynamical systems. The reduced model over the whole parameter space is built by combining surrogates in frequency only, built at a few selected values of the parameters. This, in particular, requires matching the respective poles by solving an optimization problem. If the frequency surrogates are constructed by a suitable rational interpolation strategy, frequency and parameters can both be sampled in an adaptive fashion. This, in general, yields frequency surrogates with different numbers of poles, a situation addressed by our proposed algorithm. Moreover, we explain how our method can be applied even in high-dimensional settings, by employing locally-refined sparse grids in parameter space to weaken the curse of dimensionality. Numerical examples are used to showcase the effectiveness of the method, and to highlight some of its limitations in dealing with unbalanced pole matching, as well as with a large number of parameters.


Introduction
The numerical simulation of dynamical systems in the frequency domain is of utmost importance in several engineering fields, among which electronic circuit design, acoustics, resonance modeling and control for large structures, and many others. The computational burden of such simulations has kept increasing in the last decades: on one hand, the problem size has been growing because of the need for higher numerical resolution; on the other hand, the necessity to tune design parameters and model uncertain features has led researchers to tackle parametric models, possibly with a large number of parameters. The purpose of model order reduction (MOR) in general, and of parametric MOR (pMOR) in the specific case of dynamical systems in the presence of parameters, is to alleviate this computational load. The main strategy to reach this goal relies on building a surrogate model (reduced order model, ROM), which accurately mimics the original problem, but which can be solved at a much reduced cost. In the last two decades, the field of pMOR has thrived, leading to the development, analysis, and application of a wide collection of surrogate modeling strategies. In general, we can assign each of these methods to one of two main categories:
• Projection-based pMOR. The surrogate model is built by restricting the original problem onto a suitable subspace, computed from a set of solutions (most often, snapshots of the system state) of the full problem. This requires access to the operators of the full model, which are not necessarily available in applications, for instance in the case of a black-box solver, or if the system operators never get fully assembled in the solution process.
Some subcategories of projective pMOR can be identified depending on whether a global basis (such as POD/Reduced Basis [6] or multi-parameter multi-moment-matching [8,41]) or a collection of local bases (e.g., manifold interpolation of local bases [2] or of reduced system matrices [3,27,30]) are employed.
• Non-intrusive pMOR. The surrogate model is constructed by interpolation or regression of a set of solutions (usually, output samples) of the full problem. As long as the system state is not necessary for the application at hand, it is common to work directly with the system output. In this case as well, the methods can be further split into two subgroups, although the boundary between the two is more vague: some approaches set up the surrogate by solving a unique global interpolation problem [20,24,26], whereas others build it by first constructing several (rational) models in frequency only, and then combining them over parameter space [17,43,44].
Here, we focus on this last class of techniques: non-intrusive approaches which build a surrogate by interpolating local frequency models, i.e., models constructed at some values of the parameters. A discussion of the pros and cons of this local idea, compared to global approaches, can be found in the excellent survey [9] (whose focus is, however, mostly on projective pMOR), as well as in [44]. The latter article is closer in spirit to the present discussion and, as such, we will take it as starting point and main reference for our presentation. Building upon [43,44], the purpose of this paper is the description of a fully non-intrusive pMOR technique based on parameter interpolation of frequency surrogates. As will be made clearer in the next sections, the objects that are interpolated over parameter space are poles and residues of the Heaviside decomposition of the frequency surrogates. Without going into too much detail here, we summarize briefly the main novelties of our approach, focusing on how it generalizes [43]:
• Both frequency space and parameter domain are sampled adaptively, allowing for a better exploration of frequency and parameter domains, as well as for an improved efficiency in the construction of the reduced model. To this aim, we leverage the minimal rational interpolation technique as described in [34].
• Our proposed approach for pole/residue matching has polynomial worst-case complexity despite the combinatorial nature of the task. Notably, our approach can also be applied in the case of unbalanced (i.e., with different numbers of poles and residues) frequency surrogates. This is a critical property in view of our adaptive frequency sampling strategy.
• We propose a framework for adaptive parameter sampling in a general high-dimensional setting by introducing a hierarchical locally-refined sparse grid structure in parameter space.
Before proceeding, we deem it important to remark that local approaches based on interpolation of poles and residues, by their very nature, struggle in dealing with poles and residues that do not depend smoothly on the parameters. In such cases, surrogates that avoid the Heaviside decomposition, e.g., by interpolating the rational frequency response directly rather than its poles and residues [17], can have more beneficial properties, since they do not rely on the smoothness of poles and residues. Still, we choose to pursue a Heaviside-based approach rather than a rational-based one for two main reasons:
• By interpolating poles and residues, one can handle with good flexibility the fairly common case of unbalanced frequency surrogates. More specifically, we will describe how spurious poles may be removed (and missing poles reconstructed) on the fly by exploiting the structure of the Heaviside expansion.
• Despite its intrinsic difficulties in approximating non-smooth poles, an adaptive selection of the parameter sample points, as proposed in this work, can recover a good performance, by refining locally at critical parameter locations (e.g., branch points of the poles).
Outline of the paper. We introduce the parametric framework for our approach in Section 1. The ensuing Section 2 contains our main contribution, in the form of a description of the proposed pMOR technique, with each of the subsections therein examining a different feature of the algorithm. In Section 3, we investigate some of the limitations of our method in the case of crossing and non-smooth poles. We integrate our discussion with two numerical examples in Section 4, showcasing the effectiveness of our method. We conclude with a summary and an outlook for future research in Section 5.

Problem framework and notation
The field of pMOR is closely entwined with the study of parametric dynamical systems in the frequency domain. More precisely, one considers the problem

    z E_p X(z, p) = A_p X(z, p) + B_p,    Y(z, p) = C_p X(z, p),    (1.1)

for z ∈ Z and p ∈ P. We call X ∈ C^(n_S × n_I) and Y ∈ C^(n_O × n_I) the state and the output of the system, respectively. The sets Z ⊂ C and P ⊂ C^d (R^d in most applications) are frequency and parameter domains, respectively. The subscript p of the system matrices denotes their possible dependence on the parameters p. We remark that we do not exclude the case of the state being the output of interest: such a case can be obtained quite trivially by setting n_O = n_S and C_p equal to the identity matrix. The behavior of state X and output Y with respect to z only (for fixed p, the non-parametric case) is well understood, and is backed up by an extensive literature in system theory [4]. Among the many properties of these systems, one is crucial to our discussion, namely the Heaviside decomposition: under some quite broad assumptions on the spectral properties of the pencil (A_p, E_p), we can write

    Y(z, p) = Σ_j Σ_{i=1}^{m^(j)} Y_p^(j,i) / (z − λ_p^(j))^i,    (1.2)

with the poles λ_p^(j) ∈ C being eigenvalues of the pencil (A_p, E_p), of multiplicities m^(j), and with residues Y_p^(j,i) ∈ C^(n_O × n_I), for all j. We remark that our discussion relies only on the decomposition (1.2) of the output, together with the following assumptions: (a) Y : Z × P → V, with V a normed vector space (for (1.1), we may set V = C^(n_O × n_I) endowed with the Frobenius norm).
(c) In (1.2), m^(j) is independent of p, for all j. As such, our proposed strategy extends further than linear parametric dynamical systems (1.1). Some examples of practical interest are parametric scattering problems in frequency domain [22] and parametric nonlinear eigenproblems [10]. Still, for simplicity of exposition, we will restrict our discussion to the finite-dimensional linear parametric dynamical system (1.1).
In order to simplify the presentation, from here onward we will replace (c) with the stronger assumption that m^(j) be equal to 1 for all j. This excludes the possibility of degenerate eigenvalues. We remark that, in most practical applications, each multiplicity m^(j) is indeed identically equal to 1. We postpone a discussion on this until Sections 3 and 5.
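To make the decomposition concrete, the following NumPy sketch computes poles and residues of a small non-parametric system and checks them against a direct solve of (1.1). It assumes E invertible and simple poles (our standing assumption m^(j) = 1); the matrices are random stand-ins, not taken from the paper.

```python
import numpy as np

def heaviside_expansion(E, A, B, C):
    """Poles/residues of Y(z) = C (zE - A)^{-1} B, assuming E invertible
    and all poles simple (m^(j) = 1)."""
    M = np.linalg.solve(E, A)            # E^{-1} A; its eigenvalues are the pencil's poles
    lam, V = np.linalg.eig(M)            # right eigenvectors as columns of V
    W = np.linalg.inv(V)                 # rows of W act as (scaled) left eigenvectors
    G = np.linalg.solve(E, B)            # E^{-1} B
    res = [(C @ V[:, [j]]) @ (W[[j], :] @ G) for j in range(len(lam))]
    return lam, res

# random stand-in system matrices (illustrative only)
rng = np.random.default_rng(0)
n = 5
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

lam, res = heaviside_expansion(E, A, B, C)
z = 1.3 + 0.7j
Y_direct = C @ np.linalg.solve(z * E - A, B)                  # full solve of (1.1)
Y_heaviside = sum(res[j] / (z - lam[j]) for j in range(n))    # evaluation of (1.2)
```

The two evaluations agree to machine precision, which is exactly the structure the surrogates below exploit.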

The double-greedy pMOR strategy
We start this section by detailing our pMOR technique in its formulation without p-adaptivity. The general structure of the pole-matching-based pMOR algorithm is summarized in Algorithms 1 and 2, and the different building blocks are discussed more extensively in the following subsections.
An online-offline decomposition of the algorithm is performed, in the usual MOR fashion [36]: first, the surrogate model is built in an expensive training phase, which requires solving the original problem at several values of frequency and parameters; the reduced model is then stored and can be evaluated with a (hopefully) much reduced computational cost at arbitrary frequency and parameter values.
The offline phase is shown in Algorithm 1. The input of the procedure is a set of parameter values P_train = {p_1, . . . , p_S} ⊂ P, at which reduced models in frequency only are built: more precisely, for each k = 1, . . . , S, we compute the approximation

    Y(z, p_k) ≈ Σ_{j=1}^{n_k} Y_k^(j) / (z − λ_k^(j)) =: ROM_k(z).    (2.1)

More details on this step are given in Section 2.1. The addends of the sum in (2.1) are then sorted in such a way that poles λ_k^(j) and residues Y_k^(j) with the same j but different k "correspond to each other". We explain what we mean by this, and how we achieve it, in Section 2.2. In the same section, we also discuss how we deal with the situation where two surrogate models have different numbers of poles, i.e., n_k ≠ n_ℓ. For the remainder of the present overview, for simplicity we assume n_1 = . . . = n_S =: n.
At this point, it only remains to prescribe a rule to define the reduced model at a new parameter value p ∈ P \ P_train. To this aim, we define S weight functions ψ = (ψ_1, . . . , ψ_S) : P → C^S, which we employ to interpolate poles and residues over p-space:

    λ^(j)(p) = Σ_{k=1}^S ψ_k(p) λ_k^(j),    Y^(j)(p) = Σ_{k=1}^S ψ_k(p) Y_k^(j),    j = 1, . . . , n.    (2.2)

The resulting global model can then be evaluated through Algorithm 2, as

    Ỹ(z, p) = Σ_{j=1}^n Y^(j)(p) / (z − λ^(j)(p)).    (2.3)

Note that, in order for the frequency surrogates to be interpolated exactly, the weights should satisfy the Lagrangian conditions

    ψ_k(p_{k'}) = δ_{kk'}    for k, k' = 1, . . . , S.    (2.4)

Remark 2.1. The frequency surrogates (2.1) may also contain m additional terms depending smoothly (e.g., polynomially) on z. If this is the case, we do not need to modify our algorithm, because the additional terms do not require to be matched nor permuted. The resulting global surrogate (after pole matching) is simply obtained by interpolating the smooth terms with the same weights ψ_k and adding the result to (2.3).
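In code, the evaluation loop of Algorithm 2 reduces to a weighted combination of the matched poles and residues. The sketch below uses scalar residues and a hypothetical weight function psi for brevity; the numbers are made up.

```python
import numpy as np

def evaluate_prom(z, p, psi, poles, residues):
    """Evaluate the global surrogate (2.3).

    poles, residues: arrays of shape (S, n), row k holding the matched
    Heaviside data of ROM_k; psi maps p to the S interpolation weights (2.2).
    Residues are taken scalar here to keep the sketch short."""
    w = psi(p)                  # (S,) weights, Lagrangian w.r.t. P_train
    lam_p = w @ poles           # interpolated poles, shape (n,)
    res_p = w @ residues        # interpolated residues, shape (n,)
    return np.sum(res_p / (z - lam_p))

# toy data: S = 2 parameter samples, n = 1 matched pole (illustrative values)
poles = np.array([[1.0 + 1.0j], [2.0 + 1.0j]])
residues = np.array([[1.0], [2.0]])
psi = lambda p: np.array([1.0 - p, p])   # linear hat weights on [0, 1]
Y = evaluate_prom(0.0, 0.5, psi, poles, residues)
```

At p = 0.5, the pole and residue are simply averaged, so the surrogate evaluates a single fraction 1.5 / (z − (1.5 + 1j)).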

Frequency adaptivity via Minimal Rational Interpolation
Here we provide some details on the function BuildFrequencyROM in Algorithm 1, which encodes the construction of a ROM for a non-parametric dynamical system with respect to a single parameter, namely the frequency z. A number of surrogate modeling strategies for such problems have been proposed in the MOR literature: the most famous are, among projection-based methods, the Proper Orthogonal Decomposition/Reduced Basis [9] and the Krylov/Moment Matching [8] methods, and, among the non-intrusive techniques, the Loewner Framework [24] and the Vector Fitting algorithm [15,19]. Any of them could be used to supply the surrogate modeling that we require. In fact, different frequency surrogates could even be obtained by different MOR approaches, as long as each reduced model allows for a Heaviside expansion (2.1).
In this work, we consider the Minimal Rational Interpolation (MRI) method proposed in [33], generalizing [11]. MRI is non-intrusive, i.e., it does not require access to the matrices appearing in the dynamical system (1.1), allowing for a wider applicability of the method. In particular, MRI can be used to efficiently build surrogates for vector-valued quantities, e.g., high-(or even ∞-)dimensional states X of dynamical systems (1.1). (In fact, the effectiveness of MRI may even improve if the ambient space, where snapshots of the approximation target are located, is large.) At the same time, one can apply it in a greedy fashion [34], so that the number and location of frequency samples are selected adaptively, in such a way that a prescribed accuracy is attained over the whole frequency domain Z. The specific greedy strategy that we choose has a proper theoretical motivation only if the parametric problem depends on z in a simple way, e.g., linearly, as in (1.1). If this is not the case, or if the dependence on frequency is not known, one may want to consider possible alternatives [34]. More details on the method are provided in [33]. Here we only give a short overview, which is summarized in Algorithm 3.
Let a parameter p ∈ P be fixed. Given S distinct frequency sample points Z_train ⊂ Z and corresponding snapshots {Y(z, p)}_{z∈Z_train}, the MRI procedure first builds a surrogate denominator Q of degree ≤ S − 1 by solving a minimization problem involving the snapshots: in practice, this problem can be solved at O(S^3) computational cost by SVD. Now the surrogate poles {λ^(j)}_{j=1}^n (n ≤ S − 1 is the degree of Q) appearing in (2.5) can be extracted from Q through any root-finding algorithm. Then we set m = S − n − 1, and we compute the S terms of the surrogate (the n pole-residue pairs plus m + 1 terms smooth in z) by interpolation of the snapshots, cf. (2.6). The procedure just described is encoded by the function BuildMRI in Algorithm 3. The adaptive selection of frequency sample points is carried out via the typical greedy-MOR loop: at each iteration a new sample gets added, at a position selected within the test set Z_test, based on the current reduced model. More precisely, the next sample z⋆ is selected as the maximizer over Z_test of some a posteriori indicator, and the algorithm terminates when such indicator is smaller than a prescribed tolerance. However, in our framework, standard residual estimators (e.g., the classic Reduced Basis one [36]) cannot be employed in a non-intrusive manner. Instead, we apply the "look-ahead" idea introduced in [34], to which we refer for more details:
• Exploiting only the current reduced model, we select the next sample point z⋆ as the maximizer over Z_test of the a posteriori indicator (2.7).
• Since an unknown scaling constant is involved, (2.7) does not tell us whether the prescribed tolerance is satisfied; to obtain this information, we perform an expensive solve of the full model at z⋆, and evaluate the approximation error explicitly.
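The greedy look-ahead structure can be sketched independently of the MRI details. In the skeleton below, build_rom stands in for BuildMRI and indicator for the a posteriori estimator (2.7); both are user-supplied placeholders (the toy usage fits a plain interpolating polynomial and uses a distance-to-samples heuristic), so this is a sketch of the loop, not of the actual MRI machinery.

```python
import numpy as np

def greedy_frequency_loop(solve_full, build_rom, indicator, z_test, tol, max_pts=30):
    """Skeleton of the greedy look-ahead sampling loop (cf. Algorithm 3)."""
    z_train = [z_test[0], z_test[-1]]
    snapshots = [solve_full(z) for z in z_train]
    rom = build_rom(z_train, snapshots)
    while len(z_train) < max_pts:
        # next sample: maximizer of the cheap indicator over the test set
        z_star = max((z for z in z_test if z not in z_train),
                     key=lambda z: indicator(rom, z_train, z))
        y_star = solve_full(z_star)               # one expensive full solve
        if abs(y_star - rom(z_star)) <= tol:      # explicit error check
            break                                 # y_star can still be recycled (Remark 2.2)
        z_train.append(z_star)
        snapshots.append(y_star)
        rom = build_rom(z_train, snapshots)
    return rom

# toy usage: interpolating polynomial as surrogate, farthest-point indicator
build_rom = lambda zs, ys: np.polynomial.Polynomial.fit(zs, ys, len(zs) - 1)
indicator = lambda rom, zs, z: min(abs(z - t) for t in zs)
f = lambda z: 1.0 / (z + 2.0)                     # stand-in "full model"
rom = greedy_frequency_loop(f, build_rom, indicator, list(np.linspace(0.0, 1.0, 51)), 1e-8)
```

Note how the expensive solve at z⋆ doubles as both error check and (if the loop continues) next snapshot, which is precisely the point of Remark 2.2 below.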

Remark 2.2.
As is evident from Algorithm 3, the extra expensive solve of the full system does not go to waste, as it provides exactly the snapshot which is necessary at the next iteration. Only at the last iteration, when the tolerance is finally satisfied, is the additional solution not included in the surrogate. In practice, to avoid wasting this "free" final snapshot, we can add even this last sample to the reduced model, by running BuildMRI one additional time after the greedy loop is completed.

Remark 2.3. As discussed in more detail in Section 2.2, for a reliable matching of poles and residues of different frequency ROMs, it is crucial to remove any unwanted spurious (sometimes referred to as "parasitic" or "non-physical") pole-residue pairs. As such, it is usually worth the effort to add a post-processing "clean-up" step to this end. The development of reliable strategies to identify spurious effects remains an open problem in rational approximation. However, two quite simple and inexpensive criteria that can be applied are:
• Remove poles which are too far away from Z.
• Remove pole-residue pairs whose residues have negligible norm.
In addition, it might be possible to remove poles which are provably spurious based on the properties of the problem, e.g., non-real poles for self-adjoint problems, or unstable poles for stable systems. Any pole removed in this context should be automatically balanced by increasing the number of smooth terms in the ROM, namely m in Remark 2.1. We refer to [33] for a more detailed discussion of this procedure, which, employing the notation therein, could be summarized as "choosing N < S − 1". Thanks to this strategy, if the eliminated poles are indeed spurious, their removal does not negatively impact the accuracy of the frequency surrogate.
At this point, it is important to note that, in general, we cannot guarantee that all the frequency ROMs have the same number of pole-residue pairs n, even after a perfect removal of all the spurious effects. The main reason for this is that the full model (1.1) may have a different number of poles in Z for different values of p. A numerical example showcasing poles entering and leaving the frequency domain can be found in Section 4.1. This can quickly become problematic, since the greedy MRI algorithm is not guaranteed to identify poles outside Z accurately (or, indeed, to capture them at all). One could try to counteract this issue by building frequency models on an enlarged domain Z̃ ⊃ Z, at the cost of a higher offline time. Still, the same issue of migrating poles might simply arise again at a larger scale.

Matching frequency models
The offline matching loop in Algorithm 1 takes care of permuting the Heaviside addends of the frequency ROMs, see (2.1). The objective of this step is to identify with good accuracy how the location of each pole-residue pair evolves as p changes: this means that, for each j, we wish λ_1^(j), . . . , λ_S^(j) (resp. Y_1^(j), . . . , Y_S^(j)) to be approximations of the "same" pole (resp. residue) of the parametric problem for p = p_1, p_2, . . . , p_S. By "same" pole (resp. residue) we mean some continuous curve p ↦ λ_p^(j) (resp. p ↦ Y_p^(j)) of the exact quantities defined in Section 1. Note that this notion is not unique whenever poles intersect. This case is discussed in Section 3.
The frequency surrogate models are explored in a breadth-first fashion, matching one new ROM at a time to its closest (with respect to p) neighbor. We exclude a global approach, matching all frequency models at the same time, due to its computational infeasibility: S-partite matching is notoriously an NP-hard problem for S ≥ 3 [25].
Let us characterize one of these bipartite matching problems, namely the one between ROM_k and ROM_ℓ. For the remainder of this section, we assume that the two models have the same number of poles, i.e., n_k = n_ℓ =: n. The more general and, due to our choice of frequency surrogate modeling, (potentially) more common case n_k ≠ n_ℓ is discussed in Section 2.2.1. We seek σ = (σ_1, . . . , σ_n) which attains

    min_{σ ∈ (1,2,...,n)!} Σ_{j=1}^n ( w |λ_k^(j) − λ_ℓ^(σ_j)| + ‖Y_k^(j) − Y_ℓ^(σ_j)‖ ),    (2.8)

where (1, 2, . . . , n)! denotes the set of permutations of the tuple (1, 2, . . . , n). We will refer to the minimal value of (2.8) as the Heaviside distance 2 between models ROM_k and ROM_ℓ. In (2.8), w ∈ [0, ∞] denotes a weight, representing the relative importance given to poles and residues in the matching. We note that a cost functional almost identical to (2.8) was first introduced in [43] to tackle the pole-residue matching. An alternative formulation of the same problem is: given the cost matrix D ∈ R^(n×n), with entries

    D_{ij} = w |λ_k^(i) − λ_ℓ^(j)| + ‖Y_k^(i) − Y_ℓ^(j)‖,    (2.9)

we wish to extract exactly one value per row and one value per column, so that the sum of the selected entries is minimized. Despite the combinatorial nature of this optimization problem, there exist polynomial-time algorithms to solve it (e.g., using ideas from maximum flow problems [14]). Here, we solve this problem via the linear_sum_assignment function in the scipy.optimize module [40], which requires as only input the cost matrix D. As shown in [14, Section II.C], this implementation has O(n^3) worst-case complexity, but we note that only O(n^2) operations are necessary if just O(1) pole swaps are needed, i.e., if the optimal σ is "close" to (1, 2, . . . , n).
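With SciPy, assembling the cost matrix and solving the assignment problem takes a few lines. The pole/residue values below are made up for illustration (scalar residues, n = 3).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# made-up Heaviside data of two frequency ROMs (scalar residues)
lam_k = np.array([1.0 + 1.0j, 2.0 + 0.5j, 3.0 - 1.0j])
res_k = np.array([1.0, 2.0, 0.5])
lam_l = np.array([3.1 - 0.9j, 0.9 + 1.1j, 2.05 + 0.45j])
res_l = np.array([0.55, 1.05, 1.9])

w = 1.0  # relative weight of poles vs. residues, cf. (2.8)
D = (w * np.abs(lam_k[:, None] - lam_l[None, :])   # pole distances
     + np.abs(res_k[:, None] - res_l[None, :]))    # residue distances
rows, sigma = linear_sum_assignment(D)             # O(n^3) worst case
# sigma[j]: index of the pole of ROM_l matched to pole j of ROM_k
```

Here the solver recovers the visually obvious pairing: pole 1 + 1j of ROM_k is matched to 0.9 + 1.1j of ROM_ℓ, and so on.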
Remark 2.4. When n_k = n_ℓ, the matching optimization problem is symmetric. By this, we mean that, once the optimal permutation has been found, we can either apply it to the "new" model ROM_ℓ, or apply its inverse to the "old" model ROM_k. In practice, in the scope of the breadth-first search in Algorithm 1, it is convenient to always rearrange the poles/residues of the new model ROM_ℓ, since applying the permutation to model ROM_k would require permuting all the already explored models {ROM_{k'}}_{k'∈J}.

Unbalanced matching
Suppose n_k < n_ℓ (the converse case can be treated analogously by considering a "transposed" matching problem). We can cast the matching optimization problem in rectangular form: find σ = (σ_1, . . . , σ_{n_ℓ}) which attains

    min_{σ ∈ (1,2,...,n_ℓ)!} Σ_{j=1}^{n_k} ( w |λ_k^(j) − λ_ℓ^(σ_j)| + ‖Y_k^(j) − Y_ℓ^(σ_j)‖ ).    (2.10)

(The minimization of (2.10) does not require more computational effort than a balanced problem (2.8) of size n_ℓ [14].) Then (σ_1, . . . , σ_{n_k}) gives the desired permutation, while the indices σ_{n_k+1}, . . . , σ_{n_ℓ} remain unassigned. It remains to choose how to deal with the n_ℓ − n_k extra pole-residue pairs with indices σ_{n_k+1}, . . . , σ_{n_ℓ}: this choice depends on whether we deem them (i) missing from ROM_k or (ii) spurious in ROM_ℓ.
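Conveniently, linear_sum_assignment accepts rectangular cost matrices directly, which is exactly what (2.10) needs: each row (pole of ROM_k) is assigned to a distinct column (pole of ROM_ℓ), and the unmatched column indices are precisely the extra pole candidates. Toy values below are illustrative; the cost uses poles only for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

lam_k = np.array([1.0 + 1.0j, 3.0 - 1.0j])                # ROM_k: n_k = 2 poles
lam_l = np.array([3.1 - 0.9j, 5.0 + 0.0j, 0.9 + 1.1j])    # ROM_l: n_l = 3 poles
D = np.abs(lam_k[:, None] - lam_l[None, :])               # pole-only cost, for brevity
rows, cols = linear_sum_assignment(D)                     # rectangular assignment (2.10)
extra = sorted(set(range(len(lam_l))) - set(cols))        # unassigned indices of ROM_l
# the extra poles are then either reconstructed in ROM_k or removed from ROM_l
```

In this toy case the isolated pole at 5.0 of ROM_ℓ is left unassigned, flagging it for the missing-vs-spurious decision discussed next.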
The solution in the spurious case (ii) is quite straightforward: we simply remove the n_ℓ − n_k erroneous poles and residues from ROM_ℓ, as well as from all the surrogates matched to it. Note, however, that the n_k remaining residues of ROM_ℓ should be updated after the removal of the spurious poles, in order to guarantee good approximation properties at p_ℓ. To this aim, it is quite cheap to recompute the residues of ROM_ℓ by solving an interpolation problem, cf. (2.6), exploiting the already available snapshots at p_ℓ. In particular, the approach described in Remark 2.3, i.e., the addition of smooth terms to compensate for the removed poles/residues, can be applied here.
Instead, in the missing-pole case (i), the problem is much harder: we wish to reconstruct the poles and residues unaccounted for at p_k from information at p_ℓ. One naive way to achieve this involves "copying" the missing poles and residues from ROM_ℓ to ROM_k, i.e., appending the synthetic terms

    Y_ℓ^(σ_j) / (z − λ_ℓ^(σ_j)),    j = n_k + 1, . . . , n_ℓ,    (2.11)

to the Heaviside expansion of ROM_k. This results in an adjusted surrogate, with a balanced pole matching. Note that, similarly to pole removal, the n_k original residues of ROM_k should be updated to account for the added synthetic terms. We remark that more refined (but also potentially less stable) approaches may be based on other forms of extrapolation of the additional Heaviside terms, not only from ROM_ℓ, but also from all the other frequency surrogates that contain the poles with indices σ_{n_k+1}, . . . , σ_{n_ℓ}, cf. Remark 2.7 and Section 2.3.
Remark 2.5. In the missing-pole case, one could impose a balanced matching by retraining the poorer model ROM_k, forcing more iterations of greedy MRI until the surrogate has exactly n_ℓ poles. However, this approach has two clear issues, which might make it disadvantageous in practice:
• If many surrogates need retraining, one may incur substantial additional computational cost. This is the case especially if p-adaptivity is employed, see Section 2.4.
• Instead of identifying the missing poles correctly, MRI could introduce spurious effects, thus interfering with the matching procedure rather than helping it. This is likely to happen if the missing poles are located outside the frequency domain Z.

At this point, we deem it important to give a caveat: neither of the two approaches above (pole reconstruction and pole removal) is, on its own, able to handle adequately all practical situations, as we showcase in a practical example in Section 4.1. Instead, a hybrid version, with some poles getting removed and some reconstructed, has the potential to perform better than either approach. Here, we choose to pursue this third approach, whose effectiveness obviously hinges on how well we can differentiate between "good" and "bad" poles.
We summarize our proposed strategy in Algorithm 4, which replaces the simple matching loop in Algorithm 1. The main idea is the following: whenever it becomes necessary to match unbalanced models, the naive reconstruction (2.11) is used to augment the less rich surrogate(s); however, the added synthetic poles are flagged as unreliable. At the end of the matching loop, all the models contain the same number of poles n = max_{k=1,...,S} n_k. At this point, if a pole with a certain index is flagged as unreliable too often, it is removed from the pROM. Here, "too often" is determined based on a given tolerance tol_synth between 0 and 1: the extreme value 0 corresponds to never removing synthetic poles (all unbalanced pairs are deemed missing), and the extreme value 1 to always removing them (all unbalanced pairs are deemed spurious).
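The removal rule at the end of Algorithm 4 amounts to a simple vote over the synthetic flags; a sketch with made-up flags:

```python
import numpy as np

# synthetic[k, j] = True if pole j of ROM_k was reconstructed ("synthetic")
synthetic = np.array([[False, True],
                      [False, True],
                      [False, False],
                      [False, True]])
S, n = synthetic.shape
tol_synth = 0.5   # 0: never remove reconstructed poles; 1: always remove them

remove = [j for j in range(n)
          if synthetic[:, j].sum() > S * (1.0 - tol_synth)]
# here pole index 1 is synthetic in 3 of the 4 ROMs (> S/2), hence removed
```

Intermediate values of tol_synth realize the hybrid strategy: a pole reconstructed in only a few models survives, one reconstructed almost everywhere is deemed spurious.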
Remark 2.6. The reconstruction of missing poles introduces an asymmetry in the matching procedure, so that the order in which the models are matched matters: for instance, see the situation depicted in Figure 1. As such, it seems important to choose the root of the breadth-first exploration of P_train well. However, choosing this root optimally remains an open problem. From a computational point of view, it makes sense to choose as root the surrogate with the largest number of poles, so that we never have to retrace our steps to add synthetic poles.

Remark 2.7. The constant pole reconstruction (2.11) is quite blunt, especially when the parameter resolution is low. If the poles depend smoothly on p, it is preferable to employ a reconstruction with a larger stencil, for instance global (least-squares) polynomial extrapolation, using information from all the surrogates which contain the missing pole, see Section 2.3. However, this is not always viable during the matching loop, since we explore P_train breadth-first: for instance, if n_k < n_ℓ, we are forced to reconstruct poles from the single new model ROM_ℓ (a similar problem may arise in the case n_k > n_ℓ if J is too small). Still, as shown in the last line of Algorithm 4, it remains feasible to apply a higher-order pole reconstruction after the matching loop is complete.

Global vs local interpolation
In this section we describe how one can employ the local pole-residue information (2.1) at P train to obtain an approximate Heaviside expansion at a new point p ∈ P \ P train . As shown in (2.2), we rely on an interpolation strategy encoded by the weight functions ψ : P → C S , so that we simply need to prescribe how such weights are constructed. It is important to note that this task, in its natural formulation, is independent of the frequency surrogates, and depends only on the location of the sample parameter points P train , cf. Remark 2.8.
Setting (2.4) as target, we can cast the problem of finding ψ as S independent P → C interpolation problems (one for each ψ_ℓ). To address this problem, an extensive collection of techniques and results is available in the literature [13,32,42]. Here, we consider three options:
• Global. We can seek the weights ψ_ℓ within some function space with global regularity, e.g., polynomials, or radial basis functions (using a smooth kernel to achieve interpolation). This approach can potentially achieve high accuracy, but relies on some level of smoothness of poles and residues with respect to p. It is important to note that, in this framework, the interpolation condition (2.4) could be weakened and enforced only in a least-squares sense: the resulting surrogate would then require less memory for the storage of approximate poles and residues, cf. the "regression" step in [43].
• Local structured. To satisfy (2.4), we can employ locally supported basis functions, e.g., piecewise linear "hat functions", or splines. If d ≥ 2, this approach requires the sample points to be selected in a structured way, for instance using sparse grids, see [5] and Section 2.4, or a mesh-based discretization of P [17].
• Local unstructured. A very simple, but nonetheless practical, way to enforce (2.4) is to construct the weights using a Voronoi tessellation [16] of P based on the sample points P_train, i.e.,

    ψ_ℓ(p) = 1 if p_ℓ = arg min_{p' ∈ P_train} ‖p − p'‖, and ψ_ℓ(p) = 0 otherwise.    (2.12)

This results in a nearest-neighbor approach, characterized by low accuracy, but also by great flexibility. In fact, this strategy does not even require the poles to be matched. The main drawback of this approach is that it does not "follow" the evolution of the poles in p-space, resulting in limited predictive capabilities.
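The nearest-neighbor weights (2.12) can be sketched in a few lines; the sample points below are illustrative.

```python
import numpy as np

def voronoi_weights(p, p_train):
    """Piecewise-constant weights (2.12): 1 for the nearest sample, 0 elsewhere."""
    dists = np.linalg.norm(np.asarray(p_train) - np.asarray(p), axis=1)
    w = np.zeros(len(p_train))
    w[np.argmin(dists)] = 1.0     # ties broken by first occurrence
    return w

p_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = voronoi_weights([0.9, 0.1], p_train)   # nearest sample: [1.0, 0.0]
```

Since exactly one weight is nonzero, the global surrogate at p simply coincides with the frequency ROM of the closest parameter sample, which is why no pole matching is needed in this variant.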
Remark 2.8. Depending on the application, the interpolation conditions (2.4) may be complemented by additional constraints to preserve important system properties, like stability, realness, or passivity [17,19].

Parameter adaptivity
Until now, we have assumed the parameter sample points P train to have been fixed in advance. However, in many situations, it proves extremely useful to have some kind of adaptivity included in the sampling of P, so that samples may be added only where the surrogate model is particularly inaccurate, e.g., in our case, near pole mismatches or where large interpolation errors occur. Still, it is quite difficult to devise adaptive strategies in non-intrusive MOR, especially if the number of parameters d is large, since not much is known about the parametric dependence of the problem. In the context of sampling from high-dimensional parametric spaces, sparse grids have been often employed in MOR as a way to alleviate the curse of dimensionality, see, e.g., [6,12,21]. Here, the focus is on adaptive sampling, and we propose a technique based on locally-refined sparse grids, closely related to that considered in [1], which, in turn, relies on some ideas from [28,31]. For simplicity, we carry out our construction in the case P = [−1, 1] d .
It is useful to define the infinite point set

    Φ = { (j_1/2^(n_1−1), . . . , j_d/2^(n_d−1)) : j_k ∈ Z, n_k ∈ N, |j_k| ≤ 2^(n_k−1) for all k },    (2.13)

which is dense in P (it coincides with the dyadic rationals in P) and also a superset of any tensor grid (by construction). We will choose the adaptive sampling points within Φ. Now, assume that

    gcd(j_k, 2^(n_k−1)) = 1    for all k = 1, . . . , d,    (2.14)

i.e., that the coordinates of p' = (j_1/2^(n_1−1), . . . , j_d/2^(n_d−1)) ∈ Φ are fractions in lowest terms (with n_k = 0 if j_k = 0). We define the forward points of p' as the (≤ 2d) elements of the discrete neighborhood

    U(p') = { p' ± 2^(−n_k) e_k : k = 1, . . . , d } ∩ P,

with e_k the k-th canonical basis vector of R^d. Moreover, to each p' satisfying (2.14), we associate a hierarchical hat function ϕ_{p'} : P → [0, 1] according to the definition

    ϕ_{p'}(p) = ∏_{k=1}^d ϕ_{j_k/2^(n_k−1), n_k}(p_k),    (2.15)

with ϕ_{p,0}(x) ≡ 1 and, for n = 1, 2, . . .,

    ϕ_{p,n}(x) = max{ 0, 1 − 2^(n−1) |x − p| }.

By construction, ϕ_{p'} is zero at all p'' ∈ Φ of which p' is a forward point (the backward points of p'), and also at all backward points of such p'', etc., all the way back to p'' = 0. We show some two-dimensional examples of forward points and of hierarchical hat functions in Figure 2. We rely on hierarchical hat functions to cast piecewise-linear interpolation problems over subsets of sparse grids. More precisely, given sample points P' ⊂ Φ, and data {f(p')}_{p'∈P'}, the piecewise-linear interpolant of f based on samples at P' is the unique element f_{P'} of span{ϕ_{p'}}_{p'∈P'} which interpolates exactly the data: this means that there exist unique coefficients {c_{p'}}_{p'∈P'}, depending only on P' and {f(p')}_{p'∈P'}, such that

    f_{P'} = Σ_{p'∈P'} c_{p'} ϕ_{p'},    with f_{P'}(p'') = f(p'') for all p'' ∈ P'.    (2.16)

The desired Lagrangian basis (2.4) can then be found by setting the data as {f(p') = δ_{p'p_ℓ}}_{p'∈P'}, with δ the Kronecker delta. We note that the expression of each hierarchical basis function (2.15) depends only on its support point p' ∈ P', whereas the expression of each Lagrangian basis function (2.4) depends on the whole support set P'. Now we are ready to describe our adaptive technique, which is summarized in Algorithm 5.
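The hierarchical ingredients above (one-dimensional hat factors and forward points) can be sketched in code as follows. This is a sketch assuming the standard dyadic construction on P = [−1, 1]^d (constant factor at level 0, hats of half-width 2^(1−n) at level n ≥ 1); the exact scaling conventions may differ in other implementations.

```python
import numpy as np

def hat_1d(x, p, n):
    """One-dimensional factor of the hierarchical hat (2.15):
    constant at level n = 0, a hat of half-width 2^(1-n) centered at p
    for n >= 1 (standard dyadic construction, assumed here)."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return np.ones_like(x)
    return np.maximum(0.0, 1.0 - 2.0 ** (n - 1) * np.abs(x - p))

def forward_points(p, levels):
    """Forward points p +/- 2^(-n_k) e_k of p, clipped to P = [-1, 1]^d."""
    out = []
    for k, n in enumerate(levels):
        for s in (1.0, -1.0):
            q = list(p)
            q[k] += s * 2.0 ** (-n)        # step halves with each level
            if -1.0 <= q[k] <= 1.0:
                out.append(tuple(q))
    return out
```

For instance, the point (0.5, 0.0) with per-coordinate levels (2, 0) has four forward points: two refinements at spacing 1/4 in the first coordinate, and the coarse neighbors ±1 in the second.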
As in the greedy selection of frequency samples (Algorithm 3), the adaptivity is achieved through a "look-ahead" idea, although here the approach is rather heuristic: we use the forward points of the current training set as test set, i.e., as parameter values at which the accuracy of the current pROM is evaluated. If the surrogate model is too inaccurate at some of the test points, those points are added to the training set. This loop is repeated until a specified tolerance is achieved at all current test points. Differently from the usual isotropic adaptive sparse grid sampling [5,29], this approach in general does not add whole levels Γ(n), but only subsets of them. In fact, the training set is not even guaranteed to be downward-closed, i.e., a point might be in the training set while some of its backward points are not. The matter of missing backward points is discussed in some detail in [1, Section 3.2]. In the remainder of our presentation and in our numerical experiments, we do not require missing backward points to be added to the training set, both for simplicity of exposition and (mainly) to reduce the cost of the offline phase. We remark that we are allowed to work with a non-downward-closed training set because the error estimator driving our p-adaptivity does not rely on interpreting the expansion coefficients c_p in (2.16) as "hierarchical surpluses", as is commonly done in adaptive sparse grids [28]. As a side note, we observe that not including the backward points makes it necessary to recompute the expansion coefficients (2.16) from scratch whenever new training points are added.
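The look-ahead loop just described can be sketched as follows. All callables are placeholders: `forward_points` enumerates the forward points of a sample, `build_prom` stands for the pROM construction, and `error_at` stands for the error indicator detailed in the next steps (a)-(c).

```python
def adaptive_parameter_sampling(p_init, forward_points, build_prom,
                                error_at, tol):
    """Greedy parameter adaptivity in the spirit of Algorithm 5: the test set
    consists of the forward points of the current training set, and only the
    points where the error indicator exceeds tol are added."""
    train = list(p_init)
    while True:
        prom = build_prom(train)
        test = []                       # forward points not yet sampled
        for p in train:
            for q in forward_points(p):
                if q not in train and q not in test:
                    test.append(q)
        bad = [q for q in test if error_at(prom, q) > tol]
        if not bad:
            return prom, train          # tolerance met at all test points
        train.extend(bad)               # refine only where the indicator fails
```

Note that only the failing test points are promoted to the training set, which is exactly why the resulting set need not be downward-closed.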
Within each iteration, in order to quantify the accuracy of the pROM at a test parameter value p, we use the following strategy: (a) We use the current pROM (whose training set does not include p, nor any of the other test points) to predict the frequency response at p, using (2.3), where the weight functions ψ k are hierarchical hat functions (2.15) built by piecewise-linear interpolation (2.16), cf. Remark 2.9.
(b) Through Algorithm 3, we build a frequency surrogate at p, which we take as "truth frequency response" at p. This requires solving the full model at p, at as many frequency points as required by the z-greedy procedure.
(c) We compare poles and residues of the two models by employing (2.8) as distance; this requires the solution of a pole-matching problem.
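Step (c) reduces to a small linear assignment problem over the poles of the two models. The sketch below assumes one plausible form of the cost matrix (2.9), combining pole distance with residue distance weighted by w; the paper's exact weighting may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_poles(lam1, res1, lam2, res2, w=1.0):
    """Match poles of two frequency surrogates by optimal assignment.
    Cost entries combine pole distance and (w-weighted) residue distance,
    an assumed stand-in for the cost matrix (2.9)."""
    C = (np.abs(lam1[:, None] - lam2[None, :])
         + w * np.abs(res1[:, None] - res2[None, :]))
    rows, cols = linear_sum_assignment(C)   # Hungarian-type optimal matching
    return cols, C[rows, cols].sum()        # permutation and total cost
```

For instance, with pole sets {0, −2} and {0, 2} and illustrative residues, the optimal matching pairs 0 with 0 and −2 with 2, in line with the toy example of Section 3.1 for w > 0.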
For the sake of efficiency, it is crucial to observe that, over the different p-greedy iterations, the function BuildFrequencyROM may be called multiple times with the same argument (not only when the pROM is built through pROM Train, see Algorithm 1, but also when evaluating the accuracy of the current model on the test set). As long as memory is not an issue, one should store the frequency surrogates built at previous p-greedy steps, so that no expensive solve of the full model is wasted.

Remark 2.9. In (a), we have forced our pROM technique to reconstruct poles and residues only through piecewise-linear hat functions. However, as discussed towards the end of Section 2.3, in some cases one may want to employ a matching-free nearest-neighbor reconstruction. This is easily achieved by using the piecewise-constant basis (2.12) instead of hierarchical hat functions. In this case, to better account for the approximation properties of the interpolation basis, it is natural to employ Haar-type sparse grids [13,28], which can be obtained as in Section 2.4 by modifying (2.13) for n > 0. This essentially corresponds to restricting the sparse grid points to the interior of P.
Whatever the reconstruction strategy in step (a), any of the methods presented in Section 2.3 can still be applied as a post-processing step, at the end of the greedy loop. The reason for this additional computation could be, for instance, a smoother representation of poles and residues, or the removal of possible noise by regularization. To this aim, we wish to stress that some care should be used when selecting the pole-residue reconstruction strategy. Indeed, due to the local nature of the p-refinements, finer and coarser sampling regions may arise, which, if not taken into account, could lead to a poorly-behaving reconstruction. For instance, global polynomial interpolation over sampling points which are "too wild" can be an extremely ill-posed problem, due to a large Lebesgue constant [32], whereas polynomial regression with low enough degree can be expected to behave more nicely.
Remark 2.10. The strategy that we presented is heuristic. In particular, it does not guarantee that, at the end of the greedy loop, the tolerance will be attained over the whole parameter domain, since we are using a relatively small (and sparse) test set to quantify the approximation error. Representing ("sketching") the parameter domain by the test set can be justified only by assuming the resolution of the test set to be sufficiently fine. However, in practice, this is usually computationally unfeasible (especially if the number of parameters is large, due to the curse of dimensionality).
Remark 2.11. In [43], a somewhat similar p-adaptive approach was proposed, which, however, can be applied only to the single-parameter case. While the adaptivity there was essentially "unidirectional", adding samples progressively from one end of P to the other, here, through sparse grids, our approach acts more "isotropically".

Remark 2.12. As in the frequency-adaptivity, see Remark 2.2, once the greedy iterations are over, we can take advantage of the extra samples taken at test parameter points, and build a much richer pROM than the one that satisfied the tolerance constraint. Here, this idea is even more attractive than for MRI, since the test set can (and usually does) contain quite a large number of parameter values, as opposed to just one.

Remarks on the smoothness of the Heaviside decomposition
Depending on how smoothly the system matrices {A, B, C, E} in (1.1) depend on p, the spectral quantities {λ^(j), m^(j), Y^(j)} in (1.2) may depend smoothly on p as well. More precisely, continuous dependence is often passed on from matrices to Heaviside terms: small perturbations of the system matrices yield small perturbations of the poles λ^(j) and of the residues Y^(j), at least as long as the multiplicities m^(j) are independent of p and poles do not cross. However, inheritance of analytic dependence cannot in general be guaranteed, since (polynomial) branches may naturally arise when poles cross. We refer to [37, Chapter 12] for an introductory discussion on the topic, and we report here two simple representative examples.

A toy example of mode steering
First, we showcase some of the intrinsic difficulties in dealing with crossing or almost crossing poles, even in the absence of bifurcations. The example below was obtained by generalizing a numerical test from [3].
For some fixed ε ∈ R, set d = 1, and consider the matrices defining a parametric dynamical system of the form (1.1), which depend smoothly on p ∈ R (and ε). The system output can be computed explicitly, with poles and residues available in closed form. In particular, it is not difficult to conclude that, if ε = 0, the two poles coincide for p = 0. However, for fixed ε, this degeneracy does not have a negative impact on the smoothness of the Heaviside decomposition, since the poles remain simple. (Interestingly, the joint dependence on p and ε is non-smooth: by comparing (3.3) with Y(z, 0; ε) = (1/2)/(z − ε), we observe that the residues are discontinuous at (p, ε) = (0, 0). However, while discontinuous, the residues stay uniformly bounded, since the matrix A_p is real symmetric, hence diagonalizable, for all (p, ε) ∈ R^2.)

We illustrate how the pole-matching algorithm performs in this simple example, using the exact Heaviside expansion (3.2) in place of the frequency surrogates one would obtain, e.g., via MRI. First, we fix ε = 0 and consider the two parameter values p_1 = −1 and p_2 = 1, where the Heaviside expansions of Y follow from (3.3). We depict poles and residues in Figure 3. If the matching is accurate, the pole 0 should be matched with itself, and −2 with 2. As described in Section 2.2, the matching criterion can be stated in terms of the cost matrix (2.9): matching −2 with 2 and 0 with itself has cost 4, whereas matching −2 with 0 and 0 with 2 has cost 4 + 2w. Hence, as long as w > 0, the algorithm performs the correct matching and recovers the exact response. However, if w = 0, the two costs coincide, and either matching may be selected.
The matching becomes less trivial if ε ≠ 0. For instance, let ε = 1/2. We represent poles and residues graphically in Figure 4 (a). By building the cost matrix (2.9) in this case (we omit the calculation here), we can conclude that the optimal matching changes depending on whether w is smaller or larger than 5 − 2√5 ≈ 0.53: if poles have more importance than residues (i.e., w is small), the surrogate poles do not cross, see Figure 4 (b), whereas they do cross if the weight of the residues is dominant (i.e., w is large), see Figure 4 (c). At least qualitatively, case (b) appears to be a slightly better approximation of the poles and residues of the system. Still, neither of the surrogates (b) and (c) identifies the pole-residue behavior in a satisfactory way, and only adding more sample points (starting from p_3 = 0) will allow for a significant improvement in the quality of the approximation.

A toy example of bifurcation
In this section we show a deceptively simple example of bifurcation, which, in practical applications, may arise due to unfavorable spectral properties of the problem (e.g., local nondiagonalizability).
Fix a small ε, and consider the following scalar problem with a single parameter p. For p ≠ ε, the corresponding Heaviside expansion is readily found, whereas, for p = ε, Y(z, ε; ε) = z^(−2), with a double pole at 0. We can see that the poles are non-smooth (as functions of p) at p = ε, where they exhibit a bifurcation of degree 2, see Figure 5 (a). Moreover, the residues are not only discontinuous at p = ε, but also unbounded there. This is due to the poles transitioning from simple to double.

We apply our pMOR approach to this simple example, assuming, for simplicity, that the surrogate poles and residues coincide with the exact ones. If only the two parameter values p_1 = −1 and p_2 = 1 are considered, the pROM will inevitably fail to identify (p, z) = (ε, 0) as a branch point. Indeed, our method "follows" separately the evolution with respect to p of each pole. Consequently, the degree of each denominator in the Heaviside expansion (2.1) is kept equal to 1 even when crossing the singularity. The result, displayed in Figure 5 (b), shows two surrogate pole lines which miss the branch point by "twisting" around it in the complex plane. Of course, as soon as more parameter points are added, the approximation quality improves significantly: in particular, if p-adaptivity is applied in this case, a progressively more accurate approximation of the exact poles is built as the greedy iterations proceed, see Figure 5 (c). It is important to remark that the structure of this kind of bifurcation would be much better identified by grouping together all of the involved branches, i.e., by merging several terms of the Heaviside decomposition into a single fraction with denominator degree > 1 (here, 2) and coefficients smoothly dependent on the parameter, cf. the exact expression (3.6).
This approach is implicitly applied by most methods based on global rational interpolation, e.g., [18,20,24], which, in fact, do not rely on the Heaviside decomposition (1.2) of the output, but on its rational form, where Q is a polynomial in z of degree ≤ n_S, with p-dependent coefficients. Procedures to integrate global rational interpolation concepts into our approach are being investigated, with the objective of improving the effectiveness in the approximation of non-smooth poles and residues. A possible idea revolves around a "batch-matching" of poles and residues (matching few-to-few instead of one-to-one), which would allow an exact recovery of the simple bifurcation in the example above.

Before proceeding, we wish to briefly discuss what happens when a parameter sample point is added exactly at the branch point, which is the case, e.g., if ε = 0 and we place a parameter sample at p = 0. In such situations, there are two possibilities:

• The frequency surrogate (built, e.g., by MRI) identifies the double pole correctly. In this case, a simple Heaviside decomposition of the form (2.1) does not exist, and our algorithm, as we presented it, fails. The batch-matching approach mentioned above (currently under investigation) would allow dealing with this case in a natural way.
• The frequency surrogate mistakenly identifies the double pole as a couple of very close simple poles, whose residues have a large magnitude as a result, cf. (3.7). In this case, we may compute a Heaviside decomposition of the form (2.1), and proceed as usual with the pMRI algorithm. However, due to the unboundedness of the residues, the approximation quality may be locally sub-optimal near the branch point.
We remark that, due to round-off noise (in the computation of the snapshots, in the MRI procedure, and in the Heaviside decomposition), the latter case is much more likely to present itself in practice.
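The qualitative behavior described in this section can be illustrated numerically. The transfer function below, Y(z, p) = 1/(z^2 − (p − ε)), is a hypothetical stand-in (the paper's exact example may differ): it has simple poles ±√(p − ε) that merge into a double pole at z = 0 for p = ε, with residues that blow up at the branch point, matching the discussion of (3.6)-(3.7).

```python
import numpy as np

def poles_and_residues(p, eps=0.1):
    """Heaviside data of the illustrative model Y(z, p) = 1/(z^2 - (p - eps)).
    NOTE: this specific Y is an assumed example, chosen only to reproduce the
    square-root pole bifurcation at p = eps and the residue blow-up there."""
    s = np.sqrt(complex(p - eps))       # branch point at p = eps
    if s == 0:
        raise ValueError("double pole at z = 0: no simple Heaviside expansion")
    # partial fractions: 1/(z^2 - s^2) = (1/(2s))/(z - s) - (1/(2s))/(z + s)
    return (s, -s), (1 / (2 * s), -1 / (2 * s))
```

Evaluating this for p slightly above ε shows two nearly coincident simple poles with very large residues, i.e., exactly the numerically perturbed situation described in the second bullet above.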

Numerical examples
We report in this section two numerical tests as evidence of the effectiveness of our technique. Our simulations were performed on the Helvetios cluster at EPFL [38]. For the sake of reproducibility, the corresponding code has been made available in [35].

Laplacian eigenvalues on a parametric rectangle
In this section we study a somewhat academic application in the field of PDEs for d = 1, originally considered in [39]. Given (k, H) ∈ C × R_+, we take the Helmholtz equation (4.1) on the rectangle Ω_H, with f : ]0, 1[^2 → R a piecewise-constant forcing term (see [39, Section 4.1] for the exact expression). Given how simply the geometry and the data of the problem depend on the parameter, we can recast (4.1) on the reference domain Ω_1 = ]0, 1[^2 to make the parametric dependence emerge more clearly, as in (4.2). By inspection of the resulting PDE, we infer that (z, p) := (k^2, H^(−2)) is a good choice of frequency and parameter for studying (4.2). We set as frequency and parameter ranges Z = [10, 50] and P = [0.2, 1.2], respectively. After spatial discretization by FEM on a regular mesh with n_S degrees of freedom, we obtain an algebraic problem of the form (4.3), where K_1, K_2, and M are n_S × n_S matrices, and U(z, p) and b are vectors of size n_S (here, we choose n_S = 10201). We remark that (4.3) is in the form of a dynamical system (1.1) with B_p = b. In order to conform to the functional setting of the PDE, we choose as norm over V the L^2 norm, which, in the discrete setting, corresponds to the energy norm induced by M. The H^1_0 or (k-weighted) H^1 norms are also viable options; in our experience, the results of the simulation are barely affected by the choice of norm.
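Since the domain is a rectangle, the eigenvalues of the Laplace operator, i.e., the exact pole locations in z, can be tabulated directly. The sketch below assumes the standard closed form π^2 (j^2 + k^2 p) for homogeneous Dirichlet conditions on ]0, 1[ × ]0, H[ after the substitution p = H^(−2); the boundary conditions and index bounds are assumptions of this sketch, not quoted from the text.

```python
import numpy as np

def exact_poles(p, z_range=(10.0, 50.0), max_index=10):
    """Eigenvalues of the Laplacian on ]0,1[ x ]0,H[, written in the parameter
    p = H**(-2) as lambda_{jk} = pi^2 (j^2 + k^2 p) (assumed Dirichlet form,
    consistent with the role of (4.4) as reference poles in Z)."""
    lams = [np.pi**2 * (j**2 + k**2 * p)
            for j in range(1, max_index) for k in range(1, max_index)]
    # keep only the poles falling inside the frequency range Z
    return sorted(l for l in lams if z_range[0] <= l <= z_range[1])
```

For p = 1 (the square), the poles in Z = [10, 50] are 2π^2 and the doubly degenerate 5π^2, which is the kind of pole-crossing pattern visible in Figure 6.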
We set as our target the FEM solution U(z, p), i.e., we fix C_p = I, the identity matrix, in (1.1). The poles of the Heaviside decomposition of U gain additional importance as FEM approximations of the eigenvalues of the Laplace operator on the parametric domain Ω_H. Due to our choice of domain, such eigenvalues are actually available in closed form, see (4.4) and Figure 6. We use the exact expression of the poles (4.4) to validate the results obtained by our double-greedy pMOR technique. We employ the following computational setup:

• The frequency and parameter training sets are initialized as 3 equispaced points in Z and P, respectively, whereas the frequency test set contains 100 equispaced points in Z.
• For MRI, Legendre polynomials are employed, whereas global monomials of degree 2 are used to interpolate (in a least squares sense) poles and residues after the greedy loop, in a post-processing "compression" step.
• After both frequency and parameter greedy loops, the information at the test points is not wasted, but included in the final surrogates, see Remarks 2.2 and 2.12. After the frequency greedy loops, we remove from the frequency surrogates any pole whose distance from Z is larger than 20, see Remark 2.3.
• The matching weight w is set to 1. The frequency and parameter greedy tolerances are set to 10 −4 and 10 −2 , respectively.
On top of this, we consider three different choices for the tolerance tol_synth, which is employed to deal with unbalanced matching, see Algorithm 4. Also, after the pole-matching has been completed, we improve the synthetic poles by extrapolating the non-synthetic poles via global degree-2 monomials, see Remark 2.7; this extrapolation only affects the cases where synthetic poles are present. We show the surrogate poles in Figure 6. In all cases, we can observe that the pole-crossings are handled well by the matching algorithm. This is likely due to the fact that residue information is taken into account (w > 0), cf. Section 3.1.
In case (i), one erroneous pole crosses the frequency range Z for small p. This is not caused by spurious poles being present in the frequency surrogates, but by an inaccurate matching of correct poles lying on different sides of Z. Due to the strict tolerance in case (ii), the pROM is blind to most of the poles which are not uniformly inside Z. Instead, the hybrid approach (iii) achieves a good compromise, missing only one pole that leaves Z "too quickly" on the bottom right. The "history" of Algorithm 5, namely the location of the new samples and the magnitude of the greedy error indicator, is portrayed in Figure 7. In the less strict cases (i) and (iii), the algorithm correctly identifies the "busiest" region of P, i.e., small values of p, as critical for a good approximation. This results in local refinements near p = 0.2. Instead, case (ii) only performs global refinements before terminating. This is actually a symptom of a general (undesirable) property of Algorithm 5: if tol_synth is too large, pole-residue pairs might be removed too "aggressively" from the surrogate, so that the Heaviside distance (2.8) driving the p-greedy termination criterion contains only a few terms. This, in turn, yields a smaller Heaviside distance, which is more likely to satisfy the prescribed p-greedy tolerance, potentially leading to early termination. The simplest solution is to reduce tol_synth. Alternatively, one could partition the parametric domain P = ∪_i P_i, and then build a different pMOR surrogate on each parameter sub-domain P_i, see, e.g., [21]. On each sub-domain, it is less likely that a relevant pole will be discarded due to it being "too often" synthetic, cf. Section 2.2.1. This second approach is more costly (especially if P is large and/or high-dimensional), but is particularly advantageous when the poles move very quickly through the parametric domain (i.e., if the gradient ∇_p λ is large).
In Figure 8 (left), we compare exact and surrogate models at the point p = 0.425. For simplicity, we only consider the best surrogate, namely the one obtained with the hybrid tolerance (iii). The approximation seems of good quality, with pole locations and residue magnitudes being identified extremely well, and the approximation error is small. In particular, the results seem better than those obtained by using the closest frequency surrogate (i.e., the MRI built at the element of P_train closest to p, namely, p_close = 0.41875), see Figure 8 (right).
We also report a summary of the execution of the method and of the resulting pROM in Table 1. Our results agree with the main motivation behind the introduction of tol synth , namely that it should control how to deal with uncertain or missing information, resulting in richer (but also more noise-prone) or poorer (here, insufficient) surrogates.

Transmission line with high-dimensional parameter space
Our last numerical example concerns the analysis of the admittance parameters of the 3-port transmission tree depicted in Figure 9. A motivation and similar tests can be found, e.g., in [19]. Each branch is composed of a series of unit RLC cells: the "main" branch contains 400 cells, whereas the "up" and "down" branches contain 200 cells each. Resistance, inductance, and capacitance vary from cell to cell along each branch. We consider the frequency range Z = [0, 8] GHz and, as parameters in our pMOR approach, we take a 9-dimensional vector p. The admittance parameters can be found by solving a system of the form (1.1), obtained by Modified Nodal Analysis; in particular, B_p = B and C_p = C are independent of p. In our case, the state X, which contains currents and voltages within the circuit, is a matrix of size 1603 × 3, whereas the output Y (the admittance matrix) has size 3 × 3. Our MOR setup for Algorithm 5 is as follows:

• The frequency training set is initialized to the order-10 Chebyshev points of Z, whereas the frequency test set contains 100 equispaced points in Z; the parameter training set is initialized to 19 points in P: the origin p = 0 and its 18 forward points.
• For MRI, Legendre polynomials are employed, whereas piecewise linear hat functions are used to interpolate poles and residues.
• After both frequency and parameter greedy loops, the information at the test points is not wasted, but included in the final surrogates, see Remarks 2.2 and 2.12. After the frequency greedy loops, we remove from the frequency surrogates any pole whose distance from Z is larger than 2 GHz, see Remark 2.3.
• The matching weight w is set to 1; the frequency and parameter greedy tolerances are set to 10 −3 and 10 −2 , respectively; the tolerance for unbalanced matching tol synth is set to 3/4.
We show a summary of the results of the offline training in Table 2.
In Figure 10 we compare exact and surrogate models at a randomly chosen point p. For simplicity, we only show the magnitude of the admittance between ports I and U: on the left, the dashed line represents the surrogate magnitude of the admittance Y_IU obtained by double-greedy pMOR at p; on the right, the dashed line represents the admittance magnitude of the closest z-surrogate (at p_close); in both cases, the exact magnitude is denoted by a solid line, and the absolute error by a dotted line. The quality of the approximation appears good, with pole locations and residue magnitudes being identified well. In particular, the results seem better than those obtained by using the closest frequency surrogate (i.e., the MRI built at the element of P_train closest to p, namely, p_close = [0, −0.1, 0, 0, 0, −0.1, 0, 0, 0]).
We look at a more global picture in Figure 11. We let p vary along a diagonal line between two opposite vertices of P, parametrized by an auxiliary parameter θ. We plot the magnitude of the admittance between the two output ports for z ∈ Z and θ ∈ [−1, 1]. Two different models are considered: the exact full-order model (1.1) and the double-greedy pMOR surrogate.
In Figure 11, we can observe that the exact admittance has 10 poles (darker lines) in the frequency range, some of which cross, and which, overall, create quite an intricate pattern. Still, the double-greedy pMOR surrogate seems to identify well the behavior of the quantity of interest, at least qualitatively. As could be expected, the quality of the approximation degrades slightly around pole intersections. A similar decrease in the accuracy of the surrogate can be observed also near the boundary of the parameter domain, since the two vertices of P obtained for θ = −1 and θ = 1 are not elements of P train .
To conclude the experiment, we also perform the following verification of the p-greedy tolerance employed to build the surrogate: we select 100 quasi-random (Halton) parameter points in P, outside the training and test sets of the pMOR surrogate. At each such point p, we build a reference frequency surrogate by z-adaptive MRI with the same parameters as above (10 Chebyshev points as starting training set, z-greedy tolerance of 10 −3 ). Note that we only use new snapshots at p to build this reference model. Then we compare this model with the prediction given by our pMOR surrogate, obtained by simply plugging the value of p in (2.3). We make this comparison quantitative through the Heaviside distance (2.8), which is also the metric driving the p-adaptivity, cf. Algorithm 5. We plot the results in the form of a histogram in Figure 12. We can observe that 80% of the verification points lie below the prescribed tolerance and that 99% of them lie below 1.5 times the tolerance. Considering the discrete (sparse grid) nature of the test set, it is reasonable to expect that the tolerance will not be attained everywhere. In this context, we find our results satisfactory since they show that, even in the few cases where the tolerance is not satisfied, the error indicator is still within a small margin of the tolerance.
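The quasi-random verification sampling can be sketched as follows. The bounds of P and the sampler settings (scrambling, seed) are assumptions of this sketch; the paper only specifies that 100 Halton points are drawn outside the training and test sets.

```python
import numpy as np
from scipy.stats import qmc

def verification_points(n=100, d=9, lo=-1.0, hi=1.0, seed=0):
    """Quasi-random (Halton) verification points in a d-dimensional box, as
    used for the a-posteriori check of the p-greedy tolerance."""
    sampler = qmc.Halton(d=d, scramble=False, seed=seed)
    # scale the unit-cube samples to the (assumed) parameter box [lo, hi]^d
    return qmc.scale(sampler.random(n), lo * np.ones(d), hi * np.ones(d))
```

At each returned point one would build a reference z-adaptive MRI surrogate from fresh snapshots and compare it with the pMOR prediction via the Heaviside distance (2.8), as described above.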

Conclusions and outlook
We have described a double-greedy pMOR approach for non-intrusive surrogate modeling of parametric problems, which samples adaptively in both the frequency space and the (potentially high-dimensional) parameter domain. In particular, the selection of parameter samples advances by trying to make the surrogate error small over a growing test set. We have illustrated the effectiveness of the method with numerical examples. Notably, we have shown that an accurate identification of the number and behavior of poles and residues depends critically on some hyper-parameters (w and tol_synth) and, more generally, on the choice of a good strategy for interpolation over the parameter space.
Among the several issues which remain open are the following:

• Thanks to the degree of freedom provided by tol_synth, the proposed heuristic strategy for unbalanced matching is reasonably flexible. Still, it is unclear whether an "optimal" choice of this tolerance exists and, if it does, how to find it for a given application, since it depends on quantities that are unavailable a priori.
• In Section 3, we have showcased some of the difficulties related to intersecting poles, in particular the potentially discontinuous behavior of the residues. Applying the double-greedy approach in such cases may yield inadequate surrogates, with the risk of a large number of iterations of the greedy loop. However, we have observed no such issues in our two numerical examples, despite multiple pole intersections. This is likely due to the beneficial spectral properties of the full-order problems that we considered.
• Issues similar to those discussed in the previous point are also possible (to a larger degree) in the case of multiple poles, even though, as discussed in Section 3, this case is quite unlikely to present itself in practice, thanks to numerical noise. On one hand, this problem should be partially addressed by MRI: for instance, when building the frequency surrogate, it should be possible to determine whether two nearby poles are a noisy double pole or a pair of simple poles. On the other hand, our pMOR algorithm should be extended to allow dealing with multiple poles, without compromising the overall complexity of the algorithm. Both of these directions are the object of ongoing research.