Space-time registration-based model reduction of parameterized one-dimensional hyperbolic PDEs

We propose a model reduction procedure for rapid and reliable solution of parameterized hyperbolic partial differential equations. Due to the presence of parameter-dependent shock waves and contact discontinuities, these problems are extremely challenging for traditional model reduction approaches based on linear approximation spaces. The main ingredients of the proposed approach are (i) an adaptive space-time registration-based data compression procedure to align local features in a fixed reference domain, (ii) a space-time Petrov-Galerkin (minimum residual) formulation for the computation of the mapped solution, and (iii) a hyper-reduction procedure to speed up online computations. We present numerical results for a Burgers model problem and a shallow water model problem, to empirically demonstrate the potential of the method.


Introduction
Several studies have demonstrated the inaccuracy of linear approximation spaces to deal with parameter-dependent hyperbolic partial differential equations (PDEs) with parameter-dependent shocks: this challenge hinders the application of parameterized model order reduction (pMOR, [21,37]) techniques to this class of problems. To address the slow decay of the Kolmogorov N -width of the solution manifold associated with the problem of interest [34], several authors have proposed to resort to nonlinear approximations. The goal of this paper is to develop a Lagrangian nonlinear compression method, and an associated reduced-order model (ROM), for one-dimensional (systems of) conservation laws: the key element of the approach is a space-time registration procedure to improve the linear reducibility of the solution manifold. In computer vision and pattern recognition, registration refers to the process of finding a transformation that aligns two datasets; in this paper, registration refers to the process of finding a parametric spatio-temporal transformation that improves the linear compressibility of the solution manifold.
We denote by µ the vector of model parameters in the parameter region P ⊂ R P , we denote by Ω ⊂ R 2 the spatio-temporal domain over which the PDE is defined, and we define the Hilbert space X = [L 2 (Ω)] D , where D ≥ 1 denotes the number of state variables, and the Banach space Lip(Ω) of Lipschitz functions over Ω. Then, we introduce the solution U µ to the PDE for a given µ, U µ : Ω → R D , and the solution manifold M := {U µ : µ ∈ P} ⊂ X . Linear compression methods rely on approximations of the form U µ ≈ Z N α µ , where Z N : R N → X is a linear parameter-independent operator and α : P → R N is a function of the parameters. On the other hand, we might distinguish between Eulerian and Lagrangian nonlinear approximation/compression methods. We do not provide here a comprehensive survey of nonlinear compression methods, but rather cite a few representative approaches.
• Eulerian approaches [26,38,40,49] consider approximations of the form U µ := Z N,µ ( α µ ), where Z N : R N × P → X is a suitably-chosen operator which might depend on the parameter µ and might also be nonlinear in the first argument.
• Lagrangian approaches consider approximations of the form U µ := ( Z N α µ ) • Φ −1 µ , where Z N : R N → X is a linear operator and Φ µ : Ω → Ω is a parameter-dependent bijection designed to align local features of the solution.
Any Lagrangian method is equivalent to an Eulerian method with Z N,µ (α) := ( Z N α) • Φ −1 µ for some linear operator Z N : R N → X , while the converse is not true. On the other hand, as discussed below, Lagrangian methods naturally fit within the standard framework of projection-based pMOR: this is in contrast with Eulerian approaches, which might require more involved strategies for the online computation of the solution coefficients α µ (see [26]).

To set the stage for the space-time formulation, we introduce the spatio-temporal domain Ω = (0, L) × (0, T ) and the gradient ∇ := [∂ x , ∂ t ] T . We denote by x = (x, t) a generic element of Ω, and by n the outward normal to ∂Ω.
Given the reference domain Ω ⊂ R 2 , we introduce the parameterized mapping Φ : Ω × P → Ω; we denote by X a generic element of the reference domain and we define the mapped gradient ∇ = [∂ X1 , ∂ X2 ] T . We further define the Jacobian matrix G µ := ∇Φ µ and determinant g µ := det( ∇Φ µ ), which is assumed to be strictly positive over Ω. In this work, we consider bijections from Ω into itself: for this reason, we identify the reference domain with Ω.
We define the Hilbert space X = [L 2 (Ω)] D where D is the number of state variables. We denote by (·, ·) the L 2 (Ω) inner product, (w, v) = ∫ Ω w · v dx, and by ‖ · ‖ = (·, ·) 1/2 the corresponding induced norm. Given the linear space W ⊂ X , Π W : X → W denotes the projection operator onto W. We further denote by e 1 , . . . , e N the canonical basis in R N and by ‖ · ‖ 2 the Euclidean norm. Given the linear operator Z N : R N → X , we define ζ n = Z N e n for n = 1, . . . , N and we define the space Z N := span{ζ n } N n=1 ; in the remainder, we shall assume that {ζ n } N n=1 is an orthonormal basis of Z N .

We further introduce the relative best-fit error E bf , which is used to assess the performance of data compression. Given the reduced space Z N ⊂ X and µ ∈ P, we define

E bf (µ) := ‖ U µ − Π Z N U µ ‖ / ‖ U µ ‖. (2)

We further define the "registered" best-fit error as

E reg bf (µ) := ‖ U µ • Φ µ − Π Z N ( U µ • Φ µ ) ‖ / ‖ U µ • Φ µ ‖.

Note that the latter expression is convenient for finite element calculations, since it avoids the computation of the inverse map Φ −1 µ in all quadrature points, and is used in the numerical results.

We use the method of snapshots (cf. [42]) to compute POD eigenvalues and eigenvectors. Given the snapshot set {U k } n train k=1 ⊂ M and the inner product (·, ·), we define the Gramian matrix C ∈ R n train ,n train , C k,k ′ = (U k , U k ′ ), and we define the POD eigenpairs {(λ n , ζ n )} n train n=1 as C ζ n = λ n ζ n , ζ n := Σ n train k=1 ( ζ n ) k U k , n = 1, . . . , n train , with λ 1 ≥ λ 2 ≥ . . . ≥ λ n train ≥ 0. In our implementation, we orthonormalize the modes, that is ‖ζ n ‖ = 1 for n = 1, . . . , n train . To stress the dependence of the POD space on the choice of the inner product, we use the notation L 2 -POD if (·, ·) = (·, ·) L 2 (Ω) , and ℓ 2 -POD if (·, ·) is the Euclidean inner product. Finally, we shall choose the size N of the POD space based on the criterion

N := min { N ′ ∈ {1, . . . , n train } : Σ N ′ n=1 λ n ≥ (1 − tol pod ) Σ n train n=1 λ n }, (3)

where tol pod > 0 is a given tolerance.
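The method of snapshots and the selection criterion (3) can be sketched as follows; this is a minimal illustration with the Euclidean inner product (ℓ 2 -POD), and the function name is ours, not the paper's:

```python
import numpy as np

def pod_method_of_snapshots(snapshots, tol_pod):
    """POD via the method of snapshots.

    snapshots: (n_dof, n_train) array whose columns are the snapshots U^k,
    with the Euclidean inner product. Returns the orthonormal modes,
    the eigenvalues, and the dimension N chosen by the energy criterion (3).
    """
    # Gramian C_{k,k'} = (U^k, U^{k'})
    C = snapshots.T @ snapshots
    lam, V = np.linalg.eigh(C)           # eigh returns ascending order
    lam, V = lam[::-1], V[:, ::-1]       # sort descending
    lam = np.maximum(lam, 0.0)           # clip tiny negative round-off
    # criterion (3): smallest N with sum_{n<=N} lam_n >= (1 - tol) * sum_n lam_n
    energy = np.cumsum(lam) / np.sum(lam)
    N = int(np.searchsorted(energy, 1.0 - tol_pod) + 1)
    modes = snapshots @ V[:, :N]
    modes /= np.sqrt(lam[:N])            # orthonormalize: ||zeta_n|| = 1
    return modes, lam, N
```

For exactly rank-r data the criterion recovers N = r, and the returned modes are orthonormal by construction.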

High-fidelity discretization
Our reduced-order formulation relies on a high-fidelity (hf) DG finite element (FE) discretization; we refer to the textbook [22] for an introduction to DG methods for conservation laws. We denote by T hf = {D k } N e k=1 a non-overlapping, straight triangulation of Ω and we denote by ∂T hf = {f i } N f i=1 the set of facets of the mesh. We denote by N + the positive normal to a given facet in ∂T hf : N + coincides with the outward normal on ∂Ω and is chosen arbitrarily for interior facets. We further define the negative normal N − = −N + . Then, we define the DG FE space of order p,

X hf := { v ∈ [L 2 (Ω)] D : v| D k ∈ [P p (D k )] D , k = 1, . . . , N e }, (4)

where P p denotes the space of two-dimensional polynomials of total degree at most p. Given w ∈ X hf and X ∈ ∂T hf , we define w ± (X) = lim ε→0 + w(X − ε N ± ). We denote by {ϕ d i,k = ϕ i,k e d } i,k,d the Lagrangian basis of the space X hf , with i = 1, . . . , n lp = (p+1)(p+2)/2, k = 1, . . . , N e , d = 1, . . . , D, and we define N hf = dim(X hf ) = n lp N e D; to shorten notation, we might also use the linear indexing ϕ j := ϕ d i,k = ϕ i,k e d where j = j i,k,d = i + (k − 1) n lp + (d − 1) n lp N e . Given w ∈ X hf , we denote by w ∈ R N hf the corresponding FE vector, w(·) = Σ j (w) j ϕ j (·). With some abuse of notation, given the functional F ∈ X ′ hf , we denote by F ∈ R N hf the corresponding FE vector such that F j = F (ϕ j ), for j = 1, . . . , N hf .
In view of the definition of the ROM, we introduce the discrete L 2 and H 1 norms

‖v‖ := ( v T X hf v ) 1/2 , |||v||| := ( v T Y hf v ) 1/2 ,

where X hf is the DG mass matrix and Y hf augments X hf with the discrete diffusion operator associated with the flux H d (·, ·; 1, N), a suitable diffusion flux, here associated with the diffusion matrix K = 1. In this work, we consider the BR2 flux introduced in [3] (see also [2]). We refer to [1] for a detailed discussion concerning the analysis of DG formulations for second-order elliptic problems. In the following, with some abuse of notation, we denote by X hf the linear space (4) equipped with the discrete L 2 norm ‖ · ‖, and we denote by Y hf the linear space (4) equipped with the discrete H 1 norm ||| · |||: note that X hf and Y hf coincide as linear spaces but are different Hilbert spaces. We denote by R Y hf : Y ′ hf → Y hf the Riesz operator in Y hf .

Space-time formulation of conservation laws
In this section, we omit dependence on the parameter µ for notational brevity. We consider a general system of one-dimensional conservation laws:

∂ t U + ∂ x f (U ) = S(U ) in (0, L) × (0, T ), (6)

where U : Ω → R D is the vector of conserved variables, f : R D → R D is the physical flux, and S : R D → R D is the source term. The problem is completed with suitable boundary conditions, which depend on the number of incoming characteristics. We provide two examples of problems of the form (6) at the end of this section. We remark that the solution U may contain discontinuities and might not be unique: we here seek U satisfying (6) away from discontinuities and satisfying suitable Rankine-Hugoniot and entropy conditions at discontinuities, [27]. We recast (6) as

∇ · F (U ) = S(U ) in Ω, where F (W ) := [ f (W ), W ], (7)

so that ∇ · F (U ) = ∂ x f (U ) + ∂ t U . The problem is completed with suitable boundary conditions on {0, L} × (0, T ), which depend on the number of incoming characteristics. Note that at Γ in,0 := (0, L) × {0} all D characteristics are incoming and at (0, L) × {T } all D characteristics are outgoing: this implies that (7) requires the prescription of the full initial datum at t = 0 and does not require any condition at t = T , consistently with (6). Note that, for a proper choice of F , (7) encapsulates two-dimensional steady conservation laws and unsteady one-dimensional conservation laws.

We recast (7) on a reference domain: this will lead to the variational formulation exploited in section 4 for model reduction. Given the mapping Φ : Ω → Ω, recalling standard change-of-variable formulas (cf. Appendix A), we obtain that the mapped solution field U = U • Φ satisfies

∇ · F ( U ) = g S( U ) in Ω, where F (W ) := g G −1 F (W ). (8)

We recall that G, g are the Jacobian matrix and determinant defined in the introduction.
Remark 2.1. Arbitrary Lagrangian Eulerian (ALE, [23,14]) methods involve the use of space-time mappings of the form Φ(X, t) = [Φ 1 (X, t), t]: this class of mappings allows the use of time-marching schemes to solve (7). Note, however, that the deformation that can be achieved using this class of mappings is relatively modest: in particular, for any given time t > 0, the mapped solution U has the same number of discontinuities as U , possibly at different locations. In the numerical results of section 5 and Appendix E, we empirically show that the possibility of "moving" shock waves in space and time is key to improve the linear reducibility of the mapped manifold.

High-fidelity space-time formulation
We discretize (8) using a high-order nodal DG method. In the presence of shocks and other discontinuities, high-order schemes for hyperbolic PDEs require specific stabilization techniques to avoid instabilities. In this work, we resort to the sub-cell shock capturing method based on artificial viscosity proposed in [36]. More in detail, we consider the piecewise-constant artificial viscosity

ε| D k := ε 0 if s k > s 0 + κ; ε| D k := (ε 0 /2) ( 1 + sin( π (s k − s 0 ) / (2κ) ) ) if |s k − s 0 | ≤ κ; ε| D k := 0 otherwise, (9)

where ε 0 > 0 is a positive constant, s(U ) is a suitable scalar function of the state, Π p−1 : P p → P p−1 is the projection onto the space of polynomials of total degree p − 1, and s k is the element-wise smoothness indicator

s k := log 10 ( ‖ s(U ) − Π p−1 s(U ) ‖ 2 L 2 (D k ) / ‖ s(U ) ‖ 2 L 2 (D k ) ).

In this work, we consider s(U ) = U for the Burgers equation and s(U ) = h for the shallow water equations (here, h is the flow height, see section 2.3.2); representative values of the constants in (9) considered in the numerical simulations are s 0 = −2.5, κ = 1.5, 0 = 10 −2 , ε 0 = 5 · 10 −4 .
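The element-wise sensing and the viscosity profile in (9) can be sketched as follows; this is a minimal illustration assuming an orthonormal modal representation of s(U ) on one element, so that the projection error reduces to the energy of the highest mode (the function name and this shortcut are ours, not the paper's):

```python
import numpy as np

def artificial_viscosity(modal_coeffs, s0=-2.5, kappa=1.5, eps0=5e-4):
    """Piecewise-constant viscosity in the spirit of sub-cell shock capturing.

    modal_coeffs: orthonormal modal coefficients of the sensed quantity s(U)
    on one element; the last entry spans the highest-degree modes, so its
    energy fraction plays the role of ||s - Pi_{p-1} s||^2 / ||s||^2.
    """
    ratio = modal_coeffs[-1]**2 / np.sum(modal_coeffs**2)
    s_k = np.log10(max(ratio, 1e-16))   # floor avoids log10(0)
    if s_k < s0 - kappa:
        return 0.0                      # smooth element: no viscosity
    if s_k > s0 + kappa:
        return eps0                     # strongly under-resolved: full viscosity
    return 0.5 * eps0 * (1.0 + np.sin(0.5 * np.pi * (s_k - s0) / kappa))
```

A rapidly decaying modal spectrum (smooth field) yields zero viscosity, while a flat spectrum (near-discontinuity) activates the full value ε 0 .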
We have now the elements to introduce the DG discretization of (8): find U hf ∈ X hf such that

R Φ ( U hf , v) = 0 for all v ∈ X hf ,

where the variational discrete operator R Φ : X hf × X hf → R is the sum of the convection and diffusion contributions, R Φ = R c Φ + R d Φ . Here, R c Φ is given by

R c Φ (w, v) := − Σ N e k=1 ∫ D k ( F (w) : ∇v + g S(w) · v ) dX + Σ f∈∂T hf ∫ f H Φ (w + , w − ; N + ) · ( v + − v − ) dX, (11)

where H Φ is the numerical convective flux. Following [55], we choose H Φ such that

H Φ (w + , w − ; N ) := σ H( w + , w − ; n ), with n := g G −T N / ‖ g G −T N ‖ 2 , σ := ‖ g G −T N ‖ 2 ,

where H is a standard numerical flux in the physical domain. Note that for piecewise-linear maps (11) is equivalent to the "Eulerian" DG convective term associated with the numerical flux H and with the triangulation Φ(T hf ) = {Φ(D k )} N e k=1 . In the numerical examples of this paper, we resort to the local Lax-Friedrichs (Rusanov) flux:

H(w + , w − ; n) := (1/2) ( F (w + ) + F (w − ) ) n + (λ/2) ( w + − w − ),

where λ is the largest (in absolute value) eigenvalue of the Jacobian of the normal flux F (·) n over the two traces w ± . The diffusion form R d is defined through the BR2 diffusion flux H d associated with the diffusion matrix K = ε(U )1. Note that we consider an artificial-diffusion form that is independent of the mapping Φ.
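For the scalar Burgers problem, the space-time flux is F (u) = [u 2 /2, u], and the local Lax-Friedrichs flux in the physical domain can be sketched as follows (a minimal illustration; the function name is ours):

```python
import numpy as np

def llf_flux_burgers(wp, wm, n):
    """Local Lax-Friedrichs (Rusanov) flux for the space-time Burgers flux
    F(u) = [u^2/2, u] and a unit space-time normal n = (n_x, n_t)."""
    def Fn(u):
        # F(u) . n = (u^2/2) n_x + u n_t
        return 0.5 * u**2 * n[0] + u * n[1]
    # largest wave speed |dF/du . n| = |u n_x + n_t| over the two traces
    lam = max(abs(wp * n[0] + n[1]), abs(wm * n[0] + n[1]))
    return 0.5 * (Fn(wp) + Fn(wm)) + 0.5 * lam * (wp - wm)
```

The flux is consistent, H(u, u; n) = F (u) n, and adds dissipation proportional to the jump w + − w − across the facet.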
Remark 2.2. The space-time formulation of the conservation law (6) discussed here provides the foundations for the projection-based ROM proposed in section 4. Despite the recent advances in space-time solvers for conservation laws, we envision that the approach might not be feasible for high-fidelity calculations. In this case, we might employ a third-party time-marching solver to generate the space-time snapshots and then use the space-time formulation exclusively for ROM calculations. We refer to a future work for the integration between an external solver and the space-time ROM. We further remark that other choices for the convection and diffusion fluxes and for the artificial viscosity are available: we refer to the DG literature for thorough discussions and comparisons.

A Burgers model problem
We consider the Burgers equation

∂ t U + ∂ x ( U 2 /2 ) = 0 in (0, L) × (0, T ),

where L = 1, T = 0.8, and the initial and boundary data depend on the parameter µ. Figure 1 shows the behavior of U over Ω for two values of µ. The solution is characterized by the transition between the three "prototypical" behaviors depicted in Figure 2: for small values of t, the solution exhibits two shock waves (cf. Figure 2(a)); for intermediate values of t, the two shocks have merged into a single shock wave (cf. Figure 2(b)); for large values of t, the solution is nearly constant over (0, L) (cf. Figure 2(c)). Despite its simplicity, this problem is extremely challenging for model reduction techniques: linear methods (i.e., methods based on linear approximation spaces) require a large number of modes to correctly represent the solution; on the other hand, to our knowledge, the transition from two shocks to one shock and from one shock to the smooth solution poses fundamental challenges for several nonlinear proposals: see the discussion in [40].

A shallow-water model problem
We consider a transient shallow water (Saint-Venant) flow over a bump in a frictionless channel. We here denote by U = [h, q] T the vector of conserved variables, where q = hu is the discharge, h is the flow height, and u is the flow x-velocity; then, we introduce the flux f (U ) := [q, hu 2 + (g/2) h 2 ] T where g = 9.81 is the gravity acceleration; we further introduce the (parameter-independent) bathymetry b. We consider the system of Saint-Venant equations:

∂ t h + ∂ x q = 0, ∂ t q + ∂ x ( hu 2 + (g/2) h 2 ) = − g h ∂ x b, in (0, L) × (0, T ),

completed with µ-dependent inflow conditions. Note that the parameter µ 1 influences the peak of the incoming discharge, while µ 2 affects the time scale and the integral of the incoming discharge. Finally, U bf corresponds to the limit solution of the PDE for t → ∞. In Figures 3 and 4, we show the behavior of the free surface z = h + b for two values of the parameter µ in P. We observe that for t ≈ 1.5 the incoming wave associated with the inflow boundary condition interacts with the bump; for sufficiently large values of µ 1 , we also observe a backward-propagating wave generated by the interaction between the incoming wave and the bump.
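The physical flux of the Saint-Venant system can be coded directly from the definitions above (a minimal sketch; the function name is ours):

```python
G = 9.81  # gravity acceleration

def swe_flux(U):
    """Physical flux f(U) = [q, h u^2 + (g/2) h^2] for the Saint-Venant
    equations, with conserved variables U = [h, q] and velocity u = q / h."""
    h, q = U
    u = q / h
    return [q, h * u**2 + 0.5 * G * h**2]
```

For instance, with h = 2 and q = 3 the momentum flux combines the advective part h u 2 = 4.5 with the hydrostatic pressure (g/2) h 2 = 19.62.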

Data compression (RePOD)
We denote by W hf = span{ϕ hf m } M hf m=1 an M hf -dimensional space contained in Lip(Ω; R 2 ); following [45], given the identity function id(X) = X for all X ∈ Ω, we seek mappings of the form Φ = id + ϕ with ϕ ∈ W hf . We refer to ϕ as the parameterized displacement, and we denote by C hf a subset of W hf such that Φ = id + ϕ is bijective from Ω into itself for all ϕ ∈ C hf . We devise a computational procedure that takes as input the snapshots {U k } n train k=1 ⊂ M and returns (i) the linear operators Z N : R N → X and W M : R M → W hf in (1), and (ii) the coefficients {α k } n train k=1 ⊂ R N and {a k } n train k=1 ⊂ R M . Towards this end, we proceed as follows.
1. In section 3.1, we present guidelines for the definition of the space W hf and we provide sufficient and computationally-feasible conditions for the bijectivity of Φ. The discussion exploits results first presented in [45] and is here reported for completeness.
2. In section 3.2.1, given a dictionary of functions, here referred to as template space, T N ⊂ L 2 (Ω), and the field s k , we present a general optimization-based procedure for the construction of a mapping Φ k such that s k • Φ k is well-approximated by elements in T N . Our approach is a generalization of the optimization-based technique in [45]; it exploits the theoretical results of section 3.1 to effectively enforce the bijectivity of the mapping. The field s k is a suitable function of the k-th snapshot, s k = s(U k ), that will be introduced below.
3. In section 3.2.2, we present a greedy procedure for the adaptive construction of the template space T N . Our greedy approach returns the mappings {Φ k } k for all training points and the low-dimensional operator W M : R M → W hf .
4. In section 3.3, we finally apply POD to the mapped snapshots { U k = U k • Φ k } k to obtain the reduced operator Z N : R N → Z N ⊂ X and the solution coefficients α 1 , . . . , α n train ∈ R N .

Affine mappings
The next proposition provides the mathematical foundations for the registration algorithm discussed below.
It can be shown that mappings satisfying (16) and (17) map each edge of the rectangle onto itself and each corner onto itself. We here enforce condition (16) for all elements of the search space W hf : more precisely, we build each component of the displacement through a tensorized polynomial expansion, where {ℓ i } M̄ i=1 are the first M̄ Legendre polynomials in (0, 1) and M hf = 2 M̄ 2 . Clearly, other choices satisfying (16) (e.g., Fourier expansions) might also be considered. On the other hand, condition (17) is difficult to impose computationally and should be replaced by a computationally feasible surrogate. We propose to replace (17) with the approximate condition

g(X) ∈ [ϵ, 1/ϵ] for all X ∈ Ω, (19)

where ϵ ∈ (0, 1); we further define the subset C hf of W hf as the set of displacements ϕ for which Φ = id + ϕ satisfies (19) (cf. (20)). Provided that ϕ is sufficiently smooth, condition (19) enforces the bijectivity of the mapping: more precisely, given ϵ ∈ (0, 1) and C > 0, there exist δ, C exp > 0 such that any ϕ ∈ C hf with sufficiently small H 2 seminorm induces a bijective mapping Φ = id + ϕ from Ω into itself. The constant ϵ ∈ (0, 1) can be interpreted as the maximum allowed pointwise contraction induced by the mapping Φ and by its inverse. On the other hand, we observe that the constant δ should be chosen so that ϕ = 0 is admissible; we choose the constants ϵ, δ, C exp accordingly in all our numerical examples.
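The surrogate condition g(X) ∈ [ϵ, 1/ϵ] can be verified by sampling the Jacobian determinant of a candidate mapping; the sketch below does so on the unit square with centered finite differences (all names are ours, not the paper's):

```python
import numpy as np

def jacobian_ok(phi, eps=0.1, n=20, h=1e-6):
    """Check the surrogate bijectivity condition g(X) in [eps, 1/eps]
    for Phi = id + phi on the unit square, sampling det(grad Phi) with
    centered finite differences. phi: callable R^2 -> R^2."""
    def Phi(X):
        return X + phi(X)
    pts = np.linspace(0.05, 0.95, n)
    e1, e2 = np.array([h, 0.0]), np.array([0.0, h])
    for x1 in pts:
        for x2 in pts:
            X = np.array([x1, x2])
            # columns of the Jacobian matrix G = grad Phi
            G = np.column_stack([(Phi(X + e1) - Phi(X - e1)) / (2 * h),
                                 (Phi(X + e2) - Phi(X - e2)) / (2 * h)])
            g = np.linalg.det(G)
            if not (eps <= g <= 1.0 / eps):
                return False
    return True
```

The identity map (zero displacement) has g ≡ 1 and passes, while a strong uniform contraction violates the lower bound ϵ.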

Optimization-based registration
We first present the registration procedure for a single field. Given the set C hf in (20), we introduce the template space T N := span{ψ n } N n=1 ⊂ L 2 (Ω) and the registration statement

min ψ∈T N , ϕ∈W hf f k (ψ, id + ϕ) + ξ |ϕ| 2 H 2 (Ω) + P(ϕ). (23a)

The functional f k (ψ, Φ) = f (ψ, Φ, µ k ), which is here referred to as proximity measure, is given by

f k (ψ, Φ) := ‖ s(U k ) • Φ − ψ ‖ 2 L 2 (Ω) .

Here, s : X → L 2 (Ω) is a suitable registration sensor that will be introduced below. On the other hand, the H 2 seminorm is given by |v| 2 H 2 (Ω) = ∫ Ω ‖ ∇ ∇v ‖ 2 F dX. The penalty term P in (23a), which is associated with condition (19), weakly enforces that g ∈ [ϵ, 1/ϵ] and thus that Φ is a bijection from Ω into itself for all admissible solutions to (23a). We observe that, compared to [45], we here optimize with respect to both the displacement, ϕ, and the template, ψ. Rather than minimizing the distance from a template field ψ̄ in the mapped configuration, we here minimize the best-fit error from a given linear space: in our experience, the statement in (23a) outperforms the one in [45] for fields with several local extrema for which a one-dimensional template might not suffice. Note that the optimal solution (ψ k , ϕ k ) to (23) depends on N ; for moderate values of N , we empirically find that the cost of solving (23) is comparable to the cost of solving the statement in [45].
Since |v| H 2 (Ω) = 0 for all linear polynomials, we find that the penalty term measures deviations from linear maps; in particular, we find that mapping and displacement have the same H 2 seminorm, that is |id + ϕ| H 2 (Ω) = |ϕ| H 2 (Ω) . Due to the condition Φ(Ω) = Ω, we might further interpret the penalty as a measure of the deviation from the identity map. The penalty in (23a) can be further interpreted as a Tikhonov regularization, and has the effect of controlling the gradient of the Jacobian determinant g: recalling the discussion in section 3.1, the latter is important to enforce bijectivity. The hyper-parameter ξ balances accuracy, measured by f , and smoothness of the mapping.
Here, A reg ∈ R M hf ,M hf denotes the matrix of the bilinear form associated with | · | H 2 (Ω) , restricted to W hf . Since ϕ hf 1 , . . . , ϕ hf M hf are polynomials, the computation of the entries of A reg is straightforward. Finally, since the registration problem is non-convex in ϕ, careful initialization of the iterative optimization algorithm is important: we refer to [45, section 3.1.2] for further details.
Remark 3.1. Choice of the registration sensor s. As for shock capturing methods (e.g., [36]), the choice of the sensor s is important to correctly capture relevant features associated with the solution field. In this work, we consider s(U ) = U for the Burgers equation and s(U ) = h for the shallow water equations. For nearly discontinuous fields, we empirically found that filtering might improve the robustness of the registration procedure. In this work, we resort to a spatial moving average filter based on the Matlab routine smooth.

Parametric registration
In Algorithm 1, we propose a greedy procedure to iteratively build a low-dimensional approximation space W M ⊂ W hf for the displacement field and the template space T N . Here, the registration routine takes as input the spaces W M , T N and the target snapshot U and returns a local minimum (ϕ ∗ , ψ ∗ ) of (23a) together with the corresponding value f ∗ of the proximity measure; the routine POD( {ϕ ∗,k } k , tol pod , (·, ·) ) takes as input the optimal displacement fields obtained by repeatedly solving (23a) for different target snapshots, the tolerance tol pod > 0, and the inner product (·, ·), and returns the POD space associated with the first M modes, W M = span{ϕ m } M m=1 , where M is chosen according to (3), and the vectors of coefficients {a k } k such that a k m = (ϕ m , ϕ ∗,k ). As in [45], we consider the inner product in (24). If W hf = {0}, Algorithm 1 reduces to the well-known strong-greedy algorithm (see, e.g., [5]) for the manifold M s = {s(U µ ) : µ ∈ P}. In our experience, for moderate values of N max , the offline cost is dominated by the cost of performing the first iteration: POD indeed effectively leads to an approximation space W M of size M ≪ M hf and ultimately simplifies the solution to the optimization problem for the subsequent iterations.

Algorithm 1 Registration algorithm
Hyper-parameters: tol pod (cf. (3)), C exp , δ, ϵ (cf. (19)), N max maximum number of iterations, tol tolerance for the termination condition, (·, ·) mapping inner product (cf. (24)).
1: for n = N 0 , . . . , N max do
2:   for k = 1, . . . , n train do
3:     (ϕ ∗,k , ψ ∗,k , f ∗,k N,M ) = registration(W M , T N , U k )
4:   end for
5:   if max k f ∗,k N,M < tol then break
6:   else
7:     enrich the template space with the worst-approximated sensed snapshot and update [W M , {a k } k ] = POD( {ϕ ∗,k } k , tol pod , (·, ·) )
8:   end if
9: end for

We remark that our approach requires the definition of the initial template space T N 0 : in this work, we use T N 0 =1 = span{U µ=μ̄ } for the Burgers equation, and we consider T N 0 =2 = span{h µ=μ̄ , h bf } for the shallow-water model problem, where μ̄ denotes the centroid of P, and h bf = (U bf ) 1 denotes the initial height. For the Burgers equation, we empirically found that our greedy procedure depends only weakly on the choice of the initial template T N 0 =1 . On the other hand, for the shallow-water problem, the use of a two-dimensional template is important for accuracy. Nevertheless, if compared to [45], our empirical findings suggest that the greedy procedure significantly reduces the sensitivity of the algorithm with respect to the initial choice of the template, which represented an issue of the original proposal (cf. [45, Fig. 7]).

POD compression
Given the mappings {Φ k = id + ϕ k } n train k=1 , we define the mapped snapshots U k := U k • Φ k for k = 1, . . . , n train . Then, we apply POD based on the L 2 inner product to generate the space Z N = span{ζ n } N n=1 and the coefficients {α k } n train k=1 such that (α k ) n = (ζ n , U k ) for n = 1, . . . , N and k = 1, . . . , n train . The dimension N is chosen according to (3). Note that application of POD requires the definition of all mapped snapshots on a parameter-independent spatio-temporal mesh.
We observe that, although our data compression procedure is applied to a specific class of problems, hyperbolic conservation laws with parameter-dependent discontinuities, our approach is general, that is, independent of the underlying mathematical model. Given the space-time snapshots {U k = U µ k } n train k=1 ⊂ M, our data compression procedure returns (i) the linear operators Z N : R N → X and W M : R M → W hf , and (ii) the coefficients {α k } k ⊂ R N and {a k } k ⊂ R M , where the user-defined tolerance tol pod > 0 serves to choose the cardinalities N, M of the linear operators Z N , W M .

Projection-based reduced-order model
In section 3, we discussed how to generate the operators Z N , W M associated with (1) based on the snapshots {U k = U µ k } k , and how to compute (quasi-)optimal values for the solution/mapping coefficients for all training points, {α k } k , {a k } k . In this section, we address the problem of predicting solution and mapping coefficients for a new value of the parameter µ ∈ P: as anticipated in the introduction, we resort to a kernel-regression algorithm to predict the mapping coefficients, while we resort to minimum residual projection to predict the coefficients associated with the estimate U of the mapped solution. To clarify the presentation, we here focus on the main features of the formulation and we defer to the appendix for a thorough discussion of several technical aspects.

Non-intrusive construction of the mapping Φ
Algorithm 1 returns the space W M = span{ϕ m } M m=1 and the mapping coefficients {a k } n train k=1 . As in [45], we apply a multi-target regression procedure based on radial basis function (RBF, [50]) approximation to compute a predictor of the mapping coefficients for all µ ∈ P,

a : P → R M , a µ = [ a 1 (µ), . . . , a M (µ) ] T . (25)

Each entry of a is built separately based on the dataset D m = {(µ k , a k m )} n train k=1 , m = 1, . . . , M . To assess the goodness of fit of the regression model, we compute an estimate of the out-of-sample R-squared (e.g., [39, Chapter 14]) using cross-validation. We recall that, given training and test sets {(µ k , a k m )} n train k=1 and {(µ j , a j m )} n test j=1 , the R-squared is defined as

R 2 m := 1 − Σ n test j=1 ( a j m − a m (µ j ) ) 2 / Σ n test j=1 ( a j m − ā m ) 2 , with ā m := (1/n test ) Σ n test j=1 a j m .

To reduce the risk of over-fitting, we only keep coefficients for which R 2 m ≥ R 2 min = 0.75. We remark that the mapping in (25) is not guaranteed to be bijective for all µ ∈ P, particularly for small-to-moderate values of n train : as discussed in [45], this represents a major issue of the proposed approach and is the motivation to consider intrusive methods to simultaneously learn mapping and solution coefficients.
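The per-coefficient regression and the R-squared check can be sketched as follows; this is a minimal numpy illustration with a Gaussian RBF interpolant (the paper relies on a general RBF procedure [50]; the kernel choice, ridge regularization, and all names here are our assumptions):

```python
import numpy as np

def rbf_fit(X, y, gamma=10.0):
    """Fit a Gaussian RBF interpolant to scalar data y at parameter points X."""
    sq = lambda A, B: np.sum((A[:, None, :] - B[None, :, :])**2, axis=2)
    K = np.exp(-gamma * sq(X, X))
    # small ridge term for numerical stability of the kernel system
    w = np.linalg.solve(K + 1e-10 * np.eye(len(X)), y)
    return lambda Xq: np.exp(-gamma * sq(Xq, X)) @ w

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, used to decide whether to keep a coefficient."""
    return 1.0 - np.sum((y_true - y_pred)**2) / np.sum((y_true - np.mean(y_true))**2)
```

In the workflow above, one interpolant would be fit per mapping coefficient m, and coefficients with out-of-sample R 2 below the threshold would be discarded.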

Projection-based ROM for the solution coefficients
We introduce the discrete residual R hf N : R N × P → R N hf such that ( R hf N (α, µ) ) j = R Φ µ (Z N α, ϕ j ), and the Jacobian J hf N : R N × P → R N hf ,N such that J hf N (α, µ) = J hf (Z N α, µ) Z N , where J hf (U, µ) ∈ R N hf ,N hf is the high-fidelity Jacobian associated with a given field U ∈ X hf and the parameter µ, and DR Φ µ [U ] : X hf × X hf → R is the Fréchet derivative of R Φ µ at U . We present below several projection-based statements: to clarify the approaches, we report both the variational and the algebraic formulations.
We have now the elements to introduce the Galerkin ROM: find U µ = Z N α µ ∈ Z N such that

R Φ µ ( U µ , ζ) = 0 for all ζ ∈ Z N . (28)

Similarly, we introduce the minimum residual ROM:

min α∈R N ‖ R hf N (α, µ) ‖ 2 Y ′ hf , where ‖w‖ 2 Y ′ hf := w T Y −1 hf w. (29)

In the numerical results, we demonstrate the superiority of the minimum residual ROM (29) compared to the Galerkin ROM (28).
In order to devise an online-efficient ROM, we first introduce the approximate minimum residual statement:

min α∈R N Σ J j=1 ( R Φ µ (Z N α, ψ j ) ) 2 , (30)

where Y J = span{ψ j } J j=1 ⊂ Y hf is referred to as the empirical test space. If (ψ j , ψ i ) Y hf = δ i,j , it is possible to verify that the solution coefficients α µ of (30) minimize the Y ′ hf -norm of the projection of the residual onto Y J . Then, following [53], we replace the truth residual in (30) with the EQ residual

R eq Φ µ (w, v) := Σ k∈I eq ρ k r µ k (w, v),

where r µ k denotes the elemental residual associated with the k-th element, I eq ⊂ {1, . . . , N e } is a subset of the mesh elements, and ρ 1 , . . . , ρ N e ≥ 0 are a set of non-negative weights.
In conclusion, we obtain the hyper-reduced approximate minimum residual ROM:

min α∈R N ‖ R eq N,J (α, µ) ‖ 2 2 , (31)

with R eq N,J : R N × P → R J , ( R eq N,J (α, µ) ) j = R eq Φ µ (Z N α, ψ j ). In the next two sections, we discuss how to construct the test space Y J and how to compute the empirical quadrature rule.
Remark 4.1. Online efficiency. Computation of R eq N,J and its Jacobian J eq N,J can be performed efficiently, provided that |I eq | ≪ N e ; furthermore, since (31) is a nonlinear least-squares problem, we can resort to the Gauss-Newton method to efficiently compute the solution. More in detail, computation of {r c,µ k (Z N α, ψ j ) + r d,µ k (Z N α, ψ j )} j for a given k ∈ I eq requires the storage of ζ 1 , . . . , ζ N , ψ 1 , . . . , ψ J in the k-th element and in its neighbors. Note that if Z N , Y J ⊂ C(Ω), computation of r c,µ k , r d,µ k only depends on the value of trial and test functions in the k-th element (see Appendix B): continuous approximations of trial and test spaces thus allow quite significant reductions in online memory and computational costs. In the numerical results, we investigate the accuracy of continuous approximations for the two model problems. The continuous trial space Z N is obtained by applying POD to continuous approximations of the mapped snapshots { U k } k : in our implementation, the continuous approximation is computed by simply averaging over facets.
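The Gauss-Newton iteration mentioned above can be sketched generically for a nonlinear least-squares problem min over α of ‖R(α)‖ 2 2 (a minimal illustration; the residual used in the test is our toy example, not the ROM residual):

```python
import numpy as np

def gauss_newton(res, jac, alpha0, n_iter=50, tol=1e-12):
    """Gauss-Newton iteration for min_alpha ||res(alpha)||_2^2.

    res: callable R^N -> R^J (J >= N); jac: callable returning the J x N Jacobian.
    Each step solves the linearized least-squares problem J d = -r.
    """
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(n_iter):
        r, J = res(alpha), jac(alpha)
        d, *_ = np.linalg.lstsq(J, -r, rcond=None)
        alpha = alpha + d
        if np.linalg.norm(d) < tol:
            break
    return alpha
```

For zero-residual problems (the hyper-reduced system at convergence of the offline training), the iteration inherits the fast local convergence of Newton's method.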

Construction of the empirical test space
It is possible to verify that the solution U µ to (29) satisfies the Petrov-Galerkin condition R Φ µ ( U µ , ψ) = 0 for all ψ ∈ Y opt N,µ , where the µ-dependent test space Y opt N,µ is spanned by the Riesz representers of the linearized residual, R Y hf DR Φ µ [ U µ ](ζ n , ·), n = 1, . . . , N . Note that the algebraic representation Y opt N,µ of the space Y opt N,µ satisfies Y opt N,µ = Y −1 hf J hf ( U µ , µ) Z N . For this reason, we propose to choose the test space Y J to approximate the manifold M test,N = ∪ µ∈P Y opt N,µ . In Appendix C, we rigorously justify our choice by presenting a detailed analysis for the linear case; in Algorithm 2, we summarize the computational procedure for the construction of the test space employed in our code. We remark that problem-adapted test spaces have been first considered in [13] for linear problems, and more recently in [12]: a thorough comparison with [12,13] is beyond the scope of the present paper.
1: for k = 1, . . . , n train , n = 1, . . . , N do
2:   compute ψ k,n = Y −1 hf J hf ( U k , µ k ) Z N e n
3: end for
4: apply POD to {ψ k,n } k,n to obtain Y J = span{ψ j } J j=1

Remark 4.2. Continuous approximation. As discussed in Remark 4.1, for computational reasons, it might be convenient to consider a continuous test space Y J . This can be achieved by performing POD over continuous approximations of the test functions {ψ k,n } k,n . As for the trial space, the continuous approximation is computed by simply averaging over facets.

Construction of the empirical quadrature rule
We denote by ρ eq ∈ R N e + a vector of non-negative weights associated with (31) and we denote by I eq the set of indices k ∈ {1, . . . , N e } such that ρ eq k > 0. We seek ρ eq ∈ R N e + such that 1. the number of nonzero entries in ρ eq is as small as possible;

2. the constant function is approximated correctly, that is, | Σ N e k=1 ρ eq k |D k | − |Ω| | ≤ δ |Ω|; (33)
3. for all µ ∈ P train = {µ k } n train k=1 , the empirical residual is close to its high-fidelity counterpart,

‖ R eq N,J (α train µ , µ) − R N,J (α train µ , µ) ‖ 2 ≤ δ, (34)

where R N,J denotes the counterpart of R eq N,J based on the high-fidelity quadrature rule. Here, α train µ is chosen equal to the projection coefficients, that is Z N α train µ = Π Z N U µ . The constant function constraint, which was considered in [53], is empirically found to improve the accuracy of the EQ procedure when the integral is close to zero due to the cancellation of the integrand in different parts of the domain. The accuracy constraint in (34) is an adaptation of the manifold accuracy constraints in [53] to minimum residual ROMs and is motivated by the error analysis in Appendix D. As shown in [53], the hyper-reduced system inherits the stability of the DG discretization: (i) energy stability for linear hyperbolic systems, (ii) symmetry and non-negativity for steady linear diffusion systems, and hence (iii) energy stability for linear convection-diffusion systems. It is easy to verify that (33) and (34) can be rewritten in matrix form, where G const = [|D 1 |, . . . , |D N e |], and G µ ∈ R N,N e , b µ ∈ R N are a suitable matrix and vector, whose explicit expressions can be derived exploiting the same argument as in [16,44,53]. The problem of finding ρ eq can thus be reformulated as a sparse-representation (or best-subset selection) problem:

min ρ∈R N e + ‖ρ‖ 0 subject to ‖ G ρ − b ‖ ≤ δ, (35)

for suitable choices of the vector norm ‖ · ‖ and the tolerance δ > 0, where G stacks G const and {G µ } µ∈P train , and b stacks the corresponding right-hand sides. Here, ‖ · ‖ 0 is the ℓ 0 -norm, which counts the number of nonzero entries. In this work, we resort to the non-negative linear least-squares method considered in [16] to obtain approximate solutions to (35): more in detail, we rely on the Matlab function lsqnonneg, which implements a variant of the Lawson and Hanson active-set iterative algorithm [25]. Note that Yano in [53] relies on ℓ 1 relaxation to obtain approximate solutions to (35): a thorough comparison between the two methods is beyond the scope of this work.
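The sparse-weight idea can be illustrated with SciPy's non-negative least-squares routine, which (like lsqnonneg) implements the Lawson-Hanson active-set algorithm; the toy construction below, reproducing the hf quadrature of a few monomials from 50 candidate points, is our own illustration, not the paper's setup:

```python
import numpy as np
from scipy.optimize import nnls

# Candidate quadrature points on [0, 1] with uniform "high-fidelity" weights.
n_cand = 50
x = np.linspace(0.0, 1.0, n_cand)
w_hf = np.full(n_cand, 1.0 / n_cand)

# Accuracy constraints: reproduce the hf quadrature of the monomials 1, x, ..., x^4.
G = np.vstack([x**p for p in range(5)])   # (5, n_cand)
b = G @ w_hf                              # hf values of the five integrals

# Non-negative least squares yields a sparse weight vector: at most
# rank(G) = 5 of the 50 candidate points carry a nonzero weight.
rho, resid = nnls(G, b)
```

The residual is essentially zero (the uniform weights are a feasible point), yet only a handful of points are retained, which is exactly the reduction exploited online.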

Summary of the offline/online computational procedure
We conclude this section by summarizing the offline/online computational decomposition. During the offline stage, given the snapshots {U_k = U_{µ^k}}_k, we proceed as follows:
1. we apply RePOD to obtain the low-dimensional operators Z_N, W_M and the training coefficients {α^k}_k ⊂ R^N and {a^k}_k ⊂ R^M (cf. section 3);
2. we apply kernel regression to obtain the predictor a : P → R^M for the mapping coefficients (cf. section 4.1);
3. we perform hyper-reduction and build the ROM for the solution coefficients (cf. section 4.2).
Then, during the online stage, given a new value of the parameter µ ∈ P,
1. we evaluate the regression model, µ → a_µ;
2. we query the ROM to estimate the solution coefficients α_µ.
Note that the online cost is independent of the dimension of the hf space.
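The offline/online split above can be sketched in a few lines of Python; the POD routine and the Gaussian-kernel regressor below (with a hypothetical kernel width eps and ridge term) are simplified stand-ins for RePOD and the paper's kernel regression, run on synthetic data:

```python
import numpy as np

def pod(snapshots, N):
    """Thin-SVD POD: snapshots holds one space-time snapshot per column.
    Returns the reduced basis Z_N and the training coefficients."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    Z = U[:, :N]
    return Z, Z.T @ snapshots

def fit_kernel_regressor(mu_train, a_train, eps=50.0):
    """Gaussian-kernel regression mu -> a for the mapping coefficients.
    The width eps and the ridge term are illustrative choices."""
    K = np.exp(-eps * (mu_train[:, None] - mu_train[None, :]) ** 2)
    W = np.linalg.solve(K + 1e-10 * np.eye(mu_train.size), a_train)
    return lambda mu: np.exp(-eps * (mu - mu_train) ** 2) @ W

# Offline: train the predictor on synthetic mapping coefficients.
mu_train = np.linspace(0.0, 1.0, 20)
a_train = np.column_stack([np.sin(3.0 * mu_train), mu_train ** 2])
predict = fit_kernel_regressor(mu_train, a_train)

# Online: evaluate the regression model at a new parameter value.
a_new = predict(0.37)
```

The online evaluation touches only the N- and M-dimensional objects, which is what makes the online cost independent of the high-fidelity dimension.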

Numerical results
We illustrate here the numerical performance of the proposed approach for the two model problems introduced at the end of section 2. In Appendix E, we present further investigations of the data compression approach.

Burgers equation
In Figure 5, we illustrate the performance of space-time registration. Here, the mapping is generated based on n_train = 200 snapshots through Algorithm 1 with N_max = 3, ξ = 10^−4, M_hf = 128, tol_pod = 10^−4. The resulting map consists of a three-term expansion (M = 3). In Figure 5(a), we show the behavior of the normalized POD eigenvalues associated with the unregistered and registered space-time snapshots, while Figure 5(b) shows the out-of-sample maximum relative projection error with and without registration, E_{bf,∞} = max_{j=1,...,n_test} E_bf(µ^j), where µ^1, . . . , µ^{n_test} iid ∼ Uniform(P) (cf. (2)). Note that for N ≥ 4 the projection error is less than 10^−2 and is comparable with the discretization error. We conclude that the space-time registration procedure dramatically improves the linear reducibility of the space-time solution manifold.

In Figure 6, we assess the performance of the Galerkin ROM (28), the minimum residual ROM (29), and the approximate minimum residual ROM (30) for three different choices of the test space Y_J for each value of N; in all cases, we do not perform hyper-reduction. In Figure 6(a), we consider continuous trial and test spaces (cf. Remarks 4.1 and 4.2), while in Figure 6(b), we consider discontinuous trial and test spaces. On the y-axis, we report the average relative out-of-sample L^2 error E^hf_avg in the reference configuration, where the superscript hf emphasizes the fact that the ROM relies on the hf quadrature rule (hf ROM). We observe that minimum residual projection is superior to Galerkin projection in terms of performance; furthermore, our approximate minimum residual ROM approaches exact minimum residual for J ≳ 2N. Finally, we observe that the continuous approximation introduces an additional error that is negligible for N ≤ 5. In Figure 7, we illustrate the performance of the hyper-reduction procedure; here, we consider empirical test spaces of size J = 2N.
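In matrix form, the three formulations differ only in the choice of test space. The sketch below uses a generic nonsingular operator A and random trial/test bases (all hypothetical, not the paper's space-time matrices) to contrast Galerkin projection with (approximate) minimum residual projection; the test basis is built from the supremizer directions A Z, mimicking an empirical test space close to the optimal one:

```python
import numpy as np

def galerkin_rom(A, f, Z):
    """Galerkin: test space equal to the trial space Z_N."""
    return np.linalg.solve(Z.T @ A @ Z, Z.T @ f)

def min_residual_rom(A, f, Z, Y=None):
    """Minimum residual: least squares over a test space Y_J.
    Y=None uses the full space (exact minimum residual); a
    J-dimensional Y with J >~ 2N mimics the approximate variant."""
    R, g = (A @ Z, f) if Y is None else (Y.T @ A @ Z, Y.T @ f)
    alpha, *_ = np.linalg.lstsq(R, g, rcond=None)
    return alpha

# Manufactured problem whose exact solution lies in the trial space.
rng = np.random.default_rng(1)
n, N, J = 50, 4, 8
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
Z, _ = np.linalg.qr(rng.standard_normal((n, N)))    # trial basis Z_N
# Test basis: supremizers A Z enriched with random directions to J = 2N.
Y, _ = np.linalg.qr(np.hstack([A @ Z, rng.standard_normal((n, N))]))
alpha_true = rng.standard_normal(N)
f = A @ (Z @ alpha_true)
```

All three projections recover alpha_true in this toy because the truth lies in the trial space; the experiments in the paper show they differ markedly once the trial space only approximates the solution.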
In Figure 7(a), we show the number of sampled elements Q for several choices of N for two different tolerances (the total number of elements is equal to N_e = 2616): we observe that the number of sampled elements grows linearly with N, and ranges from 1% to 4% of the total number for tol = 10^−8, and from 1% to 10% for tol = 2.5 · 10^−11. Here, the tolerance tol is a lower bound on the size of a step: the active-set iterative procedure terminates at the k-th iteration if ‖ρ^{eq,k+1} − ρ^{eq,k}‖_2 ≤ tol. In Figure 7(b), in the case of continuous approximations, we show the relative L^2 error for the hyper-reduced ROM, and we compare it with the error E^hf_avg. Finally, in Figure 8, we show the mesh and the reduced meshes for two choices of trial and test spaces: we observe that most sampled elements are located in the proximity of the shock.

Shallow water equations
In Figure 9, we illustrate the performance of space-time registration. The mapping is generated based on n_train = 100 snapshots through Algorithm 1 with N_max = 5, ξ = 10^−4, M_hf = 128, tol_pod = 10^−4. We recall that the algorithm is applied to the filtered height h^f_µ and that the initial template space is set equal to T_{N_0=2} = span{h^f_µ̄, h^f_bf}. The resulting map consists of a five-term expansion (M = 5). Figure 9(a) shows the behavior of the L^2 POD eigenvalues associated with the snapshots {U_{µ^k}}_{k=1}^{n_train} and with the mapped snapshots {Ũ_{µ^k}}_{k=1}^{n_train}; Figure 9(b) shows the behavior of the out-of-sample relative projection error based on n_test = 20 snapshots (cf. (2)). We observe that the approach is extremely effective in reducing the linear complexity of the solution manifold for moderate values of N. In Figure 10, we illustrate the performance of projection-based ROMs without hyper-reduction. Similarly to the Burgers equation (cf. Figure 6), we empirically find that Galerkin projection might lead to instabilities; on the other hand, our approximate minimum residual approach is effective, provided that the size of the test space satisfies J ≳ 2N. Furthermore, the continuous approximation introduces an additional error that is negligible for N ≤ 7.
In Figure 11, we illustrate the performance of the hyper-reduction procedure. Figure 11(a) shows the number of sampled elements Q for several choices of N, J = 2N, and two different tolerances (the total number of elements is equal to N_e = 2364). Note that, as for the Burgers model problem, the number of sampled elements grows linearly with N. Figure 11(b) shows the behavior of the relative L^2 error (37) for the hyper-reduced ROM, compared with the error of the ROM based on the truth quadrature. Note that for tol = 2.5 · 10^−11 the hyper-reduced ROM guarantees the same accuracy as the non-hyper-reduced ROM, for all values of N considered.

Conclusions
In this work, we developed and numerically validated a model reduction procedure for hyperbolic PDEs in the presence of shocks. The approach relies on a general (i.e., independent of the underlying PDE model) data compression procedure: given the snapshot set, we first perform space-time registration to "freeze" the position of the shock; then, we resort to POD to approximate the registered (mapped) field. To estimate the registered field, we resort to a hyper-reduced approximate minimum residual formulation: our formulation is based on the introduction of a low-dimensional empirical test space [44] and of an empirical quadrature rule [16,53] to reduce online assembly costs.

We aim to extend the approach in several directions. In this work, we resorted to a non-intrusive method for the computation of the mapping coefficients, and to an intrusive (projection-based) method for the computation of the registered solution: in the future, we aim to combine our data compression procedure with both fully-intrusive and non-intrusive ROMs. Fully-intrusive approaches based on projection (see [31]) might help us devise robust ROMs based on moderate-dimensional snapshot sets. Furthermore, if complemented by reliable a posteriori error indicators, they might also lead to the development of adaptive sampling algorithms to dramatically reduce offline training costs. On the other hand, non-intrusive techniques (see [11,17,20]) might be considerably easier to implement and to integrate with existing codes, and — at the price of larger training costs — might contribute to reducing online costs.
We also wish to investigate the performance of the proposed model reduction technique for a broader class of PDE models in computational mechanics, in one and more dimensions. In this respect, we wish to consider two-dimensional steady and unsteady advection-dominated problems that arise in incompressible and compressible fluid mechanics applications. Furthermore, we envision that our approach might be of interest for solid mechanics applications such as contact problems.

B Online residual calculations
We provide some details concerning the online calculation of the residual; we further explain why the CG approximation reduces the memory cost of the ROM. As in the main body of the paper, we denote by I_eq ⊂ {1, . . . , N_e} the sampled elements over which we perform online integration, and we define X^cg_hf = X_hf ∩ [C(Ω)]^D. We also denote by U_hf the scalar DG FE space of order p such that X_hf = [U_hf]^D. We define the average operator {·}, the normal vector average {·}_n, and the jump operator J(·). We further denote by Γ^i_D ⊂ ∂Ω the Dirichlet boundary for the i-th component of the solution field, i = 1, . . . , D, and we introduce the Dirichlet operators, where w ∈ X_hf and v ∈ U_hf. Finally, given the facet ∂D_{k,ℓ}, ℓ = 1, 2, 3, k = 1, . . . , N_e, we introduce the lifting operator r_{ℓ,k}. Recalling the expression of R^c_Φ and R^d, we find the expression of R^{c,eq} for all w, v ∈ X_hf, where η = 3, δ^i_D = 1/2 on interior facets, δ^i_D = 1 on Γ^i_D, and δ^i_D = 0 on ∂Ω \ Γ^i_D. Then, exploiting the consistency of the numerical flux, and the fact that, if w is continuous, w^+ = w^− on interior facets, we obtain (43). Note that the computation of r^c_k(w, v) and r^d_k(w, v) in (43) requires the knowledge of w, v only in D_k, and can be performed using element-wise residual evaluation routines implemented in many DG codes.
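The explicit operator definitions are elided in this excerpt; for the reader's convenience, one standard DG convention (stated here as an assumption, not necessarily the exact one used by the authors) reads, on an interior facet with traces w^± and outward normals n^± = −n^∓,

```latex
\{w\} \,=\, \tfrac{1}{2}\left(w^+ + w^-\right), \qquad
\{w\}_{n} \,=\, \tfrac{1}{2}\left(w^+ n^+ + w^- n^-\right), \qquad
\mathcal{J}(w) \,=\, w^+ - w^-,
```

with all three operators reduced to the single interior trace on boundary facets; this convention is consistent with the use of w^+ = w^− for continuous w in the consistency argument above.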

C Approximate minimum residual: analysis of the linear case
We study the performance of the approximate minimum residual (AMR) formulation for linear inf-sup stable problems: find u ∈ X such that

A(u, v) = F(v)   for all v ∈ Y,

where (X, ‖ · ‖) and (Y, ||| · |||) are suitable Hilbert spaces with inner products (·, ·) and ((·, ·)), respectively, and A and F are a bilinear and a linear form, A ∈ L(X, Y'), F ∈ Y'. We denote by γ and β the continuity and inf-sup constants associated with the form A:

γ = sup_{w ∈ X} sup_{v ∈ Y} A(w, v) / (‖w‖ |||v|||),   β = inf_{w ∈ X} sup_{v ∈ Y} A(w, v) / (‖w‖ |||v|||).

Given the trial space Z_N = span{ζ_n}_{n=1}^N ⊂ X and the test space Y_J ⊂ Y, the AMR solution û ∈ Z_N is defined as

û = arg min_{ζ ∈ Z_N} sup_{v ∈ Y_J} (A(ζ, v) − F(v)) / |||v|||.   (45)

Note that AMR reduces to Galerkin for Y_J = Z_N, while AMR reduces to minimum residual for Y_J = Y.
In view of the analysis, we introduce the reduced inf-sup constant

β_{N,J} = inf_{ζ ∈ Z_N} sup_{v ∈ Y_J} A(ζ, v) / (‖ζ‖ |||v|||);

furthermore, we introduce the supremizing operator S : Z_N → Y such that S(ζ) = R_Y A(ζ, ·), that is,

((S(ζ), v)) = A(ζ, v)   for all v ∈ Y,

and the constant

δ^test_{N,J} = min_{ψ ∈ Y^opt_N \ {0}} |||Π_{Y_J} ψ||| / |||ψ|||,

where Y^opt_N = {S(ζ) : ζ ∈ Z_N} and Π_{Y_J} : Y → Y_J denotes the orthogonal projection onto Y_J. Note that Y^opt_N is the linear counterpart of the space in (32): the constant δ^test_{N,J} measures the proximity between the test space Y_J and the optimal test space Y^opt_N. The next proposition contains the key results of this section. In particular, we observe that the performance depends on the behavior of δ^test_{N,J} and thus on the proximity between Y_J and Y^opt_N: this motivates the sampling strategy in Algorithm 2.
Proposition C.1. If β, β_{N,J} > 0, the solution û to (45) exists and is unique. Furthermore, the following hold:

‖û‖ ≤ (1 / β_{N,J}) ‖F‖_{Y'},   (48a)

‖u − û‖ ≤ (γ / β_{N,J}) min_{ζ ∈ Z_N} ‖u − ζ‖,   and   β_{N,J} ≥ δ^test_{N,J} β.   (48b)

Proof. We first observe that any solution to (45) satisfies (the proof is straightforward)

find û ∈ Z_N :   A(û, φ^J_n) = F(φ^J_n),   n = 1, . . . , N,   (49)

where φ^J_n satisfies ((φ^J_n, v)) = A(ζ_n, v) for all v ∈ Y_J, n = 1, . . . , N. Then, setting Y_{N,J} = span{φ^J_n}_{n=1}^N, we observe that A(ζ, ψ) = A(ζ, Π^Y_{Y_{N,J}} ψ) for all ζ ∈ Z_N and ψ ∈ Y_J, where Π^Y_{Y_{N,J}} : Y → Y_{N,J} denotes the projection operator onto Y_{N,J} with respect to the Y norm. As a result, we find that the inf-sup constant associated with (49) is bounded from below by β_{N,J}. In conclusion, exploiting a standard argument for inf-sup stable problems, we find that the solution û to (49) — and thus the solution to (45) — is unique and satisfies ‖û‖ ≤ (1/β_{N,J}) ‖F‖_{Y'}, which is (48a).

In order to prove the first inequality in (48b), we exploit the argument in [51]. Towards this end, we define the operator S̃ : X → Z_N, S̃(w) := arg min_{ζ ∈ Z_N} ‖A(ζ − w, ·)‖_{(Y_J)'}. Note that û = S̃(u). Clearly, S̃(ζ) = ζ for all ζ ∈ Z_N: this implies that S̃ is idempotent, that is, S̃(S̃(w)) = S̃(w) for all w ∈ X. Therefore, exploiting a standard result in Functional Analysis (see, e.g., [43]), we have ‖S̃‖_{L(X,X)} = ‖1 − S̃‖_{L(X,X)}. Furthermore, recalling (48a), we find ‖S̃(w)‖ ≤ (γ/β_{N,J}) ‖w‖ for all w ∈ X. In conclusion, we obtain, for any ζ ∈ Z_N,

‖u − û‖ = ‖(1 − S̃)(u)‖ = ‖(1 − S̃)(u − ζ)‖ ≤ ‖S̃‖_{L(X,X)} ‖u − ζ‖ ≤ (γ/β_{N,J}) ‖u − ζ‖,

which is the desired result. Note that in the second identity we used the fact that (1 − S̃)ζ = 0 for all ζ ∈ Z_N.

It remains to prove that β_{N,J} ≥ δ^test_{N,J} β. Recalling the definition of the supremizing operator S and the projection theorem, we find, for any ζ ∈ Z_N,

sup_{v ∈ Y_J} A(ζ, v) / |||v||| ≥ A(ζ, Π_{Y_J} S(ζ)) / |||Π_{Y_J} S(ζ)||| = |||Π_{Y_J} S(ζ)|||.

Then, exploiting the previous estimate and |||S(ζ)||| = sup_{v ∈ Y} A(ζ, v)/|||v||| ≥ β ‖ζ‖, we find

sup_{v ∈ Y_J} A(ζ, v) / |||v||| ≥ δ^test_{N,J} |||S(ζ)||| ≥ δ^test_{N,J} β ‖ζ‖,

which is the desired result.

D Derivation of the accuracy constraints (34)
We illustrate how to apply the Brezzi-Rappaz-Raviart (BRR, [6,9]) theory to estimate the error between the solution U^hf to (30) and the solution Ũ to (31), E^eq = ‖U^hf − Ũ‖ = ‖α^hf − α̃‖_2. We omit the dependence on µ for notational brevity. First, we present the following lemma (see [53, Lemma 3.1]).
Lemma D.1. Let N : R^N → R^N be a C^1 function and let α ∈ R^N be such that the Jacobian DN(α) ∈ R^{N×N} is non-singular; introduce constants ε, γ and L(r) such that

‖N(α)‖_2 ≤ ε,   ‖DN(α)^{−1}‖_2 ≤ γ,   ‖DN(α)^{−1} (DN(β) − DN(β'))‖_2 ≤ L(r) ‖β − β'‖_2

for all β, β' in the ball of radius r centered at α. Suppose that 2γL(2γε) ≤ 1. Then, for all r ≥ 2γε such that γL(r) < 1, there exists a unique solution α⋆ satisfying N(α⋆) = 0 in the ball of radius r centered at α. Furthermore, we have

‖α − α⋆‖_2 ≤ 2γ ‖N(α)‖_2.

We apply Lemma D.1 to analyze the quadrature error E^eq. Towards this end, we define

N(α) := J^hf_{N,J}(α)^T R^hf_{N,J}(α).   (52)

Clearly, any α⋆ satisfying N(α⋆) = 0 is a stationary point of the objective function in (30): as a result, if α̃ satisfies the hypotheses of Lemma D.1 with N as in (52), there exists a unique solution α⋆ such that N(α⋆) = 0 in a neighborhood of α̃, and ‖α̃ − α⋆‖_2 ≤ 2γ ‖N(α̃)‖_2.
Since J^eq_{N,J}(α̃)^T R^eq_{N,J}(α̃) = 0, by straightforward manipulations, we find that

‖N(α̃)‖_2 ≤ ‖J^hf_{N,J}(α̃) − J^eq_{N,J}(α̃)‖_2 ‖R^eq_{N,J}(α̃)‖_2 + ‖J^hf_{N,J}(α̃)‖_2 ‖R^hf_{N,J}(α̃) − R^eq_{N,J}(α̃)‖_2 =: (I) ‖R^eq_{N,J}(α̃)‖_2 + ‖J^hf_{N,J}(α̃)‖_2 (II).

This estimate shows that the residual ‖N(α̃)‖_2 is controlled by the quadrature errors (I) and (II). Note that (II) corresponds to the accuracy constraint (34); on the other hand, we choose to exclude the constraints associated with the Jacobian. The reason is twofold: first, controlling (I) requires N J n_train additional constraints and is thus expensive for offline calculations; second, (I) is multiplied by the empirical residual ‖R^eq_{N,J}(α̃)‖_2, which is expected to be small for J = O(N).

E Further investigations on data compression

E.1 Burgers equation
We investigate the compressibility of the manifold M_space = {U_µ(t) : t ∈ (0, T), µ ∈ P} ⊂ L^2(0, L), which needs to be approximated in time-marching ROMs. Towards this end, we assess the performance of POD in the unregistered and in the registered case; for simplicity, we here restrict ourselves to the case P = {µ̄}, µ̄ = [1, 0.25]. Figure 12 shows the behavior of U_µ(t) and the projection Π_{Z_N} U_µ(t) for two time instants. Here, Z_N is the N = 20-dimensional POD space built based on n_train = 200 temporal snapshots associated with the equispaced sampling times {t^k_s}_{k=1}^{n_train}. As expected, linear methods are extremely inefficient at capturing shock waves: the projection error is indeed significant despite the relatively large number of retained modes.

In Figures 13, 14 and 15, we investigate the performance of spatial registration. Here, the mapping is generated based on n_train = 200 snapshots through Algorithm 1 with N_max = 5, ξ = 10^−2, M_hf = 100, tol_pod = 10^−4. We further consider T_{N_0=2} = span{U_µ(0), U_µ(T)} as initial template space; the resulting map consists of a four-term expansion (M = 4). In Figure 13, we show the behavior of U_µ(t) and the projection Π_{Z_N} U_µ(t), where Z_N is the N = 20-dimensional POD space built based on the mapped snapshots. In Figure 14(a), we compare the behavior of the normalized POD eigenvalues with and without registration; similarly, in Figure 14(b), we show the in-sample projection error E_{bf,∞} = max_{k=1,...,n_train} E_bf(t^k_s) for registered and unregistered configurations (cf. (2)). We observe that registration improves the performance of POD for this model problem. Figure 15, which depicts the behavior of the physical and mapped solutions for two time steps, shows that the mapping has the effect of "squeezing" the transition from one shock to no shock by artificially increasing the wave speed. In the framework of projection-based ROMs, this poses serious issues for the numerical temporal integration.
In Figure 16, we show the unregistered and registered solution fields for two values of the parameter and for three horizontal slices of Ω. In the unregistered case, these slices correspond to the solution at three time instants. We observe that space-time registration is able to nearly "freeze" the position of the jump discontinuities with respect to the parameter. These results suggest that the self-similar structures of the present problem can only be captured by considering the space-time behavior of the solution field.
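The compressibility gap described above can be reproduced in a few lines: a travelling step (a crude stand-in for the shock) has slowly decaying singular values, whereas snapshots with the discontinuity frozen at a fixed location are captured by a single mode. The setup below is purely illustrative, not the paper's discretization:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 400)
times = np.linspace(0.1, 0.9, 100)

# Unregistered snapshots: a step discontinuity travelling across the domain.
S_unreg = np.column_stack([(x < 0.2 + 0.6 * t).astype(float) for t in times])
# Registered snapshots: an idealized mapping freezes the step at x = 0.5,
# so every mapped snapshot coincides (rank one).
S_reg = np.column_stack([(x < 0.5).astype(float) for _ in times])

def modes_for_tolerance(S, tol=1e-3):
    """Number of POD modes needed to retain a 1 - tol energy fraction."""
    s = np.linalg.svd(S, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, 1.0 - tol)) + 1

print(modes_for_tolerance(S_unreg), modes_for_tolerance(S_reg))
```

The unregistered case requires a large number of modes for a modest energy tolerance, while the perfectly aligned case requires exactly one: this is the mechanism by which registration restores fast decay of the POD eigenvalues.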

E.2 Shallow water equations
We present further investigations of the space-time registration for the shallow water equations. Figure 17 shows the unregistered and registered free surface z for two values of the parameter and for three horizontal slices of Ω. As for the previous model problem, the registration procedure is able to nearly fix the position of the travelling wave with respect to the parameter.

In Figure 18, we investigate the optimal reconstruction properties of the proposed approximation. Given U_µ ∈ M and the POD space Z_N = span{ζ_n}_{n=1}^N ⊂ X obtained based on the mapped snapshots {Ũ_{µ^k}}_{k=1}^{n_train}, we define U^opt_µ = Π_{Z_{N,µ}} U_µ, where Z_{N,µ} = span{ζ_n ◦ Φ^{−1}_µ}_{n=1}^N. Figures 18(a)-(b)-(c) show the solution U_µ and the approximation U^opt_µ (black continuous line) for two values of µ and three time instants. We here report results for N = 3. We observe that we are able to obtain accurate reconstructions with an extremely low-dimensional approximation space.