A generalized finite element method for the strongly damped wave equation with rapidly varying data

We propose a generalized finite element method for the strongly damped wave equation with highly varying coefficients. The method builds on the localized orthogonal decomposition (LOD) technique and is designed to handle independent variations in both the damping and the wave propagation speed. It does so by automatically correcting for the damping in the transient phase and for the propagation speed in the steady-state phase. Convergence of optimal order is proven in the $L_2(H^1)$-norm, independent of the derivatives of the coefficients. We present numerical examples that confirm the theoretical findings.


Introduction
This paper is devoted to the study of numerical solutions of the strongly damped wave equation with highly varying coefficients. The equation takes the general form
(1.1) $\ddot u - \nabla\cdot(A\nabla\dot u + B\nabla u) = f$,
on a bounded domain $\Omega$. Here, $A$ and $B$ represent the system's damping and wave propagation, respectively, $f$ denotes the source term, and the solution $u$ is a displacement function. This equation commonly appears in the modelling of viscoelastic materials, where the strong damping $-\nabla\cdot(A\nabla\dot u)$ arises because the stress is represented as the sum of an elastic part and a viscous part [6,13]. Viscoelastic materials have several applications in engineering, including noise dampening, vibration isolation, and shock absorption (see [21] for more applications). In particular, in multiscale applications, such as the modelling of porous media or composite materials, $A$ and $B$ are both rapidly varying.
There has been much recent work on strongly damped wave equations. For instance, well-posedness of the problem is discussed in [7,20,22], asymptotic behavior in [8,3,31,35], solution blowup in [12,2], and decay estimates in [18]. In particular, the FEM for the strongly damped wave equation has been analyzed in [25] using the Ritz–Volterra projection, and [24] uses the classical Ritz projection in the homogeneous case with Rayleigh damping. In these papers, convergence of optimal order is shown. However, in the case of piecewise linear polynomials, the convergence relies on at least $H^2$-regularity in space. Consequently, since the $H^2$-norm depends on the derivatives of the coefficients, the error is bounded by $\|u\|_{H^2} \sim \max(\varepsilon_A^{-1}, \varepsilon_B^{-1})$, where $\varepsilon_A$ and $\varepsilon_B$ denote the scales at which $A$ and $B$ vary, respectively. The convergence order is thus only valid when the mesh width $h$ fulfills $h < \min(\varepsilon_A, \varepsilon_B)$. In other words, we require a mesh that is fine enough to resolve the variations of $A$ and $B$, which becomes computationally challenging. This type of difficulty is common for equations with rapidly varying data, an issue for which several numerical methods have been developed (see e.g. [5,4,23,29,19]). None of these methods are, however, applicable to the strongly damped wave equation, where two different multiscale coefficients have to be dealt with. In this paper, we propose a novel multiscale method based on the localized orthogonal decomposition (LOD) technique.
The LOD method is based on the variational multiscale method presented in [17]. It was first introduced in [30], and has since then been further developed and analyzed for several types of problems (see e.g. [27,28,1,16,15]). In particular, [26] studies the LOD method for quadratic eigenvalue problems, which correspond to time-periodic wave equations with weak damping. The main idea of the method is based on a decomposition of the solution space into a coarse and a fine part. The decomposition is done by defining an interpolant that maps functions from an infinite dimensional space into a finite dimensional FE-space. In this way, the kernel of the interpolant captures the finescale features that the coarse FE-space misses, and hence defines the finescale space. Subsequently, one may use the orthogonal complement to this finescale space with respect to a problem-dependent Ritz-projection as a modified FE-space. In the case of time-dependent problems, the LOD method performs particularly well in the sense that the modified FE-space only needs to be computed once, and can then be re-used in each time step.
Multiscale methods, such as the localized orthogonal decomposition, are usually designed to handle problems with a single multiscale coefficient. In this sense, the strongly damped wave equation is different, as an extra coefficient appears due to the strong damping. Hence, one of the main challenges for the novel method is how to incorporate the finescale behavior of both coefficients in the computation. Nevertheless, it should be noted that existing multiscale methods are applicable in some special cases of this equation. An example is the case of Rayleigh damping, where the coefficients are proportional to each other. Other examples are the steady-state case, the transient phase in which the solution evolves rapidly in time, and the case of weak damping, where no spatial derivatives act on the damping term.
In this paper we present a generalized finite element method (GFEM) for solving the strongly damped wave equation. The method uses both the damping and diffusion coefficients to construct a generalized finite element space, similar to those in e.g. [30,28]. The solution is then computed in this space but, to account for the time dependence, an additional correction is added to it. However, this correction is evaluated on the fine scale and is thus expensive to compute. To overcome this issue, we prove spatial exponential decay for the corrections, so that we can restrict the problems to patches in a similar manner as for the modified basis functions in [30]. The effect of the proposed method is that the multiscale basis compensates for the damping early in the simulation, when it is dominant, and then gradually starts to compensate for the wave propagation, which is dominant at steady state. This is done seamlessly and automatically by the method. Furthermore, we prove optimal order convergence in the $L_2(H^1)$-norm for this method. Following this, we show that it is sufficient to compute the finescale corrections for only a few time steps by applying reduced basis (RB) techniques. For related work on RB methods, see e.g. [14,10,9], and for an introduction to the topic we refer to [33].
The outline of the paper is as follows: In Section 2 we present the weak formulation and classical FEM for the strongly damped wave equation, along with necessary assumptions. Section 3 is devoted to the generalized finite element method and its localization procedure. In Section 4 error estimates for the method are proven. Section 5 covers the details of the RB approach, and finally in Section 6 we illustrate numerical examples that confirm the theory derived in this paper.

Weak formulation and classical FEM
We consider the wave equation with strong damping: find $u = u(x,t)$ such that
(2.1) $\ddot u - \nabla\cdot(A\nabla\dot u + B\nabla u) = f$, in $\Omega \times (0,T]$,
$u = 0$, on $\Gamma \times (0,T]$,
$u(0) = u_0$, $\dot u(0) = v_0$, in $\Omega$,
where $T > 0$ and $\Omega$ is a polygonal (or polyhedral) domain in $\mathbb{R}^d$, $d = 2, 3$, and $\Gamma := \partial\Omega$. The coefficients $A$ and $B$ describe the damping and propagation speed, respectively, and $f$ denotes the source function of the system. We assume $A = A(x)$, $B = B(x)$, and $f = f(x,t)$, i.e. the multiscale coefficients are independent of time.
Denote by $H^1_0(\Omega)$ the classical Sobolev space of functions vanishing on $\Gamma$, equipped with the norm $\|v\|^2_{H^1(\Omega)} = \|v\|^2_{L_2(\Omega)} + \|\nabla v\|^2_{L_2(\Omega)}$. Moreover, let $L_p(0,T;\mathcal{B})$ be the Bochner space with norm
$\|v\|_{L_p(0,T;\mathcal{B})} = \Big(\int_0^T \|v\|^p_{\mathcal{B}}\,\mathrm{d}t\Big)^{1/p}, \quad \|v\|_{L_\infty(0,T;\mathcal{B})} = \operatorname*{ess\,sup}_{t\in[0,T]}\|v\|_{\mathcal{B}},$
where $\mathcal{B}$ is a Banach space with norm $\|\cdot\|_{\mathcal{B}}$. In this paper, the following assumptions are made on the data: $A$ and $B$ are symmetric, uniformly bounded, and uniformly positive definite, and $f$ is sufficiently regular in time with values in $L_2(\Omega)$.
For the spatial discretization, let $\{\mathcal{T}_h\}_{h>0}$ denote a family of shape-regular elements that form a partition of the domain $\Omega$. For an element $K \in \mathcal{T}_h$, let the corresponding mesh size be defined as $h_K := \operatorname{diam}(K)$, and denote the largest diameter of the partition by $h := \max_{K\in\mathcal{T}_h} h_K$. We now define the classical FE-space of continuous piecewise linear polynomials
$V_h := \{v \in H^1_0(\Omega) : v|_K \text{ is a polynomial of degree at most } 1,\ \forall K \in \mathcal{T}_h\}.$
The semidiscrete problem reads: find $u_h(t) \in V_h$ such that
$(\ddot u_h, z) + a(\dot u_h, z) + b(u_h, z) = (f, z), \quad \forall z \in V_h,$
with initial data $u_h(0) = u^0_h$ and $\dot u_h(0) = v^0_h$, which are appropriate approximations of $u_0$ and $v_0$, respectively. Here $(\cdot,\cdot)$ denotes the usual $L_2$-inner product, $a(\cdot,\cdot) = (A\nabla\cdot, \nabla\cdot)$, and $b(\cdot,\cdot) = (B\nabla\cdot, \nabla\cdot)$.
For the temporal discretization, let $0 = t_0 < t_1 < \ldots < t_N = T$ be a uniform partition with time step $\tau = t_n - t_{n-1}$. The time step is chosen uniformly for simplicity, but a varying time step is also viable. We apply a backward Euler scheme to get the fully discrete system: find $u^n_h \in V_h$ for $n = 2, \ldots, N$, such that
(2.6) $(\bar\partial_t^2 u^n_h, z) + a(\bar\partial_t u^n_h, z) + b(u^n_h, z) = (f^n, z), \quad \forall z \in V_h,$
given initial data $u^0_h$ and $u^1_h$. Here, the discrete derivatives are defined as $\bar\partial_t u^n_h = (u^n_h - u^{n-1}_h)/\tau$ and $\bar\partial_t^2 u^n_h = (u^n_h - 2u^{n-1}_h + u^{n-2}_h)/\tau^2$. For results on regularity and error estimates for the FEM solution of the strongly damped wave equation, we refer to [24]. Moreover, existence and uniqueness of a solution to (2.6) is guaranteed by the Lax–Milgram theorem.
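To make the time-stepping concrete, the following sketch assembles the P1 matrices and advances the fully discrete backward Euler scheme on a uniform 1D grid. It is an illustration only: the constant coefficients A and B stand in for the rapidly varying ones, and all names and parameter values are our own choices, not part of the paper's setup.

```python
import numpy as np

def stiffness_1d(n, coeff):
    """P1 stiffness matrix for (coeff * u', v') on n+1 uniform intervals of (0,1),
    restricted to the n interior nodes (homogeneous Dirichlet data)."""
    h = 1.0 / (n + 1)
    return (coeff / h) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

def mass_1d(n):
    """P1 mass matrix on the same grid."""
    h = 1.0 / (n + 1)
    return (h / 6.0) * (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))

def backward_euler_damped_wave(n=63, tau=0.02, T=20.0, A=0.1, B=1.0):
    """Advance the scheme
    (M/tau^2 + K_A/tau + K_B) u^n = f + M(2u^{n-1} - u^{n-2})/tau^2 + K_A u^{n-1}/tau
    with f = 1 and zero initial data."""
    M, KA, KB = mass_1d(n), stiffness_1d(n, A), stiffness_1d(n, B)
    h = 1.0 / (n + 1)
    f = h * np.ones(n)                  # load vector for f == 1
    S = M / tau**2 + KA / tau + KB      # system matrix, identical in every step
    u2, u1 = np.zeros(n), np.zeros(n)   # u^{n-2}, u^{n-1}
    for _ in range(int(round(T / tau))):
        rhs = f + M @ (2 * u1 - u2) / tau**2 + KA @ u1 / tau
        u2, u1 = u1, np.linalg.solve(S, rhs)
    return u1
```

Since the damping drives the system towards the steady state $-(Bu')' = f$, the computed solution approaches $u(x) = x(1-x)/(2B)$ for constant $B$ and $f \equiv 1$, which provides a cheap sanity check of an implementation.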
In the analysis, we use the notation $\|\cdot\|^2_a := a(\cdot,\cdot)$, $\|\cdot\|^2_b := b(\cdot,\cdot)$, as well as $|||\cdot|||^2 = \tilde a(\cdot,\cdot) := a(\cdot,\cdot) + \tau b(\cdot,\cdot)$, and the fact that these norms are equivalent to the $H^1$-norm. That is, there exist positive constants $C_a, C_b, C_{\tilde a}, c_a, c_b, c_{\tilde a} \in \mathbb{R}$ such that, e.g.,
(2.7) $c_a\|v\|^2_{H^1} \le \|v\|^2_a \le C_a\|v\|^2_{H^1},$
and similarly for $\|\cdot\|_b$ and $|||\cdot|||$. Due to the Cauchy–Schwarz and Young inequalities we have a corresponding lower bound, and for the sum involving the bilinear form $b(\cdot,\cdot)$ we use summation by parts. Using (2.8), the norm equivalence (2.7), and Young's weighted inequality, we obtain an estimate in which the constant $C_\epsilon$ can be made arbitrarily small, so that the corresponding term can be absorbed into the left-hand side. Using that $\|f^j\|^2_{H^{-1}} \le C\|f^j\|^2_{L_2}$ we deduce the energy estimate (2.9) of Theorem 2.1.

Generalized finite element method
This section is dedicated to the development of a multiscale method based on the framework of the standard LOD. First, we introduce some notation for the discretization. Let $V_H$ be a FE-space defined analogously to $V_h$ in the previous section, but with larger mesh size $H > h$. Moreover, we assume that the corresponding family of partitions $\{\mathcal{T}_H\}_{H>h}$ is, in addition to shape regular, also quasi-uniform. Denote by $\mathcal{N}$ the set of interior nodes of $V_H$ and by $\lambda_x$ the standard hat function at $x \in \mathcal{N}$, such that $V_H = \operatorname{span}(\{\lambda_x\}_{x\in\mathcal{N}})$. Finally, we assume that $\mathcal{T}_h$ is a refinement of $\mathcal{T}_H$, so that $V_H \subseteq V_h$.
3.1. Ideal method. To define a generalized finite element method for our problem, we aim to construct a multiscale space $V_{\mathrm{ms}}$ of the same dimension as $V_H$, but with better approximation properties. For the construction of such a multiscale space, let $I_H : V_h \to V_H$ be an interpolation operator that has the projection property $I_H = I_H \circ I_H$ and satisfies
(3.1) $H^{-1}\|v - I_H v\|_{L_2(K)} + \|\nabla I_H v\|_{L_2(K)} \le C_I\|\nabla v\|_{L_2(\mathrm{N}(K))}, \quad \forall K \in \mathcal{T}_H,\ v \in V_h,$
where $\mathrm{N}(K) := \{K' \in \mathcal{T}_H : K' \cap K \ne \emptyset\}$. Furthermore, for a shape-regular and quasi-uniform partition, the estimate (3.1) can be summed into the global estimate
$H^{-1}\|v - I_H v\|_{L_2(\Omega)} + \|\nabla I_H v\|_{L_2(\Omega)} \le C_\gamma\|\nabla v\|_{L_2(\Omega)},$
where $C_\gamma$ depends on the interpolation constant $C_I$ and the shape regularity parameter
$\gamma := \max_{K\in\mathcal{T}_H}\gamma_K, \qquad \gamma_K := \frac{\operatorname{diam}(B_K)}{\operatorname{diam}(K)},$
where $B_K$ denotes the largest ball inside $K$. A commonly used example of such an interpolant is $I_H = E_H \circ \Pi_H$, where $\Pi_H$ is the $L_2$-projection onto $P_1(\mathcal{T}_H)$, the space of functions that are affine on each triangle $K \in \mathcal{T}_H$, and $E_H : P_1(\mathcal{T}_H) \to V_H$ is an averaging operator that, to each free node $x \in \mathcal{N}$, assigns the arithmetic mean of the corresponding function values on intersecting elements, i.e.
$(E_H v)(x) = \frac{1}{\operatorname{card}\{K \in \mathcal{T}_H : x \in K\}}\sum_{K\in\mathcal{T}_H : x \in K} v|_K(x).$
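As an illustration of the averaging operator $E_H$, the sketch below (with our own minimal data layout: each element stores its vertex indices together with the nodal values of the elementwise-affine function) assigns to every node the arithmetic mean of the values contributed by the elements containing it.

```python
import numpy as np

def averaging_operator(elems, elem_vals, nnodes):
    """E_H: for a function that is affine on each element (possibly discontinuous
    across element boundaries), return the nodal values obtained as the
    arithmetic mean over all elements that contain each node."""
    sums = np.zeros(nnodes)
    counts = np.zeros(nnodes)
    for nodes, vals in zip(elems, elem_vals):
        for node, val in zip(nodes, vals):
            sums[node] += val
            counts[node] += 1
    return sums / np.maximum(counts, 1)
```

For a continuous input the operator reduces to the identity on the nodal values, which is one way to check an implementation.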
Let the space $V_{\mathrm{f}}$ be defined as the kernel of the interpolant, i.e.
$V_{\mathrm{f}} := \ker(I_H|_{V_h}) = \{v \in V_h : I_H v = 0\}.$
That is, $V_{\mathrm{f}}$ is a finescale space in the sense that it captures the features that are excluded from the coarse FE-space. This consequently leads to the decomposition $V_h = V_H \oplus V_{\mathrm{f}}$. In the case of the LOD method for the standard wave equation (see [1]), one considers a Ritz-projection based solely on the $B$-coefficient to construct a multiscale space. Here, the goal is instead to define a multiscale space based on the inner product $a(\cdot,\cdot) + \tau b(\cdot,\cdot)$ (for a fixed $\tau$) and add an additional correction to account for the time dependency. This particular choice of scalar product comes from the backward Euler time-stepping formulation; it both simplifies the analysis and is more natural in the implementation. Another possibility is to choose $a(\cdot,\cdot)$ as scalar product. For $v \in V_H$, we consider the Ritz-projection $R_{\mathrm{f}} : V_h \to V_{\mathrm{f}}$ defined by
(3.3) $a(R_{\mathrm{f}} v, z) + \tau b(R_{\mathrm{f}} v, z) = a(v, z) + \tau b(v, z), \quad \forall z \in V_{\mathrm{f}}.$
Using this projection, we may define the multiscale space
$V_{\mathrm{ms}} := \{v - R_{\mathrm{f}} v : v \in V_H\}.$
Note that $\dim(V_{\mathrm{ms}}) = \dim(V_H)$, and hence we can view $V_{\mathrm{ms}}$ as a modified coarse space that contains finescale information of $A$ and $B$. Next, we may use the Ritz-projection to define the basis functions for the space $V_{\mathrm{ms}}$. For $x \in \mathcal{N}$, denote by $\phi_x := R_{\mathrm{f}}\lambda_x$ the corrector of the hat function $\lambda_x$. We can now construct our basis for $V_{\mathrm{ms}}$ as $\{\lambda_x - \phi_x\}_{x\in\mathcal{N}}$, which includes the behavior of the coefficients. For an illustration of the Ritz-projected hat function, as well as the modified basis function for $V_{\mathrm{ms}}$, see Figure 1. We may now formulate our ideal (but impractical) method. Since the solution space can be decomposed as $V_h = V_{\mathrm{ms}} \oplus V_{\mathrm{f}}$, the idea is to solve a coarse-scale problem in $V_{\mathrm{ms}}$, and then add an additional correction from a problem on the fine scale. The method reads: find $u^n_{\mathrm{lod}} = v^n + w^n$, where $v^n \in V_{\mathrm{ms}}$ and $w^n \in V_{\mathrm{f}}$, such that
(3.4) $\tau(\bar\partial_t^2 u^n_{\mathrm{lod}}, z) + a(v^n, z) + \tau b(v^n, z) = \tau(f^n, z) + a(u^{n-1}_{\mathrm{lod}}, z), \quad \forall z \in V_{\mathrm{ms}},$
(3.5) $a(w^n, z) + \tau b(w^n, z) = a(u^{n-1}_{\mathrm{lod}}, z), \quad \forall z \in V_{\mathrm{f}},$
for $n \ge 2$, with initial data $u^0_{\mathrm{lod}} = u^0_h \in V_{\mathrm{ms}}$ and $u^1_{\mathrm{lod}} = u^1_h \in V_{\mathrm{ms}}$. The initial data is chosen in $V_{\mathrm{ms}}$ to simplify the implementation of the finescale correctors. We further discuss this choice in Section 4.4.
Remark 3.1. Note that in (3.5), we take neither the source function nor the second derivative into account. This is because we can subtract an interpolant within the $L_2$-product, so that the corresponding error converges at the same order as the method itself. Moreover, the $w^n$-part and the $v^n$-part have been excluded from the bilinear form $a(\cdot,\cdot) + \tau b(\cdot,\cdot)$ in (3.4) and (3.5), respectively, due to the orthogonality between $V_{\mathrm{ms}}$ and $V_{\mathrm{f}}$. Note that the multiscale space $V_{\mathrm{ms}}$ is created using (3.3) with small $\tau$; thus, the $A$-coefficient dominates the system for short times. Moreover, we note from (3.5) that for $N$ large enough we reach a steady state, so that $w^n = w^{n-1}$. Hence, by rearranging terms and using the orthogonality, we have $b(u^n_{\mathrm{lod}}, z) \approx 0$ for $z \in V_{\mathrm{f}}$, which shows that the solution converges to a state that is orthogonal to the finescale space with respect to $B$.
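To illustrate the corrector computation, the following self-contained 1D sketch (our own toy setup: an oscillatory damping coefficient, and $V_{\mathrm{f}}$ realized as the fine-grid functions vanishing at the coarse nodes, which is a valid kernel of a nodal interpolant in 1D) solves the Ritz-projection problem (3.3) for a single hat function. In 1D this choice of $V_{\mathrm{f}}$ decouples over the coarse elements, so the corrector is supported exactly on $\operatorname{supp}(\lambda_x)$; in two and three dimensions one only gets the exponential decay of Theorem 3.2, which is what motivates the localization to patches.

```python
import numpy as np

def corrector_1d(Nc=16, ref=8, tau=0.1, eps=1.0 / 32):
    """Corrector phi_x = R_f(lambda_x) for the coarse hat at the middle node,
    for the form atilde(u, v) = ((A + tau*B) u', v') with oscillatory A."""
    n = Nc * ref                                  # number of fine intervals
    hf = 1.0 / n
    mids = (np.arange(n) + 0.5) * hf              # element midpoints
    A = 0.01 * (1.1 + np.sin(2 * np.pi * mids / eps))   # rapidly varying damping
    B = np.ones(n)                                # constant propagation coefficient
    c = A + tau * B                               # elementwise coefficient of atilde
    # assemble the fine-scale stiffness matrix of atilde
    Kfull = np.zeros((n + 1, n + 1))
    for e in range(n):
        Kfull[e:e + 2, e:e + 2] += (c[e] / hf) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    K = Kfull[1:-1, 1:-1]                         # interior fine nodes only
    # coarse hat function at the middle coarse node, sampled on the fine nodes
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]
    xc = (Nc // 2) / Nc
    lam = np.maximum(0.0, 1.0 - Nc * np.abs(x - xc))
    # V_f = fine functions vanishing at the coarse nodes
    fine_only = [i for i in range(n - 1) if (i + 1) % ref != 0]
    Kff = K[np.ix_(fine_only, fine_only)]
    rhs = (K @ lam)[fine_only]                    # atilde(lambda_x, z) for z in V_f
    phi = np.zeros(n - 1)
    phi[fine_only] = np.linalg.solve(Kff, rhs)
    return x, xc, lam, phi
```

The modified basis function of $V_{\mathrm{ms}}$ is then `lam - phi`, mirroring the construction $\lambda_x - \phi_x$ above.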

3.2. Localized method. The method considered so far is based on the global projection (3.3) onto the finescale space $V_{\mathrm{f}}$, which results in a large linear system that is expensive to solve. Moreover, the basis correctors have global support, which makes the linear system (3.4) dense rather than sparse. Hence, we wish to localize the computations to coarse grid patches in order to obtain a sparse matrix system.
To localize the corrector problem, we first introduce the patches to which the support of each basis function is to be restricted. For $\omega \subset \Omega$, let $\mathrm{N}(\omega) := \{K \in \mathcal{T}_H : K \cap \omega \ne \emptyset\}$, and define a patch $\mathrm{N}^k(\omega)$ of size $k$ recursively as
$\mathrm{N}^1(\omega) := \mathrm{N}(\omega), \qquad \mathrm{N}^k(\omega) := \mathrm{N}(\mathrm{N}^{k-1}(\omega)), \quad k \ge 2.$
Given these coarse grid patches, we may restrict the finescale space $V_{\mathrm{f}}$ to them by defining
$V^{\omega}_{\mathrm{f},k} := \{v \in V_{\mathrm{f}} : v = 0 \text{ in } \Omega \setminus \mathrm{N}^k(\omega)\},$
for a subdomain $\omega \subset \Omega$. In particular, we will commonly use $\omega = T \in \mathcal{T}_H$ and $\omega = x \in \mathcal{N}$.
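The patch construction is a purely combinatorial operation on the coarse mesh. A minimal sketch (our own representation: a mesh as a list of element vertex tuples) grows a patch by repeatedly adding every element that shares a vertex with the current patch:

```python
def element_patch(elems, seed, k):
    """Indices of all elements within k layers of element `seed`, where one
    layer adds every element sharing a vertex with the current patch
    (the discrete analogue of N^k(omega) for omega = T)."""
    patch = {seed}
    for _ in range(k):
        verts = {v for e in patch for v in elems[e]}
        patch |= {i for i, e in enumerate(elems) if verts & set(e)}
    return patch
```

On a 1D chain of elements, one layer around element 5 gives {4, 5, 6}, two layers give {3, 4, 5, 6, 7}, and so on.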
Next, define the element-restricted Ritz-projection $R^T_{\mathrm{f}} : V_h \to V_{\mathrm{f}}$ by
$a(R^T_{\mathrm{f}} v, z) + \tau b(R^T_{\mathrm{f}} v, z) = \int_T (A + \tau B)\nabla v \cdot \nabla z\,\mathrm{d}x, \quad \forall z \in V_{\mathrm{f}},$
and note that we may construct the global Ritz-projection as the sum $R_{\mathrm{f}} = \sum_{T\in\mathcal{T}_H} R^T_{\mathrm{f}}$. For $k \in \mathbb{N}$, we may restrict the projection to a patch by letting $R^T_{\mathrm{f},k} : V_h \to V^T_{\mathrm{f},k}$ solve the same problem with test functions in $V^T_{\mathrm{f},k}$. By summation we obtain the corresponding global version $R_{\mathrm{f},k} = \sum_{T\in\mathcal{T}_H} R^T_{\mathrm{f},k}$. Finally, we may construct a localized multiscale space as $V_{\mathrm{ms},k} := \{v - R_{\mathrm{f},k} v : v \in V_H\}$. In order to justify the localization, it is required that a corrector $\phi_x$ vanishes rapidly away from its corresponding node $x$. Indeed, the following theorem from [27] shows that the corrector $\phi_x$ satisfies an exponential decay away from its node, making the localization procedure viable.
Theorem 3.2. There exists a constant $c \ge (8C_I\gamma(2 + C_I))^{-1}$, depending only on the mesh constant $\gamma$, such that for any $T \in \mathcal{T}_H$, any $v \in V_h$, and any $k \in \mathbb{N}$,
$\|\nabla R^T_{\mathrm{f}} v\|_{L_2(\Omega\setminus\mathrm{N}^k(T))} \le C e^{-ck}\|\nabla v\|_{L_2(T)}.$
With the space $V_{\mathrm{ms},k}$ defined, we are able to localize the computations of the coarse-scale system in (3.4) by replacing the multiscale space by its localized counterpart. It remains to localize the computations of the finescale system in (3.5), which can equivalently be written as
$a(w^n, z) + \tau b(w^n, z) = a(v^{n-1} + w^{n-1}, z), \quad \forall z \in V_{\mathrm{f}}.$
We replace the right-hand side by its localized version $v^{n-1}_k \in V_{\mathrm{ms},k}$ and note that $v^{n-1}_k = \sum_{x\in\mathcal{N}}\alpha^{n-1}_x(\lambda_x - R_{\mathrm{f},k}\lambda_x)$ for coefficients $\alpha^{n-1}_x$. Thus, we seek our localized finescale solution as a sum of contributions $w^n_{k,x} \in V^x_{\mathrm{f},k}$ satisfying
(3.6) $a(w^n_{k,x}, z) + \tau b(w^n_{k,x}, z) = a(\alpha^{n-1}_x(\lambda_x - R_{\mathrm{f},k}\lambda_x) + w^{n-1}_{k,x}, z), \quad \forall z \in V^x_{\mathrm{f},k},$
so that the computation of this equation is localized to a patch surrounding the node $x \in \mathcal{N}$. We introduce the functions $\xi^l_{k,x} \in V^x_{\mathrm{f},k}$ as solutions to the parabolic-type equation
(3.7) $a(\xi^l_{k,x}, z) + \tau b(\xi^l_{k,x}, z) = a(\xi^{l-1}_{k,x} + \chi_{[0,\tau)}(t_{l-1})(\lambda_x - R_{\mathrm{f},k}\lambda_x), z), \quad \forall z \in V^x_{\mathrm{f},k},$
with initial value $\xi^0_{k,x} = 0$, where $\chi_{[0,\tau)}$ is an indicator function on the interval $[0,\tau)$, so that the hat-function source is active only in the first time step. We claim that $w^n_{k,x} = \sum_{l=1}^n \alpha^{n-l}_x\xi^l_{k,x}$ is the solution to (3.6). This follows by linearity: applying (3.7) to each term of the sum and reindexing yields exactly (3.6). With the localized computations established, the GFEM reads: find $u^n_{\mathrm{lod},k} = v^n_k + \sum_{x\in\mathcal{N}} w^n_{k,x}$, where $v^n_k \in V_{\mathrm{ms},k}$ solves the localized counterpart (3.8) of the coarse-scale equation (3.4). To justify the localization of the finescale equation, we require a result similar to Theorem 3.2, but for the functions $\{\xi^l_x\}_{l=1}^N$. We finish this section about localization by proving that these functions satisfy the exponential decay required for the localization procedure to be viable.
Theorem 3.3. Let $\xi^n_x$ denote the (non-localized) solutions of (3.7) with test space $V_{\mathrm{f}}$. Then there exist constants $c > 0$ and $C > 0$ such that for any $k \ge 1$
$\|\xi^n_x\|_{H^1(\Omega\setminus\mathrm{N}^k(x))} \le C e^{-ck}\|\lambda_x\|_{H^1},$
for sufficiently small time steps $\tau$.
Proof. First, we analyze the problem for the first time step, which, when multiplied by $\tau$, can be written in terms of the bilinear form $\tilde a(\cdot,\cdot)$; we refer to this equation as (3.9). We use the energy norm $|||\cdot||| := \sqrt{\tilde a(\cdot,\cdot)}$, and by $|||\cdot|||_D$ we denote the restriction of the norm to a domain $D$. As seen in the proof of Theorem 4.1 in [27], the result in Theorem 3.2 can be written with a decay factor $\mu^k$ for some $\mu < 1$. Moreover, we define a cut-off function $\eta_k \in V_H$ which is $0$ on an inner patch around $x$ and $1$ outside a larger patch. With this setting, the energy of $\xi^1_x$ outside $\mathrm{N}^k(x)$ splits into three terms $M_1$, $M_2$, and $M_3$, where we denote $\tilde A = A + \tau B$. We now estimate the terms $M_1$, $M_2$, and $M_3$ separately. For $M_1$, we use the equation (3.9), the $\tilde a$-orthogonality between $V_{\mathrm{ms}}$ and $V_{\mathrm{f}}$, the fact that the integral vanishes on $\operatorname{supp}(\lambda_x)$, and that the support of the remaining integrand lies in $\Omega\setminus\mathrm{N}^{k-3}$. Moreover, by calculations similar to those in the proof of Theorem 4.1 in [27], $M_2$ and $M_3$ are bounded by a constant $\tilde C > 0$ times the energy on an annulus. In total, for $\varepsilon \in (0,1)$, we obtain a recursive estimate for $|||\xi^1_x|||^2_{\Omega\setminus\mathrm{N}^k(x)}$. Let $\delta := (\varepsilon + \tilde C)(1 + \tilde C)^{-1} < 1$, and set $\kappa := \max(\delta, \mu) < 1$. Then, by rearranging the terms and repeating the estimate for $i = 0, 1, \ldots, \lfloor k/4\rfloor - 1$, we end up with
$|||\xi^1_x|||^2_{\Omega\setminus\mathrm{N}^k(x)} \le \kappa^{\lfloor k/4\rfloor}|||\xi^1_x|||^2.$
We proceed by estimating $|||\xi^1_x|||$: choosing $z = \xi^1_x$ in (3.9) yields the bound (3.10). Recall that this is for the first time step. In the next time step, we consider the problem
$a(\xi^2_x, z) + \tau b(\xi^2_x, z) = a(\xi^1_x, z), \quad \forall z \in V_{\mathrm{f}}.$
As for the first time step, we split the estimate into similar integrals $M_1$, $M_2$, and $M_3$; here $M_1$ changes while $M_2$ and $M_3$ remain the same. Once again, letting $\delta = (\varepsilon + \tilde C)/(1 + \tilde C)$, using $\delta \le \kappa$, and applying (3.10), we conclude the corresponding bound for $\xi^2_x$. Inductively, we get for an arbitrary time step $n$ the estimate
$|||\xi^n_x|||_{\Omega\setminus\mathrm{N}^k(x)} \le C\kappa^{\frac{1}{2}\lfloor k/4\rfloor}|||\lambda_x|||.$
Since $\kappa^{\frac{1}{2}\lfloor k/4\rfloor} \le C e^{-ck}$ for some $c > 0$ and $C > 0$, and since the energy norm is equivalent to the $H^1$-norm, the theorem holds.
Remark 3.4. Note that the constant appearing in the final inequality converges to $(1 - C_1\tau^2)^{-1}$, which means that the constant behaves well for sufficiently small time steps.

Error estimates
In this section we derive error estimates for the ideal method (3.4)–(3.5). The additional error due to localization can be controlled in terms of the localization parameter $k$; this is further discussed in Remark 4.9. We begin by considering an auxiliary problem. 4.1. Auxiliary problem. The auxiliary problem is defined as the standard variational formulation for the strongly damped wave equation, but with the second-order time derivative excluded. Moreover, we let the starting time $t = t_0$ be general and set the time discretization to $t = t_0 < t_1 < \ldots < t_N = T$. Thus, the auxiliary problem is to find $Z^n_h \in V_h$ for $n = 1, \ldots, N$, such that
(4.1) $a(\bar\partial_t Z^n_h, z) + b(Z^n_h, z) = (f^n, z), \quad \forall z \in V_h.$
Equivalently, multiplying (4.1) by $\tau$, we may consider
(4.2) $a(Z^n_h, z) + \tau b(Z^n_h, z) = \tau(f^n, z) + a(Z^{n-1}_h, z), \quad \forall z \in V_h.$
Existence of a solution to this problem is guaranteed by the Lax–Milgram theorem. For simplicity, we make the assumption that the initial data for the damped wave equation (2.6) already lies in the multiscale space $V_{\mathrm{ms}}$, i.e. $u^0_h, u^1_h \in V_{\mathrm{ms}}$. For general initial data we refer to Section 4.4 below. Furthermore, to limit the technical details in the proofs, we have chosen to analyze the error in the $L_2(H^1)$-norm instead of the pointwise (in time) $H^1$-norm.
The solution space can be decomposed as $V_h = V_{\mathrm{ms}} \oplus V_{\mathrm{f}}$, so that the solution can be written as $Z^n_h = v^n + w^n$, where $v^n \in V_{\mathrm{ms}}$ and $w^n \in V_{\mathrm{f}}$. If we insert this into the system (4.2) and consider test functions $z \in V_{\mathrm{ms}}$, the left-hand side becomes $a(Z^n_h, z) + \tau b(Z^n_h, z) = a(v^n, z) + \tau b(v^n, z)$, where we have used the orthogonality between $V_{\mathrm{ms}}$ and $V_{\mathrm{f}}$ with respect to $a(\cdot,\cdot) + \tau b(\cdot,\cdot)$. Likewise, if test functions $z \in V_{\mathrm{f}}$ are considered, the left-hand side becomes $a(Z^n_h, z) + \tau b(Z^n_h, z) = a(w^n, z) + \tau b(w^n, z)$. With these findings, we define the approximation of the auxiliary problem as: find $Z^n_{\mathrm{lod}} = v^n + w^n$, where $v^n \in V_{\mathrm{ms}}$ and $w^n \in V_{\mathrm{f}}$, such that
(4.3) $a(v^n, z) + \tau b(v^n, z) = \tau(f^n, z) + a(Z^{n-1}_{\mathrm{lod}}, z), \quad \forall z \in V_{\mathrm{ms}},$
(4.4) $a(w^n, z) + \tau b(w^n, z) = a(Z^{n-1}_{\mathrm{lod}}, z), \quad \forall z \in V_{\mathrm{f}},$
with initial data $Z^0_{\mathrm{lod}} \in V_{\mathrm{ms}}$. Note that if $f = 0$, then $Z^n_h = Z^n_{\mathrm{lod}}$ for every $n$, meaning that the method reproduces $Z^n_h$ exactly. For the auxiliary problem, we prove the following error estimates.
Lemma 4.1. If $f^n \in L_2(\Omega)$ for $n \ge 0$, then we have
(4.5) $\|Z^n_h - Z^n_{\mathrm{lod}}\|_{H^1} \le CH\sum_{j=1}^n \tau\|f^j\|_{L_2},$
together with a corresponding bound (4.6) in the $L_2$-norm, and if $f^n = \bar\partial_t g^n$ for some $\{g^n\}_{n=0}^N$ such that $g^n \in V_h$, then a corresponding $L_2$-bound (4.7) holds, where $C$ does not depend on the variations in $A$ or $B$.
Proof. Since $Z^n_h \in V_h$, there are $\bar v^n \in V_{\mathrm{ms}}$ and $\bar w^n \in V_{\mathrm{f}}$ such that $Z^n_h = \bar v^n + \bar w^n$. Let $e^n = Z^n_h - Z^n_{\mathrm{lod}}$, and consider
$|||e^n|||^2 := a(e^n, e^n) + \tau b(e^n, e^n) = \tau(f^n, e^n) + a(Z^{n-1}_h, e^n) - a(v^n, e^n) - \tau b(v^n, e^n) - a(w^n, e^n) - \tau b(w^n, e^n).$
For $v^n \in V_{\mathrm{ms}}$ we have, due to the orthogonality and (4.3), a corresponding identity, and similarly, for $w^n \in V_{\mathrm{f}}$ we use the orthogonality and (4.4) to get $a(w^n, e^n) + \tau b(w^n, e^n) = a(Z^{n-1}_{\mathrm{lod}}, \bar w^n - w^n)$. Hence, $|||e^n|||^2$ reduces to terms involving $\tau(f^n, \cdot)$ and $a(Z^{n-1}_h - Z^{n-1}_{\mathrm{lod}}, e^n)$. The first term can be bounded using the interpolation operator $I_H$ and (3.1). For the second term we note that $Z^{n-1}_h - Z^{n-1}_{\mathrm{lod}} = e^{n-1}$, so that
$|||e^n||| \le CH\tau\|f^n\|_{L_2} + |||e^{n-1}|||.$
Using this bound repeatedly together with $e^0 = 0$ yields the first estimate, which concludes the proof of (4.5) since $\|e^n\|_{H^1} \le C|||e^n|||$. To prove the remaining bounds in the $L_2$-norm, we define the forward difference operator $\partial_t x^n = (x^{n+1} - x^n)/\tau$ and consider the dual problem: find $x^j_h \in V_h$ for $j = n-1, \ldots, 0$, such that $x^n_h = 0$ and
$a(-\partial_t x^j_h, z) + b(x^j_h, z) = (e^{j+1}, z), \quad \forall z \in V_h.$
Choosing $z = -\partial_t x^j_h$, we obtain an energy estimate, where we have used $x^n = e^0 = 0$. Furthermore, we use the equations (4.1) and (4.3), and the orthogonality in (3.2), to show that the following Galerkin orthogonality holds for $z_{\mathrm{ms}} \in V_{\mathrm{ms}}$:
(4.12) $a(\bar\partial_t e^j, z_{\mathrm{ms}}) + b(e^j, z_{\mathrm{ms}}) = 0.$
Using the orthogonality (4.12) and the equations (4.4) and (4.1), we deduce a representation of the error. If $f^j \in L_2(\Omega)$, then we may subtract $I_H x_{\mathrm{f}} = 0$ and use (3.1); the energy estimate (4.9) can then be used to achieve (4.6). If $f^j = \bar\partial_t g^j$, one may use summation by parts together with $x^n_{\mathrm{f}} = x^n_h = 0$, and using (4.10) we conclude (4.7).
Remark 4.2. The bound in (4.6) is not of optimal order, but it is useful in the error analysis.
The next lemma gives error estimates for the discrete time derivative of the error. In the analysis of the (full) damped wave equation we use $g = \bar\partial_t u_h$, see Lemma 4.5. If the initial data is nonzero, we expect $\bar\partial_t g^1$ below to be of order $t^{-1}$ in the $L_2$-norm; a detailed explanation is given below. Hence, we have a blow-up close to zero due to the low regularity of the initial data. Therefore, we need to multiply the error by $t_j$. This is similar to the parabolic case with nonsmooth initial data; see, e.g., [34].

Lemma 4.3. If $f^n \in L_2(\Omega)$, then the estimate (4.14) holds, and if $f^n = \bar\partial_t g^n$ for some $\{g^n\}_{n=0}^N$ such that $g^n \in V_h$, then (4.15) holds and, in addition, a weighted bound (4.16) involving the sums $\sum_{j=2}^n \tau t_j^2\|\bar\partial_t e^j\|^2$ holds, where $C$ does not depend on the variations in $A$ or $B$.
Proof. The proof of (4.14) is similar to that of (4.6). Let $e^j = Z^j_h - Z^j_{\mathrm{lod}}$ and define the dual problem with $x^n = 0$. Choosing $z = \bar\partial_t e^{j+1}$ and performing summation by parts, we deduce an identity where we have used that $x^n = 0$. Following the same argument as for (4.7), but with a difference quotient, and using $e^0 = 0$ together with $\bar\partial_t f^j \in L_2(\Omega)$, the bound (4.14) follows from an energy estimate for $x^j_h$ similar to (4.9), but with $\bar\partial_t e^j$ on the right-hand side.
If $f^j = \bar\partial_t g^j$, we proceed as for (4.7), and (4.15) follows by using energy estimates similar to (4.10). For (4.16) we consider a further dual problem. A simple energy estimate shows the bound (4.20) for $j = 0, \ldots, n-1$, in terms of $\|\bar\partial_t e^{k+1}\|^2_{L_2}$. Now, choosing $z = t_{j+1}\bar\partial_t e^{j+1}$ in (4.19) and performing summation by parts gives an identity (4.21). The first two terms of the sum can be handled similarly to (4.15). Using summation by parts once more, we obtain an estimate where we can use (4.20); note that in the first term we can use the (crude) bound $t_j^2 \le t_n^2$ and let the constant $C$ depend on $T$. For the third term in (4.21), we use $t_j - t_{j-1} = \tau$ and once again perform summation by parts. For the last term in (4.21) we use (4.20) and (4.5) for $n = 1$, and (4.16) follows by letting $f^j = \bar\partial_t g^j$.

4.2. The damped wave equation. For the error analysis of the full damped wave equation we shall make use of the projection corresponding to the auxiliary problem. For $u^n_h \in V_h$, let $X^n = X^n_v + X^n_w \in V_h$, with $X^n_v \in V_{\mathrm{ms}}$ and $X^n_w \in V_{\mathrm{f}}$, be defined by
(4.22) $a(X^n_v, z) + \tau b(X^n_v, z) = \tau(f^n - \bar\partial_t^2 u^n_h, z) + a(X^{n-1}, z), \quad \forall z \in V_{\mathrm{ms}},$
(4.23) $a(X^n_w, z) + \tau b(X^n_w, z) = a(X^{n-1}, z), \quad \forall z \in V_{\mathrm{f}}.$
Note that since $u^n_h$ solves (2.6), the system (4.22)–(4.23) is equivalent to the auxiliary approximation (4.3)–(4.4) with source data $f^n - \bar\partial_t^2 u^n_h$. That is, we may view $u^n_h$ and $X^n$ as the solution and approximation of the auxiliary problem with source data $f^n - \bar\partial_t^2 u^n_h$. We deduce the following lemma.
Lemma 4.4. Let $u^n_h$ be the solution of (2.6) and $X^n$ the solution of (4.22)–(4.23). Then the error satisfies the bounds (4.26)–(4.27), where $C$ does not depend on the variations in $A$ or $B$.
Proof. We let the auxiliary problem (4.1) start at $t_1$. In this case $e^0 = u^1_h - u^1_{\mathrm{ms}} = 0$, since $u^1_{\mathrm{ms}} = u^1_h \in V_{\mathrm{ms}}$. The bound (4.26) now follows directly from (4.5) with $f^n - \bar\partial_t^2 u^n_h$ as right-hand side. The second bound (4.27) follows from (4.6) and (4.7) with $f^n \in L_2(\Omega)$ and $g^n = \bar\partial_t u^{n+1}_h$.
In a similar way we may deduce bounds for the (discrete) time derivative of the error. As a direct consequence of Lemma 4.3, we get the following result.

Lemma 4.5. The discrete time derivative of the error satisfies the corresponding bounds, where $C$ does not depend on the variations in $A$ or $B$.
Lemma 4.6. Let $u^n_h$ and $u^n_{\mathrm{lod}}$ be the solutions to (2.6) and (3.4)–(3.5), respectively, and assume that $u_0 = u_1 = 0$. Then the error satisfies an optimal-order bound in the $L_2(H^1)$-norm.

Proof. Write $u^n_h - u^n_{\mathrm{lod}} = (u^n_h - X^n) + (X^n - u^n_{\mathrm{lod}}) =: \rho^n + \theta^n$, with $\theta^0 = \theta^1 = 0$, since $u^0_{\mathrm{lod}} = u^0_h = X^0$ and $u^1_{\mathrm{lod}} = u^1_h = X^1$. Let $\bar\theta^n = \sum_{j=2}^n \tau\theta^j$. Multiplying the error equation by $\tau$ and summing over $n$ gives an identity (4.30). Since $\|z_{\mathrm{ms}}\|_{L_2} \le C\|z\|_{H^1}$ and $\alpha(1) = 0$ due to the vanishing initial data, we may now choose $z = \theta^n = \bar\partial_t\bar\theta^n$ in (4.30). Using that $\|\theta^n\|_{L_2} \le \|\theta^n\|_{H^1}$ and Young's weighted inequality, $\theta^j$ can be absorbed into the left-hand side, and we deduce a bound on $\sum_{j=2}^n \tau\|\theta^j\|^2$. Using Lemma 4.5, and bounding $\bar\partial_t^2 u^2_h$ by means of the vanishing initial data and (4.31), all terms, except the term $\sum_{j=2}^n \tau\|\bar\partial_t u^j_{\mathrm{lod}}\|^2_{H^1}$ that appears in $\sum_j\alpha^2(j)$, can now be bounded by using the regularity in Theorem 2.1. To bound $\sum_{j=2}^n \tau\|\bar\partial_t u^j_{\mathrm{lod}}\|^2_{H^1}$, we choose $z = \bar\partial_t v^n$ and $z = \bar\partial_t w^n$ in (3.4) and (3.5), respectively. Adding the two equations and using the orthogonality between $V_{\mathrm{ms}}$ and $V_{\mathrm{f}}$, we may choose $\epsilon$ small enough that $\bar\partial_t u^n_{\mathrm{lod}}$ can be absorbed into the left-hand side. As in the proof of Theorem 2.1 we may now conclude, where we have used that $v^1 = u^1_{\mathrm{lod}} = u^1_h \in V_{\mathrm{ms}}$; since we have assumed vanishing initial data, these terms disappear. The lemma follows.
Lemma 4.7. Let $u^n_h$ and $u^n_{\mathrm{lod}}$ be the solutions to (2.6) and (3.4)–(3.5), respectively, with $f = 0$ and nonzero initial data. Then the corresponding time-weighted error bound holds, where $C$ does not depend on the variations in $A$ or $B$.
Proof. We follow the steps in the proof of Lemma 4.6 up to equation (4.30). Note that $\|\rho^n\|_{H^1}$ can be bounded by Lemma 4.4 and the energy bound (2.9) with $f = 0$. Now, let $\bar\theta^n = \sum_{j=2}^n \tau\theta^j$. Choose $z = \theta^n = \bar\partial_t\bar\theta^n$ in (4.30) and multiply by $\tau t_n^2$. Summing over $n$ and using $t_n^2 - t_{n-1}^2 \le 2\tau t_n$ gives an estimate in which, from the first two sums on the right-hand side, the terms $t_j\|\theta^j\|_{L_2} \le t_j\|\theta^j\|_{H^1}$ and $t_j\|\theta^j\|_{H^1}$ can be absorbed into the left-hand side. The remaining two sums need to be bounded by other energy estimates.
It remains to bound $C\sum_{j=2}^n \tau t_j\|\theta^j\|^2_{L_2}$. For this purpose, choose $z = \theta^n = \bar\partial_t\bar\theta^n$ in (4.34), multiply by $t_n\tau$, and sum over $n$ to achieve the estimate (4.37). Note that the term $\|\theta^j\|_{H^1}$ in the last sum is only present on the right-hand side. The second term on the right-hand side is bounded by (4.35). For the term involving the bilinear form $b(\cdot,\cdot)$ we use summation by parts; here the constant $C_\epsilon$ can be made arbitrarily small due to Young's weighted inequality, and the first two terms can be bounded by (4.35). Using the resulting version of (4.37) in (4.36), and applying Lemma 4.5 and Lemma 4.4 with $f = 0$ to the first two terms in (4.38), where we can use (2.9) for $n = 2$ and $f = 0$ to bound $\bar\partial_t^2 u^2_h$, we arrive at the desired estimate. For the remaining terms in (4.38) we note that $t_j\|\theta^j\|_{H^1}$ may now be absorbed into the left-hand side using Cauchy–Schwarz and Young's weighted inequality. The term involving $C_\epsilon$ can also be moved to the left-hand side. All terms involving $\alpha(j)$ and $\beta(j)$ can be bounded by (2.8) and (4.32). This finishes the proof after using the regularity in Theorem 2.1 with $f = 0$.

4.3. Error bound for the ideal method. We obtain the final result by combining the two previous lemmas.

Corollary 4.8. Let $u^n_h$ and $u^n_{\mathrm{lod}}$ be the solutions to (2.6) and (3.4)–(3.5), respectively. The solutions can be split as $u^n_h = u^n_{h,1} + u^n_{h,2}$ and $u^n_{\mathrm{lod}} = u^n_{\mathrm{lod},1} + u^n_{\mathrm{lod},2}$, where the first part has vanishing initial data and the second part a vanishing right-hand side. The error is then bounded by combining the estimates of the two lemmas.

Proof. This is a direct consequence of Lemma 4.6 and Lemma 4.7, together with the fact that the problem is linear, so that the error can be split into two contributions satisfying the conditions of each lemma.
Remark 4.9. The result from Corollary 4.8 is derived for the ideal method presented in (3.4)-(3.5). The GFEM in (3.7)-(3.8) will yield yet another error from the localization procedure. However, due to the exponential decay in Theorem 3.2 and Theorem 3.3, it holds for the choice k ≈ | log(H)| that the perturbation from the ideal method is of higher order and the derived result in Corollary 4.8 is still valid. For the details regarding the error from the localization procedure, we refer to [30].
4.4. Initial data. For general initial data $u^0_h, u^1_h \in V_h$ we consider the projections $R_{\mathrm{ms}} u^0_h$ and $R_{\mathrm{ms}} u^1_h$, where $R_{\mathrm{ms}} = I - R_{\mathrm{f}}$ is the Ritz-projection onto $V_{\mathrm{ms}}$. Let $v$ be the difference between two solutions of the damped wave equation with the two sets of initial data. From (2.8) we obtain the bound (4.39), where we have chosen to keep the $b$-norm. For the first term we may use the interpolant $I_H$ to gain a factor $H$, and for the second term we use the finescale equation. If the initial data fulfills the condition
(4.40) $b(u^0_h, z) = (g, z), \quad \forall z \in V_h,$ for some $g \in L_2(\Omega)$,
then we may deduce that the error introduced by the projection of the initial data is of order $H$. The condition (4.40) appears when applying the LOD method to classical wave equations, see [1], where it is referred to as "well-prepared data". We note in our case that if $B$ is small compared to $A$, that is, if the damping is strong, then the constant in (4.39) is small. In some sense, this means that the condition (4.40) is of less importance, which is consistent with the fact that strong damping reduces the impact of the initial data over time.

Reduced basis method
The GFEM as currently stated requires us to solve the system in (3.7) for each coarse node in each time step, i.e. $N$ times. We will modify the method by applying a reduced basis method, such that it suffices to compute the solutions for $M < N$ time steps and to obtain the remaining ones in a significantly cheaper and more efficient way.
First of all, we note that the system (3.7) that $\xi^n_{k,x}$ solves resembles a parabolic-type equation with no source term. That is, the solution decays exponentially until it has completely vanished. An example of how $\xi^n_{k,x}$ vanishes with increasing $n$ can be seen in Figure 2, for a particular choice of rapidly varying coefficients. Figure 2 also shows that the solutions decay with a similar shape through all time steps. This suggests that it is possible to evaluate the solutions for only a few time steps, and to utilize these solutions to find the remaining ones. The idea can be investigated further by storing the solutions $\{\xi^n_{k,x}\}_{n=1}^N$ and analyzing the corresponding singular values, which are plotted in Figure 3. The values decrease rapidly, and most of them lie at machine precision level. In practice, this means that the information in $\{\xi^n_{k,x}\}_{n=1}^N$ can be extracted from only a few of the $\xi^n_{k,x}$'s. We use this property to decrease the computational complexity by means of a reduced basis method. We remark that the singular value decomposition is not used in the method itself, but merely as a tool to analyze the possibility of applying reduced basis methods.

Figure 5. Relative $H^1$-error $\|u^n_{\mathrm{lod}} - u^n_{\mathrm{lod},k}\|_{H^1}/\|u^n_{\mathrm{lod}}\|_{H^1}$ between the non-localized and localized method, plotted against the layer number $k$.
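The compression step described above can be sketched in a few lines. The snapshots below are synthetic (a sum of two fixed spatial profiles with geometrically decaying amplitudes, mimicking the decay of $\xi^n_{k,x}$); the truncation tolerance and profile shapes are our own choices, not those used in the paper's experiments.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-10):
    """Proper orthogonal decomposition: orthonormal basis for the column span
    of the snapshot matrix, truncated at relative singular value `tol`."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, :rank], s

# synthetic snapshots xi^1, ..., xi^50 built from two decaying modes
x = np.linspace(0.0, 1.0, 200)
mode1 = np.sin(np.pi * x)
mode2 = np.sin(2 * np.pi * x)
snaps = np.column_stack([0.8**n * mode1 + 0.5**n * mode2 for n in range(1, 51)])
Q, s = pod_basis(snaps)
# all 50 snapshots are captured by a 2-dimensional reduced space
```

In an RB method, later time steps are then obtained by solving the small system projected onto the span of `Q` instead of the full finescale system.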
Numerical examples

The fine mesh size is set to $h = 2^{-8}$, and for each coarse mesh width the localization parameter is set to $k = \log_2(1/H)$. Moreover, the time step is set to $\tau = 0.02$ (for the GFEM as well as the reference solution) and the solution is evaluated at $T = 1$. To compute the error, we use a FEM solution on the fine mesh as reference. The error as a function of $1/H$ is shown in Figure 6, where it is seen that the error of the novel method decays faster than linearly, confirming the error estimates derived in Section 4. For comparison, Figure 6 also shows the error of the standard FEM solution, as well as of solutions obtained with the standard LOD method with corrections based solely on $A$ and on $B$, i.e. corrections based on the bilinear forms $a(\cdot,\cdot)$ and $b(\cdot,\cdot)$, respectively, and without finescale correctors. As expected, the errors of these methods stay at a constant level for all coarse grid sizes.
Finally, we compute the solution where the system (3.7) is solved using the reduced basis approach. For this example, we let the number of pre-computed solutions $M$ vary, and study how the error between the solutions $u^n_{\mathrm{lod},k}$ and $u^{n,\mathrm{rb}}_{\mathrm{lod},k}$ behaves. In the example we use the fine mesh $h = 2^{-8}$, the coarse mesh $H = 2^{-5}$, the time step $\tau = 0.02$, and the final time $T = 1$. The result is shown in Figure 7, where it is seen that the error decreases rapidly with the number of pre-computed solutions. Note that it is sufficient to compute approximately 10 solutions to achieve an error smaller than the discretization error of the main method in Figure 6, for the case when the number of time steps is $N = 50$. We emphasize that a large increase in the number of time steps does not significantly impact the number of pre-computed solutions $M$, making the RB approach relatively more efficient the more time steps are considered.