AN ULTRAWEAK SPACE-TIME VARIATIONAL FORMULATION FOR THE WAVE EQUATION: ANALYSIS AND EFFICIENT NUMERICAL SOLUTION

Abstract. We introduce an ultraweak space-time variational formulation for the wave equation, prove its well-posedness (even in the case of minimal regularity) and its optimal inf-sup stability. Then, we introduce a tensor-product-style space-time Petrov–Galerkin discretization with optimal discrete inf-sup stability, obtained by a non-standard definition of the trial space. As a consequence, the numerical approximation error is equal to the residual, which is particularly useful for a posteriori error estimation. For the arising discrete linear systems in space and time, we introduce efficient numerical solvers that appropriately exploit the equation structure, either at the preconditioning level or in the approximation phase by using a tailored Galerkin projection. This Galerkin method shows competitive behavior concerning wall-clock time, accuracy and memory, as compared with a standard time-stepping method, in particular in low-regularity cases. Numerical experiments with a 3D (in space) wave equation illustrate our findings.

regularity for initial data $u_0 \in L^2(\Omega)$. Following [11], we employ specifically chosen test spaces so as to derive a well-posed variational problem. A Petrov–Galerkin method is then used for the discretization: inspired by [9], we first choose an appropriate test space and then define the (non-standard) trial space to preserve optimal inf-sup stability. After completion of this work, we learned that this approach is very closely related to the DPG* method [13, 21].
The aforementioned discretization results in a linear system of equations $\boldsymbol{B}_\delta \boldsymbol{u}_\delta = \boldsymbol{f}_\delta$, whose (stiffness) matrix $\boldsymbol{B}_\delta$ is a sum of tensor products and has a large condition number, making the solution of the system particularly challenging. Memory and computational complexity are also an issue, since space-time discretizations in general lead to larger systems than conventional time-stepping schemes, where a sequence of linear systems has to be solved whose dimension corresponds to the spatial discretization only.
Building upon [19], we introduce matrix-based solvers that are competitive with time-stepping schemes. In particular, we show that in the case of minimal regularity the space-time method using fast matrix-based solvers outperforms a Crank–Nicolson time-stepping scheme.
The remainder of this paper is organized as follows: In Section 2, we review known facts concerning variational formulations in general and for the wave equation in particular. We derive an optimally inf-sup stable ultraweak variational form. Section 3 is devoted to the Petrov-Galerkin discretization, again allowing for an inf-sup constant equal to 1. The arising linear system of equations is derived in Section 4 and its efficient and stable numerical solution is discussed in Section 5. We show some results of numerical experiments for the 3D wave equation in Section 6. For proving the well-posedness of the proposed variational form we need a result concerning a semi-variational formulation of the wave equation, whose proof is given in Appendix A.
Then, for all $f \in \mathcal{V}'$, the variational problem (2.2) admits a unique solution $u^* \in \mathcal{U}$, which depends continuously on the data $f \in \mathcal{V}'$, if and only if the conditions (C.1)–(C.3) are satisfied; in particular, the inf-sup condition
$$\beta := \inf_{u \in \mathcal{U}} \sup_{v \in \mathcal{V}} \frac{b(u, v)}{\|u\|_{\mathcal{U}}\, \|v\|_{\mathcal{V}}} > 0 \tag{C.2}$$
must hold. The inf-sup constant $\beta$ (or some lower bound for it) also plays a crucial role for the numerical approximation of the solution $u^* \in \mathcal{U}$, since it enters the relation between the approximation error and the residual (by the Xu–Zikatanov lemma [35], see also below). This motivates our interest in the size of $\beta$: the larger, the better.
A standard tool (at least) for (i) proving the inf-sup stability in (C.2), (ii) stabilizing finite-dimensional discretizations, and (iii) obtaining sharp bounds for the inf-sup constant, is to determine the so-called supremizer.
To define it, let $b: \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ be a generic bounded bilinear form and let $0 \neq u \in \mathcal{U}$ be given. Then, the supremizer $s_u \in \mathcal{V}$ is defined as the unique solution of
$$(s_u, v)_{\mathcal{V}} = b(u, v) \quad \text{for all } v \in \mathcal{V}.$$
It is easily seen that
$$\sup_{v \in \mathcal{V}} \frac{b(u, v)}{\|v\|_{\mathcal{V}}} = \frac{b(u, s_u)}{\|s_u\|_{\mathcal{V}}} = \|s_u\|_{\mathcal{V}},$$
which justifies the name supremizer.
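In a discrete setting, the supremizer is computable by a single Gram solve in the test space. The following sketch (names are illustrative: $\boldsymbol{B}$ is the rectangular form matrix $B_{ij} = b(\phi_j, \psi_i)$ and $\boldsymbol{M}_V$ the test Gram matrix) shows the computation:

```python
import numpy as np

def supremizer(B, M_V, u):
    """Discrete supremizer of u: solve (s, v)_V = b(u, v) for all v.

    B   : (n_test, n_trial) array, B[i, j] = b(phi_j, psi_i)
    M_V : (n_test, n_test) Gram matrix of the test inner product
    u   : coefficient vector of a trial function, length n_trial
    """
    s = np.linalg.solve(M_V, B @ u)      # Riesz representation of b(u, .)
    sup_value = np.sqrt(s @ (M_V @ s))   # = sup_v b(u, v) / ||v||_V
    return s, sup_value
```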

The semi-variational framework
We start by presenting some facts from the analysis of semi-variational formulations of the wave equation, where we follow and slightly extend ([3], Chap. 8). The term semi-variational originates from the use of classical differentiation w.r.t. time and a variational formulation in the space variable. As above, we suppose that two real Hilbert spaces $V$ and $H$ are given, such that $V$ is compactly embedded in $H$. Let $a: V \times V \to \mathbb{R}$ be a continuous, coercive and symmetric bilinear form. Next, let $A$ be the operator on $H$ associated with $a(\cdot, \cdot)$ in the following sense: we define the domain of $A$ by
$$D(A) := \{ u \in V : \ a(u, \cdot) \text{ is continuous on } V \text{ w.r.t. the norm of } H \}$$
and recall that for any $u \in D(A)$ there is a unique $f \in H$ such that $a(u, v) = (f, v)_H$ for all $v \in V$. Then, we define $A: D(A) \to H$ by $u \mapsto f =: Au$. By the spectral theorem there exists an orthonormal basis $\{e_i : i \in \mathbb{N}\}$ of $H$ (the eigenvectors of $A$) and numbers $\lambda_i \in \mathbb{R}$ with $0 < \lambda_1 \le \lambda_2 \le \cdots$, $\lim_{i \to \infty} \lambda_i = \infty$, such that $Au = \sum_{i=1}^\infty \lambda_i (u, e_i)_H \, e_i$ for $u \in D(A)$. Note that $D(A)$ is dense in $H$, $e_i \in D(A)$ and $A e_i = \lambda_i e_i$ for all $i \in \mathbb{N}$. For $s \in \mathbb{R}$, we define
$$H^s := \Big\{ u = \sum_{i=1}^{\infty} u_i e_i : \ \|u\|_{H^s}^2 := \sum_{i=1}^{\infty} \lambda_i^s u_i^2 < \infty \Big\}$$
and note that $H^0 = H$, $H^1 = V$ and $H^2 = D(A)$. Moreover, $(H^s)' \cong H^{-s}$, see Proposition A.1. We consider the non-homogeneous wave equation
$$\ddot{u}(t) + A u(t) = f(t), \quad t \in I := (0, T), \qquad u(0) = u_0, \quad \dot{u}(0) = u_1.$$
Then the following result on existence and uniqueness holds. Its proof is given in Appendix A.
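To make the scale $H^s$ concrete, consider $A = -\mathrm{d}^2/\mathrm{d}x^2$ on $\Omega = (0,1)$ with homogeneous Dirichlet conditions, where the eigenpairs are known in closed form. The following sketch (illustrative only, not part of the paper's code) evaluates the $H^s$-norm of a function given by its eigenfunction coefficients:

```python
import numpy as np

def Hs_norm(coeffs, s):
    """||u||_{H^s} for u = sum_k coeffs[k-1] * sqrt(2) * sin(k*pi*x),
    using the Dirichlet-Laplacian eigenvalues lambda_k = (k*pi)^2."""
    k = np.arange(1, len(coeffs) + 1)
    lam = (k * np.pi) ** 2
    return np.sqrt(np.sum(lam ** s * coeffs ** 2))

u = 1.0 / np.arange(1, 101) ** 2                    # some decaying coefficients
print(Hs_norm(u, 0), Hs_norm(u, 1), Hs_norm(u, -1)) # H, V, V' norms
```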
We note a simple consequence for the backward wave equation.

Towards space-time variational formulations
The semi-variational formulation described above cannot be written as a variational formulation in the form of (2.2), since $C([0,T]; H)$ is not a Hilbert space, even if $H$ is a Hilbert space of functions $v: \Omega \to \mathbb{R}$ in space, e.g., $L^2(\Omega)$ or $H^1_0(\Omega)$. We need Lebesgue-type spaces for the temporal and spatial variables, yielding the notion of Bochner spaces, denoted by
$$\mathcal{H} := L^2(I; H) := \Big\{ v : I \to H \text{ measurable} : \ \int_I \|v(t)\|_H^2 \,\mathrm{d}t < \infty \Big\}.$$
We will derive a space-time variational formulation in Bochner spaces, i.e., we multiply the partial differential equation in (2.1) with test functions in space and time and also integrate w.r.t. both variables. Now, the question remains how to apply integration by parts. One could think of performing integration by parts once w.r.t. all variables. This would yield a variational form in a Bochner space of $H^1$-type. However, we were not able to prove well-posedness in that setting. Hence, we suggest an ultraweak variational form, where all derivatives are put onto the test space by means of integration by parts. We thus define the trial space as
$$\mathcal{U} := \mathcal{H} = L^2(I; H) \tag{2.13}$$
and search for an appropriate test space $\mathcal{V}$ to guarantee the well-posedness of (2.2). Assuming that $v(T) = \dot{v}(T) = 0$ for any $v \in \mathcal{V}$ and performing integration by parts twice for both the time and space variables, we obtain
$$b(u, v) := (u, \ddot{v} + Av)_{\mathcal{H}}, \qquad f(v) := (f, v)_{\mathcal{H}} + \langle u_1, v(0)\rangle - (u_0, \dot{v}(0))_H \tag{2.14}$$
for $u \in \mathcal{U}$, $v \in \mathcal{V}$, where the space $\mathcal{V}$ still needs to be defined in such a way that all the assumptions of Theorem 2.1 are satisfied. It turns out that this is not a straightforward task. The duality pairing $\langle \cdot, \cdot \rangle$ is defined in (A.1) in the appendix.
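For the reader's convenience, the computation behind (2.14) can be sketched as follows, using the terminal conditions $v(T) = \dot{v}(T) = 0$, the homogeneous boundary conditions in space and the symmetry of $A$:
$$\int_I (\ddot{u} + Au, v)_H \,\mathrm{d}t = \int_I (u, \ddot{v} + Av)_H \,\mathrm{d}t - \langle \dot{u}(0), v(0) \rangle + (u(0), \dot{v}(0))_H.$$
Inserting $\ddot{u} + Au = f$, $u(0) = u_0$ and $\dot{u}(0) = u_1$, and moving the data terms to the right-hand side, yields exactly the forms $b(\cdot,\cdot)$ and $f(\cdot)$ stated in (2.14).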

The Lions-Magenes theory
Variational space-time problems for the wave equation within the setting (2.14) have already been investigated in the book [23] by Lions and Magenes. We are going to review some facts from Chapter III, Section 9, pp. 283-299 of [23]. The point of departure is the following adjoint-type problem.
Notice that the first statement is proven by deriving energy-type estimates for the uniqueness and a Faedo–Galerkin approximation for the existence. Let us comment on the previous theorem. First, we note that $u_0 \in V$, $u_1 \in H$ are "too smooth" initial conditions; we aim at (only) $u_0 \in H$, $u_1 \in V'$, see (2.14). As a consequence: (1) Statement (a) in Theorem 2.4 results in a "too smooth" solution. In fact, we are interested in an ultraweak solution $u \in L^2(I; H)$; (a) is "too much". (2) Even though the stated solution in (b) has the "right" regularity, it is not clear how to associate the functional $f$ in (2.14) with the dual space $\mathcal{W}'$, i.e., how to interpret the three terms of $f$ in (2.14) in the space $\mathcal{W}'$.
These issues are partly fixed by the following statement.
Even though the latter result uses the "right" smoothness of the data and also includes existence and uniqueness, we are not fully satisfied with regard to our goal of a well-posed variational formulation of the wave equation in Hilbert spaces. In fact, the "trial space" $L^\infty(I; H) \cap W^{1,\infty}(I; V')$ is not a Hilbert space, and it is at least not straightforward to see how a Petrov–Galerkin approximation could be based on such a trial space. Hence, we follow a different path.

An optimally inf-sup stable ultraweak variational form
We are going to derive a well-posed ultraweak variational formulation (2.2) of (2.1), where $\mathcal{U} = L^2(I; H)$ and $b(\cdot, \cdot)$, $f(\cdot)$ are defined by (2.14). To this end, we will follow the framework presented in [11]. This approach is also called the method of transposition and goes back to [23]; see also, e.g., [2, 8, 24] for the corresponding finite element error analysis. For the presentation we will need the semi-variational formulation described above.
Let us restrict ourselves to $A = -\Delta$ acting on a convex domain $\Omega \subset \mathbb{R}^d$ and supplemented by homogeneous Dirichlet boundary conditions. This means that $H = L^2(\Omega)$, $V = H^1_0(\Omega)$ and $D(A) = H^2(\Omega) \cap H^1_0(\Omega)$. However, we stress the fact that most of what is said here can also be extended to other elliptic operators. Then, the starting point is the operator equation in the classical form, i.e., $B_\circ u := \ddot{u} + A_\circ u = f$, where $A_\circ = -\Delta$ is also to be understood in the classical sense. Next, denote the classical domain of $B_\circ$ by $D(B_\circ)$, where initial and boundary conditions are also imposed in $D(B_\circ)$. The range $\mathcal{R}(B_\circ)$ in the classical sense then reads $\mathcal{R}(B_\circ) = C(\bar{\Omega}_T)$. As a next step, we determine the formal adjoint $B^*_\circ$ of $B_\circ$: since the wave operator is formally self-adjoint, $B^*_\circ$ coincides with $B_\circ$ while acting on the space of functions with homogeneous terminal conditions $v(T) = \dot{v}(T) = 0$ instead of initial conditions. Following [11], we need to verify the following conditions: (∗1) $B^*_\circ$ is injective, and (∗2) the range of $B^*_\circ$ is dense in $\mathcal{H}$. In order to prove (∗1), let us denote the continuous extension of $B^*_\circ$ from $D(B^*_\circ)$ also by $B^*_\circ$. Corollary 2.3 implies that this continuous extension $B^*_\circ$ is an isomorphism onto $L^2(I; H)$ (here we need the semi-variational theory). This implies that $B^*_\circ$ is injective on $D(B^*_\circ)$, i.e., (∗1). Now, the properties (∗1) and (∗2) ensure that $\|v\|_{\mathcal{V}} := \|B^*_\circ v\|_{\mathcal{H}}$ is a norm on $D(B^*_\circ)$. Then, we set
$$\mathcal{V} := \mathrm{clos}_{\|\cdot\|_{\mathcal{V}}}\big( D(B^*_\circ) \big), \tag{2.19}$$
which is a Hilbert space, where $B^*$ is to be understood as the continuous extension of $B^*_\circ$ to $\mathcal{V}$. Now, we are ready to prove our first main result.

Theorem 2.6. Let $f \in L^2(I; V')$, $u_0 \in H$ and $u_1 \in V'$. Moreover, let $\mathcal{V}$, $b(\cdot, \cdot)$ and $f(\cdot)$ be defined as in (2.19) and (2.14), respectively. Then, the variational problem (2.2) admits a unique solution $u^* \in \mathcal{U}$. In particular, the inf-sup constant equals $\beta = 1$.

Proof. We are going to show the conditions (C.1)–(C.3) of Theorem 2.1 above.
Remark 2.7. The essence of the above proof is the fact that $\mathcal{U}$ and $\mathcal{V}$ are related by $\mathcal{U} = B^*(\mathcal{V})$, noting that $B^*$ coincides with $B$, while $B$ acts on functions with homogeneous initial conditions whereas $B^*$ acts on functions with homogeneous terminal conditions.

Petrov-Galerkin discretization
We determine a numerical approximation to the solution of a variational problem of the general form (2.2). To this end, one chooses finite-dimensional trial and test spaces, $\mathcal{U}_\delta \subset \mathcal{U}$ and $\mathcal{V}_\delta \subset \mathcal{V}$, respectively, where $\delta$ is a discretization parameter to be explained later. For convenience, we assume that their dimensions are equal, i.e., $N_\delta := \dim \mathcal{U}_\delta = \dim \mathcal{V}_\delta$. The Petrov–Galerkin method then reads: find $u_\delta \in \mathcal{U}_\delta$ such that
$$b(u_\delta, v_\delta) = f(v_\delta) \quad \text{for all } v_\delta \in \mathcal{V}_\delta. \tag{3.1}$$
As opposed to the coercive case, the well-posedness of (3.1) is not inherited from that of (2.20). In fact, in order to ensure uniform stability (i.e., stability independent of the discretization parameter $\delta$), the spaces $\mathcal{U}_\delta$ and $\mathcal{V}_\delta$ need to be appropriately chosen in the sense that the discrete inf-sup (or LBB, Ladyzhenskaya–Babuška–Brezzi) condition holds, i.e., there exists a $\beta_\circ > 0$ such that
$$\inf_{u_\delta \in \mathcal{U}_\delta} \sup_{v_\delta \in \mathcal{V}_\delta} \frac{b(u_\delta, v_\delta)}{\|u_\delta\|_{\mathcal{U}}\, \|v_\delta\|_{\mathcal{V}}} \ge \beta_\circ, \tag{3.2}$$
where the crucial point is that $\beta_\circ$ is independent of $\delta$. The size of $\beta_\circ$ is also relevant for the error analysis, since the Xu–Zikatanov lemma [35] yields a best-approximation result relating the "exact" solution $u^*$ of (2.20) and the "discrete" solution $u^*_\delta$ of (3.1). This is also the key to an optimal error/residual relation, which is important for a posteriori error analysis (also within the reduced basis method).
The key idea, as already stated earlier, is to first choose sufficiently smooth test functions, namely $H^2$ in space and time. This can be done, e.g., by choosing at least quadratic splines. Then, the trial functions are defined as the images of the test functions under the adjoint wave operator.

A stable Petrov-Galerkin space-time discretization
In order to use a straightforward finite element discretization, we use tensor product subspaces $\mathcal{U}_\delta \subset \mathcal{U}$ and $\mathcal{V}_\delta \subset \mathcal{V}$ with a possibly large inf-sup lower bound $\beta_\circ$ in (3.2). Constructing such a stable pair of trial and test spaces is again a nontrivial task, not only for the wave equation. It is a common approach to choose some trial approximation space $\mathcal{U}_\delta$ (e.g., by splines) and then (try to) construct an appropriate test space $\mathcal{V}_\delta$ in such a way that (3.2) is satisfied. This can be done, e.g., by computing the supremizers for all basis functions in $\mathcal{U}_\delta$ and then defining $\mathcal{V}_\delta$ as the linear span of these supremizers. However, this would amount to solving the original problem $N_\delta$ times, which is far too costly. We mention that this approach indeed works within the discontinuous Galerkin (dG) method, see, e.g., [10, 12, 16]. We will follow a different path, also used in [9] for transport problems. We first construct a test space $\mathcal{V}_\delta$ by a standard approach and then define a stable trial space $\mathcal{U}_\delta$ in a second step. This implies that the trial functions are no longer "simple" splines; rather, they arise from the application of the adjoint operator $B^*$ (which is here the same as the primal one except for initial/terminal conditions) to the test basis functions.

Finite elements in time.
We start with the temporal discretization. We choose some integer $N_t > 1$ and set $\Delta t := T / N_t$. This results in a temporal "triangulation" $\mathcal{T}_{\Delta t}$ with nodes $t^k := k\,\Delta t$, $k = 0, \ldots, N_t$. The first two quadratic spline basis functions $v_1$, $v_2$ can be formed by using $t^0 = 0$ as a double and a triple node, respectively. Thus, we get a discretization in $H^2_{\{T\}}(I) := \{ v \in H^2(I) : v(T) = \dot{v}(T) = 0 \}$ of dimension $N_t$. We show an example for $T = 1$ and $\Delta t = \frac{1}{4}$ (i.e., $N_t = 4$) in Figure 1, the test functions in the center, optimal trial functions on the right.
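As an illustration, such a temporal test basis can be set up with standard B-spline tooling. The following sketch (our own reading of the construction: quadratic splines on an open knot vector, with the two rightmost splines dropped so as to enforce the terminal conditions $v(T) = \dot{v}(T) = 0$) produces exactly $N_t$ basis functions:

```python
import numpy as np
from scipy.interpolate import BSpline

def quadratic_test_basis(T, Nt):
    """N_t quadratic B-splines on [0, T]: repeated knots at t = 0
    (double/triple node), the two rightmost splines dropped so that
    every basis function satisfies v(T) = v'(T) = 0."""
    p, dt = 2, T / Nt
    knots = np.concatenate(([0.0, 0.0], np.arange(Nt + 1) * dt, [T, T]))
    n_all = len(knots) - p - 1               # = N_t + 2 splines in total
    basis = []
    for i in range(n_all - 2):               # keep N_t of them
        c = np.zeros(n_all)
        c[i] = 1.0
        basis.append(BSpline(knots, c, p))
    return basis

for v in quadratic_test_basis(1.0, 4):       # the example of Figure 1
    print(v(0.3))                            # evaluate each test function
```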
Finite elements in space.
We proceed analogously for the spatial discretization, using spline spaces with interior nodes (and suitably repeated nodes at the boundary), such that the homogeneous boundary conditions are satisfied. We obtain a discretization of dimension $N_h$. We show an example for $\Omega = (0, 1)$ and $h = \frac{1}{4}$ (i.e., $N_h = 4$) in Figure 1, the test functions on the left. The arising trial functions are depicted on the right and turn out to be identical to the time discretization trial functions in Example 3.1.

Since $B^*$ is an isomorphism of $\mathcal{V}$ onto $L^2(I; H)$, the arising trial functions are in fact linearly independent. An example of a single trial function is shown in Figure 2. The following result shows that this construction yields optimal discrete stability: for $\mathcal{U}_\delta = B^*(\mathcal{V}_\delta)$, the discrete inf-sup condition (3.2) holds with $\beta_\circ = 1$.

Proof. Let $0 \neq u_\delta \in \mathcal{U}_\delta \subset L^2(I; H)$. Then, since $\mathcal{U}_\delta = B^*(\mathcal{V}_\delta)$, there exists a unique $v_\delta \in \mathcal{V}_\delta$ such that $B^* v_\delta = u_\delta$. Hence,
$$\sup_{v \in \mathcal{V}_\delta} \frac{b(u_\delta, v)}{\|v\|_{\mathcal{V}}} \ge \frac{b(u_\delta, v_\delta)}{\|v_\delta\|_{\mathcal{V}}} = \frac{(u_\delta, B^* v_\delta)_{\mathcal{H}}}{\|B^* v_\delta\|_{\mathcal{H}}} = \|u_\delta\|_{\mathcal{H}}.$$
On the other hand, by the Cauchy–Schwarz inequality, we have $b(u_\delta, v) = (u_\delta, B^* v)_{\mathcal{H}} \le \|u_\delta\|_{\mathcal{H}}\, \|v\|_{\mathcal{V}}$ for all $v \in \mathcal{V}_\delta$, which proves the claim.

Optimal ultraweak discretization of ordinary differential equations
For the understanding of our subsequent numerical investigations, it is worth considering the univariate case, i.e., ordinary differential equations (ODEs) of the form (3.7) with either boundary conditions (3.8a) or second-order initial conditions (3.8b). We did experiments for a whole variety of problems admitting solutions of different smoothness, both for the boundary value problem ((3.7), (3.8a)) and the initial value problem ((3.7), (3.8b)). The differences were negligible, so that we only report the results for the initial value problem (3.8b). We investigate the $L^2$-error and the condition number of the stiffness matrix over the degrees of freedom (d.o.f.) for different types of discretizations, namely
- $1^*/3$: quadratic spline test functions and inf-sup-optimal trial functions as the image of the test functions under the adjoint operator;
- $p_{\text{ansatz}}/p_{\text{test}}$: splines of order $p_{\text{ansatz}}$ for the trial and of order $p_{\text{test}}$ for the test functions (here $1/3$ and $2/4$).
In the latter case we obtain "standard" spline spaces for the trial space, not related to the test space through the image of the adjoint operator, and thus not necessarily inf-sup optimal.
The results are shown in Figure 3. The errors for $1^*/3$ and $1/3$ are the same, so that the blue line is not visible (in fact, the spanned spaces coincide, just with different bases). We obtain linear convergence for these cases and quadratic order for $2/4$. Concerning the condition numbers, we see that the inf-sup-optimal case in fact gives rise to significantly larger condition numbers than the "standard" ones.
It is worth mentioning that we obtained $\beta_\delta \equiv 1$ in all cases. This means in particular that the ansatz spaces generated by the inf-sup-optimal setting $1^*/3$ are identical to those of the $1/3$ case. After observing this numerically, we have also proven this observation. However, we stress the fact that this is a purely univariate fact, i.e., it holds for the ODE; it is no longer true in the PDE case, as we shall also see below.

The linear system
To derive the stiffness matrix, we first consider arbitrary discrete spaces $\mathcal{U}_\delta$ and $\mathcal{V}_\delta$ spanned by given bases. We stress that $\boldsymbol{B}_\delta$ is symmetric and positive definite for $A = -\Delta$. Finally, let us now detail the right-hand side. Recall from (2.14) that $f(v) = (f, v)_{\mathcal{H}} + \langle u_1, v(0)\rangle - (u_0, \dot{v}(0))_H$. Using appropriate quadrature formulae results in a numerical approximation, which we will again denote by $\boldsymbol{f}_\delta$. Then, solving the linear system $\boldsymbol{B}_\delta \boldsymbol{u}_\delta = \boldsymbol{f}_\delta$ yields the vector $\boldsymbol{u}_\delta = (u_i)_{i=1,\ldots,N_\delta}$ of expansion coefficients of the desired approximation $u_\delta \in \mathcal{U}_\delta$ with respect to the trial basis, both in the general case and in the special one, i.e., $\mathcal{U}_\delta = B^*(\mathcal{V}_\delta)$.

Stability vs. conditioning
The (discrete) inf-sup constant $\beta_\delta$ refers to the stability of the discrete system, being included in the error/residual relation
$$\|u^* - u^*_\delta\|_{\mathcal{U}} \le \frac{1}{\beta_\delta}\, \|r_\delta\|_{\mathcal{V}'},$$
where the residual $r_\delta \in \mathcal{V}'$ is defined as usual by $r_\delta(v) := f(v) - b(u^*_\delta, v)$, $v \in \mathcal{V}$. Hence, $\beta_\delta$ is a measure of the stability; its value is the minimal generalized eigenvalue of a generalized eigenvalue problem. This has no effect on the condition number $\kappa(\boldsymbol{B}_\delta)$, which instead governs the accuracy of direct solvers and the convergence of iterative methods in the symmetric case.
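A minimal sketch of this eigenvalue computation, under the common convention that $\beta_\delta^2$ is the smallest generalized eigenvalue of $\boldsymbol{B}^\top \boldsymbol{M}_V^{-1} \boldsymbol{B}$ with respect to the trial Gram matrix $\boldsymbol{M}_U$ (all matrix names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh, solve

def inf_sup(B, M_V, M_U):
    """beta_delta via a generalized EVP: B[i, j] = b(phi_j, psi_i),
    M_V / M_U are the Gram matrices of the test / trial inner products."""
    S = B.T @ solve(M_V, B)            # supremizer-based normal matrix
    lam = eigh(S, M_U, eigvals_only=True)
    return np.sqrt(lam[0])             # smallest generalized eigenvalue
```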

Conditioning of the matrices
We report on the condition numbers of the matrices involved in (4.1) and (4.2). In Figure 4, we see the asymptotic behavior of the different matrices. Most matrices show a "normal" scaling w.r.t. the order of the differential operator. However, there are two components, namely $\boldsymbol{M}_{\Delta t}$ in the general case and $\boldsymbol{N}_{\Delta t}$ in the inf-sup-optimal case, which show a very poor scaling as the mesh size tends to zero (here indicated by $h_{\max}$, but used for both $\Delta t$ and $h$). This effect comes from the initial condition, namely the first column of the matrices. On the other hand, $\kappa(\boldsymbol{B}_\delta)$ scales like the stiffness matrix of a 4th-order problem. Since $\boldsymbol{B}_\delta$ is a sum of tensor products involving some ill-conditioned components, we need structure-aware preconditioning.
We observed the spectral equivalence for the spatial matrices also in our numerical experiments. However, we saw that this is not true for the temporal matrices, in the sense that $\boldsymbol{K}_{\Delta t}$ and $\boldsymbol{K}_{\Delta t}^\top \boldsymbol{M}_{\Delta t}^{-1} \boldsymbol{K}_{\Delta t}$ are not spectrally equivalent.

Solution of the algebraic linear system
To derive preconditioning strategies and the new projection method, we rewrite the linear system $\boldsymbol{B}_\delta \boldsymbol{u}_\delta = \boldsymbol{f}_\delta$ as a linear matrix equation, so as to exploit the structure of the Kronecker problem. Let $\boldsymbol{x} = \mathrm{vec}(\boldsymbol{X})$ denote the operation stacking the columns of $\boldsymbol{X}$ one after the other; then it holds that $(\boldsymbol{A} \otimes \boldsymbol{B})\, \boldsymbol{x} = \mathrm{vec}(\boldsymbol{B} \boldsymbol{X} \boldsymbol{A}^\top)$ for given matrices $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{X}$ of conforming dimensions. Hence, the vector system can be written as the matrix equation (5.1), where $\boldsymbol{f}_\delta = \mathrm{vec}(\boldsymbol{F})$ and the symmetry of some of the matrices has been exploited. In the following we describe two distinct approaches: first, we recall the matrix-oriented conjugate gradient method, preconditioned by two different operator-aware strategies; then we discuss a procedure that directly deals with (5.1).
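The identity is easy to check numerically; the following snippet verifies it with the column-major vec convention:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))
vec = lambda M: M.reshape(-1, order="F")     # stack columns (column-major)

lhs = np.kron(A, B) @ vec(X)
rhs = vec(B @ X @ A.T)
print(np.allclose(lhs, rhs))                 # True
```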

Preconditioned conjugate gradients
Since $\boldsymbol{B}_\delta$ is symmetric and positive definite, the preconditioned conjugate gradient (PCG) method can be applied directly to (5.1), yielding a matrix-oriented implementation of PCG, see Algorithm 1. Here $\mathrm{tr}(\boldsymbol{A})$ denotes the trace of the square matrix $\boldsymbol{A}$. In exact arithmetic, this formulation gives the same iterates as the standard vector form, while exploiting matrix-matrix computations [22]. This can easily be seen by exploiting the identity $\mathrm{vec}(\boldsymbol{X})^\top \mathrm{vec}(\boldsymbol{Y}) = \mathrm{tr}(\boldsymbol{X}^\top \boldsymbol{Y})$, which relates the Euclidean inner product of the vectorized quantities to the matrix trace inner product.
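The following sketch shows the matrix-oriented CG skeleton with trace inner products (unpreconditioned for brevity; Algorithm 1 additionally applies a preconditioner to the residual, and the operator L here is illustrative):

```python
import numpy as np

def matrix_cg(L, F, X0, tol=1e-8, maxit=500):
    """Matrix-oriented CG for L(X) = F with an SPD operator L.

    All inner products are trace inner products <X, Y> = tr(X^T Y),
    evaluated via np.tensordot; no Kronecker matrix is ever formed.
    """
    X = X0.copy()
    R = F - L(X)
    P = R.copy()
    rho = np.tensordot(R, R)                  # tr(R^T R)
    normF = np.linalg.norm(F)
    for _ in range(maxit):
        Q = L(P)
        alpha = rho / np.tensordot(P, Q)
        X += alpha * P
        R -= alpha * Q
        rho_new = np.tensordot(R, R)
        if np.sqrt(rho_new) <= tol * normF:
            break
        P = R + (rho_new / rho) * P
        rho = rho_new
    return X

# e.g. a two-term Kronecker-structured operator (names illustrative):
# L = lambda X: M_h @ X @ K_t.T + K_h @ X @ M_t.T
```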

Sylvester operator preconditioning.
A natural preconditioning strategy consists of taking the leading part of the coefficient matrix, in terms of the order of the differential operators. Hence, setting $\boldsymbol{P} = \boldsymbol{K}_{\Delta t} \otimes \boldsymbol{M}_h + \boldsymbol{M}_{\Delta t} \otimes \boldsymbol{K}_h$, we have (see also [19]) $\boldsymbol{z}_{k+1} = \boldsymbol{P}^{-1} \boldsymbol{r}_{k+1}$, with $\boldsymbol{z}_{k+1} = \mathrm{vec}(\boldsymbol{Z}_{k+1})$ and $\boldsymbol{r}_{k+1} = \mathrm{vec}(\boldsymbol{R}_{k+1})$. Applying $\boldsymbol{P}^{-1}$ corresponds to solving a generalized Sylvester equation. For small size problems in space, this can be carried out by means of the Bartels–Stewart method [7], which entails the computation of two Schur decompositions, performed before the PCG iteration is started. For fine discretizations in space, iterative procedures need to be used. For these purposes, we use a Galerkin approach based on the rational Krylov subspace [14], only performed on the spatial matrices; see [31] for a general discussion. A key issue is that this class of iterative methods requires the right-hand side to be of low rank; we deliberately set the rank to be at most twenty. Hence, the Sylvester solver is applied after a rank truncation of $\boldsymbol{R}_{k+1}$, which thus becomes part of the preconditioning application.
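A minimal sketch of one preconditioner application in the small-scale dense case. Assuming $\boldsymbol{P} = \boldsymbol{K}_{\Delta t} \otimes \boldsymbol{M}_h + \boldsymbol{M}_{\Delta t} \otimes \boldsymbol{K}_h$ and the column-major vec convention, applying $\boldsymbol{P}^{-1}$ means solving $\boldsymbol{M}_h \boldsymbol{X} \boldsymbol{K}_{\Delta t}^\top + \boldsymbol{K}_h \boldsymbol{X} \boldsymbol{M}_{\Delta t}^\top = \boldsymbol{R}$; matrix names are illustrative:

```python
import numpy as np
from scipy.linalg import solve, solve_sylvester

def apply_sylvester_preconditioner(K_t, M_t, K_h, M_h, R):
    """Solve M_h X K_t^T + K_h X M_t^T = R for X (dense, small scale).

    The generalized equation is reduced to the standard Sylvester form
    A X + X B = Q and handed to Bartels-Stewart (solve_sylvester).
    """
    A = solve(M_h, K_h)                   # M_h^{-1} K_h
    B = solve(M_t, K_t).T                 # K_t^T M_t^{-T}
    Q = solve(M_h, solve(M_t, R.T).T)     # M_h^{-1} R M_t^{-T}
    return solve_sylvester(A, B, Q)
```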
To derive a preconditioner that takes full account of the coefficient matrix, we employ the operator $\boldsymbol{K}^\top \boldsymbol{M}^{-1} \boldsymbol{K}$ introduced in Section 4.2. Thanks to the spectral equivalence in Proposition 4.1, PCG applied to the resulting preconditioned operator appears to be optimal, in the sense that the number of iterations needed to reach the required accuracy is independent of the spatial mesh size; see Table 2.
In vector form this preconditioner is applied as $\boldsymbol{z}_{k+1} = \big( \boldsymbol{K}^\top \boldsymbol{M}^{-1} \boldsymbol{K} \big)^{-1} \boldsymbol{r}_{k+1}$. However, this operation can be performed without explicitly using the Kronecker form of the involved matrices, with significant computational and memory savings: $\boldsymbol{K}$ is a sum of Kronecker products and, due to the transposition properties of the Kronecker product, so is $\boldsymbol{K}^\top$. In particular, the equation $\boldsymbol{K}^\top \boldsymbol{z} = \boldsymbol{r}_{k+1}$ can be written as a Sylvester matrix equation. The overall computational cost of this operation depends on the cost of solving the two matrix equations. For small dimensions in space, once again a Schur-decomposition-based method can be used [7]; we recall here that, thanks to the discretization employed, we do not expect to have large dimensions in time, as matrices of size at most $\mathcal{O}(100)$ arise. Also in this case, for fine discretizations in space we use an iterative (Galerkin) method based on the rational Krylov subspace [14], only performed on the spatial matrices, with truncation of the corresponding right-hand sides so as to have rank at most twenty. Allowing a larger rank did not seem to improve the effectiveness of the preconditioner. Several implementation enhancements can be developed to make the action of the preconditioner more efficient, since most operations are repeated at each PCG iteration with the same matrices.

Galerkin projection
An alternative to PCG consists of attacking the original multi-term matrix equation directly. Thanks to the symmetry of $\boldsymbol{K}_h$, we rewrite the matrix equation (5.1) in the form (5.4), with $\boldsymbol{F}$ of low rank, that is, $\boldsymbol{F} = \boldsymbol{F}_1 \boldsymbol{F}_2^\top$. Note that this is an assumption on the data. In particular, we assume that the right-hand side $f(v)$ in (2.14) can be discretized in such a way that the matrix form of $\boldsymbol{f}_\delta$ has low rank. This happens for instance when $f$ is a separable function of $t$ and $x$, or when it can be well approximated by a separable function; other scenarios can also lead to a low-rank $\boldsymbol{F}$.
Consider two appropriately selected vector spaces of dimensions much lower than $N_h$ and $N_t$, respectively, and let $\boldsymbol{V}$, $\boldsymbol{W}$ be the matrices whose orthonormal columns span the two corresponding spaces. We look for a low-rank approximation of $\boldsymbol{U}$ of the form $\widetilde{\boldsymbol{U}} = \boldsymbol{V} \boldsymbol{Y} \boldsymbol{W}^\top$. To determine $\boldsymbol{Y}$, we impose an orthogonality (Galerkin) condition on the residual $\boldsymbol{R}$ with respect to the generated space pair $(\boldsymbol{V}, \boldsymbol{W})$. Using the matrix Euclidean inner product, this corresponds to imposing that $\boldsymbol{V}^\top \boldsymbol{R} \boldsymbol{W} = 0$. Substituting $\widetilde{\boldsymbol{U}}$ into this matrix equation, we obtain a reduced matrix equation of the same type as (5.4) but of much smaller size. The small dimensional matrix $\boldsymbol{Y}$ is thus obtained by solving the Kronecker form of this equation. The described Galerkin reduction strategy has been thoroughly exploited and analyzed for Sylvester equations, and more recently successfully applied to multi-term equations, see, e.g., [28]. The key problem-dependent ingredient is the choice of the spaces, so that they well represent spectral information of the "left-hand" and "right-hand" matrices in (5.4). A well-established choice is (a combination of) rational Krylov subspaces [31]. More precisely, for the spatial approximation we generate the growing space $\mathrm{range}(\boldsymbol{V}_{j+1})$ from $\widehat{\boldsymbol{V}}_{j+1} = \big[ \widehat{\boldsymbol{V}}_j, (\boldsymbol{K}_h + \sigma_{j+1} \boldsymbol{M}_h)^{-1} \boldsymbol{v}_j \big]$, where $\boldsymbol{v}_j$ is the $j$th column of $\boldsymbol{V}_j$, so that $\boldsymbol{V}_{j+1}$ is obtained by orthogonalizing the new columns inserted in $\widehat{\boldsymbol{V}}_{j+1}$. The matrix $\widehat{\boldsymbol{V}}_{j+1}$ grows at most by two vectors at a time. For each $j$, the shift parameter $\sigma_{j+1}$ can be chosen either a priori or dynamically, with the same sign as the spectrum of the pencil $(\boldsymbol{K}_h, \boldsymbol{M}_h)$; here it is cheaply determined using the adaptive strategy in [14]. Since $\boldsymbol{K}_h$ represents an operator of second order, the value $\sqrt{\sigma_{j+1}}$ turned out to be appropriate for the temporal shift; a specific computation of the parameter associated with the temporal matrices can also be included, at low cost. Analogously, the temporal space $\mathrm{range}(\boldsymbol{W}_{j+1})$ is built from $\widehat{\boldsymbol{W}}_{j+1} = \big[ \widehat{\boldsymbol{W}}_j, (\boldsymbol{K}_{\Delta t} + \ell_{j+1} \boldsymbol{M}_{\Delta t})^{-1} \boldsymbol{w}_j \big]$, where $\boldsymbol{w}_j$ is the $j$th column of $\boldsymbol{W}_j$, and $\boldsymbol{W}_{j+1}$ is obtained by orthogonalizing the new columns inserted in $\widehat{\boldsymbol{W}}_{j+1}$. The choice of $\ell_{j+1} > 0$ is made as for $\sigma_{j+1}$.
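A compact sketch of the projection step for a generic multi-term equation $\sum_k \boldsymbol{A}_k \boldsymbol{X} \boldsymbol{B}_k^\top = \boldsymbol{F}_1 \boldsymbol{F}_2^\top$ (the basis matrices $\boldsymbol{V}$, $\boldsymbol{W}$ are assumed given, e.g. from the rational Krylov construction above; the small reduced problem is solved in Kronecker form):

```python
import numpy as np

def galerkin_solve(As, Bs, F1, F2, V, W):
    """Galerkin projection of sum_k A_k X B_k^T = F1 F2^T onto the spaces
    spanned by the orthonormal columns of V (space) and W (time)."""
    n, m = V.shape[1], W.shape[1]
    # reduced multi-term equation, written directly in Kronecker form
    S = sum(np.kron(W.T @ B @ W, V.T @ A @ V) for A, B in zip(As, Bs))
    rhs = ((V.T @ F1) @ (W.T @ F2).T).reshape(-1, order="F")
    Y = np.linalg.solve(S, rhs).reshape(n, m, order="F")
    return V @ Y @ W.T                 # low-rank approximation of X
```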
Remark 5.1. This approach yields the vector approximation $\widetilde{\boldsymbol{u}} = (\boldsymbol{W} \otimes \boldsymbol{V})\, \boldsymbol{y}$, with $\boldsymbol{y} = \mathrm{vec}(\boldsymbol{Y})$; that is, the approximation space $\mathrm{range}(\boldsymbol{W} \otimes \boldsymbol{V})$ is more structured than that generated by PCG applied to (5.1). Experimental evidence shows that this structure-aware space requires a significantly smaller dimension to achieve similar accuracy. This is theoretically clear in the Sylvester equation case [31], while it is an open problem in the multi-term linear equation setting.
Remark 5.2. For fine space discretizations, the most expensive step of the Galerkin projection is the solution of the linear systems with $(\boldsymbol{K}_h + \sigma_j \boldsymbol{M}_h)$ and $(\boldsymbol{K}_{\Delta t} + \ell_j \boldsymbol{M}_{\Delta t})$. Depending on the size and sparsity, these systems can be solved by either a sparse direct method or an iterative procedure; see [31] and references therein.
If $\boldsymbol{X}_k$ denotes the current approximate solution computed at iteration $k$, Algorithm 1 and the Galerkin method are stopped as soon as (i) $\mathcal{E}_k < 10^{-5}$, where the backward error $\mathcal{E}_k$ is defined via the residual matrix $\boldsymbol{R}_k$ of (5.5), and (ii) $\|\boldsymbol{R}_k\| / \|\boldsymbol{F}\| < 10^{-2}$ for the relative residual norm. For the Galerkin approach the computation of $\mathcal{E}_k$ simplifies thanks to the low-rank format of the involved quantities (for instance, $\boldsymbol{R}_k$ does not need to be explicitly formed to compute its norm). Moreover, the linear systems in the rational Krylov subspace basis construction are solved by the vector PCG method with a tolerance of $10^{-8}$; see Remark 5.2.
We compared the space-time method with the classical Crank–Nicolson time-stepping scheme in terms of approximation accuracy and CPU time. The $N_h \times N_h$ linear systems involved in the time-marching scheme are solved by means of the vector PCG method with tolerance $10^{-6}$. The time-stepping scheme is also used to compute the reference solutions. To this end, we chose 1024 time steps and 64 unknowns in every space dimension, resulting in $2.68 \cdot 10^8$ degrees of freedom. For the error calculation, we evaluated the solutions on a grid of 64 points in every dimension and approximated the $L^2$ error through the $1.67 \cdot 10^7$ query points.
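For reference, a minimal sketch of the Crank–Nicolson scheme we compare against, applied to the semi-discrete system $\boldsymbol{M} \ddot{\boldsymbol{u}} + \boldsymbol{K} \boldsymbol{u} = \boldsymbol{f}(t)$ in first-order form (dense factorizations for brevity; the experiments above instead solve these systems with vector PCG):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def crank_nicolson_wave(M, K, f, u0, v0, dt, nsteps):
    """Crank-Nicolson for M u'' + K u = f(t), via u' = v, M v' = f - K u.

    One factorization of (M + dt^2/4 K) is reused in every time step;
    f is a callable returning the load vector at a given time."""
    S_lu = lu_factor(M + 0.25 * dt**2 * K)
    M_lu = lu_factor(M)
    u, v = u0.copy(), v0.copy()
    for n in range(nsteps):
        fn, fn1 = f(n * dt), f((n + 1) * dt)
        rhs = (M - 0.25 * dt**2 * K) @ u + dt * (M @ v) \
              + 0.25 * dt**2 * (fn + fn1)
        u_new = lu_solve(S_lu, rhs)
        v += lu_solve(M_lu, 0.5 * dt * (fn + fn1) - 0.5 * dt * (K @ (u + u_new)))
        u = u_new
    return u, v
```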
The code is run in Matlab and the B-spline implementation is based on [25]. To explore the potential of the new ultraweak method on low-regularity solutions, we only concentrate on experiments with lower regularity; recall that the space-time method is not tied to a CFL-stability condition. On the other hand, the rate of convergence of the time-stepping method is clearly better than that of the low-order space-time method using discontinuous ansatz functions. In order to get a convergence order of the space-time method comparable to the rate of the Crank–Nicolson scheme, we would need to use at least quartic test functions.

Case 2: Discontinuous solution
For the case of a discontinuous solution, our results are shown in Figure 6. The numbers of iterations for the PCG and the Galerkin methods behave similarly to Case 1, so that we do not monitor all of them. Again, we observe that the performance of the ultraweak space-time method is essentially independent of the wave speed. Moreover, due to the fact that the solution is discontinuous, the rate of convergence of the time-stepping scheme is no longer optimal. As a consequence, the ultraweak space-time method outperforms the time-stepping approach w.r.t. accuracy and efficiency. The benefit is even larger for increasing wave speeds.

Conclusions
Our theoretical results and numerical experience show that the proposed ultraweak variational space-time method, when equipped with appropriate linear algebra solvers, is significantly more accurate and efficient than the Crank-Nicolson scheme on problems with low regularity and high wave speed.

Appendix A. Proof of Theorem 2.2
We collect the proofs of the well-posedness for the semi-variational setting in Section 2.2. Even though the proofs are based upon rather classical spectral theory, we could not find them in the desired form in the literature.

Proposition A.1. The mapping $g \mapsto \langle g, \cdot \rangle$, with the duality pairing defined in (A.1), is an isometric isomorphism from $H^{-s}$ to $(H^s)'$, i.e., $(H^s)' \cong H^{-s}$.
Proof. First, note that $H^s$ is a Hilbert space with the inner product $(u, v)_{H^s} := \sum_{i=1}^{\infty} \lambda_i^s\, u_i v_i$.
We are now in the position to prove Theorem 2.2 for the wave equation with inhomogeneous right-hand side.