Optimal convergence rates in $L^2$ for a first order system least squares finite element method. Part I: homogeneous boundary conditions

We analyze a divergence based first order system least squares method applied to a second order elliptic model problem with homogeneous boundary conditions. We prove optimal convergence in the $L^2(\Omega)$ norm for the scalar variable. Numerical results confirm our findings.


Introduction
Least Squares Finite Element Methods (LSFEM) are an important class of numerical methods for the solution of partial differential equations with a variety of applications.The main idea of the LSFEM is to reformulate the partial differential equation of interest as a minimization problem, for which a variety of tools is available.For example, even for non-symmetric or indefinite problems, the discretization with the least squares approach leads to symmetric, positive definite systems, which can be solved with well-established numerical technologies.Furthermore, the least squares technique is naturally quasi-optimal, albeit in a problem-dependent norm.For second order PDEs, which is the setting of the present work, the most common least squares approach is that of rewriting the equation as a First Order Least Squares System (FOSLS) that can be discretized with established finite element techniques.A benefit is that many quantities of interest are approximated directly without the need of postprocessing.We mention [BG09] as a classical monograph on the topic as well as the papers [Jes77, CLMM94,CMM97b,BG05].
The present work considers a Poisson-like second order model problem written as a system of first order equations.For the discretization, an H H H(Ω, div) × H 1 (Ω)-conforming least squares formulation is employed.Even though our model problem in its standard H 1 (Ω) formulation is coercive our methods and lines of proof can most certainly be applied to other problems as well, see [BM19,CQ17] for an application to the Helmholtz equation.The LSFEM is typically quasi-optimality in some problem-dependent energy norm, which is, however, somewhat intractable; a priori error estimates in more familiar norms such as the L 2 (Ω) norm of the scalar variable are thus desirable.Numerical examples in our previous work [BM19] suggested convergence rates in standard norms such as the

Contribution of the present work
Our main contribution are optimal L 2 (Ω) based convergence result for the least squares approximation u h to the scalar variable u.Furthermore, we derive hp error estimates for the gradient of the scalar variable u, which do not seem to be available in the current literature, as well as an hp error estimate for the vector variable ϕ ϕ ϕ in the L 2 (Ω) norm, which is available in the literature for a pure h-version.These optimality results are new in the sense that we achieve optimal convergence rates under minimal regularity assumptions on the data.Here, we call a method optimal in a certain norm, if the norm of the error made by the method is of the same order as the best approximation of the employed space.

Review of related results
In [Jes77] the author considered the classical model problem −∆u = f with inhomogeneous Dirichlet boundary condition u = g in some smooth domain Ω.Unlike the present work the least squares formulation employs vector valued H 1 (Ω) functions instead of H H H(Ω, div) for the vector variable.The corresponding finite element spaces are chosen such that they satisfy simultaneous approximation properties in L 2 (Ω) and H 1 (Ω) for both the scalar variable u and the vector variable ϕ ϕ ϕ.Using a duality argument akin to the one used in the present work the author arrived at the error estimate see [Jes77,Thm. 4.1], where (•, •) b denotes the corresponding energy norm.At this point higher order convergence rates are just a question of approximation properties in (•, •) b , see [Jes77,Lemma 3.1] for a precise statement.As stated after the proof of [Jes77,Thm. 4.1], one can extract optimal convergence rates for sufficiently smooth data f and g.The smoothness of the data is important as the following considerations show: For the case of a smooth boundary Γ and f ∈ L 2 (Ω) and g ∈ H 3/2 (Γ), elliptic regularity gives u ∈ H 2 (Ω).Therefore u can be approximated by globally continuous piecewise polynomials of degree greater or equal to one with a error O(h 2 ) in the L 2 (Ω) norm, which is achieved by classical FEM, due to the Aubin-Nitsche trick.In contrast, the above least squares estimate does not give the desired rate: The norm (ϕ ϕ ϕ − ϕ ϕ ϕ h , u − u h ) b contains a term of the form from which no further convergence rate can be extracted, since f is only in L 2 (Ω).
In [CLMM94] (see also [CMM97b]) the problem −∇ • (A∇u) + Xu = f with uniformly elliptic diffusion matrix A and X a linear differential operator of order at most one together with homogeneous mixed Dirichlet and Neumann boundary conditions was considered.The least squares formulation presented therein employs the same spaces as the present work.Apart from nontrivial norm equivalence results, see [CLMM94,Thm. 3.1], they also derived the following estimate of the least squares approximation u − u h H 1 (Ω) + ϕ ϕ ϕ − ϕ ϕ ϕ h H(div,Ω) h s ( u H s+1 (Ω) + ϕ ϕ ϕ H s+1 (Ω) ), assuming u ∈ H s+1 (Ω) and ϕ ϕ ϕ ∈ H H H s+1 (Ω).This result is then optimal in the stated norm, however, the assumed regularity is somewhat unsatisfactory, in the sense that if the solution u ∈ H s+1 (Ω) then the relation ∇u + ϕ ϕ ϕ = 0 merely provides the regularity ϕ ϕ ϕ ∈ H H H s (Ω) and not the assume regularity ϕ ϕ ϕ ∈ H H H s+1 (Ω).
Finally in [BG05] the same model problem as well as the same least squares formulation is considered.The main goal of [BG05] is to establish L 2 (Ω) error estimates for u and ϕ ϕ ϕ.In [BG05, Lemma 3.4] a result similar to [Jes77,Thm. 4.1] is obtained.This result, however, suffers from the same drawback as elaborated above.Furthermore, they prove optimality of the error of the vector variable ϕ ϕ ϕ in the L 2 (Ω) norm, see [BG05,Cor. 3.7].
The main tools for a priori error estimates in more tractable norms such as L 2 (Ω) instead of the energy norm in a least squares setting are, as it is done in the present paper and the above literature, duality arguments, which lead to an estimate of the form As elaborated above it is not possible to extract the desired optimal rate from this estimate directly.In the proof of one of our main result (Theorem 4.13) we exploit the duality argument in a more delicate way, which allows us to lower the regularity requirements on ϕ ϕ ϕ to what could be expected from the regularity of the data f .Key components in the proof are the div-conforming approximation operators I I I 0 h and I I I h (cf.Lemmas 4.4, 4.7), which are also of independent interest.

Notation
Throughout this work, Ω denotes a bounded simply connected domain in R n , n ∈ {2, 3}, with C ∞ boundary Γ := ∂Ω and outward unit normal vector n n n.Let Γ consist of two disjoint parts Γ D and Γ N .We consider the following spaces: H H H 0 (Ω, div) = {ϕ ϕ ϕ ∈ H H H(Ω, div) : ϕ ϕ ϕ • n n n = 0 on Γ}.For further detail and references see [Mon03,BBF13].Since we will look at a first order system formulation we have two finite element spaces to choose, one for the scalar variable u and one for the vector variable ϕ ϕ ϕ.We consider the following finite element spaces: , where the polynomial approximation of the scalar and vector variable is denoted by p s ≥ 1 and p v ≥ 1 respectively.For brevity we also denote by V V V pv (T h ) either the space RT RT RT pv−1 (T h ) or BDM BDM BDM pv (T h ).The spaces V V V N pv (T h ) and V V V 0 pv (T h ) are denoted analogously.Furthermore, the Nédélec space N N N pv (T h ) is either of type one or two, depending on the choice of V V V pv (T h ).The same convention applies to the spaces with boundary conditions.See again [Mon03] for further details as well as Section 4. Further notational conventions will be: • lower case roman letters like u and v will be reserved for scalar valued functions; • lower case boldface greek letters like ϕ ϕ ϕ and ψ ψ ψ will be reserved for vector valued functions; • K denotes the physical element and K denotes the reference element; • quantities like u h and ϕ ϕ ϕ h will be reserved for functions from the corresponding finite element space, again scalar and vector valued respectively; • if not stated otherwise discrete functions without a • will be in some sense fixed, e.g., resulting from a certain discretization scheme, whereas functions with a • will be arbitrary, e.g., when dealing with quasi-optimality results.

Outline
The outline of this paper is as follows.In Section 2 we introduce the model problem, the first order system least squares (FOSLS) method itself and prove norm equivalence results, which in turn guarantee unique solvability of the continuous as well as the discrete least squares formulation.Section 3 is devoted to the proof of duality results for the scalar variable, the gradient of the scalar variable as well as the vector variable.In the beginning of Section 4 we first exploit the duality result of Section 3 in order to prove L 2 (Ω) error estimates for the scalar variable of the primal as well as the dual problem.We then argue first heuristically that these results are actually suboptimal and can be further improved.To that end we introduce an approximation operator that also satisfies certain orthogonality relations and prove best approximation results for this operator, which are then used to prove our main result (Theorem 4.13).Furthermore, we derive L 2 (Ω) error estimates for the gradient of the scalar variable as well as the vector variable.In Section 5 we present numerical examples showcasing the proved convergence rates, focusing especially on the case of finite Sobolev regularity.

Model problem
Let Γ = ∂Ω consist of two disjoint parts Γ D and Γ N and let f ∈ L 2 (Ω).(Later, we will focus on the special cases Γ = Γ D and Γ = Γ N .)For γ > 0 fixed we consider the following model problem (2.1) We formulate (2.1) a first order system.Introducing the new variable ϕ ϕ ϕ = −∇u we formally arrive at the system we want to solve the equation The least squares approach to this problem is to find where (•, •) Ω denotes the usual L 2 (Ω) scalar product.Introducing now the bilinear form b and the linear functional we can state the mixed weak least squares formulation: To see solvability of (2.5), let u ∈ H 1 D (Ω) be the unique solution of (2.1).In view of f ∈ L 2 (Ω) the pair (−∇u, u) is a solution of (2.5).Uniqueness follows if one can show that b((ϕ ϕ ϕ, u), (ψ ψ ψ, v)) = 0 for all (ψ ψ ψ, v) ∈ H H H N (Ω, div) × H 1 D (Ω) implies (ϕ ϕ ϕ, u) = (0 0 0, 0).To that end we introduce the (yet to be verified) norm • b induced by b: (2.6) A general approach would be to show norm equivalence.In our case: We will employ methods similar to a duality argument in the following Theorem 2.1 to prove such a norm equivalence.
Theorem 2.1 (Norm equivalence).For all (ϕ ϕ ϕ, u) ∈ H H H N (Ω, div) × H 1 D (Ω) there holds the norm equivalence (2.7) from which the second inequality in (2.7) is obvious.For the first inequality, we will now split ϕ ϕ ϕ and u as follows: with yet to be determined functions ϕ ϕ ϕ 1 , ϕ ϕ ϕ 2 , u 1 and u 2 .We observe that ϕ ϕ ϕ = ϕ ϕ ϕ 1 +ϕ ϕ ϕ 2 and u = u 1 +u 2 since the difference solves (2.2) with zero right-hand side, which is only solved by the trivial solution.Simply eliminating ϕ ϕ ϕ 1 and ϕ ϕ ϕ 2 in the above equations, we expect u 1 and u 2 to be solutions to where −∇ • η η η is to be understood as an element of (H 1 D (Ω)) given by F : v → (η η η, ∇v) Ω .Both equations are therefore uniquely solvable.This then determines the desired functions u 1 , u 2 and consequently the functions ϕ ϕ ϕ 1 , ϕ ϕ ϕ 2 , using the second equation in the first order systems.
Let us show that (ϕ ϕ ϕ 1 , u 1 ) solves the above system.By construction it satisfies the differential equations and furthermore, since ϕ ϕ ϕ 1 = −∇u 1 , we have by standard regularity theory ϕ ϕ ϕ 1 Let us show that (ϕ ϕ ϕ 2 , u 2 ) satisfies the above system.Let v ∈ C ∞ 0 (Ω) be arbitrary.Integration by parts and exploiting the weak formulation gives Therefore the div-equation is satisfied.To verify the boundary conditions we calculate for any where we first used Green's theorem, then the equations of the first order system and at last the weak formulation for u 2 .The a priori estimate of the Lax-Milgram theorem gives We now estimate the H H H(Ω, div) norms of ϕ ϕ ϕ 1 and ϕ ϕ ϕ 2 as follows which completes the proof.
Remark 2.2.Theorem 2.1 (norm equivalence) does not hold on all of H H H(Ω, div) × H 1 (Ω) since one can construct non-trivial solutions to the system due to the missing boundary conditions, even though (ϕ ϕ ϕ, u) b = 0 by construction.
Remark 2.4.In the literature there are two main ideas for showing unique solvability when working in a least squares setting concerning a first order system derived from a second order equation: • The first one deduces solvability from the second order equation and uses some weaker coercivity estimates to establish uniqueness, as sketched in Remark 2.3.See also [CQ17,BM19] for these kind of arguments for the Helmholtz equation.
• The second approach is to establish a stronger coercivity estimate as in Theorem 2.1 and directly apply the Lax-Milgram theorem to (2.5), where the right-hand side is a suitable continuous linear functional.See also [CLMM94,CMM97b] concerning the model problem in question and also [CMM97a] for the Stokes equation.

Duality argument
The current section is devoted to duality arguments that are later used for the analysis of the L 2 (Ω) norms of u − u h , ∇(u − u h ), and ϕ ϕ ϕ − ϕ ϕ ϕ h .Since these duality arguments rely heavily on the elliptic shift theorem, we restrict ourself to either the pure Neumann or Dirichlet boundary conditions, i.e., Γ = Γ N or Γ = Γ D respectively.In contrast, when considering mixed boundary conditions one has to expect a singularity at the interface between the Dirichlet and Neumann condition, which has to be properly accounted for in the numerical analysis by graded meshes for both the primal and dual problem.This is beyond the scope of the present work.Our overall agenda is to derive regularity results for the dual solutions, always denoted by (ψ ψ ψ, v).For w ∈ H 1 (Ω) and η ∈ H H H 0 (Ω, div) we prove the existence of dual solutions such that: , see Theorem 3.3.These results are exploited in Section 4 with the special choices of w = u − u h and η η η = ϕ ϕ ϕ − ϕ ϕ ϕ h , respectively.
Theorem 3.1 (Duality argument for the scalar variable).Let Γ be smooth.Then there holds: , and v ∈ H 2 (Ω).Additionally the following estimates hold: The same regularity results and estimates as in (i) hold.Proof.We prove (i).Theorem 2.1 give the existence of a unique (ψ For the regularity assertions, we introduce the auxiliary functions z and µ µ µ by Regularity properties of z and µ µ µ: Regularity properties of z are inferred from a scalar elliptic equation satisfied by z.To that end, we note that (3.1) is equivalent to (3.3) For u = 0 and integrating by parts we find which gives z ∈ H 1 (Ω) as well as µ µ µ = ∇z.Inserting µ µ µ = ∇z and setting ϕ ϕ ϕ = 0 in (3.3) we find Therefore z satisfies, in strong form, and the shift theorem immediately give z ∈ H 2 (Ω) with the estimate z H 2 (Ω) w L 2 (Ω) .Regularity properties of v: Eliminating ψ ψ ψ in (3.2), we discover that v satisfies Regularity properties of ψ ψ ψ: Setting ψ ψ ψ = ∇(z − v), we have found the desired pair (ψ ψ ψ, v) ∈ H H H 0 (Ω, div) × H 1 (Ω).Since ψ ψ ψ = ∇(z − v) we first look at the regularity of z − v. Subtracting the equations (3.4), (3.5) satisfied by z and v respectively we obtain We can therefore conclude and since ∇ • ψ ψ ψ = z − γv, we have which concludes the proof of (i).For the Dirichlet case (ii) the proof is completely analogous by replacing every Neumann boundary condition with a Dirichlet one.
Theorem 3.2 (Duality argument for the gradient of the scalar variable).Let Γ be smooth.Then there holds: , and v ∈ H 1 (Ω).Additionally the following estimates hold: (ii) For Γ = Γ D and any The same regularity results and estimates as in (i) hold.
Proof.We prove (i).Theorem 2.1 give the existence of a unique (ψ For the regularity assertion, we introduce the auxiliary functions z and µ µ µ by Regularity properties of z and µ µ µ: We note that (3.6) is equivalent to For u = 0 and integrating by parts we find which gives µ µ µ = ∇z.Inserting µ µ µ = ∇z and setting ϕ ϕ ϕ = 0 in (3.8) we find which can be solved for z ∈ H 1 (Ω) with the a priori estimate z where −∇ • ∇w ∈ (H 1 (Ω)) is to be understood as the mapping u → (∇u, ∇w) Ω .Regularity of v: Eliminating ψ ψ ψ from (3.7) and using µ µ µ = ∇z, we discover that v satisfies By the Lax-Milgram theorem we find that v ∈ H 1 (Ω) as well as Regularity of ψ ψ ψ: Upon setting ψ ψ ψ = ∇(z − v), we have found the solution (ψ ψ ψ, v) ∈ H H H 0 (Ω, div) × H 1 (Ω) of (3.6).To prove the estimates and regularity results for ψ ψ ψ first note that and therefore by elliptic regularity . Finally since ψ ψ ψ = ∇(z − v) the regularity assertion for ψ ψ ψ ∈ H H H 2 (Ω) follows.For the Dirichlet case (ii) the proof is completely analogous by replacing every Neumann boundary condition with a Dirichlet one.
Theorem 3.3 (Duality argument for the vector valued variable).Let Γ be smooth.Then there holds: Additionally the following estimates hold: (ii) For Γ = Γ D and any The same regularity results and estimates as in (i) hold.Proof.We prove (i).Theorem 2.1 give the existence of a unique (ψ For the regularity assertions, we introduce the auxiliary functions z and µ µ µ by Regularity of z and µ µ µ: (3.10) is equivalent to (3.12) For u = 0 and integrating by parts we find which gives µ µ µ − ∇z = η η η.Inserting µ µ µ = η η η + ∇z and setting ϕ ϕ ϕ = 0 in (3.10) we find Hence, with the understanding that ∇ • η η η means u → (∇u, η η η), the function z solves Thus, z ∈ H 1 (Ω) and setting µ µ µ = η η η + ∇z we find (3.12) to be satisfied.Furthermore, note that where the last inequality following from integration by parts and exploiting the boundary condition η ∈ H H H 0 (Ω, div).
Regularity of v: By eliminating ψ ψ ψ we find that v solves Again by elliptic regularity we find that v ∈ H 3 (Ω) as well as Regularity of ψ ψ ψ: We have ψ ψ ψ = η η η + ∇(z − v), and the regularity of ψ ψ ψ follows from that of z of v.For the Dirichlet case (ii) the proof is completely analogous by replacing every Neumann boundary condition with a Dirichlet one.

Error analysis
The goal of the present section is to establish optimal convergence rates for an hp version of the FOSLS method for the scalar variable, the gradient of the scalar variable as well as the vector variable, all measured in the L 2 (Ω) norm, as long as the polynomial degree of the other variable is chosen appropriately.

Notation, assumptions, and road map of the current section
Throughout we denote by (ϕ ϕ ϕ h , u h ) the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h and e e e ϕ ϕ ϕ = ϕ ϕ ϕ − ϕ ϕ ϕ h denote the corresponding error terms.For simplicity we also assume Γ = Γ N , i.e., Γ D = ∅.Furthermore, p will denote the minimum of the two polynomial degrees p s and p v , i.e., p = min(p s , p v ).The overall agenda of the present section is as follows: 1. We start off by proving [BG05, Lemma 3.4] in an hp setting using our duality argument, i.e., the (in our sense) suboptimal L 2 (Ω) estimate h/p (e e e ϕ ϕ ϕ , e u ) b .This is done in Lemma 4.2.In Remark 4.3 we present heuristic arguments that suggest the possibility of optimal L 2 (Ω) convergence rates.These arguments suggest to construct an H H H 0 (Ω, div) conforming approximation operator I I I 0 h with additional orthogonality properties.2. In Lemma 4.4 we prove that the operator I I I 0 h is in fact well defined.As a tool of independent interest we derive certain continuous and discrete Helmholtz decompositions in Lemmata 4.5 and 4.6.These decompositions are then used in Lemma 4.7 to analyze the L 2 (Ω) error of the operator I I I 0 h .3. Next we prove an hp version of [BG05, Lemma 3.6] (an h analysis of e e e ϕ ϕ ϕ in the L 2 (Ω) norm).4. In Theorem 4.11 we exploit the results of Lemma 4.10, which analyzes the convergence rate of the FOSLS approximation of the dual solution for the gradient of the scalar variable, in order to prove new optimal L 2 (Ω) error estimates for ∇e u .

5.
We analyze the convergence rate of the FOSLS approximation of the dual solution in various norms in Lemma 4.12.Finally we prove our main result, Theorem 4.13, which analyzes the convergence of e u in the L 2 (Ω) norm.
6. Closing this section we derive Corollary 4.15, which summarizes the results for general righthand side f ∈ H s (Ω), by exploiting the estimates given by the Theorems 4.9, 4.11 and 4.13 together with the approximation properties of the employed finite element spaces.
Since we are dealing with smooth boundaries we employ curved elements.We make the following assumptions on the triangulation.Assumption 4.1 (quasi-uniform regular meshes).Let K be the reference simplex.Each element map F K : K → K can be written as F K = R K • A K , where A K is an affine map and the maps R K and A K satisfy, for constants C affine , C metric , ρ > 0 independent of K: Here, K = A K ( K) and h K > 0 denotes the element diameter.
On the reference element K we introduce the Raviart-Thomas and Brezzi-Douglas-Marini elements: Note that trivially RT RT RT p−1 ( K) ⊂ BDM BDM BDM p ( K) ⊂ RT RT RT p ( K).We also recall the classical Piola transformation, which is the appropriate change of variables for H H H(Ω, div).For a function ϕ ϕ ϕ : K → R d and the element map The spaces S p (T h ), BDM BDM BDM p (T h ), and RT RT RT p−1 (T h ) are given by standard transformation and (contravariant) Piola transformation of functions on the reference element: For the approximation properties of the H H H(Ω, div) conforming finite element spaces see [BBF13, Proposition 2.5.4] as a standard reference for non-curved elements and without the p-aspect.For an analysis of the hp-version under Assumption 4.1 we refer to [BM19, Section 4].

The standard duality argument
Before formulating various duality arguments, we recall that the conforming least squares approximation (ϕ ϕ ϕ h , u h ) is the best approximation in the • b norm: Lemma 4.2.Let Γ be smooth and (ϕ ϕ ϕ h , u h ) be the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h and e e e ϕ ϕ ϕ = ϕ ϕ ϕ − ϕ ϕ ϕ h .Then, for any ũh ∈ S ps (T h ), φ Proof.Apply Theorem 3.1 (duality argument for the scalar variable) with w = e u .For any ṽh ∈ S ps (T h ), ψ ψ ψ h ∈ V V V 0 pv (T h ), we find due to the Galerkin orthogonality and the Cauchy-Schwarz inequality: e u 2 L 2 (Ω) = b((e e e ϕ ϕ ϕ , e u ), (ψ ψ ψ, v)) = b((e e e ϕ ϕ ϕ , e u ), (ψ ≤ (e e e ϕ ϕ ϕ , e u ) b (ψ Using Theorem 2.1 (norm equivalence), and exploiting the regularity results and estimates of Theorem 3.1 as well as the H 1 (Ω) and H H H(Ω, div) conforming operators in [MR20], we can find ṽh ∈ S ps (T h ), ψ where we exploited the regularity for (ψ ψ ψ, v) and the a priori estimates of Theorem 3.1, which proves the first estimate.The second one follows by the fact that the least squares solution is the projection with respect to the scalar product b.Therefore (e e e ϕ ϕ ϕ , e u ) b ≤ (ϕ ϕ ϕ − φ ϕ ϕ h , u − ũh ) b .The result follows by applying the norm equivalence given in Theorem 2.1.
Remark 4.3 (Heuristic arguments for improved L 2 (Ω) convergence).We present an argument why improved convergence of the scalar variable u can be expected.We again start by applying our duality argument and exploit the Galerkin orthogonality as in (4.2) in the proof of Lemma 4.2.Instead of immediately applying the Cauchy-Schwarz inequality we investigate the terms in the b scalar product and analyze the best rate we can expect from the regularity of the dual problem: ) Ω .
Note that the terms are not equilibrated and we cannot expect any rate from the terms marked by .However choosing ( ψ ψ ψ h , ṽh ) to be the least squares approximation (ψ ψ ψ h , v h ) of (ψ ψ ψ, v) and again exploiting the Galerkin orthogonality we have for any (φ ϕ ϕ h , ũh ): ) The improved convergence of the dual solution will be shown in Lemma 4.12.From a best approximation viewpoint the ∇• term involving ϕ ϕ ϕ still has no rate.To be more precise, the second term has the right powers of h resulting in an overall h 2 .Since the term γ(u − ũh ) already has order h 2 we have no problem with that one.The term with the worst rate is Out of the box we cannot find an extra h to get optimal convergence, even though ψ ψ ψ has far more regularity, which we did not exploit yet.We now want to construct an operator I I I 0 h mapping into the conforming finite element space of the vector variable.To exploit the regularity of ψ ψ ψ we insert any ψ ψ ψ h ∈ V V V 0 pv (T h ).We have Note that ψ ψ ψ h − ψ ψ ψ h is a discrete object.If we assume I I I 0 h to satisfy the orthogonality condition Therefore the operator I I I 0 h should satisfy the aforementioned orthogonality condition and have good approximation properties in L 2 (Ω), as needed above.In the following we will construct operators I I I 0 h and I I I h acting on H H H 0 (Ω, div) and H H H(Ω, div) respectively.

The operators I I I 0 h and I I I h
In the spirit of Remark 4.3 a natural choice for the operator I I I 0 h is the following constrained minimization problem The corresponding Lagrange function is and the associated saddle point problem is to find (ϕ Uniqueness is not given since only the divergence of the Lagrange parameter appears.However, by focussing on the divergence of the Lagrange parameter, we can formulate it in the following way: The construction of I I I h is completely analogous, one just drops the zero boundary conditions everywhere. To see that the operator I I I 0 h is well-defined, we have to check the Babuška-Brezzi conditions, see [BBF13].Let us first verify solvability on the continuous level.Coercivity on the kernel: Let µ µ µ ∈ {ψ ψ ψ ∈ H H H 0 (Ω, div) : (∇ • ψ ψ ψ, η) Ω = 0, ∀η ∈ ∇ • H H H 0 (Ω, div)} be given.The coercivity is trivial since by construction (∇ • µ µ µ, ∇ • µ µ µ) Ω = 0 and therefore inf-sup condition: Let η ∈ ∇ • H H H 0 (Ω, div) be given.First let u ∈ H 1 (Ω) with zero average solve By elliptic regularity we have u H 2 (Ω) η L 2 (Ω) and upon defining µ µ µ = −∇u we also have µ µ µ H H H(Ω,div) η L 2 (Ω) .Note that by construction µ µ µ ∈ H H H 0 (Ω, div) as well as which proves the inf-sup condition.
Coercivity on the kernel -discrete: The coercivity is again trivial by the same argument as above.
inf-sup condition -discrete: Let λ h ∈ ∇ • V V V 0 pv (T h ) be given.As above in the continuous case we solve the Poisson problem Let Λ Λ Λ = −∇u and again we have . We now employ the commuting projection based interpolation operators defined in [MR20], especially the global operator Π Π Π div p given in [MR20, Remark 2.10], see also [Roj19,Section 4 We use this operator to project Λ Λ Λ onto the conforming subspace.With Λ Λ Λ h := Π Π Π div, pv Λ Λ Λ we find which finally leads to For any which proves the discrete inf-sup condition.The above arguments can be modified in a straightforward manner when replacing V V V 0 pv (T h ) with V V V pv (T h ) and H H H 0 (Ω, div) with H H H(Ω, div).The only caveate is the fact that one has to replace the homogeneous Neumann boundary condition in the auxiliary problem, used in the verification of the inf-sup condition, by a homogeneous Dirichlet boundary condition.We have therefore proven Lemma 4.4.For any mesh T h satisfying Assumption 4.1, the operators I I I 0 h : H H H 0 (Ω, div) → V V V 0 pv (T h ) and I I I h : H H H(Ω, div) → V V V pv (T h ) are well defined with bounds independent of the mesh size h and the polynomial degree p.They are projections.
We are now going to analyze the approximation properties of the operator I I I 0 h and I I I h in the L 2 (Ω) norm.To that end we need certain decompositions on a continuous as well as a discrete level.

Lemma 4.5 (Continuous and discrete Helmholtz-like decomposition -no boundary conditions). The operators Π Π Π
are well defined.Furthermore, the remainder r r r of the continuous decomposition Proof.For unique solvability of the variational definition of the operators, just note that they are the L 2 (Ω) orthogonal projection on ∇ × H H H(Ω, curl) and ∇ × N N N pv (T h ) respectively.By construction we have (r r r, ∇ × µ µ µ) which by definition gives ∇ × r r r = 0. Furthermore, by the characterization of H H H 0 (Ω, curl) given in [Mon03, Thm.3.33] we have n n n × r r r = 0. Since Π Π Π curl ϕ ϕ ϕ ∈ ∇ × H H H(Ω, curl) we immediately have ∇ • r r r = ∇ • ϕ ϕ ϕ.Exploiting the exact sequence property of the following de Rahm complex in the case that both Ω and Γ are simply connected, we can find R ∈ H 1 0 (Ω) such that r r r = ∇R.Therefore R solves the asserted equation.The Friedrichs inequality and elliptic regularity theory then give the desired results.

By nearly the same arguments we also have a version for zero boundary conditions:
Lemma 4.6 (Continuous and discrete Helmholtz-like decomposition -zero boundary conditions).
are well defined.Furthermore, the remainder r r r of the continuous decomposition as well as r r r ∈ H H H 1 (Ω).Additionally there exists an R ∈ H 2 (Ω) ∩ H 1 (Ω)/R such that r r r = ∇R, where R satisfies Proof.Unique solvability as well as ∇ × r r r = 0 and ∇ • r r r = ∇ • ϕ ϕ ϕ follows by the same arguments as in the proof of Lemma 4.5.Since ϕ ϕ ϕ ∈ H H H 0 (Ω, div) and Π Π Π curl,0 ϕ ϕ ϕ ∈ ∇ × H H H 0 (Ω, curl) ⊂ H H H 0 (Ω, div) we find r r r Again by the exact sequence we can find R ∈ H 1 (Ω) such that r r r = ∇R.Finally since ∂ n R = ∇R • n n n = r r r • n n n = 0, we find that R solves the asserted equation.The Poincaré inequality and elliptic regularity theory then give the desired results.
Lemma 4.7.The operator (4.10) The same estimates hold true for the operator In order to treat the second term we apply Lemma 4.6 and split the discrete object φ given in [MR20, Remark 2.10], see also [Roj19].Let therefore Π Π Π div, By the commuting diagram property of the operator Π Π Π div, pv as well as the projection property we therefore have By the exact sequence property we therefore have Π Π Π div, pv r r r − r r r h ∈ ∇ × N N N 0 pv (T h ).Furthermore, the definition of r r r and r r r h in Lemma 4.6 gives the orthogonality relation r r r − r r r h ⊥ ∇ × N N N 0 pv (T h ).Putting it all together we have r r r − r r r h 2 L 2 (Ω) = (r r r − r r r h , r r r − Π Π Π div, pv r r r) Ω + (r r r − r r r h , Π Π Π div, pv r r r − r r r h ) Ω = (r r r − r r r h , r r r − Π Π Π div, pv r r r) Ω , which by the Cauchy-Schwarz inequality gives r r r − r r r h L 2 (Ω) ≤ r r r − Π Π Π div, pv r r r L 2 (Ω) .
Since ∇ • r r r = ∇ • r r r h is discrete we may apply [MR20, Thm.2.8 (vi)] as well as perform a simple scaling argument to arrive at where the last estimate is due to the a priori estimate of Lemma 4.6.Summarizing we have where the last estimate follows by adding and subtracting ϕ ϕ ϕ, the triangle inequality as well as the second inequality of the present lemma.
Treatment of T 2 : The term T 2 is treated with a duality argument.We select To that end, we note that by Lemma 4.6 we have r r r = ∇R for some R ∈ H 2 (Ω).Therefore for so that the desired ψ ψ ψ is found as ψ ψ ψ = ∇w with w solving Furthermore, since R ∈ H 2 (Ω), elliptic regularity gives w ∈ H 4 (Ω) and therefore ψ ψ ψ ∈ H H H 3 (Ω).
Finally the following estimates hold due to elliptic regularity and the results of Lemma 4.6.We therefore have for any ψ ψ ψ h ∈ V V V 0 pv (T h ) T 2 = (e e e, r r r where we used the definition of T 2 , the duality argument elaborated above, the orthogonality relation of I I I 0 h to insert any ψ ψ ψ h ∈ V V V 0 pv (T h ), and the Cauchy-Schwarz inequality.Finally exploiting the a priori estimate of ψ ψ ψ in (4.11) we find for p v > 1 that In the lowest order case p v = 1 we cannot fully exploit the regularity.However, we find Proceeding as above and using estimate (4.12) we find The last last estimate is due to integration by parts and the boundary condition of φ ϕ ϕ h − I I I 0 h ϕ ϕ ϕ; in fact holds.Putting everything together we have for p v > 1 e e e where the last estimate again follows from inserting ϕ ϕ ϕ and using the second estimate of the present lemma.Young's inequality then yields the result for the operator I I I 0 h .The lowest order case is treated analogous.For the operator I I I h the only difference is that one applies Lemma 4.5 instead of Lemma 4.6 and perform the duality argument on all of H H H(Ω, div) instead of H H H 0 (Ω, div).Here it is important to note that the potential R given by Lemma 4.5 satisfies homogeneous boundary conditions, so that the boundary term vanishes in the partial integration.Remark 4.8.H H H(Ω, div)-conforming approximation operators similar to I I I h and I I I 0 h are presented in [EGSV19], where the focus is on a patchwise construction rather than the (global) orthogonalities (4.3b), (4.4b).
Theorem 4.9.Let Γ be smooth and (ϕ ϕ ϕ h , u h ) be the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h and e e e ϕ ϕ ϕ = ϕ ϕ ϕ − ϕ ϕ ϕ h .Then, for any ũh ∈ S ps (T h ), φ e e e ϕ ϕ ϕ Proof.Let (ψ ψ ψ, v) ∈ H H H 0 (Ω, div) × H 1 (Ω) denote the dual solution given by Theorem 3.3 applied to η η η = e e e ϕ ϕ ϕ .Theorem 3.3 gives ψ ψ ψ ∈ L L L 2 (Ω), ∇ • ψ ψ ψ ∈ H 1 (Ω), and v ∈ H 3 (Ω).Due to the Galerkin orthogonality we have for any ( ψ ψ ψ h , ṽh ) e e e ϕ ϕ ϕ 2 L 2 (Ω) = b((e e e ϕ ϕ ϕ , e u ), (ψ ψ ψ, v)) = b((e e e ϕ ϕ ϕ , e u ), (ψ We now estimate all terms in the above: Therefore, we conclude that e e e ϕ ϕ ϕ 2 the limiting term being for now the last one.To overcome the lack of regularity of ψ ψ ψ we perform a Helmholtz decomposition.In fact since ψ ψ ψ ∈ H H H 0 (Ω, div) as well as ∇ • ψ ψ ψ ∈ H 1 (Ω) there exist ρ ρ ρ ∈ H H H 0 (Ω, curl) and z ∈ H 3 (Ω) such that ψ ψ ψ = ∇ × ρ ρ ρ + ∇z.The construction is as follows: Let Since ∇ • (ψ ψ ψ − ∇z) = 0 as well as (ψ ψ ψ − ∇z) • n n n = 0 by construction, the exact sequence property of the employed spaces allows for the existence of ρ ρ ρ ∈ H H H 0 (Ω, curl) such that ψ ψ ψ − ∇z = ∇ × ρ ρ ρ.Finally the following estimates hold due to the a priori estimate of the Lax-Milgram theorem and partial integration for the first estimate, elliptic regularity theory for the second, and the triangle inequality together with the first estimate for the third one: We now continue estimating (4.13) by applying the Helmholtz decomposition.For any ψ Treatment of T g : By the Cauchy-Schwarz inequality we have Treatment of T c : For any φ ϕ ϕ h ∈ V V V 0 pv (T h ) we have Treatment of T c 1 : By the Cauchy-Schwarz inequality we have Treatment of T c 2 : In order to treat T c 2 we proceed as in the proof of Lemma 4.7 and apply Lemma 4.6 to split the discrete object φ ϕ ϕ h − ϕ ϕ ϕ h ∈ V V V 0 pv (T h ) on a discrete and a continuous level: given by Lemma 4.6.Exploiting the definition of the operator Π Π Π curl,0 h we find Treatment of T 1 : With the same notation as in the proof of Lemma 4.7 and with exactly the same arguments we have By the Cauchy-Schwarz inequality we have where the last estimate follows from the fact that for any ρ ρ ρ h ∈ N N N 0 pv (T h ) since it is a projection.Finally inserting ϕ ϕ ϕ and applying the triangle inequality as well as estimating ∇ • (ϕ ϕ ϕ − ϕ ϕ ϕ h ) L 2 (Ω) by (e u , e e e ϕ ϕ ϕ ) b we find Treatment of T 2 : Note again that ρ ρ ρ ∈ H H H 0 (Ω, curl) and the fact that Π Π Π curl,0 h maps into ∇×N N N 0 pv (T h ).Therefore, we can write ∇ × ρ ρ ρ − Π Π Π curl,0 h ∇ × ρ ρ ρ = ∇ × ρ ρ ρ for some ρ ρ ρ ∈ H H H 0 (Ω, curl) and the boundary terms consequently vanish in the following integration by parts Finally, T 2 = 0, since ∇ × r r r = 0 by Lemma 4.6.Collecting all the terms: Collecting the terms together with the estimate e e e ϕ ϕ ϕ L 2 (Ω) from the Helmholtz decomposition and the regularity estimates of Lemma 3.3 we find (e e e ϕ ϕ ϕ , ψ ψ ψ − ψ (e e e ϕ ϕ ϕ , e u ) b e e e ϕ ϕ ϕ L 2 (Ω) . (4.14) Due to the regularity of z ∈ H 3 (Ω) we can find ψ ψ ψ (e e e ϕ ϕ ϕ , e u ) b .
Canceling one power of e e e ϕ ϕ ϕ L 2 (Ω) then yields the first estimate.The second one follows again by the fact that the least squares approximation is the projection with respect to b and the norm equivalence given in Theorem 2.1.
Theorem 4.11.Let Γ be smooth and (ϕ ϕ ϕ h , u h ) be the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h .Then, for any φ Proof.As in Remark 4.3 with (e e e ψ ψ ψ , e v ) denoting the error of the FOSLS approximation of the dual solution given by Theorem 3.2 (duality argument for the gradient of the scalar variable) applied to w = e u we have for any φ Canceling one power of ∇e u L 2 (Ω) , collecting the terms, and using the estimate for I I I 0 h we arrive at the asserted estimate.
As a tool in the proof of our main theorem (Theorem 4.13) we need to analyze the error of the FOSLS approximation of the dual solution.This is summarized in Lemma 4.12.Let Γ be smooth and (ϕ ϕ ϕ h , u h ) be the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h and e e e ϕ ϕ ϕ = ϕ ϕ ϕ − ϕ ϕ ϕ h .Let (ψ ψ ψ, v) ∈ H H H 0 (Ω, div) × H 1 (Ω) be the solution of the dual problem given by Theorem 3.1 with w = e u .Additionally, let (ψ ψ ψ h , v h ) be the least squares approximation of (ψ ψ ψ, v) and denote e v = v − v h and e e e ψ ψ ψ = ψ ψ ψ − ψ ψ ψ h .Then, Furthermore, e e e ψ ψ ψ where the first estimate holds for any ṽh ∈ S p , ψ ψ ψ h ∈ V V V 0 pv (T h ) and the second one follows with the same arguments as in the proof of Lemma 4.2.By Lemma 4.2 we have e v L 2 (Ω) h/p (e e e ψ ψ ψ , e v ) b , which together with the above gives the second estimate.By Theorem 4.9 we have e e e ψ ψ ψ L 2 (Ω) for any ṽh ∈ S ps (T h ), ψ ψ ψ h ∈ V V V 0 pv (T h ).The result follows immediately by again exploiting the regularity of the dual solution and the approximation properties of the employed spaces.
Theorem 4.13.Let Γ be smooth and (ϕ ϕ ϕ h , u h ) be the least squares approximation of (ϕ ϕ ϕ, u).Furthermore, let e u = u − u h .Then, for any φ Proof.As in Remark 4.3 with (e e e ψ ψ ψ , e v ) denoting the FOSLS approximation of the dual solution given by Theorem 3.1 applied to w = e u we have for any φ We specifically choose φ ϕ ϕ h = I I I 0 h ϕ ϕ ϕ.In the following we heavily use the properties of the operator I I I 0 h given in Lemma 4.7.First we exploit the regularity of the dual solution using Lemma 4.12 as well as the estimates of Theorem 3.1: Canceling one power of e u L 2 (Ω) , collecting the terms, and using the estimate for I I I 0 h we arrive at the asserted estimate.
Remark 4.14.Before stating the general corollary with prescribed right-hand side f ∈ H s (Ω) we highlight the improved convergence result.Consider f ∈ L 2 (Ω).For the classical conforming finite element method one observes convergence O(h 2 ) due to the Aubin-Nitsche trick.More precisely, let u FEM h be the solution to the model problem obtained by classical FEM, then there holds As elaborated in Section 1 this rate could not be obtained for the FOSLS method by previous results, since further regularity of the vector variable ϕ ϕ ϕ would be necessary.
So in fact the optimal rate in the sense of the beginning of Section 4 is achieved.If the lowest order case p v = 1 also achieves optimal order is yet to be answered.Numerical experiments in Section 5, however, indicate it to be true.
We summarize the results for general right-hand side f ∈ H s (Ω).This summary is essentially the estimates given by the Theorems 4.9, 4.11, and 4.13 together with the approximation properties of the employed finite element spaces.
The estimates of the Theorems 4.9, 4.11, and 4.13 together with the above estimates give, after straightforward calculations, the asserted rates.
We close this section with some remarks concerning sharpness of the estimates of Corollary 4.15: inf pv (T h ) = BDM BDM BDM 0 pv (T h ).Excluding the lowest order case p v = 1 we have, choosing p v ≥ p s − 1, sharpness of the estimates for e u and ∇e u .This can be easily seen, since the rates guaranteed by Corollary 4.15 for e u L 2 (Ω) and ∇e u L 2 (Ω) are the same as the ones from a best approximation argument.The estimates are therefore sharp.The lowest order case p v = 1 seems to be suboptimal, as the numerical examples in Section 5 suggest.In all other cases, i.e., p v > 1 and p v < p s − 1, our numerical examples suggest sharpness of the estimates, in both the setting of a smooth solution as well as one with finite Sobolev regularity, but not achieving the best approximation rate.Since in the least squares functional the term ∇u h + ϕ ϕ ϕ h L 2 (Ω) enforces ∇u h and ϕ ϕ ϕ h to be close, it is to be expected that an insufficient choice of p v limits the convergence rate.A theoretical justification concerning the observed rates in the cases p v = 1 as well as p v > 1 and p v < p s − 1 is yet to be studied.In conclusion, when the application in question is concerned with approximation of u or ∇u in the L 2 (Ω) norm, the best possible rate with the smallest number of degrees of freedom is achieved with the choice p v = p s − 1 regardless of the choice of V V V 0 pv (T h ).Therefore, it is computationally favorable to choose Raviart-Thomas elements over Brezzi-Douglas-Marini elements.Turning now to e e e ϕ ϕ ϕ L 2 (Ω) similar arguments guarantee sharpness of the estimates.In this case when p s + 1 ≥ p v and p s + 1 ≥ p v + 1, for the choice of Raviart-Thomas elements and Brezzi-Douglas-Marini elements respectively.Again the other cases are open for theoretical justification.However, both theoretical as well as the numerical examples in Section 5 suggest the choice of Brezzi-Douglas-Marini elements over Raviart-Thomas elements, when application is concerned with approximation of ϕ ϕ ϕ in the L 2 (Ω) norm.√ DOF ∼ 1/h employing V V V 0 pv (T h ) = BDM BDM BDM 0 pv (T h ).
Example 5.2.For our second example we consider again the case of Ω being the unit sphere in R 2 .The exact solution u(x, y) is calculated corresponding to the right-hand side f (x, y) = 1 [0,1/2] ( x 2 + y 2 ).Therefore u ∈ ∩ ε>0 H 5/2−ε (Ω).The numerical results for the choice of Raviart- Thomas elements are plotted in Figure 5.4 for e u L 2 (Ω) , in Figure 5.5 for ∇e u L 2 (Ω) and in Figure 5.6 for e e e ϕ ϕ ϕ L 2 (Ω) .Apart from the remarks already made in Example 5.1 we note that we observe the improved convergence result when dealing with limited Sobolev regularity of the data.Furthermore, in the lowest order case p v = 1 the guaranteed rate seems to be suboptimal.The plots for the choice of Brezzi-Douglas-Marini elements are presented in Appendix A.       √ DOF ∼ 1/h employing V V V 0 pv (T h ) = BDM BDM BDM 0 pv (T h ).
[Jes77, like[BG05, Lemma 3.4] and[Jes77, Thm.4.1] are essentially a duality argument like Theorem 3.1 and the strategy of Lemma 4.2.Without further analysis the estimate of Lemma 4.2, does not give any further powers of h, since the b-norm is equivalent to the H H H(Ω, div) × H 1 (Ω) norm.Theorem 4.13 ensures, at least if the space V V V 0 pv (T h ) is not of lowest order, i.e. p v > 1, that the FOSLS method converges also as O(h 2 ).More precisely, the estimate in Theorem 4.13 together with the approximation properties of the employed finite element spaces and p v > 1 and p s ≥ 1 gives