$hp$-robust multigrid solver on locally refined meshes for FEM discretizations of symmetric elliptic PDEs

In this work, we formulate and analyze a geometric multigrid method for the iterative solution of the discrete systems arising from the finite element discretization of symmetric second-order linear elliptic diffusion problems. We show that the iterative solver contracts the algebraic error robustly with respect to the polynomial degree $p \ge 1$ and the (local) mesh size $h$. We further prove that the built-in algebraic error estimator which comes with the solver is $hp$-robustly equivalent to the algebraic error. The application of the solver within the framework of adaptive finite element methods with quasi-optimal computational cost is outlined. Numerical experiments confirm the theoretical findings.


Introduction
Numerical schemes for PDEs aim at approximating the solution u of the weak formulation with an error below a certain tolerance at minimal computational cost.Since the accuracy is spoiled by singularities, e.g., in given data or domain geometry, adaptive finite element methods (AFEMs) employ the loop

SOLVE ESTIMATE MARK REFINE
adaptive stopping solution accurate enough?
to obtain a sequence of meshes T L that resolve such singularities.For a large class of problems, it is known that AFEM is rate-optimal, i.e., one can construct an estimator η L (u L ) from the exact Galerkin solution u L for the discretization error |||u − u L ||| that decreases with the largest possible rate with respect to the number elements in T L ; see, e.g., the seminal works [Dör96; MNS00; BDD04; Ste07; CKNS08] or the abstract overview [CFPP14] for h-adaptive FEM with fixed polynomial degree p.
In practice, the SOLVE module may become computationally expensive (in contrast to all other modules) when employing a direct solver; see, e.g., [PP20; GHPS21; IP23] for a discussion of implementational aspects.Thus, usually, an iterative solver is employed to compute an approximation u L of u L on each level, and the exact Galerkin solution u L is not available.The question of whether the approximations u L converge with optimal rate with respect to the overall computational cost was already treated in the seminal work [Ste07] under some realistic assumptions about an abstract iterative solver.The recent work [GHPS21] employs nested iterations and an adaptive stopping criterion to steer a uniformly contractive iterative solver, linking the SOLVE and ESTIMATE module in the above scheme by an inner loop.Then, it is shown that even the full sequence of iterates converges with optimal rates with respect to the overall computational cost.For this reason, the design of algebraic solvers that are uniformly contractive and robust with respect to the discretization parameters is of utmost importance.
The hierarchical structure of AFEM and the very nature of the arising linear systems suggest to use a multilevel solver; see, e.g., [Hac85; BMR85; BPS86; BDY88; BPX90; Zha92; Rüd93b; Osw94].Different adaptive methods integrating a multilevel solver are possible; see, e.g., [BB87] for generating local meshes, and [Rüd93a] for a fully adaptive multigrid method that steers the local refinement process.In the context of AFEM, the adaptively constructed hierarchy of locally refined meshes calls for suitable local solvers.We refer to [CNX12] for a multilevel preconditioner on a mesh hierarchy consisting of one bisection in each step and [HWZ12; WZ17] for multiplicative multigrid methods, all of which are robust with respect to the mesh size h.Though these works allow for higherorder FEM, an analytic and numerical study on the behavior with increasing polynomial degree was not presented.This aspect is treated, e.g., in [Pav94; SMPZ08; AMV18; BF22], which design iterative solvers that are robust with respect to the polynomial degree p on various types of polyhedral meshes.The recent own work [MPV21] proposes a p-robust geometric multigrid which comes with a built-in algebraic error estimator ζ L (u L ), which is suited perfectly for a posterori steering (i.e., adaptive termination) of the algebraic solver.However, the employed patchwise smoothing associated to every vertex and every level causes a linear dependence on the number of adaptive mesh levels L.
In the present work, we modify the solver from [MPV21] and overcome this dependence for locally refined meshes: we only apply local lowest-order smoothing on patches which change in the refinement step on intermediate levels, whereas a patchwise (and hence parallelizable) higher-order smoothing on all patches of the finest level is applied.This solver only needs one post-smoothing step, requires no symmetrization of the procedure (see also [DHM + 21]), and, in particular, has no tunable parameters since it utilizes optimal step-sizes on the error-correction stage.As the main result of the present work, we show that the proposed solver uniformly contracts the algebraic error |||u L − u L |||.Moreover, it comes with a built-in estimator ζ L (u L ), which is shown to be equivalent to |||u L − u L |||.Throughout, all involved estimates are robust in the discretization parameters h and p.
As one potential application, we formulate an AFEM algorithm in the spirit of [GHPS21] that naturally embeds the multigrid solver and leverages the solver's built-in algebraic error estimator ζ L (u L ) to stop the solver as soon as the discretization and algebraic error are comparable.Adapting the arguments of [GHPS21], we prove that, for fixed polynomial degree p, the AFEM algorithm guarantees optimal convergence rates with respect to overall computational cost.
Using the open-source object-oriented 2D Matlab code MooAFEM [IP23], we present a detailed numerical study of both the algebraic solver and the adaptive algorithm, including higher-order experiments and jumping coefficients.
The outline of this paper reads as follows: Section 2 first poses the model problem and introduces some notation.Then, we state the proposed multigrid solver (Algorithm A) and formulate our main results on hp-robust contraction (Theorem 4) and algebraic error control (Corollary 5).As a potential application, Section 3 formulates an AFEM algorithm (Algorithm B) which employs nested iteration and an adaptive stopping criterion for the iterative solver using the built-in a posteriori estimator for the algebraic error.
Theorem 7 proves optimal computational complexity of the proposed AFEM algorithm.After we confirm the theoretical results by numerical examples in Section 4, we present proofs of the main results in Section 5.For better readability, we precede these proofs with three subsections presenting their core arguments: geometric properties of the meshes T L , an hp-robust stable decomposition combining a local lowest-order multilevel stable decomposition from [WZ17] with a one-level p-robust decomposition from [SMPZ08], and a strengthened Cauchy-Schwarz inequality in the spirit of [CNX12; HWZ12].

hp-robust multigrid solver
In this section, we formulate the model problem, the proposed geometric multigrid method, and the main results, while the proofs are postponed to Section 5.
where K ∈ [L ∞ (Ω)] d×d sym is the symmetric and uniformly positive definite diffusion coefficient.More precisely, given a conforming simplicial triangulation T h of Ω into compact simplices, we have K| T ∈ [W 1,∞ (T )] d×d for all T ∈ T h .For x ∈ Ω we denote the maximal and minimal eigenvalue of K(x) ∈ R d×d sym by λ max (K(x)) and λ min (K(x)), respectively, and define Λ max := ess sup x∈Ω λ max (K(x)) as well as Λ min := ess inf x∈Ω λ min (K(x)).With ⟨• , •⟩ ω denoting the usual L 2 (ω)-scalar product for a measurable subset ω ⊆ Ω, the weak formulation of (1) seeks u ∈ We note that ⟪• , •⟫ Ω is a scalar product and the induced semi-norm |||u||| 2 Ω := ⟪u , u⟫ Ω is an equivalent norm on V. Therefore, the Lax-Milgram lemma yields existence and uniqueness of the weak solution u ∈ V.For ω = Ω, we omit the index ω throughout.
To discretize (2), denote for a polynomial degree p ≥ 1 and a triangle T ∈ T h the space of all polynomials on T of degree at most p with P p (T ) and define With the definition Clearly, the formulation of the discrete problem (4) hinges on the choice of the mesh T h , which directly influences the quality of u h as an approximation of u .Note that (4) can be rewritten as a symmetric and positive definite linear system by introducing a basis of V p h .However, we opt to work instead with the functional basis-independent description.2.2.Mesh and space hierarchy.We suppose that the refinement strategy in the module REFINE is newest vertex bisection (NVB); see, e.g., [Tra97;Ste08] and Figure 1 for an illustration in 2D.
Let T 0 be the conforming initial mesh.We refer to [Ste08] for NVB with admissible T 0 and d ≥ 2, to [KPP13] for NVB with general T 0 for d = 2, and to the recent work [DGS23] for NVB with general T 0 in any dimension.Throughout, we suppose that T 0 is admissible.In the 1D case, [AFF + 13] splits each element into two children of half-length and additionally ensures that any two neighboring elements have uniformly comparable diameter.Let T := T(T 0 ) be the set of all refinements of T 0 that can be obtained by arbitrarily many steps of NVB.
From now on, suppose that we are given a sequence {T } L =0 ⊂ T of successively refined triangulations, i.e., for all = 1, . . ., L, it holds that T = REFINE(T −1 , M −1 ) is the coarsest conforming triangulation obtained by NVB, where all marked elements M −1 ⊆ T −1 have been refined by (at least) one bisection.We note that NVB-refinement generates meshes that are uniformly γ-shape regular, i.e., max where γ depends only on T 0 and is, in particular, independent of L and the meshes T 1 , . . ., T L ; see, e.g., [Ste08, Theorem 2.1] for d ≥ 2 or [AFF + 13] for d = 1.We note that (5a) implies (5b) for d ≥ 2, while (5a) is trivial with γ = 1 and independent of (5b) for d = 1.In addition, we define the quasi-uniformity constant For each mesh T , let V denote the set of vertices.Given a vertex z ∈ V , we denote by T ,z := T ∈ T : z ∈ T the patch of elements of T that share the vertex z.The corresponding (open) patch subdomain is denoted by ω ,z := interior( T ∈T ,z T ) and its size by h ,z := max T ∈T ,z h T := max T ∈T ,z |T | 1/d .Finally, we denote by V + the set of new vertices in T and the pre-existing vertices of T −1 whose associated patches have shrunk in size in the refinement step , i.e., While this notation is used in the analysis of the solver below, the presentation of Algorithm A is more compact with the abbreviation N = V + for = 1, . . ., L − 1 and N L := V + L for p = 1 and N L := V L otherwise, where we recall that p ∈ N is the fixed polynomial degree of the FEM ansatz functions.
From the definition of the discrete FEM spaces (3) and NVB-refinement, we see that there holds nestedness Furthermore, we require the local spaces V q ,z := S q 0 (T ,z ) for all vertices z ∈ V and q ∈ {1, p}, where we use q = 1 for = 0, . . ., L − 1 and q = p for = L; see Figure 2 for the illustration of the degrees of freedom for p = 2. 2.3.Multigrid solver.In the following, we introduce a local geometric multigrid method, which will serve as iterative solver within the SOLVE module of an adaptive FEM algorithm.Each full step of the proposed multigrid method can be mathematically described by an iteration operator Φ : V p L → V p L , i.e., given the current approximation u L ∈ V p L , the solver generates the new iterate Φ(u L ) ∈ V p L .The main ingredients in the solver construction are an inexpensive global residual solve on T 0 and local residual solves on all patches T ,z for z ∈ V + on the intermediate levels = 1, . . ., L − 1 and all patches on the finest level T L when p > 1.For ease of notation, we define the algebraic residual functional R L : To construct the new iterate Φ(u L ), levelwise residual liftings of the algebraic error are added to the current approximation u L .The same levelwise residual liftings are used to define an a posteriori error estimator ζ L (u L ) for the algebraic error, i.e., the solver comes with a built-in estimator.
Algorithm A (One step of the optimal local multigrid solver).Input: Current approximation u L ∈ V p L , meshes {T } L =0 , polynomial degree p ∈ N. Solver step: Perform the following steps (i)-(ii): (i) Global lowest-order residual problem on the coarsest level: • Define step-size λ 0 := 1.
Remark 1 (Construction of the new iterate).The construction of Φ(u L ) from u L by Algorithm A can be seen as one iteration of a V-cycle multigrid with no pre-and one postsmoothing step, and a step-size at the error correction stage.The smoother on each level is additive Schwarz associated to patch subdomains where the local problems (11) are defined.This is equivalent to diagonal Jacobi smoothing for p = 1 (e.g., on intermediate levels) and block-Jacobi smoothing for p > 1 (e.g., on the finest level).The choice and use of the step-sizes λ in Algorithm A (ii) comes from a line-search approach; see, e.g., [MPV21,Lemma 4.3] and one of the earlier works [Hei88].However, if the step-size from the line-search is too large, we use instead a fixed damping parameter offsetting the d + 1 patch overlaps.We note that this case never occurred in practice in any of our numerical experiments.
Remark 2 (Computational effort and speed of convergence).We note that we apply a patchwise Cholesky factorization on the finest level.Hence, the computational effort for the local residual solve on the finest mesh T L in dependence on the polynomial degree p is of order O(p 3d #T L ).The presented algorithm is a linear method.One could symmetrize the procedure by adding one pre-smoothing step to define a preconditioner in the hope of accelerating convergence with the help of conjugate gradients.However, in our experience, the patchwise pre-smoothing typically did not yield considerable algebraic error decrease; see, e.g.[DHM + 21], while still doubling the number of smoothing operations of a V-cycle.The remaining steps needed to compute the new approximation stem from classical multigrid solvers (such as intergrid operators).We stress that the overall effort does not depend on the number of levels L.
Remark 3 (Nested iterations).In the context of adaptive FEM, the solver does not start from an arbitrary initial guess on each newly-refined mesh but from the final approximation of the previous level (see Algorithm B below).This will ensure a posteriori error control in each step after initialization as well as optimal computational cost.From the algebraic solver perspective, such an approach can be seen as a full multigrid method over the evolving hierarchy of meshes whose number of cycles is determined by the adaptive stopping criterion.

Main result.
This subsection formulates the main results regarding the iterative solver stating the contraction of the multigrid solver and reliability of the built-in a posteriori estimator of the algebraic error.Both results hold robustly in the number of levels L and the polynomial degree p. Theorem 4. Let u L ∈ V p L be the (unknown) finite element solution of (4) and let v L ∈ V p L be arbitrary.Let Φ(v L ) ∈ V p L and ζ L (v L ) be generated by Algorithm A. Then, the solver iterates and the estimator are connected by Moreover, the solver contracts the error, i.e., there exists 0 < q ctr < 1 such that Finally, the estimator is a two-sided bound of the algebraic error, i.e., there exists The contraction and reliability constants q ctr and C rel depend only on the space dimension d, the γ-shape regularity (5), the quasi-uniformity constant C qu from (6), Λ max /Λ min , and In particular, q ctr is independent of the polynomial degree p, the number of mesh levels L, and the meshes T 1 , . . ., T L .
Corollary 5.The reliability of the estimator in ( 14) is equivalent to the solver contraction (13).In particular, this also yields that Remark 6.We note that (12) holds with equality whenever the step-size criterion s ≤ d + 1 in Algorithm A(ii) are fulfilled and the construction is thus done by optimal-line search.In such a case, which was always satisfied in all our numerical tests, a Pythagoras identity in the spirit of [MPV21, Theorem 4.7] yielding exact algebraic error decrease is obtained.

Application to adaptive FEM with inexact solver
Given a coarse mesh T 0 , we use an adaptive finite element method (AFEM) to generate locally refined meshes {T L } L∈N tailored to the behavior of the sought solution.In the spirit of [GHPS21], Algorithm B presents such an approach with an adaptively stopped iterative solver, where Step (Ii) exploits the built-in a posteriori estimator of the geometric multigrid solver from Section 2.
While we note that the present Algorithm B and the corresponding Theorem 7 are restricted to fixed polynomial degree p, the inclusion of the proposed hp-robust iterative solver into the hp-adaptive FEM algorithm of [CNSV17] remains for future research, since the mathematical understanding of hp-adaptive FEM is still widely open.

Algorithm B (AFEM with iterative solver).
Input: Initial mesh T 0 , polynomial degree p ∈ N, adaptivity parameters 0 < θ ≤ 1, C mark ≥ 1, and µ > 0, initial guess u 0 0 := 0. Adaptive loop: repeat the following steps (I)-(III) for all L = 0, 1, 2, . . .: (I) SOLVE & ESTIMATE: repeat the following steps (i)-(iii) for all k = 1, 2, 3, . . .: (i) Do one step of the algebraic solver to obtain u k L ∈ V p L and an associated (II) MARK: Determine a set of marked elements M L ⊆ T L of (up to the multiplicative constant C mark ) minimal cardinality that satisfies (III) REFINE: Generate the new mesh T L+1 := REFINE(M L , T L ) and define u 0 L+1 := u L .Output: Sequences of successively refined triangulations T L , discrete approximations u L and corresponding error estimators Mesh-refinement is steered by the discretization error estimator.For all T ∈ T h , let η h (T ; •) : V p h → R ≥0 be the local contributions of the standard residual error estimator defined by where • ω denote the appropriate L 2 (ω)-norms.We define To abbreviate notation, let One important consequence of Theorem 4 is optimal convergence of Algorithm B with respect to computational complexity.To formulate this mathematically, we define the ordered set Before we state the theorem, we introduce the notion of approximation classes.For T ∈ T and s > 0 define with Galerkin solution u opt and estimator η opt on the optimal triangulation T opt ∈ T N (T 0 ), where T N (T 0 ) := T h ∈ T(T 0 ) : #T h − #T 0 ≤ N .By reliability (A3) of the estimator, see, e.g., [CFPP14], the sum on the right-hand side of ( 17) is equivalent to η opt (u opt ).If u As < ∞, then we say that rate s is possible.In [GHPS21], it is shown that in the case of a contractive solver, convergence rates with respect to degrees of freedom are equivalent to convergence rates with respect to computational complexity.We abbreviate with cost(L, k) the total costs of Algorithm B defined by Theorem 7. Let {T L } L∈N 0 be the sequence generated by Algorithm B and define the quasi-error by Then, for all parameters 0 < θ ≤ 1 and µ > 0, it holds that Furthermore, there exist 0 < θ ≤ 1, and µ > 0 such that, for sufficiently small parameters 0 < θ < θ and 0 < µ/θ < µ , and for all s > 0, it holds that The constants c opt , C opt > 0 depend only on the polynomial degree p, the initial triangulation T 0 , Λ max /Λ min , max T ∈T L div(K) L ∞ (T ) /Λ min , the rate s, the ratios θ/θ and µ/(θµ ), and the properties of newest vertex bisection.In particular, this proves the equivalence which proves optimal complexity of Algorithm B.
Remark 8. We note that in [GHPS21, Theorem 8], the constant c opt > 0 additionally depends on the stopping index L in the case the algorithm terminates after a finite number of mesh levels L < ∞ or the estimator satisfies η L (u L ) = 0.The refined analysis in the recent work [BIM + 23] removes this dependence.
Remark 9. We note that it is also possible to use the same stopping criterion for the algebraic solver as in [GHPS21, Algorithm 2].However, since the multigrid solver from Algorithm A has a built-in estimator for the algebraic error, we opt for its choice within Algorithm B instead.

Numerical experiments
This section investigates the numerical performance of the proposed multigrid solver of Algorithm A and the adaptive Algorithm B. The Matlab implementation of the multigrid solver is embedded into the MooAFEM 1 framework from [IP23].Throughout, we choose the marking parameter θ = 0.5 in the adaptive Algorithm B and f = (0, 0) .We introduce the following test case: • L-shaped domain.Let Ω = (−1, 1) 2 \ [0, 1] × [−1, 0] with right-hand side f = 1 and K = I.
4.1.Contraction and performance of local multigrid solver.We confirm numerically our main results from Theorem 4. In order to study the algebraic solver and its built-in estimator with respect to different polynomial degrees, we take µ = 10 −5 in Algorithm B, thus oversolving the algebraic problem.Moreover, we stop the adaptive algorithm once the final mesh consists of 10 6 degrees of freedom.Note that thanks to Corollary 5 proving the equivalence of the reliability of the algebraic error estimator with the contraction of the algebraic solver, we indeed only need to investigate numerically the existence of the p-robust bound on the contraction of the solver.In Figure 3 (left), we present the maximal contraction factors on each level L of the adaptive algorithm from Algorithm B. We see that the contraction factors are robust in the polynomial degree p with an upper bound of about 0.7 in all our experiments.In Figure 3 (right), we see that on a fixed number of levels (L=10) even for higher-order polynomials their behavior is clustered around similar values.Moreover, from a purely solver-centric perspective, we see that the solver variant which employs higher-order smoothing also on the intermediate levels (and not only on the finest one) as studied in [MPV21] only leads to slight improvements of the contraction constants.Adapting the arguments of [MPV21], this modified construction can be guaranteed to be contractive with p-robust, but linearly L-dependent contraction bound on the algebraic error.However, this degradation with increasing L is not seen in practice, provided that the patchwise smoothing is done everywhere for level L = 1 (as new degrees of freedom are added on all patches when the polynomial degree is p > 1) and local patchwise smoothing is employed in the remaining levels.We present a comparison of the resulting contraction factors of this approach to Algorithm A for a fixed number of level (L = 10) in Figure 3(right).History plot of the contraction factors q ctr from (13) for various polynomial degrees p with parameter µ = 10 −5 for the presented polynomial hierarchy from (7) in the adaptive algorithm from Algorithm B stopping once the final mesh consists of 10 6 degrees of freedom (left) and the comparison with polynomial hierarchy motivated by [MPV21] with localized smoothing for a fixed number of levels L = 10 (right).
4.2.Optimality of the adaptive algorithm.We take µ = 0.1 in Algorithm B and study the decrease of the discretization error estimator η L (u L ), both in terms of number of degrees of freedom and timing.We remark that the error estimator η L (u L ) on the final iterates is equivalent to the quasi-error ∆ L .After a pre-asymptotic phase, we see in Figure 4A and 4B for different polynomial degrees p that the optimal convergence rate −p/2 is recovered both with respect to number of degrees of freedom and computational time, and the singularity at the reentrant corner (0, 0) is resolved through local mesh refinement.Furthermore, Figure 5 shows that the proposed multigrid solver behaves faster than the built-in direct solver (Matlab backslash operator) concerning the time per dof.The displayed timings include the setup of the linear system, the time for the solver module, computation of estimator, and mesh refinement.Overall, the numerical experiments in Figure 5 validate the linear complexity of the suggested local multigrid solver from Algorithm A.
In Table 1, we see the optimal convergence of the discretization estimator with the optimal rate −1/2 for p = 1 as well as −1 for p = 2 for both diffusion coefficients regardless of the jump size.We stress that the discontinuity in the diffusion coefficient does not affect the optimality of the proposed adaptive algorithm and the iteration numbers remain uniformly bounded as displayed in Table 2.
Both test cases exhibit singularities due to jumps in the diffusion coefficient; however, the jump can be much higher for two neighboring elements in the checkerboard case.In this case, near the cross point (1/2, 1/2), the jump is of order 10 k from one element to the next, which coincides with the jump from the highest to the lowest value of K on the whole domain.For the striped test case, the jump between two neighboring elements History plot of the cumulative computational time and the relative computational time per degree of freedom for the polynomial degrees p = 1 and p = 4.We compare the overall time with the direct solve (square) to the overall time of the AFEM algorithm with the multigrid solver (diamond).In particular, the displayed times include setup, marking, and mesh refinement.belonging to different "stripes" is of order 10, even if the global jump in the diffusion (for non-neighboring elements) is of order 10 2 k −1 .This gives us the tools to observe numerically if the performance of our method only depends on local jumps in the diffusion coefficient.

Proofs
Below we present proofs of intermediate results leading to our main Theorem 4 of L-and p-robust contraction of the multigrid solver and the L-and p-robust two-sided bound of the algebraic error by the built-in a posteriori estimator.We emphasize that this result improves the recent work [MPV21] by removing the L-dependence.From an algorithmic point of view, this is done by applying local smoothing only on patches which change in the refinement step on lowest-order levels instead of on every patch as was the case in [MPV21].From an analysis point of view, L-robustness is achieved thanks to the strengthened Cauchy-Schwarz inequality on bisection-generated meshes (Proposition 16) building on the property that the levelwise overlap of the smoothed patches stays uniformly bounded.The next essential ingredient to prove the main result is an hp-stable decomposition on bisection generated meshes (Proposition 14), then one combines the results carefully together with the simple but crucial observation of uniform boundedness in the number of overlapping patches for a fixed level (Lemma 10) and bounds on the step-sizes and the levelwise solver update (Lemma 11).

Auxiliary results.
We start with the simple observation that the number of overlapping patches is uniformly bounded.
Lemma 10 (Finite patch overlap).For all T ∈ T , there holds Therefore, for all q ∈ N, it holds that Similar arguments show that Proof.The overlap ( 23) is clear from the geometry of the elements in the mesh.For all = 0, . . ., L, the discrete Cauchy-Schwarz inequality and (23) lead to This concludes the proof.
Next, we present bounds on the step-size and the levelwise solver update.
Lemma 11.For all ∈ {1, . . ., L}, we have Moreover, we have upper and lower bounds for the step-sizes, Proof.
Step 1: Proof of (26 Step 2: Proof of (26) in the remaining cases.We use the finite overlap of the patches in Lemma 10 to obtain Step 3: Proof of (27).For ∈ {1, . . ., L−1}, the upper bound is guaranteed by definition of λ .The lower bound for ∈ {1, . . ., L} is trivial if λ = 1/(d + 1).Otherwise, it is a consequence of the finite patch overlap: This concludes the proof.
In the next two subsections, we combine existing results from the literature to obtain a multilevel hp-robust stable decomposition and a strengthened Cauchy-Schwarz inequality for our setting of bisection-generated meshes.These will be crucial for the proofs of Theorem 4 and Corollary 5 in Subsection 5.4 below.

Multilevel hp-robust stable decomposition on NVB-generated meshes.
We start by recalling the one-level p-robust stable decomposition from Section 3.4 and Section 4.3 in [SMPZ08] for d = 2 and d = 3, respectively.
Lemma 12 (p-robust one level decomposition).Let v L ∈ V p L .Then, there exists a decomposition which is stable in the sense of The constant C OL depends only on the space dimension d, the γ-shape regularity (5), and the quasi-uniformity constant C qu from (6).
Similarly, we recall the local multilevel decomposition for piecewise affine functions proven in [WZ17, Lemma 3.1].In order to present this stable decomposition in a form that is more suitable for our forthcoming analysis, we add a short proof for completeness.
Lemma 13 (h-robust local multilevel decomposition for lowest-order functions).
Then, there exists a decomposition which is stable in the sense of The constant C ML depends only on the space dimension d, the γ-shape regularity (5), and the quasi-uniformity constant C qu from (6).
For fixed and z ∈ V + , the equivalence of norms on finite-dimensional spaces proves where the hidden constants depend only on γ-shape regularity (5).To obtain stability of the decomposition (32), we use an inverse inequality on the patches and the stability proved in [WZ17, Lemma 3.7]: This concludes the proof.
The combination of the two previous lemmas, done similarly in [MPV20, Proposition 7.6] for a non-local and hence not h-robust solver, leads to the following hp-robust decomposition.

Proposition 14 (hp-robust local multilevel decomposition
and this decomposition is stable in the sense of The constant C SD ≥ 1 depends only on the space dimension d, γ-shape regularity (5), the quasi-uniformity constant C qu from (6), and the ratio of Λ max and Λ min .
Proof.Let v L ∈ V p L .We begin with the decomposition of v L by (28), then continue with the further decomposition of the lowest-order contribution v 1 L in a multilevel way (30): For the finest level, it holds that A combination of the two estimates shows that Hence, the decomposition (34) is stable with (C SD ) 2 := max{1, C 2 ML }C 2 OL (d + 1) with respect to the H 1 (Ω)-seminorm.Taking into account the variations of the diffusion coefficient K, we obtain (35) with the stability constant C SD := C SD Λ max /Λ min .

Strengthened Cauchy-Schwarz inequality on NVB-generated meshes.
The following results are proved in the spirit of [HWZ12;CNX12].Note that the setting of this work is similar to [HWZ12], and unlike [CNX12], the underlying adaptive meshes of the space hierarchy are not restricted to one bisection per level.
For analysis purposes, we introduce a sequence of uniformly refined triangulations indicated by { T j } M j=0 such that T j+1 := refine( T j , T j ) and T 0 = T 0 , where refine enforces one bisection per element.According to [Ste08], admissibility of T 0 ensures that indeed each element T ∈ T j is bisected only once into two children T , T ∈ T j+1 .In the following, we will indicate the equivalent notation to Section 2 on uniform triangulations T j with a hat, e.g., V 1 j is the equivalent of V 1 on the uniformly refined mesh T j .The connection of the uniformly refined meshes and their adaptively generated counterpart requires further notation.For a given level 0 ≤ ≤ L and a given node z ∈ V , we define the generation g ,z of the patch by the maximum number of times an element of the patch has been bisected where T 0 ∈ T 0 denotes the unique ancestor element of T ∈ T .Define the maximal generation M = max z∈V L g L,z .First, we present the following result for uniformly refined meshes and then exploit this for our setting of adaptively refined meshes.
Proof.We begin by splitting the domain Ω into elementwise components, applying integration by parts, and using the Cauchy-Schwarz inequality.Note that the restriction of u i to any element T ∈ T i is an affine function, and hence the second derivatives vanish.Thus, it holds with the outer normal n to ∂T that Due to K ∈ W 1,∞ (T ), the fact that u i , v j are piecewise affine, a discrete trace inequality, and h −1 i 1, we get Moreover, note that due to uniform refinement, we have equivalence δ j−i = (2 −1/2 ) j−i h j / h i 1/2 and h j ≤ h i .Using the last equation multiplied by 1 = h 1/2 j h −1/2 j , we derive that This concludes the proof.
The last result enables us to tackle the setting of adaptively-refined meshes.
Proof.Let M ∈ N. The proof consists of five steps.
Step 1.First note that, for any 0 < δ < 1 and x i , y i > 0 with 0 ≤ i ≤ M , there holds To see this, we change the summation order accordingly and use the Cauchy-Schwarz inequality to obtain .
The geometric series then proves the claim (39).
This set allows to track how large the levelwise overlap of patches with the same generation is.Crucially, the cardinality of these sets is uniformly bounded by max see, e.g., [WC06, Lemma 3.1] in the two-dimensional setting with arguments that transfer to three dimensions.The constant C lev solely depends on γ-shape regularity (5).
Step 3. We introduce a way to reorder the patch contributions by generations (36).Note that, for any 0 Once the generation constraint is introduced, one can shift the perspective from summing over "adaptive" levels and associated vertices to summing over "uniform" vertices and only the (finitely many, cf.(41)) levels where each vertex satisfies the generation constraint, i.e., for 0 ≤ ≤ ≤ L and 0 ≤ j ≤ M , the two following sets coincide Step 4. According to γ-shape regularity (5), all elements in the patch have comparable size depending on C qu from (6).If g ,z = j, (at least) one element T ∈ T ,z satisfies T ∈ T j and it follows that In particular, there exists C eq > 0 such that Step 5. We proceed to prove the main estimate (38).The central feature of the following approach is to introduce additional sums over the generations with generation constraints, i.e., there holds for every admissible , k, that We abbreviate the terms as S 1 ( , k) and S 2 ( , k), respectively.A change of the summation of order i and j yields for S 1 ( , k) that Summing S 2 ( , k) over all and k and changing the order of summation, we obtain Combining these two identities with (42), we see that We define the last two terms as S 1 and S 2 , respectively.Since the second term S 2 is treated in the same way, we only present detailed estimations of the first term S 1 .The strengthened Cauchy-Schwarz inequality (37) for functions defined on uniform meshes followed by the patch overlap (24) leads to The identity (42) and the finite levelwise overlap (41) show The equivalence of mesh sizes from (43) and a Poincaré-inequality prove A combination of (42) with ( 25) and (41), followed again by (42), yields Thus, we obtain the bound Combining all estimates, together with the geometric series bound (39), confirms Finally, the result (38) is obtained after summing together with the analogous estimations coming from the remaining term S 2 and taking into consideration the variations of the diffusion coefficient K so that the result holds for the energy norm.This concludes the proof.

Proof of the main results.
For the sake of a concise presentation, we only consider the case p > 1.The case p = 1 is already covered in the literature [CNX12; WZ17] and follows from our proof with only minor modifications.
Proof of Theorem 4, connection of solver and estimator (12).The proof consists of two steps.
Step 1.We show that there holds the identity Indeed, note that σ = k=0 λ k ρ k .By definition of the local lowest-order problems in (10) and (11) as well as the definition of ρ = z∈V + ρ ,z , we have Thus, by expanding the square, we have This proves the identity (44). Step and the choice of λ L in Algorithm A, we have For the first term it holds that Combining the last two estimates with the definition of ζ L (v L ) in Algorithm A, we obtain This concludes the proof of (12).
Proof of Theorem 4, lower bound in (14).The relation between the solver and the estimator given in (12) shows that ζ L (v L ) ≤ |||u L − v L |||.
Proof of Theorem 4, upper bound in (14).We use the stable decomposition of Proposition 14 on the algebraic error u Note that σ = k=0 λ k ρ k for all = 0, . . ., L; see Algorithm A. We use (45) to develop = ⟪ρ 0 , v 0 ⟫+ L−1 Expanding σ = ρ 0 + k=1 λ ρ and rearranging the terms finally leads to Note that, until this point, only equalities are used.In the following, we will estimate each of the constituting terms of the algebraic error using Young's inequality in the form ab ≤ (α/2) a 2 + (2α) −1 b 2 with α = 4C 2 SD , the strengthened Cauchy-Schwarz inequality, and patch overlap arguments as done in the proof of Lemma 10.Using the fact that λ 0 = 1 and the decomposition of the error u L − v L = v 0 + L−1 =1 z∈V + v ,z + z∈V L v L,z , we see that the first term yields For the second term, we obtain that For the fourth term, we have Finally, to treat the last term where higher-order terms appear together with a sum over levels, we proceed similarly as in [CNX12, Proof of Theorem 4.8] and obtain For the first term of the last bound, we have that Summing all the estimates of the algebraic error components and defining the constant C 2 rel := max{1/2, C 2 SD (d + 1) 2 + C 2 SCS + 2 C SCS (d + 1) 1/2 }, we see that After rearranging the terms, we finally obtain that This proves the upper bound of (14) and thus concludes the proof of Theorem 4.

Figure 1 .
Figure 1.Schematic of 2D NVB refinement pattern: For each triangle T ∈ T , there is one fixed refinement edge E T indicated by the extra pink line.The pink dots indicate edges that are marked for refinement.If an element is marked for refinement, at least its refinement edge is marked for refinement (top).Iterated bisection refines a marked element into 2, 3, or 4 children (bottom).

Figure 2 .
Figure 2. Illustration of degrees of freedom (p = 2) for the space V p L,z

|Figure 3 .
Figure3.Contraction of the algebraic solver.History plot of the contraction factors q ctr from (13) for various polynomial degrees p with parameter µ = 10 −5 for the presented polynomial hierarchy from (7) in the adaptive algorithm from Algorithm B stopping once the final mesh consists of 10 6 degrees of freedom (left) and the comparison with polynomial hierarchy motivated by[MPV21] with localized smoothing for a fixed number of levels L = 10 (right).

Figure 4 .
Figure 4. Optimality of AFEM on L-shape.The convergence history plot of the discretization error estimator η L (u L ) with respect to the total computational cost (left) and the cumulative computational time (right).

||Figure 5 .
Figure5.Optimality of the local multigrid solver.History plot of the cumulative computational time and the relative computational time per degree of freedom for the polynomial degrees p = 1 and p = 4.We compare the overall time with the direct solve (square) to the overall time of the AFEM algorithm with the multigrid solver (diamond).In particular, the displayed times include setup, marking, and mesh refinement.

Figure 7 .
Figure7.Optimality of AFEM for jumping diffusion.The convergence history plot of the discretization error estimator η L (u L ) for polynomial degree p = 2 with respect to the total computational cost for the checkerboard diffusion (left) and the stripe diffusion (right).
we obtain the decomposition (34).It remains to show that this decomposition is stable (35).First, we have for the coarsest level that