A new entropy-variable-based discretization method for minimum entropy moment approximations of linear kinetic equations

In this contribution we derive and analyze a new numerical method for kinetic equations based on a variable transformation of the moment approximation. Classical minimum-entropy moment closures are a class of reduced models for kinetic equations that conserve many of the fundamental physical properties of solutions. However, their practical use is limited by their high computational cost, as an optimization problem has to be solved for every cell in the space-time grid. In addition, implementation of numerical solvers for these models is hampered by the fact that the optimization problems are only well-defined if the moment vectors stay within the realizable set. For the same reason, further reducing these models by, e.g., reduced-basis methods is not a simple task. Our new method overcomes these disadvantages of classical approaches. The transformation is performed on the semi-discretized level, which makes it applicable to a wide range of kinetic schemes and replaces the nonlinear optimization problems by inversion of the positive-definite Hessian matrix. As a result, the new scheme avoids the realizability-related problems altogether. Moreover, a discrete entropy law can be enforced by modifying the time stepping scheme. Our numerical experiments demonstrate that our new method is often several times faster than the standard optimization-based scheme.


Introduction
Kinetic equations play an important role in many physical applications. One of the earliest and most prominent examples is the Boltzmann equation, which was derived by the Austrian physicist Ludwig Boltzmann in 1872 [9] and still forms the basis for the kinetic theory of rarefied gases. The Boltzmann equation or similar kinetic equations proved to be applicable not only to classical gases but also to electron transport in solids and plasmas, neutron transport in nuclear reactors, photon transport in superfluids and radiative transfer, among others [13,35,39-41]. More recently, kinetic equations were also derived in the context of biological modelling, e.g., for studying cell movement or wolf migration [10,27,30].
While analytic solutions can be derived in some special cases [22], usually kinetic equations have to be solved numerically. Due to their high dimensionality, directly solving kinetic equations with standard discretizations (e.g., finite difference methods) is often infeasible or restricted to very small grid sizes. For that reason, a variety of specialized approximate methods have been developed, many of which belong to the class of moment methods. Instead of computing the whole kinetic density function, moment approximations choose a set of weight functions (usually polynomials up to some order) on the velocity space and only track the weighted velocity averages (called moments) of the kinetic density with respect to these functions. This is usually done by performing a Galerkin projection of the original kinetic equation to the linear span of the weight functions. In general, the resulting moment equations are not closed and thus an ansatz for the velocity distribution has to be made. Choosing a linear combination of the weight functions gives the widely used P_N closure [39], where N is the degree of the highest-order moments in the model. The P_N closure results in linear equations, is simple to implement and often gives reasonable results. However, it does not guarantee non-negativity of the approximated kinetic density. This sometimes leads to physically meaningless solutions, as the P_N solutions can, e.g., contain negative values for the local particle density.
The so-called minimum-entropy moment models M_N [18,45] avoid these problems by choosing the ansatz function such that it minimizes an entropy functional which usually models the (negative) physical entropy. The resulting closed system of equations is hyperbolic and dissipates the chosen entropy [38]. However, numerically solving the M_N equations requires the solution of a non-linear optimization problem at every point on the space-time grid. Although the optimization problems can be solved in parallel [4,29,34,52], the computational cost for high moment orders is still prohibitively high in practical applications. Another drawback of the entropy-based moment closures is that the optimization problem is solvable only for so-called realizable moment vectors, i.e., vectors that actually are moments of a positive density function. As explicit descriptions of the set of realizable moment vectors are usually not available, discretizations (especially of higher order) often struggle to keep the approximate solutions realizable [1,14,46,53,57,60,66].
A partial remedy for the high computational cost of the minimum entropy models could be additional model reduction, for example via reduced basis methods [47]. These methods generate a reduced description of the (discretized) equations first and then use this reduced model to perform the actual computations. In some cases, e.g., if a given kinetic equation has to be solved many times for different parameters, this reduces overall computation time by several orders of magnitude. Generating the reduced model is usually done by constructing a low-dimensional linear subspace from solution trajectories and then projecting the problem to this subspace. This has been successfully done for the P N models [31]. In the context of minimum-entropy moment models, however, this procedure is problematic as it does not preserve realizability, which may render the reduced model useless as it does not admit a solution.
Checking realizability is much easier when using piecewise linear bases instead of the standard polynomial basis on the whole velocity space [19,20,49,59,62-64]. In addition, the computational cost is significantly lower for these models. However, solving the optimization problems is still costly compared to linear models and maintaining realizability still requires additional limiters [62].
Another approach to fix the realizability issues is to introduce a regularization of the optimization problem [2]. The regularized problem admits a solution also for moment vectors that are not realizable and maintains most of the desirable properties of the original problem, at the cost of an additional approximation error (which, however, can be controlled by the regularization parameter). However, this approach still requires the solution of the (regularized) minimum entropy problem in each cell of the space-time grid.
In this paper, we will present a new discretization scheme for the minimum-entropy moment equations based on a transformation of the semi-discretized equations to entropy variables. The new scheme replaces the non-linear optimization problems by matrix inversions and inherently guarantees realizability. As a consequence, it avoids many of the problems described above. In addition, the new scheme is often significantly faster than the untransformed scheme and shows improved parallel scaling. Moreover, a discrete entropy law can be enforced for the new scheme by using a relaxed Runge-Kutta method. On the downside, adaptive timestepping is strictly needed for the transformed scheme. Moreover, numerically singular Hessian matrices will result in a failure of the scheme if no additional regularization is employed. However, we did not encounter such a situation during our extensive numerical experiments (despite the fact that the untransformed reference scheme had to use regularization in several of the tests). This paper is organized as follows. First, in Section 2 we shortly recall the necessary background on minimum entropy moment models. In Section 3, the new scheme is presented and analysed. In Section 4, we give an outline of our implementation which is then used for the extensive numerical investigations in Section 5.

Kinetic transport equation
We consider the linear transport equation

∂_t ψ + Ω · ∇_x ψ + σ_a ψ = σ_s C(ψ) + Q, (2.1a)

which describes the density ψ(t, x, Ω) of particles travelling in direction Ω ∈ S² at position x ∈ X ⊆ R³ and time t ∈ T = [0, t_end] under the events of scattering (proportional to σ_s(t, x) ≥ 0), absorption (proportional to σ_a(t, x) ≥ 0) and emission (proportional to Q(t, x, Ω) ≥ 0). The equation is supplemented with an initial condition and Dirichlet boundary conditions:

ψ(0, x, Ω) = ψ_{t=0}(x, Ω) for x ∈ X, Ω ∈ S², (2.1b)
ψ(t, x, Ω) = ψ_b(t, x, Ω) for t ∈ T, x ∈ ∂X, n · Ω < 0, (2.1c)

where ψ_{t=0} and ψ_b are given functions and n is the outward unit normal vector in x ∈ ∂X. For simplicity, we will consider the isotropic scattering operator

C(ψ)(Ω) = (1/|S²|) ∫_{S²} ψ(Ω') dΩ' − ψ(Ω), (2.2)

a time-independent isotropic source Q(t, x, Ω) = Q(x) and time-independent scattering σ_s(t, x) = σ_s(x) and absorption σ_a(t, x) = σ_a(x) coefficients.
As a one-dimensional simplification, we will also consider the models in slab geometry, which is obtained by projecting the sphere onto the z-axis [65]. The transport equation under consideration then has the form

∂_t ψ + µ ∂_z ψ + σ_a ψ = σ_s C(ψ) + Q, t ∈ T, z ∈ X, µ ∈ [−1, 1]. (2.4)

The moment approximation
In the following, V will always denote the angular domain, i.e., V = [−1, 1] in slab geometry and V = S² in the three-dimensional case, and Ω will denote the corresponding angular variable. Moreover, we will use angle brackets to denote integration over V, i.e., ⟨f⟩ := ∫_V f(Ω) dΩ for all f ∈ L¹(V).
Due to the high-dimensionality, directly discretizing and solving (2.1) via standard numerical schemes is usually not viable. We will thus consider moment approximations of (2.1). These models transfer the kinetic equation to a coupled system of PDEs for weighted velocity averages (moments) of the solution.
Definition 2.1. The vector of functions b : V → Rⁿ consisting of n linearly-independent basis functions b_l ∈ L¹(V), l ∈ {0, …, n − 1}, is called a moment basis. The moments u_{b,ψ} = (u_0, …, u_{n−1})ᵀ ∈ Rⁿ with respect to the basis b of a given density function ψ ∈ L¹(V) are then defined by

u_{b,ψ} := ⟨bψ⟩, (2.5)

where the integration is performed component-wise. Furthermore, the vector u^iso_b := ⟨b⟩ is called the isotropic moment.
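To make the notation concrete, the following minimal sketch (not taken from the paper's implementation; the basis, test density, quadrature and all names are illustrative) computes the moments u = ⟨bψ⟩ of a given density in slab geometry for the first three Legendre polynomials:

```cpp
// Sketch: moments u = <b psi> on V = [-1, 1] with b = (P_0, P_1, P_2)^T
// (Legendre polynomials) and a composite midpoint quadrature.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  auto b = [](double mu) {
    return std::vector<double>{1.0, mu, 0.5 * (3.0 * mu * mu - 1.0)};
  };
  auto psi = [](double mu) { return std::exp(mu); };  // some positive density

  const int n_quad = 1000;
  const double h = 2.0 / n_quad;
  std::vector<double> u(3, 0.0);
  for (int k = 0; k < n_quad; ++k) {
    const double mu = -1.0 + (k + 0.5) * h;  // midpoint of k-th subinterval
    const auto basis = b(mu);
    for (std::size_t l = 0; l < u.size(); ++l)
      u[l] += basis[l] * psi(mu) * h;
  }
  // u[0] is the local particle density rho = <psi> (here: e - 1/e).
  std::printf("u = (%.6f, %.6f, %.6f)\n", u[0], u[1], u[2]);
  return 0;
}
```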
Definition 2.2. The quantity ρ_ψ := ⟨ψ⟩ ∈ R is called the local particle density of the function ψ. Assume that there exists a vector α¹_b ∈ Rⁿ such that α¹_b · b ≡ 1. Then we have

α¹_b · u_{b,ψ} = ⟨α¹_b · b ψ⟩ = ⟨ψ⟩ = ρ_ψ. (2.6)

Hence, we define the local particle density of the moment vector u ∈ Rⁿ with respect to the basis b as

ρ(u) := α¹_b · u. (2.7)

Remark 2.3. The vector α¹_b exists for all bases regarded in this paper (see Section 2.4).
In the following, if basis b or density function ψ are clear from the context, we will usually omit the corresponding subscripts.
Equations for the moments u can be obtained by multiplying (2.1) with b and integrating over V, yielding

⟨b ∂_t ψ⟩ + ⟨b (Ω · ∇_x ψ)⟩ + σ_a ⟨bψ⟩ = σ_s ⟨b C(ψ)⟩ + ⟨bQ⟩.
Collecting known terms, and interchanging integration and differentiation where possible, the moment system has the form

∂_t u + ⟨b (Ω · ∇_x ψ)⟩ + σ_a u = σ_s ⟨b C(ψ)⟩ + ⟨bQ⟩. (2.8)

For the isotropic collision operator (2.2), the scattering term becomes

⟨b C(ψ)⟩ = (G^iso_b − I) u, (2.9)

where I ∈ R^{n×n} is the unit matrix and

G^iso_b := (1/⟨1⟩) u^iso_b (α¹_b)ᵀ (2.10)

is the matrix mapping the moment vector u to the isotropic moment vector with the same density,

G^iso_b u = (ρ(u)/⟨1⟩) u^iso_b. (2.11)

Consequently, for isotropic scattering, (2.8) simplifies to

∂_t u + ⟨b (Ω · ∇_x ψ)⟩ + σ_t u = σ_s G^iso_b u + ⟨bQ⟩, (2.12)

where σ_t = σ_a + σ_s is the total cross section.
However, even in the isotropic case, the transport term ⟨b (Ω · ∇_x ψ)⟩ usually cannot be given explicitly in terms of u. For a non-isotropic scattering operator, the same applies to the scattering term. Therefore, additional assumptions have to be made to close the unknown terms. A common approach is to replace ψ in (2.12) by a moment-dependent ansatz ψ̂_{u,b}, resulting in a closed system of non-linear equations for u:

∂_t u + ⟨b (Ω · ∇_x ψ̂_{u,b})⟩ + σ_t u = σ_s G^iso_b u + ⟨bQ⟩. (2.14)

Remark 2.4. We will always assume that the ansatz exactly reproduces the moments, i.e., ⟨b ψ̂_{u,b}⟩ = u.
Note that this may not be fulfilled by regularized moment approximations as regarded, e.g., in [2].
It remains to specify the basis functions and the ansatz density ψ̂_{u,b}. In the following, we will often omit the b-dependency of the ansatz function and only write ψ̂_u if the basis is clear from the context.

Minimum-entropy closure
For the minimum entropy closure [21,38,44,45], we choose a strictly convex and twice continuously differentiable entropy density function η : D ⊂ R → R and demand that the ansatz function minimizes the entropy functional

H(ψ) := ⟨η(ψ)⟩ (2.15)

under the moment constraints

⟨bψ⟩ = u, (2.16)

i.e.

ψ̂_u := argmin_ψ { H(ψ) : ⟨bψ⟩ = u }. (2.17)

Here, the minimum is simply taken over all functions ψ = ψ(Ω) such that H(ψ) is well-defined.
This problem, which must be solved over the space-time mesh, is typically solved through its strictly convex finite-dimensional dual

α_{b,η}(u) := argmin_{α ∈ Rⁿ} ( ⟨η*(α · b)⟩ − α · u ), (2.18)

where η* is the Legendre dual of η. The first-order necessary conditions for the multipliers α_{b,η}(u) show that the solution to (2.17), if it exists, has the form

ψ̂_u = η*'(α_{b,η}(u) · b), (2.19)

where η*' is the derivative of η*. Unless stated otherwise, we will use the Maxwell-Boltzmann entropy

η(ψ) = ψ log(ψ) − ψ, (2.20)

for which η* = η*' = η*'' = exp, such that the ansatz is ψ̂_u = exp(α_{b,η}(u) · b) and the dual problem reads

α_{b,η}(u) = argmin_{α ∈ Rⁿ} ( ⟨exp(α · b)⟩ − α · u ). (2.21)
Remark 2.5. In principle, the new scheme described in Section 3 could be used in the same way with other physically relevant entropies, e.g. the Bose-Einstein entropy

η_BE(ψ) = ψ log(ψ) − (ψ + 1) log(ψ + 1).

In this case, the ansatz distribution is given by

ψ̂_u = ( exp(−α_BE · b) − 1 )^{−1}.

To ensure positivity, we thus have to keep the multipliers α_BE in the basis-dependent set

A_b := { α ∈ Rⁿ | α · b(Ω) < 0 for all Ω ∈ V },

i.e., other than in the Maxwell-Boltzmann case, we again have to deal with realizability (compare Section 2.5), also in the transformed scheme. Depending on the basis, this might significantly complicate the implementation.
Using the entropy-based closure, the moment system (2.12) is hyperbolic and dissipates the chosen entropy [58]: the functions

h(u) := ⟨η(ψ̂_u)⟩ and j(u) := ⟨Ω η(ψ̂_u)⟩ (2.22)

(for σ_a = Q = 0) form an entropy-entropy flux pair in the sense of hyperbolic systems, i.e. solutions satisfy the entropy dissipation law

∂_t h(u) + ∇_x · j(u) ≤ 0. (2.23)

Basis functions
We will consider three options for the basis functions b: the full moment basis f_N, the hat function basis h_n and the partial moment basis p_n.

Full moment basis
The full moment basis f_N is the standard choice and consists of polynomials of up to order N, resulting in n = N + 1 and n = (N + 1)² basis functions in one and three dimensions, respectively. We will use Legendre polynomials in slab geometry and real spherical harmonics in the full three-dimensional setting. In one dimension, the isotropic moment is u^iso_{f_N} = ⟨f_N⟩ = (2, 0, 0, …, 0)ᵀ and the multiplier α¹_{f_N} can be chosen as α¹_{f_N} = (1, 0, 0, …, 0)ᵀ. In three dimensions, we have u^iso_{f_N} = (√(4π), 0, 0, …, 0)ᵀ = α¹_{f_N}.

Definition 2.6. The minimum-entropy moment models using the f_N basis will be called M_N models, where N is the maximal polynomial order of the basis functions.

First-order finite-element bases
Models using the full moment basis show optimal (spectral) convergence for smooth problems. For nonsmooth problems, however, instead of increasing the polynomial order N , it might be better to keep N fixed and regard piecewise polynomials on increasingly refined partitions of the domain. We will here restrict ourselves to piecewise linear bases (N = 1) which avoid many of the performance and realizability problems of the classical polynomial models [62,63]. In the following, we will shortly state the definitions of the first-order bases. For a more detailed introduction see [63].
To define the first-order bases, we choose a partition P dividing the velocity domain V into intervals (slab geometry) or spherical triangles (three dimensions). Let n v and n e be the number of nodes (vertices) and elements (intervals or spherical triangles) of this partition, respectively.
The first basis of interest, the hat function basis h_n, consists of n = n_v continuous basis functions h_l which, similar to the linear basis typically used in the continuous finite element method, fulfill the partition of unity property, i.e. ∑_{l=0}^{n−1} h_l ≡ 1, and the Lagrange property, i.e. each basis function evaluates to 1 at one node of the partition and to 0 at all other nodes.
The partial moment basis p n , on the other hand, is defined in analogy to the discontinuous finite element method and consists of the n = 2n e or n = 4n e (in one and three dimensions, respectively) basis functions for the space of piecewise linear functions on P that may be discontinuous between elements of the partition.
In three dimensions, the triangulation P will be obtained by dyadic refinement of the octants of the sphere V = S², i.e. the coarsest triangulation contains the eight spherical triangles obtained by projecting the octahedron with vertices {(±1, 0, 0)ᵀ, (0, ±1, 0)ᵀ, (0, 0, ±1)ᵀ} to the sphere and finer partitions are obtained by iteratively subdividing each spherical triangle into four new ones, adding vertices at the midpoints of the triangle edges. After r refinements, we thus obtain n_v(r) = 4^{r+1} + 2 vertices and n_e(r) = 2 · 4^{r+1} spherical triangles.
To get a three-dimensional equivalent of the continuous hat function basis (2.24), we consider basis functions defined using spherical barycentric coordinates [12,36,51]. On each spherical triangle K ∈ P, all elements of h_n are defined to be zero except for the three basis functions associated with the vertices of K. Denoting these vertices as Ω_1, Ω_2, Ω_3 ∈ S², the values of the corresponding basis functions h_1, h_2, h_3 are defined by requiring that h_l(Ω_m) = δ_lm (Lagrange property) for l, m ∈ {1, 2, 3}, that h_1(Ω) + h_2(Ω) + h_3(Ω) = 1 for every point Ω ∈ K (partition of unity) and that Ω is the Riemannian center of mass with weights h_l(Ω) and nodes Ω_l.
As in one dimension, the resulting basis functions are non-negative. Due to the partition of unity property we again have α¹_{h_n} = (1, …, 1)ᵀ.
Definition 2.7. The minimum-entropy moment models using the h_n and p_n basis will be called HFM_n and PMM_n models, respectively, where n is the number of moments (or equivalently, the number of basis functions, i.e. the length of the vectors of functions h_n and p_n).

Realizability
Since the dual problem (2.18) does not necessarily admit a solution for every moment vector u ∈ Rⁿ, we define the set of moment vectors for which it is solvable. For Maxwell-Boltzmann entropy, it can be shown [33,63] that this set is equal to the positively realizable set

R⁺_b := { u ∈ Rⁿ | ∃ψ ∈ L¹₊(V) such that u = ⟨bψ⟩ } (2.26)

and that the map α_{b,η} : R⁺_b → Rⁿ given by (2.18) is a diffeomorphism with inverse map

u(α) = ⟨b η*'(α · b)⟩. (2.27)

Vectors u ∈ R⁺_b will be called realizable. Similar to the ansatz function, we will often omit one or all of the subscripts of u and α if the corresponding dependencies are clear from the context. Motivated by the mapping (2.27), in the following, we will also refer to the multipliers α as the entropy variables (or transformed variables) and to the realizable moments u as standard or original variables.
The realizable set is a convex cone that, depending on the choice of basis b, may have a complicated structure. For example, a moment vector u is realizable with respect to the full-moment basis f_N if some (u-dependent) Hankel matrices are positive definite [16]. This criterion is hard to test in practice, especially for large polynomial order N. As a consequence, given a moment vector u ∈ Rⁿ, it can be very difficult to check whether u is realizable and even more difficult to compute a projection to the realizable set. This is a major problem for numerical solvers which have to ensure that the approximate solutions stay realizable during the whole solution process since otherwise the minimum entropy optimization problems are ill-posed.
In contrast, the realizability conditions for the piecewise linear bases are quite simple [63]. In particular, a moment vector is realizable with respect to h_n if and only if all its entries are positive [58,63]:

R⁺_{h_n} = { u ∈ Rⁿ | u_l > 0 for all l ∈ {0, …, n − 1} }.

In this case, distinguishing realizable from non-realizable vectors is easy. Still, as we will see in the next section, also for the piecewise linear bases we have to take some extra measures (in particular, restrict the time step size) to ensure that the numerical solutions are always realizable. Moreover, for higher-order numerical schemes or reduced-order models, maintaining realizability at all times (without introducing large errors) is still challenging, also for the hat function models.
Remark 2.8. Realizability is further complicated by the fact that we usually cannot solve the velocity integrals analytically and have to approximate them by a numerical quadrature Q. For the full moments, this can have a severe impact on the realizable set [1,4], i.e. the numerically realizable set R^Q_b obtained by replacing the integral in (2.26) by its quadrature approximation significantly differs from the realizable set R⁺_b. In the following, for notational simplicity, we will neglect the quadrature-related complications and assume that the integrals are evaluated exactly.

Standard finite volume discretization
We consider two discretization schemes for the moment equations (2.12), a standard finite volume scheme presented in this section and a new scheme based on the identification (2.27) between the realizable set and Rⁿ (see Section 3). For simplicity, we will restrict ourselves to first-order schemes.
The reference scheme is a standard first-order finite volume scheme. Let G = { T_i | i ∈ I_G = {0, …, n_x − 1} } be a numerical grid for the spatial domain X with n_x elements such that X = ∪_{i∈I_G} T_i. Integrating (2.12) over a grid cell T_i and dividing by |T_i| gives an equation for the cell means u_i(t) := (1/|T_i|) ∫_{T_i} u(t, x) dx. Using the midpoint rule to approximate the source term,

(1/|T_i|) ∫_{T_i} ( σ_s G^iso_b u − σ_t u + ⟨bQ⟩ ) dx ≈ s(x_i, u_i) := σ_s(x_i) G^iso_b u_i − σ_t(x_i) u_i + ⟨bQ⟩(x_i),

where x_i is the centre of grid cell T_i, we arrive at

d/dt u_i + (1/|T_i|) ∫_{T_i} ∇_x · f(u) dx = s(x_i, u_i), (2.28)

where the flux matrix is given as f(u) := ⟨b Ωᵀ ψ̂_u⟩. By applying the divergence theorem, we obtain

d/dt u_i + (1/|T_i|) ∑_{j∈N(i)} ∫_{S_ij} ⟨(Ω · n_ij) b ψ̂_u⟩ ds = s(x_i, u_i),

where N(i) is the set of all indices of neighbors of T_i, S_ij = ∂T_i ∩ ∂T_j is the interface between grid cells T_i and T_j and n_ij is the unit outer normal of T_i on S_ij. Replacing the flux term by a numerical flux g_ij on S_ij, we get the semidiscrete form

d/dt u_i = −(1/|T_i|) ∑_{j∈N(i)} |S_ij| g_ij(u_i, u_j) + s(x_i, u_i). (2.29)

In principle, we could use any numerical flux for hyperbolic equations, e.g. the Lax-Friedrichs flux. We will, however, use a numerical flux which is specifically designed for the equations under consideration. Define the two half integrals

⟨f⟩_{+,n} := ∫_{Ω · n > 0} f(Ω) dΩ and ⟨f⟩_{−,n} := ∫_{Ω · n < 0} f(Ω) dΩ.

In the following, we will omit the normal vector if it is clear from the context and write, e.g., ⟨·⟩_+ instead of ⟨·⟩_{+,n} in these cases. The kinetic flux is defined as [20,23,29,61]

g_ij(u_i, u_j) := ⟨(Ω · n_ij) b ψ̂_{u_i}⟩_+ + ⟨(Ω · n_ij) b ψ̂_{u_j}⟩_−. (2.30)

Using the kinetic flux, the semidiscrete form (2.29) becomes

d/dt u_i = −(1/|T_i|) ∑_{j∈N(i)} |S_ij| ( ⟨(Ω · n_ij) b ψ̂_{u_i}⟩_+ + ⟨(Ω · n_ij) b ψ̂_{u_j}⟩_− ) + s(x_i, u_i). (2.31)

We will then use an explicit one-step scheme for the time discretization. For example, an explicit Euler discretization gives the fully discrete form

u_i^{κ+1} = u_i^κ + ∆t u_i^{↑,κ}, (2.32)

where u_i^κ is the approximation of u_i at time step κ and we defined the update term

u_i^{↑,κ} := −(1/|T_i|) ∑_{j∈N(i)} |S_ij| g_ij(u_i^κ, u_j^κ) + s(x_i, u_i^κ). (2.33)

The scheme (2.32) requires the solution of the minimization problem (2.21) in every time step on each grid cell. The initial values thus have to be realizable and we have to limit the time step ∆t to ensure that the scheme yields realizable moments.
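For illustration, the following is a hedged sketch of how the kinetic flux (2.30) could be evaluated in slab geometry for the Maxwell-Boltzmann ansatz with b = (1, µ)ᵀ; the quadrature and all names are assumptions for this sketch, not the implementation described in Section 4:

```cpp
// Sketch: kinetic upwind flux g_ij(u_i, u_j) in slab geometry, evaluated in
// terms of the multipliers, psi_hat = exp(alpha . b) with b = (1, mu)^T.
#include <array>
#include <cmath>
#include <cstdio>

using Vec2 = std::array<double, 2>;

double ansatz(const Vec2& alpha, double mu) {
  return std::exp(alpha[0] + alpha[1] * mu);
}

// g_ij = <(mu n) b psi_i>_+ + <(mu n) b psi_j>_-  (half-range integrals)
Vec2 kinetic_flux(const Vec2& alpha_i, const Vec2& alpha_j, double n) {
  const int n_quad = 400;
  const double h = 2.0 / n_quad;
  Vec2 g{0.0, 0.0};
  for (int k = 0; k < n_quad; ++k) {
    const double mu = -1.0 + (k + 0.5) * h;
    const double speed = mu * n;
    // upwinding: outflow uses the ansatz in cell i, inflow the one in cell j
    const double psi = speed > 0 ? ansatz(alpha_i, mu) : ansatz(alpha_j, mu);
    g[0] += speed * psi * h;       // basis function b_0 = 1
    g[1] += speed * mu * psi * h;  // basis function b_1 = mu
  }
  return g;
}

int main() {
  const Vec2 g = kinetic_flux({0.0, 0.5}, {-2.0, 0.0}, /*n=*/1.0);
  std::printf("g = (%.6f, %.6f)\n", g[0], g[1]);
  return 0;
}
```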
Theorem 2.9. The numerical scheme (2.32) using a structured cubic grid with equally-sized grid cells with edge length ∆x is realizability-preserving under the CFL-like condition

∆t < 1 / ( d/∆x + σ_t,max ), (2.34)

where d is the spatial dimension and σ_t,max := max_i σ_t(x_i).

Proof. We will generalize the proof of [58, Corollary 3.17] to several dimensions. Let u_i^κ be realizable. By (2.31), (2.14) and the definition of G^iso_b (compare (2.9)), we have u_i^{κ+1} = ⟨b ψ_i^{κ+1}⟩ with

ψ_i^{κ+1} := ψ̂_{u_i^κ} − ∆t [ (1/|T_i|) ∑_{j∈N(i)} |S_ij| ( (Ω · n_ij)_+ ψ̂_{u_i^κ} + (Ω · n_ij)_− ψ̂_{u_j^κ} ) + σ_t ψ̂_{u_i^κ} − σ_s ρ(u_i^κ)/⟨1⟩ − Q ], (2.35)

where (Ω · n_ij)_+ = max(Ω · n_ij, 0) and (Ω · n_ij)_− = min(Ω · n_ij, 0). We have to show that ψ_i^{κ+1} is positive for all Ω ∈ V under the time step restriction (2.34). Neglecting non-negative terms (remember that σ_s, σ_t and Q are assumed to be non-negative, that −(Ω · n_ij)_− ψ̂_{u_j^κ} ≥ 0 and that the ansatz densities are positive), we arrive at

ψ_i^{κ+1} ≥ ψ̂_{u_i^κ} ( 1 − ∆t σ_t − (∆t/∆x) ∑_{j∈N(i)} (Ω · n_ij)_+ ),

where we used |S_ij|/|T_i| = 1/∆x for cubic cells. The outer normals of a cubic cell come in pairs ±e_k, k = 1, …, d, so that ∑_{j∈N(i)} (Ω · n_ij)_+ = ∑_{k=1}^d |Ω_k| ≤ d, where we used that ‖Ω‖_2 ≤ 1 for all Ω ∈ V holds true both for V = [−1, 1] and for V = S². Consequently, (2.35) becomes

ψ_i^{κ+1} ≥ ψ̂_{u_i^κ} ( 1 − ∆t ( σ_t + d/∆x ) ),

which is positive if (2.34) holds.
The time step restriction due to the cross-section σ_t can be avoided, e.g. by using implicit-explicit methods where the source term is treated implicitly [55-57,62]. As in [62], we will use a second-order Strang splitting scheme for the split system

∂_t u + ⟨b (Ω · ∇_x ψ̂_u)⟩ = 0, (2.36a)
∂_t u = σ_s (G^iso_b − I) u − σ_a u + ⟨bQ⟩, (2.36b)

i.e., in each time step from t to t + ∆t we first solve the (linear) source system (2.36b) analytically (see Section 4) up to the time t + ∆t/2, then we use the result as input to solve the hyperbolic part (2.36a) with a full time step ∆t, then we again advance (2.36b) analytically for a half time step ∆t/2. As the source system is solved analytically (and thus preserves realizability), we only have to ensure that realizability is preserved when solving the hyperbolic part. We can thus avoid the time step restriction due to the cross-section σ_t.

Corollary 2.10. The splitting scheme based on (2.36) with explicit Euler time discretization of the hyperbolic part (2.36a) is realizability-preserving under the condition ∆t < ∆x/d.

Note that we assumed in the proof of Theorem 2.9 that the optimization problems are solved exactly (by using the exact ansatz functions ψ̂_{u_i^κ}). In practice, we have to solve these problems numerically, which inevitably introduces some numerical errors. Fortunately, we can account for these inexact solutions by using a slightly tighter time step restriction as long as we can control the relative error in the ansatz function (compare [3,4,62]).

Corollary 2.11. Let ε_γ ∈ (0, 1). If the numerical solution α(u) of the minimum-entropy problem (2.18) fulfils the error bound (2.38) for all u for which the problem has to be solved during the finite volume scheme, then Corollary 2.10 still holds if the time step restrictions are tightened to

∆t < (1 − ε_γ) ∆x/d. (2.37)

We can ensure that (2.38) holds by appropriately choosing the stopping criterion for the Newton scheme that is used to solve the minimum entropy optimization problems (see Section 4). We will always use ε_γ = 0.1, which means that we scale the time step restriction of Corollary 2.10 by a safety factor of 0.9.
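Schematically, one time step of the splitting scheme (2.36) then consists of three sub-steps; the following sketch is purely illustrative (solve_source and hyperbolic_step are placeholders for the routines described in Sections 4.1.1 and 4.1.2, not actual names from our code):

```cpp
// Sketch: one Strang splitting step (2.36) -- half source step, full
// hyperbolic step, half source step.
#include <functional>
#include <vector>

using State = std::vector<double>;  // all moment vectors, concatenated

void strang_step(State& u, double dt,
                 const std::function<void(State&, double)>& solve_source,
                 const std::function<void(State&, double)>& hyperbolic_step) {
  solve_source(u, 0.5 * dt);   // analytic, realizability-preserving
  hyperbolic_step(u, dt);      // explicit step under the CFL condition (2.37)
  solve_source(u, 0.5 * dt);
}
```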
If we advance the hyperbolic system (2.36a) in time by strong-stability preserving Runge-Kutta schemes [25], the CFL condition (2.37) still holds, as these schemes consist of convex combinations of forward Euler steps. In particular, we will use Heun's method, which is of second order, in all tests.

New scheme in transformed variables
We will now present the new scheme in transformed variables which uses the identification between the realizable set R⁺_b and Rⁿ given by the diffeomorphism (2.27).

Semidiscrete formulation
To derive the transformed scheme, note that the Jacobian of the map (2.27),

H(α) := ∂u(α)/∂α = ⟨b bᵀ η*''(α · b)⟩, (3.1)

is the (positive definite) Hessian of the objective function in the dual problem (2.18) (compare Section 4.1.2). Further, for the flux we have

∂⟨Ω_k b ψ̂_{u(α)}⟩/∂α = ⟨Ω_k b bᵀ η*''(α · b)⟩ =: B_k(α), k = 1, …, d. (3.2)

In transformed variables, assuming u is sufficiently smooth, the hyperbolic system of equations (2.12) thus becomes [38]

H(α) ∂_t α + ∑_{k=1}^d B_k(α) ∂_{x_k} α + σ_t u(α) = σ_s G^iso_b u(α) + ⟨bQ⟩. (3.3)

On the other hand, if we perform the space discretization first and then transform the semi-discrete equation (2.29) to the new variables, we arrive at

d/dt α_i = H(α_i)^{−1} u_i^↑, (3.5)

where α_i = α(u_i) are the multipliers corresponding to the finite volume averaged moments u_i via the inverse diffeomorphism (2.18) and u_i^↑ denotes the update term (2.33) evaluated at the moments u_j = u(α_j), j ∈ {i} ∪ N(i).
Since the space discretization is inherited from the conservative form (2.29), we now only have to discretize in time and can expect that the moments of the solution converge to the corresponding solution of the non-transformed scheme. In the following, we will show that this is indeed the case (see Section 3.4).
Remark 3.1. A similar idea has been used in [52] to efficiently solve a semi-implicit version of the standard finite volume scheme (2.32) with Lax-Friedrichs flux. In fact, the scheme presented in the following is equivalent to using the approach in [52] with explicit time discretization and only performing a single step of the Newton iteration. The authors in [52] restrict their investigation to slab geometry and note that further research is needed to examine the efficiency of their scheme in higher dimensions where the Jacobians of the coupled system are not block-tridiagonal anymore. In contrast, using the fully explicit approach considered here the grid cells decouple and the equations can be solved independently for each grid cell, also in several dimensions.

Entropy stability on the semi-discrete level
On the semidiscrete level, the new scheme is just a variable transformation of the standard scheme. We can thus show that the solutions of (3.5) also inherit a semidiscrete version of the entropy-dissipation property (2.23).
Proposition 3.2. Let G be a structured hyperrectangular grid and assume that the boundary fluxes vanish (or cancel periodically). Then solutions of (3.5) satisfy the semidiscrete entropy inequality

d/dt ∑_{i∈I_G} |T_i| ⟨η(ψ̂_{u(α_i)})⟩ ≤ ∑_{i∈I_G} |T_i| α_i · ( −σ_a u(α_i) + ⟨bQ⟩ ). (3.6)

Proof. The negativity of the entropy contribution by the scattering term σ_s (G^iso_b − I) u can be shown exactly as in the continuous case, see, e.g., [58]. The remaining part −σ_a u + ⟨bQ⟩ of the source term directly gives the right-hand side of (3.6) after multiplication by α_i (as done below). In the following, for notational simplicity, we will thus only regard the flux term. To that end, first note that

η(η*'(p)) = p η*'(p) − η*(p), (3.7)

where we used that η' ∘ η*' is the identity by definition of the Legendre transform (see, e.g., [50]). Further, the Legendre transform fulfills the relation

η*(p) − η*(q) ≥ η*'(q) (p − q), (3.8)

since η* is convex. Multiplying the semi-discrete equation (3.5) by α_i, we thus obtain (using the kinetic flux (2.30) and neglecting the source term)

|T_i| d/dt ⟨η(ψ̂_{u(α_i)})⟩ = |T_i| α_i · d/dt u(α_i)
= −∑_{j∈N(i)} |S_ij| ( ⟨(Ω · n_ij) (α_i · b) η*'(α_i · b)⟩_+ + ⟨(Ω · n_ij) (α_i · b) η*'(α_j · b)⟩_− )
≤ −∑_{j∈N(i)} |S_ij| ( ⟨(Ω · n_ij) ( η(ψ̂_{u(α_i)}) + η*(α_i · b) )⟩_+ + ⟨(Ω · n_ij) ( η(ψ̂_{u(α_j)}) + η*(α_i · b) )⟩_− ),

where we used (3.7), and for the estimate that the integral ⟨·⟩_− is defined such that Ω · n_ij is negative and that (3.8) yields (α_i · b) η*'(α_j · b) ≤ η(ψ̂_{u(α_j)}) + η*(α_i · b). Summing over all grid cells, the upwind entropy-flux terms pair up and cancel over each interior interface. Moreover, for a hyperrectangular grid, for each interface of grid cell T_i with outer normal n_ij the opposite interface has the same area and outer normal −n_ij. Thus, the ⟨(Ω · n_ij) η*(α_i · b)⟩ terms cancel out, which finally gives (3.6).
Remark 3.3. The entropy density η is strictly convex and thus attains its minimum at the density ψ̄ with η'(ψ̄) = 0. Consequently, for ψ = ψ̂_{u(α)}, we have η'(ψ̂_{u(α)}) = α · b, i.e. the entropy density is increasing in the particle density wherever α · b > 0 and decreasing wherever α · b < 0. This explains why absorption decreases the entropy for α · b > 0 and increases the entropy for α · b < 0 (see the right-hand side of (3.6)). Similarly, a source of particles Q(Ω) with velocity Ω increases the entropy if α · b(Ω) > 0 and decreases it otherwise.
Remark 3.4. Since the two schemes are related by the transformation (2.27) on the semidiscrete level, the entropy dissipation law (3.6) also holds for the standard unsplit finite volume scheme if we replace u(α i ) by u i and α i by α(u i ).

Time discretization
To get a fully discrete numerical scheme, we still have to choose a time discretization for (3.5). To avoid having to solve a large coupled non-linear system of equations in each time step, we will only consider explicit schemes. Using, for example, the explicit Euler scheme and the kinetic flux (2.30), the fully discrete form of (3.5) becomes

α_i^{κ+1} = α_i^κ + ∆t H(α_i^κ)^{−1} u_i^{↑,κ}, (3.9)

where the update term u_i^{↑,κ} has been defined in (2.33) (evaluated at the moments u_j^κ = u(α_j^κ)). However, a time discretization using fixed time steps might not be a suitable choice for the transformed scheme.
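For illustration, the following sketch performs the update (3.9) in a single cell for the M_1 model in slab geometry, assembling H(α) = ⟨b bᵀ exp(α · b)⟩ by a simple midpoint quadrature and solving the linear system with a hand-rolled 2×2 Cholesky factorization; all names, the quadrature and the input values are illustrative, not the actual implementation (compare Sections 4.1.3 and 4.2):

```cpp
// Sketch: one explicit Euler step (3.9) of the transformed scheme in a
// single cell, b = (1, mu)^T, Maxwell-Boltzmann entropy (eta*'' = exp).
#include <array>
#include <cmath>
#include <cstdio>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

Mat2 hessian(const Vec2& alpha) {  // H(alpha) = <b b^T exp(alpha . b)>
  const int n_quad = 400;
  const double h = 2.0 / n_quad;
  Mat2 H{{{0.0, 0.0}, {0.0, 0.0}}};
  for (int k = 0; k < n_quad; ++k) {
    const double mu = -1.0 + (k + 0.5) * h;
    const double w = std::exp(alpha[0] + alpha[1] * mu) * h;
    const double b[2] = {1.0, mu};
    for (int l = 0; l < 2; ++l)
      for (int m = 0; m < 2; ++m)
        H[l][m] += b[l] * b[m] * w;
  }
  return H;
}

Vec2 solve_spd_2x2(const Mat2& H, const Vec2& rhs) {  // Cholesky, 2x2 case
  const double l00 = std::sqrt(H[0][0]);
  const double l10 = H[1][0] / l00;
  const double l11 = std::sqrt(H[1][1] - l10 * l10);
  const double y0 = rhs[0] / l00, y1 = (rhs[1] - l10 * y0) / l11;
  const double x1 = y1 / l11, x0 = (y0 - l10 * x1) / l00;
  return {x0, x1};
}

int main() {
  Vec2 alpha{0.0, 0.1};         // current multipliers in the cell
  const Vec2 u_up{-0.3, 0.05};  // update term (2.33), assumed given
  const double dt = 1e-3;
  const Vec2 dalpha = solve_spd_2x2(hessian(alpha), u_up);
  for (int l = 0; l < 2; ++l) alpha[l] += dt * dalpha[l];
  std::printf("alpha = (%.6f, %.6f)\n", alpha[0], alpha[1]);
  return 0;
}
```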
Example 3.5. Consider exemplarily the plane-source test (compare Section 5) using an equidistant grid with n_x = 240 elements for the domain X = [−1.2, 1.2]. We thus have ∆x = 2.4/240 = 10⁻². The initial value in this test is a small isotropic vacuum density ψ_vac = 5 · 10⁻⁷ plus a Dirac delta at x = 0 which is split into the grid cells T_left = T_119 and T_right = T_120 adjacent to 0. The initial values for the finite volume scheme are thus isotropic with local particle densities ρ_i := ρ(u_i⁰) given as

ρ_119 = ρ_120 = ⟨ψ_vac + 1/(2∆x)⟩ ≈ ⟨50⟩ = 100 and ρ_j = ⟨5 · 10⁻⁷⟩ = 10⁻⁶ for j ∉ {119, 120}.

The multipliers corresponding to the isotropic moment with local particle density ρ are

α_iso(ρ) = log(ρ/⟨1⟩) α¹_b,

with corresponding ansatz function ψ̂ = η*'(α_iso(ρ) · b) = ρ/⟨1⟩, where we used again that η' and η*' are inverse functions. If we now focus on cell T_121 and ignore the source term s(x_121, u(α_121)), the update (3.9) takes the form

α_121^1 = α_121^0 + (∆t/∆x) H(α_121^0)^{−1} ( ⟨µ b ψ̂_{α_120^0}⟩_+ − ⟨µ b ψ̂_{α_121^0}⟩_+ + ⟨µ b ψ̂_{α_121^0}⟩_− − ⟨µ b ψ̂_{α_122^0}⟩_− ) ≈ α_121^0 + (∆t/(∆x ⟨1⟩)) ρ_120 H(α_121^0)^{−1} ⟨µ b⟩_+,

where the approximation in the last step is based on the observation that ρ_121 = ρ_122 = 10⁻⁶ are much smaller than ρ_120 = 100 (see above). Inserting the definition (4.3) of the Hessian matrix, which for isotropic multipliers reduces to H(α_121^0) = (ρ_121/⟨1⟩) ⟨b bᵀ⟩ = (ρ_121/⟨1⟩) M, and the values of ∆x, ρ_120 and ρ_121, we finally obtain

α_121^1 ≈ α_121^0 + ∆t (ρ_120/(∆x ρ_121)) M^{−1} ⟨µ b⟩_+ = α_121^0 + 10¹⁰ ∆t M^{−1} ⟨µ b⟩_+.

For the full-moment M_N models using Legendre polynomials as a basis, the mass matrix M is the unit matrix and the basis functions b are approximately of unit order. We thus see that the update term is of the order of 10¹⁰, which is why a very small time step ∆t has to be chosen initially to limit the time stepping error.
The example shows that the transformed scheme is expected to require very small time steps in some instances, e.g. whenever there are large differences in the particle density between adjacent cells (which is initially the case for all our numerical tests, see Section 5). On the other hand, the time step does not have to be restricted to ensure realizability which may allow for time steps that are even larger than those used in the standard scheme in some situations. A time stepping scheme using a fixed time step ∆t would thus be very inefficient for the new scheme.
Instead, we will use the Runge-Kutta method by Bogacki and Shampine [8] which adaptively chooses the time step according to an embedded error estimate (see Section 4.2.1 for details). This way, we can use large time steps where possible without introducing uncontrollable errors in time regions where a small time step is required. If we define

α_i^↑(β) := H(β_i)^{−1} u_i^↑(β), (3.11)

where u_i^↑(β) is the update term (2.33) evaluated at the moments u(β_j), the Runge-Kutta update takes the form

α_i^{κ+1} = α_i^κ + ∆t ∑_{p=0}^{s−1} b_p α_i^↑(β^p), (3.12a)

with stages β^p given by

β_i^p = α_i^κ + ∆t ∑_{q=0}^{p−1} a_pq α_i^↑(β^q). (3.12b)

Here, s is the number of stages in the Runge-Kutta scheme, a_pq, b_p are the Runge-Kutta coefficients and we neglected the varying time step for notational simplicity.
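The following self-contained sketch shows an embedded Bogacki-Shampine step with a simple error-based step size controller for a generic ODE y' = f(t, y); the controller constants (safety factor 0.9, minimal shrink factor 0.2) and the tolerance scaling are illustrative choices and not necessarily those of our implementation (see Section 4.2.1):

```cpp
// Sketch: one adaptive Bogacki-Shampine (order 3(2)) step for y' = f(t, y).
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Rhs = std::function<Vec(double, const Vec&)>;

Vec axpy(const Vec& y, double a, const Vec& x) {  // returns y + a*x
  Vec r(y);
  for (std::size_t i = 0; i < r.size(); ++i) r[i] += a * x[i];
  return r;
}

// Advances (t, y) by one accepted step; returns the step size that was used.
double bs23_step(const Rhs& f, double& t, Vec& y, double dt, double tol) {
  while (true) {
    const Vec k1 = f(t, y);
    const Vec k2 = f(t + 0.5 * dt, axpy(y, 0.5 * dt, k1));
    const Vec k3 = f(t + 0.75 * dt, axpy(y, 0.75 * dt, k2));
    Vec y3(y);  // third-order solution, b = (2/9, 1/3, 4/9)
    for (std::size_t i = 0; i < y.size(); ++i)
      y3[i] += dt * (2.0 / 9.0 * k1[i] + 1.0 / 3.0 * k2[i] + 4.0 / 9.0 * k3[i]);
    const Vec k4 = f(t + dt, y3);
    double err = 0.0;  // compare with embedded 2nd-order solution
    for (std::size_t i = 0; i < y.size(); ++i) {
      const double y2_i = y[i] + dt * (7.0 / 24.0 * k1[i] + 0.25 * k2[i] +
                                       1.0 / 3.0 * k3[i] + 0.125 * k4[i]);
      const double scale = tol * (1.0 + std::fabs(y3[i]));  // mixed tolerance
      err = std::max(err, std::fabs(y3[i] - y2_i) / scale);
    }
    if (err <= 1.0) { t += dt; y = y3; return dt; }  // accept
    dt *= std::max(0.2, 0.9 * std::cbrt(1.0 / err));  // shrink and retry
  }
}
```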

Convergence properties
The new scheme calculates approximate solutions in transformed (α) variables. However, we are usually interested in the solution in original (u) variables. We can easily obtain such a solution by applying the diffeomorphism (2.27) to the transformed solution. However, a priori, it might be possible that this transformation amplifies the discretization errors and destroys the accuracy of the scheme. Luckily, we can show that the order of convergence of the time stepping scheme is preserved by the transformation.
Theorem 3.6. Let the approximations α_i^κ converge to the exact solution α_i(t^κ) of (3.5) with order r, i.e. ‖α_i^κ − α_i(t^κ)‖ = O((∆t)^r). If the spectral norm ‖H‖ of the Hessian is bounded (along the line segments between α_i^κ and α_i(t^κ)), then the corresponding moments converge with the same order, i.e.

‖u(α_i^κ) − u(α_i(t^κ))‖ = O((∆t)^r).

Proof. Using a zeroth order Taylor approximation with Lagrange form of the remainder, we have

u(α_i^κ) − u(α_i(t^κ)) = H(ξ) (α_i^κ − α_i(t^κ))

for some ξ on the line segment between α_i^κ and α_i(t^κ). Due to the boundedness of H, we further have

‖u(α_i^κ) − u(α_i(t^κ))‖ ≤ ‖H(ξ)‖ ‖α_i^κ − α_i(t^κ)‖ = O((∆t)^r).

For arbitrary α, we cannot expect the upper bound on H required by Theorem 3.6 to hold. However, note that

H_{lm}(α) = ⟨b_l b_m η*''(α · b)⟩, (3.13)

so that for bounded basis functions the entries of H are controlled by ⟨η*''(α · b)⟩. For Maxwell-Boltzmann entropy, we have η*'' = η*' and thus

⟨η*''(α · b)⟩ = ⟨η*'(α · b)⟩ = ⟨ψ̂_{u(α)}⟩ = ρ(u(α))

is the local particle density corresponding to the multipliers α. Due to the conservation properties, the local particle density remains bounded for the solutions of the moment equations. As a consequence, H(α) should be bounded for α close enough to a solution of (3.5).
Once we have an r-th order time discretization for (3.5), we can thus expect the corresponding moments to converge with the same order in practice, at least for time steps ∆t small enough. However, the order of convergence of Runge-Kutta schemes is usually shown under the assumption of Lipschitz continuity of the ordinary differential equation's right-hand side (see, e.g., [28, Theorem II.3.4]). In our case, we have to show Lipschitz continuity of the functions α_i^↑. Under some additional assumptions, we indeed obtain Lipschitz continuity for multipliers α in a domain A.

Corollary 3.8. Let A ⊂ Rⁿ be a set of multipliers such that there exist constants 0 < c_H ≤ C_H and 0 < c_ρ ≤ C_ρ with

c_H ≤ ‖H(α)‖ ≤ C_H for all α ∈ A, (3.14a)
c_ρ ≤ ρ(u(α)) ≤ C_ρ for all α ∈ A, (3.14b)

where the lower bound in (3.14a) is understood for the smallest eigenvalue of H(α). Then the right-hand side functions α_i^↑ of (3.5) are Lipschitz continuous on A.
The proof is technical and can be found in Appendix A.

Regularization
As discussed above, the bound (3.14b) on the local particle density ρ can be expected to hold in practice. Unfortunately, this is not true for the lower bound on ‖H‖. Considering again (3.13), we see that we cannot expect the lower bound to hold as long as η*'' is not bounded from below. In particular, for Maxwell-Boltzmann entropy, (3.14) would require a lower bound on the ansatz density ψ̂_{u(α_i)}. Even if such a lower bound exists for the exact solution, it might be extremely small. Factoring in numerical errors, it is clear that in practice this bound will not always hold. In addition, H may be very badly conditioned, up to the point that it might be numerically singular.
For the standard scheme, where H shows up as the Hessian of the objective function in the minimum entropy optimization problem (2.18), we thus use several regularization techniques to ensure that the Newton scheme always converges (see Section 4.1.5). While most of these techniques are not easily carried over to the transformed scheme, we here investigate two approaches to regularize the new scheme.

Isotropic regularization of the Hessian
As a first approach, we regularize the Hessian matrix by adding a small multiple of the mass matrix M = ⟨b bᵀ⟩. Exemplarily, for the explicit Euler scheme, the update formula (3.9) then becomes

α_i^{κ+1} = α_i^κ + ∆t ( H(α_i^κ) + ε M )^{−1} u_i^{↑,κ}, ε > 0. (3.15)

This corresponds to adding a small isotropic particle density ε to the derivative of the ansatz function in the Hessian and ensures that the lower bound in (3.14a) holds for the regularized matrix.
To analyze the introduced error, note that without regularization the update in the Euler scheme is

α_i(t^{κ+1}) = α_i(t^κ) + ∆t H(α_i^κ)^{−1} u_i^{↑,κ} + O((∆t)²)

with remainder of order O((∆t)²). Using the regularization (3.15), the local truncation error acquires the additional term

∆t ε ( H(α_i^κ) + ε M )^{−1} M H(α_i^κ)^{−1} u_i^{↑,κ}.

Remember that the terms in u_i^{↑,κ}, except for the constant source term ⟨bQ⟩, all scale with either ψ̂_{u(α_i^κ)} or ψ̂_{u(α_j^κ)}, j ∈ N(i). Thus, if α_i^κ corresponds to an ansatz density ψ̂_{u(α_i^κ)} which is significantly greater than ε, the additional error term does not negatively impact the rate of convergence as long as we choose ε as O(∆t). Similarly, if ψ̂_{u(α_i^κ)}, ψ̂_{u(α_j^κ)} and Q are small (more precisely, O(ε)), the additional error is of order O(∆t ε). However, if ψ̂_{u(α_i^κ)} is small, but at least one of ψ̂_{u(α_j^κ)} and Q is not, we obtain an additional error of O(∆t). In this case, i.e. if the particle density in grid cell T_i is small and there is a strong source or an influx from a neighboring cell T_j, the regularization (3.15) might negatively affect the solution. For higher-order time stepping schemes, the error analysis gets much more involved, but we expect similar results, i.e. significant regularization errors only in regions with large differences in particle densities between adjacent cells.
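The additional term can be obtained from the resolvent identity; a short sketch of the computation behind the estimates above:

```latex
% Difference between the regularized update (3.15) and the unregularized
% explicit Euler update (3.9); H = H(\alpha_i^\kappa).
\begin{aligned}
(H + \varepsilon M)^{-1}
  &= H^{-1} - (H + \varepsilon M)^{-1}\,\varepsilon M\,H^{-1}, \\
\alpha_{i,\mathrm{reg}}^{\kappa+1} - \alpha_i^{\kappa+1}
  &= -\Delta t\,\varepsilon\,(H + \varepsilon M)^{-1} M H^{-1}\,
     u_i^{\uparrow,\kappa}.
\end{aligned}
```

Since εM ≼ H + εM for symmetric positive definite H, the factor (H + εM)^{−1} εM has norm at most one (in the spectral sense), so the additional term is essentially bounded by ∆t ‖H^{−1} u_i^{↑,κ}‖, which reproduces the three cases discussed above.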

Direct constraints for the entropy variables
Instead of modifying the Hessian, we might try to directly enforce the boundedness of the entropy variables required in Corollary 3.8. For example, we could replace all multipliers α with ‖α‖ > α_max for some α_max ∈ R_{>0} by the closest multiplier α_reg with ‖α_reg‖ ≤ α_max.
We tested this approach in several numerical tests and found that, for general bases b, the regularization either introduced large errors into the solution (for α max small) or did not improve the numerical stability (for large α max ).
A notable exception is the hat function basis h_n. Since the hat functions are non-negative, and remembering that the ansatz density has the form exp(α · h_n) (for Maxwell-Boltzmann entropy), we see that large positive entries of α correspond to large ansatz densities and large (in absolute value) negative entries correspond to small ansatz densities. Thus, if we enforce a lower bound on the entries of α by replacing all entries α_l < α_min by α_min for some negative α_min ∈ R, we would expect in general that the introduced error is small, since we are only replacing entries corresponding to very small ansatz densities by entries corresponding to slightly larger but still very small densities.
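The resulting regularization is a simple componentwise clamping of the multipliers (a sketch; the example value for α_min is an illustrative choice, not one prescribed by our implementation):

```cpp
// Sketch: direct multiplier constraint for the hat function basis. Entries
// below alpha_min correspond to very small ansatz densities, since
// psi_hat = exp(alpha . h_n) and the hat functions are non-negative.
#include <algorithm>
#include <vector>

void clamp_multipliers(std::vector<double>& alpha, double alpha_min) {
  for (double& a : alpha) a = std::max(a, alpha_min);  // e.g. alpha_min = -700
}
```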
Indeed, in our numerical tests, the error is very small (see Section 5.3.2). Note, however, that there might be cases where also this regularization technique leads to significant errors. Consider, e.g., a highly anisotropic particle distribution in one dimension where the density is very low at one boundary of an interval in the velocity partition and very high at the other boundary. Here, replacing the entry corresponding to the very low density might significantly alter the ansatz density in that interval.
For the other bases (p_n and f_N), a similar regularization technique which replaces only multipliers corresponding to small ansatz densities is not straightforward, since for these bases there is no clear correspondence between the sign of α entries and the magnitude of the ansatz density.

Entropy stability on the fully discrete level
In general, we cannot expect that an explicit time discretization preserves the entropy-stability of the semidiscrete scheme. However, we can enforce this property by using a relaxation of the standard Runge-Kutta scheme [48]. To that end, we collect the multipliers for each grid cell in a single vector, i.e. we define

α̂ := (α_0ᵀ, …, α_{n_x−1}ᵀ)ᵀ ∈ R^{n n_x}.
Further, we define the total entropy as

Ĥ(α̂) := ∑_{i∈I_G} |T_i| ⟨η(ψ̂_{u(α_i)})⟩ = ∑_{i∈I_G} |T_i| ⟨(α_i · b) η*'(α_i · b) − η*(α_i · b)⟩ (3.17)

(where we again used that η' ∘ η*' is the identity) and thus, by Proposition 3.2 (now including the boundary fluxes),

d/dt Ĥ(α̂) ≤ −∑_{boundary interfaces S_ij} |S_ij| ( ⟨(Ω · n_ij) η(ψ̂_{u(α_i)})⟩_+ + ⟨(Ω · n_ij) η(ψ_b(x_ij, ·))⟩_− ) + ∑_{i∈I_G} |T_i| α_i · ( −σ_a u(α_i) + ⟨bQ⟩ ), (3.18)

i.e., the change in total entropy is bounded by entropy fluxes over the domain boundary and entropy production via particle absorption or creation. Here, x_ij is the center of interface S_ij and ψ_b is the boundary value of the kinetic equation (2.1c). If we use an explicit Runge-Kutta scheme of the form (3.12) for time discretization of (3.5), we would expect the total entropy to approximately evolve as

Ĥ(α̂^{κ+1}) − Ĥ(α̂^κ) ≈ ∆t ∑_{p=0}^{s−1} b_p ∇Ĥ(β̂^p) · α̂^↑(β̂^p),

where α̂^↑ collects the right-hand sides α_i^↑ of all grid cells. To enforce this discrete entropy balance exactly, we replace the update by the relaxed update

α̂_γ^{κ+1} := α̂^κ + γ ∆t ∑_{p=0}^{s−1} b_p α̂^↑(β̂^p), (3.19)

where γ_κ is calculated by finding a root of

r(γ) := Ĥ(α̂_γ^{κ+1}) − Ĥ(α̂^κ) − γ ∆t ∑_{p=0}^{s−1} b_p ∇Ĥ(β̂^p) · α̂^↑(β̂^p). (3.20)

If Ĥ is convex, r(γ) is also convex and has exactly two roots, one at zero and one close to one [48] (for ∆t small enough). Unfortunately, in our case, Ĥ is not necessarily convex since η ∘ η*' is not necessarily convex.
In the Maxwell-Boltzmann case, for example, we have η ∘ η*'(p) = exp(p)(p − 1), which is convex only for p > −1 (and concave for p < −1). However, in our numerical experiments, we always found a root of r(γ) close to one.
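Since r(0) = 0 and the root near one is the one of interest, a simple bracketing bisection suffices in practice; a sketch (the bracket [0.5, 1.5] and the fallback to γ = 1 are heuristic assumptions, not values from our implementation):

```cpp
// Sketch: find the root of r(gamma) (3.20) close to one by bisection.
#include <functional>

double find_relaxation_gamma(const std::function<double(double)>& r,
                             double lo = 0.5, double hi = 1.5,
                             double tol = 1e-12) {
  // assumes r changes sign in [lo, hi]; otherwise fall back to gamma = 1
  if (r(lo) * r(hi) > 0.0) return 1.0;
  while (hi - lo > tol) {
    const double mid = 0.5 * (lo + hi);
    (r(lo) * r(mid) <= 0.0 ? hi : lo) = mid;  // keep the sign-changing half
  }
  return 0.5 * (lo + hi);
}
```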
Remark 3.9. We could use the same approach to obtain entropy-stability for the standard finite volume scheme. However, in that case, each time we want to evaluate r(γ) for a new γ we would have to solve the minimum-entropy optimization problem on each grid cell (to evaluate the equivalent of Ĥ(α̂_γ^{κ+1})), which would make the root finding procedure prohibitively expensive.
Additional details on our implementation of the relaxed Runge-Kutta scheme can be found in Section 4.2.1.

Implementation details
We implemented both schemes in the generic C++ framework DUNE [6,7], more specifically in the DUNE generic discretization toolbox dune-gdt [54] and the dune-xt modules [42,43]. The implementation is available in [37].

Standard finite volume scheme
The implementation for the standard finite volume scheme is taken from [62]. Here, we only shortly recall the relevant parts. Further, we will restrict ourselves to Maxwell-Boltzmann entropy (2.20) such that the ansatz becomes ψ̂_u = η*'(α(u) · b) = exp(α(u) · b).

Analytic solution of the source system
As mentioned above, we use a second-order splitting approach (see (2.36)) to handle the source term independently of the flux term. Remember that we assume that the scattering is isotropic and that the parameters σ_s, σ_a, Q are time-independent. The source system (2.36b) on grid cell T_i thus takes the form

d/dt u_i(t) = σ_s(x_i) (G^iso_b − I) u_i(t) − σ_a(x_i) u_i(t) + ⟨b⟩ Q(x_i) =: A_i u_i(t) + q_i. (4.1)

Since (4.1) is linear in u_i with time-independent system matrix, we can solve it explicitly using matrix exponentials and the variation of constants formula. Under the additional assumption that the source is also isotropic, i.e. that Q does not depend on Ω, the solution to (4.1) is (see [62] for a detailed derivation)

u_i(t + h) = e^{h A_i} u_i(t) + ∫_0^h e^{(h−s) A_i} q_i ds.

Note that this solution can easily be calculated without any matrix operations due to the rank-one structure of G^iso_b (see (2.10)).
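A matrix-free evaluation is possible because G := G^iso_b is a rank-one projection (G² = G, see (2.10)), so that e^{hA} = e^{−σ_t h} (I + (e^{σ_s h} − 1) G) for A = σ_s G − σ_t I, and G q = q for the isotropic source q = ⟨b⟩Q. The following sketch implements this closed form; it follows from the projection property rather than reproducing the derivation in [62], and all names are illustrative:

```cpp
// Sketch: matrix-free analytic update for the source system (4.1), using
// that G = G_b^iso is a rank-one projection and that G q = q for isotropic q.
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// applies G u = (rho(u)/<1>) u_iso with rho(u) = alpha_one . u, see (2.10)
Vec apply_G(const Vec& u, const Vec& u_iso, const Vec& alpha_one,
            double vol_V) {
  double rho = 0.0;
  for (std::size_t l = 0; l < u.size(); ++l) rho += alpha_one[l] * u[l];
  Vec Gu(u.size());
  for (std::size_t l = 0; l < u.size(); ++l) Gu[l] = rho / vol_V * u_iso[l];
  return Gu;
}

void source_update(Vec& u, const Vec& q, const Vec& u_iso,
                   const Vec& alpha_one, double vol_V, double sigma_s,
                   double sigma_a, double h) {
  const double sigma_t = sigma_s + sigma_a;
  const Vec Gu = apply_G(u, u_iso, alpha_one, vol_V);
  // homogeneous part: exp(h A) u = exp(-sigma_t h)(u + (exp(sigma_s h)-1) G u)
  const double decay = std::exp(-sigma_t * h);
  const double grow = std::exp(sigma_s * h) - 1.0;
  // Duhamel part: int_0^h exp((h-s)A) q ds = q (1 - exp(-sigma_a h))/sigma_a
  const double src =
      sigma_a > 0.0 ? (1.0 - std::exp(-sigma_a * h)) / sigma_a : h;
  for (std::size_t l = 0; l < u.size(); ++l)
    u[l] = decay * (u[l] + grow * Gu[l]) + src * q[l];
}
```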

Solving the optimization problem
The second part of the standard splitting scheme, the flux system (2.36a), is advanced in time using Heun's method, which is a second-order strong-stability preserving Runge-Kutta scheme [25]. In each stage of the time stepping scheme, we have to solve the optimization problem (2.17) once in each cell. This usually accounts for the majority of the computation time, which makes it mandatory to pay special attention to the implementation of the optimization algorithm.
Recall that the objective function in the dual problem (2.18) is

p(α) := ⟨η*(α · b)⟩ − α · u.

The gradient and the Hessian of p are given by

q(α) := ∇_α p(α) = ⟨b η*'(α · b)⟩ − u (4.2)

and

H(α) := ∇²_α p(α) = ⟨b bᵀ η*''(α · b)⟩, (4.3)

respectively. Note that η*'' > 0 since we assumed that η (and thus also η*) is strictly convex, and remember that the basis functions contained in b are linearly independent. As a consequence, the Hessian H is positive definite (and thus invertible).
To find a minimizer of p, we search for a root of the gradient q using Newton's method. The Newton update takes the form

α_{k+1} = α_k + ζ_k d_k, d_k := −H(α_k)^{−1} q(α_k),

where the step length ζ_k ∈ (0, 1] is determined by a backtracking line search such that

p(α_k + ζ_k d_k) ≤ p(α_k) + ξ ζ_k q(α_k) · d_k

with ξ ∈ (0, 1). In our implementation, we always use ξ = 10⁻³.
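A compact sketch of this Newton iteration for the M_1 model in slab geometry (Maxwell-Boltzmann entropy, b = (1, µ)ᵀ); the rescaling, caching and stopping criteria (4.6)-(4.8) described below are omitted, and the quadrature and constants are illustrative:

```cpp
// Sketch: Newton iteration with Armijo backtracking for the dual problem,
// p(alpha) = <exp(alpha . b)> - alpha . u, with b = (1, mu)^T on [-1, 1].
#include <array>
#include <cmath>
#include <cstdio>

using Vec2 = std::array<double, 2>;
constexpr int n_quad = 400;

template <typename F>
double integrate(F f) {  // composite midpoint rule on V = [-1, 1]
  const double h = 2.0 / n_quad;
  double s = 0.0;
  for (int k = 0; k < n_quad; ++k) s += f(-1.0 + (k + 0.5) * h) * h;
  return s;
}

double objective(const Vec2& a, const Vec2& u) {
  return integrate([&](double mu) { return std::exp(a[0] + a[1] * mu); }) -
         (a[0] * u[0] + a[1] * u[1]);
}

Vec2 newton(Vec2 alpha, const Vec2& u, double tau = 1e-9) {
  for (int it = 0; it < 100; ++it) {
    // moments of the ansatz: m_k = <mu^k exp(alpha . b)>
    const double m0 = integrate([&](double mu) { return std::exp(alpha[0] + alpha[1] * mu); });
    const double m1 = integrate([&](double mu) { return mu * std::exp(alpha[0] + alpha[1] * mu); });
    const double m2 = integrate([&](double mu) { return mu * mu * std::exp(alpha[0] + alpha[1] * mu); });
    const Vec2 q{m0 - u[0], m1 - u[1]};  // gradient (4.2)
    if (std::hypot(q[0], q[1]) < tau) break;
    const double det = m0 * m2 - m1 * m1;  // Hessian (4.3): [[m0,m1],[m1,m2]]
    const Vec2 d{-(m2 * q[0] - m1 * q[1]) / det,
                 -(-m1 * q[0] + m0 * q[1]) / det};  // Newton direction
    double zeta = 1.0;  // Armijo backtracking with xi = 1e-3
    const double p0 = objective(alpha, u), slope = q[0] * d[0] + q[1] * d[1];
    while (objective({alpha[0] + zeta * d[0], alpha[1] + zeta * d[1]}, u) >
           p0 + 1e-3 * zeta * slope)
      zeta *= 0.5;
    alpha[0] += zeta * d[0];
    alpha[1] += zeta * d[1];
  }
  return alpha;
}

int main() {
  const Vec2 u{2.0, 0.5};  // a realizable moment vector (|u_1| < u_0)
  const Vec2 alpha = newton({0.0, 0.0}, u);
  std::printf("alpha = (%.6f, %.6f)\n", alpha[0], alpha[1]);
  return 0;
}
```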
To avoid numerical problems for moments corresponding to a very small local particle density, before entering the Newton algorithm for the moment vector u we rescale it to φ := u/ρ(u) such that ρ(φ) = 1. If the optimization algorithm for φ stops at an iterate β, we return

α = β + log(ρ(u)) α¹_b (4.4)

(remember that α¹_b is the multiplier with the property that α¹_b · b ≡ 1, see Definition 2.2). This ensures that the local particle density is preserved exactly: since exp(α · b) = ρ(u) exp(β · b), we have u(α) = ρ(u) u(β) and thus ρ(u(α)) = ρ(u) ρ(u(β)) = ρ(u) ρ(φ) = ρ(u). Given tolerances τ ∈ R_{>0}, ε_γ ∈ (0, 1), we will stop the Newton iteration at iterate β if the stopping criteria (4.6) and (4.7) are fulfilled, where α is obtained from β by (4.4) and, as always, n is the number of moments. As shown in [62], the first criterion (4.6) guarantees that the gradient of the objective function is sufficiently small, i.e., ‖q_u(α)‖_2 ≤ τ. The second criterion (4.7) ensures that (2.38) holds and thus that the whole scheme is realizability-preserving although the optimization problems are only solved approximately. In our implementation, we choose the tolerances as τ = 10⁻⁹ and ε_γ = 0.1.
Checking the second stopping criterion (4.7) might be quite expensive (depending on the basis b). We therefore check this criterion only if additionally the cheaper criterion (4.8) holds. The latter approximately ensures (2.38) (see [3,58]) but, in general, is much easier to evaluate than (4.7). For the HFM_n models, however, checking realizability is just checking positivity, so in that case we do not need to check (4.8) first.

Caching
We use two types of caching for the standard scheme. First, for each grid cell we store the moment vector u_i^{κ−1} from the last time step and the corresponding multiplier α_i^{κ−1} obtained by entropy minimization. In this way we do not have to solve the optimization problem again if the moment vector in that grid cell did not change during the last time step. In addition, we store the last few solutions of the minimization problem with corresponding input moment vectors per thread of execution, so if several grid cells contain the same values, we only have to perform the optimization once and then use the cached values. If we encounter a moment vector that cannot be found in the caches, we take the moment vector that is closest to the input vector (in one-norm) and use the corresponding multiplier α as an initial guess.

Linear solvers
In each iteration of the Newton scheme described above and in each time step of the new scheme, we have to apply the inverse of a positive definite Hessian matrix. We assemble the matrices using the quadratures described in Section 4.1.6. Inversion is then done by computing a Cholesky factorization of the assembled matrix. For the full moment models, the Hessian matrices are dense, so we use the LAPACK [5] routine dpotrf to compute the factorization and then use dtrsv to solve the resulting triangular systems. For the PMM_n models, the Hessian is block-diagonal (each block corresponds to one interval/triangle of the partition) such that we can perform the Cholesky decomposition independently for each block. For the HFM_n models in one dimension, the Hessian matrices are tridiagonal, so we can use the specialized LAPACK algorithms dpttrf and dpttrs. In three dimensions, the HFM_n Hessians are not tridiagonal anymore but still sparse, so we use the sparse SimplicialLDLT solver from the Eigen library [26].

Regularization
Though the Hessian H(α) is positive definite and thus invertible, it may be very badly conditioned, especially for multipliers α corresponding to moments u(α) close to the boundary of the realizable set. Moreover, in general, the integral in the definition (4.3) of H can only be calculated approximately using a numerical quadrature (see Section 4.1.6). If the quadrature is not sufficiently accurate, the approximate Hessian may have a significantly worse condition or may even be numerically singular.
To improve this situation, a change of basis can be performed after each Newton iteration such that the Hessian at the current iterate becomes the unit matrix in the new basis [3]. We use this procedure in our implementation for all bases except for the hat function bases h n where the change of basis would destroy the sparsity of the Hessian [62].
For tests with strong absorption, the local particle density may become very small in parts of the domain. As a consequence, also the entries of the Hessian H become very small which may cause numerical problems.
We thus choose a "vacuum" density ψ vac with corresponding local particle density ρ vac = ψ vac . We then enforce a minimum local particle density of ρ vac by replacing moments u with local particle density ρ(u) < ρ vac by the isotropic moment with vacuum density ρ vac . Obviously, this approach leads to a violation of the conservation properties of the scheme. However, since we only replace moments with very small local particle densities by moments with slightly larger but still very small densities, the effect should be negligible in practice.
Finally, if the optimizer fails for a moment vector u (for example, by reaching the maximum number of iterations or being unable to solve for the Newton direction), we use the isotropic-regularization technique from [3], i.e. we replace u by the regularized moment vector

u_r := (1 − r) u + r G^iso_b u

and retry the optimization. If the optimizer still fails, we increase r until the optimizer succeeds, which is guaranteed at least for r = 1 where u_r is isotropic. In our implementation, the sequence of regularization parameters r is chosen as {10⁻⁸, 10⁻⁶, 10⁻⁴, 10⁻³, 0.01, 0.05, 0.1, 0.5, 1}. As the regularized moment vector u_r always has the same local particle density as the original moment vector u, this technique does not violate the mass conservation of the scheme, but it may potentially completely alter the solution. In practice, regularization is only used rarely and if it is used, a small regularization parameter is usually sufficient.
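Schematically, the fallback loop could look as follows (a sketch; try_optimize and apply_G are placeholders, not names from our implementation):

```cpp
// Sketch: isotropic-regularization fallback for a failed optimization,
// u_r = (1 - r) u + r G u with G u the isotropic moment of the same density.
#include <functional>
#include <vector>

using Vec = std::vector<double>;

bool optimize_with_fallback(
    const Vec& u, const std::function<bool(const Vec&)>& try_optimize,
    const std::function<Vec(const Vec&)>& apply_G) {
  if (try_optimize(u)) return true;
  const Vec Gu = apply_G(u);  // isotropic moment with the same density
  for (const double r : {1e-8, 1e-6, 1e-4, 1e-3, 0.01, 0.05, 0.1, 0.5, 1.0}) {
    Vec u_r(u.size());
    for (std::size_t l = 0; l < u.size(); ++l)
      u_r[l] = (1.0 - r) * u[l] + r * Gu[l];
    if (try_optimize(u_r)) return true;  // guaranteed at r = 1 (isotropic)
  }
  return false;
}
```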

Quadrature rules
We have to approximate the same integrals for both schemes, so we use the quadratures and quadrature orders that have been determined in [62] for the standard scheme. Using these quadratures, the quadrature error should usually be negligible compared to the moment approximation error [62].
In one dimension, we use Gauss-Lobatto quadratures. These quadratures include the endpoints of the interval in the set of quadrature points, which ensures that the numerically realizable set and the analytically realizable set agree for the HFM_n and PMM_n models (see [62] for details). For these models, we use a quadrature of order 15 per interval of the partition P. In three dimensions, for the HFM_n and PMM_n models, we use Fekete quadratures on each spherical triangle of the triangulation P. Similar to the one-dimensional situation, these quadratures include the vertices of the triangle, which is again important for realizability considerations [62]. We use a quadrature order of 15 for the HFM_6 and PMM_32 models and a quadrature order of 9 for the other HFM_n and PMM_n models. For the M_N models, we use tensor-product quadrature rules of order 2N + 8 on the octants of the sphere.

Implementation of initial and boundary conditions
The initial values for the finite volume scheme are computed by integration of the kinetic equation's initial values (2.1b):

u_i⁰ = (1/|T_i|) ∫_{T_i} ⟨b ψ_{t=0}(x, ·)⟩ dx.

Since the initial values in our test cases are isotropic (see Section 5), i.e. ψ_{t=0}(x, Ω) = ψ_{t=0}(x), we only have to compute the velocity integral ⟨b⟩ of the basis. For this integral, we use the same quadratures as in Section 4.1.6 to ensure that the result is numerically realizable. Except for the plane-source and point-source tests, the initial values are constant in each grid cell T_i, so we use the midpoint quadrature to evaluate the spatial integral. For the plane-source test, we always use an even number of grid cells and distribute the Dirac delta at x = 0 into the two adjacent grid cells, i.e. the initial value in these grid cells is set to the constant ψ_{t=0}|_{T_i}(x) = ψ_vac + 1/(2∆x). For the point-source test, we use a Gauss-Legendre tensor product quadrature of order 20 to evaluate the spatial integrals for the initial values.
Boundary conditions for the moment equations are implemented by replacing the ansatz function ψ̂_{u_j} belonging to a grid cell T_j outside of the computational domain (such cells are often called "ghost cells") by the boundary condition ψ_b of the kinetic equation (2.1c) in the computation of the kinetic flux (2.30).

New scheme
For the new scheme, evaluation of quadrature rules and boundary conditions and assembly and inversion of the Hessian matrices is performed exactly in the same way as for the standard scheme (see Sections 4.1.4, 4.1.6 and 4.1.7). To get the initial values { α_i⁰ } for the new scheme, we solve the minimum entropy problems for the initial moments { u_i⁰ }, which are computed as described in Section 4.1.7, using the Newton scheme from Section 4.1.2.

Embedded and relaxed Runge-Kutta schemes
Our time stepping scheme is outlined in Algorithm 1. As mentioned above (see Section 3.3), we use embedded Runge-Kutta methods (see, e.g., [28, Chapter II.4]) to adaptively choose the time step for the new scheme. These schemes include a second set of coefficients { b̂_p }_{p=0,…,s−1} to obtain a different approximation (usually of lower order) of the solution at the next time step which is used for error estimation. More precisely, in addition to the approximation α_i^{κ+1} given by (3.12a), we compute a second approximation ᾱ_i^{κ+1} in each grid cell T_i and regard the error between these two approximations to decide if the time step is appropriate. As an error measure, we use the mixed error (4.11) (see, e.g., [28]), which measures the difference between the two approximations componentwise, scaled by τ_abs + τ_rel |α_{i,l}^{κ+1}| with absolute and relative tolerances τ_abs and τ_rel, respectively. For simplicity, we will always use τ_abs = τ_rel = τ in the following. We use an automatic step size control which tries to select the time step as large as possible while still keeping the error (4.11) below one; at t = 0 we start with a very small time step (compare Example 3.5). If entropy stability is to be enforced, we additionally store the stage values (which are needed to compute α_{i,γ}^{κ+1}, see (3.19)) and use a simple bisection algorithm to find the root γ_κ of r(γ) that is close to 1. We then compute α_i^{κ+1} = α_{i,γ_κ}^{κ+1} according to (3.19).

Numerical Experiments
We want to investigate the behaviour of the new scheme in several benchmarks. For that purpose, we use the same test cases as in [62]. Our C++ implementation and the generated data can be found in [37].
In the following, we will briefly restate the test cases. For a more detailed description and plots of (numerical) solutions, see [62] and references therein. As the minimum entropy models cannot handle zero densities, we use the small isotropic distribution ψ_vac = 5 · 10⁻⁷ to approximate a vacuum. Note that we increased the vacuum density slightly compared to [62] to avoid numerical difficulties with very low densities. We use the following test cases:

• Plane-source. In this test case, all mass is concentrated in the middle of the computational domain X = [−1.2, 1.2], i.e., we use the isotropic initial distribution

ψ_{t=0}(x, µ) = ψ_vac + δ(x).

See Section 4.1.7 for details on the implementation of this initial condition. The physical coefficients are set to σ_s ≡ 1, σ_a ≡ 0 and Q ≡ 0. Vacuum boundary conditions are used.
• Source-beam. In this test case, a strongly anisotropic beam enters the computational domain X = [0, 3] from the left. In addition, a source is present in the interval [1, 1.5]. More precisely, the approximate vacuum is used as initial condition and boundary condition on the right-hand side, and the left boundary distribution is a narrow Gaussian concentrated at µ = 1,

ψ_b(t, µ) = exp(−10⁵ (µ − 1)²) at z = 0.

The parameters are set to

σ_a(z) = 1 for z ≤ 2 and 0 otherwise, σ_s(z) = 0 for z ≤ 1, 2 for 1 < z ≤ 2 and 10 for z > 2, Q(z) = 1 for 1 ≤ z ≤ 1.5 and 0 otherwise.

• Point-source. The point-source test is a smoothed three-dimensional analogue of the plane-source test.
The initial condition in the domain X = [−1, 1]³ is the isotropic Gaussian distribution

ψ_{t=0}(x, Ω) = ψ_vac + (2πσ²)^{−3/2} exp(−‖x‖²/(2σ²)) / ⟨1⟩,

where σ = 0.03. The parameters are the same as in the plane-source test.
• Checkerboard. The checkerboard test case is loosely based on a part of a reactor core [11]. The domain X = [0, 7]³ is split into scattering and absorbing regions, X = X_s ∪ X_a, where X_a consists of unit cubes arranged in a checkerboard pattern around the center of the domain (see [11,62] for the exact geometry) and X_s = X \ X_a. The parameters are

σ_s = 1, σ_a = 0 in X_s, σ_s = 0, σ_a = 10 in X_a, and Q = 1 in the center cube [3, 4]³ and Q = 0 elsewhere.

Vacuum initial and boundary conditions are used.
• Shadow. The shadow test case represents an isotropic particle stream that is partially blocked by an absorber, resulting in a shadowed region behind the absorber. The particle stream is given by an isotropic boundary condition with density ρ = 2 at x = 0. On the other boundaries of the domain X = [0, 12] × [0, 4] × [0, 3] and as an initial condition, the approximate vacuum is prescribed. The parameters are as follows: Q ≡ 0, σ_s ≡ 0, and σ_a = 50 inside the absorbing obstacle (see [62] for its exact position) and σ_a = 0 elsewhere.

Whenever we need a reference scheme, we use the splitting scheme based on (2.36). The obvious choice might be the unsplit scheme (2.32), as the new scheme is just a coordinate transformation of this scheme. However, the splitting scheme is easy to implement and avoids the time step restriction due to the physical parameters. For the new scheme, a similar splitting approach is not straightforward. Thus, using (2.32) as a reference would arguably give the new scheme an unfair advantage.

Convergence
Under some assumed bounds on the Hessian H and the local particle density ρ, the transformed scheme will always converge to the same solution as the splitting scheme (2.36) (see Section 3.4). However, in practice, taking numerical errors into account, these bounds may not always hold (in particular the lower bound on the norm of H).
In our first experiment, we thus want to validate that the two schemes converge to the same solutions also in our numerical test cases. For that purpose, we compute numerical solutions with both schemes for varying tolerance and time step parameter, respectively, and calculate the errors with respect to a reference solution (new scheme with τ = 10 −9 ). Remark 5.1. It might be more intuitive to compute the reference solution using the standard finite volume scheme with a very small time step ∆t. However, since we only want to show that both schemes converge to the same solution, it does not matter whether we use the standard scheme or the new transformed scheme as a reference, and computing the new scheme for a small tolerance τ is significantly faster.
As an error measure, we choose the L 1 -error of the piecewise constant finite volume approximations, $E_1(u) = \|u - u_{\mathrm{ref}}\|_{L^1(X)}$, at the final time t end . We use a relatively coarse grid for all test cases to be able to compute the results for very small tolerance parameters τ or time steps ∆t in reasonable time. However, we confirmed at least for large parameters (τ ∈ {10 −2 , 10 −3 }, ∆t = ∆t max ) that the results are similar for the grid sizes and final times used in Section 5.5 (see Tables S1 and S2 in the supplementary materials).
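Since the finite volume approximations are piecewise constant, this error measure can be evaluated cell-wise. The following is a minimal sketch assuming a uniform grid with cell width dx and cell averages stored as arrays; it is not code from our implementation.

```python
import numpy as np

def l1_error(u, u_ref, dx):
    """L1 distance between two piecewise constant finite volume approximations,
    given as arrays of cell averages on the same uniform grid with cell width dx."""
    return np.sum(np.abs(u - u_ref)) * dx

def linf_error(u, u_ref):
    """Corresponding maximum (L-infinity) distance."""
    return np.max(np.abs(u - u_ref))
```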
As can be seen in Figure 1 (for the plane-source test) and Figures S1 and S2 in the supplementary materials (source-beam and point-source tests), both schemes converge nicely to the same solution. For the standard scheme, the error is essentially independent of the model, which is not true for the new scheme. This is probably due to the fact that, for the new scheme, the error estimate during the time stepping is calculated in the transformed (α-)variables while the final error is evaluated in the original (u-)variables. The L ∞ -errors behave similarly (data not shown).

Time stepping behavior
Now that we have confirmed numerically that the new scheme indeed yields the same solutions as the standard scheme, we investigate its properties, focusing on the time stepping behavior first. We only present results for some exemplary models in the one-dimensional test cases here; results for additional models and for the three-dimensional tests are similar and can be found in the supplementary materials (Figures S3 to S10).

Figure 2 shows the time steps chosen by the adaptive time stepping scheme (with a tolerance of τ = 10 −3 ) in the one-dimensional test cases for some exemplary models. As expected, for all models, the new scheme takes very small time steps initially. This can be explained by the large time derivatives of the solution in α-variables at the beginning of the test (see Example 3.5).
In the plane-source test (Figure 2a), the time steps increase rapidly and almost monotonically for all models and finally reach a value that is even above the maximal realizability-preserving time step (2.37) used in the standard scheme (except for the PMM 2 model). In the source-beam test (Figure 2b), the time steps also increase initially but become less stable afterwards. In particular, the M N models show strong oscillations in the time step sizes.
To test the influence of the tolerance parameter τ on the time steps, we compute the test cases for the M 10 model again for varying τ. The M 10 model was chosen since it is more complex than the M 2 model, not as expensive to compute as the M 100 model, and still shows the time step oscillations in the source-beam test. However, we also tested several other models and found that they all behave similarly with respect to the tolerance τ. As can be seen in Figure 3, for small tolerances (τ ≤ 10 −3 ), increasing τ essentially just scales the time step curve by a constant factor, which is due to the third-order time stepping scheme (increasing τ by a factor of 10 results in an increase of the time step by a factor of approximately $\sqrt[3]{10} \approx 2.15$). However, this scaling does not extend to large tolerance parameters (τ = 10 −1 , 10 −2 ). In particular, for the plane-source test, increasing the tolerance above τ = 10 −3 does not result in larger time steps (see Figure 3a). In addition, the time steps oscillate much more. This is due to the fact that the time step predicted by the standard error estimate for these tolerances is often too large, leading to infinite or NaN values during the computations and a subsequent reduction of the time step (compare Section 4.2.1).
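The observed scaling is what one expects from a standard step-size controller for an embedded scheme with a third-order error exponent: a step is accepted if the error estimate is below τ, and the next step is scaled with exponent 1/3. The sketch below shows such a generic controller, including a fallback that shrinks the step when the error estimate is not finite; it illustrates the mechanism only and is not the exact controller used in our implementation.

```python
import math

def propose_next_dt(dt, err_est, tol, exponent=1.0 / 3.0, safety=0.9,
                    min_factor=0.2, max_factor=5.0):
    """Classical step-size controller for an embedded Runge-Kutta scheme.

    Returns (accepted, new_dt). If the error estimate is inf/NaN, the step
    is rejected and the step size is simply halved."""
    if not math.isfinite(err_est):
        return False, 0.5 * dt
    # scale factor ~ (tol / err)^(1/3), clipped to avoid extreme jumps
    factor = safety * (tol / max(err_est, 1e-300)) ** exponent
    factor = min(max_factor, max(min_factor, factor))
    return err_est <= tol, factor * dt
```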
Finally, we measured the time needed to compute a single time step of the new scheme and of the standard scheme for several models (see Figure 4). For the new scheme, the time needed to compute a time step is essentially constant, except for time steps which have to be recomputed because the error estimate is above the tolerance. In contrast, the time step computation times of the standard scheme increase over time. This is probably mostly due to the caching used in the implementation of the standard scheme (see Section 4.1.3), which is particularly effective during the first time steps where most grid cells still contain the initial approximate vacuum. The new scheme does not use any caching. As a consequence, the standard scheme is faster for the first few time steps, but after a short time the new scheme's time steps are computed significantly faster.
As can also be seen from Figure 4, recomputations of time steps in the adaptive time stepping scheme (which show up in Figure 4 as spikes in the computation times of the new scheme) occur rarely, except for the M N models in the source-beam test. This is in line with the erratic time stepping behavior that we observed for these models (compare Figures 2b and 3b) and can be improved using regularization (see next section).

Isotropic regularization of the Hessian
In the previous section, we saw that the time steps for the M 10 model in the source-beam test (Figure 3b) oscillate strongly between t = 0.5 and t = 1 and sometimes even become as small as 10 −8 . These oscillations are mostly caused by ill-conditioned Hessian matrices which arise from the interaction between the highly anisotropic particle beam from the left boundary and the particles from the source Q in the absorbing but non-scattering region [0, 1]. A possible workaround for this problem is the regularization (3.15), which adds a small isotropic particle density during the computation of the Hessian matrix. As can be seen when comparing Figure 5a to Figure 3b, this indeed improves the time step sizes and removes the very small time steps. As a consequence, the regularized scheme needs significantly fewer time steps (see Figure 5b). For example, the regularized scheme with regularization parameter 10 −7 only needs n t = 1495 time steps instead of 3918 for the non-regularized scheme.
The price to pay for the regularization is an additional error on the order of 10 −3 to 10 −4 at the final time t end (see Figure 5b for the L 1 error; the L ∞ error is of the same order, data not shown). This seems to be below the typical error introduced by the temporal discretization in the standard scheme (see Section 5.5.1 and Figure S1b). Increasing the regularization parameter above 10 −7 increases the regularization error but does not further decrease the number of time steps.
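To illustrate the idea behind this regularization, the following sketch adds a small isotropic contribution to the ansatz density in a quadrature-based Hessian assembly, assuming a Maxwell-Boltzmann-type ansatz density exp(α·b). The exact form of (3.15) may differ; this is only an illustration of the mechanism, not a transcription of our implementation.

```python
import numpy as np

def hessian(alpha, basis_vals, weights):
    """Quadrature approximation of H(alpha) = ∫ b b^T exp(alpha·b) dv.
    basis_vals: (n, n_quad) array of basis evaluations, weights: (n_quad,)."""
    psi = np.exp(basis_vals.T @ alpha)           # ansatz density at quadrature points
    return (basis_vals * (weights * psi)) @ basis_vals.T

def regularized_hessian(alpha, basis_vals, weights, eps=1e-7):
    """Isotropic regularization sketch: a small isotropic density eps is added
    to the ansatz density before assembling the Hessian (cf. (3.15))."""
    psi = np.exp(basis_vals.T @ alpha) + eps     # add small isotropic contribution
    return (basis_vals * (weights * psi)) @ basis_vals.T
```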

Regularization for the hat function basis
For the hat function (HFM n ) models, we can use a different regularization technique that might result in smaller regularization errors. Although, according to our testing, time step declines are much less common for the HFM n models than for the M N models, we observed such declines for the HFM 6 model in the shadow test around t = 8 (see Figure 6a). In this test, (almost) all particles enter the domain with positive x-velocity.
Since there is no scattering and there is a strongly absorbing region, the density of particles with negative x-velocity becomes very low in parts of the domain, which also leads to large (in absolute value) negative entries in the α coefficient vectors. To improve the situation, we tested the regularization technique introduced in Section 3.5.2: whenever the time step falls below ∆t min = 0.01, we replace all entries in the α vector that are smaller than α min = −1000 by α min .
As can be seen in Figure 6b, the difference between the results with and without regularization is only about 10 −6 in this case.
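In code, this clipping is a single element-wise operation that is only applied when the adaptive step size falls below the threshold. The sketch below uses the values quoted above and is meant only to illustrate the mechanism from Section 3.5.2.

```python
import numpy as np

def clip_multipliers(alpha, dt, dt_min=0.01, alpha_min=-1000.0):
    """If the current time step has fallen below dt_min, replace all entries of
    the multiplier vector alpha that are smaller than alpha_min by alpha_min."""
    if dt < dt_min:
        return np.maximum(alpha, alpha_min)
    return alpha
```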

Entropy stability
To test the entropy stability properties of the different schemes, we calculate the entropy Ĥ (see Section 3.6) at each time step and compute the difference between the actual entropy Ĥ(α κ+1 ) and the entropy given by the discrete entropy law. To that end, let Ĥ est (α κ ) be the entropy estimated for the next time point t κ+1 from α κ using the entropy law (3.18). To obtain a compact error measure, we use the cumulated difference ∆Ĥ between the actual and the estimated entropy, summed over all time steps.
We tested several representative moment models in the one-dimensional test cases using the new transformed scheme with either the non-modified Runge-Kutta scheme (RK) or the relaxed Runge-Kutta scheme (RRK). We restrict our investigation to one dimension here since we do not expect a qualitatively different behavior in three dimensions.
As can be seen in Table 1, with the non-relaxed Runge-Kutta scheme the difference ∆Ĥ between the entropy of the solutions and the discrete entropy law (3.18) is on the order of 10 −4 to 10 −7 . The entropy Ĥ(α κ ) is on the order of 100 to 1000, so the relative error is quite low even without relaxation. If relaxation is used, the error vanishes for all test cases (up to a remainder on the order of 10 −12 , which is the tolerance parameter of our root finding algorithm). However, the relaxation comes at a price, as can be seen from the computation times. With relaxation, the computations take at least twice as long (for the M 50 model in the source-beam test) and up to a factor of about 20 longer (for the HFM 50 model in the plane-source test). Note that we use a simple custom bisection root finding algorithm to compute the relaxation parameters and that we did not optimize the relaxed version for performance, so it should be possible to significantly reduce the performance impact of the relaxation. Still, given that the error is already quite low without relaxation, it seems advisable to simply use the non-modified Runge-Kutta scheme unless exact entropy stability is required in the application.

Table 1: Entropy stability results. The average absolute value of the entropy is computed as $|\hat{H}|_{\mathrm{av}} = \frac{1}{t_{\mathrm{end}}} \sum_{\kappa=0}^{n_t - 1} |\hat{H}(\alpha^{\kappa})| \, \Delta t_{\kappa}$. RK: transformed scheme using the Runge-Kutta method. RRK: transformed scheme using the relaxed Runge-Kutta method.
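As mentioned above, the relaxation parameter is obtained from a scalar root-finding problem that we solve with a simple bisection. The sketch below shows such a bisection for a generic scalar function r; in the relaxed Runge-Kutta method, r(γ) would be the residual between the entropy of the relaxed update and the value prescribed by the discrete entropy law (3.18), and the bracket around 1 reflects that the relaxation parameter is a small perturbation of the standard update. The residual function itself is problem-specific and only indicated here.

```python
def bisect(r, a=0.5, b=1.5, tol=1e-12, max_iter=200):
    """Simple bisection for a scalar root of r on [a, b] with r(a)*r(b) <= 0."""
    ra, rb = r(a), r(b)
    assert ra * rb <= 0.0, "root must be bracketed"
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        rc = r(c)
        if abs(rc) < tol or 0.5 * (b - a) < tol:
            return c
        if ra * rc <= 0.0:      # root lies in [a, c]
            b, rb = c, rc
        else:                   # root lies in [c, b]
            a, ra = c, rc
    return 0.5 * (a + b)
```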

Performance
We now compare the performance of the new scheme to that of the standard splitting scheme. Since the standard scheme uses a fixed time step ∆t and the new scheme uses adaptive time steps controlled by the tolerance parameter τ, we first have to decide how to choose these parameters to obtain a fair comparison.

Choice of time stepping parameters
For the standard scheme, we have an upper bound $\Delta t_{\max} := \frac{1-\varepsilon_\gamma}{\sqrt{d}}\, \Delta x$ (see Equation (2.39)) on the time step due to realizability considerations. If we again consider the convergence results for the standard scheme (compare Figures 1b, S1b and S2b and Tables S1 and S2), we see that choosing ∆t = ∆t max results in a time stepping error on the order of about 10 −2 to 10 −3 . Note that the errors regarded here are solely due to the time stepping, i.e., the solution both schemes converge to in the convergence tests is the exact solution of the semidiscrete moment equation (2.29). Since we are usually interested in an approximation of the solution to the kinetic equation (2.1), we also have to take into account the errors that arise from the space discretization and from the moment approximation. Choosing a time step smaller than ∆t max would thus only be reasonable if the time stepping error were of the same order as or larger than the errors due to the spatial discretization and the moment approximation. This seems to be the case, e.g., for the checkerboard test, where the moment approximation errors are relatively small, and for the shadow test, which has a large final time t end such that time discretization errors accumulate over time [62]. However, for most of the regarded test cases, the errors introduced by the moment approximation (compare [62]) are much larger than the error of up to 10 −2 we observed due to the time stepping. We thus always use ∆t = ∆t max with ε γ = 0.1 for the standard scheme.
By the same arguments, we could use a tolerance parameter of τ = 0.1 for the new scheme, which also yields time stepping errors on the order of 10 −3 to 10 −2 in our tests. However, we observed in Section 5.2 that, with the current standard error estimate, increasing the tolerance above about 10 −3 to 10 −2 does not necessarily improve performance since the time steps do not increase accordingly. In some cases, we even observed increased computation times for larger tolerances as time steps had to be recomputed more frequently. We thus choose a tolerance of τ = 10 −3 in slab geometry and τ = 10 −2 in three dimensions.
Note that with this choice of parameters, the time stepping error is probably considerably lower for the new scheme. For the test cases where this error is insignificant, it might be possible to choose larger time steps for the new scheme with an improved error estimate that is specifically adapted to the moment equations. However, we have not found such an error estimate yet. On the other hand, for test cases (e.g., checkerboard) where the time stepping error is relevant, a smaller time step would have to be used for the standard scheme, which would then be significantly slower than the following results indicate. Alternatively, adaptive time stepping could also be used for the standard scheme. However, finding an adaptive time stepping scheme of the desired order that (provably) preserves realizability may be difficult. The Bogacki-Shampine method used here is realizability-preserving (with the same time step restriction (2.34) as the forward Euler method) since the intermediate stages are just convex combinations of forward Euler steps and the zero vector. Similarly, the strong-stability-preserving embedded methods from the preprint [15] could be used. For the Dormand-Prince method [17], on the other hand, it is not clear under which conditions realizability is preserved (due to the negative coefficients in its Butcher tableau). In contrast, any adaptive time stepping scheme can be used for the new scheme.
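For reference, a single step of the Bogacki-Shampine 3(2) pair looks as follows; all stage and weight coefficients are non-negative, which is the structural property the realizability argument above relies on. This is a generic textbook implementation of the published tableau, not code from our solver.

```python
import numpy as np

def bogacki_shampine_step(f, t, y, dt):
    """One step of the Bogacki-Shampine 3(2) pair.

    Returns the third-order solution and an estimate of the local error
    (difference to the embedded second-order solution)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = f(t + 0.75 * dt, y + 0.75 * dt * k2)
    y3 = y + dt * (2.0 / 9.0 * k1 + 1.0 / 3.0 * k2 + 4.0 / 9.0 * k3)
    k4 = f(t + dt, y3)  # FSAL stage, reusable as k1 of the next step
    y2 = y + dt * (7.0 / 24.0 * k1 + 0.25 * k2 + 1.0 / 3.0 * k3 + 0.125 * k4)
    return y3, np.linalg.norm(y3 - y2)
```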
For the standard scheme, we use the regularization techniques described in Section 4.1.2 to ensure that we are always able to solve the optimization problems. For the new scheme, we do not use any regularization.

Timings
Computational times for the one-dimensional test cases can be found in Figure 7. As expected (see [62]), computational times increase linearly (HFM n , PMM n ) or quadratically (M N ) with the number of moments.
In the plane-source test (see Figure 7a), the new scheme is several times faster than the standard scheme for all models except for the low-order HFM n and PMM n models.
For the source-beam test, we saw in Section 5.2 that the time steps vary significantly over time for some models. In particular, the M N models show strongly oscillating time steps. These oscillations seem to be much less pronounced for the higher-order models (see Supplementary Figures S4 and S5). As a consequence, computational times for the new scheme do not increase monotonically with the moment order (see Figure 7b); e.g., computing the M 20 model takes longer than computing the M 60 model. Thus, the standard scheme is faster for the low-order models and again significantly slower for the high-order models. Except for the HFM 2 and PMM 2 models, the HFM n and PMM n models do not show these oscillations and again reach time steps that are larger than ∆t max after some time. Consequently, overall computation times are lower with the new scheme.
For all models, the time steps are initially very small (see the analysis in Section 5.1) but increase rapidly (see Figure 2a). After some time, the time steps even exceed those taken by the standard scheme for most models. In addition, the time to compute a time step is (on average) much smaller for the new scheme (see Figure 4a). This is especially true for the M N models, which is why the speed-up for these models is significantly larger than for the HFM n and PMM n models, where solving the optimization problem is already quite fast.
In the three-dimensional point-source test, the final time t end = 0.75 is relatively small and none of the models reaches ∆t max during the test (compare Supplementary Figure S6). Consequently, with the new scheme, the computation times are only slightly lower for most models (see Figure 8a). The higher-order PMM n models show considerably smaller time steps than the other models, and thus their overall computation times are even higher than with the standard scheme.
The checkerboard test case has strongly absorbing regions and thus is the first test where the time step restrictions (2.37) and (2.34) differ significantly. After some time, the time steps are mostly between these two bounds and even exceed the upper bound several times (compare Supplementary Figure S6b). The higher-order M N models show some oscillations in the beginning, but much less than in the source-beam test, and the time steps always stay relatively large. Thus, overall computation times are greatly improved and up to ten times as fast as with the standard splitting scheme (see Figure 8b). In addition, as mentioned earlier, the increased accuracy of the new scheme might be important for this test case, as the errors due to the time stepping with the standard scheme are of the same order as the errors due to the spatial and moment approximations. Again, the speed-up is smaller for the HFM n models and non-existent for the PMM n models.
The shadow test case is highly challenging for the numerical solvers. In the absorbing domain, very small local particle densities occur, which lead to numerical problems when inverting the Hessians (whose entries scale with the density). In addition, as only right-going particles enter the domain, densities for particles with negative x-velocity decline much faster than those with positive x-velocity, resulting in very anisotropic distributions and ill-conditioned Hessians. For the standard scheme, we deal with these problems by enforcing a minimum density and using an isotropic regularization technique to replace ill-conditioned moment vectors (see Section 4.1.2). These techniques, in particular the isotropic regularization, introduce additional errors which in theory might completely alter the solution. In practice, regularization is usually mainly applied to moment vectors with very low densities and thus does not destroy accuracy. However, it should be noted that this is not guaranteed automatically and has to be verified for every new application of the scheme. In our case, regularization is used extensively by the M N and PMM n models, while the HFM n models do not use regularization.
For the new scheme, we did not use any regularization for this comparison. It is thus particularly remarkable that, for the M N models, the new scheme is about as fast as the standard splitting scheme, although frequent recomputations can be observed (see Supplementary Figure S3d). For the PMM n and higher-order HFM n models, computational times are several times higher for the new scheme than for the standard splitting scheme (see Figure 8c). For these models, the time steps converge to a value well above the maximum time step of the unsplit scheme but also significantly below the time step of the splitting scheme (compare Supplementary Figure S7). In addition, computing a time step for these models is already quite fast with the standard scheme, and the speed-up of the new scheme is not large enough to compensate for the smaller time step (compare Supplementary Figure S3c). Note, however, that the PMM n models do not show significant oscillations or other problems even though the standard scheme has to use regularization. To be competitive with respect to computation times in this test case, the new scheme probably also needs to use a splitting technique. Though there is no formal limit on the time step for the new scheme due to the strong absorption, we would expect the approximation error (and thus the time step) to be dominated by this term.

Parallel scaling
As already mentioned, one of the major drawbacks of the minimum-entropy-based moment models is their computational cost. As we have seen above, the new scheme is often several times faster than the scheme in standard variables. However, even with this speed-up, computations without parallelization still take excessively long. In addition, there are cases where the new scheme is not faster than, or is even slower than, the standard scheme.
As the minimization problems or matrix inversions in different grid cells are independent, parallelization is easily possible for both schemes. However, for the standard scheme, load balancing may be a serious issue [34]: usually, some minimization problems are harder to solve than others, resulting in different numbers of iterations in the Newton scheme. The new scheme does not have this problem, as it only needs the inversion of a relatively small positive definite matrix in each grid cell. For these matrix sizes, direct solvers usually perform at least as well as iterative methods and take an approximately constant time per inversion.
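The following sketch illustrates the parallelization pattern in Python terms: the independent per-cell work items (one Newton solve per cell for the standard scheme, or one Hessian assembly and inversion for the new scheme) are split into many more chunks than threads so that a thread pool can balance unevenly expensive cells. The actual implementation uses Intel TBB tasks in C++; the names and the pool interface below are only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def solve_cells(per_cell_work, cell_data, n_threads=16, tasks_per_thread=1000):
    """Split the independent per-cell work into many small chunks ("tasks").

    Note: this only illustrates the chunking pattern; in CPython the GIL
    prevents true parallelism for pure-Python work, whereas the actual
    implementation uses TBB tasks in C++."""
    n_cells = len(cell_data)
    n_chunks = max(1, min(n_threads * tasks_per_thread, n_cells))
    chunks = np.array_split(np.arange(n_cells), n_chunks)
    results = [None] * n_cells

    def run_chunk(indices):
        for i in indices:
            results[i] = per_cell_work(cell_data[i])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(run_chunk, chunks))
    return results
```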
To investigate the scaling behaviour, we computed ten time steps of the point-source test case with a varying number of threads for both schemes. We use a work-stealing, task-based parallelization (implemented using Intel TBB [32]). To see the impact of load balancing, we perform all tests both with 1 task per thread (no load balancing) and with 1000 tasks per thread. The results are shown in Figure 8d. If load balancing is used, both schemes scale almost perfectly up to 16 threads. When going to 32 threads, the scaling is slightly worse, which may be due to the dual-socket system used (2 × 16 CPU cores).
Removing the load balancing has a large impact on the standard scheme, while the new scheme is much less affected. We would expect this difference to become more pronounced if even more threads (or processes) are used. The new scheme should thus be better suited for massively (MPI-)parallel computations.

Masslumping for the transformed scheme with hat function basis
The basis functions used by the HFM n models are essentially the Lagrange P 1 nodal basis functions used in the (continuous) finite element method, i.e., each basis function evaluates to 1 at exactly one node of the triangulation and to 0 at all other nodes (see Section 2.4.2). As a consequence, the Hessian matrix (3.1) is tridiagonal (in one dimension) or sparse (in three dimensions). Compared to the M N models, where the Hessian is dense, this significantly reduces the computational effort required for assembly and inversion. However, especially in three dimensions, assembling and inverting the Hessian matrix still accounts for the vast majority of the new scheme's computational time.
We can significantly speed up these computations by using a quadrature that contains only the nodes of the triangulation. With such a quadrature, the basis functions always evaluate to either zero or one and the Hessian matrix becomes diagonal. The downside is, of course, that an additional quadrature error is introduced, as the nodal quadrature is only of first order. However, this additional error is of the same order as that introduced by the linear finite element discretization. This approach is sometimes called masslumping, as using such a quadrature diagonalizes the mass matrix in the finite element method ("all mass is lumped together on the diagonal").
Remark 5.2. Masslumping could also be used for the standard scheme and should lead to similar speed-ups (assuming that the masslumping does not negatively affect the number of iterations needed for the solution of the optimization problems). Since our focus in this work is on the new scheme, we did not yet test masslumping for the standard scheme.
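With a nodal ("masslumping") quadrature, each hat function is nonzero at exactly one quadrature point, so the Hessian reduces to a diagonal matrix whose inverse can be applied element-wise. The following sketch illustrates this structure under the assumption of a Maxwell-Boltzmann-type ansatz density exp(α·b); it is an illustration only, not the actual implementation.

```python
import numpy as np

def hessian_masslumped(alpha, node_weights):
    """Diagonal Hessian for the hat function basis with nodal quadrature:
    basis function i is 1 at node i and 0 at all other nodes, so only the
    diagonal entries w_i * exp(alpha_i) remain."""
    return np.diag(node_weights * np.exp(alpha))

def apply_inverse_hessian_masslumped(alpha, node_weights, rhs):
    """Applying the inverse of the diagonal Hessian is an element-wise division."""
    return rhs / (node_weights * np.exp(alpha))
```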
For the one-dimensional tests, we use the two-point Gauss-Lobatto quadrature on each interval (containing only the endpoints of the interval) for the masslumped version. As quadrature points located on the same vertex of the partition can be merged, we only have one quadrature point per vertex. The reference quadrature uses 24 quadrature points per interval. In addition, we only have to evaluate one component of the integrand per quadrature point (the one corresponding to the non-zero basis function) instead of two. Overall, this reduces the number of integrand evaluations by a factor of about 48. The results can be found in Figure 9. For both test cases, almost independently of the number of moments n, the computations are about 40 times as fast using masslumping (see Figure 9a). This is in line with the reduction in the number of evaluations. The L 1 errors (compared to the non-masslumped result) decrease with second order (see Figure 9b). The L ∞ errors in the source-beam test decrease with order about 1.3, while the L ∞ errors in the plane-source test converge with very low order.
For the three-dimensional tests, we use the vertex quadrature on the reference triangle (with the vertices (0, 0), (1, 0), (0, 1) as quadrature points and weight 1/6 each) transferred to each spherical triangle. This results in one quadrature point per vertex of the triangulation. The triangulation consists of $2 \cdot 4^{r+1}$ triangles and $2 + 4^{r+1}$ vertices (where r is the number of refinements of the initial octants, see Section 2.4.2), and the reference quadrature has 55 quadrature points per spherical triangle. The standard implementation thus uses about 110 times as many evaluations. The results can be found in Figure 10. For all test cases, the masslumped version is more than two orders of magnitude faster than the version using the reference quadrature. The maximum speed-up is a factor of 216 (point-source, HFM 1026 ), which is considerably higher than the reduction in quadrature points. The additional speed-up is due to the more efficient implementation, as the masslumped version does not need to use sparse matrices and the associated indirect indexing. Looking at the profiler results, we see that the time for assembling and inverting the (diagonal) Hessian is negligible in the masslumped version. Overall, the time needed for the operator evaluation (which consists of calculating the kinetic fluxes and the source term and applying the inverse Hessian matrix) has been reduced to a point where the vector operations in the adaptive Runge-Kutta scheme now make up a major part of the computation time.
For all three-dimensional test cases, the errors compared to the non-masslumped version are quite large (see Figure 10b). However, both the L 1 and the L ∞ error converge with first order in n for all test cases, which corresponds to second-order convergence in the grid width of the (velocity space) triangulation, as each refinement halves the grid width but increases the number of vertices by (approximately) a factor of 4. The convergence rate is thus similar to (for the checkerboard test) or even higher than (point-source, shadow) the convergence of the (second-order discretization of the) moment approximation (compare [62]). Thus, for high-order models, the additional quadrature error introduced by masslumping might be acceptable given the massive speed-up. In any case, it might be preferable to replace a lower-order moment model with high-order quadrature by a higher-order masslumped model.

Conclusion and outlook
In this paper, we introduced a new numerical scheme for entropy-based moment equations that is based on a variable transformation of the semi-discretized equations and gets rid of the minimum-entropy optimization problems (except for the initial values). We have shown analytically and numerically that the new scheme converges to the correct solutions and that it follows a discrete entropy law if a relaxed Runge-Kutta method is used for time stepping. In addition, we investigated the performance of the new scheme in several numerical benchmarks and showed that it is often several times faster than the untransformed scheme, at the same or even higher accuracy in time. Furthermore, for the hat function basis, we showed that a massive speed-up can be obtained by using a quadrature that contains only the vertices of the triangulation (at the cost of an additional quadrature error), making very high-order models computable in reasonable time. Finally, we performed some tests on the parallel scaling of the schemes which suggest that the new scheme does not have the same load-balancing problems as the untransformed scheme.
To improve the scheme, better error estimates for the adaptive time steppers should be investigated to get rid of the erratic time step behaviour observed in the source-beam and shadow tests. Here, larger errors could be allowed for multipliers that correspond to small densities and thus only have a minor effect on the solution in original variables. In addition, regularization techniques could be used to replace such multipliers if they limit the time step. These regularization techniques might also be needed to solve problems where some Hessians are numerically singular. For applications involving strong scattering or absorption, splitting methods for the new scheme might be of interest to remove the time step restriction induced by the corresponding terms.
While we restricted ourselves to a first-order scheme, the same variable transformation can also be applied to higher-order kinetic schemes as considered, e.g., in [58,60].
In future work, we will investigate further model reduction by POD-based reduced basis methods [31,47], which should be much easier with the new scheme as it is well-defined on the whole of R n and not only on the realizable set (which is a convex cone in R n ).

S. Supplementary data
This supplement contains additional figures and tabular data from the numerical experiments.

Figure S9: Time steps taken in the plane-source test case (n x = 240, t end = 0.5) for different tolerance parameters τ. The last step has been omitted for all models as it was chosen to reach t end exactly and thus may be artificially small.

Figure S10: Time steps taken in the point-source test case (n x = 30 3 , t end = 0.25) for different tolerance parameters τ. The last step has been omitted for all models as it was chosen to reach t end exactly and thus may be artificially small.