Positivity-preserving methods for population models

Many important applications are modelled by differential equations with positive solutions. However, it remains an outstanding open problem to develop numerical methods that are both (i) of a high order of accuracy and (ii) capable of preserving positivity. It is known that the two main families of numerical methods, Runge-Kutta methods and multistep methods, face an order barrier: if they preserve positivity, then they are constrained to low accuracy: they cannot be better than first order. We propose novel methods that overcome this barrier: our methods are of second order, and they are guaranteed to preserve positivity. Our methods apply to a large class of differential equations that have a special graph Laplacian structure, which we elucidate. The equations need be neither linear nor autonomous and the graph Laplacian need not be symmetric. This algebraic structure arises naturally in many important applications where positivity is required. We showcase our new methods on applications where standard high order methods fail to preserve positivity, including infectious diseases, Markov processes, master equations and chemical reactions.

Although the focus of this article is mainly on positivity and mass preservation ODEs, positivity preservation is a much wider challenge. For example, Lotka-Volterra models [3,16] preserve positivity but not mass as well as some parabolic problems [26]. The stochastic differential equation associated with the Nobel prize winning Black-Scholes model in finance has positive solutions, but standard numerical solvers, such as the Euler-Maruyama method, fail to preserve positivity. The Kolmogorov Lecture at the Ninth World Congress in Probability and Statistics concerned methods for preserving positivity in the setting of the stochastic Langevin equations [36].
We note in passing that even with these two requirements, (1.1) may display rich dynamical behaviour: some systems of this kind converge to a unique steady state, others have a number of steady states, yet others exhibit oscillatory behaviour.
The methods proposed in this work are constructed to preserve positivity, while keeping linear invariants preservation to high accuracy (symplectic integrators preserve the symplectic structure of Hamiltonian systems while not exactly preserving energy, but this gives good properties in regards to error propagation over long time intervals). However, in the case where only mass preservation is required there are well known mathematical results that allow us to adapt the methods to preserve exactly, and for this reason this case is now treated in more detail.

Graph Laplacians and ODEs
A useful way to envisage mass and positivity preservation is that for every t ≥ 0 the state variable y(t) is a discrete probability distribution of d species. This corresponds to the case k = 1, w 1 = 1 and, as we will show in Proposition 1, these properties can be preserved if the vector field in (1.1) can be written in the form (see also e.g. [5,15,20]) f (t, y) = A(t, y)y where the matrix A : R × R d → R d×d is a graph Laplacian.
We denote the set of all n × n graph Laplacians by L n . The same term 'graph Laplacian' is used with different meanings in the literature -in our work, we allow it to be non-symmetric.
For simplicity, we consider the autonomous case. (The general nonautonomous case can be considered similarly, as we will show latter.) We focus on the solution of the nonlinear ODE y ′ = A(y)y, y(0) = y 0 ∈ R d , (1.2) where we assume throughout that A(y) has the same pattern of signs as a graph Laplacian, i.e. we assume Property 1 of the definition above. (In some examples, such as the MAPK cascade example, we do not assume Property 2, i.e. we do not always assume 1 ⊤ A(y) = 0 ⊤ , and we demonstrate that our methods can nevertheless work well.) We typically also assume that all components of the initial condition are nonnegative. Many applications fit this framework: Markov processes in continuous time on discrete states; master equations [40]; single molecule chemistry [51] (Fig. 4); studies of robustness of Turing pattern formation in stochastic settings [27,43]; and lasers and quantum dots [53]. Given two compatible matrices P and Q we say that P ≻ Q if P i,j > Q i,j for all i, j and P Q if P i,j ≥ Q i,j . We assume that y 0 0 and 1 ⊤ y 0 = 1. Then the solutions of (1.2) have the following desirable features. Proposition 1. Solutions of (1.2) with y 0 0, 1 T y 0 = 1 have the following two properties: Positivity: y(t) 0 for all t ≥ 0, and Conservation of mass: 1 ⊤ y(t) = 1, for all t ≥ 0.
To prove the statement about positivity, we consider any t * ≥ 0 such that there exists k * ∈ {1, 2, . . . , d} with y k * (t * ) = 0 and such that y(t * ) 0 -clearly, unless such t * exists, y(t) stays forever in the nonnegative cone. Note that it is perfectly possible for t * to be zero, also it is possible that several components of y(t) vanish at t = t * , this makes no difference to our argument. We note that, by (1.2), A k * ,ℓ (y(t * ))y ℓ (t * ) ≥ 0, because A is a graph Laplacian, so off-diagonal entries are nonnegative. Therefore y k * cannot change sign at t * , and it must stay in the nonnegative cone.
Remark. Note in the proof of Proposition 1 that property 1 alone of the definition of the Laplacian (pattern of signs) suffices to give positivity, and that, separately, property 2 alone of the definition of the Laplacian suffices to give mass preservation. In particular, if the matrix A(y) has the same pattern of ± signs as a Laplacian (but we make no assumption on the column sums of A(y)), then it is still true that solutions of y ′ = A(y)y, preserve positivity. Let us now consider some properties of graph Laplacian matrices that allow us to deduce additional qualitative properties of the solution of (1.2).
Theorem 2. Let A ∈ L n . Then it has an eigenvalue at the origin, which is simple if A is irreducible, and all its other eigenvalues reside in C − = {z ∈ C : Re z < 0}.
Proof. Since 1 ⊤ A = 0 ⊤ , it follows that 0 ∈ σ(A). To locate the remaining eigenvalues we use the Gerschgorin theorem, applying it to columns (typically it is applied to rows, but this makes no difference). Thus, letting we have σ(A) ⊂ n ℓ=1 S ℓ . By the definition of graph Laplacian, all Gerschgorin discs live in cl C − and adjoin iR only at the origin. Therefore σ(A) \ {0} ∈ C − . It remains to prove that 0 is a simple eigenvalue. Let α = min k=1,...,n A k,k , then the entries of B = A − αI = O are all nonnegative. Therefore, according to Frobenius-Perron theory [4], irreducibility implies that the largest in modulus eigenvalue of B is positive and simple. Since this is −α, it follows that 0 is a simple eigenvalue of A.
Incidentally, one of the less well-known formulations of the Gerschgorin theorem states that if A is irreducible then an eigenvalue might be on the boundary of one Gerschgorin disc only if it is on the boundary of all Gerschgorin discs -this is certainly the case with 0.
where α + (B) is the spectral abscissa -the eigenvalue of the matrix B with the largest real part (which in the case of A is real because of the Perron-Frobenius theory). This is true because α + (B) ≥ v ⊤ Bv/ v 2 for any square matrix B and a nonzero vector v. Since our A(y) is graph Laplacian, it follows at once from the Gerschgorin theorem that α + (A(y(t))) ≤ 0 and, since 0 ∈ σ(A(y(t))), we deduce that d y(t) 2 / dt ≤ 0.
Letŷ be the eigenvector corresponding to the simple eigenvalue 0. In the symmetric case, it is clear that y(t) −ŷ 2 is a monotonically decreasing functionusing the fact that A(y(t))ŷ = 0, and we continue as before.
In the nonsymmetric case, the issue of stability needs more discussion. The two defining properties of the graph Laplacian together ensure that the columns of the matrix exponential are probability vectors, so that, when A is a constant matrix, in the 1-norm we always have exp(tA) = 1, t ≥ 0. In the case of a constant matrix, these matrices are sometimes known as 'W-matrices' in the statistical physics literature and, by studying the adjoint z ′ (t) = A ⊤ z(t) -with arguments similar to those of our Proposition 1 -it is known that the minimum of the solution z is increasing, and that the maximum is decreasing. In the 2-norm, a sufficient condition for strong stability of y ′ = Ay(t) with solution y(t) = exp(tA)y(0), is that (A + A ⊤ ) be negative definite. Note that this condition is more restrictive than merely the assumption that the eigenvalues of A have negative real part (because then it would still be possible that (A + A ⊤ ) had a positive eigenvalue). This issue of stability is related to 'the hump' in the classical literature on the numerical analysis of the matrix exponential, and to the lognorm, and also to the subject of pseudospectra. Nonsymmetric graph Laplacians exhibit significant pseudospectra, manifesting themselves in various ways, such as a more subtle stability analysis, and the failure of standard eigenvalue algorithms [31,37,38]. A sufficient condition for stability of operator splitting methods is that each part separately be strongly stable, although this may be too pessimistic in practice. For operator splitting methods, the graph Laplacian can sometimes be expressed as the sum of two matrices, each of which is separately a graph Laplacian with a physical interpretation [39]. In general, operator splitting does not preserve the steady state [52] -so it is worth pointing out that the novel splitting methods that we introduce in this work, for example later in (3.2), in our numerical experiments, do have the desirable property that they preserve the steady state. In the nonautonomous case, but still linear case, it can be shown under suitable assumptions that the difference of any two solutions is decreasing in the 1-norm, but the issue of stability is much more delicate. For instance, see the catalogue of counterexamples, and Theorem 3.1 described in [17].
To sum up, the solution of a mathematical model given by (1.2) with y 0 0 and where A(y) is a graph Laplacian matrix (assuming y 0) always preserves mass and always preserves positivity. Often, the model (1.2) is also stable and converges to a steady state. These features correspond to the phenomenological desiderata in for example epidemiological models.
In theory, there are always exact formulae for the right eigenvector corresponding to the zero eigenvalue of a nonsymmetric graph Laplacian matrix A, via the Matrix-Tree Theorem [22]. This is the steady state of the corresponding linear Laplacian dynamical system, and in special cases, there are also formulae for the dynamical solutions [17,18,31].
Unfortunately, in general, the exact solution of these dynamical systems is unknown, so we need to resort to numerical algorithms. Using backward error analysis, we can envisage a numerical method as the exact solution of a perturbed model. While this is typically adequate across a single step, unless the method is chosen carefully, a numerical solution is highly unlikely to respect the important special structure of (1.2) across the entire time interval of interest.
The mathematical models we are considering in this paper are based on differential equations whose solutions preserve some underlying geometric structure. The design and analysis of numerical integrators that preserve the qualitative features of the underlying differential equations is the subject of Geometric Numerical Integration [8,25,30,50]. We are not only concerned with the accuracy and stability of numerical schemes but also with their geometric properties, which reflect important features of the phenomena being modelled. This endows the integrators with an improved qualitative behaviour, but also typically leads to significantly more accurate results.
For example, in [21] the authors consider a mathematical model for the COVID-19 epidemic in Italy, while paying much attention so that the proposed model has the structure of (1.2), but then numerically solve it using the first order explicit Euler method y n+1 = y n + hf (y n ) where h is the time step and y n ≃ y(t n ) with t n = t 0 + nh. We easily see that and then the mass is preserved (this is also the case for most standard methods like Runge-Kutta or multistep methods). However, it is well known that, in general, this method does not preserve positivity unconditionally.
This inadequate behaviour cannot be rectified by a standard higher-order method: in [10] it is shown that within the class of linear multistep and Runge-Kutta methods unconditional positivity restricts the order of the method to just one. 1 For non-stiff problems and for relatively short time integration, an Euler method, or any other standard method, can provide sufficiently accurate, satisfactory results. However, if a mathematical model is stiff (this is typical to equations of chemical kinetics) or need be solved for long time intervals, standard methods may produce negative solutions or become unstable. While the stiffness in chemical kinetics equations can be dealt with using implicit methods and mass is preserved by most numerical methods, positivity remains an outstanding challenge.
The most efficient solvers considered in [24] for low to medium accuracy in the numerical solution of stiff kinetic equations are Rosenbrock methods. In addition, they are among the simplest implicit schemes to be implemented in a code, yet they fail to preserve positivity. Note that there exist exponential Rosenbrock-type methods [28] that involve the computation of the exponential of the Jacobian. However, in general, this Jacobian is not a graph Laplacian and positivity cannot be guaranteed.
The objective of preserving mass and positivity in numerical integration, in particular within the context of chemical kinetics, received a measure of attention, although perhaps less than it deserves given its importance in applications. An obvious device to avoid the solution from becoming negative is clipping: the practice of converting a negative component to zero. This, of course, interferes with the preservation of mass but the latter can be recovered using laborious optimization procedure in every time step [49]. The effects of this costly algorithm on long-term accuracy and stability are unknown.
Another approach toward preservation of mass and positivity are Runge-Kutta-Patankar methods [5,14,34,47]. The idea is to adapt Runge-Kutta-like methods for production-destruction systems in chemical kinetics. We will show that this class of methods can be seen as particular approximation to the methods proposed in the present work.

Illustrative examples
To illustrate our analysis we consider several simple population models from the literature. y ′ 1 = −0.04y 1 +10 4 y 2 y 3 , y 1 (0) = 1 y ′ 2 = 0.04y 1 −10 4 y 2 y 3 −3 · 10 7 y 2 2 , y 2 (0) = 0 y ′ 3 = 3 · 10 7 y 2 2 , y 3 (0) = 0, that, rewritten in a vector form, read d dt where the matrix is graph Laplacian. This example fits into the framework of Theorem 9, which comes later. Note that the system can also be written in many different ways, for example d dt where now the matrix is no longer a graph Laplacian. As we will see, it is crucial to write properly the equations for the numerical solutions to preserve their qualitative properties.
Example 2: The SIR model. The Susceptible-Infected-Recovered (SIR) model describes the temporal epidemic evolution in terms of three variables for the population: S(t): (Susceptible), I(t) (Infected) and R(t) (Recovered). It is usually asssumed that the total population does not change during the infection period. S, I and R denote the fractions with respect to the total population: S(t)+I(t)+R(t) ≡ 1. This model was proposed in [33] S ′ = −R 0 SI , where R 0 > 0 is the basic reproduction number, and the system can be written in the form which is like (1.2) with y = [S, I, R] ⊤ and the matrix is evidently a graph Laplacian.
Example 3: Laplacian dynamics on graphs (autonomous and linear). Graph Laplacian dynamics, y ′ = L(G)y, where L(G) is a constant matrix, representing the Laplacian of a directed graph G, gives rise to a large class of applications in biochemical kinetics, including Michaelis-Menten enzyme kinetics, allosteric enzymes, G-protein coupled receptors, ion channels, and gene regulation [22] (equation (3)). Discussion of conditions under which such linear systems always converge to a steady state, and discussion of the sense in which that might be considered unique is given in [45]. That linear setting y ′ = L(G)y is a special case of the more general framework here where we focus on the exact nonlinear model in (1.2).
Example 4: Cardiac ion channels (nonautonomous and linear). Nonautonomous Laplacian systems, y ′ = A(t)y, have many important applications, including cardiac ion channel kinetics [17,18]. In special cases, there are also exact solutions for the dynamical solutions, such as the explicit Magnus formulae in [31], and closely related invariant manifolds of binomial-like solutions.
Example 5: MAPK cascade (autonomous and nonlinear). The mitogen-activated protein kinase (MAPK) cascade is fundamental in cell signalling biology and cancer biology, and it is modelled by eighteen differential equations with rates given by the Law of Mass Action, together with some linear conservation laws [48]. By our Theorem 9, in the sequel, this MAPK model fits our framework of (1.2), subject to the remarks we make following Proposition 1. The Laplacian dynamics mentioned in the above constant coefficient and linear examples, where convergence to a steady state is common [45], makes it tempting to conjecture that the model we focus on here in (1.2), likewise always converges to a steady state. However, a counterexample is provided by the MAPK cascade, which can be modelled by our nonlinear Laplacian dynamics (1.2), and which has been shown by numerical simulations to exhibit both bistability and oscillations [48].
We have taken the model of [23] (Table 3, Fig 3, equations (12)- (17)), which is closely related to the MAPK cascade, and rewritten it here in the form of our model (1.2), to show that it is clearly an example of the Laplacian dynamics that we study in this paper: (2.8) We take the same rate constants k 1 = 100 10 , k 7 = 7 10 , and initial state y(0) = [0.1, 0.175, 0.15, 1.15, 0.81, 0.5] ⊤ . Note that we have y ′ = A(α, y)y, where α ∈ [0, 1] is a parameter we can freely choose in this interval and the matrix A(α, y) has the same pattern of signs as a Laplacian, but that a column of A(α, y) does not always sum to zero, so this fits our framework of (1.2), subject to the remarks we make following Proposition 1, and this is also an example of our later Theorem 9. This model possesses two conservation of mass laws, namely both y 1 +y 4 +y 6 and y 2 +y 3 +y 4 +y 5 are constants, which have physical interpretation in terms of the total enzyme of two types of kinases. Those two conservation laws correspond to w 1 = [1, 0, 0, 1, 0, 1] ⊤ , and w 2 = [0, 1, 1, 1, 1, 0] ⊤ , respectively. Note that we have and this is irrespective of the value of α ∈ (0, 1). However, if we take α = 0 we have that It should be possible to use methods based on matrix exponentials (such as the methods proposed in this paper) to respect e.g. the second conservation law, if we take α = 1 because w ⊤ 2 A(y) = 0, so w ⊤ 2 exp(tA(y)) = w ⊤ 2 . However, because w ⊤ 1 A(y) = 0, it will be difficult (and probably impossible) to maintain exactly the first conservation law by methods that compute matrix exponentials. 2 This is typical of applications in chemical kinetics, and for example, the famous Michaelis-Menten enzyme kinetics model (which always converges to a unique and simple steady state) also fits the framework, with a matrix that has the same pattern of signs as a Laplacian, but that does not have zero column sum, and the model still has two simple well-known linear conservation laws. Significantly, by numerical simulation, it has been shown that solutions of this model (2.8) show oscillations [23] (Fig. 5).

Positivity preserving second-order methods
Let us first consider the particular case in which the matrix A is constant. Then the exact solution is given via the exponential: If A is a graph Laplacian matrix it is a consequence of Theorem 2 that σ(e tA ) ⊂ {z ∈ C : |z| ≤ 1}, hence the solution is stable (subject to the discussion of stability we gave earlier, in the nonsymmetric case).
The exponential of a graph Laplacian matrix is fundamental to the work of this paper, and this calls for a more detailed study of its qualitative properties.

The exponential of a graph Laplacian matrix
We begin with column sums for an arbitrary square matrix.
Proof. By the series definition of the exponential Remark. Replace A by tA in the Proposition to see that, as a corollary, if Remark. Graph Laplacians have the property 1 ⊤ A = 0 ⊤ by definition, so for graph Laplacians it is also true that 1 ⊤ e tA = 1 ⊤ . We need the following elements of the Perron-Frobenius theory [4] (p. [26][27]. SinceÃ O, all its nonnegative powers are also nonnegative and we deduce that e tÃ O. Therefore e tA O. Indeed, the Mittag-Leffler matrix function of a graph Laplacian, E α (At α ), is likewise a stochastic matrix, i.e. E α (At α ) O, and all entries are positive, and columns sum to unity [41]. Here the Mittag-Leffler function E α (z) = ∞ k=0 z k /Γ(αk + 1) is a one-parameter generalisation of the exponential, and the exponential is recovered as the special case once α → 1. Furthermore, when A is Laplacian then the pattern of signs in the resolvent, and the properties of M -matrices, show that for large n, all entries of the matrix I − t n A −n are nonnegative. This suggests the results we derive here may be extended to more general settings. Additionally, once A is irreducible and we denote by w 0 the left eigenvector of A corresponding to the zero eigenvalue, then w ⊤Ã = −a * w ⊤ . Since a * < 0, we deduce that |a * | = ρ(Ã) and w = 1.
Proof. By the Gerschgorin theorem applied to the columns ofÃ (or the standard Gerschgorin theorem applied toÃ ⊤ ) and because A j,i ≥ 0 for i = j, we have and the proof follows.
Proposition 6. Let R d ∋ p 0 be such that 1 ⊤ p = 1 and set q = e tA p. Then for every t ≥ 0 q 0 and 1 ⊤ q = 1.
Remark. Note that all previously stated results apply to maps of the form z = e σS x where x 0, S is a graph Laplacian and σ is a non-negative constant. Since S can have large negative eigenvalues, taking negative values of σ is likely to lead to a poorly conditioned problem where negative solutions can occur and this compels us to avoid this choice. In the sequel we propose several methods that involve maps of the form e σS with S being graph Laplacian, and we will see that condition σ > 0 limits the order of the methods to two in the time step, an order barrier. Since , the following well known result will be useful in the sequel.
which is an M -matrix whose inverse has only non negative elements.

Splitting methods
Splitting methods are frequently used to solve differential equations that are separable into solvable parts. However, for stiff as well as for non separable problems it is more convenient to proceed as follows [7]. Let us consider the following system in the extended space The system is separable into two solvable parts A : We solve the system with the symmetric second order Strang splitting method, i.e. advance half a step with A followed by a step with B and conclude with another half a step with A: 1) Since z 0 0, the frozen matrix A(z 0 ) is a graph Laplacian, therefore x 1/2 0 and preserves the 1-norm, and similarly for z 1 and x 1 .
In addition, x 1 and z 1 correspond to symmetric second order approximations: z 1 can be seen as the exponential midpoint and x 1 as the exponential trapezoidal rule. Then, we can advance the solution either with z 1 or with x 1 but, in general, more accurate results are obtained with the smoothing technique, i.e. taking the solution for the next step as the average where again the 1-norm is preserved and all components of y 1 are nonnegative. The Lie group structure is not preserved by this linear combination, but this is not a property that concerns us in the present context. In addition, the difference x 1 − z 1 can be taken as an estimate of local error, using the scheme as a variable time-step algorithm in order to get more accurate results.
Remark. If A is graph Laplacian and irreducible then (3.1) is a time-symmetric second order method that preserves mass and positivity unconditionally (the average (3.2) breaks time symmetry) and converges to the steady state solution. Let y f be the steady state solution, then f ( where 0 is a simple eigenvalue and the method is a composition of exponentials of A, it must converge to a steady state solution, sayŷ f . However, we observe that if we take y 0 = y f then it is trivial to check that which is a first order approximation to the exponential that, by our earlier Proposition 7, still preserves positivity. We can replace A(x 1/2 ) by A(u) in (3.1) and, since this matrix is multiplied by h, the method retains second order of accuracy.
The non-autonomous case. Let us now consider the non-autonomous system y ′ = A(t, y)y, y(0) = y 0 . This occurs, for example, when a chemical reaction takes place at variable temperature and the coefficients k i (t) are time dependent or when the parameter R 0 in the SIR model changes due to political decisions, variations in behaviour or the evolution of a pathogen. In this case we duplicate the system, but taking the time as two dependent variables where x(t) = y(t) = z(t) and x t (t) = z t (t) = t: the system is now autonomous and separable into solvable parts: the outcome is an algorithm similar to (3.1),

Magnus integrators
A more general procedure to construct higher-order methods is to consider Magnus integrators.
Let A : R × R d → R d×d . We consider the equation and suppose that y n ≃ y(t n ). One approach toward the solution of (3.4) is that corresponds to an approximation to the exact solution to order m * . The linear ODE in (3.5) can be solved e.g. by Magnus series expansion [42] (see also [6,29,32] and references therein). For simplicity, we first consider the autonomous case, with A(y), and next we show the results for the non-autonomous problem. For example, for m * = 1 we have y [1] ′ = A(y n )y [1] , therefore y [1] (t) = e (t−tn)A(yn) y n and we obtain the first-order method y n+1 = e hnA(yn) y n . (3.6) Letting m * = 2 leads to a second-order method y [2] ′ = A(e (t−tn)A(yn) y n )y [2] , whose Magnus solution truncated to the first term that provides second order approximations in the time step is We can easily apply these Magnus integrators to non-autonomous problems. The first-order method is, obviously y n+1 = e hnA(tn,yn) y n . (3.9) The trapezoidal second-order method is given by while the corresponding second-order midpoint rule method is Note that if we consider the first order approximation to e hnA(tn,yn) or e 1 2 hnA(tnyn) given by  12) or and still preserve mass and positivity as well as the second order accuracy. We can either compute the exponentials to high accuracy, or to look for cheaper approximations that preserve both mass and positivity, this being a problem to be studied further in the future.

Patankar methods
A well-known approach toward preservation of mass and positivity are Runge-Kutta-Patankar methods [14,34,35,46,47]. The idea is to use Runge-Kutta-like methods for production-destruction systems in chemical kinetics, of the form where p k,j (t, y), d k,j (t, y) ≥ 0. The first order Patankar method is given by This method preserves positivity but does not preserve mass. In [14] the authors propose a Modified Patankar Euler scheme (MPE) given by y n+1,k = y n,k + h d j=1 p k,j (t n , y n ) y n+1,j y n,j − d k,j (t n , y n ) y n+1,k y n,k , k = 1, . . . , d, (3.15) which preserves both mass and positivity unconditionally. Note that we can write (3.14) in our graph-Laplacian notation as δ k,j y k where δ k,j = 0 if k = j and δ k,k = 1. Then, (3.15) can be written in matrix form as y n+1 = y n + hA(t n , y n )y n+1 (3.16) that preserves mass (because A(t n , y n ) is graph Laplacian) and as already shown preserves positivity. Note that corresponding to a first order rational approximation to the first order exponential Magnus integrator. The second-order Modified Patankar-Runge-Kutta scheme (MPRK) is given by where D(y n , u) = diag y n,1 u 1 , . . . , y n,d u d and y n = (y n,1 , . . . , y n,d ) ⊤ , u = (u 1 , . . . , u d ) ⊤ . Note that u coincides with u 1 in (3.10) and then the modified Patankar method can be considered as a particular second order approximation to the second order Magnus method (3.10). Obviously, different second order approximations to this exponential or to the method using the midpoint rule (3.11) would lead to different modified second order Patankar methods.
Note that during the integration some of the values u i may approach zero. In this case one may take, for example, yn,i ui = 0 when u i is smaller than a given tolerance. Some caution is required if any component of the solution is very close to zero and suddenly grows, as it happens with some of the numerical examples we will consider. This method has shown a good performance on stiff problems [13]. Higher order modified Patankar methods have also been recently obtained in the literature [20,35,46] and it would be interesting to find if there is any connection with our exponential integrators.
There are other families of methods which consider some kind of adaptive time steps which depend on the phase space and the time step which allows to preserve positivity as well as the linear invariants [2,11,44] but they are not considered in this work and a proper study of their performance with respect to the new methods is left for future research.

Higher order methods
Continuing in this vain, A(e (τ −tn)A(yn) y n ) dτ y n y [3] = A 2 (t)y [3] and a fourth-order Magnus reads (this is a 4th-order approximation to y [3] which is a third order approximation to the exact solution, so the methods will be of order three) The temptation is now to discretise using standard Magnus quadrature at Gauss-Legendre points but this does not work because the definition of A 2 itself contains an integral. Moreover, the critical issue is the dependence of A 2 on t, not on y n . We approximate The simplest solution is to approximate that integral also by two-point Gauss-Legendre (note that the interval of integration in the inner integral is of length ( 1 2 − √ 3 6 )h n and we need to adjust quadrature points), whereby Brief explanation: the first integral is in the interval [t n , t n +( 6 ) = 1 6 . Thus, altogether we need three function evaluations, one more than standard Magnus. Note moreover that the integration in A 1 is explicit, A 1 (t) = A(e (t−tn)A(yn) ).
We observe that B i , i = 1, 2, are graph Laplacians, but this need not be the case for their commutator [B 1 , B 2 ]. This problem can be bypassed using commutator-free Magnus integrators [1,9].
Commutator-free Magnus integrators. We describe briefly, using an example, the construction of commutation-free integrators, based upon the work of [9]. We approximate the solution across a single time step by 3 and the algorithm is given by This is a seven-exponential method that might be useful when highly accurate results are desired and the cost of each exponential is not excessive. It preserves positivity for moderately stiff problems since it is conditionally positivity preserving. If y n 0 then it is easy to see that x i 0, i = 1, 2, 3, 4, 5 since A 1,1 , A 1,2 , A 1,3 , B 1 , B 2 ∈ L d . However, positivity is guaranteed as long as the matrices αB 2 + βB 1 , βB 2 + αB 1 are graph Laplacians [37]. Unfortunately, β < 0 but α/|β| = 7 + 4 √ 3 ≃ 14, and unless B 1 , B 2 drastically change in a short time interval or their sparsity structure is 'unlucky', their linear combinations are likely to inherit graph-Laplacian structure. In other words, while preservation of graph Laplacians for this third-order method is not assured, it is highly likely in practice.
Finally, we present this Magnus integrator to be used on non-autonomous problems. A third-order commutator-free method can be obtained following the same approximations as previously and taking, for example, the midpoint rule when approximating the intermediate integrals that ensure the third order of accuracy for the method, resulting in the following algorithm 12 )h n , y n ), 4. Hidden graph Laplacian structures for polynomial ODEs 4.1. The recovery of graph Laplacian structure Given an ODE system of the form with suitable initial conditions y(0) 0, 1 ⊤ y(0) = 1, we seek conditions so that it can be written in the form (1.2), where the matrix A(y) is a graph Laplacian, namely that for every nonnegative y such that 1 ⊤ y = 1 it is true that A k,k (y) ≤ 0 and A k,ℓ (y) ≥ 0, ℓ = k. Moreover, we seek constructive means of deriving such a matrix A, given (4.1).
Our first observation is that the representation of (4.1) in the form (1.2) is additive, in the sense that if we can do so for two different right-hand sides of (4.1), we can do so for their sum. By the same token, if we can do so separately for the first sum and the second, double sum in (4.1), all we need is simply add the two representations. The first sum is trivial and corresponds to the constant-matrix representation y ′ = By, where B = (b ℓ k ) is a graph Laplacian. Consequently, the task at hand reduces to the derivation of a representation (1.2) of the system a ℓ,i k y ℓ y i , k = 1, . . . , d.
With greater generality, we may just as well consider the multinomial ODE system with initial conditions y(0) = y 0 0, 1 ⊤ y 0 = 1. Again, the challenge is to write it in the form (1.2) with a graph Laplacian A(y) and, again, we can use the same argument to split the task at hand into a sum of homogeneous problems of the form for j = 2, . . . , m -the case j = 1 is trivial. The problem, though, is that (4.2) can be written in the form (1.2) in a multitude of ways -indeed, even the coefficients a ℓ k are not unique. This can be seen in the simplest nontrivial case, d = 2 and j = 2: y ′ 1 = a 1,1 1 y 2 1 + (a 1,2 1 + a 2,1 1 )y 1 y 2 + a 2,2 1 y 2 2 , y ′ 2 = a 1,1 2 y 2 1 + (a 1,2 2 + a 2,1 2 )y 1 y 2 + a 2,2 2 y 2 2 .
Six equalities (inclusive of (4.3)) and four inequalities for eight variables: impossible in some configurations, while other configurations lead to an infinity of solutions. Henceforth we let e i stand for the ith unit vector.
Note that this cannot occur for quadratic equations (4.4) because, once A k,ℓ (y) is a multilinear function of y 0, it is a graph Laplacian only if all off-diagonal coefficients are nonnegative.

Chemical reactions by the Law of Mass Action
An important application are chemical reactions, where the rate of reaction is modelled by the Law of Mass Action. Then the model is a first-order ODE with a multivariate polynomial for the right hand side, so it can be considered an important special case of our framework. Suppose there are N reactions, where the j-th reaction is written in the form Here r i,j , q i,j are integer coefficients, G i , i = 1, 2, . . . , M are symbols for the chemical species, y i denotes the concentration of species i, and k j is the rate constant. The model is the ODE where y ∈ R M , S ∈ R M×N , p ∈ R N , and S i,j = q i,j − r i,j is the matrix of stoichiometric vectors, while is the Law of Mass Action to model the rates of reaction. This is a nonlinear and autonomous differential equation (it would be non-autonomous if the rates k j = k j (t) were time-varying, for example to model fluctuating temperatures). The following theorem shows this model can always be written in the form where the matrix L(y) has the same pattern of signs as a Laplacian, i.e. off-diagonal entries are nonnegative, and negative entries can only appear on the diagonal. Proof. We assume that q i,j , r i,j are non-negative integers, k j is a non-negative real number and y j ≥ 0. Then, a negative coefficient could only appear in the stoichiometric matrix S i,j if r i,j ≥ 1, and this happens in the equation for y ′ i . Since we have p j = k j M k=1,k =i y r k,j k y ri,j i with r i,j ≥ 1, we may allocate this term to the diagonal of the matrix L(y). All other components where r i,j = 0 in the right hand side of the equation for y ′ i have positive coefficients and can be allocated outside the diagonal.
Remark. Note that the matrix L(y) of (4.10) need not be unique, as we previously showed by the example of the Robertsons reaction in (2.4). The theorem shows that we may form L(y) so that it has the right pattern of signs to be a graph Laplacian. Similarly to the remarks following Proposition 1, this ensures positivity of the solutions, and the point we are making here is that the new numerical methods proposed in this paper can be applied, via (4.10), to this big class of important applications. The only difference between (4.10) and the primary focus of this paper in (1.2), is that in (1.2) we additionally assume that 1 ⊤ is in the left null space of A(y), but that does not prevent us from applying the numerical schemes proposed in this paper, and they will preserve positivity as required. (Although there may be issues with other conservation laws, as we show in the autonomous oscillations example (2.8), and our atmospheric chemistry example (5.1).)
We will also consider, for comparison, the following more conventional numerical solvers: • Euler: The first-order explicit Euler method; • RK4: The 4-stage fourth-order explicit RK method (as a reference method to compare); • ROS4: The 4-stage fourth-order Rosenbrock method with coefficients used by default in [24]. • MP2: The second order Modified Patankar method.

Example 1: The SIDARTHE mathematical model
We first consider a generalised SIR model (SIDARTHE) that has been used to model the evolution of the Cov-SARS-2 epidemic in Italy [21]. That model can also be used for any other country with appropriate data or it can be even extended e.g. to age-dependent variables.
The SIDARTHE dynamical system [21] consists of eight ordinary differential equations, describing the evolution of the population in each stage over time. The equations can be written in the form Notice that since the vector field is not a smooth function (it is piecewise constant) the numerical methods deteriorate down to order one. However, a more realistic model should consider b(t) as a smooth time-dependent function, and in this case the order of the methods is recovered.
For simplicity, we take the same initial values for b and the same initial conditions y 0 as in [21], but we take b constant for a longer period, from day 1 to 20.
We observed that the model is very sensitive to the parameter associated to the first component  Next, we take r = 1.5, corresponding to a moderately stiff problem (S(20) still has not dramatically decreased) and we compute the 2-norm error of the solution y(20) versus the time step for the new methods as well as for the explicit Euler method that was used in [21]. The results are displayed in Figure 5.2 (left) where the order of the methods is clearly visible from the slopes of the curves.
The new methods require to compute matrix exponentials and this can be computationally costly in some cases. It is thus interesting to study if it is possible to replace the exact exponential of matrices by cheaper approximations while still preserving positivity. This is not a very stiff problem and we have repeated the same numerical experiments while replacing each exponential by the second-order diagonal Padé approximation. In order to preserve positivity, we proceed as follows, given A =Ã + a * I whereÃ O, we consider the following approximation to the exponential e tA = e ta * e tÃ ≃ e ta * 1 + 1 2 tÃ 1 − 1 2 tÃ .
Note that, since 1 ⊤Ã = −a * , we have and mass is not preserved. This can be fixed, for example, if we also approximate the scalar function e ta * by the second-order diagonal Padé approximation, so and this approach preserves norm and positivity in the stability region.
The results are shown in Figure 5.2 (right). We observe that the schemes maintain their accuracy while being considerably cheaper. The third-order method EM3 exhibits second order accuracy (due to the second order Padé approximation) but this occurs only at higher accuracies.
For clarity in the presentation, the results for MP2 are not shown but, as expected they are slightly worse but close to the results given by EM2.
Unfortunately, this is not the case if we repeat the numerical experiment with the very stiff problem of Robertson's reaction. Once higher-order approximations to the exponential are used, positivity is not guaranteed. Not all higher-order Padé approximations preserve positivity, unlike the second order one, and this deserves further investigation.

Example 2: Robertson's reaction.
Let us now consider the Robertson's reaction written in the form (2.4) with initial conditions y 0 = [1, 0, 0] ⊤ and time interval t ∈ [0, 0.3] as in [24] (p. 57). We numerically solve the problem repeatedly using different values for the time step and compute the 2-norm error of the solution at the final time. Here, we compare with the 'exact' solution that is computed numerically with sufficiently high accuracy.
Notice that this is a very stiff problem that turns into a non-stiff problem if one applies an appropriate time transformation which can be integrated with a constant time step (in the fictitious time) by methods for non-stiff problems. This is basically the case studied in [14] with time step h n = 1.8 n ×h 0 and initial time step h 0 = 10 −6 that allows to integrate for the interval t ∈ [0, 10 11 ] with a very small number of time steps, but the details in the reaction at the very beginning can be lost. for all time steps considered). Note the relatively high accuracy provided by the new schemes even when considering large time steps. The best method among the proposed schemes depends on the desired accuracy where the computational cost has to be taken into account. As in the previous example, the results for MP2, not shown, are slightly worse but close to the results given by EM2.
We have repeated the same numerical experiments using only the new exponential methods, but applied to the equations as given in (2.5), i.e. the same problem but written in a different way such that the matrix is no longer graph Laplacian. Figure 5.2 (right) shows the results obtained. We filled a relevant circle when, during the numerical integration, a negative solution was obtained on any of the components. For small time steps the performance is quite similar (and the performance for EM1 is actually somewhat better) but the errors grow faster for large time steps (lower accuracies) and, even worse, negative solutions do occur.
This is a non-autonomous systems that can be written in the form y ′ = A(t, y)y with A(t, y) an explicitly time-dependent graph Laplacian matrix. We can write the vector field in terms of the production and destruction parts where P (t, y), D(t, y)y are non-negative. While the diagonal matrix D is unique in this case, we can write P (t, y) = A P (t, y)y in many different ways for the matrix A P . We have considered the following choice (other choices of A P can be considered) for A, This problem has two linear mass conservation laws, the number of atoms of oxygen and nitrogen, respectively. Given and both mass conservations cannot be simultaneously preserved by our schemes. We have to decide how to choose A P to optimise the performance of our methods: this is typical to geometric numerical integration of differential equations with multiple invariants. For this particular choice we have and then, in general, w ⊤ 1 y(t) = const. However, a good choice for A P can provide solutions where this quantity is preserved to very high accuracy.
We have observed that y 1 , y 2 and y 5 take very small, but positive, values (say 10 −200 or smaller) along the integration (standard methods usually provide negative values). In that case, measuring relative error is not appropriate for these components. Figure 5.4 shows the evolution of the concentration of the different species in a logarithmic scale. Negative values in this plot correspond to having no particles.
We have repeated the numerical experiments, integrating for just one hour (instead of 72 hours) and measured the two-norm relative error for the vector with componentsỹ = [y 3 , y 4 , y 5 , y 6 ] since at the final time y 1 and y 2 vanish. The reference solution is obtained numerically using the third-order method and a sufficiently small time step. Figure 5.5 (right panel) shows the results obtained where we can observe the order of convergence of each method for this non-autonomous problem. Figure 5.5 (left panel) shows the error in the preservation of the quantities I 1 = w ⊤ 1 y(t f ) (curves with circles) and I 2 = w ⊤ 2 y(t f ) (curves with stars). Remarkably, the error committed for I 2 is orders of magnitude smaller than the error in the actual solution, as seen the left panel in Fig. 5.5! We observe that MP2, as in the previous examples, provide slightly worse results than EM2 when accurate results are desired, but the error considerably grows for large time steps (the approximation to the exponential in this case is not accurate). Surprisingly, it provides more accurate results in the exact preservation of I 2 and for  all time steps positivity was preserved even if this property was not guaranteed for the method MP2 since A is not graph Laplacian. Were one to prove that the matrix A has no eigenvalues with positive real part then I − tA would be an M -matrix and positivity would be guaranteed, and this deserves further investigation.

Example 4: The MAPK cascade
Finally, we consider the model of [23] (Table 3, Fig 3, equations (12)- (17)), which is closely related to the MAPK cascade, given in (2.8) with values therein for the parameters and initial conditions. The solution for each component is shown in the left panel of Figure 5.6 for the time interval t ∈ [0, 200] (the initial conditions clearly identify each curve) where we observe that, after a transition period, the solution turns nearly periodic. Next, we have numerically solved the problem for α = 1 using the new exponential methods using different values of the time step and measured the two-norm relative error in the vector solution at the final time. The right panel of Figure 5.6 shows the results obtained.

Conclusions
Preservation of inequalities is considerably more challenging than the recovery of 'equality invariants' under discretisation. Thus, while numerous geometric numerical integration algorithms present us with a wide range of highly effective means to recover conservation laws, often of crucial importance in applications, this is not the case with inequalities and, of particular importance to us, nonnegativity of solutions. The importance of the latter in applications is clear -the number of chemical species cannot be negative, temperature cannot be less that 0 • K, the number of infected people I(t) cannot (sadly) be negative -yet we cannot be assured that computed ODE solutions remain nonnegative in this setting unless the order is unacceptably low. As aforementioned, the subject has already received significant attention and led to the development of Patankar-type methods [5,14,34,47]. In this paper we have developed a framework allowing us to use higher-order methods in this setting. While this framework is by no means final and many challenges remain, it represents in our view useful contribution to a different kind of geometric numerical integration, one dealing with preservation of inequalities.
An outstanding challenge is to approximate the exponential of matrices by diagonal Padé approximants or by other means (e.g. Krylov-subspace methods) to reduce the cost of the algorithms for large ODE systems while still preserving positivity. Another is to explore the scope of methods, like the commutator-free Magnus integrators (3.18), which almost preserve positivity and formulate 'almost preservation' in more precise terms.
Yet, perhaps the most interesting challenge is to explore the surprising success of 'almost positivity-preserving' methods, e.g. the fourth-order commutator-free Magnus method, in the examples in this paper. Recall that classical ODE solvers that preserve positivity are restricted to order one [10], while in this paper we have introduced second-order positivity-preserving methods in the non-classical class of Magnus integrators, and other high-order methods have been introduced elsewhere, in particular modified Patankar methods. It is natural to formulate the conjecture that this is as much as can be done within the realm of such methods, but equally fascinating is the remarkable almost-preservation of positivity or mass (at any rate in the examples of this paper) by some higher-order methods. For example, Figure 5.5 (left) is concerned with two conservation laws in a stratospheric reaction: one is preserved correctly, up to roundoff error, while the other is preserved to much higher accuracy than the error committed (cf. Fig. 5.5 right) in the solution itself. We look forward to an explanation.