Consistency and convergence for a family of finite volume discretizations of the Fokker--Planck operator

We introduce a family of various finite volume discretization schemes for the Fokker--Planck operator, which are characterized by different weight functions on the edges. This family particularly includes the well-established Scharfetter--Gummel discretization as well as the recently developed square-root approximation (SQRA) scheme. We motivate this family of discretizations both from the numerical and the modeling point of view and provide a uniform consistency and error analysis. Our main results state that the convergence order primarily depends on the quality of the mesh and in second place on the quality of the weights. We show by numerical experiments that for small gradients the choice of the optimal representative of the discretization family is highly non-trivial while for large gradients the Scharfetter--Gummel scheme stands out compared to the others.


Introduction
The Fokker-Planck equation (FPE), also known as Smoluchowski equation or Kolmogorov forward equation, is one of the major equations in theoretical physics and applied mathematics. It describes the time evolution of the probability density function of a particle in an external force field (e.g., fluctuating forces as in Brownian motion). The equation can be generalized to other contexts and observables and has been employed in a broad range of applications, including physical chemistry, protein synthesis, plasma physics and semiconductor device simulation. Thus, there is a huge interest in the development of efficient and robust numerical methods. In the context of finite volume (FV) methods, the central objective is a robust and accurate discretization of the (particle or probability) flux implied by the FPE.
A particularly important discretization scheme for the flux was derived by Scharfetter and Gummel [SG69] in the context of the drift-diffusion model for electronic charge carrier transport in bipolar semiconductor devices [vR50]. The typically exponentially varying carrier densities at p-n junctions lead to unphysical results (spurious oscillations), if the flux is discretized in a naive way using standard finite difference schemes [MW94]. The problem was overcome by considering the flux expression as a one-dimensional boundary value problem along each edge between adjacent mesh nodes. The resulting Scharfetter-Gummel (SG) scheme provides a robust discretization of the flux as it asymptotically approaches the numerically stable discretizations in the drift-(upwind scheme) and diffusiondominated (central finite difference scheme) limits. The SG-scheme and its several generalizations to more complex physical problem settings are nowadays widely used in semiconductor device simulation [Mar86,FRD + 17] and have been extensively studied in the literature [BMP89,EFG06,FKF17,Kan20]. The SG-scheme is also known as exponential fitting scheme and was independently discovered by Allan and Southwell [AS55] and Il'in [Il'69] in different contexts.
Recently, an alternative flux discretization method, called square-root approximation (SQRA) scheme, has been derived explicitly for high dimensional problems. The original derivation in [LFW13] aims at applications in molecular dynamics and is based on Markov state models. However, it can also be obtained from a maximum entropy path principle [DJSD15] and from discretizing the Jordan-Kinderlehrer-Otto variational formulation of the FPE [Mie13a]. In Section 3.3, we provide a derivation of SQRA scheme, which is motivated from the theory of gradient flows. In contrast to the SG-scheme, the SQRA is very recent and only sparsely investigated. The only contributions on the convergence seem to be [Mie13a] in 1D, [DHWK] (formally, rectangular meshes) and [Hei18] using G-convergence.
The SG and the SQRA schemes both turn out to be special cases of a family of discretization schemes based on weighted Stolarsky means, see Section 3.2. This family is very rich and allows for a general convergence and consistency analysis, which we carry out in Sections 4-5. Interestingly, there seems to be no previous results in the literature.
1.1. The FPE and the SG and SQRA discretization schemes. In this work, we consider the stationary Fokker-Planck equation , where κ > 0 is a (possibly space-dependent) diffusion coefficient and V ∶ Ω → R is a given potential. The flux J consists of a diffusive part κ∇u and a drift part κu∇V , which compensate for the stationary density π = e −V (also known as the Boltzmann distribution) as J(e −V , V ) = 0. This reflects the principle of detailed balance in the thermodynamic equilibrium. The right-hand side f describes possible sink or source terms. The SG and the SQRA schemes of the Fokker-Planck operator div J(u, V ) that are considered below are given in the form where ∑ j∶ j∼i indicates a sum over all cells adjacent to the i-th cell of the mesh, m ij is the mass of the interface between the i-th and j-th cell, h ij is the distance between the corresponding nodes and κ ij is the discretized diffusion coefficient κ. We are particularly interested in the two cases (1.4) with either the Bernoulli function B 1 (for SG) or with the SQRA-coefficient B 2 . The schemes are derived under the assumption of constant flux, diffusion constant and potential gradient along the respective edges.
In the pure diffusion regime, i.e., for V i − V j → 0, the Bernoulli function provides B 1 (V i − V j ) → 1, such that the SG scheme approaches a discrete analogue of the diffusive part of the continuous flux: J ij = κ ij (u i − u j ) h ij . In the drift-dominated regime, i.e., for V j − V i → ±∞, the asymptotics of B 1 recover the upwind scheme which is a robust discretization of the drift part of the flux, where the density u is evaluated in the donor cell of the flux. Hence, the Bernoulli function B 1 interpolates between the appropriate discretizations for the drift-and diffusion-dominated limits, which is why the SG scheme is the preferred FV scheme for Fokker-Planck type operators. Indeed, the SQRA scheme is consistent with the diffusive limit, but is less accurate than the SG scheme in the case of strong gradients ∇V .
1.2. The Stolarsky mean approximation schemes . In this work, we investigate the relative L 2 -distance between the discrete SQRA and SG solutions on the same mesh and the order of convergence of the SQRA scheme, which was an open problem. It turns out that both methods are members of a broad family of finite volume discretizations that stem from the weighted Stolarsky means see Section 3.2. We benefit from the general structure of these schemes and prove order of convergence on consistent meshes in the sense of the recent work [DPD18]. We will see that the error naturally splits into the consistency error for the discretization of the Laplace operator plus an error which is due to the discretization of the stationary solution π and the Stolarsky mean, see Theorem 5.4.
We will demonstrate below that the Stolarsky discretization schemes for (1.1) read is the integral of f over the i-th cell and S ij = S α,β (π i , π j ) is a Stolarsky mean of π i and π j . We sometimes refer to the general form (1.6) as discrete FPE.
The Stolarsky means S α,β generalize Hlder means and other f -means (see Table 2). An interesting aspect of the above representation is that all these schemes preserve positivity with the discrete linear operator being an M -matrix. Furthermore, with the relative density U = u π we arrive at − j∶ j∼i which is a discretization of the elliptic equation where the discrete Fokker-Planck operator becomes a purely diffusive second order operator in U . Furthermore, if κ is a symmetric strictly positive definite uniformly elliptic matrix, this operator is also symmetric strictly positive definite and uniformly elliptic.
In the latter setting, we can thus rule out the occurrence of spurious oscillations in our discretization.
Although we treat the Stolarsky means as an explicit example, note that the main theorems also hold for other smooth means.
1.3. Major contributions of this work. Since we look at the FV discretization of the FPE from a very broad point of view, we summarize our major findings.
• We provide a derivation of the general Stolarsky mean FV discretization in Section 3.2. • We discuss the gradient structure of the discretization schemes in view of the natural gradient structure of the FPE in Section 3.3. • We provide order of convergence of the schemes as the fineness of the discretization tends to zero. In particular we show that the order of convergence is mainly determined by two independent parts: the consistency (Def. 2.7) of the mesh (Section 5.1) and an error due to the discretization of π along the edge by the means S * (π i , π j ). that the order of convergence strongly depends on the constant α + β, where α and β are the Stolarsky coefficients in S ij = S α,β (π i , π j ) (Corollary 4.3). that the SG coefficients are to be preferred in regions of strong gradient ∇V (Section 5.2).
1.4. Outlook. The results of this work suggest to search for "optimal" parameters α and β in the choice of the Stolarsky mean in order to reduce the error of the approximation as much as possible. However, from an analytical point of view, the quest for such optimal α and β is quite challenging. Moreover, since the optimal choice might vary locally, depending on the local properties of the potential V , we suggest to implement a learning algorithm that provides suitable parameters α and β depending on the local structure of V and the mesh.
1.5. Outline of this work. After some preliminaries regarding notation and a priori estimates in Section 2, we will recall the classical derivation of the SG scheme in Section 3.1 and discuss its formal relation to SQRA. We will then provide a derivation of SQRA from physical principles in Section 3.3, based on the Jordan-Kinderlehrer-Otto [JKO98] formulation of the FPE. In Section 3.2, we show that SG and SQRA are elements of a huge family of discretization schemes (1.6). Section 5 provides the error analysis and estimates for the consistency and the order of convergence. We distinguish the cases of small and large gradients and have a particular look at cubic meshes.
Finally, we show hat the optimal choice of S * depends on V and f but is not unique. If S α,β denotes one of the Stolarsky means, we will prove in Section 4 that the Stolarsky means satisfying α+β = const show similar quantitative convergence behavior as suggested in Corollary 4.3. Finally, this result is illustrated in Section 6 by numerical simulations.

Preliminaries and notation
We collect some concepts and notation, which will frequently be used in this work.
2.1. The Mesh. For a subset A ⊂ R d , A is the closure of A.
Definition 2.1. Let Ω ⊂ R d be a polygonal domain. A finite volume mesh of Ω is a triangulation T = (V, E, P) consisting of a family of control volumes V ∶= {Ω i , i = 1, . . . , N } which are convex polytope cells, a family of (d − 1)-dimensional interfaces .N is such that x i = x j if i = j and the straight line D ij going through x i and x j is orthogonal to σ ij . and admissible if (v) For any boundary interface σ ∈ E ∂ ∩ E i it holds x i ∈ σ and for D i,σ the line through x i orthogonal to σ it holds that D i,σ ∩ σ = ∅ and let y σ ∶= D i,σ ∩ σ.
Property (iv) is assumed in [GHV00] in order to prove a strong form of consistency in the sense of Definition 2.10 below. It is satisfied for example for Voronoi discretizations.
We write m i for the volume of Ω i and for σ ∈ E we denote m σ its (d − 1)-dimensional mass. In case σ ij ∈ E i ∩ E j we write m ij ∶= m σ ij . For the sake of simplicity, we consider P ∶= (x i ) i=1,...,N and P ∶=P ∪ {y σ ∶ σ ∈ E ∂ , according to (v)}. We extend the enumeration ofP to P = (x j ) j=1,...,Ñ and write We further call P * ∶= {u ∶ P → R} ,P * ∶= u ∶P → R , and E * ∶= {w ∶ E → R} the discrete functions from P resp.P resp. E to R. For w ∈ E * we write w ij ∶= w(σ) if σ ij = σ. Then for fixed i the expression j∶ i∼j is the sum over all edges. Moreover, we define the diameter of a triangulation T as The identity will frequently be used throughout this paper, where we often encounter the case A ij = α ij U i with α ij = −α ji : Formula (2.1) in particular allows for a discrete integration by parts: On a given mesh T = (V, E, P), we consider the linear discrete operator L T κ ∶ P * → P * , which is defined by a family of non-negative weights κ ∶ E → R and acts on functions u ∈ P * via While (2.4) is very general, it is shown in [GHV00], Lemma 3.3, that the property (iv) of Definition 2.1 comes up with some special consistency properties for the choice of (2.5) where d i,ij and d j,ij are the distances between σ ij and x i and x j respectively and averaged diffusion coefficient is defined by κ i = 1 m i ∫ Ω i κ(x)dx. Lemma 2.2 (A consistency lemma, [GHV00]). Let the T = (V, E, P) satisfy Definition 2.1 (i)-(v) and let d ∈ {2, 3} and let h ij be uniformly bounded from above and from below. Then for every u ∈ H 2 (Ω) it holds Lemma 2.2 was one of the motivations to provide a more general and powerful concept of consistency in [DPD18], as we will discuss in Section 2.5 2.2. Existence and a priori estimates. From the standard theory of elliptic systems ( [Eva98] Chapter 6), we have the following theorem.
In what follows, we frequently use the following transformations in (1.1) and (1.6): we define the relative densities U = u π and U T i = u T i π i to find If κ and π are non-degenerate (in the sense of π > c > 0 and ξ ⋅ κξ > c ξ 2 ), the left hand side of (2.7) defines a strongly elliptic operator on the finite volume space L 2 (P) and hence there exists a unique solution U T . Concerning the right hand side, using (2.7) as the discretization of (2.6), one natural choice for Having shown the existence of solutions to (2.6) and (2.7), we recall the derivation of some natural a priori estimates for both the continuous Fokker-Planck equation and the discretization. Continuous FPE. Let u, resp. U = u π, be a solution of the stationary Fokker-Planck equation (2.6) with Dirichlet boundary conditions. Testing with U , we get (assuming homogeneous Dirichlet boundary conditions and exploiting thus the Poincar inequality), that Furthermore, the standard theory of elliptic equations (e.g., [Eva98]) yields that U H 2 (Ω) ≤ C f L 2 , where C depends on the C 1 -norm of κπ and the Poincar-constant. Discrete FPE. Let U T i be a solution of (2.7) with f i = m ifi = ∫ Ω i f dx (as specified in the Tab. 1), i.e., Then, multiplying with U T i we get Summing over all x i ∈ P and using (2.3), we conclude with help of the discrete Poincar inequality (see Theorem 2.5 below) Additionally, one gets

2.3.
Fluxes and L 2 -spaces. In order to derive and formulate variational consistence errors for the discrete FPE (2.7), we introduce the discrete fluxes (2.10) In particular, if S ij = √ π i π j we get the flux of the SQRA J SQRA Note that J ij U is the spatial average of J(U ) = −κπ∇U on σ ij . The quantity J S ij U T can indeed be considered as a flux in the sense that it will be shown to approximate J ij , S ij is a discrete approximation of π σ ij , κ ij is a discrete approximation of κ σ ij and 1 While former approaches focus on the rate of convergence of 1 we additionally follow the approach of [DPD18] applied to U and are interested in the rate of convergence of J S ij U T → J(U ), which is an indirect approach to the original problem as this rate of convergence is directly related to 1 In view of the natural norms for the variational consistency (see (2.17)-(2.19)), we introduce the following With all the above notations, our a priori estimates (2.8) and (2.9) now read Assuming that the diffusion coefficient is bounded, i.e. κ * ≥ κ ≥ κ * , we further get . Remark 2.4 (Naturalness of norms). Let us discuss why these norms are natural to consider. The left norms in (2.11) can be interpreted as the Euclidean L 2 -norms on Ω, P and E, while the right norms are the natural norms for the study of the Fokker-Planck equation as they are weighted with the inverse of the Boltzmann distribution π, resp. π i . Note that assuming V is bounded from above and below, the L 2 -norms ⋅ L 2 π (Ω) and ⋅ L 2 (Ω) are equivalent and the same holds true for the two norms in the discrete setting. Given a discretization T , the linear map defines an integral on Ω w.r.t. a discrete measure µ T having the property that for every bounded measurable set with L d (∂A) = 0. The norm U 2 L 2 (P) is simply the L 2 -norm based on the measure µ T .
Similar considerations work also for the norm on E * . The norm ⋅ 2 L 2 (E) is given via a measureμ T having the propertỹ with the property thatμ T → d ⋅ L d vaguely: every Voronoi cell Ω i consists of disjoint cones with mass 1 d m ij h ij , where one has to account for all cones with j ∼ i. In particular, we obtainμ T (A) ≈ d ⋅ L(A) for Lipschitz domains -an estimate which then becomes precise in the limit. Without going into details, let us mention that heuristically the prefactor d balances the fact that J ij ≈ For the particular case of a rectangular mesh, this is straight forward to verify.

Poincar inequalities.
In order to derive the a priori estimates in Section 2.2 we need to exploit (discrete) Poincar inequalities to estimate u L 2 (Ω) by ∇u L 2 (Ω) or u T In particular, we use the following theorem.
Then for every u ∈ L 2 (P) and for every η ∈ R d it holds and particularly Proof. This follows from Lemma A.1 with C # ≤ diamΩ h 0 and the choice η > diamΩ.
2.5. Consistency and inf-sup stability. Results such as Lemma 2.2 motivated the authors of the recent paper [DPD18] to define the concepts of consistency and inf-sup stability as discussed in the following. For readability, we will restrict the general framework of [DPD18] to cell-centered finite volume schemes and refer to general concepts only as far as needed.
Definition 2.6 (inf-sup stability). A bilinear form a T on L 2 (P) for a given mesh T = (V, E, P) is called inf-sup stable with respect to a norm ⋅ H T on a subspace of H T ⊂ L 2 (P) if there exists γ > 0 such that Usually, and particularly in our setting, a T is the discretization of a continuous bilinear form, say a (u, v) = ∫ Ω ∇u ⋅ (κ∇v). We are interested in the problem where l ∶ H 1 0 (Ω) → R is a continuous linear map, and in the convergence of the solutions of the discrete problems Definition 2.7 (Consistency). Let B ⊂ H 1 0 (Ω) be a continuously embedded Banach subspace and for given T = (V, E, P) consider continuous linear operators R T ∶ B → L 2 (P) with uniform bound. Let u be the solution to the linear equation (2.13) and let l T ∶ L 2 (P) → R be a family of linear functionals. The variational consistency error is the Let now a family (T , a T , l T ) with diamT → 0 be given and consider the corresponding family of linear discrete problems (2.14). We say that consistency holds if . Consistency measures the rate at which R T u − u T → 0 and particularly provides a positive answer to the question whether the numerical scheme converges, at least if the solution of (2.13) lies in B. This is formulated in Theorem 10 of [DPD18].
Theorem 2.9 (Theorem 10, [DPD18]). Using the above notation, it holds ) is a norm on the discrete gradients. By the discrete Poincar inequality, (2.15) also implies an convergence estimate for the discrete solutions itself. The theorem can be understood as a requirement on the regularity of u, resp. the right hand side of (2.13).
The combination of the proofs of Theorems 27 and 33 of [DPD18] shows that the variational consistency error for Introducing on L 2 (P) the H T -norm given by we find In view of the Poincar inequality in Theorem 2.5 the norm ⋅ L 2 (P) is bounded by ⋅ H T ,κ in case κ is uniformly bounded away from 0. The right hand side of equation (2.18) gives rise to the definition of a "dual" H T -norm which we denote With regard to (2.15) and Lemma 2.2, the above considerations motivate the following definition.
Hence, we immediately obtain the following.
Proposition 2.11. Under the assumptions of Lemma 2.2 and assuming h ij ≤ Ch for some constant C > 0 the mesh is ϕ-consistent with ϕ(h) = h. We say that the mesh is h-consistent.
Then we want to estimate the terms In fact the following calculations are quite standard and, therefore, we shorten our considerations. Now, we want to estimate m ijÛ Subtracting both equations, we end up withÛ j −Û i h = ∇U ⋅ ν ij + O(h 2 ), and hence, Theorem 2.12 (Consistency on cubic meshes). Let We will consider a general κ in Section 5.3 below.

Derivation of the methods and formal comparison
In this section, we recall the original derivation of the Scharfetter-Gummel scheme and then show that both the SG and the SQRA scheme are members of a huge family of discretization schemes. Finally, we provide a physically motivated derivation of the SQRA scheme.
One dimensional case. The Scharfetter-Gummel scheme for the discrete flux on the interval [0, h] is derived under the assumption of constant flux J, force q = −dV dx and diffusion coefficient κ on [0, h]. We consider the two-point boundary value problem where q ∶ [0, h] → R describes the constant force inducing the drift current (i.e., the potential V is a linear function on [0, h]) and κ > 0 is a constant, positive diffusion coefficient. The problem has an elementary solution of the form Using the second boundary value u(h) = u h , we get an explicit form for the flux Higher dimensional case. In higher dimensions, the flux between two neighboring cells j ∼ i is discretized along the same lines as in the one dimensional case (i.e., assumption of constant force, flux and diffusion constant along each edge of the mesh). We project the flux J on the edge where the assumption of a linear affine potential (inducing the constant force q = −∇V ) where 0 ≤ s ≤ 1 parametrizes the position on the edge. Then, with du ds = h ij ⋅ ∇u and h ij ⋅ J = h ij J ij , we arrive at the two-point boundary problem which is equivalent to the one dimensional problem. (3.1). The solution reads which can also be written as Remark 3.1. In case of a sufficiently fine discretization that accurately takes into account the structure of V , we can expect that We then infer which is the flux discretization according to the SQRA scheme. This becomes more clear in the following sections.
Remark 3.2 (Motivation of discretized diffusion coefficient κ ij ). Considering inhomogeneous media, where the diffusion coefficient is not necessarily constant, a suitable discretization for the diffusion κ ij is needed. Let us neglect for a moment the drift qu and assume that we have κ i on Ω i around x i and κ j on Ω j around x j . We compute the flux J ij from x i to x j . Let x 0 be the intersection of h ij and σ ij , and moreover, and hence the weighted harmonic mean Note that 1 κ is the mobility and hence we conclude 1 i.e. the arithmetic mean of the mobilities 1 κ i and 1 κ j .
Interestingly, the harmonic mean is yet another special case of Stolarsky means (see below) for α = −2 and β = −1. Thus classical FV discretizations of classical elliptic problems based on discretizations of −∆ are another particular case of our general study.

3.2.
A family of discretization schemes. Repeating the above calculations from a different point of view reveals some additional structure of the Scharfetter-Gummel scheme and puts it into a broader context.
Taking into account the special structure of the Fokker-Planck equation in (3.1), we for a general potential V ∶ [0, h] → R not necessarily assumed to be affine. The general solution reads The flux can be computed explicitly from the assumption J = const. and setting x = h in the above formula. This yields , which clearly determines the constant flux along the edge. In particular, assuming that V is affine, i.e.
, which is the mean corresponding to the Scharfetter-Gummel discretization. However, a potential can also be approximated not by piecewise affine interpolation but in other ways, resulting in different means π mean . We provide an example of such an approximation for the SQRA in the Appendix A.4.
We aim to express π mean by means of the values π 0 and π h at the boundaries. The choice of this average is non-trivial and determines the quality of the discretization scheme, as we will see below. In the present work, we focus on the (weighted) Stolarsky mean, although there are also other means like general f -means (M f for a strictly increasing function f ). The Stolarsky mean has the advantage that it is a closed formula for a broad family of popular means and that its derivatives can be computed explicitly.
An explicit calculation shows that ∂ 2 particularly reproducing the above findings for ∂ 2 x S 0,−1 and ∂ 2 x S −1,1 . With respect to (1.3)-(1.4), we observe that and (1.2) can be brought into the form (1.6), which we equally write as where S * equals either S 0,−1 or S −1,1 . For general means S α,β (x, y), we have the relation S α,β (x, y) = xS α,β (1, y x), such that the weight function for arbitrary parameters α and β reads B α,β (x) = S α,β (1, e −x ) . In particular, it holds for any α and β which guarantees the consistency of the scheme with the thermodynamic equilibrium.
Interestingly, the derivation of the SQRA in Section 2.2 of [LFW13] relies on the assumption that the flux through a FV-interface has to be proportional to u T j π j − u T i π i with the proportionality factor given by a suitable mean of π i and π j . The choice of S −1,1 in [LFW13] seems arbitrary, yet it yields very good results [WE17, FKN + 19, DHWK].
3.3. The Wasserstein gradient structure of the Fokker-Planck operator and the SQRA method. The choice of S * turns out to be crucial for the convergence properties. In this section, we look at physical structures which are desirable to be preserved in the discretization procedure. Our considerations are based on the variational structure of the Fokker-Planck equation. Let us note at this point that a physically reasonable discretization is not necessarily the best from the rate of convergence point of view. Indeed, this last point will be underlined by numerical simulations in Section 6. However, the physical consideration is helpful to understand the family of Stolarsky discretizations from a further, different point of view.
In [JKO98] it was proved that the Fokker-Planck equation (3.9)u = ∇ ⋅ (κ∇u + κu∇V ) has the gradient flow formulationu = ∂ ξ Ψ * (u, −DE(u)) where (3.10) and π = e −V is the stationary solution of (3.9). Indeed, one easily checks that DE(u) = log u + V = log u π and ∂ ξ Ψ * (u, ξ) = −∇ ⋅ (κu∇ξ) such that it formally holds Given a particular partial differential equation, the gradient structure might not be unique. For example, the simple parabolic equation ∂ t u = ∆u can be described by (3.10) with V = 0. But at the same time one might choose E(u) = ∫ u 2 with Ψ * (ξ) = ∫ ∇ξ 2 , which plays a role in phase field modeling (see [HMR11] and references therein) or E(u) = − ∫ log u with Ψ * (ξ) = ∫ u 2 ∇ξ 2 . In view of this observation, one might pose the question about "natural" gradient structures of the discretization schemes. This is reasonable if one believes that discretization schemes should incorporate the underlying physical principles. The energy functional is clearly prescribed by (3.10) with the natural discrete equivalent The discrete linear evolution equation can be expected to be linear. Since we identified the continuous flux to be J = −κπ∇U with U = u π, we expect the form  MPPR17], where a dissipation of cosh-type was appeared in the Large deviation rate functional for a hydrodynamic limit of an interacting particle system. All of them can be written in the abstract form In fact, any positive and convex function ψ * defines a reasonable dissipation functional Ψ * by (3.13) and (3.14). A special case is when choosing for ψ * and exponentially fast growing function ψ * (r) ∶= C * (r) ∶= 2 (cosh(r 2) − 1). Then a ij simplifies to a ij (u, π) = u i u j π i π j , and hence, the square root appears. Choosing S ij = √ π i π j , we end up with a dissipation functional of the form There are (at least) three good reasons why choosing this gradient structure, i.e., modeling fluxes in exponential terms: a historical, a mathematical and a physical: ( we observe that the dissipation mechanism Ψ * is totally independent of the particular form of the energy E, which is determined by the potential V . This is physically understandable, since a change of the energy resulting, e.g., from external fields should not influence the dissipation structure. The same holds for the discretized version (3.15). In fact it was shown in [MS19], that the only discrete gradient structure, where the dissipation does not depend on V resp. π = e −V , is the cosh-gradient structure with the SQRA discretization S ij = S −1,1 (π i , π j ). In particular, this characterizes the SQRA. For convenience, we add a proof for that to the Appendix A.2. We think that these properties distinguish the SQRA, although in the following the convergence proofs do not really rely on the particular discretization weight S ij . Remark 3.3 (Convergence of energy and dissipation functional). Let us finally make some comments on the convergence of E T and Ψ * T given in (3.11) and (3.15) to the continuous analogies E and Ψ * . Γ-convergence can be shown if the fineness of T tends to 0. For the energies it is clear, since u ↦ u log (u π) − u is convex. For the dissipation potentials Ψ * T (u, ξ) we observe the following: For smooth functions u and ξ, we have The considerations from Section 2.3 then yield Ψ * T (u, ξ) ≈ 1 2 ∫ Q u ∇ξ 2 . For quadratic dissipation, qualitative convergence results in 1-D using the underlying gradient structure are obtained in [DL15] looking at energy-dissipation mechanism, and in [GKMP19] proving convergence of the metric.

Comparison of discretization schemes
We mutually compare any two discretization schemes of the form (1.6) in case of Dirichlet boundary conditions. In this case, even though the problem is only defined onP, we can simply sum over all P once we multiplied with a test function that assumes the value 0 at all P P .
Let us recall the formula (2.10) for the fluxes Moreover, let u i = U i π i andũ i =Ũ i π i be the solution of the discrete FPE (1.6) for two different smooth mean coefficients S ij = S(π i , π j ) andS ij =S(π i , π j ) (e.g. once for Scharfetter-Gummel and once for SQRA) such that In order to compare the solutions of (4.1) and (4.2) we take the difference of these two equations and multiply with E i = U i −Ũ i . We obtain Introducing the notation α ik = κ ik m ik h ik and using (2.2) we get Using the notation D ik A = A k − A i for discrete gradients In the case of Stolarsky means the constants are more explicit. We have the following expansion of S ij : writing π ij = 1 2 (π i + π j ), π + = π − = 1 2 (π i − π j ) and π i = π 0 + π + and In case (α + β) = α +β , we obtain S ij −S ij = O (π i − π j ) 3 and hence this yields the following first comparison result: Theorem 4.1. Let T be a mehs with right hand side f ∈ L 2 (P) and let u andũ be a two solution of the discrete FPE for different Stolarsky mean coefficients S ij = S α,β (π i , π j ) andS ij = Sα ,β (π i , π j ) respectively. Then In case (α + β) = α +β we furthermore find We aim to refine the above result to an order of convergence result for J S U − JSŨ .. We introduce the auxiliary smooth meanŜ ik =Ŝ(π i , π k ) and find and using Cauchy-Schwartz inequality, we get Altogether we obtain We make once more use of (4.4) writing C α,β ∶= α+β 24 and exploiting π i = π ij +π ij ( Hence, we conclude the following result. Theorem 4.2. Let T be a mesh with right hand side f ∈ L 2 (P) and let u andũ be two solutions of the discrete FPE for different Stolarsky means S andS. Moreover, letŜ be any Stolarsky mean and assume that either α + β ≠α +β orα +β ≠α +β. Then the solutions u andũ of the discretized FPE satisfy the symmetrized error estimate up to higher order More general, for any mean we have and in particular for Stolarsky means with α + β =α +β =α +β we find the following result: Corollary 4.3. Let T be a mesh with right hand side f ∈ L 2 (P) and let u andũ be two solutions of the discrete FPE for different Stolarsky mean coefficients S ij = S α,β (π, π j ) andS ij = Sα ,β (π, π j ) with α + β =α +β =α +β. Then estimate (4.5) holds. In particular, we find the refined estimate In particular, the last result shows that convergence rates are similar up to order 3 for different α, β which satisfy α + β = const.

Convergence of the discrete FPE
In this section, we derive general estimates for the order of convergence of the Stolarsky FV operators. Throughout this section, we assume that the mesh satisfies the consistency property of Definition 2.10 with a suitable consistency function ϕ ∶ R ≥0 → R ≥0 and discretization operator R T ∶ H 1 (Ω) ⊃ B → L 2 (P). The parameters π i are then given in terms of π i = (R T π) i . We derive consistency errors for U in Section (5.1) and consistency errors for u in Section (5.2).

Error Analysis in U .
In what follows, we assume that the discrete and the continuous solution satisfy Dirichlet conditions. In view of the continuous and the discrete FPE given in the form (2.6) and (2.7) as well as formula (2.16) we observe that the natural variational consistency error for a given Stolarsky mean S takes the form We recall that an estimate for E T ,FPE (U ; ⋅) implies an order of convergence estimate by (2.15). Our main result of this section provides a connection between E T ,FPE (U ; ⋅) and the variational consistency E T (U ; ⋅) (given by (2.16)) of the second order equation Proposition 5.1. Let T = (V, E, P) be a mesh. The variational consistency error E T ,FPE (U ; ⋅) can be estimated by (5.1) Proof. For simplicity, we writeÛ ∶= R T U . We observe that Using the fact that Taking together (5.3)-(5.5) we obtain (5.1).
Lemma 5.2. Assume there exists a constant C > 0 such that for all cells Ω i , Ω j with h i = diamΩ i it holds Then for C 2 -smooth means S Remark 5.3. Note that (5.6) can be easily verified for cubes.
Proof. Observe that The first term can be estimated by π −π i ≤ h i ⋅∇π +O(h 2 i ) and a similar estimate holds for the second term. The last term, assuming that the mean is C 2 -smooth, can be estimated by Using that π i − π j = ∇π ⋅ h ij + O(h ij ) and that S π i +π j 2 , π i +π j 2 = π i +π j 2 , we obtain that π − S ij 2 ≤ O (h 2 i ). In total we obtain Using the above estimates, we can now show the main result of the section.
Theorem 5.4 (Localized order of convergence). Let the mesh T be admissible in sense of Definition 2.1 and consistent in sense of Definition 2.10. Let u ∈ C 2 0 (Ω) be the solution to (1.1). Let f T ∶= R * T f and let u T ∈ S T be the solution to (2.3). Moreover, let κ ≤ κ * , b > 0 and S ∈ C 2 (R ≥0 × R ≥0 ). Then it holds it holds Proof. Inserting estimate (5.7) int to the estimate of the variational consistency, we get Using (2.15) we obtain an estimate for the discretization error in the form Using the consistency assumption on the discretization of the pure elliptic problem we obtain the desired estimate.

Error Analysis in u.
In the following, we will discuss how to derive bounds on the rate of convergence of u instead of U . As a basis for both proofs of this section, we start with the discrete FP operator which we rewrite as We have where we want to estimate the right-hand side.
and hence Since κ ij ≈ κ., Remark 5.6. The calculation (5.9) is an approximation for small values of V j − V i . In the particular case of large discrete gradients a general approximation of is not at hand. However, in the SG case S * = S 0,−1 we observe (compare with (1.5) and (3.6)) introducing Hence we observe that the SG method is particularly suited to minimize the error term for large gradients ∇V .

5.3.
Qualitative comparison on cubic meshes. In view of Section 2.6 we consider a polygonal domain Ω ⊂ R d with d ≤ 3 and a cubic mesh where In fact the following calculations are quite standard and, therefore, we shorten our considerations. We have for x ∈ σ ij Moreover, we have π i − π(x) = ∇π ⋅ (x i − x). The gradient of S is given by (1 2, 1 2) T and hence, we S ij − π(x) = π i +π j −2π(x) 2 + O(h 2 ). We compute the first term in more detail. We have π j − π(x) = ∇π ⋅ (x j − x) and π i − π(x) = ∇π ⋅ (x i − x) and the sum yields the coordinate on the cell surface with respect to the middle pointx = Now we can fix the function s(x) = κ(x)∇U (x) ⋅ ν ij ∇π(x) with respect tox. We have s(x) = s(x)+(x−x)∇s(x)+O(h 2 ), which implies (assuming that U, π ∈ C 2 and κ ∈ C 1 ) that . But the first vanishes, since the interface σ ij is symmetric w.r.t. the mid pointx and we are integrating alongx. Hence, we have Hence, iterating the above argument twice for κ and π and exploiting in the first step Theorem 2.12 we proved the following.
Theorem 5.7. Let d ≤ 3. On a polygonal domain Ω ⊂ R d with a cubic mesh where

Numerical simulation and convergence analysis
In this section, we provide a numerical convergence analysis of the flux discretization schemes based on weighted Stolarsky means described in the previous sections. For the sake of simplicity, we restrict ourselves to one-dimensional examples, for which already non-trivial results can be observed. The convergence results are summarized in Fig. 1. In Figure 1 (a), the logarithmic error log 10 ( u − u ref L 2 ) is shown in the (α, β)-plane of the Stolarsky mean parameters for an equidistant mesh with 2 10 +1 = 1025 nodes. First, we note that the accuracy for a mean S α,β is indeed practically invariant along α + β = const, which is consistent with our analytical result in Section 4. In this particular example, we observe optimal accuracy at about α + β ≈ 4.2. This coincides with the convergence results under mesh refinement shown in Figure 1 (b), where the fastest convergence is obtained for the scheme involving the S 3.2,1mean. The other considered schemes, however, show as well a quadratic convergence behavior with a slightly larger constant. Interestingly, for the same example, we find that the optimal mean for an accurate approximation of the flux J is on α + β = −3, see Figure 1 (c). This is further evidences in Figure 1 (d), where the harmonic mean S −1,−2 converges significantly faster than the other schemes. Obviously, in the present example, the minimal attainable error for both u and J can not be achieved by the same discretization scheme.
Example 6.2. We consider the potential V (x) = 5 (x + 1) x. The right hand side function, the diffusion constant and the boundary conditions are the same as in Example 6.1. The problem has an exact solution involving the imaginary error function (which is related to the Dawson function), that has been obtained using Wolfram Mathematica [WR17]. The numerical results are show in Figure 2. The discretization errors of both the density u and the flux J shown in Figure 2 (a) and (c) exhibit a sharp minimum on α + β = −1. This includes the Scharfetter-Gummel mean S 0,−1 , which converges fastest to the exact reference solutions for u and J, as shown in 2 (b) and (d). The SQRA scheme, with geometric mean S α,−α , is found to be second best in the present example.
The numerical results are in line with our previous statements from Remark 5.6: In the case of strong gradients ∇V , the Scharfetter-Gummel scheme provides the most accurate flux discretization, in particular, the SG mean S 0,−1 is the only Stolarsky mean that recovers the upwind scheme (1.5). Away from that drift-dominated regime, the situation is less clear and other averages S α,β can be superior, see for instance Example 6.1. Appendix A. Appendix A.1. A General Poincar Inequality. We derive a general Poincar inequality on meshes. The idea behind the proof seems to go back to Hummel [Hum99] and has been adapted in a series of works e.g. [Hei18,HKP17]. Let e 0 = 0 and (e i ) i=1,...,n be the canonical basis of R n . Define: Every ν ∈ S d−1 satisfies ν ⋅ e i ≠ 0 for at least one e i . Thus, for every ν ∈ S d−1 it holds ν ∈ D d−1 if and only if −ν ∈ D d−1 . We denote Γ = ⋃ σ∈E Ω σ and say that x ∈ Γ is a Lipschitz point if Γ is a Lipschitz graph in a neighborhood of x. The set of Lipschitz-Points is called Γ L ⊂ Γ and we note that for the (d − 1)-dimensional Hausdorff-measure of Γ Γ L it holds H d−1 (Γ Γ L ) = 0.
For x ∈ Γ L , we denote ν x ∈ D d−1 the normal vector to Γ in x.. Let For two points x, y ∈ R n denote (x, y) the closed straight line segment connecting x and y and for ξ ∈ (x, y) ∩ Γ L denote the jump of the function u at ξ in direction (y − x), i.e. ⟦u⟧ x,y (ξ) ∈ ±⟦u⟧ (ξ). We can extend ⟦u⟧ to Γ by ⟦u⟧ (x) = 0 for x ∈ Γ Γ L and define Then we find the following result: Let Ω ⊂ R d be a bounded domain. The space H 1 0 (Ω; Γ) is linear and closed for every s ∈ [0, 1 2 ) and there exists a positive constant C s > 0 such that the following holds: Suppose there exists a constant C # > 0 such that for almost all (x, y) ∈ Ω 2 it holds # ((x, y) ∩ Γ) ≤ C # .. Then for every u ∈ H 1 0 (Ω; Γ) it holds Furthermore, for every u ∈ H 1 (Ω; Γ) and every η ∈ R d it holds Proof. In what follows, given u ∈ C 1 0 (Ω; Γ), we write ∇u(x) ∶= ∇u(x) if x ∈ Ω Γ and ∇u(x) = 0 else. For y ∈ R d we denote (x, y) = {x + s (y − x) ∶ s ∈ [0, 1]}. Using 2ab < a 2 +b 2 , we infer for u ∈ C 1 0 (Ω; Γ) and x, y ∈ Ω Γ such that (x, y) ∩ Γ is finite the inequality Since ⟦u⟧ x,y = ⟦u⟧ we compute and obtain We fix η > 0 and consider the orthonormal basis (e i ) i=1,...,d of R d . The determinant of the first fundamental form of Γ is bigger than 1 almost everywhere. Hence we can observe that Ω ξ∈(x,x+ηe 1 )∩Γ where we used that the surface elements are bigger than 1. Furthermore, we have Replacing e 1 in the above calculations with any unit vector e, we obtain from integration of (A.3) with y = x + η, η = ηe, over Ω that Dividing by η and integrating over η ∈ R d , we obtain that for every s ∈ [0, 1 2 ) there exists a positive constant C s > 0 independent from u and K such that Hence, by approximation, the last two estimates hold for all u ∈ H 1 0 (Ω; Γ)..
A.2. Physical relevance of the geometric mean.
In the general case, symmetry of ψ * in ξ i − ξ j implies ψ * (ξ i − ξ j ) = ψ * ( ξ i − ξ j ). We make use of the fact that the original C * (ξ) = cosh ξ −1 is a bijection on [0, ∞) and suppose that hence ψ * (ξ i − ξ j ) = θ (C * (ξ i − ξ j )). This implies particularly that Furhtermore, the symmetry of ψ * implies by the last inequality that ∂ ξ θ (C * (x)) > 0. Inserting this information in (3.13) and (3.14) we observe that has to be independent from π i and π j . From the above case S ij = √ π i π j , we know that is constant in π i and π j . Hence it remains to show that f (π i , π j ) ∶= S ij √ π i π j −1 ∂ ξ ψ u i u j π j π i + u j u i π i π j −1 is independent from π i and π j if and only if ∂ ξ ψ = const and S ij = √ π i π j .
Lemma A.4. It holds (3.5)∂ 2 x S α,β (π, π) = 1 12π (α + β − 3). Proof. We know from Lemma A.3 that ∂ x S α,β (x, x) = 1 2 and ∂ 2 x S α,β (x, x) = −∂ y ∂ x S α,β (x, x). Hence we find We make use of the explicit form We insert x = x + h and y = x − h and make use of the following expansions and Together with a + bh c + dh = a c + bc − ad c 2 h + O h 2 1 + ah 2 1 + bh 2 A.4. Approximation of potentials to get the SQRA mean. The aim of this section is to provide a class of potentials which are easy to handle and which generate the SQRA-mean S −1,1 (π 0 , π h ) by π mean = 1 h ∫ h 0 π −1 −1 . Clearly, choosing the constant potential V (x) ∶= V c ∶= − log S −1,1 (π 0 , π h ) we obtain right mean. Although this works for any means, this has two drawbacks (1) The potential jumps and hence the gradient is somewhere infinite, which means that at these points the force on the particles is infinitely high which is not physical.
(2) Approximating a general function by piecewise constants, on each interval the accuracy is only of order h. However, approximating a function by affine interpolation the accuracy is of order h 2 on each interval (see below for the calculation). So we want to get a potential which may be used as a good approximation (i.e. approximating of order h 2 ), is physical (i.e. continuous) and generates the SQRA-mean. Note, that most considerations below also work for other Stolarsky means. For simplicity we focus on the SQRA mean S −1,1 .
A.4.1. Approximation order for linear approximation. Let us first realize that a linear interpolation provides an approximation of order h 2 . Let V ∶ [0, h] → R be a general C 2 -potential. We define with V (0) = V 0 and V Then one easily checks that and hence, Clearly, we also have A.4.2. Definition of potentialsV which generate the SQRA mean. We consider a piecewise linear potential of the form .
A.4.3. Proof that the potential approximates an arbitrary potential of order h 2 . Since the linear potentials approximates a general potential of order h 2 it suffices to approximate the linear potentialṼ byV . We show that there are α, β satisfying α β = λ, such that V −Ṽ C([x i ,x i+1 ]) = O(h 2 ). The difference ofV andṼ is the largest at x = x 1 or x = x 2 . We estimate both differences. We havẽ Hence we have to estimate In the case of SQRA, one possible choice for α, β is given by α + β = 1.