Computation of Optimal Transport with Finite Volumes

We construct Two-Point Flux Approximation (TPFA) finite volume schemes to solve the quadratic optimal transport problem in its dynamic form, namely the problem originally introduced by Benamou and Brenier. We show numerically that these type of discretizations are prone to form instabilities in their more natural implementation, and we propose a variation based on nested meshes in order to overcome these issues. Despite the lack of strict convexity of the problem, we also derive quantitative estimates on the convergence of the method, at least for the discrete potential and the discrete cost. Finally, we introduce a strategy based on the barrier method to solve the discrete optimization problem.


Introduction
The theory of optimal transport provides a robust way to define an interpolation between probability measures which takes into account the geometry of the space where they are defined. This theory is built around the problem of finding the optimal way of reallocating one given density into another, minimizing a total cost of displacement in space. The fundamental nature of such a problem is responsible for the surprising links between optimal transport (and its generalizations) and physical models, most notably in fluid dynamics or via the theory of gradient flows, but also of its many applications in social sciences or biology (see, e.g., [28] and references therein). Nowadays, several numerical methods are available to solve optimal transport problems and in particular to compute the associated interpolations between measures. However, only few of these can actually be generalized to more complex settings which are relevant for numerical modelling, and moreover their numerical analysis is often neglected.
In this work we consider the numerical discretization of one of the most classical optimal transport problems in which the cost of displacement per unit mass is given by the square of the Euclidean distance. In particular, we consider finite volume discretizations of the socalled dynamical formulation of such a problem, following the approach originally proposed by Benamou and Brenier [3]. This formulation has inspired some of the first numerical methods for optimal transport, but it is still one of the most general, since it can be adapted easily to very complex settings. We will focus on three main aspects. Firstly, we will expose some numerical issues related to the stability of finite volumes methods that have been considered for this problem, and we propose a strategy based on nested meshes to overcome these. Secondly, we provide quantitative estimates on the convergence of the proposed methods to smooth solutions of the problem. Finally, we tackle the issue of the efficient computation of numerical solutions by applying and analyzing a classical interior point strategy adapted to our setting. if p > 0, 0 if p = 0, Q = 0, +∞ else.
Problem (1.1) selects the density interpolation between ρ in and ρ f which minimizes the total kinetic energy among all the non-negative solutions of the continuity equation (1.2). Note that the problem is written in the variables density-momentum rather than density-velocity, in order to obtain a convex formulation.
Adapting appropriately the definitions above, problem (1.1) provides a notion of interpolation between ρ in and ρ f when these latter are arbitrary probability measures. In this case the solution ρ is itself a curve of probability measures which is generally denoted as Wasserstein interpolation (or geodesic), see, e.g., [28]. Moreover the minimum of the cost (1.3) coincides with half of the Wasserstein-2 distance squared between ρ in and ρ f . More precisely, for a primal-dual solution (φ, ρ) to system (1.7), this is given by: 1.2. Discretization. In the original work of Benamou and Brenier [3], problem (1.1) was discretized on regular grids using centered finite differences. Later in [26] Papadakis, Peyré and Oudet introduced a finite difference discretization using staggered grids, which are better suited for the discretization of the continuity equation. Similar finite differences approaches have been used also in more recent works [9,23]. Note that the use of regular grids can be beneficial for the efficient solution of the scheme, but is not adapated to complex domains. Several finite elements approaches have been considered in order to construct schemes able to handle more general unstructured grids [4,5,22,24]. In particular in [24] the authors proposed a H(div)−conforming finite element discretization that preserves at the discrete level the conservative form of the problem, in the same spirit of [26]. Another approach to discretize problem (1.1) is to use finite volumes, which is a natural choice given the conservative form of the constraint (1.2) and allows one to use unstructured grids. In [11] Erbar, Rumpf, Schmitzer and Simon considered a discretization of problem (1.1) on graphs, which can be written under the formalism of Two-Point Flux Approximation (TPFA) finite volumes [19]. They proved the Gamma-convergence of the discrete problem towards a semi-discrete version of (1.1), discrete in space and continuous in time. In [19], Gladbach, Kopfer and Maas proved a convergence result for this semi-discretization towards the continuous problem. Combining these two results, it is possible to obtain a global convergence result, under conditions on the ratio between the temporal and spatial step sizes. Carrillo, Craig, Wang and Wei proved the Gamma-convergence without conditions on the step sizes but only for sufficiently regular and strictly positive solutions [9]. They used a centered finite difference discretization, which coincide with TPFA finite volumes on cartesian grids. Finally, in [21] Lavenant proved the weak convergence of discrete solutions (reconstructed as space-time measures) of a large class of time-space discretizations of (1.1), unconditionally with respect to time and space steps and without assuming any regularity, and applied this result to the discretization studied in [11]. The same result has been applied for example to the discretizations proposed in [22,24].
Our starting point in this work is the finite volume discretization presented in [21,11]. We observe numerically that for this discretization the density interpolation can exhibit oscillations which prevent strong convergence of the numerical solution, even when the exact interpolation is smooth. The same phenomenon has been observed by Facca and coauthors in [15,16] when dealing with finite elements discretizations for the L 1 optimal transport problem, which is closely related to (1.1). Our strategy to overcome this issue is inspired by these last works and consists in enriching the space of discrete potentials. We will show numerically that such a modification attenuates the oscillations and favors a stronger convergence.
Note that with this modification, the convergence result in [21] cannot be applied straightforwardly. However, we will derive quantitative estimates for the convergence of the discrete Wasserstein distance and the discrete potential, which hold both in the enriched and original non-enriched case, in the case of smooth and strictly positive solutions. Even if such results are only partial as they do not apply to the density, they are still surprising given that the problem is not strictly convex. Moreover, we are not aware of similar estimates for the discretizations mentioned above. With these results at hand, it is possible to deduce again the weak convergence of the discrete density and momentum.

Numerical solution.
A typical approach for solving discrete versions of the dynamical formulation (1.1) is to apply first order primal dual methods. This goes back to the original paper of Benamou and Brenier [3], who proposed to use an Alternating Direction Method of Multipliers (ADMM) approach applied to the augmented Lagrangian of the discrete saddle point problem. Later [26] considered different proximal splitting methods and recasted the previous algorithm into the same framework. Nowadays, these approaches are frequently used [4,5,22,9,24]. In fact, they are robust and can take care automatically of the positivity of the density thanks to the definition of the function B. Nevertheless, they are not easy to apply to arbitrary discretizations of the problem (especially on unstructured grids). More importantly, they are efficient only as far as high accuracy is not mandatory and uniform grids are used.
In the present work, we apply the so called barrier method, an instance of the wider class of interior point methods [7,20,27,17]. The problem is perturbed by adding to the functional a strictly convex barrier function which repulses the density away from zero. In this way it reduces to an equality-constrained minimization problem, where the minimizer is automatically greater than zero and the objective functional is locally smooth around it, and which can be effectively solved using a Newton scheme. The perturbation introduced by the barrier function can be tuned by multiplying it by a positive coefficient µ and the original solution is recovered via a continuation method for µ going to zero. The final algorithm is robust and can be easily generalized to similar problems (for example, we have already applied it successfully in [25] for the solution of Wasserstein gradient flows).
A similar strategy has been applied by Achdou and coauthors [1] (although in the context of mean field games), perturbing the Lagrangian associated to the problem with the Dirichlet energies of the density and the potential. Such a perturbation does not ensure the positivity of the solution and this forces the use of a monotone discretization. Using a barrier function allows us to consider more general discretizations, with higher accuracy in space. The idea of using a regularization term to deal directly with the positivity constraint has been also explored in [23], where the authors used the Fischer information as penalization term, but without considering a continuation method. In particular, the problem is solved for a fixed (small) value of the perturbation's parameter, leading to diffusive effects.
1.4. Structure of the paper. In section 2 we present the finite volume discretization of (1.1): we set the notation for the partition of the domain Ω, introduce the discrete operators and then define the discrete optimal transport problem. In section 3 we derive quantitative estimates on the convergence of the discrete distance and the discrete potential towards their continuous counterparts, under special hypotheses. In section 4 we present the barrier method, the strategy we employ to solve the discrete optimization problem as a sequence of simpler perturbed problems. We conclude with the presentation of few numerical results in order to assess the reliability of the scheme and verify the convergence results, in section 5.

Finite Volume discretization
2.1. The discretization of Ω. We assume the domain Ω ⊂ R d to be polygonal if d = 2 or polyhedral if d = 3, and we consider an admissible discretization for TPFA finite volumes [12, Definition 9.1]. Cartesian grids, Delaunay triangulations or Voronoï tessellations are typical examples of admissible meshes in this sense. We denote such a discretization as T , Σ, (x K ) K∈T , namely the ensemble of the set of polyhedral control volumes K, the set of faces σ and the set of cell centers x K . The set Σ is composed of boundary faces Σ ext = {σ ⊂ ∂Ω} and internal faces σ ∈ Σ = Σ \ Σ ext . We denote by Σ K = Σ K ∩ Σ the internal Figure 1. Exemplification of the notation of a triangular cell (left) and its subdivision (right).
faces belonging to ∂K. The cell-centers (x K ) K∈T ⊂ Ω are such that, if K, L ∈ T share a face σ = K|L, then the vector x L − x K is orthogonal to σ and has the same orientation as the normal n K,σ to σ outward with respect to K.
We denote the Lebesgue measure of K ∈ T by m K . For each internal face σ = K|L ∈ Σ, we denote m σ its (d − 1)−dimensional Lebesgue measure and we refer to the diamond cell as the polyhedron whose edges join x K and x L to the vertices of σ. Denoting by d σ = |x K − x L |, the measure of the diamond cell is then equal to m σ d σ /d. We denote by d K,σ the Euclidean distance between the cell center x K and the midpoint of the edge σ ∈ Σ K . In figure 1 the notation is exemplified for a triangular element.
We will need to distinguish between two different admissible discretizations of Ω, where one is obtained as a subdivision of the other. We denote by T , Σ , (x K ) K ∈T the coarse mesh and by T , Σ, (x K ) K∈T the fine one, and we require that ∀K ∈ T , ∃ K ∈ T such that K ⊆ K .
In practice we will consider two specific instances of this construction. The first is the trivial case where the two meshes coincide. The second holds at least in two dimensions and can be defined as follows. First, we take as coarse mesh a Delaunay triangulation, with cell centers x K the circumcenters of each cell K . We further require that all the triangles are acute, so that all the cell centers x K lie in the interior of the corresponding cell K . Then, we define the fine mesh by dividing each triangular cell K into three quadrilaterals by joining x K to the three midpoints of the edges σ ∈ Σ K . We take again as cell centers x K of the fine mesh the circumcenters of each cell K. This construction is illustrated in figure 1. Note that the partition obtained in this way is indeed admissible.

2.2.
Discrete spaces and operators. We introduce two discrete spaces defined on the two meshes, P T = R T and P T = R T , each one endowed with its own weighted scalar product, and similarly for ·, · T . Note that P T ⊆ P T , and we denote by I the canonical injection operator, which is given explicitly by In the case where the two discretizations of Ω coincide, I is just the identity operator. We will denote by I * the adjoint of I, i.e. I * ·, · T = ·, I· T . We further introduce two discrete spaces defined on the finer mesh: the space P Σ = R Σ , defined on the diamond cells, endowed with the scalar product and the space of discrete conservative fluxes, endowed with the scalar product We denote by · T , · T , · Σ and · F T the norms associated with the inner products defined above. We denote F σ = |F K,σ | = |F L,σ | and we will use the convention |F | = (F σ ) σ∈Σ ∈ P Σ and (F ) 2 = (F 2 σ ) σ∈Σ ∈ P Σ , for F ∈ F T . Moreover, we define the element-wise multiplication by . In particular, given F , G ∈ F T and u ∈ P Σ , we define F G, u F ∈ F T by We now introduce the discrete differential operators. The discrete divergence div T : The discrete gradient ∇ Σ : Moreover, as for the discrete conservative fluxes, we define ∇ σ a := |∇ K,σ a|. We also need to introduce a reconstruction operator from cells to diamond cells R Σ : P T → P Σ , which will be required to construct the discrete energy. We require that the operator R Σ be a concave function (component-wise), positively 1-homogeneous and positivity preserving. In practice, we will consider two weighted means, L Σ and H Σ , which correspond respectively to a linear and a harmonic mean, and are defined as follows: for any a ∈ P T . We denote by dR Σ [a] : P T → P Σ the differential of R Σ with respect to a, evaluated at a given a ∈ P T . Clearly, if R Σ = L Σ , we simply have dR Σ [a] = L Σ . Moreover, we denote by (dR Σ [a]) * the adjoint of dR Σ [a], with respect to the two different scalar products. For the two reconstructions we consider, this operator is given by either L * Σ or (dH Σ [a]) * , which are defined by for any u ∈ P Σ . Finally, for any fixed a ∈ P T , we define the reconstruction operator on the coarse grid R T [a] : P Σ → P T by Remark 2.1. The space of discrete conservative fluxes and the reconstruction operator introduced above take only into account the interior edges. This is sufficient for our purposes due to the zero flux boundary conditions. In particular, since the flux should be zero at the boundary the reconstruction of the density on the exterior edges is not needed for the construction of the scheme.

2.3.
Discrete problem. Consider a discretization of the time interval [0, 1] in N +1 subintervals of constant length τ = 1 N +1 , and let t k := kτ for all k ∈ {0, .., N + 1}. We denote the time evolution of a discrete density by ρ : Since R Σ is assumed to be concave, the function (2.5) is convex and lower semi-continuous. Note that on each subinterval [τ (k − 1), τ k], the time integral of the kinetic energy is discretized using the midpoint rule. This implies that a given F k σ needs to vanish only if the reconstruction of (ρ k + ρ k−1 )/2 on the same edge vanishes. Approximating the integral with a left/right-endpoint approximation would be more restrictive in this sense (see [21] for more details on this choice of time discretization). At each time step, the kinetic energy is discretized on the diamond cells of the finer grid. The measure of each diamond cell is taken d times. This is done in order to compensate the unidirectional discretization of the vector field F and therefore obtain a consistent discretization (see, e.g., lemma 3.1). Indeed, each F σ is meant as an approximation of |F · n σ | and encodes then the information of F only along the direction n σ . This choice is also linked to the definition of inflated gradient (see [10,14] for more details on this construction).
Remark 2.2. Note that (2.5) is not simply the discretization of (1.3) on the diamond cells, in which case the functional would take the value +∞ whenever the time-space reconstruction of the density is negative on some diamond cell. The functional in (2.5) takes the value +∞ whenever the density is negative on some cell K ∈ T , which is a stronger condition.
Given two discrete densities ρ in , ρ f ∈ P + T , with the same total discrete mass K ∈T ρ in K m K = K ∈T ρ f K m K , we consider the following discrete version of problem (1.1): is the convex subset whose elements (ρ, F ) satisfy both the discrete continuity equation and the initial and final conditions The continuity equation is discretized in time using the midpoint rule (F is indeed staggered in time with respect to ρ). Moreover, given the definition of the discrete space of conservative fluxes and the operator div T , (2.7) is to be understood with zero flux boundary conditions in space. Hence equations (2.7)-(2.8) imply that the total discrete mass is preserved. In the following, we explicitly enforce the constraint (2.8), i.e. we identify ρ 0 and ρ N +1 with ρ in and ρ f , respectively.
We derive now the first order optimality conditions for problem (2.6), which are necessary and sufficient conditions for a solution. We consider the minimization on ρ to be taken only among non-negative densities. The Lagrangian associated with the constrained optimization problem (2.6) is given by where the potential φ ∈ [P T ] N +1 is the Lagrange multiplier for the continuity equation constraint. The stationarity condition of L N,T with respect to F gives so that the Lagrangian reduces to A stationary point of (2.11) must then satisfy the conditions: where k ∈ {1, .., N + 1} for the discrete continuity equation, k ∈ {1, .., N } for the discrete Hamilton-Jacobi equation, and where by equation If R Σ = L Σ , then this operator does not depend on ρ and in particular we will drop such dependency in the notation by setting R T = I * • L * Σ . We emphasize that the discrete no flux boundary conditions are automatically enforced by the definition of the discrete fluxes (see also remark 2.1).
The inequality in the second condition derives from the fact that the minimization in ρ is taken over non-negative values, and the equality holds where ρ k does not vanish. Hence, we can write the full system of optimality conditions using a slack variable λ ∈ [P + T ] N : where k ∈ {1, .., N + 1} for the discrete continuity equation and k ∈ {1, .., N } for the other conditions. Note that system (2.13) is a discrete version of the system of optimality conditions (1.7) holding at the continuous level. In particular, the continuity equation is discretized on the fine grid whereas the Hamilton-Jacobi equation on the coarse one. Using a discretization that preserves the monotonocity of the discrete Hamilton-Jacobi operator it is possible to show that the value zero for λ is optimal (see [8] for a problem closely related to 2.6), i.e. the discrete Hamilton-Jacobi equation can be saturated. However this is not the case for the discretizations we consider since they do not preserve the monotonicity.
Remark 2.3. If the two discretizations of Ω coincide, I becomes the identity and we recover the finite volumes discretization already considered in [21], which is a fully discrete version of the continuous-time discrete optimal transport problem studied in [19].
is not difficult to obtain, as the minimization in ρ is taken over a compact set and one can show that |F | is uniformly bounded for any minimizing sequence (by the same arguments as in the proof of theorem (4.1) below). The uniqueness of the solution, which is guaranteed for the continuous problem (1.1) as soon as the initial (or final) density is absolutely continuous with respect to the Lebesgue measure, is not evident. System (2.13) is not guaranteed in general to have a unique solution. In particular, where the density vanishes, the potential and the positivity multiplier are clearly non unique. The potential is however uniquely defined, up to a global constant, if the density solution is unique and everywhere strictly positive.
Given a solution (ρ, φ) to system (2.13), we can construct the associated momentum F by equation (2.10) so that (ρ, F ) is a minimizer of problem (2.6). Then, we define the discrete Wasserstein distance W N,T (ρ in , ρ f ) by (2.14) More precisely, replacing (2.10) in (2.14), the discrete Wasserstein distance can be computed using the following expression: In the case of the linear reconstruction, i.e. taking R Σ = L Σ , one can also easily express the dual to problem (2.6) in terms of the potential φ, as in the continuous case, i.e. problem (1.5). In fact, in this case, replacing the second condition of system (2.13) into the Lagrangian (2.11) we obtain the following problem:

Convergence to the continuous problem
In this section, we provide quantitative estimates for the convergence of the action and the discrete potential φ towards their continuous counterparts, in the case of solutions with smooth strictly positive densities. Note that we restrict ourselves to the case of the linear reconstruction operator, i.e. we take R Σ = L Σ . As a consequence of remark 2.3, these results are also valid for the finite volume discretization considered in [21].
First of all, we introduce some additional notation. Let F , G ∈ [F T ] N +1 and ρ ∈ [P + T ] N +2 . We define the following weighted inner products: We will denote by · ρ and · ρ k the (semi-)norms associated with these (semi-)inner products.
We will consider two sampling operators: one for the density Π T : L 1 (Ω) → P T , which performs an average on each cell, and one for the potential Π T : C 0 (Ω) → P T , which evaluates the function at the cell center. More precisely, given f ∈ L 1 (Ω) and g ∈ C 0 (Ω), we define for all K ∈ T and K ∈ T . For any time dependent functions ρ ∈ C 0 ([0, 1], L 1 (Ω)) and We will denote by h the maximum cell diameter of the fine mesh, i.e. h := max K∈T diam(K). We will assume two regularity conditions on the fine mesh. Firstly, there exists a constant ζ, which does not depend on h, such that Secondly, there exists a constant η h > 0 only depending on h, with η h → 0 for h → 0, such that The latter condition is essentially a specific instance of the asymptotic isotropy condition in [19] (see Definition 1.3). When the cell centers x K are chosen as the circumcenters of the associated cell (as in the particular examples of meshes described in section 2.1), a stronger property holds, which has been referred to as center of mass condition [19] or superadmissibility [13], and which reads as follows: However, for generality of the discussion, in the following we will only require (3.4) and therefore we will keep the dependence on η h explicit.
The following lemma collects some consistency properties of the projection Π T . In particular, point (3) below shows that the asymptotic isotropy condition implies the consistency of the quadratic term in the discrete Wasserstein distance (2.15), and justifies our discretization of the functional B N,T .
Proof. The first two points follow easily from the definition of Π T and the regularity condition (3.3). For (3), observe that, by definition of the linear reconstruction operator, Then, using the definition of the operator Π T and the regularity condition (3.2), Replacing this into (3.6), neglecting higher order terms, and using the asymptotic isotropy assumption (3.4), we obtain the desired bound. Propostion 3.3 below is an adaptation to our setting of standard approximation results for elliptic problems. It quantifies the consistency of the projection Π T in terms of the associated potential. As in [19], we will use it to construct an admissible competitor for the discrete problem. Before proving the result, we state the following classical finite-volume version of the Poincaré inequality. Lemma 3.2 (Discrete mean Poincaré inequality, Lemma 10.2 in [12]). There exists a constant C > 0, only depending on Ω, such that for all admissible meshes T , and for all ψ ∈ P T , the following inequality holds: Proposition 3.3. Suppose that ρ, ∂ t ρ ∈ L ∞ ([0, 1], C 0,1 (Ω)), with ρ ≥ ε > 0, and let φ ∈ L ∞ ([0, 1], C 1,1 (Ω)) be a solution of (3.7) − div(ρ∇φ) = ∂ t ρ , ∇φ · n ∂Ω = 0 on ∂Ω .
Let ρ = Π T ρ and let φ be a solution of Then, there exists a constant C > 0 depending only on φ, ρ, ε, ζ and Ω, such that with η h defined as in (3.4).
Proof. First, we integrate equation (3.7) over the time-space cell [t k−1 , t k ] × K and divide it by τ m K . This yields We define e ∈ [F T ] N +1 and r ∈ [P T ] N +1 by and denoting by K the cell in T such that K ⊂ K ,

Multiplying both sides by
Using the discrete Poincaré inequality of lemma 3.2 and the lower bound on ρ, this implies where C > 0 is a constant only depending on the lower bound ε and the domain. By the regularity of φ and ρ, and the estimate (3.2), we then obtain where now C depends also on ρ and φ.
In order to get an estimate on the energy, we observe that φ k minimizes the functional which implies the inequality Using again the discrete Poincaré inequality of lemma 3.2 and the lower bound on ρ, as well as its regularity, we get Hence, using (3.10), we obtain Finally, using Jensen's inequality and then lemma 3.1, we find which concludes the proof.
We are now ready to state the two main convergence results of this section, which provide quantitative estimates for the convergence rates of the discrete action and the discrete potential. (1) if φ ∈ C 1,1 ([0, 1] × Ω), there exists a constant C > 0 only dependent on φ and ζ such that 1], C 1,1 (Ω)) and ρ, ∂ t ρ ∈ L ∞ ([0, 1], C 0,1 (Ω)), with ρ ≥ ε > 0, there exists a constant C > 0 depending only on ρ, φ, ε, ζ and Ω such that Proof. For the first point, we first observe that by lemma 3.1 and the regularity of φ, Π T φ verifies Then φ is admissible for the dual problem (2.16), hence Replacing back the definition of φ and using the fact that |∇ σ φ 1 | and |∇ σ φ N +1 | are uniformly bounded by a constant depending only on φ, we get For the second point it suffices to observe that the couple (ρ, φ) satisfies (3.7). Then, defining ρ and φ as in the statement of proposition 3.3, we can construct an admissible competitor (ρ, F ) for the discrete optimal transport problem by defining the momentum F ∈ [F T ] N +1 as in equation (2.10). Since, by definition, we obtain the desired estimate using (3.8).
The issue of convergence of the discrete solution (ρ, F ) towards its continuous counterpart has been treated in detail in [21] for a general class of discretizations. These include the finite volume schemes considered here, in the case where the two domain decompositions coincide so that I is the identity operator (see remark 2.3). For this case, one has that the discrete denstiy ρ can be lifted to a measure on [0, 1] × Ω converging weakly to the exact optimal transport interpolation with mesh refinement.
It is not difficult to show that the second point of theorem 3.4 implies a similar convergence result, for smooth positive solutions, also when the two discretizations of the domain do not coincide (e.g., this is a direct consequence of theorem 2.18 in [21]). Besides this weak convergence result, theorem 3.4 also implies the following quantitative estimate for the convergence of the potential, although in a norm dependent on the discrete solution itself.
Proof. Consider the quantity Expanding the square in (3.11) we obtain whereF is given by equation (2.10). The second term in (3.12) can be written as follows The third term in (3.12) can be written as follows (3.14) where Adding and subtracting W 2 2 (ρ in , ρ f )/2 = Ω φ(1, ·)ρ f − Ω φ(0, ·)ρ in from the right-hand side of (3.12), substituting (3.13) and (3.14), and rearranging terms we obtain where since ρ 0 = Π T ρ in and ρ N +1 = Π T ρ f . Finally, we estimate I 1 and I 3 using lemma 3.1, I 2 using the regularity of φ, and the remaining term using the second point in theorem 3.4.
Remark 3.6. It is easy to construct solutions to the optimality conditions (1.7), and therefore to problem (1.1), satisfying the assumptions of theorem 3.4 or 3.5. In fact, given any smooth compactly-supported initial potential φ 0 : Ω → R, there exists δ > 0 such that the map is a smooth solution to the Hamilton-Jacobi equation. Moreover, given a strictly postive and smooth initial density ρ 0 , the density ρ(t, ·) = (ρ 0 /det(∇T t ))•T −1 t solves the continuity equation with velocity ∇φ(t, ·), and it is also smooth and strictly positive for t ∈ [0, δ]. Then, the curve t → (ρ(tδ, ·), δφ(tδ, ·)) solves the optimality conditions (1.7) on the time interval [0, 1]. On the other hand, even in the case where ρ 0 and ρ 1 are smooth and strictly positive the interpolation may not even be stricly positive as shown in [29].
Remark 3.7. The quantity E N,T (ρ,φ|φ) defined in equation (3.11) is the discrete H 1 seminorm of the error weighted by the discrete solutionρ. Note that this can also be seen as a discretization of the modulated energy (or relative entropy) of the kinetic energy, interpreted as a convex function of (ρ, F ). In section 5 we will use a similar quantity in order to evaluate numerically the convergence rate of the scheme.

Primal-dual barrier method
We introduce now the primal-dual barrier method, the discrete optimization technique we use to deal with the uniqueness, smoothness and positivity issues and effectively solve problem (2.6). The method consists in perturbing the discrete problem with a barrier function which forces the density to be positive. Here we show that the solutions of such perturbed problem converge to the ones of the original problem, when the perturbation vanishes, therefore justifying the use of a continuation method. Finally, we will detail the implementation of the algorithm commenting on the choice of the parameters involved.
The most classical barrier function used when dealing with positivity constraints is the logarithmic barrier, − log ρ. In order to write the perturbed problem, we first define precisely the barrier, so that it is convex and lower semi-continuous. We define the barrier function as J N,T (ρ) = N k=1 τ K∈T J(ρ k K )m K and the perturbed version of problem (2.6) is therefore: Thanks to the strict convexity of the function J N,T on [P + T \ {0}] N , the solution (ρ µ , F µ ) is now unique. Proceding as in section 2.3, ρ µ can be characterized as solution to the system of optimality conditions where k ∈ {1, .., N + 1} for the continuity equation and k ∈ {1, .., N } for the other conditions, and where (µ) K = µ. The variable s ∈ [P T ] N , (s k ) K = µ ρ k K , has been introduced in order to decouple the optimization in ρ and s, and it highlights the connection with system (2.13). In particular, system (4.2) can be seen as a perturbation of (2.13), where ρ k K and s k K = −λ k K are automatically forced to be positive and the orthogonality is relaxed. In this way, the solution (φ µ , ρ µ , s µ ) is now unique, up to an additive constant for the potential, and the problem is smooth.
As it is classical in interior point methods (see, e.g., [7]), if we regard (ρ µ , F µ ) as an approximate solution to problem (2.6), we can derive an explicit estimate on how far it is from optimality. Given a solution (ρ, F ) of the original problem, and definingλ ∈ [P T ] N by where we used the fact that (ρ µ , F µ ) is optimal for L N,T (φ µ , ρ, F ) + N k=1 τ λ k , ρ k T , which can be easily verified by comparing the associated optimality conditions with (4.2). We have therefore As a consequence of (4.4), the smaller the parameter µ, the closer the perturbed solution is to the original one.
Proof. Consider a sequence (µ n ) n ⊂ R + converging to zero and the corresponding sequence (ρ µn , F µn ) of solutions to problem (4.1). We first derive a bound on (ρ µn , F µn ), independent of µ. The bound on ρ µn derives easily from the conservation of mass. To obtain a bound for the momentum F µn , for any b ∈ [F T ] N +1 with |b k σ | ≤ 1 for all σ ∈ Σ, k ∈ {1, .., N + 1}, we observe that there exists a constant C > 0 independent of µ such that where the weighted norm || · || ρ µ is defined via (3.1). Note that the first inequality derives from a simple rescaling argument with the term ρ s ∈ P N +1 Σ , given by and applying Cauchy-Schwarz. The second one is obtained using the inequality (4.4). Taking the sup with respect to b in (4.5) we obtain the bound on F µn . The sequence (ρ µn , F µn ) is bounded hence we can extract a converging subsequence (still labeled with µ n for simplicity) (ρ µn , F µn ) → (ρ * , F * ). Consider (ρ, F ) minimizer of the unperturbed problem (2.6). Using inequality (4.4) and taking the lim inf for n → +∞, we obtain B N,T (ρ * , F * ) = B N,T (ρ, F ), hence (ρ * , F * ) is a minimizer for problem (2.6).

Remark 4.2.
If the solution (ρ, F ) of the discrete problem (2.6) is unique, then the entire sequence (ρ µn , F µn ) converges to it. In case it is not unique, for any solution (ρ, F ) and therefore (ρ µn , F µn ) converges up to subsequence to a solution (ρ * , F * ) with minimal J N,T . In case the solution ρ * is strictly positive everywhere, the whole sequence (ρ µn , F µn ) converges again.
The strict positivity derives automatically from the definition of the barrier function, which attains the value +∞ in zero. As a consequence, for every value of µ > 0 the objective function B N,T (ρ, F ) + µJ N,T (ρ) is smooth in a neighborhood of the solution (ρ µ , F µ ), ensuring a good behavior of the Newton scheme for the solution of the system of equations (4.2). It is possible to derive a quantitative bound for the positivity of ρ µ as follows.
Proposition 4.3. There exists a constant C > 0 independent of µ such that the density ρ µ solution to problem (4.1) satisfies the following bound: Proof. Consider the solution (ρ µ , F µ ) to (4.1). We define the constant density c ∈ [P + T ] N , c k K = ( K∈T m K ) −1 . It can be easily checked that c is solution to From now on, with a slight abuse of notation, we consider c to be complemented with the boundary conditions ρ in , ρ f . Thanks to the surjectivity of the divergence operator (to the space of discrete functions in [P T ] N +1 with zero mean), we can find the momentum F c , with minimal || · || c norm (defined via equation (3.1)), such that (c, F c ) ∈ C N,T . Taking the admissible competitor (ρ, The right hand side of (4.7) is bounded: indeed, by convexity of B N,T , it holds The left hand side of (4.7) can be bounded from below thanks to the convexity of J N,T , by the following quantity Hence, we obtain Simplifying in (4.9) and taking the limit for → 0, we obtain which implies the result.
By theorem 4.1 the solution of problem (4.1) provides an approximation to a solution (φ, ρ) to problem (2.13), although the smaller the parameter the more difficult it is to solve the problem using a Newton method. The idea is then to use a continuation method, that is construct a sequence of solutions to problem (4.2) for a sequence of coefficients µ decreasing to zero, using each time the solution at the previous step as starting point for the Newton scheme. The resulting algorithm in shown in Algorithm 1. We denote by θ the rate of decay for µ; by ε 0 and ε µ the tolerances for the solution to (2.13) and (4.2), respectively; and by δ 0 and δ µ the error in the convergence towards solutions of the original and perturbed problem. The parameter δ µ can be taken to be a norm of the residual of the system of equations (4.2) or of the Newton step d. Concerning δ 0 , it is either possible to choose a norm of the residual of the system of equations (2.13) or δ 0 = µ N N +1 |Ω|, by virtue of (4.4), whether the proximity to the minimizer or to the minimum is preferred.
A linesearch technique is typically employed in order to ensure global convergence of the Newton scheme. However, in many cases it leads to a non negligible cost by forcing the Newton scheme to do several steps before reaching convergence. Instead of modifying the step size α, we adaptively control θ in order to force the convergence. The Newton scheme is repeated with an increased θ (i.e. with an increased µ) if it is not able to converge in n max steps. The step size α is chosen just to ensure that ρ and s do not become negative. Again, the Newton scheme is repeated if α needs to be smaller than α min . In particular, taking α min = 1 one only allows full Newton steps.
There exist of course several optimization solvers that could tackle the solution of problem (2.6), most of which are usually based on interior point strategies, especially for large scales. Nevertheless, the specificity of the problem at hand, its non-linearity of course but more importantly its lack of smoothness, led us to develop our own solver, in order to better handle it. Moreover, the solution of the sequence of linear systems requires an ad-hoc strategy, as mentioned in sections 5-6. Finally, we remark that in the particular case of the linear reconstruction, the corresponding dual problem in (2.16) can be cast in the form of a secondorder cone program, which can be solved again using an interior point method in polynomial time. This does not apply to the case of the harmonic reconstruction (or more general reconstructions) for which the dual problem has a more complex structure.

Numerical results
In this section we assess the performance of the scheme using several two-dimensional numerical tests. In particular, we demonstrate the numerical implications of enriching the space of discrete potential, both from a qualitative and quantitative point of view. As already noted in remark 2.3, considering the two subdivisions of the domain to be the same and taking I to be the identity operator, we recover the discretization presented in [21]. We will refer to this case as the non-enriched scheme. Needless to say, the greater is the richness of the space of discrete potentials the higher is the computational complexity.
For the construction of the enriched scheme we use the nested meshes described in section 2.1. In particular, the coarse mesh is given by a regular triangulation of the domain with only acute angles. Here, we will use the first family of grids provided in [18], which discretize the domain Ω = [0, 1] 2 .
The code is implemented in MATLAB and is available online 1 . In particular, we exploit the built-in MATLAB direct solver to solve the sequence of linear systems generated by Algorithm 1. For µ → 0 the Jacobian matrix becomes ill-conditioned and the computation time, along with the memory consumption, rapidly increases for this solver. Using an iterative method could be extremely beneficial in this sense. However, the design of effective preconditioners is a delicate issue and should take into account the structure of the problem at hand (see, e.g., the general survey [6]). Therefore, we do not explore the use of such techniques in this article. We calibrated Algorithm 1 with the following parameters: θ = 0.2, α min = 0.1, 0 = 10 −6 ( 0 = 10 −8 for the convergence tests), µ 0 = 1, φ 0 = 0, ρ 0 = c (c defined as in 4.3) and s 0 satisfying ρ 0 s 0 = µ 0 . In all the simulations performed in this section, but also more generally, the algorithm proved to be extremely robust under this configuration. The Newton scheme rarely reaches a breakdown and, in case this happens, the adaptive strategy on the parameter θ overcomes the issue. Notice only that for complex simulations the value µ 0 may be increased to ease the start of the Newton scheme. Finally, we stress that all our results are presented in their piecewise-constant form on the grid, without any kind of interpolation.

5.1.
Oscillations. In this section we show that the discrete density obtained by using the non-enriched scheme can be very oscillatory. We observed numerically that the oscillations are more severe in cases where there is high compression of mass, i.e. when the corresponding continuous velocity field is not divergence free, and also more persistent with refinement (this is also confirmed by the convergence tests shown below in section 5.2). On the other hand, this type of instability can be prevented using the enriched scheme, which eliminates the oscillations almost entirely.
In order to illustrate this phenomenon, we consider the interpolation between the two densities where x = (x, y) and x 0 = ( 1 2 , 1 2 ). For h = 0.0625 and #T = 896, and for a number N + 1 = 8 of time steps, we compute the approximate Wasserstein interpolation between ρ in = ρ in (x K ) K ∈T and ρ f = ρ f (x K ) K ∈T , by solving problem (2.6), in four different ways: with the enriched and the non-enriched schemes, both with linear and harmonic reconstruction. The results are shown in figure 2. The non-enriched scheme with linear reconstruction exhibits severe oscillations which disappear using the enriched one. Oscillations are evident also using the harmonic reconstruction. The enriched scheme with harmonic reconstruction provides the smoothest solution.
It is worth mentioning that the non-enriched scheme does not exhibit oscillations for rectangular cartesian grids. Indeed, oscillations do not appear either in other works based on finite differences [9,26,23], which coincide with finite volumes on such simple grids.

Convergence test.
We now quantify numerically the convergence rate for the potential, the Wasserstein distance and the density, by considering specific smooth solutions (φ, ρ) to (1.1) with compact support, and with smooth initial and final densities ρ in and ρ f . Note, however, that the convergence results of section 3 are less general, since they require strictly positive densities, and only apply to the linear reconstruction. We compute the solutions to problem (2.6), with ρ in = (ρ in (x K )) K ∈T , ρ f = (ρ f (x K )) K ∈T , on a sequence of admissible meshes T , Σ , (x K ) K ∈T , and with an increasing number of time steps. We consider four type of errors: the error on the distance, the L 1 error on the density curve, the weighted L 2 error on the potential and on its gradient on the whole trajectory. We define a discrete potential φ ∈ [P T ] N +1 by sampling the continuous solution, i.e. φ k K = φ(t k−1 + τ 2 , x K ), for k ∈ {1, .., N + 1}, and similarly for the density we introduce ρ ∈ [P T ] N +1 , with ρ k K = ρ(t k−1 + τ 2 , x K ), for k ∈ {1, .., N + 1}. Given the discrete solution (φ,ρ), the four errors are then computed as follows: where the weighted (semi-)norm || · ||ρ is defined via (3.1). To conclude, we consider the transport problem between a cross distributed density and its rotation by 45 degrees. We compute the discrete solution with the enriched scheme, using the harmonic reconstruction, with h = 0.0156, #T = 14336 and N + 1 = 32 time steps. The approximate density interpolation is displayed in figure 3: as expected, each branch of the cross splits symmetrically in two parts which move towards the two opposite branches of the rotated cross.

Perspectives
In this article we considered TPFA discretizations of the dynamical formulation of the quadratic optimal transport problem. In particular, we proposed a method based on nested meshes to deal with numerical instabilities that occur when using this type of techniques. We also proved quantitative convergence estimates in the case of smooth solutions and proposed the use of interior point techniques for the efficient numerical solutions of the scheme. Several interesting questions remain open on all the three aspects of the problem we considered:  Figure 3. Wasserstein interpolation between a cross distributed density and its rotation by 45 degrees. Time increases from left to right, from top to bottom.
the solution of the linearized system. However, this is possible only once appropriate preconditioners are available. The challenging nature of the problem, which is mostly due to the interplay of the time and space discretization, implies that the design of such preconditioners requires a dedicated study and must be adapted to the discrete problem itself.