Optimal Partial Transport Problem with Lagrangian costs

We introduce a dual dynamical formulation for the optimal partial transport problem with Lagrangian costs cL(x, y) := inf ξ∈Lip([0,1];RN )  1 ∫ 0 L(ξ(t), ξ̇(t))dt : ξ(0) = x, ξ(1) = y  basing on a constrained Hamilton–Jacobi equation. Optimality condition is given that takes the form of a system of PDEs in some way similar to constrained Mean Field Games. The equivalent formulations are then used to give numerical approximations to the optimal partial transport problem via augmented Lagrangian methods. One of advantages is that the approach requires only values of L and does not need to evaluate cL(x, y), for each pair of endpoints x and y, which comes from a variational problem. This method also provides at the same time optimal active submeasures and the associated optimal transportation.

The optimal partial transport problem (1.1) was first studied theoretically by Caffarelli and McCann [11] (see also Figalli [23]) with results on the existence, uniqueness and regularity of active submeasures for the case where c(x, y) = |x − y| 2 . We also refer to the papers [17,19,30] for the regularity. On the other hand, concerning numerical approximations, the authors in [2] studied numerically the case c(x, y) = |x − y| via an approximated PDE and Raviart-Thomas finite elements. Recently, Benamou et al. [5] introduced a general numerical framework to approximate solutions of linear programs related to optimal transport such as barycenters in Wasserstein space, multi-marginal optimal transport, optimal partial transport and optimal transport with capacity constraints. Their idea is based on an entropic regularization of the initial linear programs and Bregman-Dykstra iterations. This is a static approach to optimal transport-type problems. In this direction, we also refer to the very recent paper of Chizat et al. [18]. These approaches need to use (approximated) values of the ground cost function c L (x, y) in the definition (1.2) below.
In this paper, we are interested in the optimal partial transport problem with Lagrangian costs c(x, y) := c L (x, y), where c L (x, y) := inf with a connected Lipschitz domain Ω ⊂ R N . The case where L(x, .) is convex, 1-homogeneous with a linear growth was treated in the recent paper [29]. In this case, c L turns out to be a distance which allows to reduce the dimension of the Kantorovich-type dual problem (by ignoring the time variable). The present article concerns the situation where L(x, .) has a superlinear growth. Our main aim is to develop rigorously the variational approach to provide equivalent dynamical formulations and use them to supply numerical approximations via augmented Lagrangian methods. For the uniqueness, using basically the idea of [38], we establish the uniqueness of active submeasures in the case where the source and target are absolutely continuous. To introduce and comment our main results, let us take a while to focus on the typical situation where the cost is given by with q > 1 and k being a (positive) continuous function. Recall here that if k ≡ 1 and q = 2, the cost function c L corresponds to the quadratic case: c L (x, y) = |y − x| 2 for any x, y ∈ R N . (1.4) This is more or less the most studied case in the literature (cf. [11,17,19,23,30]). However, let us mention here that our approach is variational and goes after our program of studying the optimal partial transportation from the theoretical and numerical points of view (cf. [28,29]). To begin with, it is not difficult to see that, by using standard results concerning the Eulerian formulation of the optimal mass transport problem in the balanced case, an Eulerian formulation associated with the problem (1.1)-(1.3) can be given by minimizing in a weak sense with ρ(0) ≤ µ and ρ(1) ≤ ν and ρ(0)(R N ) = m. However, to solve numerically (1.1) via augmented Lagrangian methods, we will prove rigorously that in fact the minimization problem of the type (1.5)-(1.6) is the Fenchel-Rockafellar dual of a new dual problem to (1.1). Indeed, using the general duality result on the optimal partial transportation of [28], we prove that a dynamical formulation of the dual problem of (1.1) consists in maximizing among the couples (λ, u) ∈ [0, ∞) × Lip(Q), where u satisfies the following constrained Hamilton-Jacobi equation (1.8) Here q := q q−1 denotes the usual conjugate of q and α = q q . Then, we will prove that the minimization problem (1.5) remains to be the Fenchel-Rockafellar dual of the maximization problem (1.7)-(1.8). Using these equivalents, we overbalance the problem into the scope of augmented Lagrangian methods and give numerical approximations to the optimal partial transport problem. In particular, we will see that this approach does not need to evaluate c L (x, y), for each pair of endpoints x and y, but requires only some values of L. Also, the method provides at the same time active submeasures and the associated optimal transportation.
In addition, the Fenchel-Rockafellar duality between the maximization problem (1.7) and the minimization problem (1.5) brings up (as optimality condition) a system reminiscent of Mean Field Games (MFG) introduced by Lasry and Lions [32][33][34] and Huang et al. [27]. For the particular case (1.3), this system aims to find (ρ, υ) ∈ M + b (Q) × L 1 ρ (Q) N satisfying both the usual MFG system associated with the cost (1.1)-(1.3): and the following non-standard initial boundary values: In other words, these initial boundary values may be written as: and ρ(1) = ν in the set [u(1, .) < 0]. In the system (1.9), λ is an arbitrary non-negative parameter and the couple ( is unknown. Once the system is solved with the optimal λ for (1.5), the couple (ρ(0), ρ(1)) gives the active submeasures and ρ gives the optimal transportation.
Actually, for a given λ ≥ 0, (1.9)-(1.10) is a new type of constrained MFG system. In this direction, one can see some variant of constrained MFG systems and their connection with MFGs under congestion effects in the paper [40] (see also [7, 13-16, 36, 37] for recent progress on this very active research topic). However, let us mention that (1.9)-(1.10) is different from the class of MFG systems introduced in [40]. In particular, one sees that the constraints in (1.10) focus only the state at time t = 0 and t = 1. As to the constraints in [40], they are maintained on all the trajectory for every time t ∈ [0, 1] to handle some kind of congestion.
On the other hand, one sees that taking m = µ(R N ), the optimal partial transportation problem (1.1) can be interpreted as the projection with respect to W L -Wasserstein distance associated with c L onto the set of non-negative measures less than or equal to ν. Recall here that the case where k ≡ 1 and q = 2, this problem appears in the study of the congestion in the modeling of crowd motion (cf. [35]). So, it is possible to merge our numerical approximation algorithm into the splitting process of [35] for the study of the crowd motion with congestion.
At last, the main difficulty in the study of the above variational approach of the problem (1.1) for general Lagrangian L remains in the regularity of the solutions of the optimization problems like (1.5)-(1.8). To handle this difficulty we prove some new results concerning the approximation of the solutions of general constrained Hamilton-Jacobi equation like (1.8) by regular function. Moreover, we show how to use the notion of tangential gradient to study MFG system like (1.9)-(1.10) in the general case. In particular, when m = µ(R N ) = ν(R N ) and L(x, v) = L(v) is independent of x, this MFG system reduces to the PDE as in the work of Jimenez [31].
The remainder of the article is organized as follows: In the next section, we begin with the notations and assumptions. Section 3 concerns the uniqueness of active submeasures. Section 4 is devoted to the rigorously theoretical study of the problem with equivalent dual formulations that would be used later for numerical approximation. Besides these, an optimality condition is also given which recovers the Monge-Kantorovich equation as a particular case. The details of our numerical approximation are discussed in Section 5. The paper ends by some numerical examples.

Notations and assumptions
Let X be a topological vector space and X * be its dual. We use x * , x or x, x * to indicate the value x * (x) for x ∈ X and x * ∈ X * . The scalar product in R N is denoted by x · y or x, y for x, y ∈ R N . We denote by I K the indicator function of K and by ∂G the subdifferential of convex function G : X −→ R ∪ {+∞} in the sense of convex analysis. Given a measure γ, by γ A we mean the restricted measure of γ on A, that is, To avoid unnecessary difficulties, in our theoretical study, we will work with Ω = R N in the definition (1.2) of c L . Throughout this article, we drop the subscript L and write simply c instead of c L .
For example, the function L(x, v) := k(x) |v| q q , q > 1 satisfies the above assumption whenever k is (positive) continuous.
The convex conjugate H of L is, as usual, defined by Note that, under the assumption (A) on L, the function H(., .) is continuous in both variables and that c(., .) is locally Lipschitz. We set Q : for any compactly supported smooth function φ ∈ C ∞ c (Q). For short, we denote (2.1) by We say that (ρ 0 , ρ 1 ) is a couple of active submeasures if ρ 0 = π x #γ and ρ 1 = π y #γ for some optimal plan γ of the PMK problem (1.1).

Uniqueness of active submeasures
The existence of active submeasures follows easily from the direct method (see e.g. [11,28]). The present section concerns the uniqueness. The idea of the proof is based on the recent paper ( [38], Prop. 5.2). For completeness, we give here an adaptation to our case. Lemma 3.2. Assume that L satisfies the assumption (A). Let (ρ 0 , ρ 1 ) be a couple of active submeasures and γ ∈ π(ρ 0 , ρ 1 ) be an optimal plan.
Proof. We prove that ρ 1 = ν a.e. on B c (x * , R). If the conclusion is not true then there exists a compact set K B c (x * , R) with a positive Lebesgue measure such that ρ 1 < ν a.e. on K. The proof consists in the construction of a better planγ. Since (x * , y * ) ∈ supp(γ), we have where B(x, r) is the ball w.r.t. the Euclidean norm. Now, geometrically speaking, instead of transporting mass from x * to around y * , we can give more mass on K.
To be more precise, we construct a new planγ as follows Then π x #γ = ρ 0 and for all sufficiently small r. It follows thatγ ∈ π m (µ, ν). Furthermore, we have where we used the fact that This holds because of the definition of K and the continuity of c.
The next lemma provides an expression for active submeasures. 1 for some measurable sets B 0 , B 1 .
can be obtained via PDE techniques applied to the so-called obstacle Monge-Kantorovich equation (see [28]).

Equivalent formulations
In the present section, under the general assumptions of Section 2, we introduce and study the equivalent formulations for the PMK problem of the type (1.5)-(1.8).

Dual formulation
We start with the following Kantorovich-type dual formulation.
Coming back to our analysis, we note that if u ∈ K λ c and u is smooth then, using the definition of the convex conjugate function H, we get (4.5) In general, for any u ∈ K λ c , we can approximate u by smooth functions satisfying a similar estimate for (4.5). This is the content of the following lemma. Although we obtain here only the estimate at the limit, this is enough for later use.
and, for all ξ ∈ Lip([0, 1]; R N ), Proof. Let α ε , β ε be standard mollifiers on R and R N , respectively, such that Set η ε (t, x) := α ε (t)β ε (x). Letũ be a Lipschitz extension of u on R × R N . By means of convolution in both time and spacial variables, let us defineũ Let us show that u ε satisfies all the requirements. First, sinceũ is Lipschitz, u ε converges uniformly to u on Q. It remains to check the two inequalities (4.6) and (4.7). For all t ∈ [0, 1], using (4.8), we have + v, ∇ y u(s, y) ) dy ds Letting ε → 0, we obtain (4.6). Finally, let us fix any ξ ∈ Lip([0, 1]; R N ). Using (4.6) with x = ξ(t), v =ξ(t), we have lim sup Recall that (the Reverse Fatou's Lemma) if there exists an integrable function g on a measure space (X, η) such that g ε ≤ g for all ε, then lim sup ε g ε dη ≤ lim sup ε g ε dη. (4.11) In our case, on X := [0, 1] with the Lebesgue measure, the functions g ε (t) := ∂ t u ε (t, ξ(t)) + ∇ x u ε (t, ξ(t)),ξ(t) for a.e. t ∈ [0, 1] are bounded by a common constant depending Lipschitz constants of u and of ξ. Applying the Reverse Fatou's Lemma and (4.10), we deduce that lim sup Note that we can do even better in the case where L(x, v) = L(v) is independent of x. Indeed, from our argument (4.9), we can choose u ε such that without passing ε to 0. Now, we are ready to prove the duality (4.1). We check directly that the maximization is less than the minimum in (4.1) with the help of Lemma 4.4. For the converse inequality, we make use of the theory of Hamilton-Jacobi equations.
Proof of Theorem 4.1. Fix any u ∈ K λ c . Let u ε be the sequence of smooth functions given in Lemma 4.4. Fix any ξ ∈ Lip([0, 1]; R N ) such that ξ(0) = x, ξ(1) = y. By (4.7), we have (4.14) Since ξ is arbitrary, we get Thanks to Remark 4.3, we deduce that Conversely, let (φ, ψ) ∈ Φ λ c (µ, ν) be a maximizer in (4.4). Set Since c(., y) is locally Lipschitz w.r.t. the variable x, φ * is Lipschitz on the compact set supp(µ). Moreover, φ * is non-positive (since ψ ≤ 0 and c ≥ 0) and (φ * , ψ) is also a maximizer of the DPMK problem. By extension, we can assume that φ * is non-positive and Lipschitz on R N . Now, we set u * (t, x) := inf Then (see e.g. [22], Chap. 10 or [12], Chap. 6) u * is Lipschitz on Q and u * is a viscosity solution of the Hamilton-Jacobi equation These imply that u * ∈ K λ c and that Combing this with (4.15), the duality (4.1) holds and u * is a solution of the maximization problem on the right hand side of (4.1).

Eulerian formulation by Fenchel-Rockafellar duality
As we said in the introduction, the Fenchel-Rockafellar duality is an important ingredient of our analysis, especially for the numerical analysis by augmented Lagrangian methods. where Roughly speaking, the minimization in (4.19) is the Fenchel-Rockafellar dual of the maximization problem. However, the interesting point to note here is that the maximization problem in (4.19) does not satisfy the sufficient conditions to use directly the dual theory of Fenchel-Rockafellar. To overcome this difficulty, we approximate the maximization problem by a suitable supremum problem. To this end, for general Lagrangian L, we make use of the smooth approximations given in Lemma 4.4.
Proof of Theorem 4.5. Let us first show that (4.21) Fix any u ∈ K λ c and (ρ, υ, θ 0 , θ 1 ) ∈ B c . Let u ε be the sequence of smooth functions given in Lemma 4.4. Taking u ε as a test function in the continuity equation Letting ε → 0, using Lemma 4.4 and the fact that u(1, .) ≤ 0, u(0, .) + λ ≥ 0, we have This implies the desired inequality (4.21). Let us now prove the converse inequality. Obviously, we have It is sufficient to show that This will be proved by using the Fenchel-Rockafellar dual theory. Indeed, the supremum problem in (4.27) can be written as The proof is completed by computing explicitly the quantities in this maximization problem.

Optimality condition and constrained MFG system
To write down the optimality condition for the duality (4.19), we need to use the notion of tangential gradient to a measure. This notion is an adaptation of the usual gradient (for Lebesgue measure) to any finite Radon measure. Recall that the tangential gradient ∇ η u is well-defined for any Lipschitz function u and any finite Radon measure η (see e.g. [8,9,31]).
To prove Theorem 4.6, we need a similar estimate for (4.5) for any u ∈ K λ c . Since u is not smooth in general, we will characterize the estimate (4.5) via the tangential gradient instead of the usual one. Lemma 4.9. Let u be a Lipschitz function on Q and ∂ t u(t, we have ∇ ρ u(t, x) · (1, υ(t, x)) ≤ L(x, υ(t, x)) ρ-a.e. (t, x) in Q.

Numerical approximation
We will apply the ALG2 algorithm to the dual formulation on the right hand side of (4.1) in order to give numerical approximations for the optimal partial transport problem. We will solve for active submeasures ρ 0 = µ − θ 0 , ρ 1 = ν − θ 1 and the optimal movement of density ρ t from ρ 0 to ρ 1 .

ALG2 method
The method is to solve numerically the minimization of the form where F, G are l.s.c. convex functionals on Hilbert spaces V et Z; and Λ ∈ L(V, Z) is a linear operator. The Fenchel-Rockafellar dual problem of (5.1) reads as By introducing a new variable q ∈ Z to the primal problem (5.1), we can rewrite it in the form inf (u,q)∈V ×Z : Λu=q ALG2 is a primal-dual method (i.e. it provides numerical solutions of both primal and dual problems) consisting of 3 steps. Known σ i , q i , the next step (u i+1 , σ i+1 , q i+1 ) is computed as follows. Fix any parameter r > 0 (in practice r = 1, 2): For the theory of this method, its interpretation, we refer the reader to [20,24,25]. Let us recall below its convergence which is enough for our discretized problems later.

Applying ALG2 for the optimal partial transport problem
Recall that the dual maximization formulation in (4.1) can be rewritten into the form and, for all (q, z, w) ∈ Z, Figure 2. Active submeasures and their displacement.
Let us discuss the details of computation. Actually, in computation, we replace V, Z by finite-dimensional spaces using finite element methods. We denote by P i , i = 1, 2 the spaces of piecewise polynomials of degree i.

•
Step 1: We split into two steps: First using λ i to compute u i+1 and then using u i+1 to calculate λ i+1 . 1.  There are other first-order splitting methods for solving the minimization of F + G • Λ as implemented and compared in [10] for the context of mean field games.
This projection is also done in pointwise. Given a grid with vertices x k , with L(x, v) = k(x)|v| 2 , k ∈ C(Ω), k(x) > 0 for all x ∈ Ω, v ∈ R 2 . For this cost, the last projection in the Step 2 (the projection on K x ) is converted to a problem on R and the latter is computed easily by the bisection method. The active submeasures and the optimal displacement are given in Figure 1. Timestep 0 and timestep 9 (ρ 0 and ρ 1 ) are active submeasures of the source and the target, respectively. The intermediate timesteps show the optimal movement of density from ρ 0 to ρ 1 . Example 6.2. We take the similar data to the previous example but the source and the target are taken as the sums of two distributions, The result is given in Figure 2. This cost means that we have to pay much if we transport through around (0.5, 0.5) (where k(x) is big). The numerical result is illustrated in Figure 3.