The model reduction of the Vlasov-Poisson-Fokker-Planck system to the Poisson-Nernst-Planck system via the Deep Neural Network Approach

The model reduction of a mesoscopic kinetic dynamics to a macroscopic continuum dynamics has been one of the fundamental questions in mathematical physics since Hilbert's time. In this paper, we consider a diagram of the diffusion limit from the Vlasov-Poisson-Fokker-Planck (VPFP) system on a bounded interval with the specular reflection boundary condition to the Poisson-Nernst-Planck (PNP) system with the no-flux boundary condition. We provide a Deep Learning algorithm to simulate the VPFP system and the PNP system by computing the time-asymptotic behaviors of the solutions and the physical quantities. We analyze the convergence of the neural network solution of the VPFP system to that of the PNP system via an Asymptotic-Preserving (AP) scheme. We also provide theoretical evidence that the Deep Neural Network (DNN) solutions of the VPFP and PNP systems converge to the a priori classical solutions of each system if the total loss function vanishes.


Motivation: a diagram of diffusion limit
The description of physical dynamics across various scales is one of the main questions of interest in the mathematical modeling of complex systems. In kinetic theory, the evolution of gases is explained via a statistical approach based on probabilistic distribution functions at the mesoscopic level, whereas fluid theory describes the dynamics at the macroscopic level. Both levels of description, as well as the asymptotic reduction of the mesoscopic equations to the macroscopic equations, have been crucial issues.
The aim of this paper is to establish the commutation of the following diagram of diffusion limit (Fig. 1), which provides the reduction of the kinetic equation (the Vlasov-Poisson-Fokker-Planck system) to the fluid equation (the Poisson-Nernst-Planck system) as the perturbation parameter ε tends to zero.

Keywords and phrases. Vlasov-Poisson-Fokker-Planck system, Poisson-Nernst-Planck system, diffusion limit, artificial neural network, asymptotic-preserving scheme.

We refer to a theoretical result from [83] to obtain the bottom side (Part I) of the diagram. For the left-hand side (Part II), the right-hand side (Part III), and the upper side (Part IV) of the diagram, we use a Deep Learning method based on Deep Neural Network (DNN) solutions to approximate the solutions of the kinetic equation and the fluid equation. We provide the large-time behaviors and the steady-states of several physical moments of these DNN solutions, which agree with the theoretical results. We also provide theoretical evidence on the relationship between the DNN solutions and the analytic solutions for the left- and right-hand sides of the diagram.
There are many numerical studies that simulate initial-boundary value problems for kinetic and fluid equations. In particular, it is computationally challenging for a numerical scheme to automatically capture the limit of the asymptotic expansion in the small parameter (e.g., as the parameter ε tends to zero along the upper side of Fig. 1). Many numerical schemes have been developed to overcome this challenge. These are the so-called Asymptotic-Preserving (AP) schemes, first introduced by Jin [54]. The key idea is to develop a numerical scheme that preserves the asymptotic limit from a mesoscopic to a macroscopic model in a fixed discrete setting.
Deep Learning methods have achieved remarkable success in various areas, and many studies have recently been introduced for learning partial differential equations (PDEs) with them. These studies approximate the solutions of PDEs using a neural network architecture as a function approximator, based on the universal approximation theorem [21]. Alongside classical numerical methods, this Deep Learning approach has been proposed as a new way to simulate PDE problems.
In this paper, we provide a Deep Learning algorithm to simulate each side of Figure 1. In addition, we prove that the Deep Neural Network solutions converge to the analytic solutions. We present simulation results of the Deep Learning method as an AP scheme to observe the trend of the diffusion limit of the Vlasov-Poisson-Fokker-Planck system.
The main distinctive feature of this paper compared to existing numerical methods is the use of the neural network approach as a function approximator for the VPFP system, the PNP system, and the AP scheme as the Knudsen number ε goes to 0. Our main goal is to complete the commutation of the neural network version of the diagram in Figure 1, analogous to the numerical-analysis version in Figure 2.

The Vlasov-Poisson-Fokker-Planck equation
In order to study the diffusion limit of the Vlasov-Poisson-Fokker-Planck system in a bounded interval Ω def = (−1, 1), we need to rescale the VPFP system with the Knudsen number ε. This small parameter represents the ratio of the mean free path of the particles to the typical macroscopic length scale of the particle flow. We are interested in the scaling of the system under the change of variables t′ = ε²t and x′ = εx; see Section 4 of [77] and Section 1 of [83]. With these variables, the VPFP system in the bounded interval Ω = (−1, 1) can be written in the dimensionless form (1.1). The existence and uniqueness of solutions to the VPFP system have been well-studied. Victory and O'Dwyer [78] showed the existence of classical solutions to the VPFP system in two dimensions. Rein and Weckler [73] and Bouchut [11] showed the existence of global solutions to the three-dimensional VPFP system in the whole space. We refer to [17] for global weak solutions of the VPFP system in a bounded domain with absorbing and reflection-type boundary conditions. The large-time asymptotics of solutions to the Vlasov-Fokker-Planck equation were first studied in [26,29] in the case where the particles occupy the whole space; there it is proved that the distribution function f(t, x, v) tends to a Maxwellian. This result was extended by Bouchut and Dolbeault [12] under more general assumptions on the external potential. We also refer to [18,27] for the whole-space domain. For the initial-boundary value problem for the VPFP system, Bonilla et al. [10] studied the large-time asymptotic behaviors of the solutions with reflection-type boundary conditions. The authors of [38] considered the global well-posedness of the nonlinear Fokker-Planck equation with the specular reflection boundary condition. The well-posedness and regularity for different boundary conditions for the kinetic Fokker-Planck equation have also been studied in [44-51].
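Since the dimensionless system (1.1) is not reproduced in this excerpt, the following is a hedged sketch of a parabolic scaling of this type in one dimension; the exact signs and constants should be checked against (1.1) and the cited references.

```latex
% Sketch of a parabolically scaled VPFP system (assumed form, not (1.1) verbatim):
\begin{aligned}
&\partial_t f^{\varepsilon}
  + \frac{1}{\varepsilon}\, v\, \partial_x f^{\varepsilon}
  + \frac{1}{\varepsilon}\, E^{\varepsilon}\, \partial_v f^{\varepsilon}
  = \frac{1}{\varepsilon^{2}}\, \partial_v\!\left( v f^{\varepsilon} + \partial_v f^{\varepsilon} \right),\\
&E^{\varepsilon} = -\partial_x \Phi^{\varepsilon},
\qquad
-\partial_{xx} \Phi^{\varepsilon} = \rho^{\varepsilon} - h,
\qquad
\rho^{\varepsilon}(t,x) = \int_{\mathbb{R}} f^{\varepsilon}(t,x,v)\,\mathrm{d}v .
\end{aligned}
```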

The Poisson-Nernst-Planck equation
One of the macroscopic models describing the distribution and transport of ionic species is the Poisson-Nernst-Planck (PNP) system, also often called the Drift-Diffusion-Poisson (DDP) equation. The PNP system consists of the Nernst-Planck equation, which describes the drift and diffusion of ions, and the Poisson equation, which describes the effect of the self-consistent electric field. In this paper, we consider the 1-dimensional Poisson-Nernst-Planck (PNP) system (1.4) in a bounded interval Ω = (−1, 1). We also assume the neutrality condition (1.6) for the background charge h(x). The PNP system has many applications in fields such as electrical engineering, electrokinetics, electrochemistry, and biophysics. Therefore, the analytical study of the PNP system also has a long history in various contexts. An initial-boundary value problem for a system describing the transport of mobile carriers in a semiconductor is studied by Gajewski and Gröger [35]. The existence and large-time behavior of solutions to the PNP equation are studied in [8]. Also, the convergence rate of solutions to the PNP system is studied in [4,7]. We refer to the review paper [79] for recent developments on generalized PNP systems.
1.5. Boundary conditions

1.5.1. Phase boundary and the specular reflection for the VPFP system

Throughout this paper, we denote the phase boundary of Ω × R by γ def = ∂Ω × R. Additionally, we split this boundary into an outgoing boundary γ₊, an incoming boundary γ₋, and a singular boundary γ₀ for grazing velocities, defined as

γ₊ = {(x, v) ∈ ∂Ω × R : n(x) · v > 0},
γ₋ = {(x, v) ∈ ∂Ω × R : n(x) · v < 0},
γ₀ = {(x, v) ∈ ∂Ω × R : n(x) · v = 0},

where n(x) is the outward normal vector at x ∈ ∂Ω. We define the boundary integration of f(x, v), for (x, v) ∈ ∂Ω × R, with respect to dS(x) dv, where dS is the standard surface measure on ∂Ω, and we define the L²(γ) norm with respect to the measure |n(x) · v| dS(x) dv. In terms of the phase boundary γ, we formulate the specular reflection boundary condition as

f(t, x, v) = f(t, x, −v) for x ∈ ∂Ω.

One of the well-known a priori conservation laws for the Vlasov-Poisson-Fokker-Planck system (1.1) with the specular boundary condition is the conservation of mass: the total mass, "Mass", of the distribution f(t, x, v),

Mass(t) = ∫_Ω ∫_R f(t, x, v) dv dx,

is preserved for all time, which means (d/dt) Mass(t) = 0.
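As a hedged sketch of why specular reflection yields mass conservation (the displayed formulas of this subsection are not fully recoverable here, so γ denotes the phase boundary and n(x) the outward normal):

```latex
% Integrating the kinetic equation over \Omega \times \mathbb{R}: the
% Fokker-Planck term \partial_v(vf + \partial_v f) and the force term
% vanish after integration in v, leaving only the boundary flux of the
% transport term (up to a scaling constant),
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Mass}(t)
  = \frac{\mathrm{d}}{\mathrm{d}t} \int_{\Omega}\!\int_{\mathbb{R}} f(t,x,v)\,\mathrm{d}v\,\mathrm{d}x
  = -C \int_{\gamma} f(t,x,v)\, \bigl(n(x)\cdot v\bigr)\, \mathrm{d}v\, \mathrm{d}S(x) = 0,
% which vanishes because the specular condition f(t,x,v) = f(t,x,-v) on
% \partial\Omega makes the outgoing and incoming contributions cancel.
```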

No-flux Boundary Condition for the PNP system
The PNP system is usually posed in a bounded domain with suitable boundary conditions. In this paper, we use the no-flux boundary condition (1.10) for the PNP equation, which implies the conservation of the total density.

1.6. The equilibrium state and the macroscopic quantities

1.6.1. The equilibrium state and the macroscopic quantities for the VPFP system

It is well-known that the VPFP system has a local equilibrium solution. Bonilla et al. [10] introduced the form of the steady-state of the VPFP system in bounded domains with a reflection boundary condition on f(t, x, v) and the Dirichlet boundary condition for the potential Φ(t, x), without a background charge. They remark that an analogous result can be proved with Neumann boundary conditions in place of the Dirichlet boundary conditions. In this regard, the VPFP system (1.1), which includes the background charge as in (1.1)₃, has the equilibrium state (1.14). We expect that the neural network solutions of the VPFP system reach the steady-state (1.14) (see the simulations in Sect. 4.3).

The Lyapunov functional ℰ(t) for the VPFP system (1.1) is defined as the relative entropy of the solution f(t, x, v) with respect to a non-normalized stationary distribution f̂, as explained in [10]. If we assume that the background charge h(x) is constant and impose the zero-mean constraint on Φ (see Rem. 2.1), then ℰ(t) decomposes into the entropy of the system "Ent", the total kinetic energy "KE", and the electric potential energy "EE". The Lyapunov functional is also called the free energy. Therefore, we obtain the steady state in which ρ∞(x) is a solution of the Poisson-Boltzmann (PB) equation (1.23). The PB equation (1.22) has a solution ρ∞(x) = ρ₀, similar to the PB equation (1.13) in the VPFP system.
Therefore, the PNP system (1.4) has the steady state (1.24), with the constant ρ₀ defined in (1.23). We expect that the neural network solutions of the PNP system reach the steady-state (1.24) (see the simulations in Sect. 5.3).
Also, the free energy FE(t) of the PNP system (1.4) is defined as in (1.25) (similarly to [33,53,63]); it has both an entropic part and an interaction part. The first term, ρ(t, x) log ρ(t, x), on the right-hand side is the entropy related to the Brownian motion of each particle, and the second term, ½|E(t, x)|², is the electric potential energy of the particles.
Under the boundary conditions (1.10) and (1.4)₅, the PNP system satisfies a relation obtained by multiplying (1.4)₃ by Φ(t, x), integrating over the domain Ω, and integrating by parts with respect to x; the free energy can then be rewritten accordingly. By taking the time derivative of the free energy FE(t), we derive that the PNP system (1.4) satisfies the free energy dissipation law. We expect that the free energy (1.25) of the PNP system is a non-increasing function of time (see the simulations in Sect. 5.3).
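The displayed identities of this derivation are not recoverable from this excerpt; as a hedged sketch, the standard free energy dissipation law for PNP-type systems (cf. [62,63]) takes the following form, with constants that may differ from (1.25):

```latex
% Sketch of the free energy dissipation law for the PNP system:
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{FE}(t)
  = -\int_{\Omega} \rho\, \bigl|\partial_x \bigl( \log \rho + \Phi \bigr)\bigr|^{2}\, \mathrm{d}x
  \;\le\; 0 .
```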

Mathematical results on the diffusion limit
In this section, we review past results on the diffusion limit of the VPFP system. There are two scalings of the VPFP system: the first is the diffusion limit (also called the parabolic limit or the low-field limit), and the second is the drift limit (also called the hyperbolic limit or the high-field limit). In this paper, we consider only the diffusion limit of the VPFP system. The diffusion limit has been extensively investigated in many works. Poupaud [70] considered the diffusion limit of the semiconductor Boltzmann equation. The diffusion limit for the VPFP system with a given background was considered in [36,71] in the two-dimensional case, and El Ghani and Masmoudi [30] later extended these results to higher dimensions in the renormalized sense. The case of multi-species dynamics is also considered in [41,83]. A recent paper of Wu et al. [83] treats the diffusion limit of the VPFP system in a bounded domain with reflection boundary conditions; we use the results of that paper to establish the bottom side of Figure 1. There are also many works that deal with the drift limit, such as [3,9,37,66].

Existing numerical methods and an Asymptotic Preserving scheme
In this section, we introduce a brief history of the numerical methods to approximate the solutions of the VPFP system and the PNP equation. We also introduce the numerical studies concerning the asymptotic expansions on the small parameters, the so-called Asymptotic Preserving (AP) scheme.
There are many numerical studies of the VPFP system and related systems. There is a wide range of literature on numerical analysis for the Fokker-Planck (FP) equation, including the finite difference method [19,74] and its conservative-type schemes [6,13-15,23]. The particle method [1,40] is effective at exploiting the stochastic properties of the Fokker-Planck operator. Wollman and Ozizmir [80-82] provided a deterministic particle method for the VPFP system in one and two dimensions. Another approach is the spectral method: the authors of [68] developed a new spectral method based on a Fourier spectral approximation for the Boltzmann equation, and Filbet and Pareschi [32] extended the method to the nonhomogeneous case. The review paper [25] contains the latest references on numerical methods for collisional kinetic equations.
Considerable effort has also been devoted to numerical methods for the PNP system. Many of the existing methods have been constructed for both one-dimensional and higher-dimensional cases in various chemical and biological contexts. We mention some recent studies on solving time-dependent PNP systems. Solkalski et al. [76] proposed a finite difference scheme for analyzing liquid junction and ion-selective membrane potentials. Hyon et al. [53] provided a finite element method with the backward Euler method for the modified PNP system. It is considered difficult for numerical schemes to preserve the physical properties of the PNP system, namely the nonnegativity principle, mass conservation, and free energy dissipation. Regarding these difficulties, Liu and Wang [62] developed a finite difference method for the PNP system, focusing on a free-energy-satisfying numerical method. They also provided a discontinuous Galerkin scheme for the one-dimensional case in [63]. Implicit methods with the trapezoidal rule and backward differentiation are presented in [33].
Regarding numerical methods that capture the relation between the two regimes, Shi Jin [54] first introduced a numerical scheme that preserves the asymptotic limits from mesoscopic to macroscopic models for transport in diffusive regimes - the asymptotic-preserving (AP) scheme. The commutative diagram of Figure 2 (taken from Figure 1 of [55]) illustrates the AP scheme of [55]. As explained in [55], ℱ^ε is a mesoscopic model which depends on a parameter ε that characterizes the small scale, and ℱ^ε_δ is a discretization of ℱ^ε with a parameter δ related to the numerical discretization (such as the mesh size and/or time step). As ε goes to zero, the mesoscopic model ℱ^ε is approximated by a macroscopic model ℱ⁰. Then, the scheme ℱ^ε_δ is called AP if the asymptotic limit of ℱ^ε_δ as ε → 0 with δ fixed, denoted by ℱ⁰_δ, is a good approximation of ℱ⁰.
AP schemes have been developed for various equations. In particular, many studies deal with AP schemes for kinetic equations in the Euler regime. Filbet and Jin [31] developed a penalization method to handle the Boltzmann integral, which is a fully nonlinear collision operator. Jin and Yan [57] generalized their idea to the nonhomogeneous Fokker-Planck-Landau equation. Dimarco and Pareschi [24] introduced an exponential Runge-Kutta method for kinetic equations. AP schemes for the high-field limit of the VPFP system are considered in [20,56]; in [20], the authors also developed an AP scheme based on a micro-macro decomposition for the diffusion limit of the Vlasov-Poisson-BGK model. We refer to the recent surveys by Jin [55], Degond [22], and Pareschi and Russo [67].
Given the existing numerical methods in the literature, the main distinctive feature of this paper is the use of the neural network approach as a solver for these important problems. We use the neural network method as a function approximator for the VPFP system, the PNP system, and the AP scheme as the parameter ε goes to 0. The aim of this paper is to complete the neural network version of Figure 1, analogous to the numerical-analysis version in Figure 2.

Neural network and an approach to solve a PDE
Neural networks have drawn great attention in the machine learning community and have been used in various fields such as natural language processing, image recognition, and speech recognition. Deep Learning, which uses a deep stack of neural network layers called a Deep Neural Network (DNN), has been applied effectively in these areas. The neural network architecture was first introduced in [65]. There are theoretical results that justify the use of neural networks in these applications, such as [21,34,42,43]. The key theorem behind these results is the universal approximation theorem, which states that an arbitrary real-valued function can be well-approximated by a feed-forward neural network. Later, Li [61] showed that a neural network with one hidden layer can approximate not only a target function but also its higher partial derivatives on a compact set.
Deep Learning as a PDE solver has also been studied. Lagaris et al. [59,60] suggested the use of neural networks to solve ODEs and PDEs. Recently, Raissi et al. [72] introduced physics-informed neural networks, designing data-driven algorithms for two main problems: data-driven solutions and data-driven discovery of partial differential equations. A data-driven method to solve high-dimensional PDEs with a DNN is proposed in [75]. The second problem, called the forward-inverse problem, is also considered in [58], together with a theoretical analysis of the convergence of the DNN solutions to the classical solutions. The authors of [2] present a method for approximating the solution of PDEs using an adaptive collocation strategy. Also, Han et al. [39] deal with a uniformly accurate moment system, using the kinetic equation as an example.
Hwang et al. [52] introduced Deep Neural Network solutions to the kinetic Fokker-Planck equation in a bounded interval under various types of physical boundary conditions. They observed the asymptotic behaviors of the DNN solutions and verified agreement with theoretical results, and they also provided theoretical proofs on the relationship between the DNN solutions and the a priori analytic solutions. Our paper is motivated by several ideas in [52]; we extend them to the more general VPFP system and its diffusion limit.

Outline of the paper
The four sides of Figure 1 correspond to the four parts of this paper (Parts I, II, III, and IV). In Section 2 (Part I), we show that the solutions of the VPFP system converge to the solutions of the PNP system as the Knudsen number ε tends to zero, which corresponds to the bottom side of Figure 1; to this end, we use the theoretical result from [83]. In Section 3, we introduce in detail our Deep Learning method to approximate the solution of the VPFP system and the solution of the PNP system, which is used for the numerical simulations in Parts II, III, and IV. This includes detailed descriptions of the DNN architectures for each system (Sect. 3.2), the definition of the grid points (Sect. 3.3), and a "Grid Reuse" method, a tool newly devised in this paper (Sect. 3.4), to capture the dynamics under a small Knudsen number ε. In Section 4 (Part II), we introduce the DNN-approximated solutions to the VPFP system (1.1), which corresponds to the left-hand side of Figure 1. We provide suitable loss functions (4.7) to approximate the VPFP system using Deep Learning in Section 4.1. We prove the convergence of the DNN solution to an analytic solution of the VPFP system as the loss function vanishes in Section 4.2. We also provide numerical simulations that show the asymptotic behaviors of the macroscopic quantities and the pointwise values of the DNN solution to the VPFP system in Section 4.3. In Section 5 (Part III), we introduce the DNN-approximated solutions to the PNP system (1.4), which corresponds to the right-hand side of Figure 1; the contents are analogous to those of Section 4. In Section 6 (Part IV), we provide several numerical simulations showing the trend of the diffusion limit from the VPFP system to the PNP system, which corresponds to the upper side of Figure 1.
We analyze the convergences (2.4) and (2.5) using the DNN solutions of the VPFP system by varying the Knudsen number ε from 1 to 0.05 via the Asymptotic-Preserving (AP) scheme. Finally, in Section 7, we summarize our methods and results.

Part I. On convergence of the VPFP solution to the PNP solution
In this section, we introduce the convergence of solutions of the VPFP system to a solution of the PNP system from the recent paper ([83], Thm. 2.1). Wu et al. [83] prove that the VPFP system (1.1) with the Maxwellian reflection boundary condition converges to the PNP system (1.4) as ε tends to zero in the multi-species case. To be more specific, they consider the renormalized solution (f_i^ε, Φ^ε) of a rescaled N-species VPFP system (i = 1, 2, …, N) in a bounded interval Ω ⊂ R using the scaled parameters as in (2.1), with an initial condition, a reflection boundary condition (specifically, the Maxwellian boundary condition) for the distribution functions f_i^ε, and a zero-outward electric field condition (Neumann boundary condition) for the electric potential Φ^ε. They show that, as ε tends to zero, the solution (f_i^ε, Φ^ε) converges to (ρ_i(t, x)M_i(v), Φ(t, x)), where (ρ_i, Φ) is a weak solution of the PNP system in the bounded interval Ω with the given initial-boundary conditions (the M_i = M_i(v) are the normalized Maxwellians for each species). Using this result, we derive our specific system (1.1) with its boundary conditions. Firstly, we specify the one-dimensional bounded domain Ω = (−1, 1) ⊂ R for the spatial variable and R for the velocity variable. Also, we consider the single-species case with N = 1; this case is reasonable in plasma physics when the relatively heavy ions are supposed to be static in the background. In this case, we denote the distribution function f_{i=1}^ε simply by f^ε for the VPFP system, since N = 1. We also choose the classical specular reflection boundary condition for f^ε(t, x, v) instead of the Maxwellian boundary condition used in [83]. We use the Dirichlet boundary condition for the electric force E(t, x), which is the same as the Neumann boundary condition for the electric potential Φ(t, x) assumed in [83]. The boundary conditions (2.3) imply the Neumann condition (∇_x ρ · n = 0 on ∂Ω) for the density function ρ and the Dirichlet condition (∇_x Φ · n = 0 on ∂Ω, i.e., E = 0 on ∂Ω) for the electric force E(t, x).
Additionally, we set all the parameters in the systems (2.1) and (2.2) to 1, except for the Knudsen number ε, in which we take the limit.
Then, the solution f^ε (corresponding to f_i^ε with N = 1) and the solution E^ε = −Φ_x^ε of the VPFP system (1.1) with the specular boundary condition (1.8) satisfy the convergences (2.4) and (2.5) as the Knudsen number ε tends to zero, where the density ρ (corresponding to ρ_i with N = 1) and the field E satisfy the system (1.4) with the no-flux boundary condition (1.10). In Part IV (Sect. 6) of this paper, we provide the corresponding numerical simulations, which show the trend of the convergences (2.4) and (2.5).
Remark 2.1. In [83], the diffusion limit is proved under two assumptions on Φ^ε in the Poisson equation: the global neutrality condition and the zero-mean constraint. The global neutrality condition is the same as the condition (1.2) that we assume. They also assume the zero-mean constraint (2.6), which is necessary to determine the solution Φ^ε uniquely. However, in this paper we are interested in the solution E(t, x) = −∂_x Φ(t, x), the spatial derivative of Φ(t, x), rather than Φ(t, x) itself. Without loss of generality, we can therefore assume (2.6) in order to apply the diffusion limit theorem from [83].

Simulation methodology: The Deep Learning approach
In this section, we introduce our deep learning method to solve the Cauchy problem to the Vlasov-Poisson-Fokker-Planck system (1.1) and the Poisson-Nernst-Planck (PNP) system (1.4).

A Deep Learning approach for solving partial differential equation
A Deep Learning algorithm can be described as a nonlinear function approximation method using a Deep Neural Network (DNN). A Deep Neural Network consists of a sequence of multiple layers. Each layer has several neurons, which receive the neuron activations from the previous layer as input. The neurons compute a weighted sum of the input and apply an activation function to make the output nonlinear; the output is then transmitted to the neurons in the next layer. We assume that a DNN has L layers: an input layer, L − 1 hidden layers, and an output layer (the L-th layer). Similarly to the explanation in [52], we denote the relation between the l-th layer and the (l + 1)-th layer (l = 1, 2, …, L − 1) as

z_j^{l+1} = Σ_{i=1}^{m_l} w_{ji}^{l+1} σ̄_l(z_i^l) + b_j^{l+1}, j = 1, …, m_{l+1},

where
- z_i^l: the i-th neuron in the l-th layer,
- σ̄_l: the activation function in the l-th layer,
- w_{ji}^{l+1}: the weight between the i-th neuron in the l-th layer and the j-th neuron in the (l + 1)-th layer,
- b_j^{l+1}: the bias of the j-th neuron in the (l + 1)-th layer,
- m_l: the number of neurons in the l-th layer.
Note that the relation between the input layer and the first hidden layer is expressed in the same form, with the input variables serving as the first-layer neurons. The deep learning algorithm learns a complex nonlinear mapping by adapting the weights w_{ji}^{l+1} and biases b_j^{l+1} so that the output of the Deep Neural Network approximates the target function, in our case the solutions of the VPFP and PNP systems. Deep Learning uses the back-propagation algorithm, which applies the chain rule to calculate the influence of each weight and each bias on a pre-defined cost function, called the "loss function" in Deep Learning. Then, the algorithm uses a gradient method to update the weights and biases.
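The layer relation above can be sketched as a small PyTorch module; this is a minimal illustration with assumed dimensions, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Feed-forward network: an input layer, several hidden layers, and
    an output layer. Each layer computes a weighted sum plus a bias and
    applies an activation, i.e. z^{l+1} = sum_i w^{l+1}_{ji} sigma(z^l_i) + b^{l+1}_j."""
    def __init__(self, in_dim, hidden_dim, n_hidden, out_dim):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * n_hidden + [out_dim]
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
        )

    def forward(self, z):
        for layer in self.layers[:-1]:
            z = torch.tanh(layer(z))   # hidden-layer activation
        return self.layers[-1](z)      # linear output layer

# Assumed sizes mirroring the 3-100-100-100-100-1 architecture described later.
net = MLP(in_dim=3, hidden_dim=100, n_hidden=4, out_dim=1)
out = net(torch.rand(8, 3))            # batch of 8 points (t, x, v)
```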
To approximate a solution of a PDE using the deep learning algorithm, we need an appropriate loss function for the PDE system. For example, suppose we consider the following parabolic PDE:

∂_t u = ℒu in (0, T] × Ω, u(0, x) = u_0(x) in Ω, ℬu = u_b on (0, T] × ∂Ω,

where ℒ is a differential operator and ℬ is a boundary operator with known functions u_0(x) and u_b(t, x). In many papers (e.g. [72,75]), the solution u(t, x) is approximated by the DNN output u^{nn}(t, x) with a loss function given by the sum of the mean squared residuals of the governing equation, the initial condition, and the boundary condition. This loss function is an intuitive way to approximate the solution of a PDE. In our case, we propose slightly different loss functions for each system: we define the loss function for the VPFP system in Section 4.1 (Part II) and for the PNP system in Section 5.1 (Part III). We propose the loss functions based on our theoretical evidence: in each section, we prove that the DNN outputs for the VPFP system and the PNP system converge to the a priori classical solution of each system if the proposed loss function goes to zero. The details are described precisely in Parts II and III.
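The residual part of such a loss function can be sketched with automatic differentiation; here the heat equation ∂_t u = ∂_xx u stands in for the generic operator ℒ, and the network and point sets are hypothetical.

```python
import torch

def pde_residual_loss(u_net, t, x):
    """Mean squared residual of u_t - u_xx at interior collocation points
    (the heat equation is used as a stand-in for the generic operator L)."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    u = u_net(torch.stack([t, x], dim=1)).squeeze(-1)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return ((u_t - u_xx) ** 2).mean()

# The total loss adds analogous initial- and boundary-condition terms.
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
loss = pde_residual_loss(net, torch.rand(16), torch.rand(16))
```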

Our Deep Learning algorithm and the architecture
We take two different neural network structures, which share the same inputs, to approximate the coupled nonlinear equations. Each DNN has four hidden layers, with 3 (or 2)-100-100-100-100-1 neurons. For the VPFP system, the two Deep Neural Networks are used to approximate the solutions f and E, respectively. The neural network structure is shown precisely in Figure 3.
We denote the approximated solution by (f^{nn}(t, x, v; w, b), E^{nn}(t, x; w, b)), which consists of the outputs of the two DNNs, where w and b denote the weights and biases. The two outputs are used to calculate the pre-defined loss function. Then, we use a gradient descent algorithm to update the weights and biases of our model's parameters by iteratively moving in the direction that reduces the loss function; in this work, we use the Adam (Adaptive Moment Estimation) optimizer. Similarly, we use two Deep Neural Networks to approximate the solutions (ρ, E) of the PNP system, as in Figure 4, and we denote the approximated solution by (ρ^{nn}(t, x; w, b), E^{nn}(t, x; w, b)). For the four hidden layers in each DNN, we use the hyperbolic tangent activation function (σ̄(x) = (e^x − e^{−x})/(e^x + e^{−x})), which is a common activation function in the Deep Learning literature. While the choice of the activation function for the hidden layers is quite standard, the choice of the activation function for the output layer depends on the purpose. We use the Softplus activation function (σ̄(x) = ln(1 + e^x)) only for the output f^{nn}(t, x, v; w, b). Preserving the positivity of the output is one of the main issues when a numerical scheme is constructed; since the Softplus function has outputs in (0, +∞), we can easily apply the positivity constraint to f^{nn}(t, x, v; w, b). We use the PyTorch library for deep learning; it is one of the most standard deep learning frameworks due to its simplicity and ease of use. We also use the Adam optimizer in the PyTorch library with learning rate scheduling, which adjusts the learning rate based on the number of epochs. Regarding the loss function, we need the derivatives and integrals of the output with respect to the variables t, x, and v. To approximate the derivatives of the neural network output with respect to the input variables, we use the Autograd package in the PyTorch library, which provides Automatic Differentiation (AD), one of the powerful techniques in scientific computing.
AD differs from the usual differentiation methods, such as numerical differentiation or symbolic differentiation; we refer to the survey papers [5,69] for more details. Also, we use the trapezoidal rule from the PyTorch library to approximate the integration. The specific loss functions we define for the VPFP system and the PNP system are explained in Part II (VPFP) and Part III (PNP).
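The two ingredients just described, autograd derivatives and trapezoidal integration of the DNN output, can be sketched as follows; `f_net` is a hypothetical stand-in for the positive (Softplus-capped) VPFP network, evaluated at one fixed (t, x) across a velocity grid.

```python
import torch

# Positive DNN output f^nn(t, x, v) via a Softplus output layer, then
# rho^nn(t, x) = ∫ f^nn dv approximated with the trapezoidal rule.
f_net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1), torch.nn.Softplus(),  # enforces f^nn > 0
)

v = torch.linspace(-10.0, 10.0, 201)            # truncated velocity grid
tx = torch.tensor([0.5, 0.0]).expand(201, 2)    # fixed (t, x) for all v
inp = torch.cat([tx, v.unsqueeze(1)], dim=1).requires_grad_(True)

f = f_net(inp).squeeze(-1)
rho = torch.trapz(f, v)                          # macroscopic density at (t, x)
df = torch.autograd.grad(f.sum(), inp, create_graph=True)[0]
f_v = df[:, 2]                                   # ∂f/∂v via automatic differentiation
```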

Training data: grid points
To approximate the solutions to the VPFP system and the PNP system via the Deep Learning algorithm, we generate grid points in each variable domain as inputs to the neural networks. We need a three-dimensional time-space-velocity grid for the probability density f^{nn}(t, x, v; w, b) in the VPFP system, and a two-dimensional time-space grid for the density ρ^{nn}(t, x; w, b) in the PNP system and for the force fields E^{nn}(t, x; w, b) of the VPFP system and of the PNP system. We choose the time interval [0, T] as [0, 5] for the VPFP system with ε = 1 and as [0, 1] for smaller Knudsen numbers ε, which is enough to observe the steady-states of both the VPFP system and the PNP system. Also, we truncate the velocity domain to V = [−10, 10].
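The grid generation can be sketched as uniform random sampling over the domains quoted above; the function name and batch size are illustrative, not the paper's implementation.

```python
import torch

def sample_grid(n, T=1.0, X=(-1.0, 1.0), V=(-10.0, 10.0)):
    """Uniformly sample n time-space-velocity points (t, x, v) in
    [0, T] x [X_min, X_max] x [V_min, V_max]. The VPFP network consumes
    all three columns; the PNP networks use only the (t, x) columns."""
    t = T * torch.rand(n)
    x = X[0] + (X[1] - X[0]) * torch.rand(n)
    v = V[0] + (V[1] - V[0]) * torch.rand(n)
    return torch.stack([t, x, v], dim=1)

pts = sample_grid(1000)
```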

"Grid Reuse" method to capture the small Knudsen number
In Part IV, we provide numerical simulations for small Knudsen numbers ε. It is hard to capture the asymptotic limit with a fixed numerical discretization in standard numerical schemes.
To overcome this challenge, we propose a newly devised technique in this paper, which we call the "Grid Reuse" method. The Deep Neural Network is trained to minimize the sum of the loss functions at randomly sampled grid points in every epoch, as explained in (3.2)-(3.4). The idea of our "Grid Reuse" method is that we additionally keep the top-K grid points among these randomly sampled grid points and use them for training in the next epoch. Here the top-K grid points are selected as those where the integrand is largest before we integrate the error with respect to the variables t, x and v. Namely, we choose the top-K grid points that produce the largest values of the integrand in the loss function (4.1). The "Grid Reuse" method helps to resolve the time dependency, which is one of the main difficulties in capturing the diffusion limit. Also, we note that we only save grid points as 2-tuples (t, x), as in Algorithm 1, even though the sampled grid points are 3-tuples (t, x, v). This is because we need to calculate the integration term ∫_V f^nn(t,x,v;m,w,b) dv for the loss function (4.2) when we reuse the top-K grid points. Therefore, we only record the temporal and spatial grid points (t, x) where the integrand in the loss function attains its largest values. We then build the three-dimensional time-space-velocity grid with a randomly sampled velocity grid in V = [−10, 10].
The "Grid Reuse" method is inspired by the Residual-based Adaptive Refinement (RAR) method in [64] and the adaptive collocation method in [2]. These methods, like ours, are similar in spirit to the adaptive mesh refinement method in numerical analysis.
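The selection step of the "Grid Reuse" method can be sketched as follows (the residual function and the sample sizes are hypothetical stand-ins, not the paper's actual loss integrand): each epoch we evaluate the pointwise integrand of the loss on randomly sampled (t, x) points, keep the K points where it is largest, and carry them into the next epoch's sample.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 50  # number of reused grid points, the value used in this paper

def residual(t, x):
    # hypothetical stand-in for the integrand of the loss function,
    # i.e. the squared PDE residual after the v-integration
    return (np.sin(4 * t) * np.cos(3 * x)) ** 2

def sample_with_reuse(reused, n_new=1000):
    """Draw fresh (t, x) samples, append the reused points, and
    return the full training set plus the next top-K set."""
    t = rng.uniform(0.0, 1.0, n_new)
    x = rng.uniform(-1.0, 1.0, n_new)
    if reused is not None:
        t = np.concatenate([t, reused[:, 0]])
        x = np.concatenate([x, reused[:, 1]])
    r = residual(t, x)
    top = np.argsort(r)[-K:]   # indices of the K largest integrand values
    return np.stack([t, x], axis=1), np.stack([t[top], x[top]], axis=1)

reused = None
for epoch in range(3):
    train, reused = sample_with_reuse(reused)

assert reused.shape == (K, 2)  # only the (t, x) pairs are stored
```

Only the 2-tuples (t, x) are stored, matching the discussion above; a fresh velocity grid in V would be attached to them before the next loss evaluation.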

Summary of Deep Learning algorithm
Finally, we summarize our Deep Learning algorithm for the VPFP system as follows:

for each training epoch do
    Pair the samples to set the training data as in (3.2)-(3.4).
    Add the new top-K training data paired with the velocity samples.
    Evaluate the loss function:
        Approximate the integration of the DNN output (trapezoidal rule).
        Evaluate the loss function for the VPFP system (4.7).
    Update the neural network parameters using the Adam optimizer,
        in the direction of minimizing the pre-defined loss function.
end for
We also apply a similar Deep Learning algorithm to the PNP system.
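The parameter update in the algorithm above can be illustrated with a self-contained Adam iteration. This is a sketch under stated assumptions: a toy quadratic loss stands in for the VPFP loss, the exponential decay is a hypothetical stand-in for the learning rate scheduler, and the remaining hyperparameters are the usual Adam defaults, not values from the paper.

```python
import numpy as np

def adam_minimize(grad, theta, steps=2000, lr0=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8, decay=0.995):
    """Adam (adaptive moment estimation) with a simple exponential
    learning-rate schedule."""
    m = np.zeros_like(theta)  # first-moment (mean of gradients) estimate
    v = np.zeros_like(theta)  # second-moment (mean of squared gradients)
    for k in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** k)   # bias corrections
        v_hat = v / (1 - beta2 ** k)
        lr = lr0 * decay ** k          # learning rate scheduling
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# toy quadratic loss L(theta) = ||theta - target||^2 in place of the
# VPFP loss; its unique minimizer is `target`
target = np.array([1.0, -2.0, 3.0])
grad = lambda theta: 2.0 * (theta - target)
theta = adam_minimize(grad, np.zeros(3))
assert np.allclose(theta, target, atol=1e-2)
```

The scheduled decay of the learning rate damps the oscillations Adam exhibits near a minimizer, which is the role the scheduler plays in the training described above.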

Part II. On convergence of DNN solutions to an analytic solution to the VPFP system and simulation results
In this section, we provide a DNN solution to the VPFP system. This section consists of three subsections. First, we propose the loss functions of the VPFP system for deep learning. We also prove the convergence of DNN solutions to an analytic solution of the VPFP system in two steps. Finally, we show that the simulation results on DNN solutions to the VPFP system agree with theoretical results by comparing the time-asymptotic behaviors and the macroscopic physical quantities which are defined in Section 1.6.1.
We will focus on the VPFP system (1.1) with the Knudsen number ε = 1 in this section; the fixed Knudsen number can be chosen arbitrarily. For the sake of simplicity, we abuse notation and write f^ε(t,x,v) as f(t,x,v) and B^ε(t,x) as B(t,x) in this section. Later, in Part IV (Sect. 6), we consider the regimes of varying Knudsen number.

Loss functions for the VPFP system
In Algorithm 1, the Adam optimizer finds the optimal parameters w_new and b_new in the direction of minimizing a loss function. Thus, we need to define the loss functions for the Vlasov-Poisson-Fokker-Planck system: Loss_GE for the governing equations (1.1)_1 and (1.1)_3, Loss_IC for the initial conditions (1.1)_2 and (1.1)_4, and Loss_BC for the boundary conditions (1.8) and (1.1)_5. Note that we attach a distinguishing superscript to all loss functions for the VPFP system to tell them apart from the loss functions for the PNP system in Section 5.1. First, we define the loss functions Loss_GE^(1) and Loss_GE^(2) for the governing equation (1.1), where V def= [−10, 10], and then set Loss_GE def= Loss_GE^(1) + Loss_GE^(2). We next define the loss functions Loss_IC^(1) and Loss_IC^(2) for the initial condition via the initial grid points (4.4); note that we use equation (1.3) for the loss function Loss_IC^(2). Then, we define Loss_IC(w,b) def= Loss_IC^(1) + Loss_IC^(2).
The loss functions Loss_BC^(1) and Loss_BC^(2) for the specular boundary condition on f in Section 1.5.1 and for the Dirichlet boundary condition (1.1)_5 are defined accordingly. Then we define the total loss for the boundary conditions as Loss_BC(w,b) def= Loss_BC^(1) + Loss_BC^(2). Finally, we define the total loss as Loss_Total(w,b) def= Loss_GE + Loss_IC + Loss_BC. (4.7) Note that we compute these loss functions by approximating the integrations by Riemann sums on the grid points, as explained in Section 3.3. For example, the loss function Loss_GE(w,b) can be approximated by the mean of the squared residuals over the grid, normalized by the numbers N_t, N_x and N_v of grid points in t, x and v, respectively.
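The Riemann-sum evaluation of such a loss can be sketched as follows; the pointwise residual here is a hypothetical stand-in for the VPFP residual (for a true solution it would be the left-hand side of the governing equation evaluated on the DNN output), and the grid sizes N_t, N_x, N_v are illustrative:

```python
import numpy as np

# uniform grids in t in [0, 5], x in [-1, 1], v in V = [-10, 10]
Nt, Nx, Nv = 30, 40, 50
t = np.linspace(0.0, 5.0, Nt)
x = np.linspace(-1.0, 1.0, Nx)
v = np.linspace(-10.0, 10.0, Nv)
T, X, V = np.meshgrid(t, x, v, indexing="ij")

# hypothetical pointwise PDE residual r(t, x, v)
r = np.exp(-T) * np.sin(np.pi * X) * np.exp(-V**2 / 2)

# Riemann-sum (mean-square) approximation of the integral of |r|^2
# over [0,5] x [-1,1] x V, up to the constant volume factor; the
# mean divides by Nt * Nx * Nv
loss_ge = np.mean(r**2)
assert loss_ge >= 0.0
```

Dividing by the number of grid points rather than multiplying by the cell volume only rescales the loss by a constant, which does not change its minimizers.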

On convergence of DNN solutions to analytic solutions to the VPFP system
In this section, we show the convergence of DNN solutions to analytic solutions to the VPFP system (1.1) in two steps. We first prove that there exists a sequence of neural network parameters (neuron numbers m, weights w and biases b, as defined in Sect. 3.1) such that the total loss function Loss_Total converges to 0. Subsequently, we also prove that minimizing the total loss function Loss_Total implies that the Deep Neural Network solution converges to an analytic solution. Throughout the section, we assume that the existence and the uniqueness of solutions of the VPFP system (1.1) with the specular boundary condition (1.8) are a priori given.
We first introduce a definition of the class of neural network functions with one hidden layer and the corresponding theorem from [61] on the existence of approximating neural network solutions. Remark 4.4. We can generalize the result above to networks with several hidden layers (see [43]). Also, we may assume that the architecture has only one hidden layer, i.e., that the total number of layers is two.
Now we introduce our first main theorem, which states that a sequence of neural network solutions making the total loss function converge to zero exists if a Ĉ^(1,1,2) solution to the VPFP system exists. Proof. The proof is similar to that of Theorem 3.4 of [52]. The first main Theorem 4.5 tells us that we can find neural network parameters that reduce the pre-defined total loss function as much as we want. However, it does not imply that the DNN solutions converge to an analytic solution to the VPFP system. Therefore, we introduce our second main theorem, Theorem 4.7, which shows that the DNN solutions converge to an analytic solution in a suitable function space when we minimize the total loss function Loss_Total. We assume that the compact domain V = [−10, 10] of the v-variable is chosen sufficiently large so that the tail-smallness condition (4.10) holds for a sufficiently small constant and for the derivatives of order 0 and 1; the constant appearing in the theorem is positive and depends only on the final time T.
The proof of this theorem is provided in Appendix A.
Remark 4.8. Note that we fix the DNN architecture in Figure 3 before we train the DNN. Namely, we first fix the number of neurons in each layer before training, and then we update the weights and biases to minimize the total loss function. Therefore, if we want the DNN solution to approximate an analytic solution to the VPFP system, Theorem 4.7 indicates how much the total loss function Loss_Total has to be reduced. Then, Theorem 4.5 guarantees the existence of a 3-tuple (m, w, b) for which the total loss function is reduced as much as we want. In the DNN simulation, we use Algorithm 1 to find the optimal weights and biases that reduce the total loss function while the number of neurons in each layer is fixed.

Neural Network simulations
In this section, we introduce numerical simulations of the solutions f^nn(t,x,v;m,w,b) and B^nn(t,x;m,w,b) to the VPFP system (1.1). We consider the initial condition (4.12), which has a different initial distribution at each position x ∈ [−1, 1]. We consider the time interval [0, 5], which is enough to reach the steady state of the solution to the VPFP system. Also, we set the background charge h(x) to be a constant. The first plot in Figure 5 shows the time-asymptotic behavior of the L^∞ norm of the distribution f^nn(t,x,v;m,w,b) with respect to position x and velocity v. After t = 3, the value is almost constant. This indicates that the distribution f^nn(t,x,v;m,w,b) converges to the steady state. It can be observed more clearly in the third plot in Figure 5, which shows the difference between the distribution f^nn(t,x,v;m,w,b) and the global equilibrium (1.14). The L^1, L^2 and L^∞ norms of the difference with respect to position and velocity tend to zero as time increases. This is consistent with the theoretical support provided by equation (1.14). Later, the pointwise values of f^nn(t,x,v;m,w,b) show the shape of the convergence to the global Maxwellian in Figure 7.
The second plot in Figure 5 shows the value of Mass(t) over time, defined in (1.9). The plot shows that the total mass of the system is conserved. This agrees with the theoretical result that the VPFP system with the specular boundary condition (1.8) yields the conservation of the total mass (1.9), which is an important a priori physical law for the VPFP system. Figure 6 shows the time-asymptotic behaviors of four macroscopic quantities of f^nn(t,x,v;m,w,b): the total kinetic energy "KE" (1.18), the entropy "Ent" (1.17), the electric potential energy "EE" (1.19) and the free energy "FE" (1.20). The steady-state values of these four macroscopic quantities can be obtained from the macroscopic quantities of the equilibrium in (1.14). In particular, we expect EE_∞ = 0 (4.15), where |Ω| = 2 and the L^1_{x,v} norm of f_0 is approximately 6.917 in our case. We denote the steady-state values by the red dotted lines in Figure 6. The four plots show that each physical quantity converges to its steady state. Also, the fourth plot in Figure 6 shows a non-increasing trend of the free energy. This is also consistent with the theoretical support of (1.26).
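The macroscopic quantities above are velocity moments of the distribution, and they can be checked numerically by the same trapezoidal quadrature used for the loss. A small sketch at one fixed (t, x), using a normalized Maxwellian as a hypothetical distribution so that the exact values are known in closed form:

```python
import numpy as np

def trapz(y, x):
    # composite trapezoidal rule on a one-dimensional grid
    return float(np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2.0)

v = np.linspace(-10.0, 10.0, 401)
mu = np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)  # normalized Maxwellian

mass = trapz(mu, v)                 # zeroth moment: integral of mu is 1
ke = trapz(0.5 * v**2 * mu, v)      # kinetic energy density: equals 1/2
ent = trapz(mu * np.log(mu), v)     # entropy density: integral of mu log mu

# closed-form value of the entropy integral for the unit Gaussian:
# -(1 + log(2*pi)) / 2
assert abs(mass - 1.0) < 1e-8
assert abs(ke - 0.5) < 1e-8
assert abs(ent + 0.5 * (1.0 + np.log(2 * np.pi))) < 1e-8
```

The quadrature is effectively exact here because the integrands decay rapidly inside the truncated domain, which is why the steady-state values of the macroscopic quantities can be computed reliably from the equilibrium.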

The steady state satisfies that ρ_∞(x) equals the L^1_{x,v} norm of f_0 divided by |Ω|, and that B_∞(x) = 0, which is precisely explained in (1.14). We expect that the steady state of the distribution f^nn(t,x,v;m,w,b) of the VPFP system is the same global Maxwellian at each position x ∈ [−1, 1], although the initial condition (4.12) has a different distribution at each position. To confirm this, we denote the global Maxwellian μ_∞(x,v) by the red dotted lines in Figure 7. As we expect, Figure 7 shows that the distribution functions f^nn(t,x,v;m,w,b) converge to the same Maxwellian shape at time t = 5. The relative L^2_{x,v} error between the global Maxwellian μ_∞(x,v) and the equilibrium of the neural network solution at t = 5 is 4.7 × 10^−3. Also, the pointwise values of B^nn(t,x;m,w,b) converge to zero for all positions x ∈ [−1, 1], as shown in Figure 8. This result also agrees with the theoretical steady state (1.14).

Part III. On convergence of DNN solutions to an analytic solution to the PNP system and simulation results
In this section, we provide a DNN solution to the PNP system (1.4). This section also consists of three subsections, similarly to Part II (Sect. 4). First, we propose the loss functions for the PNP system. Second, we prove the convergence of a DNN solution to an analytic solution to the PNP system in two steps. Finally, we show the simulation results of the DNN solutions to the PNP system by comparing the time-asymptotic behaviors, the macroscopic quantities, and the steady-state of the PNP system which is defined in Section 1.6.1.

Loss functions for the PNP system
We need to define loss functions for the PNP system: Loss_GE for the governing equations of the PNP system (1.4). Note that the loss function Loss_GE^(2)(w,b) is not just the L^2 error with respect to t and x. We add the t-derivative and the x-derivative of the error to the original L^2 error, as shown in the definition (5.2). We need these two terms to prove the convergence of the neural network solution to the analytic solution in Theorem 5.2 in the following section. Then we define Loss_GE def= Loss_GE^(1) + Loss_GE^(2).
We now define the loss functions Loss_IC^(1) and Loss_IC^(2) for the initial condition. Then, we define Loss_IC(w,b) def= Loss_IC^(1) + Loss_IC^(2).
The loss function Loss_BC^(1) for the Neumann boundary condition on ρ(t,x) and the loss function Loss_BC^(2) for the Dirichlet boundary condition on B(t,x) are defined accordingly. Note that we add an additional error term in B^nn(t,x;m,w,b) to the original L^2 error, as shown in the definition (5.6); this is also needed for the proof of Theorem 5.2 in the following section. Then, we define the total loss for the boundary conditions as Loss_BC(w,b) def= Loss_BC^(1) + Loss_BC^(2).
Finally, we define the total loss as Loss_Total(w,b) def= Loss_GE + Loss_IC + Loss_BC. Note that we compute these loss functions by approximating the integrations by Riemann sums on the grid points, similarly to Section 4.1.

On convergence of DNN solutions to an analytic solution to the PNP system
This section shows the convergence of the DNN solutions to an analytic solution to the PNP system (1.4) in two steps, similarly to Section 4.2. First, we prove that there exists a sequence of neural network parameters such that the total loss function Loss_Total converges to 0. We then show that the corresponding sequence of DNN solutions converges to an analytic solution as we minimize the total loss function Loss_Total. Throughout the section, we assume that the existence and the uniqueness of solutions of the PNP system (1.4) with the no-flux boundary condition (1.10) are a priori given.
We introduce our first main theorem, analogous to Theorem 4.5, which shows the existence of a sequence of neural network parameters that makes the total loss function converge to zero if a Ĉ^(1,2) solution to the PNP system exists. Now we introduce our second main theorem, which shows that the sequence of DNN solutions converges to an analytic solution to the PNP system in a suitable function space when we minimize the total loss function Loss_Total; the constant appearing in the theorem is positive and depends only on the final time T. We also refer to Remark 4.8, which explains how these main theorems are related to our Deep Learning algorithm.
The proof of this theorem is provided in Appendix B.

Neural Network simulations
In this section, we provide numerical simulations of the solutions ρ^nn(t,x;m,w,b) and B^nn(t,x;m,w,b) to the PNP system (1.4). We set the initial condition of ρ^nn(t,x;m,w,b) as in (5.10). Note that the initial condition (5.10) satisfies ρ(0,x) = ρ_0(x) = ∫_R f_0(x,v) dv, so that in Part IV we can compare the convergence of the solutions of the VPFP system to the solutions of the PNP system. We also set the background charge h(x) to be a constant satisfying ∫_Ω ρ(0,x) − h(x) dx = 0. The details of our Deep Learning algorithm are explained in Sections 3.2 and 3.3 and in the summary of Algorithm 1. Figure 9 shows the total density (1.11) and the free energy (1.25) of the PNP system. As shown in the left plot in Figure 9, the total density Mass_ρ(t) of the neural network solution ρ^nn(t,x;m,w,b) is conserved. This matches the theoretical result (1.11), which is an important property of the PNP system with the no-flux boundary condition (1.10). The right plot in Figure 9 shows the free energy of the neural network solution with the steady-state value indicated by the red dotted line. We compute the steady-state value of the free energy using the steady state of the PNP solution, ρ_∞(x) and B_∞(x), in (1.24). We observe that the free energy is non-increasing, as shown in the right plot in Figure 9. This verifies that the neural network solutions ρ^nn(t,x;m,w,b) and B^nn(t,x;m,w,b) of the PNP system satisfy the dissipation law of the free energy explained in (1.26). Also, we expect the free energy to decrease exponentially to the steady state based on Theorem 1.2 in [7]. Figure 10 shows the time evolution of the free energy of the neural network solution in a log-linear scale. We compute the decreasing rate of the free energy from the difference between the free energy FE_ρ(t) and the steady-state free energy FE_{ρ,∞}, taking the numerical value at t = 1 as the steady state. We also indicate algebraic and geometric rates in the log-linear scale.
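Reading off a geometric (exponential) rate from a log-linear plot amounts to a linear fit of the logarithm of the free-energy gap. A sketch of that fit on synthetic data (the prefactor 2.5 and the exact exponential form are illustrative; only the rate −11.7 is taken from the experiment):

```python
import numpy as np

# synthetic free-energy gap FE(t) - FE_infty decaying like e^{-11.7 t}
t = np.linspace(0.0, 1.0, 100)
gap = 2.5 * np.exp(-11.7 * t)

# in log-linear scale an exponential decay is a straight line, so a
# degree-1 polynomial fit of log(gap) against t recovers the rate
slope, intercept = np.polyfit(t, np.log(gap), 1)

assert abs(slope - (-11.7)) < 1e-6
```

On real data the fit would only approximate the rate, and points near the final time, where the gap is close to the numerical steady state, are typically excluded.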
We can observe that the decreasing rate of the free energy closely follows the geometric rate e^{−11.7t}, which is a linear function in the log-linear scale of Figure 10, except for a small error near the final time. Figure 10 shows the time-asymptotic behavior of the difference between the free energy FE_ρ(t) and the steady-state free energy FE_{ρ,∞} in a log-linear scale; we take the numerical solution at t = 1 as the steady state of the free energy, and the plot verifies the exponential decay of the approximated free energy. The steady states of ρ and B are precisely explained in (1.24). As shown in the first plot in Figure 11, the neural network solution ρ^nn(t,x;m,w,b) converges to a constant for all x. This is consistent with the theoretical support provided in (1.24). Also, the second plot in Figure 11 shows that the neural network solution B^nn(t,x;m,w,b) converges to zero for all x ∈ [−1, 1] as t increases. This simulation result also matches the expected steady state of the PNP system in (1.24).

6. Part IV. On the simulation results of the diffusion limit from the VPFP system to the PNP system

In this section, we provide the trend of the diffusion limit from the VPFP system to the PNP system using the simulation results of our Deep Neural Network approach. We consider the convergence of the VPFP solutions to the PNP solution, as summarized in Part I (Sect. 2). We expect that the neural network solutions of the VPFP system and the PNP system exhibit the trend of the diffusion limit explained in equations (2.4) and (2.5). To observe the trend of the convergence, we compare the neural network solutions to the VPFP system with the Knudsen numbers ε = 1, 0.5, 0.2, 0.1, 0.05 and the corresponding neural network solutions to the PNP system. How the neural network solutions to the VPFP system and the PNP system are trained is described precisely in Section 3. Also, the results of the numerical simulations are given in Part II (Sect. 4) for the VPFP system and Part III (Sect. 5) for the PNP system, respectively.
As introduced in Section 3.4 on the simulation methodology, we use the "Grid Reuse" method to capture the VPFP system with a small Knudsen number ε. When the "Grid Reuse" strategy is not used, the neural network solutions to the VPFP system fail to approximate the solution well on the early part of the time interval (about t ∈ [0, 0.2]) as the Knudsen number becomes smaller. This means that the VPFP system with a small Knudsen number is hard to approximate at early times using Deep Learning. Therefore, the "Grid Reuse" method is essential for observing the diffusion limit from the neural network solution of the VPFP system to the neural network solution of the PNP system.
We define the total loss function (4.7) in the sense of the Mean Square Error (MSE). In this section, we instead use the Root Mean Square Error (RMSE) as the loss function for the VPFP system. The two choices show almost similar results, but we choose the RMSE loss function since it offers better results. We use 50 reused grid points (K = 50) in our work. Here we mention that the approximation was not successful when we chose K between 0 and 10 for the small Knudsen numbers ε. When K was selected around 50, the DNN solution approximated the solution of the VPFP system well with the Knudsen number as small as 0.05. The most accurate DNN solution was obtained when we selected the value of K as 50.
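The only change from the MSE loss to the RMSE loss is a square root, which rescales the loss (and hence its gradients) without moving its minimizer; a one-line sketch on hypothetical residual values:

```python
import numpy as np

residuals = np.array([0.1, -0.3, 0.2, 0.0])  # hypothetical pointwise errors

mse = np.mean(residuals**2)  # mean square error loss
rmse = np.sqrt(mse)          # root mean square error loss

# both losses vanish exactly when every residual vanishes,
# so they share the same minimizers
assert np.isclose(rmse, np.sqrt(mse))
assert mse >= 0.0 and rmse >= 0.0
```

The square root flattens large loss values and steepens the gradient near zero loss, which is one plausible reason the RMSE variant trains better in the small-ε regime.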
The details of our Deep Learning algorithm are explained in Section 3.2 and the summary of the Deep Learning Algorithm 1.

Neural Network simulations
In this section, we present the results of the numerical simulations for the diffusion limit from the VPFP system to the PNP system. We set the initial condition (4.12) for the VPFP system and the initial condition (5.10) for the PNP system. It is worth noting that we do not change the number of grid points for the VPFP system for any Knudsen number, i.e., we analyze the diffusion limit in the sense of the Asymptotic-Preserving (AP) scheme. Instead, we use the "Grid Reuse" method for all neural network solutions. Figure 12 shows the total mass ∫_{Ω×V} f^nn dv dx of the VPFP system with the different Knudsen numbers ε = 1, 0.5, 0.2, 0.1 and 0.05 in different colors, as shown in the legend. As shown in Figure 12, all five graphs overlap so that they appear as one graph. This is because all five cases conserve the total mass over time. This plot implies that the neural network solutions in all five cases approximate the solution of the VPFP system well. Figure 13 shows the L^∞ norm of the solution f^nn(t,x,v;m,w,b) to the VPFP system with the different Knudsen numbers ε = 1, 0.5, 0.2, 0.1 and 0.05 in different colors, as shown in the legend. Also, we plot the L^∞ norm of ρ^nn(t,x;m,w,b) μ(v) as the red dotted line in Figure 13. We can observe that the L^∞ norm of the solution f^nn(t,x,v;m,w,b) converges pointwise to the L^∞ norm of ρ^nn(t,x;m,w,b) μ(v) as the Knudsen number becomes close to zero. This gives more information than the theoretical convergence result explained in (2.4).
The graphs in Figures 14 and 15 show the pointwise values of the solutions at several times t. Figure 14 shows the pointwise values of ∫_V f^nn(t,x,v;m,w,b) dv as the Knudsen number ε varies, in different colors as shown in the legend. We also plot the pointwise values of ρ^nn(t,x;m,w,b) as the red dotted lines. The first plot in Figure 14 shows that the initial condition (4.12) for the neural network solution of the VPFP system is consistent with the initial condition (5.10) for the neural network solution of the PNP system, since we set the initial conditions to satisfy the relation ∫_R f_0(x,v) dv = ρ_0(x). It is remarkable that the neural network solutions to the VPFP system with the different Knudsen numbers approximate this initial condition well, and the same holds for the solution to the PNP system. Also, we expect the integrated neural network solution ∫_V f^nn(t,x,v) dv to converge to the density ρ^nn(t,x;m,w,b), which is consistent with the convergence of f^ε to ρ(t,x) μ(v) explained in (2.4); the six plots in Figure 14 show this convergence as the Knudsen number decreases. We remark that the first plot in Figure 15 shows the same pointwise values of the electric force for the VPFP system with the different Knudsen numbers and for the PNP system. This means that the initial values of the neural network approximations of the electric force for the VPFP system and for the PNP system are well approximated. Also, we observe from the six plots in Figure 15 that the solution B^nn(t,x;m,w,b) of the VPFP system converges to the solution B^nn(t,x;m,w,b) of the PNP system as the Knudsen number goes to zero. This agrees with the theoretical result explained in (2.5).
Finally, Figure 16 shows the L^1_{x,v} norm of the difference between the distribution f^nn(t,x,v;m,w,b) and ρ^nn(t,x;m,w,b) μ(v) as the Knudsen number ε varies. We note that Figure 16 shows the convergence of f^ε to ρ(t,x) μ(v) more quantitatively than the plots in the previous figures. As we expected from (2.4), the graph shows that the L^1_{x,v} norm of the difference between f^nn and ρ^nn μ becomes smaller as the Knudsen number tends to zero.
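The quantitative comparison in Figure 16 can be sketched with a synthetic stand-in for the kinetic solution (the perturbation profile and the density below are hypothetical, chosen only so that the ε-dependence of the distance is known exactly):

```python
import numpy as np

def trapz(y, x, axis=-1):
    # composite trapezoidal rule along one axis of a grid function
    y = np.moveaxis(y, axis, -1)
    return np.sum((y[..., :-1] + y[..., 1:]) * np.diff(x) / 2.0, axis=-1)

x = np.linspace(-1.0, 1.0, 101)
v = np.linspace(-10.0, 10.0, 201)
X, V = np.meshgrid(x, v, indexing="ij")

mu = np.exp(-V**2 / 2) / np.sqrt(2 * np.pi)  # global Maxwellian
rho = 1.0 + 0.3 * np.cos(np.pi * X)          # hypothetical limit density

def l1_distance(eps):
    # hypothetical kinetic solution: an O(eps) perturbation of rho * mu
    f_eps = rho * mu * (1.0 + eps * np.sin(np.pi * X) * V * np.exp(-V**2 / 4))
    diff = np.abs(f_eps - rho * mu)
    # iterated trapezoidal rule: integrate over v, then over x
    return float(trapz(trapz(diff, v, axis=1), x))

d1, d2 = l1_distance(0.1), l1_distance(0.05)
assert d2 < d1  # the L^1_{x,v} gap shrinks with the Knudsen number
```

For this synthetic perturbation the distance is exactly linear in ε; for the trained DNN solutions only the decreasing trend is observed, as in Figure 16.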

Conclusion
In this paper, we establish the commutation of the diagram of the diffusion limit in Figure 1. This also implies the reduction of the VPFP system with the specular boundary condition to the PNP system with the no-flux boundary condition as the Knudsen number tends to zero. To this end, we have introduced the Deep Neural Network (DNN) solutions to the VPFP system and the PNP system using the Deep Learning algorithm. We use two neural networks to approximate each of the VPFP system and the PNP system, coupled with the Poisson equation. Also, we propose appropriate loss functions for training, including the loss functions for the initial conditions and the boundary conditions of each system: the VPFP system in Part II and the PNP system in Part III. We also provide theoretical support showing that the approximated DNN solutions converge to analytic solutions of each system as the proposed total loss function tends to zero. We also provide numerical simulations of the DNN solutions of each system, which support the theoretical predictions on the asymptotic behaviors of each system. These include the steady states of the solutions and the physical quantities such as the total mass, the kinetic energy, the entropy, the electric energy, and the free energy.
Finally, using these DNN solutions of the two systems, we observe the trend of the diffusion limit in Part IV. We analyze our DNN solutions based on the theory shown in Part I. We use the newly devised "Grid Reuse" technique adapted to the Deep Learning algorithm. This technique makes it possible to approximate the solution of the VPFP system with the Knudsen number in the range between 0.05 and 1.
We have provided the numerical simulation for the trend of the diffusion limit without theoretical support and have seen that the DNN solution of the VPFP system converges to the DNN solution of the PNP system at the deep learning level. It is an interesting problem to theoretically prove the diffusion limit between the DNN solutions of those systems at the deep learning stage. Also, an improved method of approximating the VPFP system will be needed to handle Knudsen numbers smaller than 0.05. We leave these questions for future work.
One of the difficulties that the Deep Learning approach experiences as a PDE solver concerns its rate of convergence and stability. Compared to the traditional numerical schemes, for which a great number of studies on each method's performance are available, the Deep Learning method still has some difficulties in these respects due to optimization issues. However, the Deep Learning approach also has many advantages; in particular, it is mesh-free. The Deep Learning algorithm that we introduce in this paper does not require mesh generation; instead, it re-samples the grid points of each domain in every epoch. We expect that our work can be applied to arbitrary domains in higher-dimensional kinetic equations.
Appendix A. Proof of Theorem 4.7
In this section, we provide the proof of Theorem 4.7.
Proof. Motivated by [17], we define a transform f̄(t,x,v) of a function f(t,x,v); the transformed function f̄ then satisfies the corresponding transformed equation. Also, we define the error values of the functions f̄ and B as in the equations below, where the boundary set is defined as [0, T] × Σ^±, and Σ^± is the usual outgoing/incoming boundary set with the velocity domain R replaced by V. We now consider the equation (A.1) for the difference between f̄ and f̄^nn for each fixed t on the compact set in t, x, v only. Then we derive the inequality (A.2) by multiplying (A.1) by 2(f̄ − f̄^nn) and integrating over [−1, 1] × V, where ⟨·, ·⟩ denotes the standard inner product on L^2([−1, 1] × V). On the left-hand side of (A.2), we use the identities given by the Leibniz rule, and the resulting terms are estimated by Hölder's inequality and by the assumption (4.10). Then integration by parts in the variable v, Hölder's inequality, and the fact that f ∈ Ĉ^(1,1,2) (which yields a bound with some positive constant C_0) give, together with (4.10) and (A.5), the estimate (A.7). Therefore, if ε < 1, (A.7) yields, for some positive constants C_1, C_2, C_3 and C_4, the estimate obtained by Young's inequality. Note that the relevant L^1 norm is also bounded, since f ∈ Ĉ^(1,1,2) on the compact domain and the tail bound over L^1([−1,1]; L^1(R∖V)) holds by (4.10). Then we can reduce (A.2) to (A.9). Absorbing constants into a single positive constant C_5, we can rewrite (A.9) as (A.10) with some positive constant C_7. Moreover, under the assumption (4.10), we have (A.11) for some positive constants C_8 and C_9. Therefore, (A.11) and the inverse transform from f̄ to f imply the desired estimate, and this completes the proof.

Appendix B. Proof of Theorem 5.2
In this section, we provide the proof of Theorem 5.2.
Proof. We define the error values of the neural network outputs ρ^nn and B^nn as in the equations below, where ⟨·, ·⟩ denotes the standard inner product on L^2([−1, 1]). On the left-hand side of (B.2), we use the identity given by the Leibniz rule. Also, we reduce the absolute value of the first term on the right-hand side of (B.5) by integration by parts with respect to x. Then, by Hölder's inequality, the trace theorem and the Cauchy-Schwarz inequality with a constant C_0, we can bound the first term. To bound the second term on the right-hand side of the inequality (B.6), we use the inequalities (B.9) and (B.15). Using this bound, we can bound the second term on the right-hand side of the inequality (B.11) with some positive constant C. Therefore, this completes the proof of the theorem.