

Paper Review: Physics-Informed Neural Networks

Overview

  • The notation and numbering of references and formulas follow the conventions of the original paper.

Physics-informed neural networks (referred to as PINN) are artificial neural networks designed to numerically solve differential equations, introduced in the 2018 Journal of Computational Physics paper Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. The authors of the paper are M. Raissi, P. Perdikaris, and G.E. Karniadakis from the departments of Applied Mathematics and Mechanical Engineering.

The 'physics information' mentioned in this paper, although it may sound grandiose, simply refers to the given differential equations themselves. In other words, in this context, 'using physics information' just means using the given differential equations when solving them with an artificial neural network. When reading machine learning papers, one should be careful not to be swayed by such impressive-sounding terminology.

PINN is receiving significant attention in the numerical solution of differential equations likely because the idea behind its loss function is simple and easy to understand, and its implementation is straightforward. In fact, the paper introduces a very simple DNN as an example.

Commonly, the model introduced in Section 3.1 is referred to as PINN.

Implementation

0. Abstract

The authors describe PINN as ‘an artificial neural network trained to solve supervised learning problems while satisfying a given nonlinear partial differential equation’. The two main issues addressed in this paper are the ‘data-driven solution and data-driven discovery of partial differential equations’. To evaluate performance, problems in fluid mechanics, quantum mechanics, and diffusion equations were solved.

1. Introduction

Although recent advances in machine learning and data analysis have led to innovative results in scientific fields such as image recognition, cognitive science, and genomics, in complex physical, biological, and engineering systems it remains difficult to obtain the desired results from limited information, since data collection is expensive. In such a small data regime, the convergence of advanced techniques like DNNs, CNNs, and RNNs is not guaranteed.

Studies on methods to learn physics information efficiently (i.e., solve differential equations with minimal data) were conducted in [4-6]. The extension to nonlinear problems was proposed in subsequent studies by Raissi, one of the authors of this paper, in [8,9].

2. Problem setup

The function represented by an artificial neural network is determined by its input values (the coordinates $t, x$ of the solution $u$ of the partial differential equation) and its parameters. Automatic differentiation is utilized to differentiate with respect to both kinds of variables.

Such neural networks are constrained to respect any symmetries, invariances, or conservation principles originating from the physical laws that govern the observed data, as modeled by general time-dependent and nonlinear partial differential equations.

This sentence from the paper might seem complex, but simply put, it means that the proposed artificial neural network, PINN, must satisfy the given differential equations. This is because the condition of satisfying the differential equations is used as a loss function, as will be discussed later.

The aim of this paper is to present a new modeling and computational paradigm to advance deep learning in mathematical physics. To this end, as mentioned earlier, this paper mainly addresses two issues. One is the data-driven solution of partial differential equations, and the other is the data-driven discovery of partial differential equations. All the code and datasets used can be found at https://github.com/maziarraissi/PINNs. In this paper, a simple MLP using hyperbolic tangent as the activation function is used without any regularization such as $L_{1}$, $L_{2}$, or dropout, as introduced in the regularization section. The structure of the neural network, optimizer, learning rate, etc., are specifically introduced in each example.

This paper deals with the general form of parameterized and nonlinear partial differential equations as follows:

$$
u_{t} + \mathcal{N}[u; \lambda] = 0,\quad x \in \Omega,\ t \in [0,T]
\tag{1}
$$

Here, $u = u(t,x)$ is the hidden (i.e., not given, unknown) function, the solution of $(1)$ that we seek, and $\mathcal{N}[\cdot\,; \lambda]$ is a nonlinear operator parameterized by $\lambda$, with $\Omega \subset \mathbb{R}^{D}$. Many problems in mathematical physics can be represented in this form. For instance, consider the one-dimensional viscous Burgers' equation:

$$
u_{t} + uu_{x} = \nu u_{xx}
$$

This corresponds to the case of $(1)$ with $\mathcal{N}[u; \lambda] = \lambda_{1} uu_{x} - \lambda_{2}u_{xx}$ and $\lambda = (\lambda_{1}, \lambda_{2})$. The two problems addressed for the given equation $(1)$ are as follows:

  • data-driven solution of PDEs: For a fixed $\lambda$, what is the solution $u(t,x)$ of the system?
  • data-driven discovery of PDEs: What are the parameters $\lambda$ that best describe the observed data?

3. Data-driven solutions of partial differential equations

Section 3 discusses the problem of finding data-driven solutions for partial differential equations of the following form:

$$
u_{t} + \mathcal{N}[u] = 0,\quad x \in \Omega,\ t \in [0,T]
\tag{2}
$$

This corresponds to the situation in $(1)$ where the parameter $\lambda$ is fixed. Section 3.1 and Section 3.2 cover continuous time models and discrete time models, respectively. The problem of finding the equations themselves will be addressed in Section 4. The meaning of 'data' mentioned here is explained in detail below.

3.1. Continuous time models

Assuming $(t,x) \in \mathbb{R} \times \mathbb{R}$, we have $u : \mathbb{R}^{2} \to \mathbb{R}$. This will be approximated by an artificial neural network, a simple MLP that can be implemented as follows. In Julia, it would be:

using Flux

# A simple MLP u(t, x): 2 input nodes (t, x) -> 1 output node u.
# tanh is used instead of relu because the PDE residual needs higher-order
# derivatives of u, and the paper uses the hyperbolic tangent activation.
u = Chain(
    Dense(2, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 1)
    )

In PyTorch, it would be:

import torch
import torch.nn as nn

# Layer sizes: 2 input nodes (t, x) -> two hidden layers of 10 -> 1 output node u
layers = [2, 10, 10, 1]

class network(nn.Module):
    def __init__(self):
        super().__init__()
        layer_list = [nn.Linear(layers[i], layers[i+1]) for i in range(len(layers)-1)]
        self.linears = nn.ModuleList(layer_list)

    def forward(self, tx):
        # tx: tensor of shape (N, 2) holding the coordinates (t, x)
        u = tx

        # hidden layers with a smooth activation (the paper uses tanh;
        # relu would make second derivatives of u vanish)
        for i in range(len(layers)-2):
            u = self.linears[i](u)
            u = torch.tanh(u)

        # linear output layer
        u = self.linears[-1](u)

        return u

u = network()

Now, $u$ is the artificial neural network we've defined, with $2$ input nodes and $1$ output node. Let's define the left-hand side of $(2)$ as a function $f = f(t,x; u)$ as follows:

$$
f := u_{t} + \mathcal{N}[u]
\tag{3}
$$

Since $u$ is an artificial neural network, $f$ is also a kind of artificial neural network with hidden-layer parameters. The $f$ defined in this way is called a physics-informed neural network (PINN), which is, in essence, the given partial differential equation itself. The derivatives appearing in $f$ are computed through automatic differentiation, and $f$ shares the same parameters as $u$. If the artificial neural network $u$ accurately approximates the solution of $(2)$, the values of $f$ should be zero everywhere. We can infer that we will train the artificial neural network in a direction where $f \to 0$.
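
To make this concrete, here is a minimal PyTorch sketch of how the residual $f$ could be evaluated with automatic differentiation. This is not the authors' implementation (their code is at the GitHub repository linked above); Burgers' equation is used as the operator purely for illustration, and the helper name pde_residual and the viscosity argument nu are assumptions.

```python
import torch

def pde_residual(u_net, t, x, nu=0.01):
    """Hypothetical helper: evaluate f = u_t + u*u_x - nu*u_xx at (t, x),
    taking Burgers' equation as the example operator N. All derivatives are
    obtained by automatic differentiation through the network u_net."""
    t = t.clone().requires_grad_(True)   # (N, 1) time coordinates
    x = x.clone().requires_grad_(True)   # (N, 1) space coordinates
    u = u_net(torch.cat([t, x], dim=1))  # (N, 1) network output u(t, x)

    ones = torch.ones_like(u)
    # first derivatives with respect to the input coordinates
    u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
    # second derivative with respect to x
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)[0]

    return u_t + u * u_x - nu * u_xx
```

Because second derivatives of $u$ with respect to $x$ are needed here, the network should use a smooth activation such as $\tanh$ rather than ReLU.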

Let $(t_{u}^{i}, x_{u}^{i})$ be points in the domain where the initial and boundary conditions are defined:

$$
(t_{u}^{i}, x_{u}^{i}) \in (\Omega \times \left\{ 0 \right\}) \cup (\partial \Omega \times [0, T])
$$

If $u_{\ast}$ is the actual solution, having initial and boundary conditions means that the following values are given:

$$
\left\{ t_{u}^{i}, x_{u}^{i}, u^{i} \right\}_{i=1}^{N_{u}},\quad u^{i} = u_{\ast} (t_{u}^{i}, x_{u}^{i})
$$

Theoretically, there would be infinitely many such values, but in a numerical problem we can only handle a finite number of points, so let's say we have $N_{u}$ of them. The artificial neural network $u$ should output $u^{i}$ when given $(t_{u}^{i}, x_{u}^{i})$ as input, making these pairs the inputs and corresponding labels:

$$
\text{input} = (t_{u}^{i}, x_{u}^{i}),\qquad \text{label} = u^{i}
$$

This is precisely the ‘data’ to be learned in PINN. We can now consider the following as the loss function:

$$
MSE_{u} = \dfrac{1}{N_{u}} \sum\limits_{i=1}^{N_{u}} \left| u(t_{u}^{i},x_{u}^{i}) - u^{i} \right|^{2}
$$

Additionally, $f$ should satisfy $(2)$ at appropriate points (ideally at all points where the solution $u_{\ast}$ is defined, but numerically we can only handle a finite number of them) $\left\{ t_{f}^{i}, x_{f}^{i} \right\}_{i=1}^{N_{f}}$. In the paper, these points are referred to as collocation points. We set the following as the loss function for the collocation points:

$$
MSE_{f} = \dfrac{1}{N_{f}}\sum\limits_{i=1}^{N_{f}} \left| f(t_{f}^{i}, x_{f}^{i}) \right|^{2}
$$

In other words, $MSE_{f}$ getting closer to $0$ means satisfying the physics information (the partial differential equation). Therefore, the final loss function for training the artificial neural network $u$ is as follows:

$$
MSE = MSE_{u} + MSE_{f}
$$
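
As a rough sketch of how the two terms might be combined in a training loop (again, not the authors' code), assume the network u and the pde_residual helper from the sketches above, along with tensors t_u, x_u, u_label for the $N_{u}$ data points and t_f, x_f for the $N_{f}$ collocation points. The choice of Adam and the learning rate are illustrative assumptions; the paper specifies the optimizer separately for each example.

```python
import torch

# assumed to exist: u (the network), pde_residual (sketch above),
# t_u, x_u, u_label : (N_u, 1) tensors for the initial/boundary data,
# t_f, x_f          : (N_f, 1) tensors of collocation points
optimizer = torch.optim.Adam(u.parameters(), lr=1e-3)

def train_step():
    optimizer.zero_grad()

    # MSE_u: supervised loss on the initial/boundary data
    u_pred = u(torch.cat([t_u, x_u], dim=1))
    mse_u = torch.mean((u_pred - u_label) ** 2)

    # MSE_f: the PDE residual should vanish at the collocation points
    f_pred = pde_residual(u, t_f, x_f)
    mse_f = torch.mean(f_pred ** 2)

    loss = mse_u + mse_f
    loss.backward()
    optimizer.step()
    return loss.item()
```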

The paper explains that using $MSE_{f}$ as a constraint imposing the physics information, as done here, was first studied in [15, 16]. In the PINN paper, however, the idea is revisited with modern computational tools and applied to more challenging dynamical systems.

The term physics-informed machine learning was first used in Wang's study [17] on turbulence modeling. However, prior to PINN, such studies simply employed machine learning algorithms like support vector machines, random forests, and FNNs. PINN is distinguished from these previous approaches by considering not only the derivatives with respect to the parameters, as is common in machine learning, but also the derivatives with respect to the coordinates $t, x$ of the solution. That is, if the solution approximated by an artificial neural network with parameters $w$ is denoted $u(t,x; w)$, previously proposed methods only utilized the partial derivatives $u_{w}$, whereas PINN also uses $u_{t}$, $u_{x}$, and so on to find the solution. The paper explains that this approach allows the solution to be found well even with a small amount of data.

Despite the fact that there is no theoretical guarantee that this procedure converges to a global minimum, our empirical evidence indicates that, if the given partial differential equation is well-posed and its solution is unique, our method is capable of achieving good prediction accuracy given a sufficiently expressive neural network architecture and a sufficient number of collocation points $N_{f}$.

The paper notes that although there is no theoretical guarantee for the convergence of the proposed method, empirical evidence suggests that if the given partial differential equation is well-posed and has a unique solution, and if there are a sufficient number of points, then high prediction accuracy can be achieved.

3.1.1. Example (Schrödinger equation)

This example focuses on verifying the effectiveness of the proposed method for solutions with periodic boundary conditions and complex values. As an example, the Schrödinger equation with the following initial and boundary conditions is considered:

$$
\begin{aligned}
ih_{t} + 0.5h_{xx} + \left| h \right|^{2}h &= 0,\quad x\in [-5, 5],\ t\in[0, \pi/2], \\
h(0,x) &= 2\operatorname{sech} (x), \\
h(t,-5) &= h(t,5), \\
h_{x}(t,-5) &= h_{x}(t,5)
\end{aligned}
$$

The solution of the problem, $h_{\ast}(t,x)$, is a complex-valued function, namely $h_{\ast} : [0, \pi/2] \times [-5, 5] \to \mathbb{C}$. However, instead of defining an artificial neural network that outputs complex numbers, we define one that outputs a 2-dimensional vector consisting of the real part $u(t,x)$ and the imaginary part $v(t,x)$. In simple terms, it is defined as an MLP with 2 input nodes and 2 output nodes:

$$
h(t,x) = \begin{bmatrix} u(t,x) \\[0.5em] v(t,x) \end{bmatrix}
$$

In this problem, the PINN $f$ is defined as:

$$
f := ih_{t} + 0.5h_{xx} + \left| h \right|^{2} h
$$

The parameters of $h(t,x)$ and $f(t,x)$ are trained to minimize the sum of the loss for the initial values $MSE_{0}$, the loss for the boundary values $MSE_{b}$, and the loss for the physics information $MSE_{f}$:

$$
MSE = MSE_{0} + MSE_{b} + MSE_{f}
$$

$$
\begin{aligned}
\text{where } MSE_{0} &= \dfrac{1}{N_{0}}\sum_{i=1}^{N_{0}} \left| h(0, x_{0}^{i}) - h_{0}^{i} \right|^{2} \qquad (h_{0}^{i} = 2\operatorname{sech} (x_{0}^{i})) \\
MSE_{b} &= \dfrac{1}{N_{b}}\sum_{i=1}^{N_{b}} \left( \left| h(t_{b}^{i}, -5) - h(t_{b}^{i}, 5) \right|^{2} + \left| h_{x}(t_{b}^{i},-5) - h_{x}(t_{b}^{i},5) \right|^{2} \right) \\
MSE_{f} &= \dfrac{1}{N_{f}} \sum\limits_{i=1}^{N_{f}} \left| f(t_{f}^{i}, x_{f}^{i}) \right|^{2}
\end{aligned}
$$

  • Be aware that there is a typo in the formula for $MSE_{b}$ in the paper.

Here, $\left\{ x_{0}^{i}, h_{0}^{i} \right\}_{i=1}^{N_{0}}$ are the initial value data, $\left\{ t_{b}^{i} \right\}_{i=1}^{N_{b}}$ are the collocation points on the boundary, and $\left\{ t_{f}^{i}, x_{f}^{i} \right\}_{i=1}^{N_{f}}$ are the collocation points for $f$.
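
The following is a hedged sketch of how $MSE_{0}$ and $MSE_{b}$ might be computed in this example. The network name h_net (mapping $(t, x)$ to the pair $(u, v)$, as in the architecture described below) and the argument names are assumptions; the squared complex modulus is handled by summing the squared real and imaginary parts, and the boundary derivative $h_{x}$ is obtained by automatic differentiation.

```python
import torch

def schrodinger_data_losses(h_net, x0, h0, t_b):
    """Sketch: MSE_0 and MSE_b for the Schrödinger example.
    h_net : network mapping (t, x) -> (u, v) = (Re h, Im h)
    x0    : (N_0, 1) initial points, h0 : (N_0, 1) values 2*sech(x0) (real)
    t_b   : (N_b, 1) boundary collocation times"""
    # MSE_0: match the (real-valued) initial condition h(0, x) = 2 sech(x)
    uv0 = h_net(torch.cat([torch.zeros_like(x0), x0], dim=1))
    mse_0 = torch.mean((uv0[:, 0:1] - h0) ** 2 + uv0[:, 1:2] ** 2)

    def h_and_hx(t, x):
        # evaluate h = (u, v) and its spatial derivative h_x at (t, x)
        x = x.clone().requires_grad_(True)
        uv = h_net(torch.cat([t, x], dim=1))
        ones = torch.ones_like(uv[:, 0:1])
        u_x = torch.autograd.grad(uv[:, 0:1], x, grad_outputs=ones, create_graph=True)[0]
        v_x = torch.autograd.grad(uv[:, 1:2], x, grad_outputs=ones, create_graph=True)[0]
        return uv, torch.cat([u_x, v_x], dim=1)

    # MSE_b: periodicity of h and h_x at x = -5 and x = +5
    h_lo, hx_lo = h_and_hx(t_b, -5.0 * torch.ones_like(t_b))
    h_hi, hx_hi = h_and_hx(t_b, 5.0 * torch.ones_like(t_b))
    mse_b = torch.mean(torch.sum((h_lo - h_hi) ** 2 + (hx_lo - hx_hi) ** 2, dim=1))

    return mse_0, mse_b
```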

For data generation, traditional spectral methods were used. A total of $N_{0} = 50$ initial data points and $N_{b} = 50$ boundary collocation points were sampled randomly, and the number of collocation points for $f$ is $N_{f} = 20{,}000$. The artificial neural network was constructed by stacking 5 linear layers, each with 100 nodes, with the hyperbolic tangent $\tanh$ used as the activation function between layers.
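
For instance, a network of the shape just described could be written as follows (a sketch following the description above, not the authors' code): 2 input nodes $(t, x)$, 2 output nodes $(u, v)$, five hidden layers of 100 units, and $\tanh$ activations.

```python
import torch.nn as nn

# 2 inputs (t, x) -> 2 outputs (u, v): real and imaginary parts of h
h_net = nn.Sequential(
    nn.Linear(2, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 2),
)
```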


Figure 1.

In Figure 1, the upper image shows the heatmap of the magnitude of the predicted solution $\left| h(t, x) \right|$. The lower images show how well the predicted solution matches the exact solution at times $t = 0.59, 0.79, 0.98$, respectively. The relative $L_{2}$ error is $0.00197 = 1.97 \cdot 10^{-3}$, which means the predicted solution differs from the exact solution by only about $0.2\%$. Therefore, PINN can accurately capture the nonlinear behavior of the Schrödinger equation even with a small amount of initial data.

The continuous time model discussed so far works well even with few initial values, but it has a potential limitation in that a large number of collocation points $N_{f}$ is needed. This is not a significant issue when the spatial dimension is 2 or less, but in higher dimensions the required number of collocation points can grow exponentially, which can be problematic. Therefore, the next section presents a more structured neural network that does not require many collocation points, utilizing the classical Runge–Kutta time-stepping schemes.

3.2. Discrete time models

In Section 3.1, we approximated the solution over continuous time. In that case, the artificial neural network is trained simultaneously over the entire domain, providing an output for any arbitrary point $(x,t)$. In this section, unlike Section 3.1, we deal with discrete time. In other words, we describe how to approximate the value at $t_{n+1}$ using an artificial neural network, given the value at $t_{n}$. Applying a $q$-stage Runge-Kutta method to $(2)$ yields the following:

$$
u(t_{n+1}, x) = u(t_{n}, x) - \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u(t_{n}+c_{j} \Delta t, x) \right]
$$

If we denote $u^{n}(x) = u(t_{n}, x)$ and $u^{n+c_{j}}(x) = u(t_{n} + c_{j}\Delta t, x)$, then:

$$
\begin{aligned}
u^{n+1} &= u^{n} - \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u^{n+c_{j}}\right] \\
\text{where } u^{n+c_{j}} &= u^{n} - \Delta t \sum_{i=1}^{q} a_{j,i}\mathcal{N}\left[ u^{n+c_{i}}\right], \quad j=1,\dots,q
\end{aligned}
\tag{7}
$$

In the $q+1$ equations above, let's move all the $\sum$ terms on the right-hand side to the left-hand side, and denote each resulting left-hand side by $u_{j}^{n}$:

$$
\begin{aligned}
u_{q+1}^{n} &:= u^{n+1} + \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u^{n+c_{j}}\right] = u^{n} \\
u_{1}^{n} &:= u^{n+c_{1}} + \Delta t \sum_{i=1}^{q} a_{1,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n} \\
u_{2}^{n} &:= u^{n+c_{2}} + \Delta t \sum_{i=1}^{q} a_{2,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n} \\
&\ \ \vdots \\
u_{q}^{n} &:= u^{n+c_{q}} + \Delta t \sum_{i=1}^{q} a_{q,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n}
\end{aligned}
\tag{9}
$$

From this, we can see that all of these values should be equal to $u^{n}$:

$$
u^{n} = u_{1}^{n} = u_{2}^{n} = \cdots = u_{q+1}^{n}
\tag{8}
$$

Therefore, the physics information mentioned in Section 3.2 refers to the given initial and boundary conditions together with $(8)$. Now, to compute $u(t_{n+1}, x)$, we define two artificial neural networks. In Section 3.1 the two networks were $u$, which is expected to converge to the exact solution $u_{\ast}$, and $f$, the differential equation that $u$ must satisfy; here it is slightly different. First, let's define the artificial neural network $U$ as the following function:

$$
U : \mathbb{R} \to \mathbb{R}^{q+1}
$$

That is, it is a neural network with $1$ input node and $q+1$ output nodes. Let's assume the output of this network is as follows:

$$
U(x) = \begin{bmatrix} u^{n+c_{1}}(x) \\[0.5em] u^{n+c_{2}}(x) \\ \vdots \\[0.5em] u^{n+c_{q}}(x) \\[0.5em] u^{n+1}(x) \end{bmatrix}
\tag{10}
$$

This network corresponds to the neural_net defined within the PhysicsInformedNN class in the attached code.

In the learning process below, the last component of the output of $U$ is expected to converge to $u(t_{n+1}, x)$. The second neural network is built from the output of $U$ and the definitions in $(7)$ and $(9)$: using automatic differentiation to evaluate $\mathcal{N}$, it maps $x$ to the $q+1$ values $u_{1}^{n}(x), \dots, u_{q+1}^{n}(x)$, all of which should equal $u^{n}(x)$.
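
A minimal sketch of how the $q+1$ quantities in $(9)$ might be assembled from the output of $U$ is given below. The names are assumptions, not the authors' implementation: A and b hold the Runge–Kutta coefficients $a_{j,i}$ and $b_{j}$, dt is $\Delta t$, and N_op is some routine that applies $\mathcal{N}$ to each stage (using automatic differentiation with respect to x where spatial derivatives are needed).

```python
import torch

def stage_values(U_net, x, A, b, dt, N_op):
    """Sketch of (9): from the q+1 outputs of U at x, return the q+1
    quantities u_j^n(x), all of which should equal u^n(x).
    A : (q, q) tensor of Runge-Kutta coefficients a_{j,i}
    b : (q,) tensor of weights b_j
    N_op(stages, x) : returns N[u^{n+c_i}] for each stage, shape (N, q)"""
    x = x.clone().requires_grad_(True)            # so N_op can differentiate w.r.t. x
    out = U_net(x)                                # (N, q+1)
    u_stages, u_next = out[:, :-1], out[:, -1:]   # u^{n+c_j} (N, q) and u^{n+1} (N, 1)

    N_stages = N_op(u_stages, x)                  # (N, q)

    # u_j^n = u^{n+c_j} + dt * sum_i a_{j,i} N[u^{n+c_i}],  j = 1, ..., q
    u_j = u_stages + dt * N_stages @ A.T          # (N, q)
    # u_{q+1}^n = u^{n+1} + dt * sum_j b_j N[u^{n+c_j}]
    u_q1 = u_next + dt * N_stages @ b.view(-1, 1) # (N, 1)

    return torch.cat([u_j, u_q1], dim=1)          # (N, q+1)
```

The loss $SSE_{n}$ introduced in the next example then simply measures how far these $q+1$ columns are from the known values $u^{n}(x^{n,i})$.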

3.2.1. Example (Allen–Cahn equation)

The example for the discrete time model deals with the Allen-Cahn equation, given the following initial condition and periodic boundary conditions:

$$
\begin{aligned}
&u_{t} - 0.0001u_{xx} + 5 u^{3} - 5u = 0,\qquad x\in [-1, 1],\ t\in[0, 1], \\
&u(0,x) = x^{2} \cos (\pi x), \\
&u(t,-1) = u(t,1), \\
&u_{x}(t,-1) = u_{x}(t,1)
\end{aligned}
\tag{12}
$$

In this example, the nonlinear operator appearing in $(9)$ is:

$$
\mathcal{N}[u^{n+c_{j}}] = -0.0001u_{xx}^{n+c_{j}} + 5(u^{n+c_{j}})^{3} - 5u^{n+c_{j}}
$$

Let's denote the values of $u$ at time step $t^{n}$ by $u^{n,i}$:

$$
u^{n,i} = u^{n}(x^{n,i}) = u(t^{n}, x^{n,i}),\qquad i=1,\dots,N_{n}
$$

Since our problem is to compute $u^{n+1}$ given $u^{n}$, the set $\left\{ x^{n,i}, u^{n,i} \right\}_{i=1}^{N_{n}}$ is our given dataset. According to $(8)$, the following must hold on this dataset:

$$
u^{n,i} = u_{1}^{n}(x^{n,i}) = \cdots = u_{q+1}^{n}(x^{n,i})
$$

So, let's set up the following loss function for this, the sum of squared errors (SSE):

  • It is unclear why $MSE$ is not used here and $SSE$ is used instead for the discrete time model. The paper uses $MSE$ for the continuous time models and $SSE$ for the discrete time models, which suggests there may be a reason (even if only an experimental one).

$$
SSE_{n} = \sum\limits_{j=1}^{q+1} \sum\limits_{i=1}^{N_{n}} \left| u_{j}^{n} (x^{n,i}) - u^{n,i} \right|^{2}
$$

Each $u_{j}^{n}$ is computed according to $(9)$, with the quantities $u^{n+1}$ and $u^{n+c_{j}}$ involved in the computation being the outputs of the neural network $U$. This loss corresponds to net_U0 defined within the PhysicsInformedNN class in the attached code. Since the output of $U$ must also satisfy the boundary conditions of $(12)$, we set the following loss function:

$$
\begin{aligned}
SSE_{b} &= \sum\limits_{i=1}^{q} \left| u^{n+c_{i}}(-1) - u^{n+c_{i}}(1) \right|^{2} + \left| u^{n+1}(-1) - u^{n+1}(1) \right|^{2} \\
&\quad+ \sum\limits_{i=1}^{q} \left| u_{x}^{n+c_{i}}(-1) - u_{x}^{n+c_{i}}(1) \right|^{2} + \left| u_{x}^{n+1}(-1) - u_{x}^{n+1}(1) \right|^{2}
\end{aligned}
$$

The final loss is the sum of these two:

$$
SSE = SSE_{n} + SSE_{b}
$$


Figure 2.

In Fig. 2, the upper image shows the heatmap of the exact solution. The lower images show the predicted values at $t=0.9$, given $u$ at $t=0.1$. In the lower left image, the blue line represents the exact solution, and the $\color{red}\mathsf{X}$ marks indicate the points used as data. In the lower right image, the blue line is the exact solution, and the red line is the predicted solution.

In implicit Runge-Kutta (IRK) methods, computing $u^{n+c_{j}}$ requires solving a system of equations for all $j$, meaning that the computational cost increases significantly as $q$ increases. However, the paper explains that the proposed method does not incur much additional cost even as $q$ increases. It also explains that while IRK may not be able to make accurate predictions with a large time step $\Delta t$ when $q$ is small, PINN can still make accurate predictions even with a large $\Delta t$.

4. Data-driven discovery of partial differential equations

This chapter deals with the problem of finding the parameters $\lambda$ of the partial differential equation $(1)$ when observational data are available. The details are explained below with examples.

4.1. Continuous time models

Let's define $f$ as the left-hand side of $(1)$:

$$
f = u_{t} + \mathcal{N}[u; \lambda]
$$

The difference from $(3)$ in Section 3 is that $\lambda$ is no longer a fixed constant but an unknown parameter that needs to be learned.

4.1.1. Example (Navier–Stokes equation)

Section 4.1.1 introduces an example related to real data of an incompressible fluid described by the Navier-Stokes equation. Consider the following 2-dimensional Navier-Stokes equation:

$$
\begin{aligned}
u_{t} + \lambda_{1}(uu_{x} + vu_{y}) &= -p_{x} + \lambda_{2}(u_{xx} + u_{yy}) \\
v_{t} + \lambda_{1}(uv_{x} + vv_{y}) &= -p_{y} + \lambda_{2}(v_{xx} + v_{yy})
\end{aligned}
\tag{15}
$$

Here, $u(t,x,y)$ is the $x$ component of the fluid's velocity field, $v(t,x,y)$ is the $y$ component, $p(t,x,y)$ is the pressure, and $\lambda = (\lambda_{1}, \lambda_{2})$ are unknown parameters. The solution of the Navier-Stokes equations satisfies the condition that the divergence is $0$, hence the following holds:

$$
u_{x} + v_{y} = 0
\tag{17}
$$

Let's assume some latent function $\psi (t, x, y)$ such that:

$$
u = \psi_{y},\quad v = -\psi_{x}
$$

In other words, the fluid's velocity field is set as $\begin{bmatrix} \psi_{y} & -\psi_{x}\end{bmatrix}$. This automatically satisfies $(17)$, since $u_{x} + v_{y} = \psi_{yx} - \psi_{xy} = 0$. Instead of obtaining $u$ and $v$ individually, we approximate $\psi$ with an artificial neural network and derive $u, v$ as its partial derivatives. Let's assume that the following measured information is available for the actual velocity field:

$$
\left\{ t^{i}, x^{i}, y^{i}, u^{i}, v^{i} \right\}_{i=1}^{N}
$$

From this, remembering that $u = \psi_{y}$ and $v = -\psi_{x}$, we set the loss function as follows:

$$
\dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| u(t^{i}, x^{i}, y^{i}) - u^{i} \right|^{2} + \left| v(t^{i}, x^{i}, y^{i}) - v^{i} \right|^{2} \right)
$$

Now let's move the terms on the right-hand sides of $(15)$ to the left-hand side and define the results as $f$ and $g$, respectively:

$$
\begin{aligned}
f &:= u_{t} + \lambda_{1}(uu_{x} + vu_{y}) + p_{x} - \lambda_{2}(u_{xx} + u_{yy}) \\
g &:= v_{t} + \lambda_{1}(uv_{x} + vv_{y}) + p_{y} - \lambda_{2}(v_{xx} + v_{yy})
\end{aligned}
\tag{18}
$$

Then the values of $f, g$ are expressed in terms of $\psi$ as follows (note that $p$ will also be approximated by the neural network):

$$
\begin{aligned}
f &= \psi_{yt} + \lambda_{1}(\psi_{y} \psi_{yx} - \psi_{x}\psi_{yy}) + p_{x} -\lambda_{2}(\psi_{yxx} + \psi_{yyy}) \\
g &= -\psi_{xt} + \lambda_{1}(-\psi_{y} \psi_{xx} + \psi_{x}\psi_{xy}) + p_{y} + \lambda_{2}(\psi_{xxx} + \psi_{xyy})
\end{aligned}
$$
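
As an illustration of how $u$, $v$ (and, by differentiating further, the terms in $f$ and $g$) might be obtained from the network, here is a sketch that recovers the velocity components from $\psi$ by automatic differentiation. The name psi_p_net, a network with 3 inputs $(t, x, y)$ and 2 outputs $(\psi, p)$ matching the network defined just below, is an assumption.

```python
import torch

def velocity_from_psi(psi_p_net, t, x, y):
    """Sketch: recover u = psi_y and v = -psi_x (and the pressure p)
    by differentiating the network output psi with respect to the inputs."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    y = y.clone().requires_grad_(True)

    psi_p = psi_p_net(torch.cat([t, x, y], dim=1))   # (N, 2)
    psi, p = psi_p[:, 0:1], psi_p[:, 1:2]

    ones = torch.ones_like(psi)
    psi_x = torch.autograd.grad(psi, x, grad_outputs=ones, create_graph=True)[0]
    psi_y = torch.autograd.grad(psi, y, grad_outputs=ones, create_graph=True)[0]

    u, v = psi_y, -psi_x   # divergence-free by construction
    return u, v, p
```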

Adding the condition that $f(t^{i}, x^{i}, y^{i}) = 0 = g(t^{i}, x^{i}, y^{i})$ to the loss function, we finally set it as follows:

$$
\begin{aligned}
MSE &:= \dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| u(t^{i}, x^{i}, y^{i}) - u^{i} \right|^{2} + \left| v(t^{i}, x^{i}, y^{i}) - v^{i} \right|^{2} \right) \\
&\qquad + \dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| f(t^{i}, x^{i}, y^{i}) \right|^{2} + \left| g(t^{i}, x^{i}, y^{i}) \right|^{2} \right)
\end{aligned}
\tag{19}
$$

Now let's define an artificial neural network with $3$ input nodes and $2$ output nodes, and let its output be $\begin{bmatrix} \psi (t, x, y) & p(t, x, y) \end{bmatrix}$. Then the loss function above can be computed.
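
Since $\lambda_{1}$ and $\lambda_{2}$ are unknown, they must be trained together with the network weights. One way to sketch this in PyTorch (not the authors' code; the layer sizes here are illustrative) is to register them as additional trainable parameters:

```python
import torch
import torch.nn as nn

class NavierStokesPINN(nn.Module):
    """Sketch: the network outputs (psi, p); lambda1 and lambda2 are the
    unknown PDE parameters, learned jointly with the network weights."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(          # 3 inputs (t, x, y) -> 2 outputs (psi, p)
            nn.Linear(3, 20), nn.Tanh(),
            nn.Linear(20, 20), nn.Tanh(),
            nn.Linear(20, 2),
        )
        self.lambda1 = nn.Parameter(torch.tensor(0.0))
        self.lambda2 = nn.Parameter(torch.tensor(0.0))

    def forward(self, txy):
        return self.net(txy)   # columns: psi(t, x, y), p(t, x, y)

model = NavierStokesPINN()
# model.parameters() now includes lambda1 and lambda2, so minimizing the
# loss (19) with any gradient-based optimizer updates the PDE parameters
# along with the network weights.
```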

Experiments were conducted for cases with and without noise in the data, and in both cases it was reported that $\lambda_{1}, \lambda_{2}$ could be predicted with high accuracy. It was also demonstrated that even if data for the pressure $p$ was not provided, the neural network could accurately approximate both the parameters and $p$. The specific experimental settings, results, and how the reference solutions were obtained are detailed in the paper.

5. Conclusions

In this paper, the authors introduced the physics-informed neural network, a new class of neural networks capable of encoding the physical laws, describable by partial differential equations, that the given data satisfy. This result shows that deep learning can learn physical models, which could be applied to various physical simulations.

However, the authors note that the proposed method should not be considered a replacement for traditional methods of solving partial differential equations, such as the finite element method or spectral methods. In fact, Runge-Kutta methods were utilized in conjunction with PINN in Section 3.2.

The authors also attempted to address questions about the hyperparameters required to implement PINN, such as how deep the neural network should be and how much data is needed. However, they observed that what is effective for one equation might not be effective for another.