Paper Review: Physics-Informed Neural Networks
Overview
- The notation and numbering of references and formulas follow the conventions of the original paper.
Physics-informed neural networks (referred to as PINN) are artificial neural networks designed to numerically solve differential equations, introduced in the 2018 Journal of Computational Physics paper Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. The authors of the paper are M. Raissi, P. Perdikaris, and G.E. Karniadakis from the departments of Applied Mathematics and Mechanical Engineering.
The physics information mentioned in this paper, although it may sound grandiose, simply refers to the given differential equations themselves. In other words, in this context ‘using physics information’ just means using the given differential equation itself when solving it with an artificial neural network. When reading machine learning papers, one should be careful not to be swayed by such impressive-sounding terminology.
The reason PINN is receiving significant attention in the numerical solution of differential equations is likely due to the simplicity and ease of understanding of the idea behind the loss function, as well as its straightforward implementation. In fact, the paper introduces a very simple DNN as an example.
Commonly, the model introduced in Section 3.1 is referred to as PINN.
0. Abstract
The authors describe PINN as ‘an artificial neural network trained to solve supervised learning problems while satisfying a given nonlinear partial differential equation’. The two main issues addressed in this paper are the ‘data-driven solution and data-driven discovery of partial differential equations’. To evaluate performance, problems in fluid mechanics, quantum mechanics, and diffusion equations were solved.
1. Introduction
Although recent advances in machine learning and data analysis have led to innovative results in scientific fields such as image recognition, cognitive science, and genomics, it remains challenging to obtain the desired results for complex physical, biological, and engineering systems, where only limited information is available due to the high cost of data collection. In such a small data regime, the convergence of advanced architectures like DNNs, CNNs, and RNNs is not guaranteed.
Studies on methods to learn physics information efficiently (i.e., solve differential equations with minimal data) were conducted in [4-6]. The extension to nonlinear problems was proposed in subsequent studies by Raissi, one of the authors of this paper, in [8,9].
2. Problem setup
The function represented by an artificial neural network is determined by its input values (coordinates $x, t$ of the solution $u$ in a partial differential equation) and parameters. Automatic differentiation is utilized to differentiate these two types of variables.
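Both kinds of derivatives are readily available in modern frameworks. The following minimal PyTorch sketch (my own illustration, not code from the paper) differentiates a toy network once with respect to its input coordinates $(t, x)$ and once with respect to its parameters:

```python
import torch
import torch.nn as nn

# toy network net(t, x): 2 input nodes -> 1 output node
net = nn.Sequential(nn.Linear(2, 10), nn.Tanh(), nn.Linear(10, 1))

tx = torch.rand(5, 2, requires_grad=True)   # 5 sample coordinates (t, x)
out = net(tx)

# (i) derivatives with respect to the input coordinates (t, x)
du_dtx = torch.autograd.grad(out.sum(), tx, create_graph=True)[0]
u_t, u_x = du_dtx[:, 0], du_dtx[:, 1]

# (ii) derivatives with respect to the network parameters, as in ordinary training
loss = (out ** 2).mean()
param_grads = torch.autograd.grad(loss, list(net.parameters()))
```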
Such neural networks are constrained to respect any symmetries, invariances, or conservation principles originating from the physical laws that govern the observed data, as modeled by general time-dependent and nonlinear partial differential equations.
This sentence from the paper might seem complex, but simply put, it means that the proposed artificial neural network, PINN, must satisfy the given differential equations. This is because the condition of satisfying the differential equations is used as a loss function, as will be discussed later.
The aim of this paper is to present a new modeling and computational paradigm to advance deep learning in mathematical physics. To this end, as mentioned earlier, this paper mainly addresses two issues. One is the data-driven solution of partial differential equations, and the other is the data-driven discovery of partial differential equations. All the codes and datasets used can be found at https://github.com/maziarraissi/PINNs. In this paper, a simple MLP using hyperbolic tangent as the activation function is used without any regularization such as $L1$, $L2$, or dropout, as introduced in the regularization section. The structure of the neural network, optimizer, learning rate, etc., are specifically introduced in each example.
This paper deals with the general form of parameterized and nonlinear partial differential equations as follows:
$$ \begin{equation} u_{t} + \mathcal{N}[u; \lambda] = 0,\quad x \in \Omega,\quad t \in [0,T] \end{equation} $$
Here, $u=u(t,x)$ is the hidden (i.e., not given or unknown) function, the solution of $(1)$ that we seek, and $\mathcal{N}[\cdot; \lambda]$ is a nonlinear operator parameterized by $\lambda$, with $\Omega \subset \mathbb{R}^{D}$. Many problems in mathematical physics can be represented in this form. For instance, consider the one-dimensional viscous Burgers’ equation:
$$ u_{t} + uu_{x} = \nu u_{xx} $$
This corresponds to the case in $(1)$ where $\mathcal{N}[u; \lambda] = \lambda_{1} uu_{x} - \lambda_{2}u_{xx}$ and $\lambda = (\lambda_{1}, \lambda_{2})$. The two problems addressed for the given equation $(1)$ are as follows:
- data-driven solution of PDEs: For a fixed $\lambda$, what is the solution $u(t,x)$ of the system?
- data-driven discovery of PDEs: What are the parameters $\lambda$ that best describe the observed data?
3. Data-driven solutions of partial differential equations
Section 3 discusses the problem of finding data-driven solutions for partial differential equations of the following form:
$$ \begin{equation} u_{t} + \mathcal{N}[u] = 0,\quad x \in \Omega,\quad t \in [0,T] \end{equation} $$
This corresponds to the situation in $(1)$ where the parameter $\lambda$ is fixed. Section 3.1 and Section 3.2 will cover continuous time models and discrete time models respectively. The problem of finding the equations will be addressed in Section 4. The meaning of ‘data’ mentioned here will be explained in detail below.
3.1. Continuous time models
Assuming $(t,x) \in \mathbb{R} \times \mathbb{R}$, then $u : \mathbb{R}^{2} \to \mathbb{R}$. This will be approximated using an artificial neural network, employing a simple MLP implemented as follows. In Julia, it would be:
```julia
using Flux

u = Chain(
    Dense(2, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 1)
)
```
In PyTorch, it would be:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layers = [2, 10, 10, 1]  # 2 input nodes (t, x), two hidden layers, 1 output node u(t, x)

class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        layer_list = [nn.Linear(layers[i], layers[i+1]) for i in range(len(layers)-1)]
        self.linears = nn.ModuleList(layer_list)

    def forward(self, tx):
        u = tx
        for i in range(len(layers)-2):
            u = self.linears[i](u)
            u = F.relu(u)
        u = self.linears[-1](u)
        return u

u = network()
```
Now, $u$ represents the artificial neural network we’ve defined, with $2$ input nodes and $1$ output node. Let’s define the left-hand side of $(2)$ as a function $f = f(t,x; u)$ as follows:
$$ \begin{equation} f := u_{t} + \mathcal{N}[u] \end{equation} $$
Since $u$ is an artificial neural network, $f$ also becomes a sort of artificial neural network with hidden layer parameters. The $f$ defined in this way is called a physics-informed neural network (PINN), which is, in essence, the given partial differential equation itself. The differentiation included in $f$ is implemented through automatic differentiation and shares the same parameters as $u$. If the artificial neural network $u$ accurately approximates the solution to $(2)$, the function values of $f$ should be zero everywhere. We can infer that we will train the artificial neural network in a direction where $ f \to 0$.
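For concreteness, here is what $f$ could look like for the Burgers’ equation $u_{t} + uu_{x} = \nu u_{xx}$ with a fixed viscosity, written as a PyTorch sketch on top of the network `u` defined above (my own illustration, not the authors’ code; the value of `nu` is arbitrary):

```python
import torch

nu = 0.01  # fixed viscosity; an arbitrary value for this sketch

def f(tx, u):
    """Residual f = u_t + u*u_x - nu*u_xx of the viscous Burgers' equation."""
    tx = tx.clone().detach().requires_grad_(True)
    out = u(tx)                                                    # u(t, x)
    grad = torch.autograd.grad(out.sum(), tx, create_graph=True)[0]
    u_t, u_x = grad[:, 0:1], grad[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), tx, create_graph=True)[0][:, 1:2]
    return u_t + out * u_x - nu * u_xx
```

Note that no new parameters are introduced: $f$ is differentiated through the same weights as $u$.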
Let’s say $(t_{u}^{i}, x_{u}^{i})$ are points in the domain where the initial and boundary conditions are defined: $$ (t_{u}^{i}, x_{u}^{i}) \in( \Omega \times \left\{ 0 \right\}) \cup (\partial \Omega \times [0, T]) $$ If $u_{\ast}$ is the actual solution, having initial and boundary conditions means that the following values are given:
$$ \left\{ t_{u}^{i}, x_{u}^{i}, u^{i} \right\}_{i=1}^{N_{u}},\quad u^{i} = u_{\ast} (t_{u}^{i}, x_{u}^{i}) $$
Theoretically, we would have an infinite number of such values, but in numerical problems, we can only handle a finite number of points, so let’s say we have $N_{u}$ points. The artificial neural network $u$ should output $u^{i}$ when given $(t_{u}^{i}, x_{u}^{i})$ as input, making these pairs the inputs and corresponding labels:
$$ \text{input} = (t_{u}^{i}, x_{u}^{i}),\qquad \text{label} = u^{i} $$
This is precisely the ‘data’ to be learned in PINN. We can now consider the following as the loss function:
$$ MSE_{u} = \dfrac{1}{N_{u}} \sum\limits_{i=1}^{N_{u}} \left| u(t_{u}^{i},x_{u}^{i}) - u^{i} \right|^{2} $$
Additionally, $f$ should satisfy $(2)$ at appropriate points (ideally at all points where the solution $u_{\ast}$ is defined, but numerically we can only handle a finite number of points) $\left\{ t_{f}^{i}, x_{f}^{i} \right\}_{i=1}^{N_{f}}$. In the paper, these points are referred to as collocation points. We set the following as the loss function for the collocation points:
$$ MSE_{f} = \dfrac{1}{N_{f}}\sum\limits_{i=1}^{N_{f}} \left| f(t_{f}^{i}, x_{f}^{i}) \right|^{2} $$
In other words, $MSE_{f}$ getting closer to $0$ means satisfying the physical information (the partial differential equation). Therefore, the final loss function for training the artificial neural network $u$ is as follows:
$$ MSE = MSE_{u} + MSE_{f} $$
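Putting the pieces together, a single training step might look like the following sketch (my own simplification, reusing the `u` and `f` defined above; the paper itself relies on optimizers such as L-BFGS and Adam depending on the example). Here `tx_u`, `u_label` are the initial/boundary data and `tx_f` the collocation points:

```python
import torch

optimizer = torch.optim.Adam(u.parameters(), lr=1e-3)

def train_step(tx_u, u_label, tx_f):
    optimizer.zero_grad()
    mse_u = torch.mean((u(tx_u) - u_label) ** 2)   # data loss on initial/boundary points
    mse_f = torch.mean(f(tx_f, u) ** 2)            # physics loss on collocation points
    loss = mse_u + mse_f
    loss.backward()
    optimizer.step()
    return loss.item()
```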
The paper explains that using $MSE_{f}$ as a constraint for physical information, as done here, was first researched in [15, 16]. However, in the PINN paper, it was reviewed using modern computational tools and applied to more challenging dynamic systems.
The term physics-informed machine learning was first used in Wang’s study [17] on turbulence modeling. However, prior to PINN, studies simply employed machine learning algorithms like support vector machines, random forests, and FNNs. PINN is distinguished from these previous approaches by considering not only the derivatives with respect to the parameters commonly used in machine learning
but also the derivatives with respect to the coordinates $x, t$ of the solution. That is, if the solution approximated by an artificial neural network with parameter $w$ is denoted as $u(t,x; w)$, while previously proposed methods only utilized the partial derivatives $u_{w}$, PINN also uses $u_{t}$, $u_{x}$, etc., to find the solution. It explains that this approach allows for finding the solution well even with a small amount of data.
Despite the fact that there is no theoretical guarantee that this procedure converges to a global minimum, our empirical evidence indicates that, if the given partial differential equation is well-posed and its solution is unique, our method is capable of achieving good prediction accuracy given a sufficiently expressive neural network architecture and a sufficient number of collocation points $N_{f}$.
The paper notes that although there is no theoretical guarantee for the convergence of the proposed method, empirical evidence suggests that if the given partial differential equation is well-posed and has a unique solution, and if there are a sufficient number of points, then high prediction accuracy can be achieved.
3.1.1. Example (Schrödinger equation)
This example focuses on verifying the effectiveness of the proposed method for solutions with periodic boundary conditions and complex values. As an example, the Schrödinger equation with the following initial and boundary conditions is considered:
$$ \begin{align*} ih_{t} + 0.5h_{xx} + \left| h \right|^{2}h &= 0,\quad x\in [-5, 5], t\in[0, \pi/2], \\ h(0,x) &= 2\operatorname{sech} (x), \\ h(t,-5) &= h(t,5), \\ h_{x}(t,-5) &= h_{x}(t,5) \end{align*} $$
The solution to the problem, $h_{\ast}(t,x)$, is a function with complex-valued function outputs, namely $h_{\ast} : [0, \pi/2] \times [-5, 5] \to \mathbb{C}$. However, instead of defining an artificial neural network that outputs complex numbers, we define it to output a 2-dimensional vector consisting of $u(t,x)$ representing the real part and $v(t,x)$ representing the imaginary part. In simple terms, it is defined as an MLP with 2 input nodes and 2 output nodes:
$$ h(t,x) = \begin{bmatrix} u(t,x) \\[0.5em] v(t,x) \end{bmatrix} $$
In this problem, the PINN $f$ is defined as:
$$ f := ih_{t} + 0.5h_{xx} + \left| h \right|^{2} h $$
The parameters of $h(t,x)$ and $f(t,x)$ are trained to minimize the loss for initial values $MSE_{0}$, the loss for boundary values $MSE_{b}$, and the loss for physical information $MSE_{f}$.
$$ MSE = MSE_{0} + MSE_{b} + MSE_{f} $$
$$ \begin{align*} \text{where } MSE_{0} &= \dfrac{1}{N_{0}}\sum_{i=1}^{N_{0}} \left| h(0, x_{0}^{i}) - h_{0}^{i} \right|^{2} \qquad (h_{0}^{i} = 2\operatorname{sech} (x_{0}^{i})) \\ MSE_{b} &= \dfrac{1}{N_{b}}\sum_{i=1}^{N_{b}} \left( \left| h(t_{b}^{i}, -5) - h(t_{b}^{i}, 5) \right|^{2} + \left| h_{x}(t_{b}^{i},-5) - h_{x}(t_{b}^{i},5) \right|^{2} \right) \\ MSE_{f} &= \dfrac{1}{N_{f}} \sum\limits_{i=1}^{N_{f}} \left| f(t_{f}^{i}, x_{f}^{i}) \right|^{2} \end{align*} $$
- Be aware that there is a typo in the formula for $MSE_{b}$ in the paper.
Here, $\left\{ x_{0}^{i}, h_{0}^{i} \right\}_{i=1}^{N_{0}}$ are the initial value data, $\left\{ t_{b}^{i} \right\}_{i=1}^{N_{b}}$ are the collocation points at the boundary, and $\left\{ t_{f}^{i}, x_{f}^{i} \right\}_{i=1}^{N_{f}}$ are the collocation points for $f$.
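As an illustration of how the periodic-boundary term could be evaluated (a sketch of my own, not the released code), with `h` denoting a network that maps $(t, x)$ to the pair $(u, v)$ and `t_b` a column of boundary times:

```python
import torch

def mse_b(h, t_b):
    """Periodic boundary loss: h and h_x must agree at x = -5 and x = +5.
    h maps (t, x) to (u, v) = (real, imaginary); t_b has shape (N_b, 1)."""
    x_l = torch.full_like(t_b, -5.0).requires_grad_(True)
    x_r = torch.full_like(t_b, +5.0).requires_grad_(True)
    h_l = h(torch.cat([t_b, x_l], dim=1))
    h_r = h(torch.cat([t_b, x_r], dim=1))
    # x-derivative of each output component (real and imaginary parts) at both ends
    hx_l = torch.cat([torch.autograd.grad(h_l[:, k].sum(), x_l, create_graph=True)[0]
                      for k in range(2)], dim=1)
    hx_r = torch.cat([torch.autograd.grad(h_r[:, k].sum(), x_r, create_graph=True)[0]
                      for k in range(2)], dim=1)
    return ((h_l - h_r) ** 2).sum(dim=1).mean() + ((hx_l - hx_r) ** 2).sum(dim=1).mean()
```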
For data generation, traditional spectral methods were used. The $N_{0} = 50$ initial data points and $N_{b} = 50$ boundary points were sampled at random, and $N_{f} = 20,000$ collocation points were used for $f$. The artificial neural network was built by stacking 5 linear layers with 100 nodes each, with the hyperbolic tangent $\tanh$ as the activation function between layers.
Figure 1.
In Figure 1, the upper image shows the heatmap of the predicted solution $\left| h(t, x) \right|$. The lower images show how well the predicted solution matches the exact solution at times $t = 0.59, 0.79, 0.98$, respectively. The relative $L_{2}$ error is $0.00197 = 1.97 \cdot 10^{-3}$, meaning the predicted solution differs from the exact solution by only about $0.2\%$. Therefore, PINN can accurately capture the nonlinear behavior of the Schrödinger equation even with a small amount of initial data.
The continuous time model being discussed works well even with a few initial values but has a potential limitation in that a large number of collocation points $N_{f}$ are needed. This is not a significant issue when the spatial dimension is 2 or less, but in higher dimensions, the required number of collocation points can increase exponentially, which can be problematic. Therefore, in the next section, a more structured neural network that does not require many collocation points is presented, utilizing the classical Runge–Kutta time-stepping schemes.
3.2. Discrete time models
In Section 3.1, we approximated the solution over continuous time. In that case, the artificial neural network is trained simultaneously over the entire domain, providing an output for any arbitrary point $(x,t)$. In this section, unlike Section 3.1, we deal with discrete time. In other words, we will describe how to approximate the value at $t_{n+1}$ using an artificial neural network, given the value at $t_{n}$. Applying a $q$-stage Runge-Kutta method to $(2)$ yields the following: $$ u(t_{n+1}, x) = u(t_{n}, x) - \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u(t_{n}+c_{j} \Delta t, x) \right] $$
If we denote $u^{n}(x) = u(t_{n}, x)$ and $u^{n+c_{j}}(x) = u(t_{n} + c_{j}\Delta t, x)$, then:
$$ \begin{equation} \begin{aligned} u^{n+1} &= u^{n} - \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u^{n+c_{j}}\right] \\ \text{where } u^{n+c_{j}} &= u^{n} - \Delta t \sum_{i=1}^{q} a_{j,i}\mathcal{N}\left[ u^{n+c_{i}}\right] \quad j=1,\dots,q \end{aligned}\tag{7} \end{equation} $$
In the $q+1$ equations above, let’s move all the $\sum$ terms on the right-hand side over to the left-hand side, and denote the resulting left-hand sides as $u_{j}^{n}$, $j = 1, \dots, q+1$.
$$ \begin{equation} \begin{aligned} u_{q+1}^{n} &:= u^{n+1} + \Delta t \sum_{j=1}^{q} b_{j}\mathcal{N}\left[ u^{n+c_{j}}\right] = u^{n} \\ \\ u_{1}^{n} &:= u^{n+c_{1}} + \Delta t \sum_{i=1}^{q} a_{1,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n} \\ u_{2}^{n} &:= u^{n+c_{2}} + \Delta t \sum_{i=1}^{q} a_{2,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n} \\ &\vdots \\ u_{q}^{n} &:= u^{n+c_{q}} + \Delta t \sum_{i=1}^{q} a_{q,i}\mathcal{N}\left[ u^{n+c_{i}}\right] = u^{n} \end{aligned}\tag{9} \end{equation} $$
From this, we can see that all these values should be equal to $u^{n}$.
$$ u^{n} = u_{1}^{n} = u_{2}^{n} = \cdots = u_{q+1}^{n} \tag{8} $$
Therefore, the physics information mentioned in Section 3.2 consists of the given initial and boundary conditions together with $(8)$. Now, to compute $u(t_{n+1}, x)$, we define two artificial neural networks. In Section 3.1 these were the network $u$, expected to converge to the exact solution $u_{\ast}$, and the residual network $f$ encoding the differential equation that $u$ must satisfy; here the construction is slightly different. First, let’s define the artificial neural network $U$ as the following function:
$$ U : \mathbb{R} \to \mathbb{R}^{q+1} $$
That is, it’s a neural network with $1$ input node and $q+1$ output nodes. Let’s assume the output of this network is as follows:
$$ U(x) = \begin{bmatrix} u^{n+c_{1}}(x) \\[0.5em] u^{n+c_{2}}(x) \\ \vdots \\[0.5em] u^{n+c_{q}}(x) \\[0.5em] u^{n+1}(x) \end{bmatrix} \tag{10} $$
This network corresponds to the neural_net defined within the PhysicsInformedNN class in the attached code.
In the learning process below, the last component of the output of $U$ is expected to converge to $u(t_{n+1}, x)$. The second neural network is built on top of $U$: it takes the output $(10)$ and applies the relations in $(9)$, producing the vector $\begin{bmatrix} u_{1}^{n}(x) & \cdots & u_{q+1}^{n}(x) \end{bmatrix}$, every component of which should equal the data $u^{n}(x)$ by $(8)$.
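To make the construction concrete, here is a minimal PyTorch sketch of this second network (my own illustration, not the released code). It assumes the Butcher tableau of the $q$-stage IRK scheme is available as tensors `a` (shape $q \times q$) and `b` (length $q$), and that a helper `calN` evaluates the nonlinear operator $\mathcal{N}[\cdot]$ for one stage via automatic differentiation:

```python
import torch

def physics_net(U, x, a, b, dt, calN):
    """Apply equation (9) to the stage outputs of U: returns the vector
    [u_1^n(x), ..., u_{q+1}^n(x)], every entry of which should equal the data u^n(x).
    a: (q, q) IRK matrix, b: (q,) IRK weights, calN(stage, x) evaluates N[.] for one stage."""
    x = x.clone().detach().requires_grad_(True)
    stages = U(x)                                   # shape (N, q+1): u^{n+c_1..q} and u^{n+1}
    q = stages.shape[1] - 1
    # N[u^{n+c_i}] for each intermediate stage, stacked into shape (N, q)
    N_stages = torch.cat([calN(stages[:, i:i + 1], x) for i in range(q)], dim=1)
    u_j = stages[:, :q] + dt * N_stages @ a.T                  # u_j^n, j = 1, ..., q
    u_last = stages[:, q:] + dt * N_stages @ b.view(-1, 1)     # u_{q+1}^n
    return torch.cat([u_j, u_last], dim=1)
```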
3.2.1. Example (Allen–Cahn equation)
The example for the discrete time model deals with the Allen-Cahn equation, given the following initial condition and periodic boundary conditions:
$$ \begin{equation} \begin{aligned} &u_{t} - 0.0001u_{xx} + 5 u^{3} - 5u = 0,\qquad x\in [-1, 1], t\in[0, 1], \\ &u(0,x) = x^{2} \cos (\pi x), \\ &u(t,-1) = u(t,1), \\ &u_{x}(t,-1) = u_{x}(t,1) \end{aligned}\tag{12} \end{equation} $$
In this example, the nonlinear operator included in $(9)$ is as follows:
$$ \mathcal{N}[u^{n+c_{j}}] = -0.0001u_{xx}^{n+c_{j}} + 5(u^{n+c_{j}})^{3} - 5u^{n+c_{j}} $$
Let’s denote the value of $u$ at time step $t^{n}$ as $u^{n,i}$:
$$ u^{n,i} = u^{n}(x^{n,i}) = u(t^{n}, x^{n,i}),\qquad i=1,\dots,N_{n} $$
Since our problem is to compute $u^{n+1}$ given $u^{n}$, $\left\{ x^{n,i}, u^{n,i} \right\}_{i=1}^{N_{n}}$ is our given dataset. According to $(8)$, the following must hold for this dataset:
$$ u^{n,i} = u_{1}^{n}(x^{n,i}) = \cdots = u_{q+1}^{n}(x^{n,i}) $$
So, let’s set the following loss function, the sum of squared error (SSE), for this:
- It’s not clear why $SSE$ rather than $MSE$ is used here; the paper consistently uses $MSE$ for the continuous time models and $SSE$ for the discrete time models, which suggests there may be a reason (even if only an experimental one).
$$ SSE_{n} = \sum\limits_{j=1}^{q+1} \sum\limits_{i=1}^{N_{n}} \left| u_{j}^{n} (x^{n,i}) - u^{n,i} \right|^{2} $$
Each $u_{j}^{n}$ is computed according to $(9)$, with the $u^{n+1}$ and $u^{n+c_{j}}$ appearing in those calculations taken from the output of the neural network $U$. This loss corresponds to net_U0 defined within the PhysicsInformedNN class in the attached code. Since the output of $U$ must satisfy the boundary conditions of $(12)$, we set the following loss function:
$$ \begin{align*} SSE_{b} &= \sum\limits_{i=1}^{q} \left| u^{n+c_{i}}(-1) - u^{n+c_{i}}(1) \right|^{2} + \left| u^{n+1}(-1) - u^{n+1}(1) \right|^{2} \\ &\quad+ \sum\limits_{i=1}^{q} \left| u_{x}^{n+c_{i}}(-1) - u_{x}^{n+c_{i}}(1) \right|^{2} + \left| u_{x}^{n+1}(-1) - u_{x}^{n+1}(1) \right|^{2} \\ \end{align*} $$
The final loss is the sum of these two:
$$ SSE = SSE_{n} + SSE_{b} $$
Figure 2.
In Fig. 2, the upper image shows the heatmap of the exact solution. The lower images show the predicted values at $t=0.9$, given $u$ at $t=0.1$. In the lower left image, the blue line represents the exact solution and the $\color{red}\mathsf{X}$ marks indicate the points used as data. In the lower right image, the blue line is the exact solution and the red line is the predicted solution.
In implicit Runge-Kutta (IRK) methods, computing the stages $u^{n+c_{j}}$ requires solving a coupled system of equations over all $j$, so the computational cost grows rapidly as $q$ increases. The paper explains that the proposed method incurs little additional cost as $q$ grows. It also explains that while IRK may fail to make accurate predictions with a large time step $\Delta t$ when $q$ is small, PINN can still make accurate predictions even with a large $\Delta t$.
4. Data-driven discovery of partial differential equations
This chapter deals with the problem of finding the parameters $\lambda$ of the partial differential equation $(1)$ when observational data is available. The details are explained below with examples.
4.1. Continuous time models
Let’s define $f$ as the left-hand side of $(1)$:
$$ f = u_{t} + \mathcal{N}[u; \lambda] $$
The difference from $(3)$ in Section 3 is that $\lambda$ is no longer a fixed constant but an unknown parameter that needs to be learned.
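In code, this simply means registering $\lambda$ as a trainable parameter alongside the network weights. A minimal PyTorch sketch of my own (not the authors’ code), reusing the Burgers’ form $\mathcal{N}[u; \lambda] = \lambda_{1}uu_{x} - \lambda_{2}u_{xx}$ from $(1)$ with a network `u` taking $(t, x)$ as before:

```python
import torch
import torch.nn as nn

# lambda_1 and lambda_2 are trainable parameters, learned jointly with the weights of u
lambdas = nn.Parameter(torch.zeros(2))

def f(tx, u):
    """Residual f = u_t + lambda_1*u*u_x - lambda_2*u_xx (Burgers' form of N[u; lambda])."""
    tx = tx.clone().detach().requires_grad_(True)
    out = u(tx)
    grad = torch.autograd.grad(out.sum(), tx, create_graph=True)[0]
    u_t, u_x = grad[:, 0:1], grad[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), tx, create_graph=True)[0][:, 1:2]
    return u_t + lambdas[0] * out * u_x - lambdas[1] * u_xx

# both the network weights and lambda are handed to the optimizer
# optimizer = torch.optim.Adam(list(u.parameters()) + [lambdas], lr=1e-3)
```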
4.1.1. Example (Navier–Stokes equation)
Section 4.1.1 introduces an example related to real data of an incompressible fluid described by the Navier-Stokes equation. Consider the following 2-dimensional Navier-Stokes equation:
$$ \begin{equation} \begin{aligned} u_{t} + \lambda_{1}(uu_{x} + vu_{y}) &= -p_{x} + \lambda_{2}(u_{xx} + u_{yy}) \\ v_{t} + \lambda_{1}(uv_{x} + vv_{y}) &= -p_{y} + \lambda_{2}(v_{xx} + v_{yy}) \end{aligned} \tag{15} \end{equation} $$
Here, $u(t,x,y)$ and $v(t,x,y)$ are the $x$ and $y$ components of the fluid’s velocity field, $p(t,x,y)$ is the pressure, and $\lambda = (\lambda_{1}, \lambda_{2})$ are unknown parameters. Since the fluid is incompressible, its velocity field is divergence-free, hence the following holds:
$$ \begin{equation} u_{x} + v_{y} = 0 \tag{17} \end{equation} $$
Let’s assume some latent function $\psi (t, x, y)$ such that:
$$ u = \psi_{y},\quad v = -\psi_{x} $$
In other words, the fluid’s velocity vector is set as $\begin{bmatrix} \psi_{y} & -\psi_{x}\end{bmatrix}$. This naturally satisfies $(17)$ since $u_{x} + v_{y} = \psi_{yx} - \psi_{xy} = 0$. Instead of obtaining $u$ and $v$ individually, we approximate $\psi$ with an artificial neural network and derive $u, v$ as its partial derivatives. Let’s assume that the following measured information is available for the actual velocity vector field:
$$ \left\{ t^{i}, x^{i}, y^{i}, u^{i}, v^{i} \right\}_{i=1}^{N} $$
From this, we set the loss function as follows, remembering that $u = \psi_{y}$ and $v = -\psi_{x}$:
$$ \dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| u(t^{i}, x^{i}, y^{i}) - u^{i} \right|^{2} + \left| v(t^{i}, x^{i}, y^{i}) - v^{i} \right|^{2} \right) $$
Now let’s move the right-hand sides of $(15)$ over to the left and define the resulting residuals as $f$ and $g$, respectively:
$$ \begin{equation} \begin{aligned} f &:= u_{t} + \lambda_{1}(uu_{x} + vu_{y}) + p_{x} - \lambda_{2}(u_{xx} + u_{yy}) \\ g &:= v_{t} + \lambda_{1}(uv_{x} + vv_{y}) + p_{y} - \lambda_{2}(v_{xx} + v_{yy}) \end{aligned}\tag{18} \end{equation} $$
Then the values of $f, g$ are expressed with $\psi$ as follows. (Note that $p$ will also be approximated by a neural network)
$$ \begin{align*} f &= \psi_{yt} + \lambda_{1}(\psi_{y} \psi_{yx} - \psi_{x}\psi_{yy}) + p_{x} -\lambda_{2}(\psi_{yxx} + \psi_{yyy}) \\ g &= -\psi_{xt} + \lambda_{1}(-\psi_{y} \psi_{xx} + \psi_{x}\psi_{xy}) + p_{y} + \lambda_{2}(\psi_{xxx} + \psi_{xyy}) \\ \end{align*} $$
Add the information that $f(t^{i}, x^{i}, y^{i}) = 0 = g(t^{i}, x^{i}, y^{i})$ to the loss function, and finally set it as follows:
$$ \begin{aligned} MSE &:= \dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| u(t^{i}, x^{i}, y^{i}) - u^{i} \right|^{2} + \left| v(t^{i}, x^{i}, y^{i}) - v^{i} \right|^{2} \right) \\ &\qquad + \dfrac{1}{N} \sum\limits_{i=1}^{N} \left( \left| f(t^{i}, x^{i}, y^{i}) \right|^{2} + \left| g(t^{i}, x^{i}, y^{i}) \right|^{2} \right) \end{aligned} \tag{19} $$
Now let’s define an artificial neural network with $3$ input nodes and $2$ output nodes. Let’s assume its output to be $\begin{bmatrix} \psi (t, x, y) & p(t, x, y) \end{bmatrix}$. Then, the above loss function can be computed.
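As a rough sketch of how the stream-function trick looks in code (my own illustration; the function name and the input ordering $(t, x, y)$ are assumptions), $u$ and $v$ are recovered from the $(\psi, p)$ network by automatic differentiation:

```python
import torch

def velocity_and_pressure(net, txy):
    """net maps (t, x, y) to (psi, p); recover u = psi_y and v = -psi_x by autodiff."""
    txy = txy.clone().detach().requires_grad_(True)
    out = net(txy)
    psi, p = out[:, 0:1], out[:, 1:2]
    grad_psi = torch.autograd.grad(psi.sum(), txy, create_graph=True)[0]
    u = grad_psi[:, 2:3]     # psi_y  (columns of txy ordered as t, x, y)
    v = -grad_psi[:, 1:2]    # -psi_x
    return u, v, p
```

The residuals $f$ and $g$ in $(18)$ are then assembled by differentiating $u$, $v$, and $p$ once more in the same way.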
Experiments were conducted for cases with and without noise in the data, and in both cases, it was reported that $\lambda_{1}, \lambda_{2}$ could be predicted with high accuracy. It was also demonstrated that even if data for the pressure $p$ was not provided, the neural network could accurately approximate the parameters and $p$. The specific experimental settings, results, and how the reference solutions were obtained are detailed in the paper.
5. Conclusions
In this paper, we introduced the physics-informed neural network, a new class of neural networks capable of encoding the physical laws, expressed as partial differential equations, that govern the given data. This result shows that deep learning can learn physical models, which could be applied to various physical simulations.
However, the authors note that the proposed method should not be viewed as a replacement for traditional methods for solving partial differential equations, such as finite element or spectral methods. In fact, Runge-Kutta methods were used in conjunction with PINN in Section 3.2.
The authors also attempted to address questions about the hyperparameters required to implement PINN, such as how deep the neural network should be and how much data is needed. However, they observed that what is effective for one equation might not be effective for another.