Paper Review: DeepONet
Overview and Summary
- References, equation numbers, and notation follow the paper as closely as possible.
For accessibility, this review is based on the version available on arXiv rather than the published journal version. The problems covered in the experimental section differ slightly between the two versions, but the core focus here is not on the experimental results and performance but on the explanation of the DeepONet method itself.
DeepONet is a deep learning technique proposed for learning operators. An operator is a function that maps functions to functions (explained in detail in the main text). Specifically, for a function $u$, an operator $G$ is defined as follows.

$$G : u \mapsto G(u)$$

Here, $u$ is a function and $G(u)$ is a function as well. The first key point is that "DeepONet learns operators," and the second is that it "approximates $G(u)$ as a series." Given an appropriate function space $V$, call its basis $\{t_k\}$. Then $G(u)$ can be expressed as follows.

$$G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)$$

DeepONet learns the $b_k$ and the $t_k$: the part that learns the coefficients $b_k$ is called the branch network, and the part that learns the basis $t_k$ is called the trunk network.
Implementation
- Implementing with PyTorch
- Implementing with Julia
1 Introduction
The universal approximation theorem guarantees that neural networks can approximate arbitrary continuous functions. This provides a theoretical basis for the effectiveness of artificial neural networks and deep learning techniques, which have been successful across various fields. Even more surprisingly, neural networks can also approximate nonlinear functionals and (nonlinear) operators.
For readers unfamiliar with the mathematics, let's briefly explain functions, functionals, and operators. All three are fundamentally functions (mapping each element of the domain to exactly one element of the codomain), but in contexts where the terms functional and operator are used, they carry a slightly more specific meaning. Typically, a function maps numbers (or vectors) to numbers (or vectors); polynomial functions, trigonometric functions, and other commonly encountered functions fall into this category.
A function that maps functions to numbers (scalars) is specifically called a functional. A concrete example is the definite integral: if we define a functional as $F[u] = \int_0^1 u(x)\, dx$, then for each given $u$, this functional maps $u$ to the area under the curve of $u$ over the interval $[0, 1]$. If $V$ is an appropriate function space, a functional can be expressed as follows.

$$F : V \to \mathbb{R}$$
Operators map functions to functions. Examples include indefinite integrals and derivatives.
For a function $u$, defining an operator as $G(u) = \dfrac{du}{dx}$ makes $G$ a differential operator that maps a given function to its derivative, while defining an operator as $G(u)(y) = \int_0^y u(\tau)\, d\tau$ maps a given function to its indefinite integral.
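To make the distinction concrete, here is a small numerical illustration (my own example, not from the paper): a functional collapses a function to a single number, while an operator returns another function, here represented by its values on a grid.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0.0, 1.0, 101)       # grid on [0, 1]
u = np.sin(2.0 * np.pi * x)          # an example input function u(x)

# Functional F[u] = ∫_0^1 u(x) dx  ->  a single real number
F_u = np.trapz(u, x)

# Operator G : u ↦ s with s(y) = ∫_0^y u(τ) dτ  ->  again a function,
# represented here by its values on the same grid
G_u = cumulative_trapezoid(u, x, initial=0.0)

print(F_u)        # scalar
print(G_u.shape)  # (101,) — samples of the function G(u)
```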
Now, the terms function, functional, and operator used below carry the meanings given above. Before delving into the main discussion, let's introduce the notation used throughout the paper. $G$ denotes an operator whose variable is the function $u$. Since $G$ is an operator, its value $G(u)$ is also a function, and the variable of $G(u)$ is denoted $y$. Therefore, both $u(x)$ and $G(u)(y)$ are real numbers.
The goal of this paper is to learn operators, and to that end the authors consider a neural network that takes both $u$ and $y$ as inputs and outputs $G(u)(y)$.

Theoretically, the operator $G$ takes the function $u$ itself as its variable, but computer simulation requires discretization, so a finite number of function values $u(x_1), u(x_2), \ldots, u(x_m)$ are used as the network inputs. The locations $x_1, x_2, \ldots, x_m$ are referred to as sensors in the paper. Thus, the proposed neural network has the following structure (Fig. 1A).
Figure 1A
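As a hedged sketch of what this means in code (the sensor locations and example function below are my own choices), one training sample consists of the $m$ sensor values of $u$ together with a query point $y$; the target $G(u)(y)$ comes from a numerical solver.

```python
import numpy as np

m = 100
sensors = np.linspace(0.0, 1.0, m)    # fixed sensor locations x_1, ..., x_m

def u(x):                             # an example input function
    return np.cos(2.0 * np.pi * x)

u_at_sensors = u(sensors)             # [u(x_1), ..., u(x_m)], the function as the network sees it
y = 0.37                              # a query point for the output function
# target = G(u)(y) is computed separately (e.g. with a Runge-Kutta or finite difference solver)
sample_input = (u_at_sensors, y)
```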
Theorem 1 (Universal Approximation Theorem for Operator) Let $\sigma$ be a continuous non-polynomial function, let $X$ be a Banach space, and let $K_1 \subset X$ and $K_2 \subset \mathbb{R}^d$ be compact sets. Let $V \subset C(K_1)$ be a compact set, and let $G : V \to C(K_2)$ be a nonlinear continuous operator.

Then, for any $\epsilon > 0$, there exist positive integers $n$, $p$, $m$ and constants $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k \in \mathbb{R}$, $w_k \in \mathbb{R}^d$, $x_j \in K_1$ ($i = 1, \ldots, n$, $k = 1, \ldots, p$, $j = 1, \ldots, m$) such that the following holds for all $u \in V$ and $y \in K_2$.

$$\left| G(u)(y) - \sum_{k=1}^{p} \underbrace{\sum_{i=1}^{n} c_i^k\, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right)}_{\text{branch}} \underbrace{\sigma\!\left( w_k \cdot y + \zeta_k \right)}_{\text{trunk}} \right| < \epsilon$$
In this paper, the above approximation is divided into two parts, called branch and trunk.
Although the approximation theorem suggests that neural networks can learn nonlinear operators, it does not say how to train them effectively. Similarly, even though the universal approximation theorem implies that an MLP should be able to approximate any continuous function, CNNs and other architectures perform better on image-related tasks. A useful network should be easy to train and generalize well, and the authors aim to propose a new methodology that makes this possible for operator learning.
To demonstrate that the proposed method is suitable for learning nonlinear operators, the authors impose only very weak constraints on the data: the input functions must share the same sensors. These sensors need not lie on a uniform grid, and there are no constraints on the variable $y$. This condition is illustrated in Fig. 1B.
Figure 1B
The authors name the proposed architecture DeepONet (Deep Operator Network), which is composed of a branch net for the input function ($u$) and a trunk net for the variable of the output function ($y$). Details are elaborated in Section 2.
The paper considers two types of operators represented by ordinary differential equations (ODE) and partial differential equations (PDE).
2 Methodology
2.1 Deep operator networks (DeepONets)
The authors focus on operator learning in general settings and impose only the constraint that the input functions $u$ share the same sensors. The inputs to the proposed neural network are divided into two parts, as seen in Fig. 1A: $[u(x_1), u(x_2), \ldots, u(x_m)]^T$ and $y$. There are no restrictions on the network architecture; the paper uses basic fully-connected neural networks (FNNs) to showcase performance, and notes that CNNs, RNNs, or attention mechanisms could be integrated if desired.
First, the trunk network takes $y$ as input and outputs $[t_1, t_2, \ldots, t_p]^T$. Each of the $p$ branch networks takes $[u(x_1), u(x_2), \ldots, u(x_m)]^T$ as input and outputs a scalar $b_k$ ($k = 1, 2, \ldots, p$). These are combined as in the following equation.

$$G(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), u(x_2), \ldots, u(x_m)\big)\, t_k(y)$$
It is worth noting that the activation function is applied even in the last layer of the trunk net, i.e., $t_k = \sigma(\cdot)$. Although not explicitly evident in these equations, this approach can be viewed as approximating the function $G(u)$ as a series: given an appropriate function space $V$ with basis $\{t_k\}$, it can be expressed as follows.

$$G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)$$
In other words, interpreting $t_k(y)$ as the basis functions and $b_k$ as the coefficients of the series, DeepONet approximates $G(u)$ by decomposing it into a series instead of approximating it directly. Although Theorem 1 does not require it, adding a bias (constant term) $b_0$ as shown below improves generalization performance.

$$G(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), \ldots, u(x_m)\big)\, t_k(y) + b_0$$
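The following PyTorch sketch shows how the equation above can be realized in the unstacked form; the layer sizes, activation choices, and class name are my own assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, m: int, y_dim: int = 1, p: int = 10, width: int = 40):
        super().__init__()
        # Branch net: [u(x_1), ..., u(x_m)] -> coefficients b_1, ..., b_p
        self.branch = nn.Sequential(
            nn.Linear(m, width), nn.ReLU(),
            nn.Linear(width, p),
        )
        # Trunk net: y -> basis values t_1(y), ..., t_p(y); note that the
        # activation is applied after the last layer as well.
        self.trunk = nn.Sequential(
            nn.Linear(y_dim, width), nn.ReLU(),
            nn.Linear(width, p), nn.ReLU(),
        )
        self.b0 = nn.Parameter(torch.zeros(1))   # the extra bias term b_0

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b = self.branch(u_sensors)                            # (batch, p)
        t = self.trunk(y)                                     # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.b0    # ≈ G(u)(y)
```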
In practice, $p$ should be at least on the order of 10, but a larger $p$ increases the computational cost. The paper therefore introduces the Stacked DeepONet, which employs a separate branch network for each $b_k$ (Fig. 1C), and the Unstacked DeepONet, in which a single branch network learns all of the $b_k$ (Fig. 1D). All code related to DeepONet can be found at https://github.com/lululxvi/deepxde, although it can be challenging to locate the relevant components among the author's other works featured there.
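For contrast with the unstacked sketch above, a stacked branch can be sketched as $p$ independent small networks, each emitting one coefficient $b_k$ (again an illustrative layout, not the reference implementation):

```python
import torch
import torch.nn as nn

class StackedBranch(nn.Module):
    def __init__(self, m: int, p: int = 10, width: int = 40):
        super().__init__()
        # One small network per coefficient b_k, k = 1, ..., p
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(m, width), nn.ReLU(), nn.Linear(width, 1))
            for _ in range(p)
        ])

    def forward(self, u_sensors: torch.Tensor) -> torch.Tensor:
        # Concatenate the p scalar outputs into a coefficient vector of shape (batch, p)
        return torch.cat([net(u_sensors) for net in self.branches], dim=-1)
```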
Figure 1C and 1D
2.2 Data generation
The paper uses two input-function spaces: a Gaussian random field (GRF) and an orthogonal polynomial space. For the former, the authors use a mean-zero GRF,

$$u \sim \mathcal{G}\big(0,\, k_l(x_1, x_2)\big),$$

where $k_l(x_1, x_2) = \exp\!\left(-\|x_1 - x_2\|^2 / (2 l^2)\right)$ is the covariance kernel (an RBF kernel with length-scale parameter $l$). As the orthogonal polynomial space, Chebyshev polynomials are chosen: let $M > 0$ and let $T_i$ denote the Chebyshev polynomials of the first kind; the space of polynomials of degree $N$ is

$$V_{\text{poly}} = \left\{ \sum_{i=0}^{N-1} a_i T_i(x) : |a_i| \le M \right\}.$$

The dataset is generated by randomly sampling $u$ from the chosen space (in the Chebyshev case, by sampling the coefficients $a_i$ from $[-M, M]$). For each sampled $u$, a Runge-Kutta method is used to solve the ODE systems, and a second-order finite difference method is used to obtain reference solutions for the PDEs.
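A sketch of how such input functions could be sampled (the grid, length scale, and degree below are my own choices):

```python
import numpy as np
from numpy.polynomial import chebyshev

def sample_grf(x, length_scale=0.2, n_samples=1):
    # Zero-mean GRF with RBF covariance k_l(x1, x2) = exp(-|x1 - x2|^2 / (2 l^2))
    diff = x[:, None] - x[None, :]
    cov = np.exp(-diff**2 / (2.0 * length_scale**2)) + 1e-10 * np.eye(len(x))
    return np.random.multivariate_normal(np.zeros(len(x)), cov, size=n_samples)

def sample_chebyshev(x, N=10, M=1.0, n_samples=1):
    # u(x) = sum_{i<N} a_i T_i(x) with coefficients a_i drawn uniformly from [-M, M]
    a = np.random.uniform(-M, M, size=(n_samples, N))
    return np.stack([chebyshev.chebval(x, coeffs) for coeffs in a])

x = np.linspace(0.0, 1.0, 100)
u_grf = sample_grf(x, n_samples=5)         # (5, 100): five GRF samples at the sensors
u_cheb = sample_chebyshev(x, n_samples=5)  # (5, 100): five random Chebyshev sums
```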
3 Number of sensors for identifying nonlinear dynamic systems
This section discusses how many sensors $m$ are required to achieve a desired accuracy when identifying nonlinear dynamic systems with DeepONet.
4 Simulation results
In this section, it is first confirmed that DeepONet outperforms an FNN even for the simplest linear problem, followed by results for three nonlinear ODE and PDE problems. For all problems, the optimizer is Adam with learning rate $0.001$, and unless explicitly mentioned otherwise, the network sizes are as shown in the tables below.
Table 1 and 2
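A minimal sketch of the training loop implied by this setup, reusing the DeepONet module sketched in Section 2.1 (the batching and step count are my own simplifications, not the paper's schedule):

```python
import torch

def train(model, u_sensors, y, targets, n_steps=10000, lr=1e-3):
    # u_sensors: (N, m) sensor values, y: (N, 1) query points, targets: (N, 1) values of G(u)(y)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = loss_fn(model(u_sensors, y), targets)
        loss.backward()
        optimizer.step()
    return model

# model = train(DeepONet(m=100), u_sensors, y, targets)   # data assembled as in Section 2.2
```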
4.1 A simple 1D dynamic system
The one-dimensional dynamic system is expressed as follows.

$$\frac{d s(x)}{d x} = g\big(s(x), u(x), x\big), \qquad x \in [0, 1], \qquad s(0) = 0$$

The goal is to predict the solution $s(x)$ on $[0, 1]$ for any given $u(x)$.
4.1.1 Linear case: $g(s(x), u(x), x) = u(x)$
First, let's consider the very simple case $g(s(x), u(x), x) = u(x)$. In this case, the operator $G$ is the following indefinite integral operator.

$$G : u(x) \mapsto s(x) = \int_0^x u(\tau)\, d\tau$$
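Since this linear operator is just a running integral, reference targets can be generated directly by cumulative integration (a sketch with a stand-in input function; the paper uses a Runge-Kutta solver for the general case):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0.0, 1.0, 100)                # sensor locations, also used as query points y
u = np.random.randn(100).cumsum() * 0.1       # stand-in for a sampled input function u
s = cumulative_trapezoid(u, x, initial=0.0)   # s(x) = ∫_0^x u(τ) dτ, so s(0) = 0
```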
For comparison, FNNs of varying depth and width were trained to learn $G$. Increasing the depth does not significantly affect performance, while increasing the width reduces the training error; however, the generalization performance (test error) does not improve (Fig. 2).
Figure 2
In contrast, DeepONet shows little difference between training and test errors (Fig. 3A). Performance improves slightly when the bias $b_0$ is added. Moreover, the Unstacked DeepONet, despite having larger training errors, achieves lower test errors, which matter more. It is also faster and uses considerably less memory because it has fewer parameters.
Figure 3
4.1.2 Nonlinear case: $g(s(x), u(x), x) = -s^2(x) + u(x)$
In this case, the focus shifts to comparing the Unstacked and Stacked DeepONets. The correlation between training and test errors shows that the Unstacked DeepONet generalizes better (Fig. 4A), and it displays an even stronger correlation when tested with different learning rates and initializations (Fig. 4B).
Figure 4
4.2 Gravity pendulum with an external force
This subsection addresses the motion of a pendulum subject to an external force, described by the following system.

$$\frac{d s_1}{d t} = s_2, \qquad \frac{d s_2}{d t} = -k \sin s_1 + u(t), \qquad s_1(0) = s_2(0) = 0$$

Here $k$ is determined by the acceleration due to gravity and the length of the pendulum.
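Reference trajectories for this system can be produced with a standard Runge-Kutta solver; the sketch below uses SciPy, with an illustrative forcing $u(t)$ and $k = 1$ (both my own choices):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

t_grid = np.linspace(0.0, 1.0, 100)
u_vals = np.sin(2.0 * np.pi * t_grid)              # stand-in for a sampled forcing u(t)
u_interp = interp1d(t_grid, u_vals, fill_value="extrapolate")
k = 1.0

def rhs(t, s):
    s1, s2 = s
    return [s2, -k * np.sin(s1) + u_interp(t)]

sol = solve_ivp(rhs, (0.0, 1.0), y0=[0.0, 0.0], t_eval=t_grid, method="RK45")
s1 = sol.y[0]                                      # target trajectory s_1(t) = G(u)(t)
```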
The rest of the subsection demonstrates how well DeepONet works for this problem, discussing the number of sensors, the convergence of the error, and so on.
4.3 Diffusion-reaction system with a source term
The following diffusion-reaction equation with a source term $u(x)$ is addressed.

$$\frac{\partial s}{\partial t} = D \frac{\partial^2 s}{\partial x^2} + k s^2 + u(x), \qquad x \in (0, 1],\; t \in (0, 1],$$

with zero initial and boundary conditions, where $D$ is the diffusion coefficient and $k$ is the reaction rate.
Unlike the previous examples, $u$ is a function of the 1D variable $x$, whereas $s = G(u)$ is a function of the 2D variable $(x, t)$. It is shown that DeepONet also works well here. The training data for one $u$ consists of triples of the following form.

$$\Big( \big(u(x_1), u(x_2), \ldots, u(x_m)\big),\; (x_j, t_j),\; s(x_j, t_j) \Big), \qquad j = 1, \ldots, P$$

Here $\big(u(x_1), \ldots, u(x_m), x_j, t_j\big)$ is the input to DeepONet and $s(x_j, t_j)$ is the final output; concretely, $\big(u(x_1), \ldots, u(x_m)\big)$ is the input to the branch net and $(x_j, t_j)$ is the input to the trunk net. Such structured data are generated and used for training for each different $u$.
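A sketch of assembling these per-$u$ records (grid sizes and the placeholder solution are my own choices; in practice `s_grid` would come from the finite-difference solver):

```python
import numpy as np

m, P = 100, 100
sensors = np.linspace(0.0, 1.0, m)
u_at_sensors = np.cos(2.0 * np.pi * sensors)     # stand-in for one sampled u at the sensors

xs = np.linspace(0.0, 1.0, 100)
ts = np.linspace(0.0, 1.0, 100)
s_grid = np.random.rand(len(xs), len(ts))        # placeholder for the reference solution s(x, t)

ix = np.random.randint(0, len(xs), size=P)       # P random space-time query points
it = np.random.randint(0, len(ts), size=P)
branch_in = np.tile(u_at_sensors, (P, 1))        # (P, m): the same u for every query point
trunk_in = np.stack([xs[ix], ts[it]], axis=1)    # (P, 2): the 2D variable (x_j, t_j)
targets = s_grid[ix, it]                         # (P,):  s(x_j, t_j)
```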
5 Conclusion
This paper proposes DeepONet, a method for learning nonlinear operators. DeepONet is composed of a branch net, which learns the coefficients, and a trunk net, which learns the basis. The body of the paper analyzes various factors affecting the test error (e.g., the number of sensors, the maximum prediction time, the complexity of the input function space, the size of the training dataset, and the network size), derives theoretically how these factors affect the approximation error, and shows that the experimental results align with the estimates.

However, much remains to be studied regarding the theoretical analysis of DeepONet itself. And while the paper only uses FNNs, future work could explore combinations with CNNs, attention mechanisms, and other neural network architectures and techniques.