
How to Define and Train an MLP with the Sequential Model and the Functional API in TensorFlow and Keras

Overview

In TensorFlow, neural networks can be defined easily using Keras. Below, we introduce how to define and train a simple MLP with Sequential() and with the functional API. Note that Sequential() makes defining a model easy but becomes awkward for complex architectures; likewise, if you plan to design complex structures with the functional API, it is better to subclass the keras.Model class (a minimal sketch follows below), and for even more complex, fully customized designs, implementing things at a lower level without Keras may be preferable. Depending on the deep learning task, these methods may not be the primary choice, especially for researchers in STEM fields who want to integrate deep learning into their own domain; they are mainly about getting a feel for 'this is how it's used' when first learning and practicing deep learning.
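
For reference, here is a minimal sketch of the subclassing style mentioned above, assuming the same 1 → 10 → 10 → 1 MLP used throughout this post; the class and attribute names are illustrative, and this style is not used again below.

from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

# a minimal sketch of the keras.Model subclassing style (illustrative; not used below)
class MLP(Model):
    def __init__(self):
        super().__init__()
        self.hidden1 = Dense(10, activation="relu")
        self.hidden2 = Dense(10, activation="relu")
        self.out_layer = Dense(1)

    def call(self, x):
        return self.out_layer(self.hidden2(self.hidden1(x)))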

Sequential Model

Model Definition

Let’s define an MLP with input and output dimensions of 1 to approximate the sine function $\sin : \mathbb{R} \to \mathbb{R}$ as follows.

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# model definition: a 1 → 10 → 10 → 1 MLP with ReLU hidden layers
model = Sequential([Dense(10, input_dim = 1, activation = "relu"),
                    Dense(10, input_dim = 10, activation = "relu"),  # input_dim is redundant beyond the 1st layer;
                    Dense(1, input_dim = 10)])                       # Keras infers it (see the discussion below)
model.summary() # output↓
# Model: "sequential_3"
# _________________________________________________________________
# Layer (type)                Output Shape              Param #   
# =================================================================
# dense_9 (Dense)             (None, 10)                20        
#                                                                 
# dense_10 (Dense)            (None, 10)                110       
#                                                                 
# dense_11 (Dense)            (None, 1)                 11        
#                                                                 
# =================================================================
# Total params: 141
# Trainable params: 141
# Non-trainable params: 0
# _________________________________________________________________

One feature of keras.layers.Dense() is that specifying the input dimension is optional; Keras infers it once an input shape is known. Why this is allowed is debatable, but for readability (especially in code that others will read), it is better to state the input dimension explicitly. Doing so, however, puts the output dimension on the left and the input dimension on the right, so the model's structure must be read from right to left, which is not the convention in most languages. If one thinks of a linear layer as a matrix, that is, as a linear transformation, then having the input on the right and the output on the left is natural, as in $\mathbf{y} = A\mathbf{x}$. But TensorFlow was hardly designed with that kind of mathematical precision in mind. Even in Julia, a language known for its mathematical rigor, linear layers are written as Dense(in, out), which reads naturally from left to right; that is simply more comfortable and easier to understand. Moreover, a function $f$ from $X$ to $Y$ is written $f : X \to Y$, and nowhere (apart from Keras) is a function described as mapping from right to left.
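
As a sketch of the shape inference just mentioned, the same architecture can be written without any input_dim; the weights are then only created once an input shape is known, e.g. via build() or the first forward pass:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# same architecture without explicit input dims; Keras infers them
model2 = Sequential([Dense(10, activation="relu"),
                     Dense(10, activation="relu"),
                     Dense(1)])
model2.build(input_shape=(None, 1))  # weights are created here
model2.summary()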

Data Generation

Since the target is the sine function, we take its function values as the training data. Comparing the graph of the sine function with the (untrained) model's output looks like this:

# generating data
from math import pi

x = tf.linspace(0., 2*pi, num=1000)[:, None]  # input data, shaped (1000, 1) to match input_dim=1
y = tf.sin(x)                                 # output data (labels)

# check output of model
import matplotlib.pyplot as plt

plt.plot(x, model(x), label="model")
plt.plot(x, y, label="sin")
plt.legend()
plt.show()

Training and Results

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
  • model.compile(optimizer, loss, metrics)

The .compile() method specifies the optimizer and the loss function. Another key option is metrics, a list of functions used to evaluate the model; a metric can be the same as the loss or different from it. For example, when training an MLP on the MNIST dataset, the loss might be the MSE between the output and the label, while the metric could be the fraction of correctly classified samples, i.e. the accuracy.
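
For the sine regression here, one could, for instance, track the mean absolute error as a metric alongside the MSE loss; the metric choice is illustrative:

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='mse',
              metrics=['mae'])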

> model.fit(x, y, epochs=10000, batch_size=1000, verbose='auto')
...
Epoch 9998/10000
1/1 [==============================] - 0s 8ms/step - loss: 6.2260e-06
Epoch 9999/10000
1/1 [==============================] - 0s 4ms/step - loss: 6.2394e-06
Epoch 10000/10000
1/1 [==============================] - 0s 3ms/step - loss: 6.2385e-06
  • The .fit() method takes the inputs, the labels, the number of epochs, the batch size, etc., and runs the training. verbose controls how training progress is printed. The options are 'auto', 0, 1, and 2, where 0 prints nothing (see the sketch after the examples); the others print in the following formats:
# verbose=1
Epoch (current epoch)/(total epochs)
(current batch)/(total batches) [==============================] - 0s 8ms/step - loss: 0.7884

# verbose=2
Epoch (current epoch)/(total epochs)
(current batch)/(total batches) - 0s - loss: 0.7335 - 16ms/epoch - 8ms/step
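
As a sketch of the silent option, with verbose=0 nothing is printed, and the loss can instead be read from the History object that .fit() returns:

history = model.fit(x, y, epochs=100, batch_size=1000, verbose=0)
print(history.history['loss'][-1])  # loss of the last epoch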

After training, comparing the sine function with the model's output shows that the training succeeded.
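
Re-running the plotting code from above draws the comparison:

plt.plot(x, model(x), label="model")
plt.plot(x, y, label="sin")
plt.legend()
plt.show()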

Functional API

This method connects layers directly, using the Input() and Model() functions. For a simple model like an MLP, the Sequential model above is far more convenient; the functional API shines for more complex structures (see the sketch at the end of this section). The same structure as the network defined with the Sequential model above is defined as follows:

from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense

input = Input(shape=(1,)) # shape of one sample = input dim of the 1st layer
dense1 = Dense(10, activation = "relu")(input)
dense2 = Dense(10, activation = "relu")(dense1)
output = Dense(1)(dense2)

model = Model(inputs=input, outputs=output)
model.summary() # output↓
# Model: "model_10"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  input_13 (InputLayer)       [(None, 1)]               0
# 
#  dense_19 (Dense)            (None, 10)                20
# 
#  dense_20 (Dense)            (None, 10)                110
# 
#  dense_21 (Dense)            (None, 1)                 11
# 
# =================================================================
# Total params: 141
# Trainable params: 141
# Non-trainable params: 0
# _________________________________________________________________

Input is a function for defining the input layer. Strictly speaking, what it returns is not a layer but a (symbolic) tensor, but that is a minor detail. A point that can be confusing is that its shape argument describes the shape of the data the input layer emits, which must therefore equal the input dimension of the first layer (here 1). After defining it, connect the layers directly and explicitly by calling each Dense on the previous output, and finally pass the input and output tensors as arguments to the Model function to define the model.

The subsequent process of compiling the model with the .compile() method and training it with the .fit() method is the same as introduced above.
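
For reference, the functional API pays off once the computation graph is no longer a simple chain. Below is a minimal sketch of a skip connection, a structure Sequential() cannot express; the variable names are illustrative:

from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Concatenate

inp = Input(shape=(1,))
h = Dense(10, activation="relu")(inp)
h = Concatenate()([h, inp])  # skip connection: the raw input rejoins the hidden features
out = Dense(1)(h)
skip_model = Model(inputs=inp, outputs=out)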

Environment

  • OS: Windows 11
  • Version: Python 3.9.13, tensorflow==2.12.0, keras==2.12.0