Implementing MLP in Julia Flux and Learning with MNIST

Loading the MNIST Dataset

In older examples, you might see code using Flux.Data, but this is no longer supported in Flux.

julia> Flux.Data.MNIST.images()
┌ Warning: Flux's datasets are deprecated, please use the package MLDatasets.jl

The official documentation advises using the `MLDatasets.jl` package.

julia> using Flux

julia> using MLDatasets

julia> imgs = MLDatasets.MNIST.traintensor()
28×28×60000 reinterpret(FixedPointNumbers.N0f8, ::Array{UInt8, 3})

julia> labs = MLDatasets.MNIST.trainlabels()
60000-element Vector{Int64}

We can’t use the images and labels as they are. The images need their datatype changed to 32-bit floating-point, and the labels need to be one-hot encoded.

julia> X = float.(imgs)
28×28×60000 Array{Float32, 3}

julia> Y = Flux.onehotbatch(labs, 0:9)
10×60000 OneHotMatrix(::Vector{UInt32}) with eltype Bool

Now, let’s bundle the data and labels together to create a training set. There are two ways to do this:

  1. repeated((X, Y), n)
julia> using Base.Iterators: repeated

julia> Train_Data = repeated((X,Y), 1)

This repeats (X, Y) n times, so setting n=10 and training on the resulting dataset is equivalent to training for 10 epochs (see the sketch after this list).

  2. Flux.DataLoader((X, Y); batchsize=1, shuffle=false, partial=true, rng=GLOBAL_RNG)
julia> Train_Data = Flux.DataLoader((X,Y), batchsize=60000)

Given that we’re training on 60,000 data points at once, the batch size is set to 60,000.
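
For reference, here is a minimal sketch of both options with different settings; the values below (10 repetitions, batch size 128) are arbitrary choices for illustration and are not used in the rest of this post.

# Option 1: repeat the full batch 10 times, i.e. 10 epochs of full-batch training
Train_Data = repeated((X, Y), 10)

# Option 2: shuffled mini-batches of 128 images each
Train_Data = Flux.DataLoader((X, Y), batchsize=128, shuffle=true)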

Implementing MLP

An MLP can be implemented using the Chain() function. Since we did not flatten our data $X$, we need to add Flux.flatten as the first layer.

julia> MLP = Chain(Flux.flatten,
                   Dense(28*28, 28, relu),
                   Dense(28, 10),
                   softmax)
Chain(
  Flux.flatten,
  Dense(784, 28, relu),                 # 21_980 parameters
  Dense(28, 10),                        # 290 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 22_270 parameters, 87.242 KiB.
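
As a quick sanity check, the model can be applied directly to the unflattened data X: Flux.flatten reshapes the 28×28×60000 tensor into a 784×60000 matrix before the first Dense layer, so the output is one 10-element probability column per image. A sketch of what this looks like in the REPL:

julia> size(MLP(X))    # one softmax output column per image
(10, 60000)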

For reference, if you prefer to flatten the data yourself, you can keep the rest the same and modify X and Chain() as follows; there is not much difference in speed.

X = Flux.flatten(float.(imgs))

julia> MLP = Chain(Dense(28*28, 28, relu),
                   Dense(28, 10),
                   softmax)

Defining the Loss Function and Optimizer

Define the loss function as cross-entropy and the optimizer as ADAM.

julia> LOSS(x,y) = Flux.crossentropy(MLP(x), y)
LOSS (generic function with 1 method)

julia> LOSS(X,Y)
2.3614984f0

julia> opt = ADAM()
ADAM(0.001, (0.9, 0.999), IdDict{Any, Any}())
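
As an aside, a common variant in Flux is to drop the softmax layer and use Flux.logitcrossentropy instead, which is numerically more stable. The sketch below only illustrates that alternative; the rest of this post keeps the softmax + crossentropy setup above.

# Sketch only: the model outputs raw logits, and the loss applies softmax internally.
MLP_logits = Chain(Flux.flatten,
                   Dense(28*28, 28, relu),
                   Dense(28, 10))

LOSS_logits(x, y) = Flux.logitcrossentropy(MLP_logits(x), y)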

Training

Use the train!() method to train the model.

julia> Flux.train!(LOSS, params(MLP), Train_Data, opt)

Printing the Training Process

However, this gives no feedback while training. Setting a callback like the following prints the loss at most once per second.

julia> Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
LOSS(X, Y) = 2.2364092f0

Setting Epochs

Use the @epochs macro to set epochs.

julia> using Flux: @epochs

julia>  @epochs 5 Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
[ Info: Epoch 1
LOSS(X, Y) = 2.1822426f0
[ Info: Epoch 2
LOSS(X, Y) = 2.13155f0
[ Info: Epoch 3
LOSS(X, Y) = 2.0831218f0
[ Info: Epoch 4
LOSS(X, Y) = 2.0359747f0
[ Info: Epoch 5
LOSS(X, Y) = 1.9894044f0
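
The @epochs macro simply repeats the training call and logs the epoch number, so an explicit loop like the following sketch is equivalent:

# Equivalent to `@epochs 5 Flux.train!(...)`
for epoch in 1:5
    @info "Epoch $epoch"
    Flux.train!(LOSS, params(MLP), Train_Data, opt,
                cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
end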

Measuring Training Time

Use the @time macro to measure training time.

julia> @epochs 5 @time Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
[ Info: Epoch 1
LOSS(X, Y) = 1.9431362f0
  0.366819 seconds (238.95 k allocations: 269.987 MiB, 33.97% compilation time)
[ Info: Epoch 2
LOSS(X, Y) = 1.8971274f0
  0.300713 seconds (418 allocations: 257.373 MiB, 18.77% gc time)
[ Info: Epoch 3
LOSS(X, Y) = 1.8515143f0
  0.247089 seconds (418 allocations: 257.373 MiB)
[ Info: Epoch 4
LOSS(X, Y) = 1.8064859f0
  0.284937 seconds (418 allocations: 257.373 MiB, 13.61% gc time)
[ Info: Epoch 5
LOSS(X, Y) = 1.7620988f0
  0.258424 seconds (418 allocations: 257.373 MiB)

Full Code

using Flux
using MLDatasets

using Base.Iterators: repeated
using Flux: @epochs

# Loading MNIST data set
imgs = MLDatasets.MNIST.traintensor()
labs = MLDatasets.MNIST.trainlabels()

# Convert to usable types: Float32 images, one-hot encoded labels
X = float.(imgs)
Y = Flux.onehotbatch(labs, 0:9)

# Making training set
Train_Data = Flux.DataLoader((X,Y), batchsize=60000)
# or Train_Data = repeated((X,Y), 1)

# Define MLP, loss function, optimizer
MLP = Chain(Flux.flatten,
            Dense(28*28, 28, relu),
            Dense(28, 10),
            softmax)

LOSS(x,y) = Flux.crossentropy(MLP(x), y)

opt = ADAM()

# Training
@epochs 5 @time Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 5))

Environment

  • OS: Windows 10
  • Version: Julia 1.7.1, Flux 0.12.8, MLDatasets 0.5.14