Implementing an MLP in Julia Flux and Training It on MNIST
Loading the MNIST Dataset
In older examples, you might see code using Flux.Data, but this is no longer supported in Flux.
julia> Flux.Data.MNIST.images()
┌ Warning: Flux's datasets are deprecated, please use the package MLDatasets.jl
The official documentation advises using the MLDatasets.jl package.
julia> using Flux
julia> using MLDatasets
julia> imgs = MLDatasets.MNIST.traintensor()
28×28×60000 reinterpret(FixedPointNumbers.N0f8, ::Array{UInt8, 3})
julia> labs = MLDatasets.MNIST.trainlabels()
60000-element Vector{Int64}
We can’t use the images and labels as they are. The images need their datatype changed to 32-bit floating-point, and the labels need to be one-hot encoded.
julia> X = float.(imgs)
28×28×60000 Array{Float32, 3}
julia> Y = Flux.onehotbatch(labs, 0:9)
10×60000 OneHotMatrix(::Vector{UInt32}) with eltype Bool
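As a quick sanity check (a sketch, not a step from the original workflow), Flux.onecold reverses the one-hot encoding and should recover the original label vector:
# onecold maps each one-hot column back to its label in 0:9
Flux.onecold(Y, 0:9) == labs   # expected to return true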
Now, let’s bundle the data and labels together to create a training set. There are two ways to do this:
repeated((X, Y), n)
julia> using Base.Iterators: repeated
julia> Train_Data = repeated((X,Y), 1)
This repeats (X, Y) n times, so creating the training set with n=10 would be equivalent to training for 10 epochs.
Flux.DataLoader((X, Y); batchsize=1, shuffle=false, partial=true, rng=GLOBAL_RNG)
julia> Train_Data = Flux.DataLoader((X,Y), batchsize=60000)
Here the batch size is set to 60,000 so that we train on all 60,000 data points at once.
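For reference, if you wanted mini-batch training instead, a sketch would pass a smaller batch size and enable shuffling; the name Mini_Batches and the value 128 below are purely illustrative and not used elsewhere in this article.
# Illustrative mini-batch setup: 128-sample batches, reshuffled each epoch
Mini_Batches = Flux.DataLoader((X, Y), batchsize=128, shuffle=true)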
Implementing MLP
An MLP can be implemented using the Chain() function. Since we did not flatten our data $X$, we need to add Flux.flatten as the first layer.
julia> MLP = Chain(Flux.flatten,
Dense(28*28, 28, relu),
Dense(28, 10),
softmax)
Chain(
Flux.flatten,
Dense(784, 28, relu), # 21_980 parameters
Dense(28, 10), # 290 parameters
NNlib.softmax,
) # Total: 4 arrays, 22_270 parameters, 87.242 KiB.
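As a quick check (again just a sketch), the model maps the raw 28×28×60000 tensor straight to one probability column per image, thanks to the Flux.flatten layer:
size(MLP(X))        # (10, 60000): one 10-class column per image
sum(MLP(X)[:, 1])   # ≈ 1, because softmax normalizes each column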
For reference, if you want to apply flatten() to the data yourself, you can keep the rest the same and modify X and Chain() as follows. However, there is not much difference in speed.
X = flatten(float.(imgs))
julia> MLP = Chain(Dense(28*28, 28, relu),
Dense(28, 10),
softmax)
Defining the Loss Function and Optimizer
Define the loss function as cross-entropy and the optimizer as ADAM.
julia> LOSS(x,y) = Flux.crossentropy(MLP(x), y)
LOSS (generic function with 1 method)
julia> LOSS(X,Y)
2.3614984f0
julia> opt = ADAM()
ADAM(0.001, (0.9, 0.999), IdDict{Any, Any}())
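As an aside, a common Flux pattern (not used in this article) is to drop softmax from the Chain and use Flux.logitcrossentropy instead, which computes the same loss in a more numerically stable way. A sketch, with the hypothetical names MLP_logits and LOSS_logits:
# Logit-based variant: no softmax layer; the loss applies it internally
MLP_logits = Chain(Flux.flatten,
                   Dense(28*28, 28, relu),
                   Dense(28, 10))
LOSS_logits(x, y) = Flux.logitcrossentropy(MLP_logits(x), y)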
Training
Use the train!() function to train the model.
julia> Flux.train!(LOSS, params(MLP), Train_Data, opt)
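For intuition, the call above roughly corresponds to the following gradient-and-update loop (a sketch of what train!() does, not its actual source):
ps = params(MLP)
for (x, y) in Train_Data
    gs = Flux.gradient(() -> LOSS(x, y), ps)   # gradients of the loss w.r.t. all model parameters
    Flux.Optimise.update!(opt, ps, gs)         # one ADAM update step
end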
Printing the Training Process
However, this gives no feedback while training is running. Setting a callback like the following will print the loss at most once per second.
julia> Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
LOSS(X, Y) = 2.2364092f0
Setting Epochs
Use the @epochs macro to set the number of epochs.
julia> using Flux: @epochs
julia> @epochs 5 Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
[ Info: Epoch 1
LOSS(X, Y) = 2.1822426f0
[ Info: Epoch 2
LOSS(X, Y) = 2.13155f0
[ Info: Epoch 3
LOSS(X, Y) = 2.0831218f0
[ Info: Epoch 4
LOSS(X, Y) = 2.0359747f0
[ Info: Epoch 5
LOSS(X, Y) = 1.9894044f0
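If you prefer not to use the macro, an equivalent plain loop (sketch) looks like this:
for epoch in 1:5
    @info "Epoch $epoch"
    Flux.train!(LOSS, params(MLP), Train_Data, opt,
                cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
end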
Measuring Training Time
Use the @time macro to measure training time.
julia> @epochs 5 @time Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 1))
[ Info: Epoch 1
LOSS(X, Y) = 1.9431362f0
0.366819 seconds (238.95 k allocations: 269.987 MiB, 33.97% compilation time)
[ Info: Epoch 2
LOSS(X, Y) = 1.8971274f0
0.300713 seconds (418 allocations: 257.373 MiB, 18.77% gc time)
[ Info: Epoch 3
LOSS(X, Y) = 1.8515143f0
0.247089 seconds (418 allocations: 257.373 MiB)
[ Info: Epoch 4
LOSS(X, Y) = 1.8064859f0
0.284937 seconds (418 allocations: 257.373 MiB, 13.61% gc time)
[ Info: Epoch 5
LOSS(X, Y) = 1.7620988f0
0.258424 seconds (418 allocations: 257.373 MiB)
Full Code
using Flux
using MLDatasets
using Base.Iterators: repeated
using Flux: @epochs
# Loading MNIST data set
imgs = MLDatasets.MNIST.traintensor()
labs = MLDatasets.MNIST.trainlabels()
# Convert images to Float32 and one-hot encode the labels
X = float.(imgs)
Y = Flux.onehotbatch(labs, 0:9)
# Making training set
Train_Data = Flux.DataLoader((X,Y), batchsize=60000)
# or Train_Data = repeated((X,Y), 1)
# Define MLP, loss function, optimizer
MLP = Chain(Flux.flatten,
Dense(28*28, 28, relu),
Dense(28, 10),
softmax)
LOSS(x,y) = Flux.crossentropy(MLP(x), y)
opt = ADAM()
# Training
@epochs 5 @time Flux.train!(LOSS, params(MLP), Train_Data, opt, cb = Flux.throttle(() -> @show(LOSS(X, Y)), 5))
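To see how well the trained model generalizes, the following evaluation sketch (not part of the original code) computes accuracy on the MNIST test split, assuming the same MLDatasets 0.5.x API used above; the names test_imgs, test_labs, and X_test are introduced here only for illustration.
using Statistics: mean

# Load the 10,000-image test split and preprocess it like the training data
test_imgs = MLDatasets.MNIST.testtensor()
test_labs = MLDatasets.MNIST.testlabels()
X_test = float.(test_imgs)

# onecold picks the most probable class per column; compare with the true labels
accuracy = mean(Flux.onecold(MLP(X_test), 0:9) .== test_labs)
@show accuracy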
Environment
- OS: Windows 10
- Version: Julia 1.7.1, Flux 0.12.8, MLDatasets 0.5.14