Using Machine Learning Datasets in Julia
Description
The MLDatasets.jl package provides access to the following datasets. Datasets with links have their usage explained in their respective documents.
Vision
- CIFAR10
- CIFAR100
- EMNIST
- FashionMNIST
- MNIST
- Omniglot
- SVHN2
- convert2image (a helper function for converting arrays to images, not a dataset)
Mesh
- FAUST
Miscellaneous
- BostonHousing
- Iris
- Mutagenesis
- Titanic
Text
- PTBLM
- SMSSpamCollection
- UD_English
Graphs
- CiteSeer
- Cora
- Graph
- HeteroGraph
- KarateClub
- MovieLens
- OGBDataset
- OrganicMaterialsDB
- PolBlogs
- PubMed
- TUDataset
For how to one-hot encode this data or use it for training, refer to the following.
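As a rough illustration of the one-hot encoding mentioned above, the following is a minimal sketch in plain Julia, with no extra packages. The helper name `onehot_labels` and the sample label values are made up for this example and are not part of MLDatasets.jl; the sketch also assumes that CIFAR10 labels are integers from 0 to 9.

```julia
# Sketch of one-hot encoding for integer class labels, using only Base Julia.
# `onehot_labels` is a hypothetical helper name, not an MLDatasets function.
function onehot_labels(labels::AbstractVector{<:Integer}, classes)
    # Compare every class (rows) against every label (columns).
    return collect(classes) .== permutedims(labels)
end

# Made-up stand-in for a few label entries; assumes labels run from 0 to 9.
labels = [6, 9, 9, 4, 1]
Y = onehot_labels(labels, 0:9)

size(Y)            # (10, 5): one row per class, one column per sample
findall(Y[:, 1])   # [7]: label 6 is the seventh class because classes start at 0
```

The result is a `BitMatrix` with exactly one `true` per column. Deep learning libraries usually ship their own one-hot utilities, which may be preferable in a real training pipeline.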
Example
CIFAR10
julia> using MLDatasets, Images, FileIO  # FileIO provides save; Images provides colorview and RGB

julia> Train_X2, Train_Y2 = CIFAR10.traindata();  # training images and their labels

julia> size(Train_X2)
(32, 32, 3, 50000)

julia> typeof(Train_X2)
Base.ReinterpretArray{N0f8, 4, UInt8, Array{UInt8, 4}, false}

julia> size(Train_Y2)
(50000,)

julia> typeof(Train_Y2)
Vector{Int64} (alias for Array{Int64, 1})

julia> for i in 1:7  # save the first seven training images as PNG files
           save("CIFAR10_$i.png", colorview(RGB, CIFAR10.convert2image(CIFAR10.traintensor(i))))
       end
The loop above saves the first seven training images as CIFAR10_1.png through CIFAR10_7.png.
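The size `(32, 32, 3, 50000)` reported for `Train_X2` can be worked with directly. The sketch below assumes the dimensions are ordered as width, height, channel, and sample, and uses a small random stand-in array (the names `X`, `Xf`, and `flat` are illustrative) so that it runs without downloading the dataset. With the real `Train_X2`, whose elements are `N0f8`, `Float32.(Train_X2)` should already give values in [0, 1], so the division by 255 is only needed for the raw-byte stand-in.

```julia
# Stand-in array with the same layout as Train_X2 but only 8 samples,
# so the sketch runs without downloading CIFAR10.
X = rand(UInt8, 32, 32, 3, 8)

# The last dimension indexes samples; this selects the first image.
first_image = X[:, :, :, 1]
size(first_image)        # (32, 32, 3)

# Scale the raw bytes to Float32 values in [0, 1].
Xf = Float32.(X) ./ 255f0

# Flatten each image into one column, a common input shape for dense layers.
flat = reshape(Xf, :, size(Xf, 4))
size(flat)               # (3072, 8)
```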
Environment
- OS: Windows 11
- Version: Julia v1.8.2, MLDatasets v0.7.6