Using Machine Learning Datasets in Julia

Description

The MLDatasets.jl package provides access to the following datasets. Datasets that appear as links have their usage explained in their own documents.

Vision

  • CIFAR10
  • CIFAR100
  • EMNIST
  • FashionMNIST
  • MNIST
  • Omniglot
  • SVHN2
  • convert2image

Mesh

  • FAUST

Miscellaneous

  • BostonHousing
  • Iris
  • Mutagenesis
  • Titanic

Text

  • PTBLM
  • SMSSpamCollection
  • UD_English

Graphs

  • CiteSeer
  • Cora
  • Graph
  • HeteroGraph
  • KarateClub
  • MovieLens
  • OGBDataset
  • OrganicMaterialsDB
  • PolBlogs
  • PubMed
  • Reddit
  • TUDataset

For how to one-hot encode this data or use it for training, refer to the following.
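As a rough illustration of the one-hot encoding mentioned above, the following is a minimal sketch in plain Julia. The `labels` vector is a hypothetical stand-in for a label vector such as `Train_Y2`, and it assumes the labels are integers in `0:9`, as is the case for ten-class datasets like CIFAR10. Libraries such as Flux offer `onehotbatch` for the same purpose; this sketch only shows the idea.

```julia
# Hypothetical labels standing in for Train_Y2 (assumed to be integers in 0:9).
labels = [6, 9, 9, 4, 1]
classes = 0:9

# Compare every class (rows) with every label (columns).
# The result is a 10 × N matrix with exactly one 1 in each column.
onehot = Float32.(classes .== permutedims(labels))

size(onehot)  # (10, 5)
```

Each column corresponds to one observation, so the first column has its single 1 in row 7, the row of class 6.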

Example

CIFAR10

julia> using MLDatasets, Images, FileIO  # Images: colorview, RGB; FileIO: save

julia> Train_X2, Train_Y2 = CIFAR10.traindata();

julia> size(Train_X2)
(32, 32, 3, 50000)

julia> typeof(Train_X2)
Base.ReinterpretArray{N0f8, 4, UInt8, Array{UInt8, 4}, false}

julia> size(Train_Y2)
(50000,)

julia> typeof(Train_Y2)
Vector{Int64} (alias for Array{Int64, 1})

julia> for i in 1:7
           # Save the i-th training image as CIFAR10_i.png
           save("CIFAR10_$i.png", colorview(RGB, CIFAR10.convert2image(CIFAR10.traintensor(i))))
       end

The first seven images of the training set look like this.
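The shape `(32, 32, 3, 50000)` shown above can be read as width × height × channel × observation. The sketch below illustrates that layout with a small random array in place of the real data, so `X` is a hypothetical stand-in for `Train_X2`. It assumes that `colorview(RGB, ...)` wants the channel dimension first, which is why a helper such as `convert2image` has to reorder the dimensions.

```julia
# Stand-in for Train_X2: random values in the same layout as the CIFAR10 tensor
# (width, height, channel, observation), with only 8 observations to keep it small.
X = rand(Float32, 32, 32, 3, 8)

# Select a single observation; the result is a width × height × channel array.
img_whc = X[:, :, :, 1]

# Reorder to channel × height × width (assumed to be the layout an RGB view expects).
img_chw = permutedims(img_whc, (3, 2, 1))

size(img_whc), size(img_chw)  # ((32, 32, 3), (3, 32, 32))
```

Indexing the last dimension with a range, for example `X[:, :, :, 1:7]`, selects several observations at once.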

Environment

  • OS: Windows 11
  • Version: Julia v1.8.2, MLDatasets v0.7.6