Iris Dataset
Overview[1]
The Iris dataset is a set of observation records of iris flowers, collected by the American botanist Edgar Anderson and introduced by the British statistician Ronald Fisher[2].
Description

It is one of the most commonly used datasets for practicing machine learning and data analysis.[3]
It contains measurements of 50 flowers from each of three iris species: setosa, versicolor, and virginica. Four measurements are recorded for each flower (sepal length, sepal width, petal length, and petal width, in centimetres). Together with the species label, this gives data on 150 flowers, usually stored as a data frame with 150 rows and 5 columns.
How to Use
Julia
In Julia, the dataset is available through MLDatasets.jl, a package of machine learning datasets. The package must be installed before it is used for the first time. Because the data are returned as data frames by default, DataFrames.jl is also needed.
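A minimal setup sketch follows. It assumes Julia's built-in package manager `Pkg` and the default environment. The two packages only need to be added once. The environment variable `DATADEPS_ALWAYS_ACCEPT` is read by DataDeps.jl, which MLDatasets is assumed to use for downloading. Setting it should skip the confirmation prompt that appears the first time the Iris data file is fetched.

```julia
# Add both packages once per environment with Julia's built-in package manager.
using Pkg
Pkg.add(["MLDatasets", "DataFrames"])

# Assumption: MLDatasets downloads the Iris file through DataDeps.jl, which asks
# for confirmation on first use; this variable makes DataDeps accept automatically.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

using MLDatasets, DataFrames
```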
julia> using MLDatasets
julia> using DataFrames
julia> X = Iris()
dataset Iris:
  metadata   =>    Dict{String, Any} with 4 entries
  features   =>    150×4 DataFrame
  targets    =>    150×1 DataFrame
  dataframe  =>    150×5 DataFrame
julia> X[:]
(features = 150×4 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth
     │ Float64      Float64     Float64      Float64
─────┼──────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2
   2 │         4.9         3.0          1.4         0.2
   3 │         4.7         3.2          1.3         0.2
  ⋮  │      ⋮           ⋮            ⋮           ⋮
 149 │         6.2         3.4          5.4         2.3
 150 │         5.9         3.0          5.1         1.8
                                        145 rows omitted, targets = 150×1 DataFrame
 Row │ class
     │ String15
─────┼────────────────
   1 │ Iris-setosa
   2 │ Iris-setosa
   3 │ Iris-setosa
  ⋮  │       ⋮
 149 │ Iris-virginica
 150 │ Iris-virginica
      145 rows omitted)
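`X[:]` returns a NamedTuple, so its two parts can be unpacked directly. The sketch below also converts the feature data frame into a plain matrix, which is often what numerical code expects. `Matrix` keeps the 150×4 layout, with one flower per row. The names `Xmat` and `y` are only illustrative.

```julia
using MLDatasets, DataFrames

# Iris()[:] is a NamedTuple (features = ..., targets = ...); iterating a
# NamedTuple yields its values in order, so both parts unpack by position.
features, targets = Iris()[:]

# Convert the 150×4 feature DataFrame into a Matrix{Float64}; each row is still one flower.
Xmat = Matrix(features)

# The single target column is named `class`; this gives a 150-element label vector.
y = targets.class

println(size(Xmat), " ", length(y))
```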
Passing the keyword argument as_df=false returns the features and targets as plain matrices instead of data frames, so indexing with [:] yields a named tuple of arrays.
julia> X = Iris(as_df=false)[:]
(features = [5.1 4.9 … 6.2 5.9; 3.5 3.0 … 3.4 3.0; 1.4 1.4 … 5.4 5.1; 0.2 0.2 … 2.3 1.8], targets = InlineStrings.String15[InlineStrings.String15("Iris-setosa") InlineStrings.String15("Iris-setosa") … InlineStrings.String15("Iris-virginica") InlineStrings.String15("Iris-virginica")])
julia> typeof(X)
NamedTuple{(:features, :targets), Tuple{Matrix{Float64}, Matrix{InlineStrings.String15}}}
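The matrix printed above has 4 rows and 150 columns. With `as_df=false`, each column is one flower and each row is one measurement. The short sketch below converts back to one flower per row. `Xrows` and `labels` are illustrative names. `vec` is used because the labels are a 1×150 matrix in the version shown here. It also works unchanged if another version returns a plain vector.

```julia
using MLDatasets

data = Iris(as_df=false)[:]

# Features: 4 measurements × 150 flowers (one column per observation).
println(size(data.features))

# Transpose to 150×4 so that, as in the DataFrame, each row is one flower.
Xrows = permutedims(data.features)

# Flatten the labels into an ordinary 150-element vector.
labels = vec(data.targets)
```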
The fields Iris().features, Iris().targets, and Iris().dataframe give the features, the class labels, and a data frame combining the two, respectively.
julia> Iris().dataframe
150×5 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth  class
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        145 rows omitted
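As a quick check of the description above, the data frame can be grouped by species with DataFrames.jl. This sketch counts the flowers in each class, which should be 50 each according to the description. It also computes the mean of every measurement within each class. `mean` comes from the Statistics standard library, and the result names `counts` and `means` are illustrative.

```julia
using MLDatasets, DataFrames, Statistics

df = Iris().dataframe

# Number of rows (flowers) per species.
counts = combine(groupby(df, :class), nrow => :count)

# Per-species mean of each measurement; result columns are named e.g. :petallength_mean.
means = combine(groupby(df, :class),
                [:sepallength, :sepalwidth, :petallength, :petalwidth] .=> mean)

println(counts)
println(means)
```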
Environment
- OS: Windows 11
- Version: Julia v1.8.2, MLDatasets v0.7.6, DataFrames v1.3.6
