Iris Dataset
The Iris dataset is a set of observation records of iris flowers, collected by the American botanist Edgar Anderson and introduced by the British statistician Ronald Fisher [2].
It is one of the most commonly used datasets in machine learning and data analysis practice [3].
It contains 50 flowers from each of three iris species: setosa, versicolor, and virginica. For each flower, the sepal length, sepal width, petal length, and petal width are measured, so the full dataset records the species and these four measurements for 150 flowers, formatted as a data frame with 150 rows and 5 columns.
How to Use
In Julia, the dataset is available through the machine learning datasets package MLDatasets.jl. The package must be installed before it is used for the first time. Because the dataset is loaded as a data frame by default, DataFrames.jl is also required.
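As a minimal sketch of the installation step, both packages can be added with Julia's built-in package manager. This assumes a working network connection and the default package registry; it only needs to be run once per environment.

```julia
# One-time installation of the two packages used in this section.
import Pkg
Pkg.add(["MLDatasets", "DataFrames"])
```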
julia> using MLDatasets
julia> using DataFrames
julia> X = Iris()
dataset Iris:
  metadata   =>    Dict{String, Any} with 4 entries
  features   =>    150×4 DataFrame
  targets    =>    150×1 DataFrame
  dataframe  =>    150×5 DataFrame
julia> X[:]
(features = 150×4 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth
     │ Float64      Float64     Float64      Float64
─────┼──────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2
   2 │         4.9         3.0          1.4         0.2
   3 │         4.7         3.2          1.3         0.2
  ⋮  │      ⋮           ⋮            ⋮           ⋮
 149 │         6.2         3.4          5.4         2.3
 150 │         5.9         3.0          5.1         1.8
                                        145 rows omitted, targets = 150×1 DataFrame
 Row │ class
     │ String15
─────┼────────────────
   1 │ Iris-setosa
   2 │ Iris-setosa
   3 │ Iris-setosa
  ⋮  │       ⋮
 149 │ Iris-virginica
 150 │ Iris-virginica
      145 rows omitted)
Passing the option as_df=false returns the data as plain arrays rather than data frames; indexing with [:] then yields a named tuple of matrices.
julia> X = Iris(as_df=false)[:]
(features = [5.1 4.9 … 6.2 5.9; 3.5 3.0 … 3.4 3.0; 1.4 1.4 … 5.4 5.1; 0.2 0.2 … 2.3 1.8], targets = InlineStrings.String15[InlineStrings.String15("Iris-setosa") InlineStrings.String15("Iris-setosa") … InlineStrings.String15("Iris-virginica") InlineStrings.String15("Iris-virginica")])
julia> typeof(X)
NamedTuple{(:features, :targets), Tuple{Matrix{Float64}, Matrix{InlineStrings.String15}}}
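The array form is laid out differently from the data frame: judging from the output above, the feature matrix has one row per measurement and one column per flower, and the labels come back as a single-row matrix. The following is a rough sketch of how one might unpack and reshape these arrays; the variable names (`labels`, `flowers`, `feature_means`) are illustrative and not part of MLDatasets.jl.

```julia
using MLDatasets
using Statistics  # standard library, provides `mean`

# Load as plain arrays instead of DataFrames and unpack the named tuple.
features, targets = Iris(as_df=false)[:]

size(features)   # (4, 150): one row per measurement, one column per flower
size(targets)    # (1, 150): a single-row matrix of class labels

# Flatten the labels into a vector, and transpose the features so that
# each row is one flower, matching the 150-row layout of the data frame.
labels = vec(targets)
flowers = permutedims(features)   # 150×4 Matrix{Float64}

# Mean of each of the four measurements across all 150 flowers.
feature_means = vec(mean(features, dims=2))
```

Keeping observations in columns is a common convention in Julia machine learning code, so whether to transpose depends on what the downstream library expects.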
Iris().features, Iris().targets, and Iris().dataframe can be used to obtain the features, the classes, and a data frame combining the two, respectively.
julia> Iris().dataframe
150×5 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth  class
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        145 rows omitted
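Once the combined data frame is available, ordinary DataFrames.jl operations apply to it. As a small sketch, the snippet below checks the class balance described earlier (50 flowers per species) and extracts the rows of a single species; the names `counts` and `setosa` are only illustrative.

```julia
using MLDatasets
using DataFrames

df = Iris().dataframe

# Count the rows belonging to each species; per the description of the
# dataset, each of the three classes should contain 50 flowers.
counts = combine(groupby(df, :class), nrow => :count)

# Keep only the rows of one species, using the label as stored in `class`.
setosa = filter(:class => ==("Iris-setosa"), df)
```

Note that the labels carry the "Iris-" prefix, so comparing against "setosa" alone would match nothing.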
- OS: Windows 11
- Version: Julia v1.8.2, MLDatasets v0.7.6, DataFrames v1.3.6