How to Perform k-means Clustering in Julia
Description
k-means clustering is an algorithm that partitions $n$ given data points into $k$ clusters, assigning each point to the cluster whose center (mean) is nearest. In Julia, it is available through the Clustering.jl package.
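To make the idea concrete, the following is a minimal sketch of the classic alternating procedure (Lloyd's algorithm) in plain Julia. It is only an illustration under simple assumptions (squared Euclidean distance, initial centers supplied by the caller) and is not how Clustering.jl implements `kmeans`. The name `lloyd_kmeans` and its arguments are hypothetical.

```julia
using Statistics  # mean

# Toy k-means (Lloyd's algorithm). Each column of X is one data point.
# `lloyd_kmeans` is a hypothetical name used only for this illustration.
function lloyd_kmeans(X::AbstractMatrix{<:Real}, centers::AbstractMatrix{<:Real}; maxiter::Int=100)
    C = float.(collect(centers))            # working copy of the d×k centers
    n, k = size(X, 2), size(C, 2)
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: index of the nearest center for every point
        new_assignments = [argmin([sum(abs2, X[:, i] .- C[:, j]) for j in 1:k]) for i in 1:n]
        new_assignments == assignments && break   # nothing changed: converged
        assignments = new_assignments
        # Update step: move each center to the mean of its assigned points
        for j in 1:k
            members = findall(==(j), assignments)
            isempty(members) || (C[:, j] = vec(mean(X[:, members], dims=2)))
        end
    end
    return C, assignments
end

# Two well-separated groups in 2-D, initialized from the first and last points
X = [0.0 0.1 0.2 5.0 5.1 5.2;
     0.0 0.1 0.0 5.0 5.1 5.0]
C, a = lloyd_kmeans(X, X[:, [1, 6]])
```

On this toy input the first three points end up in cluster 1 and the last three in cluster 2. For real data, the library routine shown in the next section is the better choice.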
Code
The following code clusters the Iris dataset with $k=3$. Data loaded from RDatasets.jl are data frames by default, so they are converted to an array and transposed so that each column is a single data point. In this run, the algorithm converged in 4 iterations.
using Clustering
using RDatasets
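# Keep only the four numeric columns (sepal and petal length/width)
X = dataset("datasets", "iris")[:, 1:4]
# kmeans expects one data point per column, so convert and transpose
X = Array(X)'
results = kmeans(X, 3, display=:iter)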
# Iters objv objv-change | affected
# -------------------------------------------------------------
# 0 9.002000e+01
# 1 7.934436e+01 -1.067564e+01 | 2
# 2 7.892131e+01 -4.230544e-01 | 2
# 3 7.885567e+01 -6.564390e-02 | 0
# 4 7.885567e+01 0.000000e+00 | 0
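Because `kmeans` chooses its initial centers at random, the objective values in this log, the iteration count, and even the numbering of the clusters may differ from run to run. The sketch below shows one way to make a run repeatable. It assumes that the default initialization draws from Julia's global random number generator, so that `Random.seed!` controls it. The seed value is arbitrary.

```julia
using Clustering, RDatasets, Random

X = Array(dataset("datasets", "iris")[:, 1:4])'

Random.seed!(1234)      # arbitrary seed, chosen only for illustration
r1 = kmeans(X, 3)

Random.seed!(1234)      # reseed so the second run starts from the same state
r2 = kmeans(X, 3)

# Under the assumption above, both runs give identical assignments
same = r1.assignments == r2.assignments
```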
The results object returned by kmeans has 9 properties. Among them, .centers holds the center of each cluster, .assignments the cluster assignment of each data point, and .counts the number of data points in each cluster.
julia> propertynames(results)
(:centers, :assignments, :costs, :counts, :wcounts, :totalcost, :iterations, :converged, :cweights)
julia> results.centers
4×3 Matrix{Float64}:
5.006 6.85385 5.88361
3.428 3.07692 2.74098
1.462 5.71538 4.38852
0.246 2.05385 1.43443
julia> results.assignments
150-element Vector{Int64}:
1
1
1
⋮
2
2
3
julia> results.counts
3-element Vector{Int64}:
50
39
61
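As a quick sanity check, the cluster sizes can be recomputed from the assignments, and the clusters can be cross-tabulated against the species labels that were dropped before clustering. This is a sketch rather than part of the original walkthrough. It assumes the species column of the Iris data frame is named `Species`, and the exact numbers depend on the random initialization.

```julia
using Clustering, RDatasets

iris = dataset("datasets", "iris")
X = Array(iris[:, 1:4])'
results = kmeans(X, 3)

# Recompute the size of each cluster from the assignment vector
sizes = [count(==(j), results.assignments) for j in 1:3]

# Count how many points of each species fall into each cluster
species = string.(iris.Species)
for sp in unique(species)
    idx = findall(==(sp), species)
    println(sp, " => ", [count(==(j), results.assignments[idx]) for j in 1:3])
end
```

The vector `sizes` should match `results.counts`, and each printed row shows how one species is spread across the three clusters.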
Visualizing the sepal length and sepal width reveals the following:
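A scatter plot of this kind could be drawn roughly as follows. This is a sketch that assumes Plots.jl is installed (it is not listed in the environment below). The marker style and labels are arbitrary choices for illustration.

```julia
using Clustering, RDatasets, Plots

X = Array(dataset("datasets", "iris")[:, 1:4])'
results = kmeans(X, 3)

# Rows 1 and 2 of X hold sepal length and sepal width
plt = scatter(X[1, :], X[2, :];
              group = results.assignments,     # one color per cluster
              xlabel = "Sepal length", ylabel = "Sepal width")

# Overlay the cluster centers, projected onto the same two dimensions
scatter!(plt, results.centers[1, :], results.centers[2, :];
         marker = :star5, markersize = 10, color = :black, label = "centers")
```

Only two of the four dimensions are shown, so clusters that overlap in this projection may still be separated along the petal measurements.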
Environment
- OS: Windows 11
- Version: Julia 1.10.0, Clustering v0.15.7, RDatasets v0.7.7