How to k-means cluster in Julia
Description
k-means clustering is an algorithm that partitions the given data points into a specified number of clusters (k), assigning each point to the cluster whose center is nearest. In Julia, it can be easily implemented using the Clustering.jl package.
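To make the idea concrete, here is a minimal sketch of Lloyd's algorithm, the standard iteration behind k-means, using only the standard library. The function name `lloyd_kmeans` and the toy data are hypothetical and exist only for illustration; this is not how Clustering.jl is implemented, and it omits refinements such as smarter initialization.

```julia
using Random
using Statistics

# Hypothetical helper for illustration only; not part of Clustering.jl.
# X is a d×n matrix with one data point per column; k is the number of clusters.
function lloyd_kmeans(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer=100)
    n = size(X, 2)
    # Start from k distinct data points chosen at random as the initial centers.
    centers = float.(X[:, randperm(n)[1:k]])
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: each point goes to the center with the smallest squared distance.
        new_assignments = [argmin([sum(abs2, X[:, j] .- centers[:, c]) for c in 1:k]) for j in 1:n]
        # If no assignment changed, the iteration has converged.
        new_assignments == assignments && break
        assignments = new_assignments
        # Update step: move each center to the mean of the points assigned to it.
        for c in 1:k
            members = findall(==(c), assignments)
            isempty(members) || (centers[:, c] = vec(mean(X[:, members]; dims=2)))
        end
    end
    return centers, assignments
end

# Toy data (assumed): two well-separated groups of three 2-D points each.
X = [0.0 0.1 0.2 10.0 10.1 10.2;
     0.0 0.1 0.0 10.0 10.1 10.0]
centers, assignments = lloyd_kmeans(X, 2)
```

On data this well separated, the first three points should end up in one cluster and the last three in the other, although which cluster gets label 1 depends on the random start.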
Code
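The following code performs k-means clustering with k = 3 on the Iris dataset. Since data loaded with RDatasets.jl are data frames by default, they are converted into an array and transposed so that each column becomes a single data point. In the run shown here, the algorithm converged in 4 iterations; because the initial centers are chosen randomly, the exact numbers may differ from run to run.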
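using Clustering
using RDatasets
# Keep the four numeric feature columns (the fifth column is the species label)
X = dataset("datasets", "iris")[:, 1:4]
# Convert to a matrix and transpose to 4×150, so that each column is one data point
X = Array(X)'
# Cluster into k = 3 groups, printing the objective value at every iteration
results = kmeans(X, 3, display=:iter)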
# Iters objv objv-change | affected
# -------------------------------------------------------------
# 0 9.002000e+01
# 1 7.934436e+01 -1.067564e+01 | 2
# 2 7.892131e+01 -4.230544e-01 | 2
# 3 7.885567e+01 -6.564390e-02 | 0
# 4 7.885567e+01 0.000000e+00 | 0
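Because the initial centers are picked at random, repeated runs can give different iteration counts and cluster labels. The sketch below shows one way to make a run repeatable by seeding the global random number generator before each call, and how to check convergence afterwards. The synthetic matrix is an assumption that stands in for the transposed Iris data, and the seed values are arbitrary.

```julia
using Clustering
using Random

# Synthetic stand-in data (an assumption for this sketch): 4 features × 150 points,
# laid out like the transposed Iris matrix, with one data point per column.
Random.seed!(1)
X = rand(4, 150)

# Seeding the global RNG right before each call makes the random initialization repeatable.
Random.seed!(42)
r1 = kmeans(X, 3; maxiter=200, display=:none)
Random.seed!(42)
r2 = kmeans(X, 3; maxiter=200, display=:none)

println("converged: ", r1.converged, " after ", r1.iterations, " iterations")
println("same assignments: ", r1.assignments == r2.assignments)
```

The `maxiter` keyword caps the number of iterations, so checking `converged` tells you whether the algorithm stopped on its own or hit that cap.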
The results object returned by kmeans has 9 properties. Among them, the cluster centers can be accessed with .centers, the cluster assignment of each data point with .assignments, and the number of data points in each cluster with .counts.
julia> propertynames(results)
(:centers, :assignments, :costs, :counts, :wcounts, :totalcost, :iterations, :converged, :cweights)
julia> results.centers
4×3 Matrix{Float64}:
5.006 6.85385 5.88361
3.428 3.07692 2.74098
1.462 5.71538 4.38852
0.246 2.05385 1.43443
julia> results.assignments
150-element Vector{Int64}:
1
1
1
⋮
2
2
3
julia> results.counts
3-element Vector{Int64}:
50
39
61
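The relationship between these properties can be checked by hand. The following sketch recomputes the counts and per-point costs from the centers and assignments, then assigns a new point to its nearest center. The toy data and `new_point` are assumptions for illustration, and the sketch relies on the default squared Euclidean distance.

```julia
using Clustering

# Small synthetic data set (assumed for illustration): two features, six points in columns.
X = [0.0 0.1 0.2 10.0 10.1 10.2;
     0.0 0.1 0.0 10.0 10.1 10.0]
results = kmeans(X, 2)

# counts[k] is the number of points whose assignment equals k.
manual_counts = [count(==(k), results.assignments) for k in 1:2]

# costs[j] is the squared Euclidean distance from point j to its assigned center.
manual_costs = [sum(abs2, X[:, j] .- results.centers[:, results.assignments[j]])
                for j in 1:size(X, 2)]

# A new point (hypothetical) can be assigned to the cluster with the nearest center.
new_point = [9.5, 9.8]
nearest = argmin([sum(abs2, new_point .- results.centers[:, k]) for k in 1:2])

println("counts match: ", manual_counts == results.counts)
println("total cost:   ", results.totalcost, " vs. ", sum(manual_costs))
println("new point goes to cluster ", nearest)
```

Cluster labels are arbitrary, so compare clusters through their centers or members rather than through the label numbers themselves.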
Visualizing the sepal length and sepal width reveals the following:
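A plot of this kind could be produced along the following lines. This is a sketch under assumptions: the article does not say which plotting package was used, so Plots.jl is chosen here, and the title, marker style, and labels are illustrative. Rows 1 and 2 of the transposed matrix hold sepal length and sepal width.

```julia
using Clustering
using RDatasets
using Plots   # assumption: the plotting package is not named in the article

iris = dataset("datasets", "iris")
X = permutedims(Matrix(iris[:, 1:4]))   # 4×150, one data point per column
results = kmeans(X, 3)

# Row 1 is sepal length and row 2 is sepal width; `group` colors points by cluster.
plt = scatter(X[1, :], X[2, :];
              group=results.assignments,
              xlabel="Sepal length", ylabel="Sepal width",
              title="k-means clusters (k = 3)")

# Overlay the cluster centers, using the same two coordinates.
scatter!(plt, results.centers[1, :], results.centers[2, :];
         marker=:star5, markersize=10, color=:black, label="centers")
```

In the REPL the plot is shown when `plt` is returned; in a script, call `display(plt)` or `savefig(plt, "iris_kmeans.png")` (the file name is arbitrary).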
Environment
- OS: Windows 11
- Version: Julia 1.10.0, Clustering v0.15.7, RDatasets v0.7.7