How to Use Clustering Packages in Julia

Overview

In Julia, clustering is provided by the Clustering.jl package[1]. It implements the following algorithms:

  • K-means
  • K-medoids
  • Affinity Propagation
  • Density-based spatial clustering of applications with noise (DBSCAN)
  • Markov Clustering Algorithm (MCL)
  • Fuzzy C-Means Clustering
  • Hierarchical Clustering
    • Single Linkage
    • Average Linkage
    • Complete Linkage
    • Ward’s Linkage

Code

DBSCAN

[Figure: iris.png, scatter plot of PetalLength against PetalWidth for the iris data]

DBSCAN (Density-based spatial clustering of applications with noise) is implemented by the dbscan() function. For $n$ data points in $p$ dimensions, pass a matrix of size $p \times n$ (one column per data point) and a neighborhood radius as arguments.

julia> points = [iris.PetalLength iris.PetalWidth]'
2×150 adjoint(::Matrix{Float64}) with eltype Float64:
 1.4  1.4  1.3  1.5  1.4  1.7  1.4  1.5  1.4  1.5  1.5  …  5.4  5.6  5.1  5.1  5.9  5.7  5.2  5.0  5.2  5.4  5.1      
 0.2  0.2  0.2  0.2  0.2  0.4  0.3  0.2  0.2  0.1  0.2     2.1  2.4  2.3  1.9  2.3  2.5  2.3  1.9  2.0  2.3  1.8      

julia> dbscaned = dbscan(points, 0.5)
DbscanResult(DbscanCluster[DbscanCluster(50, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  41, 42, 43, 44, 45, 46, 47, 48, 49, 50], Int64[]), DbscanCluster(100, [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150], Int64[])], [1, 51], [50, 100], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

julia> dbscaned |> propertynames
(:clusters, :seeds, :counts, :assignments)

The result of DBSCAN is returned as a structure called DbscanResult. The .assignments and .clusters properties are the ones that matter here.

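The indices of the data points in each cluster are stored in the clusters themselves and can be collected by broadcasting getproperty() over .clusters. Here :core_indices is enough, because the boundary index lists of both clusters are empty (the Int64[] in the output above).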

julia> getproperty.(dbscaned.clusters, :core_indices)
2-element Vector{Vector{Int64}}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
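When a cluster does have boundary points, :core_indices alone would miss them. The following is a minimal sketch of collecting every member of each cluster, assuming the core_indices and boundary_indices fields of DbscanCluster as in Clustering v0.15. The toy matrix is made up for illustration and is not part of the iris example.

```julia
using Clustering

# Hypothetical toy data: six 2-D points forming two tight groups (one point per column).
toy = [0.0 0.1 0.2 5.0 5.1 5.2;
       0.0 0.1 0.0 5.0 5.1 5.0]

result = dbscan(toy, 0.5)

# Merge core and boundary indices so that each entry lists all members of one cluster.
members = [sort(vcat(c.core_indices, c.boundary_indices)) for c in result.clusters]

for (k, idx) in enumerate(members)
    println("cluster ", k, ": ", idx)
end
```

Sorting is only there to make the output easy to read; the clustering itself does not depend on it.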

Conversely, the cluster that each data point belongs to is available through the .assignments property.

julia> dbscaned.assignments
150-element Vector{Int64}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 ⋮
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
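A quick way to summarize .assignments is to tally how many points carry each label. The sketch below uses made-up toy data with one isolated point and the min_cluster_size keyword of dbscan(), so that the isolated point is not accepted as a cluster of its own. To my understanding such unassigned points receive the label 0 in Clustering v0.15, but treat that as an assumption to check against the package documentation.

```julia
using Clustering

# Hypothetical toy data: two tight groups plus one isolated point (column 7).
toy = [0.0 0.1 0.2 5.0 5.1 5.2 20.0;
       0.0 0.1 0.0 5.0 5.1 5.0 20.0]

# min_cluster_size = 2 rejects the single-point cluster around the isolated point.
result = dbscan(toy, 0.5; min_cluster_size = 2)
labels = result.assignments

# Tally how many points carry each label.
sizes = Dict{Int,Int}()
for l in labels
    sizes[l] = get(sizes, l, 0) + 1
end
println(sizes)
```

Whatever label the isolated point ends up with, it differs from the labels of the two groups, so it is easy to filter out before plotting.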

As a visualization tip, clusters are labeled with integers, so passing .assignments directly to the color option of a scatter plot gives each cluster its own color.

scatter(iris.PetalLength, iris.PetalWidth,
xlabel = "PetalLength", ylabel = "PetalWidth",
color = dbscaned.assignments)

[Figure: iris_dbscan.png, the same scatter plot colored by DBSCAN cluster]

The plot shows that the points are cleanly split into the two clusters found above.
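The radius passed to dbscan() largely determines how many clusters come out. As a rough illustration, the sketch below runs dbscan() with three radii on made-up points lying on a line, where the groups are separated by gaps of about 1 and about 10. The data and the chosen radii are assumptions for demonstration only and say nothing about a good radius for the iris data.

```julia
using Clustering

# Hypothetical toy data: three groups on a line (one point per column).
toy = [0.0 0.1 0.2 1.2 1.3 1.4 11.0 11.1 11.2;
       0.0 0.0 0.0 0.0 0.0 0.0  0.0  0.0  0.0]

radii = (0.5, 2.0, 20.0)

# Number of clusters found for each radius.
n_clusters = [length(dbscan(toy, r).clusters) for r in radii]

for (r, k) in zip(radii, n_clusters)
    println("radius = ", r, " -> ", k, " clusters")
end
```

A small radius keeps all three groups apart, a medium one merges the two nearby groups, and a very large one puts every point into a single cluster.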

Full Code

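using Clustering
using RDatasets
using Plots  # provides scatter() and png()

# Load the iris dataset as a DataFrame
iris = dataset("datasets", "iris")

# Plot the raw data and save it as iris.png
scatter(iris.PetalLength, iris.PetalWidth, xlabel = "PetalLength", ylabel = "PetalWidth")
png("iris")

# Build the 2×150 input matrix (one column per point) and run DBSCAN with radius 0.5
points = [iris.PetalLength iris.PetalWidth]'
dbscaned = dbscan(points, 0.5)
dbscaned |> propertynames

# Indices of the core points in each cluster, and the cluster label of each point
getproperty.(dbscaned.clusters, :core_indices)
dbscaned.assignments

# Color each point by its assigned cluster
scatter(iris.PetalLength, iris.PetalWidth,
    xlabel = "PetalLength", ylabel = "PetalWidth",
    color = dbscaned.assignments)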

Environment

  • OS: Windows
  • julia: v1.9.0
  • Clustering v0.15.4