How to Use Clustering Packages in Julia
Overview
In Julia, clustering is provided by the Clustering.jl package.
The algorithms it implements include:
- K-means
- K-medoids
- Affinity Propagation
- Density-based spatial clustering of applications with noise (DBSCAN)
- Markov Clustering Algorithm (MCL)
- Fuzzy C-Means Clustering
- Hierarchical Clustering
  - Single Linkage
  - Average Linkage
  - Complete Linkage
  - Ward’s Linkage
Code
DBSCAN
DBSCAN (Density-based spatial clustering of applications with noise) is implemented by the dbscan() function.
Given $n$ data points in $p$ dimensions, pass a matrix of size $p \times n$ (one column per data point) and a radius as arguments.
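The $p \times n$ layout is easy to get backwards, because concatenating feature vectors side by side produces one row per observation instead. As a minimal sketch on made-up numbers (the vectors below are illustrative, not the iris data), the two layouts differ like this:

```julia
# Illustrative toy data: three observations of two features.
petal_length = [1.4, 4.7, 5.1]
petal_width  = [0.2, 1.4, 1.9]

# Side-by-side concatenation gives an n×p matrix (one row per observation).
by_rows = [petal_length petal_width]

# Transposing yields the p×n layout (one column per observation).
points = permutedims(by_rows)

println(size(by_rows))   # (3, 2)
println(size(points))    # (2, 3)
println(points[:, 2])    # [4.7, 1.4]
```

The ' (adjoint) operator used in this article produces the same layout as a lazy wrapper, while permutedims() materializes a plain Matrix.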
julia> points = [iris.PetalLength iris.PetalWidth]'
2×150 adjoint(::Matrix{Float64}) with eltype Float64:
1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 … 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 0.2 2.1 2.4 2.3 1.9 2.3 2.5 2.3 1.9 2.0 2.3 1.8
julia> dbscaned = dbscan(points, 0.5)
DbscanResult(DbscanCluster[DbscanCluster(50, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 41, 42, 43, 44, 45, 46, 47, 48, 49, 50], Int64[]), DbscanCluster(100, [51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150], Int64[])], [1, 51], [50, 100], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
julia> dbscaned |> propertynames
(:clusters, :seeds, :counts, :assignments)
The result of DBSCAN is returned as a struct called DbscanResult.
Of its properties, .assignments and .clusters are the ones that matter here.
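To see these properties on something small enough to check by eye, here is a sketch on hand-made data (the six points are invented for illustration). It relies only on the dbscan(points, radius) call and the property names shown above.

```julia
using Clustering

# Invented toy data in p×n layout: two tight groups of three points,
# far apart from each other relative to the radius.
points = [0.0 0.1 0.2 5.0 5.1 5.2;
          0.0 0.1 0.0 5.0 5.1 5.0]

result = dbscan(points, 0.5)

println(result.counts)            # number of points in each cluster
println(result.assignments)       # cluster label of each point, in column order
println(length(result.clusters))  # number of clusters found
```

With groups this far apart, the first three columns should share one label and the last three another.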
The indices of the points in each cluster (its core points) can be obtained by broadcasting the getproperty() function over .clusters, as follows.
julia> getproperty.(dbscaned.clusters, :core_indices)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
Conversely, the cluster each data point belongs to is given by the .assignments property, as follows.
julia> dbscaned.assignments
150-element Vector{Int64}:
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
⋮
2
2
2
2
2
2
2
2
2
2
2
2
2
2
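An assignment vector like this is easy to post-process with base Julia. The sketch below uses a short made-up vector in place of dbscaned.assignments. Treating the label 0 as noise (points left unassigned) is an assumption about how Clustering.jl marks such points, and no such points occur in the iris result above.

```julia
# Made-up labels standing in for `dbscaned.assignments`.
assignments = [1, 1, 2, 0, 2, 2, 1]

# Map each label to the indices of the points that carry it.
members = Dict(k => findall(==(k), assignments) for k in unique(assignments))

println(members[1])                 # [1, 2, 7]
println(members[2])                 # [3, 5, 6]
println(count(==(0), assignments))  # 1 (label 0 assumed to mean noise)
```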
As a visualization tip: since each cluster is labeled with an integer, passing .assignments directly to the color option of a scatter plot colors each point according to its cluster, as follows.
scatter(iris.PetalLength, iris.PetalWidth,
xlabel = "PetalLength", ylabel = "PetalWidth",
color = dbscaned.assignments)
The resulting plot confirms that the points were split cleanly into two clusters.
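If a legend entry per cluster is wanted, the group attribute of Plots.jl is an alternative to color. A minimal sketch on invented coordinates and labels (not the iris data), assuming Plots.jl is installed:

```julia
using Plots

# Invented coordinates and cluster labels, for illustration only.
x = [1.4, 1.5, 1.3, 4.7, 5.1, 5.9]
y = [0.2, 0.2, 0.3, 1.4, 1.9, 2.1]
labels = [1, 1, 1, 2, 2, 2]

# `group` splits the data into one series per distinct label,
# so each cluster gets its own color and its own legend entry.
plt = scatter(x, y, group = labels,
              xlabel = "PetalLength", ylabel = "PetalWidth")
```

The same call should work with dbscaned.assignments passed as the group.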
Full Code
using Clustering
using RDatasets
using Plots  # provides scatter() and png()

iris = dataset("datasets", "iris")

# Raw data, before clustering
scatter(iris.PetalLength, iris.PetalWidth, xlabel = "PetalLength", ylabel = "PetalWidth")
png("iris")

# p×n matrix: one column per data point
points = [iris.PetalLength iris.PetalWidth]'
dbscaned = dbscan(points, 0.5)
dbscaned |> propertynames
getproperty.(dbscaned.clusters, :core_indices)
dbscaned.assignments

# Color each point by its cluster label
scatter(iris.PetalLength, iris.PetalWidth,
    xlabel = "PetalLength", ylabel = "PetalWidth",
    color = dbscaned.assignments)
Environment
- OS: Windows
- julia: v1.9.0
- Clustering v0.15.4