

Hierarchical Clustering

Algorithm

Input

Given $N$ instances of $p$-dimensional data, together with a distance $d$.

(Figure: Slide2.png)


Step 1.

Consider each point as a single cluster. Combine the two closest clusters into one.

(Figure: Slide3.png)


Step 2.

Again, combine the two closest clusters into one.

(Figure: Slide4.png)


Step 3.

Repeat until only one cluster remains.

(Figure: Slide5.png)

(Figure: Slide6.png)
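
Putting the three steps together, the following is a minimal sketch of the agglomerative loop, assuming Euclidean distance and single linkage; the function name and the toy data are made up for illustration.

```python
import numpy as np

def agglomerative(X):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # Start with every point as its own cluster (Step 1).
    clusters = [[i] for i in range(len(X))]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        # Combine the two closest clusters into one (Steps 2 and 3).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])
for g, h, d in agglomerative(X):
    print(f"merge {g} + {h} at distance {d:.3f}")
```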


Output

Returns which cluster each data point belongs to, together with the distances between clusters; a SciPy sketch follows below.


  • The tree structure obtained on the right is called a dendrogram.
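
In practice this output is usually obtained from a library. Below is a sketch using SciPy's `linkage` and `dendrogram`, assuming SciPy and matplotlib are installed; the toy data is made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])

# Each row of Z records one merge: the two clusters joined, their
# distance, and the size of the newly formed cluster.
Z = linkage(X, method="single")
print(Z)

dendrogram(Z)  # the tree structure described above
plt.show()
```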

Explanation¹

Hierarchical clustering belongs to the clustering algorithms, a type of unsupervised learning. As the tree diagram shows, it is characterized by building clusters that are represented hierarchically. Different numbers of clusters can be obtained depending on where the tree is cut.

The biggest difference from $k$-means clustering is that it can cluster the data without specifying the number of clusters in advance. Moreover, since no random initial values are involved, the result is always the same.
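
As a sketch of cutting the tree, the snippet below continues from the SciPy example above (it assumes the linkage matrix `Z` is still defined) and uses `fcluster` to obtain flat cluster assignments for two different cuts.

```python
from scipy.cluster.hierarchy import fcluster

# Cut so that at most 2 (then 3) clusters remain; the assignments follow
# deterministically from Z, with no random initialization involved.
print(fcluster(Z, t=2, criterion="maxclust"))
print(fcluster(Z, t=3, criterion="maxclust"))
```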

Pros and Cons

It produces the same result regardless of the number of clusters to be distinguished or the initial values. However, it has a high time complexity of $O(n^{3})$.²

Distance Functions Used

(Figure: hierarchical_clustering.png)

Depending on the way the distance between clusters is measured, different clustering results are obtained.

A cluster is said to be compact when its elements are closer to each other than to elements of other clusters. If each cluster is compact and well separated from the others, all three methods below are said to produce similar results.

Let’s denote two given clusters as $G$ and $H$, with $i \in G$ and $j \in H$.

Single Linkage

Choose the shortest distance among all distances between elements of two clusters. It is also known as the nearest-neighbor technique.

$$ d_{\text{SL}}(G, H) = \min\limits_{\substack{i\in G \\ j\in H}} d(i,j) $$

Complete Linkage

Opposite to SL, choose the longest distance among all distances between elements of two clusters. It is also known as the furthest-neighbor technique.

$$ d_{\text{CL}}(G, H) = \max\limits_{\substack{i\in G \\ j\in H}} d(i,j) $$

Group Average

As the name suggests, take the average of all pairwise distances as the distance between the clusters. Writing $N_{G}$ and $N_{H}$ for the numbers of elements in each cluster,

$$ d_{\text{GA}}(G, H) = \dfrac{1}{N_{G} N_{H}} \sum\limits_{\substack{i\in G \\ j\in H}} d(i,j) $$
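
The three distances can be written directly in a few lines. Below is a minimal sketch assuming Euclidean distance, where `G` and `H` are index sets into a data matrix `X`; all names are illustrative.

```python
import numpy as np

def pairwise(X, G, H):
    """All distances d(i, j) with i in G and j in H."""
    return np.array([[np.linalg.norm(X[i] - X[j]) for j in H] for i in G])

def d_sl(X, G, H):  # single linkage: nearest neighbor
    return pairwise(X, G, H).min()

def d_cl(X, G, H):  # complete linkage: furthest neighbor
    return pairwise(X, G, H).max()

def d_ga(X, G, H):  # group average: mean of all pairwise distances
    return pairwise(X, G, H).mean()

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
G, H = [0, 1], [2, 3]
print(d_sl(X, G, H), d_cl(X, G, H), d_ga(X, G, H))
```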


  1. Trevor Hastie, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edition), pp. 520–528 ↩︎

  2. Hierarchical clustering, Wikipedia ↩︎