logo

Derivation of Zomorodian's Algorithm 📂Topological Data Analysis

Derivation of Zomorodian's Algorithm

Overview

The paper “Computing Persistent Homology” by Zomorodian and Carlsson introduces an algorithm for deriving $\mathcal{P}$-intervals from a Filtered Complex created by an Abstract Simplicial Complex, bypassing the construction of Persistent Modules that are challenging to handle computationally, and computing persistent homology through matrix reduction1.

Derivation

Part 0. Preliminary Investigation

필터드_컴플렉스.png

The derivation of the algorithm starts by examining the form of a Persistence Complex depicted as above, crucial for understanding the subsequent mathematical expressions.

  • The numbers at the bottom are denoted as $\deg$, forming a Filtered Complex as they increase from $0$ to $5$. $$ \left\{ a,b \right\} = K_{0} \subset K_{1} \subset K_{2} \subset K_{3} \subset K_{4} \subset \left( K_{4} \cup \left\{ acd \right\} \right) = K_{5} $$
  • Regardless of $\deg$, $K$, as a $2$-Simplex, forms a Chain Complex in the context of Homology. $$ \mathsf{C}_{2} \overset{\partial_{2}}{\longrightarrow} \mathsf{C}_{1} \overset{\partial_{1}}{\longrightarrow} \mathsf{C}_{0} $$

The algorithm aims to compute how algebraic topological information provided by data, such as Betti Numbers $\beta_{k}$, emerges and disappears over $\deg$. $$ \begin{align*} L_{0} =& \left\{ \left( 0, \infty \right) , \left( 0,1 \right) , \left( 1,1 \right) , \left( 1,2 \right) \right\} \\ L_{1} =& \left\{ \left( 2,5 \right) , \left( 3,4 \right) \right\} \end{align*} $$

20220710_213052.png

$L_{0}$ consists of $\mathcal{P}$-intervals showing when components appear and vanish, while $L_{1}$ comprises $\mathcal{P}$-intervals indicating the emergence and disappearance of ‘holes’ in space.


Part 1. $\partial_{1}$

The paper suggests that the computation is possible over any Field, focusing on calculations within a $\mathbb{Z}_{2} [t]$-module, an Graded Module. The homogeneous basis for $\mathsf{C}_{k}$ is denoted as $\left\{ e_{j} \right\}$ and for $\mathsf{C}_{k-1}$ as $\left\{ \hat{e}_{i} \right\}$, where ‘homogeneous’ implies single-term expressions, not polynomials like $t^{2} + t$ but monomials like $t^{4}$.

필터드_컴플렉스2.png

$$ \deg M_{k} (i,j) = \deg e_{j} - \deg \hat{e}_{i} $$ Familiarity with Homological Algebra implies constructing the Boundary Matrix $M_{k}$ corresponding to $\partial_{k}$ and finding its Smith Normal Form $\tilde{M}_{1}$. For $k=1$, the unique $M_{1}$ can be obtained, considering the matrix bases are homogeneous.

20220710_202643.png

Constructing a matrix using these bases can be seen as the inverse operation of applying $t^{n}$, a group Action raising the degree in the graded module. Let’s perform a few direct calculations to grasp the concept. $$ \begin{align*} \deg M_{1} (2,5) =& \deg ac - \deg c = 3 - 1 = 2 = \deg t^{2} \\ \deg M_{1} (4,5) =& \deg ac - \deg a = 3 - 0 = 3 = \deg t^{3} \\ \deg M_{1} (2,2) =& \deg bc - \deg c = 1 - 1 = 0 = \deg t^{0} = \deg 1 \end{align*} $$

As mentioned before, once we form the Echelon Form, specifically the Column-Echelon Form, it appears as shown below:

20220710_204107.png

Recalling linear algebra from undergraduate studies, the elements at the top of each column that are non-zero, marked with rectangles in the image, are known as pivots. Here, we introduce two auxiliary lemmas:

  • (1): The diagonal elements of the Column-Echelon Form are the same as those in the Smith Normal Form.
  • (2): If the pivot of the $i$th row in $\tilde{M}_{k}$ is $\tilde{M}_{k} (i,j) = t^{n}$, it corresponds to $\sum^{\deg \hat{e}_{i}} F[t] / t^{n}$ in the Homology group $H_{k-1}$, otherwise it corresponds to $\sum^{\deg \hat{e}_{i}} F[t]$ in $H_{k-1}$. This equates to $L_{k-1}$ being composed of intervals $( \deg \hat{e}_{i} , \deg \hat{e}_{i} + n )$ and $( \deg \hat{e}_{i} , \infty )$.

In other words,

  • By auxiliary lemma (1), only column operations are needed to compute persistent homology, eliminating the need for row operations.
  • According to auxiliary lemma (2), $L_{k-1}$ consists of intervals $( \deg \hat{e}_{i} , \deg \hat{e}_{i} + n )$ and $( \deg \hat{e}_{i} , \infty )$.
    • Since the pivot of the first row is $t^{1}$ and $\deg d = 1$, we obtain $(1,1+1)$.
    • Since the pivot of the second row is $t^{0}$ and $\deg c = 1$, we obtain $(1,1+0)$.
    • Since the pivot of the third row is $t^{1}$ and $\deg b = 0$, we obtain $(0,0+1)$.
    • Since there is no pivot for the fourth row and $\deg a = 0$, we obtain $(0,\infty)$.

This matches precisely with the previously mentioned $L_{0}$, expressed as: $$ L_{0} = \left\{ (0, \infty) , (0,1) , (1,1) , (1,2) \right\} $$


Part 2. $\partial_{2}$

20220710_210216.png

To find $L_{1}$, the matrix form $M_{2}$ of $\partial_{2}$ is as shown above. However, with the following auxiliary lemma, we can simplify and ease the calculation process:

  • (3): To represent the standard basis of $\mathsf{C}_{k+1}$ and $\mathsf{Z}_{k}$ for $\partial_{k+1}$, rows corresponding to $\tilde{M}_{k}$ can simply be removed from $M_{k+1}$.

In our specific context, this means that from the $1$-simplices $ab, bc, cd, ad, ac$ in $\tilde{M}_{1}$, only the pivots $cd, bc, ab$ remain, so we can directly delete these from $M_{2}$. Intuitively, this implies that since these elements have already been used in dimension $k$, they are not needed for dimension $k+1$. By omitting the direct construction of column-echelon form $\tilde{M}_{2}$ and removing those three rows, we obtain the truncated form $\check{M}_{2}$ as follows:

20220710_210933.png $$ \begin{align*} z_{2} =& ac - bc - ab \ z_{1} =& ad - bc - cd - ab \end{align*} $$

Following auxiliary lemma (2) again for the computation:

  • The pivot of the first row is $t^{1}$, and since $$ \deg z_{2} = \deg ( ac - bc - ab ) = \max \deg { ac , bc , ab } = 3, $$ we obtain the interval $(3,3+1)$.
  • The pivot of the second row is $t^{3}$, and since $$ \deg z_{1} = \deg ( ad - bc - cd - ab ) = \max \deg { ad , bc , cd , ab } = 2, $$ we obtain the interval $(2,2+3)$.

This exactly matches the previously mentioned $L_{1}$, expressed as: $$ L_{1} = { (2,5), (3,4) } $$

Repeating this process for each dimension $\dim K$ of the complex $K$ yields the desired algorithm. The dimensions of the matrix follow $\partial_{k}$, and its elements are filled according to $\deg$, which should help clarify the procedure.

Lemma (1) indicates that only column operations are necessary, suggesting no strict need for matrix representation. Moreover, lemma (3) allows for efficient procedure by boldly removing rows already computed, requiring capabilities like ‘marking’ non-pivot columns. Consequently, the actual algorithm’s pseudocode is likely described using higher-level data structures, such as dictionaries or dataframes, rather than matrices directly, which can be quite challenging and confusing in practice.

Implementation


  1. Zomorodian. (2005). Computing Persistent Homology: ch4 ↩︎