logo

Derivation of Zomorodian's Algorithm 📂Topological Data Analysis

Derivation of Zomorodian's Algorithm

Overview

The paper “Computing Persistent Homology” by Zomorodian and Carlsson introduces an algorithm for deriving P\mathcal{P}-intervals from a Filtered Complex created by an Abstract Simplicial Complex, bypassing the construction of Persistent Modules that are challenging to handle computationally, and computing persistent homology through matrix reduction1.

Derivation

Part 0. Preliminary Investigation

필터드_컴플렉스.png

The derivation of the algorithm starts by examining the form of a Persistence Complex depicted as above, crucial for understanding the subsequent mathematical expressions.

  • The numbers at the bottom are denoted as deg\deg, forming a Filtered Complex as they increase from 00 to 55. {a,b}=K0K1K2K3K4(K4{acd})=K5 \left\{ a,b \right\} = K_{0} \subset K_{1} \subset K_{2} \subset K_{3} \subset K_{4} \subset \left( K_{4} \cup \left\{ acd \right\} \right) = K_{5}
  • Regardless of deg\deg, KK, as a 22-Simplex, forms a Chain Complex in the context of Homology. C22C11C0 \mathsf{C}_{2} \overset{\partial_{2}}{\longrightarrow} \mathsf{C}_{1} \overset{\partial_{1}}{\longrightarrow} \mathsf{C}_{0}

The algorithm aims to compute how algebraic topological information provided by data, such as Betti Numbers βk\beta_{k}, emerges and disappears over deg\deg. L0={(0,),(0,1),(1,1),(1,2)}L1={(2,5),(3,4)} \begin{align*} L_{0} =& \left\{ \left( 0, \infty \right) , \left( 0,1 \right) , \left( 1,1 \right) , \left( 1,2 \right) \right\} \\ L_{1} =& \left\{ \left( 2,5 \right) , \left( 3,4 \right) \right\} \end{align*}

20220710_213052.png

L0L_{0} consists of P\mathcal{P}-intervals showing when components appear and vanish, while L1L_{1} comprises P\mathcal{P}-intervals indicating the emergence and disappearance of ‘holes’ in space.


Part 1. 1\partial_{1}

The paper suggests that the computation is possible over any Field, focusing on calculations within a Z2[t]\mathbb{Z}_{2} [t]-module, an Graded Module. The homogeneous basis for Ck\mathsf{C}_{k} is denoted as {ej}\left\{ e_{j} \right\} and for Ck1\mathsf{C}_{k-1} as {e^i}\left\{ \hat{e}_{i} \right\}, where ‘homogeneous’ implies single-term expressions, not polynomials like t2+tt^{2} + t but monomials like t4t^{4}.

필터드_컴플렉스2.png

degMk(i,j)=degejdege^i \deg M_{k} (i,j) = \deg e_{j} - \deg \hat{e}_{i} Familiarity with Homological Algebra implies constructing the Boundary Matrix MkM_{k} corresponding to k\partial_{k} and finding its Smith Normal Form M~1\tilde{M}_{1}. For k=1k=1, the unique M1M_{1} can be obtained, considering the matrix bases are homogeneous.

20220710_202643.png

Constructing a matrix using these bases can be seen as the inverse operation of applying tnt^{n}, a group Action raising the degree in the graded module. Let’s perform a few direct calculations to grasp the concept. degM1(2,5)=degacdegc=31=2=degt2degM1(4,5)=degacdega=30=3=degt3degM1(2,2)=degbcdegc=11=0=degt0=deg1 \begin{align*} \deg M_{1} (2,5) =& \deg ac - \deg c = 3 - 1 = 2 = \deg t^{2} \\ \deg M_{1} (4,5) =& \deg ac - \deg a = 3 - 0 = 3 = \deg t^{3} \\ \deg M_{1} (2,2) =& \deg bc - \deg c = 1 - 1 = 0 = \deg t^{0} = \deg 1 \end{align*}

As mentioned before, once we form the Echelon Form, specifically the Column-Echelon Form, it appears as shown below:

20220710_204107.png

Recalling linear algebra from undergraduate studies, the elements at the top of each column that are non-zero, marked with rectangles in the image, are known as pivots. Here, we introduce two auxiliary lemmas:

  • (1): The diagonal elements of the Column-Echelon Form are the same as those in the Smith Normal Form.
  • (2): If the pivot of the iith row in M~k\tilde{M}_{k} is M~k(i,j)=tn\tilde{M}_{k} (i,j) = t^{n}, it corresponds to dege^iF[t]/tn\sum^{\deg \hat{e}_{i}} F[t] / t^{n} in the Homology group Hk1H_{k-1}, otherwise it corresponds to dege^iF[t]\sum^{\deg \hat{e}_{i}} F[t] in Hk1H_{k-1}. This equates to Lk1L_{k-1} being composed of intervals (dege^i,dege^i+n)( \deg \hat{e}_{i} , \deg \hat{e}_{i} + n ) and (dege^i,)( \deg \hat{e}_{i} , \infty ).

In other words,

  • By auxiliary lemma (1), only column operations are needed to compute persistent homology, eliminating the need for row operations.
  • According to auxiliary lemma (2), Lk1L_{k-1} consists of intervals (dege^i,dege^i+n)( \deg \hat{e}_{i} , \deg \hat{e}_{i} + n ) and (dege^i,)( \deg \hat{e}_{i} , \infty ).
    • Since the pivot of the first row is t1t^{1} and degd=1\deg d = 1, we obtain (1,1+1)(1,1+1).
    • Since the pivot of the second row is t0t^{0} and degc=1\deg c = 1, we obtain (1,1+0)(1,1+0).
    • Since the pivot of the third row is t1t^{1} and degb=0\deg b = 0, we obtain (0,0+1)(0,0+1).
    • Since there is no pivot for the fourth row and dega=0\deg a = 0, we obtain (0,)(0,\infty).

This matches precisely with the previously mentioned L0L_{0}, expressed as: L0={(0,),(0,1),(1,1),(1,2)} L_{0} = \left\{ (0, \infty) , (0,1) , (1,1) , (1,2) \right\}


Part 2. 2\partial_{2}

20220710_210216.png

To find L1L_{1}, the matrix form M2M_{2} of 2\partial_{2} is as shown above. However, with the following auxiliary lemma, we can simplify and ease the calculation process:

  • (3): To represent the standard basis of Ck+1\mathsf{C}_{k+1} and Zk\mathsf{Z}_{k} for k+1\partial_{k+1}, rows corresponding to M~k\tilde{M}_{k} can simply be removed from Mk+1M_{k+1}.

In our specific context, this means that from the 11-simplices ab,bc,cd,ad,acab, bc, cd, ad, ac in M~1\tilde{M}_{1}, only the pivots cd,bc,abcd, bc, ab remain, so we can directly delete these from M2M_{2}. Intuitively, this implies that since these elements have already been used in dimension kk, they are not needed for dimension k+1k+1. By omitting the direct construction of column-echelon form M~2\tilde{M}_{2} and removing those three rows, we obtain the truncated form Mˇ2\check{M}_{2} as follows:

20220710_210933.png z2=acbcab z1=adbccdab \begin{align*} z_{2} =& ac - bc - ab \ z_{1} =& ad - bc - cd - ab \end{align*}

Following auxiliary lemma (2) again for the computation:

  • The pivot of the first row is t1t^{1}, and since degz2=deg(acbcab)=maxdegac,bc,ab=3, \deg z_{2} = \deg ( ac - bc - ab ) = \max \deg { ac , bc , ab } = 3, we obtain the interval (3,3+1)(3,3+1).
  • The pivot of the second row is t3t^{3}, and since degz1=deg(adbccdab)=maxdegad,bc,cd,ab=2, \deg z_{1} = \deg ( ad - bc - cd - ab ) = \max \deg { ad , bc , cd , ab } = 2, we obtain the interval (2,2+3)(2,2+3).

This exactly matches the previously mentioned L1L_{1}, expressed as: L1=(2,5),(3,4) L_{1} = { (2,5), (3,4) }

Repeating this process for each dimension dimK\dim K of the complex KK yields the desired algorithm. The dimensions of the matrix follow k\partial_{k}, and its elements are filled according to deg\deg, which should help clarify the procedure.

Lemma (1) indicates that only column operations are necessary, suggesting no strict need for matrix representation. Moreover, lemma (3) allows for efficient procedure by boldly removing rows already computed, requiring capabilities like ‘marking’ non-pivot columns. Consequently, the actual algorithm’s pseudocode is likely described using higher-level data structures, such as dictionaries or dataframes, rather than matrices directly, which can be quite challenging and confusing in practice.

Implementation


  1. Zomorodian. (2005). Computing Persistent Homology: ch4 ↩︎