logo

Implementation of Zomorodian's Algorithm 📂Topological Data Analysis

Implementation of Zomorodian's Algorithm

Overview

This section explains and implements the pseudocode of an algorithm introduced in the paper “Computing Persistent Homology” by Zomorodian and Carlsson1. It takes a filtered complex constructed from an abstract simplicial complex and returns P\mathcal{P}-intervals, omitting the construction of computationally challenging persistent modules and calculating persistent homology through matrix reduction. Furthermore, the actual implementation does not even use matrix operations.

Derivation

  • Derivation of Zomorodian’s algorithm: It’s certain that without understanding the theoretical aspects of the algorithm, one could not comprehend the algorithm just by looking at the pseudocode. Even if one does not understand it perfectly, one should study enough to grasp why matrices suddenly disappear and why marking is necessary before moving on to implementation.

Algorithm

Filtered_Complex.png

Let’s assume we have received a filtered complex as data before the algorithm operates.

20220710_234704.png

Let’s create a dictionary or table TT like the one above to store data and record information from the algorithm. For example, in a Julia DataFrame, you would copy the numbers in the data to epsilon, the alphabetic part to simplex, and add a boolean column marked to store the marking status, a slot to store the chains of the chain complex, and a column J to store integers that come out during the calculation.

20220710_234904.png

  • Julia arrays start at 11, not 00, and there are parts where indices are not considered for convenience of implementation, so there are cases where numbers are all wrong by 1 21~2 like the screenshot above, but it’s not important at all, so don’t worry about it.
  • Note that the slot for storing chains can freely switch between representations as a set and as a chain. For example, the calculation is on Z2\mathbb{Z}_{2} so a+(ab)=2ab=b(mod2) a + (a - b) = 2 a - b = b \pmod{2} such calculation takes place, which is also the same as the following set operation {a}{a,b}={b} \left\{ a \right\} \cup \left\{ a, -b \right\} = \left\{ b \right\} In algebraic operations, the element 00 is treated as ’nonexistent’ in the set and should be accepted as such. It might be unpleasant for those who like precise and meticulous notation, but it’s not entirely unreasonable, so let’s just move on.
  • Also, the original paper derives the algorithm over a general field FF, meaning every qFq \in F has an inverse q1Fq^{-1} \in F, so calculations like d=dq1T[i] d = d - q^{-1} T[i] are performed, but in this implementation, the binary field Z2\mathbb{Z}_{2} is sufficient, so q1q^{-1} is not calculated separately and is replaced for Δ\Delta defined as AΔB:=(AB)(AB)A \Delta B := \left( A \cup B \right) \setminus \left( A \cap B \right) like the following. d=dΔT[i] d = d \Delta T[i]
  • The expression deg\deg appears too frequently as a (program) function, an index, and the degree of a polynomial function, causing confusion. Therefore, in TT, it’s denoted as epsilon instead of deg. In fact, in simple level topological data analysis, the value of that column, the radius ε>0\varepsilon > 0, usually increases to construct a filtered complex.
  • TT assumes a perfect alignment according to the dimensions of the simplices, and accordingly, epsilon is expected to have a partial order.

Pseudocode

{Lk}\left\{ L_{k} \right\} = COMPUTEINTERVALS(K)(K)

  • Input: Receives a filtered complex KK. The filtered complex must have at least two pieces of information: at what timing degZ\deg \in \mathbb{Z} a certain simplex σ\sigma was added.
  • Output: Obtains a set {Lk}k=0dimK\left\{ L_{k} \right\}_{k=0}^{\dim K} of sets LkL_{k} of P\mathcal{P}-intervals for k=0,,dimKk = 0, \cdots , \dim K.
  • Side Effect: Modifies marked in table TT where data is recorded.

dd = REMOVEPIVOTROWS(σ)(\sigma)

  • Input: Receives a kk-dimensional simplex σ\sigma.
  • Output: Obtains an element of some Ck1\mathsf{C}_{k-1}, which is a (k1)(k-1)-dimensional chain, meaning it’s operated on by kk-dimensional simplices.

ii = maxindexdd

  • Input: Receives the chain dd.
  • Output: Returns the largest index ii among all simplex included in chain dd in table TT. For example, for maxindex(abc), it should return the largest 99 among 55 of ab, 66 of bc, and 99 of ac.

kk = dimdd

  • Input: Receives a kk-dimensional chain dd.
  • Output: Returns the integer kk.

kk = dimσ\sigma

  • Input: Receives a kk-dimensional simplex σ\sigma.
  • Output: Returns the integer kk.

kk = deg(σ)(\sigma)

  • Input: Receives the simplex σ\sigma.
  • Output: Returns the integer epsilon corresponding to simplex σ\sigma in table TT. For example, if deg(cd), it should return 22 since epsilon of cd is 22.

Keywords

  • Mark is used in the form Mark σ\sigma to change the marked of the corresponding simplex σ\sigma to true.
  • Store is used in the form Store jj and dd in T[i]T[i] to store the integer jj in J of T[i]T[i] and the chain dd in slot.
  • Remove is used in the form Remove xx in dd to remove the term xx in the chain dd.

σi\sigma^{i} is the simplex at the iith position in table TT, and mm is the length of TT.

function COMPUTEINTERVALS(K)(K)
  # Initialization
  for k0:dimKk \in 0:\dim K
    Lk:=L_{k} := \emptyset
  end for

  for j0:(m1)j \in 0:(m-1)
    dd = REMOVEPIVOTROWS(σj)\left( \sigma^{j} \right)
    if d=d = \emptyset
      # dd being empty means it’s a candidate for a (non-pivot) zero column
      mark σj\sigma^{j}
    else
      # Must calculate the degree of dd, so it must be the max of all terms
      # dd is a chain one dimension lower than σj\sigma^{j} and must be i<ji < j
      ii = maxindexdd
      kk = dimdd
      LkL_{k} = Lk{(degσi,degσj)}L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\}
    end if
  end for

  for j0:(m1)j \in 0:(m-1)
    # If it’s still not marked, it’s definitely a zero column
    if σj\sigma^{j} ismarked and T[j]T[j] isempty
      kk = dimdd
      # Corresponds to Hk1H_{k-1} to e^iF[t]\sum^{\hat{e}_{i}} F[t], handling infinity
      LkL_{k} = Lk{(degσi,)}L_{k} \cup \left\{ \left( \deg \sigma^{i}, \infty \right) \right\}
    end if
  end for
  return {Lk}\left\{ L_{k} \right\}
end function

function REMOVEPIVOTROWS(σ)(\sigma)
  kk = dimσ\dim \sigma
  # Assuming abc=abbc+ca\partial abc = ab - bc + ca, ∂("abc") = ["ab", "bc", "ca"]
  dd = kσ\partial_{k} \sigma
  Remove not marked (k1)(k-1)-dimensional simplex in dd
  while dd \ne \emptyset
    ii = maxindexdd
    if T[i]T[i] isempty
      break
    end if
    # Since it’s Z2\mathbb{Z}_{2}, replace with symdiff (symmetric difference)
    dd = dΔT[i]d \Delta T[i]
  end while
  return dd
end function


Implementation

The result of the algorithm for the given example should be as follows: L0={(0,),(0,1),(1,1),(1,2)}L1={(2,5),(3,4)} \begin{align*} L_{0} =& \left\{ \left( 0, \infty \right) , \left( 0,1 \right) , \left( 1,1 \right) , \left( 1,2 \right) \right\} \\ L_{1} =& \left\{ \left( 2,5 \right) , \left( 3,4 \right) \right\} \end{align*}

The result of the implementation in Julia is as follows. Except for the parts where the indices are precisely incorrect, it can be seen that it has been implemented correctly.

20220711_005505.png

Full Code

As you can see, the code is written almost exactly following the notation of the original paper. For instance, the Julia-like code for Lk=Lk{(degσi,degσj)} L_{k} = L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\} is push!(L_[k], (deg(σⁱ), deg(σʲ))), but to make it look almost identical to the paper, it’s implemented as L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))].

using DataFrames

data = DataFrame([
0 "a"
0 "b"
1 "c"
1 "d"
1 "ab"
1 "bc"
2 "cd"
2 "ad"
3 "ac"
4 "abc"
5 "acd"
], ["epsilon", "simplex"])
T = copy(data)
T[!, :"marked"] .= false
T[!, :"slot"] .= [[]]
T[!, :"J"] .= 0

dimK = 2
m = nrow(T)

test_σ = "abc"
dim(σ) = length(σ)

function deg(σ)
    return T.epsilon[findfirst(T.simplex .== σ)]
end
deg(test_σ)

function ∂(σ)
    k = dim(σ)
    return [σ[(1:k)[Not(t)]] for t = 1:k]
end
∂(test_σ)

function maxindex(chain)
    return (T.simplex .∈ Ref(chain)) |> findall |> maximum
end
maxindex(∂(test_σ))

function REMOVEPIVOTROWS(σ)
    k = dim(σ); d = ∂(σ)
    d = d[d .∈ Ref(T[T.marked,:simplex])] # Remove unmarked terms in ▷eq029◁
    while !(d |> isempty)
        i = maxindex(d)
        if T[i,:slot] |> isempty break end
        d = symdiff(d, T[i,:slot])
        # print("d in empty")
    end
    return d
end
REMOVEPIVOTROWS(test_σ)

L_ = [[] for k = 0:dimK]
for j0 = 0:(m-1)
    j = j0+1
    σʲ = T[j,:simplex]
    d = REMOVEPIVOTROWS(σʲ)
    if d |> isempty
        T[j,:marked] = true
    else
        i = maxindex(d); k = dim(σʲ)
        σⁱ = T[i,:simplex]
        T[i,[:J,:slot]] = j0,d
        L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))]
    end
end
for j0 = 0:(m-1)
    j = j0+1
    σʲ = T[j,:simplex]
    if (T[j,:marked]) && (T[j,:slot] |> isempty) && (T[j,:J] |> iszero)
        k = dim(σʲ); L_[k] = L_[k] ∪ [(deg(σʲ), Inf)]
        print("j: $j")
    end
end

  1. Zomorodian. (2005). Computing Persistent Homology: ch4 ↩︎