Implementation of Zomorodian's Algorithm
Overview
This section explains and implements the pseudocode of an algorithm introduced in the paper “Computing Persistent Homology” by Zomorodian and Carlsson1. It takes a filtered complex constructed from an abstract simplicial complex and returns $\mathcal{P}$-intervals, omitting the construction of computationally challenging persistent modules and calculating persistent homology through matrix reduction. Furthermore, the actual implementation does not even use matrix operations.
Derivation
- Derivation of Zomorodian’s algorithm: It’s certain that without understanding the theoretical aspects of the algorithm, one could not comprehend the algorithm just by looking at the pseudocode. Even if one does not understand it perfectly, one should study enough to grasp why matrices suddenly disappear and why marking is necessary before moving on to implementation.
Algorithm
Let’s assume we have received a filtered complex as data before the algorithm operates.
Let’s create a dictionary or table $T$ like the one above to store data and record information from the algorithm. For example, in a Julia DataFrame, you would copy the numbers in the data to epsilon
, the alphabetic part to simplex
, and add a boolean column marked
to store the marking status, a slot
to store the chains of the chain complex, and a column J
to store integers that come out during the calculation.
- Julia arrays start at $1$, not $0$, and there are parts where indices are not considered for convenience of implementation, so there are cases where numbers are all wrong by $1~2$ like the screenshot above, but it’s not important at all, so don’t worry about it.
- Note that the
slot
for storing chains can freely switch between representations as a set and as a chain. For example, the calculation is on $\mathbb{Z}_{2}$ so $$ a + (a - b) = 2 a - b = b \pmod{2} $$ such calculation takes place, which is also the same as the following set operation $$ \left\{ a \right\} \cup \left\{ a, -b \right\} = \left\{ b \right\} $$ In algebraic operations, the element $0$ is treated as ’nonexistent’ in the set and should be accepted as such. It might be unpleasant for those who like precise and meticulous notation, but it’s not entirely unreasonable, so let’s just move on. - Also, the original paper derives the algorithm over a general field $F$, meaning every $q \in F$ has an inverse $q^{-1} \in F$, so calculations like $$ d = d - q^{-1} T[i] $$ are performed, but in this implementation, the binary field $\mathbb{Z}_{2}$ is sufficient, so $q^{-1}$ is not calculated separately and is replaced for $\Delta$ defined as $A \Delta B := \left( A \cup B \right) \setminus \left( A \cap B \right)$ like the following. $$ d = d \Delta T[i] $$
- The expression $\deg$ appears too frequently as a (program) function, an index, and the degree of a polynomial function, causing confusion. Therefore, in $T$, it’s denoted as
epsilon
instead ofdeg
. In fact, in simple level topological data analysis, the value of that column, the radius $\varepsilon > 0$, usually increases to construct a filtered complex. - $T$ assumes a perfect alignment according to the dimensions of the simplices, and accordingly,
epsilon
is expected to have a partial order.
Pseudocode
$\left\{ L_{k} \right\}$ = COMPUTEINTERVALS
$(K)$
- Input: Receives a filtered complex $K$. The filtered complex must have at least two pieces of information: at what timing $\deg \in \mathbb{Z}$ a certain simplex $\sigma$ was added.
- Output: Obtains a set $\left\{ L_{k} \right\}_{k=0}^{\dim K}$ of sets $L_{k}$ of $\mathcal{P}$-intervals for $k = 0, \cdots , \dim K$.
- Side Effect: Modifies
marked
in table $T$ where data is recorded.
$d$ = REMOVEPIVOTROWS
$(\sigma)$
- Input: Receives a $k$-dimensional simplex $\sigma$.
- Output: Obtains an element of some $\mathsf{C}_{k-1}$, which is a $(k-1)$-dimensional chain, meaning it’s operated on by $k$-dimensional simplices.
$i$ = maxindex
$d$
- Input: Receives the chain $d$.
- Output: Returns the largest index $i$ among all
simplex
included in chain $d$ in table $T$. For example, formaxindex(abc)
, it should return the largest $9$ among $5$ ofab
, $6$ ofbc
, and $9$ ofac
.
$k$ = dim
$d$
- Input: Receives a $k$-dimensional chain $d$.
- Output: Returns the integer $k$.
$k$ = dim
$\sigma$
- Input: Receives a $k$-dimensional simplex $\sigma$.
- Output: Returns the integer $k$.
$k$ = deg
$(\sigma)$
- Input: Receives the simplex $\sigma$.
- Output: Returns the integer
epsilon
corresponding to simplex $\sigma$ in table $T$. For example, ifdeg(cd)
, it should return $2$ sinceepsilon
ofcd
is $2$.
Keywords
Mark
is used in the formMark
$\sigma$ to change themarked
of the corresponding simplex $\sigma$ totrue
.Store
is used in the formStore
$j$ and $d$ in $T[i]$ to store the integer $j$ inJ
of $T[i]$ and the chain $d$ inslot
.Remove
is used in the formRemove
$x$ in $d$ to remove the term $x$ in the chain $d$.
$\sigma^{i}$ is the simplex
at the $i$th position in table $T$, and $m$ is the length of $T$.
function COMPUTEINTERVALS
$(K)$
# Initialization
for $k \in 0:\dim K$
$L_{k} := \emptyset$
end for
for $j \in 0:(m-1)$
$d$ = REMOVEPIVOTROWS
$\left( \sigma^{j} \right)$
if $d = \emptyset$
# $d$ being empty means it’s a candidate for a (non-pivot) zero column
mark
$\sigma^{j}$
else
# Must calculate the degree of $d$, so it must be the max
of all terms
# $d$ is a chain one dimension lower than $\sigma^{j}$ and must be $i < j$
$i$ = maxindex
$d$
$k$ = dim
$d$
$L_{k}$ = $L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\}$
end if
end for
for $j \in 0:(m-1)$
# If it’s still not marked, it’s definitely a zero column
if $\sigma^{j}$ ismarked
and $T[j]$ isempty
$k$ = dim
$d$
# Corresponds to $H_{k-1}$ to $\sum^{\hat{e}_{i}} F[t]$, handling infinity
$L_{k}$ = $L_{k} \cup \left\{ \left( \deg \sigma^{i}, \infty \right) \right\}$
end if
end for
return $\left\{ L_{k} \right\}$
end function
function REMOVEPIVOTROWS
$(\sigma)$
$k$ = $\dim \sigma$
# Assuming $\partial abc = ab - bc + ca$, ∂("abc") = ["ab", "bc", "ca"]
$d$ = $\partial_{k} \sigma$
Remove
not marked
$(k-1)$-dimensional simplex
in $d$
while $d \ne \emptyset$
$i$ = maxindex
$d$
if $T[i]$ isempty
break
end if
# Since it’s $\mathbb{Z}_{2}$, replace with symdiff (symmetric difference)
$d$ = $d \Delta T[i]$
end while
return $d$
end function
Implementation
The result of the algorithm for the given example should be as follows: $$ \begin{align*} L_{0} =& \left\{ \left( 0, \infty \right) , \left( 0,1 \right) , \left( 1,1 \right) , \left( 1,2 \right) \right\} \\ L_{1} =& \left\{ \left( 2,5 \right) , \left( 3,4 \right) \right\} \end{align*} $$
The result of the implementation in Julia is as follows. Except for the parts where the indices are precisely incorrect, it can be seen that it has been implemented correctly.
Full Code
As you can see, the code is written almost exactly following the notation of the original paper. For instance, the Julia-like code for
$$
L_{k} = L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\}
$$
is push!(L_[k], (deg(σⁱ), deg(σʲ)))
, but to make it look almost identical to the paper, it’s implemented as L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))]
.
using DataFrames
data = DataFrame([
0 "a"
0 "b"
1 "c"
1 "d"
1 "ab"
1 "bc"
2 "cd"
2 "ad"
3 "ac"
4 "abc"
5 "acd"
], ["epsilon", "simplex"])
T = copy(data)
T[!, :"marked"] .= false
T[!, :"slot"] .= [[]]
T[!, :"J"] .= 0
dimK = 2
m = nrow(T)
test_σ = "abc"
dim(σ) = length(σ)
function deg(σ)
return T.epsilon[findfirst(T.simplex .== σ)]
end
deg(test_σ)
function ∂(σ)
k = dim(σ)
return [σ[(1:k)[Not(t)]] for t = 1:k]
end
∂(test_σ)
function maxindex(chain)
return (T.simplex .∈ Ref(chain)) |> findall |> maximum
end
maxindex(∂(test_σ))
function REMOVEPIVOTROWS(σ)
k = dim(σ); d = ∂(σ)
d = d[d .∈ Ref(T[T.marked,:simplex])] # Remove unmarked terms in ▷eq029◁
while !(d |> isempty)
i = maxindex(d)
if T[i,:slot] |> isempty break end
d = symdiff(d, T[i,:slot])
# print("d in empty")
end
return d
end
REMOVEPIVOTROWS(test_σ)
L_ = [[] for k = 0:dimK]
for j0 = 0:(m-1)
j = j0+1
σʲ = T[j,:simplex]
d = REMOVEPIVOTROWS(σʲ)
if d |> isempty
T[j,:marked] = true
else
i = maxindex(d); k = dim(σʲ)
σⁱ = T[i,:simplex]
T[i,[:J,:slot]] = j0,d
L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))]
end
end
for j0 = 0:(m-1)
j = j0+1
σʲ = T[j,:simplex]
if (T[j,:marked]) && (T[j,:slot] |> isempty) && (T[j,:J] |> iszero)
k = dim(σʲ); L_[k] = L_[k] ∪ [(deg(σʲ), Inf)]
print("j: $j")
end
end
Zomorodian. (2005). Computing Persistent Homology: ch4 ↩︎