Implementation of Zomorodian's Algorithm
Overview
This section explains and implements the pseudocode of an algorithm introduced in the paper “Computing Persistent Homology” by Zomorodian and Carlsson1. It takes a filtered complex constructed from an abstract simplicial complex and returns $\mathcal{P}$-intervals, omitting the construction of computationally challenging persistent modules and calculating persistent homology through matrix reduction. Furthermore, the actual implementation does not even use matrix operations.
Derivation
- Derivation of Zomorodian’s algorithm: It’s certain that without understanding the theoretical aspects of the algorithm, one could not comprehend the algorithm just by looking at the pseudocode. Even if one does not understand it perfectly, one should study enough to grasp why matrices suddenly disappear and why marking is necessary before moving on to implementation.
Algorithm

Let’s assume we have received a filtered complex as data before the algorithm operates.

Let’s create a dictionary or table $T$ like the one above to store data and record information from the algorithm. For example, in a Julia DataFrame, you would copy the numbers in the data to epsilon, the alphabetic part to simplex, and add a boolean column marked to store the marking status, a slot to store the chains of the chain complex, and a column J to store integers that come out during the calculation.

- Julia arrays start at $1$, not $0$, and there are parts where indices are not considered for convenience of implementation, so there are cases where numbers are all wrong by $1~2$ like the screenshot above, but it’s not important at all, so don’t worry about it.
- Note that the
slotfor storing chains can freely switch between representations as a set and as a chain. For example, the calculation is on $\mathbb{Z}_{2}$ so $$ a + (a - b) = 2 a - b = b \pmod{2} $$ such calculation takes place, which is also the same as the following set operation $$ \left\{ a \right\} \cup \left\{ a, -b \right\} = \left\{ b \right\} $$ In algebraic operations, the element $0$ is treated as ’nonexistent’ in the set and should be accepted as such. It might be unpleasant for those who like precise and meticulous notation, but it’s not entirely unreasonable, so let’s just move on. - Also, the original paper derives the algorithm over a general field $F$, meaning every $q \in F$ has an inverse $q^{-1} \in F$, so calculations like $$ d = d - q^{-1} T[i] $$ are performed, but in this implementation, the binary field $\mathbb{Z}_{2}$ is sufficient, so $q^{-1}$ is not calculated separately and is replaced for $\Delta$ defined as $A \Delta B := \left( A \cup B \right) \setminus \left( A \cap B \right)$ like the following. $$ d = d \Delta T[i] $$
- The expression $\deg$ appears too frequently as a (program) function, an index, and the degree of a polynomial function, causing confusion. Therefore, in $T$, it’s denoted as
epsiloninstead ofdeg. In fact, in simple level topological data analysis, the value of that column, the radius $\varepsilon > 0$, usually increases to construct a filtered complex. - $T$ assumes a perfect alignment according to the dimensions of the simplices, and accordingly,
epsilonis expected to have a partial order.
Pseudocode
$\left\{ L_{k} \right\}$ = COMPUTEINTERVALS$(K)$
- Input: Receives a filtered complex $K$. The filtered complex must have at least two pieces of information: at what timing $\deg \in \mathbb{Z}$ a certain simplex $\sigma$ was added.
- Output: Obtains a set $\left\{ L_{k} \right\}_{k=0}^{\dim K}$ of sets $L_{k}$ of $\mathcal{P}$-intervals for $k = 0, \cdots , \dim K$.
- Side Effect: Modifies
markedin table $T$ where data is recorded.
$d$ = REMOVEPIVOTROWS$(\sigma)$
- Input: Receives a $k$-dimensional simplex $\sigma$.
- Output: Obtains an element of some $\mathsf{C}_{k-1}$, which is a $(k-1)$-dimensional chain, meaning it’s operated on by $k$-dimensional simplices.
$i$ = maxindex$d$
- Input: Receives the chain $d$.
- Output: Returns the largest index $i$ among all
simplexincluded in chain $d$ in table $T$. For example, formaxindex(abc), it should return the largest $9$ among $5$ ofab, $6$ ofbc, and $9$ ofac.
$k$ = dim$d$
- Input: Receives a $k$-dimensional chain $d$.
- Output: Returns the integer $k$.
$k$ = dim$\sigma$
- Input: Receives a $k$-dimensional simplex $\sigma$.
- Output: Returns the integer $k$.
$k$ = deg$(\sigma)$
- Input: Receives the simplex $\sigma$.
- Output: Returns the integer
epsiloncorresponding to simplex $\sigma$ in table $T$. For example, ifdeg(cd), it should return $2$ sinceepsilonofcdis $2$.
Keywords
Markis used in the formMark$\sigma$ to change themarkedof the corresponding simplex $\sigma$ totrue.Storeis used in the formStore$j$ and $d$ in $T[i]$ to store the integer $j$ inJof $T[i]$ and the chain $d$ inslot.Removeis used in the formRemove$x$ in $d$ to remove the term $x$ in the chain $d$.
$\sigma^{i}$ is the simplex at the $i$th position in table $T$, and $m$ is the length of $T$.
function COMPUTEINTERVALS$(K)$
# Initialization
for $k \in 0:\dim K$
$L_{k} := \emptyset$
end for
for $j \in 0:(m-1)$
$d$ = REMOVEPIVOTROWS$\left( \sigma^{j} \right)$
if $d = \emptyset$
# $d$ being empty means it’s a candidate for a (non-pivot) zero column
mark $\sigma^{j}$
else
# Must calculate the degree of $d$, so it must be the max of all terms
# $d$ is a chain one dimension lower than $\sigma^{j}$ and must be $i < j$
$i$ = maxindex$d$
$k$ = dim$d$
$L_{k}$ = $L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\}$
end if
end for
for $j \in 0:(m-1)$
# If it’s still not marked, it’s definitely a zero column
if $\sigma^{j}$ ismarked and $T[j]$ isempty
$k$ = dim$d$
# Corresponds to $H_{k-1}$ to $\sum^{\hat{e}_{i}} F[t]$, handling infinity
$L_{k}$ = $L_{k} \cup \left\{ \left( \deg \sigma^{i}, \infty \right) \right\}$
end if
end for
return $\left\{ L_{k} \right\}$
end function
function REMOVEPIVOTROWS$(\sigma)$
$k$ = $\dim \sigma$
# Assuming $\partial abc = ab - bc + ca$, ∂("abc") = ["ab", "bc", "ca"]
$d$ = $\partial_{k} \sigma$
Remove not marked $(k-1)$-dimensional simplex in $d$
while $d \ne \emptyset$
$i$ = maxindex$d$
if $T[i]$ isempty
break
end if
# Since it’s $\mathbb{Z}_{2}$, replace with symdiff (symmetric difference)
$d$ = $d \Delta T[i]$
end while
return $d$
end function
Implementation
The result of the algorithm for the given example should be as follows: $$ \begin{align*} L_{0} =& \left\{ \left( 0, \infty \right) , \left( 0,1 \right) , \left( 1,1 \right) , \left( 1,2 \right) \right\} \\ L_{1} =& \left\{ \left( 2,5 \right) , \left( 3,4 \right) \right\} \end{align*} $$
The result of the implementation in Julia is as follows. Except for the parts where the indices are precisely incorrect, it can be seen that it has been implemented correctly.

Full Code
As you can see, the code is written almost exactly following the notation of the original paper. For instance, the Julia-like code for
$$
L_{k} = L_{k} \cup \left\{ \left( \deg \sigma^{i}, \deg \sigma^{j} \right) \right\}
$$
is push!(L_[k], (deg(σⁱ), deg(σʲ))), but to make it look almost identical to the paper, it’s implemented as L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))].
using DataFrames
data = DataFrame([
0 "a"
0 "b"
1 "c"
1 "d"
1 "ab"
1 "bc"
2 "cd"
2 "ad"
3 "ac"
4 "abc"
5 "acd"
], ["epsilon", "simplex"])
T = copy(data)
T[!, :"marked"] .= false
T[!, :"slot"] .= [[]]
T[!, :"J"] .= 0
dimK = 2
m = nrow(T)
test_σ = "abc"
dim(σ) = length(σ)
function deg(σ)
return T.epsilon[findfirst(T.simplex .== σ)]
end
deg(test_σ)
function ∂(σ)
k = dim(σ)
return [σ[(1:k)[Not(t)]] for t = 1:k]
end
∂(test_σ)
function maxindex(chain)
return (T.simplex .∈ Ref(chain)) |> findall |> maximum
end
maxindex(∂(test_σ))
function REMOVEPIVOTROWS(σ)
k = dim(σ); d = ∂(σ)
d = d[d .∈ Ref(T[T.marked,:simplex])] # Remove unmarked terms in ▷eq029◁
while !(d |> isempty)
i = maxindex(d)
if T[i,:slot] |> isempty break end
d = symdiff(d, T[i,:slot])
# print("d in empty")
end
return d
end
REMOVEPIVOTROWS(test_σ)
L_ = [[] for k = 0:dimK]
for j0 = 0:(m-1)
j = j0+1
σʲ = T[j,:simplex]
d = REMOVEPIVOTROWS(σʲ)
if d |> isempty
T[j,:marked] = true
else
i = maxindex(d); k = dim(σʲ)
σⁱ = T[i,:simplex]
T[i,[:J,:slot]] = j0,d
L_[k] = L_[k] ∪ [(deg(σⁱ), deg(σʲ))]
end
end
for j0 = 0:(m-1)
j = j0+1
σʲ = T[j,:simplex]
if (T[j,:marked]) && (T[j,:slot] |> isempty) && (T[j,:J] |> iszero)
k = dim(σʲ); L_[k] = L_[k] ∪ [(deg(σʲ), Inf)]
print("j: $j")
end
end
Zomorodian. (2005). Computing Persistent Homology: ch4 ↩︎
