Graph Edit Distance Between Graphs
Definition 1
Let’s denote a finite set of vertices as $X$ and a set of a finite alphabet as $\alpha$, where the alphabet is considered to include spaces or nulls. A graph is defined as a triple $G = (X, V, E)$ concerning vertex labeling $V : X \to \alpha$ and edge labeling $E : X \times X \to \alpha$. That $\hat{G} = \left( \hat{X}, \hat{V}, \hat{E} \right)$ is a subgraph of $G$ means satisfying the following three conditions:
- $\hat{X} \subset X$
- For $\forall x \in \hat{X}$, there is $\hat{V}(x) = V(x)$
- For $\forall (x, y) \in \hat{X} \times \hat{X}$, there is $\hat{E} \left( x , y \right) = E \left( x , y \right)$
ecgm
Let $G_{1} = \left( X_{1}, V_{1}, E_{1} \right)$ and $G_{2} = \left( X_{2}, V_{2}, E_{2} \right)$ be two graphs. An error-correcting graph matching (ecgm) from $G_{1}$ to $G_{2}$, denoted as $f : \hat{X}_{1} \to \hat{X}_{2}$, is a bijective function concerning the subgraphs $\hat{G}_{1} = \left( \hat{X}_{1}, \hat{V}_{1}, \hat{E}_{1} \right)$ and $\hat{G}_{2} = \left( \hat{X}_{2}, \hat{V}_{2}, \hat{E}_{2} \right)$ of these graphs.
Cost of ecgm
For convenience, let’s denote the range of edge labeling $\hat{E}_{k} \left( \hat{X}_{k} \times \hat{X}_{k} \right)$ as $\hat{E}_{k}$. The cost $c(f)$ of an ecgm $f: \hat{X}_{1} \to \hat{X}_{2}$ from $G_{1}$ to $G_{2}$ is defined as follows:
$$ \begin{align*} c(f) :=& \sum_{x \in \hat{X}_{1}} c_{\text{vs}} (x) + \sum_{x \in X_{1} \setminus \hat{X}_{1}} c_{\text{vd}} (x) + \sum_{x \in X_{2} \setminus \hat{X}_{2}} c_{\text{vi}} (x) \\ & + \sum_{e \in \hat{E}_{1}} c_{\text{es}} (x) + \sum_{e \in E_{1} \setminus \hat{E}_{1}} c_{\text{ed}} (x) + \sum_{e \in E_{2} \setminus \hat{E}_{2}} c_{\text{ei}} (x) \end{align*} $$
- $c_{\text{vs}} (x)$ is the cost of substituting $x \in \hat{X}_{1}$ to $f(x) \in \hat{X}_{2}$.
- $c_{\text{vd}} (x)$ is the cost of deleting $x \in X_{1} \setminus \hat{X}_{1}$ from $G_{1}$.
- $c_{\text{vi}} (x)$ is the cost of adding $x \in X_{2} \setminus \hat{X}_{2}$ to $G_{2}$.
- $c_{\text{es}} (x)$ is the cost of substituting $e = (x,y) \in \hat{E}_{1}$ to $\left( f(x), f(y) \right) \in \hat{E}_{2}$.
- $c_{\text{ed}} (x)$ is the cost of deleting $e \in E_{1} \setminus \hat{E}_{1}$ from $G_{1}$.
- $c_{\text{ei}} (x)$ is the cost of adding $e \in E_{2} \setminus \hat{E}_{2}$ to $G_{2}$.
Edit Distance
The edit distance $d \left( G_{1} , G_{2} \right)$ between two graphs $G_{1}$ and $G_{2}$ is defined as the minimum cost among the costs of ecgms from $G_{1}$ to $G_{2}$: $$ d \left( G_{1} , G_{2} \right) := \min \left\{ c(f) : f \text{ is an ecgm from } G_{1} \text{ to } G_{2} \right\} $$
Explanation
While the formal definition might seem overly verbose, the concept is quite simple and straightforward.
For example, consider the two graphs shown above. If the costs for addition, deletion, and replacement are all $1$, then the graph edit distance is the minimum number of times vertices and edges need to be added, deleted, or replaced to transform one graph into the other. In this case, to transform the graph on the left into the one on the right, one vertex needs to be added, and one of the existing edges needs to be replaced to connect to the new node, making the graph edit distance $2$.
See Also
Bunke. (1997). On a relation between graph edit distance and maximum common subgraph. https://doi.org/10.1016/S0167-8655(97)00060-3 ↩︎