logo

Graph Edit Distance Between Graphs 📂Graph Theory

Graph Edit Distance Between Graphs

Definition 1

Let’s denote a finite set of vertices as XX and a set of a finite alphabet as α\alpha, where the alphabet is considered to include spaces or nulls. A graph is defined as a triple G=(X,V,E)G = (X, V, E) concerning vertex labeling V:XαV : X \to \alpha and edge labeling E:X×XαE : X \times X \to \alpha. That G^=(X^,V^,E^)\hat{G} = \left( \hat{X}, \hat{V}, \hat{E} \right) is a subgraph of GG means satisfying the following three conditions:

  • X^X\hat{X} \subset X
  • For xX^\forall x \in \hat{X}, there is V^(x)=V(x)\hat{V}(x) = V(x)
  • For (x,y)X^×X^\forall (x, y) \in \hat{X} \times \hat{X}, there is E^(x,y)=E(x,y)\hat{E} \left( x , y \right) = E \left( x , y \right)

ecgm

Let G1=(X1,V1,E1)G_{1} = \left( X_{1}, V_{1}, E_{1} \right) and G2=(X2,V2,E2)G_{2} = \left( X_{2}, V_{2}, E_{2} \right) be two graphs. An error-correcting graph matching (ecgm) from G1G_{1} to G2G_{2}, denoted as f:X^1X^2f : \hat{X}_{1} \to \hat{X}_{2}, is a bijective function concerning the subgraphs G^1=(X^1,V^1,E^1)\hat{G}_{1} = \left( \hat{X}_{1}, \hat{V}_{1}, \hat{E}_{1} \right) and G^2=(X^2,V^2,E^2)\hat{G}_{2} = \left( \hat{X}_{2}, \hat{V}_{2}, \hat{E}_{2} \right) of these graphs.

Cost of ecgm

For convenience, let’s denote the range of edge labeling E^k(X^k×X^k)\hat{E}_{k} \left( \hat{X}_{k} \times \hat{X}_{k} \right) as E^k\hat{E}_{k}. The cost c(f)c(f) of an ecgm f:X^1X^2f: \hat{X}_{1} \to \hat{X}_{2} from G1G_{1} to G2G_{2} is defined as follows:

c(f):=xX^1cvs(x)+xX1X^1cvd(x)+xX2X^2cvi(x)+eE^1ces(x)+eE1E^1ced(x)+eE2E^2cei(x) \begin{align*} c(f) :=& \sum_{x \in \hat{X}_{1}} c_{\text{vs}} (x) + \sum_{x \in X_{1} \setminus \hat{X}_{1}} c_{\text{vd}} (x) + \sum_{x \in X_{2} \setminus \hat{X}_{2}} c_{\text{vi}} (x) \\ & + \sum_{e \in \hat{E}_{1}} c_{\text{es}} (x) + \sum_{e \in E_{1} \setminus \hat{E}_{1}} c_{\text{ed}} (x) + \sum_{e \in E_{2} \setminus \hat{E}_{2}} c_{\text{ei}} (x) \end{align*}

  • cvs(x)c_{\text{vs}} (x) is the cost of substituting xX^1x \in \hat{X}_{1} to f(x)X^2f(x) \in \hat{X}_{2}.
  • cvd(x)c_{\text{vd}} (x) is the cost of deleting xX1X^1x \in X_{1} \setminus \hat{X}_{1} from G1G_{1}.
  • cvi(x)c_{\text{vi}} (x) is the cost of adding xX2X^2x \in X_{2} \setminus \hat{X}_{2} to G2G_{2}.
  • ces(x)c_{\text{es}} (x) is the cost of substituting e=(x,y)E^1e = (x,y) \in \hat{E}_{1} to (f(x),f(y))E^2\left( f(x), f(y) \right) \in \hat{E}_{2}.
  • ced(x)c_{\text{ed}} (x) is the cost of deleting eE1E^1e \in E_{1} \setminus \hat{E}_{1} from G1G_{1}.
  • cei(x)c_{\text{ei}} (x) is the cost of adding eE2E^2e \in E_{2} \setminus \hat{E}_{2} to G2G_{2}.

Edit Distance

The edit distance d(G1,G2)d \left( G_{1} , G_{2} \right) between two graphs G1G_{1} and G2G_{2} is defined as the minimum cost among the costs of ecgms from G1G_{1} to G2G_{2}: d(G1,G2):=min{c(f):f is an ecgm from G1 to G2} d \left( G_{1} , G_{2} \right) := \min \left\{ c(f) : f \text{ is an ecgm from } G_{1} \text{ to } G_{2} \right\}

Explanation

While the formal definition might seem overly verbose, the concept is quite simple and straightforward.

For example, consider the two graphs shown above. If the costs for addition, deletion, and replacement are all 11, then the graph edit distance is the minimum number of times vertices and edges need to be added, deleted, or replaced to transform one graph into the other. In this case, to transform the graph on the left into the one on the right, one vertex needs to be added, and one of the existing edges needs to be replaced to connect to the new node, making the graph edit distance 22.

See Also


  1. Bunke. (1997). On a relation between graph edit distance and maximum common subgraph. https://doi.org/10.1016/S0167-8655(97)00060-3 ↩︎