Graph coarsening: Representing macro structure

j.andries.j.steenkamp

Graph coarsening: Representing macro structure

Some information is best represented in network format (i.e., nodes and links). However, large networks are cumbersome to navigate, and macro patterns can be buried under the clutter of many nodes.

One method to reveal a large network structure is called graph coarsening. It involves the generation of successively smaller (i.e., with fewer nodes) graphs that retain similar properties to the original graph. Owing to their similarity, they give insight into the original graph, and owing to their small size, they are easy to interpret.

One can think of graph coarsening as being analogous to how artists extract the “essential” feature of an image, thereby representing a similar message with fewer details. “Le Taureau” is a series of lithographs produced by Pablo Picasso, where he successively removes details from a bull till only a “linear rendering” remains.

Definitions

We begin by stating some context and definitions. Consider the graph.

$$G=(V:=\{1,2,…,n\},~E \subseteq V \times V \setminus \{\{i,i\}~|~i \in V\})$$

From graphs, we can derive a subsequent structure revealing mathematical objects.

  • Adjacency matrix $A := {\Huge [} A_{i,j}:= \begin{cases} 1 & \text{if}~\{i,j\} \in E \\ 0 & \text{else}\end{cases} {\Huge ]}_{i,j \in V}$
  • Degree matrix $D := {\Huge [} D_{i,j}:= \begin{cases} \deg_i & \text{if}~i=j \\ 0 & \text{else}\end{cases} {\Huge ]}_{i,j \in V}$
  • $\deg_i := |\{j \in V ~|~ \{i,j\} \in E\}|$
  • Combinatorial Laplacian $L:=D – A$
  • With eigenvalues $\lambda(L): = \lambda_1,\lambda_2,…,\lambda_n$, where $\lambda_1(L) = 0$ $\because L [1, 1, …,1]^T = [0, 0,…, 0]^T$

Given a vertex partition $\cal{C}={C_1,\ldots,C_k}$ (i.e., $V = {\huge \cup}_{i=1}^kC_i$ and $C_i \cap C_j = \emptyset$ for all $i\neq j \in [k]$), the partitioning sets $C_1,\ldots,C_k$ are called “communities“, and from these communities we can define another “coarser” graph as follows:

$$G/c:= {\Huge(} V_{c} := \{1,…,k\},~E_{c}:= \{\{i, j\} ~|~\exists \{u,v\} \in E ~s.t.~ u \in C_i,~v \in C_j\} {\Huge )}.$$


This graph is called a “quotient graph”. In the quotient graph $G/c$ we treat the communities of $G$ as units. Hence, details like connections within communities are lost, but we also have a simpler (i.e., fewer vertices) graph to consider. Note that the degree to which the quotient graph $G/c$ captures the macro structure of $G$ is dependent on how well the communities capture “groupings” of vertices. We consider two metrics of community quality:

Modularity: $Q(G, C_1,\ldots,C_k) := \frac{1}{2 |E|} \sum_{i,j}{\huge (} A_{ij} – \frac{\deg_i\deg_j}{2|E|} {\huge )} \delta(i, j)$, where $\delta(i,j):= \begin{cases} 1 & \text{if}~c(i)=c(j)\\ 0 & \text{else} \end{cases}$ and $c: V \in i \mapsto C \in \cal{C}$ is the map that assigns each vertex uniquely to community. Modularity, hence, measures how much more densely connected the communities are than one would expect by chance.

Conductance: $\phi(S)=\frac{\text{cut}(S,V\setminus S)}{\min(\text{vol}(S),\text{vol}(V\setminus S))}$, where, $\text{cut}(S,V\setminus S) := |\{(i,j) \in E\ ~|~ i \in S,~ j \in V\setminus S\}|$ and $\text{vol}(S) := \sum_{i \in S}\deg_i$ for any set $S \subseteq V$. A small value of $\phi(S)$ means there are relatively few edges leaving S compared to the amount of connectivity inside $S$, making $S$ a good candidate community.

We consider two algorithmic approaches (without explaining any details) to finding communities:

  • Louvain/Leiden: explicitly optimize modularity (or a closely related objective)
  • Spectral clustering: approximately finds low-conductance cuts through Laplacian eigenvectors.

An Example Application

Consider the following 39-node network.

Without ordering or highlighting, it looks like a jumbled mess. After ordering the vertices (painstakingly by hand), we see that there are distinct groupings. The groupings are more interconnected internally than they are connected to other groupings, i.e., communities.

The ordering also transfers to the Laplacian; we can see that there are a few connections between non-adjacent (according to the ordered graph) communities.

Using colors, we can show the grouping on the two levels of coarseness: 9 communities and 3 communities.

These communities respectively introduce quotient graphs with 9 and 3 vertices:

If we chose not to reveal/construct the communities by hand, we could use the detection method mentioned earlier. For example here are the results of the Louvain/Leiden algorithm. Note that we don’t set the number of communities, and that the algorithm of its own accord found seven.

The spectral community detection algorithm has a parameter for the number of communities. These are the results when searching for three and nine communities, respectively:

These results are not quite in line with the constructed truth we use to build the quotient graphs. However, they are automatic and do optimize according to the metrics in their design. We collect below the community quality metrics for the five respective partitions.

Conclusion

There are often macro structures on multiple levels that are drowned out by the minutiae of details.If you put in the effort of ordering and grouping like with like, you can make sense of the madness. If you want a mathematician of fortune to guide you through the noise of messy networks, consider hiring the MathMerc.

Leave a Reply

Your email address will not be published. Required fields are marked *