How to compute the Planar Maximally Filtered Graph (PMFG)
How to compute the Planar Maximally Filtered Graph (PMFG)
The Planar Maximally Filtered Graph (PMFG) is a complex network which was first introduced in:
Its main purpose is to filter out complex networks by keeping only the main representative links. Before this paper, the main tool for this task was the Minimum Spanning Tree (MST). Its use on stocks correlation was first proposed (in `modern’ times) by Mantegna in:
The main difference between the PMFG and the MST is that the PMFG will retain a bit more information: If n is the number of nodes, then the MST has n - 1 edges whereas the PMFG has 3(n - 2) edges (compared to the n(n-1)/2 of the complete graph K_n). Besides, the MST is always contained in the PMFG. For these reasons, the ``complex networks in finance’’ literature is (more or less slowly) shifting toward this technique. One reason that could prevent some to adopt the PMFG is the lack of a readily available implementation unlike the MST which is both easier to implement and readily available in many libraries (e.g. NetworkX). This is why we provide here a Python implementation for the PMFG.
According to the seminal paper:
A given complex system [is] composed by n elements where a similarity measure S between pairs of elements is defined, e.g., the weight of links in the original network or the correlation coefficient matrix of the system. An ordered list S_ord of pair of nodes can be constructed by arranging them in descending order according to the value of the similarity s_ij between elements i and j.
The construction algorithm for such graphs is as follows: Following the ordered list S_ord starting from the couple of elements with larger similarity, one adds an edge between element i and element j if and only if the resulting graph can still be embedded on a surface of genus g ≤ k after such edge insertion. This algorithm generates simple, undirected, connected graphs embedded on a surface of genus g = k.
The PMFG corresponds (by definition) to the case genus k = 0, i.e. the PMFG is a planar graph. This characterization provides us with a way of implementing concretely this algorithm: The Kuratowski’s theorem.
Kuratowski’s theorem:
A finite graph is planar if and only if it does not contain a subgraph that is homeomorphic to K_5 or K_3,3.
N.B. Since 2004, there is a computationally more efficient method, the Boyer-Myrvold planarity testing, cf. their article On the Cutting Edge: Simplified O(n) Planarity by Edge Addition describing the algorithm, and a Python wrapper of a C implementation that we will use.
def sort_graph_edges(G):
sorted_edges = []
for source, dest, data in sorted(G.edges(data=True),
key=lambda x: x[2]['weight']):
sorted_edges.append({'source': source,
'dest': dest,
'weight': data['weight']})
return sorted_edges
def compute_PMFG(sorted_edges, nb_nodes):
PMFG = nx.Graph()
for edge in sorted_edges:
PMFG.add_edge(edge['source'], edge['dest'])
if not planarity.is_planar(PMFG):
PMFG.remove_edge(edge['source'], edge['dest'])
if len(PMFG.edges()) == 3*(nb_nodes-2):
break
return PMFG
Let’s apply it on a small toy example with n = 150 nodes. We thus have initially a complete graph of 150*149/2 = 11175 edges. The weights of the edges (here, interpreted as distances) are drawn from the uniform distribution on [0,1]. Then, we sort them from the smallest (distance) to the biggest. And, we try to add them one by one, by increasing value (distance), until the PMFG is built.
import numpy as np
from scipy.spatial.distance import squareform
import networkx as nx
import planarity
nb_nodes = 150
distances = squareform(np.random.uniform(
size=int(nb_nodes * (nb_nodes - 1) / 2)))
distances[np.diag_indices(nb_nodes)] = np.ones(nb_nodes)
complete_graph = nx.Graph()
for i in range(nb_nodes):
for j in range(i+1, nb_nodes):
complete_graph.add_edge(i, j, weight=distances[i,j])
sorted_edges = sort_graph_edges(complete_graph)
PMFG = compute_PMFG(sorted_edges, len(complete_graph.nodes))
We finally obtain the PMFG which has only 3(n-2) edges instead of n(n-1)/2.
print(len(PMFG.edges), "edges instead of", int(nb_nodes*(nb_nodes-1)/2))
444 edges instead of 11175