Combination of Rankings - A Proper Merging of Experts’ Views

Let’s assume that several experts have a view on the ranking of some variables. Take, for example, the temperature in many cities next month. There is only one true ranking, which will be revealed when the month comes by measuring the temperature at each location. One will find, for example, that Hong Kong > Ho Chi Minh City > Singapore > Paris > London > … > Edinburgh. Depending on their quantitative models or their convictions and experience, our experts will end up with different rankings. Moreover, they may prefer to rank only the cities they know well (e.g. South East Asia), or those for which their models work. Experts (or an external party) may also have more or less confidence in these views.

That is, we have:

  • one true ranking that is unknown
  • several rankings that potentially span only a subset of the entire list
  • for each ranking, a percentage that expresses the confidence in the ranking

Question: How do we combine all these rankings to estimate the one true ranking?

Let’s define as a baseline this simple strategy:

  1. Transform each ranking into uniform values on [0,1], i.e. for ranking #i, u_k = rank_k / length(ranking #i) ∈ [0,1]
  2. For each element to be ranked, average (weighted by confidence) its transformed ranks u_k to obtain v_k
  3. Rank the values v_k to obtain the final ranking
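As a quick sketch, this baseline can be written in a few lines of pandas. The rankings and confidence weights below are illustrative, not the ones used later in this notebook:

```python
import pandas as pd

# each expert view: (ordered list of elements, confidence weight) -- illustrative values
views = [(['Kuwait', 'Singapore', 'Rome'], 0.5),
         (['Singapore', 'Kuwait', 'Rome', 'Paris'], 1.0)]

# step 1: map each rank to a uniform value in (0, 1]
transformed = pd.DataFrame(
    {i: {city: (rank + 1) / len(cities) for rank, city in enumerate(cities)}
     for i, (cities, w) in enumerate(views)}).T

# step 2: confidence-weighted average of the transformed ranks,
# normalising only over the views that actually rank each element
weights = pd.Series([w for _, w in views])
v = (transformed.multiply(weights, axis=0).sum()
     / transformed.notna().multiply(weights, axis=0).sum())

# step 3: rank the averages to get the combined ranking
baseline_ranking = v.rank()
print(baseline_ranking.sort_values())
```

Note that elements ranked by only one expert (Paris here) are averaged over that single view, which already hints at the missing-data issue discussed later.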

Can we do better?

Here, I propose an approach based on transforming the input rankings into a Markov chain.

The algorithm:

Consider a directed graph G = (V, E), where V is the set of elements to be ranked. E is initially empty.

  1. For each ranking with confidence w, add to E the edges (i, j, weight=w) for every pair such that i is ranked above j; if an edge already exists, add w to its weight
  2. For each vertex v in V, divide the weight of every edge leaving v by the total weight of all edges leaving v; in other words, normalize the outgoing edges of each node so that their weights sum to 1
  3. Random-walk this Markov chain to estimate its stationary distribution
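For intuition, here is how steps 1 and 2 play out on a toy example with two rankings over three elements, using plain dictionaries (the confidence weights are illustrative):

```python
from collections import defaultdict

# two expert rankings with illustrative confidence weights
rankings = [(['A', 'B', 'C'], 0.5),   # A above B above C
            (['B', 'A'], 1.0)]        # B above A

# step 1: accumulate edge weights for every "i ranked above j" pair
weights = defaultdict(float)
for order, w in rankings:
    for i, u in enumerate(order):
        for v in order[i + 1:]:
            weights[(u, v)] += w

# step 2: normalise outgoing weights per node so that they sum to 1
out_total = defaultdict(float)
for (u, v), w in weights.items():
    out_total[u] += w
transition = {(u, v): w / out_total[u] for (u, v), w in weights.items()}

# A's outgoing mass splits evenly between B and C;
# B sends 2/3 of its mass back up to A and 1/3 down to C
print(transition)
```

Note how the disagreement between the two experts over A and B shows up as edges in both directions between those two nodes.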

The idea is that elements at the top of the ranking will be visited far fewer times (because few or no other elements point toward them) than elements at the bottom of the ranking (because most of the other elements point toward them).

To obtain the final ranking, we sort the elements by the number of times they were visited during the random walks.

Now, some code. Let’s apply it to a small example. I will detail and evaluate the method thoroughly in subsequent notebooks.

The true ranking to recover:

import numpy as np
import pandas as pd
import networkx as nx

true_ranking = {
    1: 'Kuwait',
    2: 'Cairo',
    3: 'Bucharest',
    4: 'Rome',
    5: 'Singapore',
    6: 'Paris',
    7: 'Hong Kong',
    8: 'Prague',
    9: 'Oslo',
    10: 'Lima',
    11: 'Mexico City',
    12: 'Vancouver',
    13: 'Buenos Aires',
    14: 'Perth',
    15: 'Canberra'
}
true_ranking = pd.Series([rank for rank, city in true_ranking.items()],
                         index=[city for rank, city in true_ranking.items()])
ranking_1 = (
    {1.0: 'Kuwait',
     2.0: 'Singapore',
     3.0: 'Rome',
     4.0: 'Prague',
     5.0: 'Vancouver'},
    0.5)  # confidence weight (illustrative value)

ranking_2 = (
    {1.0: 'Oslo',
     2.0: 'Mexico City',
     3.0: 'Singapore',
     4.0: 'Prague',
     5.0: 'Hong Kong',
     6.0: 'Paris',
     7.0: 'Rome',
     8.0: 'Lima',
     9.0: 'Kuwait',
     10.0: 'Vancouver',
     11.0: 'Buenos Aires',
     12.0: 'Cairo',
     13.0: 'Bucharest',
     14.0: 'Canberra',
     15.0: 'Perth'},
    1.0)  # confidence weight (illustrative value)

ranking_3 = (
    {1.0: 'Kuwait',
     2.0: 'Singapore',
     3.0: 'Prague',
     4.0: 'Hong Kong',
     5.0: 'Lima',
     6.0: 'Vancouver',
     7.0: 'Rome',
     8.0: 'Bucharest',
     9.0: 'Cairo',
     10.0: 'Buenos Aires'},
    0.75)  # confidence weight (illustrative value)

expert_rankings = [ranking_1, ranking_2, ranking_3]

Ranking using Random Walks

def build_graph(rankings):
    rank_graph = nx.DiGraph()
    # add every element appearing in any ranking as a node
    for ranking, ic in rankings:
        for node in ranking.values():
            rank_graph.add_node(node)
    # for each ranking, add an edge (u, v, weight=confidence) whenever
    # u is ranked above v; accumulate the weights across rankings
    for ranking, ic in rankings:
        for rank_up in range(1, len(ranking)):
            for rank_dn in range(rank_up + 1, len(ranking) + 1):
                u = ranking[rank_up]
                v = ranking[rank_dn]

                if u in rank_graph and v in rank_graph[u]:
                    rank_graph.edges[u, v]['weight'] += ic
                else:
                    rank_graph.add_edge(u, v, weight=ic)
    return rank_graph

def build_markov_chain(rank_graph):
    # normalize the outgoing edge weights of each node so that they sum to 1
    for node in rank_graph.nodes():
        sum_weights = 0

        for adj_node in rank_graph[node]:
            sum_weights += rank_graph[node][adj_node]['weight']

        for adj_node in rank_graph[node]:
            rank_graph.edges[node, adj_node]['weight'] /= sum_weights
    return rank_graph
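A quick sanity check of this normalisation step: after it runs, the outgoing weights of every non-sink node should sum to 1. Here is a self-contained version of the check on a small hand-built graph (the weights are illustrative):

```python
import networkx as nx

# small weighted graph standing in for rank_graph (illustrative weights)
g = nx.DiGraph()
g.add_edge('A', 'B', weight=1.5)
g.add_edge('A', 'C', weight=0.5)
g.add_edge('B', 'C', weight=2.0)

# same normalisation as build_markov_chain: divide each outgoing
# weight by the node's total outgoing weight
for node in g.nodes():
    total = sum(g[node][adj]['weight'] for adj in g[node])
    for adj in g[node]:
        g.edges[node, adj]['weight'] /= total

# outgoing weights of every non-sink node should now sum to 1
row_sums = {n: sum(g[n][a]['weight'] for a in g[n]) for n in g if len(g[n])}
print(row_sums)
```

The sink node C (no outgoing edges, like the bottom element of a ranking) is simply left untouched.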

def random_walk(markov_chain, counts, max_random_steps=1000):
    # start the walk from a uniformly random node
    nodes = list(markov_chain.nodes())
    current_node = nodes[np.random.randint(len(nodes))]
    for _ in range(max_random_steps):
        counts[current_node] += 1
        neighbours = [(adj, markov_chain[current_node][adj]['weight'])
                      for adj in markov_chain[current_node]]

        adj_nodes = [adj for adj, w in neighbours]
        adj_weights = [w for adj, w in neighbours]

        # no outgoing edge: the walk is absorbed at the bottom of the ranking
        if not adj_nodes:
            return counts
        current_node = np.random.choice(adj_nodes, p=adj_weights)
    return counts
rank_graph = build_graph(expert_rankings)
markov_chain = build_markov_chain(rank_graph)

graph_nodes = list(markov_chain.nodes())

max_random_steps = 10000
counts = {node: 0 for node in graph_nodes}

nb_jumps = 300
for jump in range(nb_jumps):
    counts = random_walk(markov_chain, counts, max_random_steps)
# rank graph method: the fewer visits, the higher in the final ranking
final_ranking = sorted([(node, count) for node, count in counts.items()],
                       key=lambda x: x[1])

graph_ranking = (pd.Series([count for node, count in final_ranking],
                           index=[node for node, count in final_ranking])
                 .rank(ascending=True))
graph_ranking
Paris            1.0
Kuwait           2.0
Singapore        3.0
Oslo             4.0
Mexico City      5.0
Hong Kong        6.0
Lima             7.0
Prague           8.0
Rome             9.0
Vancouver       10.0
Canberra        11.0
Bucharest       12.0
Cairo           13.0
Buenos Aires    14.0
Perth           15.0
dtype: float64

Ranking using Average of the Ranks

# average ranking method
rankings = pd.DataFrame(np.nan,
                        index=range(len(expert_rankings)),
                        columns=true_ranking.index)

for idrank in range(len(expert_rankings)):
    for rank, asset in expert_rankings[idrank][0].items():
        rankings.loc[idrank, asset] = rank / len(expert_rankings[idrank][0])

sum_weights = sum([expert_rankings[idrank][1]
                   for idrank in range(len(expert_rankings))])
average_rankings = (rankings
                    .multiply([expert_rankings[idrank][1] / sum_weights
                               for idrank in range(len(expert_rankings))],
                              axis=0)
                    .sum())

average_ranking = average_rankings.rank(ascending=True).sort_values()
average_ranking
Oslo             1.0
Mexico City      2.0
Paris            3.0
Canberra         4.0
Perth            5.0
Hong Kong        6.0
Lima             7.0
Kuwait           8.0
Bucharest        9.0
Singapore       10.0
Cairo           11.0
Buenos Aires    12.0
Prague          13.0
Rome            14.0
Vancouver       15.0
dtype: float64

Comparing the Average of Ranks and the Random Walk Ranks to the True Ranking

Let’s compare the rank (Spearman) correlation of the ranking obtained by each method with the true ranking:

(average_ranking.corr(true_ranking, method='spearman'),
 graph_ranking.corr(true_ranking, method='spearman'))

pd.concat([true_ranking, average_ranking, graph_ranking], axis=1,
          keys=['true', 'w_avg', 'graph'], sort=False)
              true  w_avg  graph
Kuwait           1    8.0    2.0
Cairo            2   11.0   13.0
Bucharest        3    9.0   12.0
Rome             4   14.0    9.0
Singapore        5   10.0    3.0
Paris            6    3.0    1.0
Hong Kong        7    6.0    6.0
Prague           8   13.0    8.0
Oslo             9    1.0    4.0
Lima            10    7.0    7.0
Mexico City     11    2.0    5.0
Vancouver       12   15.0   10.0
Buenos Aires    13   12.0   14.0
Perth           14    5.0   15.0
Canberra        15    4.0   11.0

Conclusion: The random-walk method for combining several rankings with potentially very different supports seems to outperform the standard, naive way of merging rankings. This method alleviates several problems one encounters in practice, such as biases from missing data when ranking. More on this later.