Graph similarity estimation

Encadrants : 

Occurrences : 

2020

Nombre d'étudiants minimum: 

2

Nombre d'étudiants maximum: 

2

Nombre d'instances : 

1

Context

Graph isomorphism refers to the problem of deciding whether two graphs are identical. It has many applications in various domains, ranging from biology to cybersecurity. While NP-hard, this problem can be solved in most cases by simple heuristics, like Weisfeiler-Lehman's algorithm. This algorithm turns out to be useful not only to solve the problem of graph isomorphism but also to derive a distance between two graphs or to propagate labels in the context of semi-supervised learning.

Algorithm [1]

The algorithm works as follows :

  • Initialize all nodes with the same color.
  • Assign a hash to each node base on his neighbourhood "3 blue neighbours and 1 red neighbour".
  • Assign the same color to nodes with same hash.
  • Repeat until convergence.

Application

Any working algorithm could be included in the scikit-network [4] library. This would involve testing those algorithms on real-world large-scale data.

Project

The tasks for this proposal include:

  • familiarising with the scikit-network API
  • implementing the Weisfeiler-Lehman algorithm [2] for graph isomorphism
  • implementing the similarity measure from Shervashidze et al. [3]
  • testing those algorithms on relevant data

The reference language is Python, with emphasis on the SciPy and NumPy libraries.

References

[1] https://blog.smola.org/post/33412570425/the-weisfeiler-lehman-algorithm-and-estimation-on

[2] https://arxiv.org/abs/1101.5211

[3] https://people.mpi-inf.mpg.de/~mehlhorn/ftp/genWLpaper.pdf

[4] https://scikit-network.readthedocs.io