Exploratory analysis of network data based on the bag-of-paths framework: clustering & embedding
Files
Christoph_38041700_2022.pdf
Open access - Adobe PDF
- 1.99 MB
Details
- Supervisors
- Faculty
- Degree label
- Abstract
- The study of graphs is a field that has attracted a lot of interest in recent years. This thesis is based on the graph embedding technique which allows analyzing a graph by converting it into its representation in a low-dimensional space. This representation must be computed in such a way that the properties and structure of the graph are mostly preserved. The goal of this work is to introduce a new similarity distance called the Poisson surprisal distance as well as a new embedding technique the Bag-of-paths embedding and to compare their performances to the performances of well-defined distance measures used with two different embedding methods, namely Multidimensional scaling and t-Distributed Stochastic Neighbor Embedding. Three evaluation techniques are used to evaluate performance. The first is clustering, or the ability of the embedding to detect the communities of a graph. The second is the rank-based criteria, or the ability to preserve neighbors. The last is the combined divergence score, which evaluates the ability to preserve edge densities between the ground-truth communities and the ability to predict the presence of edges between nodes. To perform these comparisons and evaluate the performances of the introduced methods, 17 datasets for which the ground-truth partition is known will be used.