Communities in Networks : an empirical and comparative study of some detection methods
Files
Jungers_17180400_2016.pdf
Open access - Adobe PDF
- 1.72 MB
Details
- Supervisors
- Faculty
- Degree label
- Abstract
- Graphs or networks are mathematical structures that consist of elements that can be pairwise linked if some sort of interaction exists between them. Therefore, they can be used as an abstraction to represent complex systems in various fields of research. Several statistical properties are common to many real-world networks, either being natural or man-made. A specific feature also shared by many networks is that they often exhibit internal substructures, called communities, consisting of cohesive groups of elements which are densely connected to each other but only sparsely or not connected at all to other communities of the graph. Community detection is today an active and interdisciplinary field of research. This thesis focuses on criteria that could allow a reliable detection of these substructures and on methods based on these criteria. In the first research question, we will cover and compare the performance of methods that return an optimal number of communities: The so-called Louvain Method based on the Newman-Girvan modularity criterion and the Relation Clustering based on (a) local properties of the graph and on (b) a kernel matrix representing pairwise similarities between the elements of the graph. In a second time for the second research question, we will consider a clustering method, the famous k-means algorithm coming from the field of Data Mining, where the number of clusters (i.e., communities) to be returned by the algorithm is supposed to be known in advance. More specifically, we will consider two variants of this algorithm, (a) a distance- and (b) a kernel-based version. The two variants of the k-means will be compared with the Louvain Method and Relational Clustering this time coerced into returning a quantity of clusters equal to the ground truth quantity. In this perspective, the first part of the thesis provides the reader with a theoretical background of some useful concepts. It also contains some concrete applications of graph study, clustering techniques and community detection methods in the managerial world. In the second part, we present the methodology used in the experiments and the results obtained from the two research questions. A third part concludes the thesis by providing the key findings, the research limitations and proposals of further work.