Unsupervised topic modeling for short documents

(2021)

Files

LIESSENS_86041200_2021.pdf
  • Open access
  • Adobe PDF
  • 2.55 MB

LIESSENS_86041200_2021_APPENDIX1.zip
  • Open access
  • Unknown
  • 20.03 KB

Details

Abstract
Summarizing information is hard, especially when the information is high-dimensional, as text data is. Topic models summarize this kind of data by assuming the existence of underlying abstract topics. Unfortunately, they face several issues when applied to short documents, such as increased sparsity. In this thesis, we review existing topic models and compare different solutions that have been proposed to address the short-document issue. After carefully selecting models, we fit them on multiple simulated datasets, which we generate under various settings, and on real-world data consisting of tweets related to the #BlackLivesMatter movement. The results of these comparisons are discussed, along with the limits of the experiments and perspectives for future work.