Tabular data synthesis using Generative Adversarial Networks: an application to table augmentation

(2021)

Files

Couplet_54711600_2021.pdf
  • Open access
  • Adobe PDF
  • 1.62 MB

Details

Abstract
In this master's thesis, we design an efficient tabular data synthesizer and study the use of synthetic data for table augmentation. Although tabular data is the most common data modality, high-quality tabular data is not always easy to obtain or access, hence the need for synthetic data. Synthesizing tabular data is nevertheless difficult because of its complexity. The main difficulties lie in processing numerical and categorical columns simultaneously and in modeling the intricate relationships between them. State-of-the-art tabular data synthesizers are based on generative adversarial networks (GANs). In particular, CTGAN introduces several techniques to deal with complex multi-modal numerical columns and uses one-hot encoding to represent categorical attributes. However, it struggles to capture associations between columns effectively. In this thesis, we propose an enhanced version of CTGAN with a novel encoding-decoding structure as an alternative to one-hot encoding. Through rigorous evaluation, we show that it significantly improves the quality of the synthesized data. Notably, we achieve a 55% increase in machine learning efficacy and obtain encouraging results for data augmentation in the context of imbalanced learning.
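For readers unfamiliar with the baseline representation named in the abstract, the following is a minimal sketch of one-hot encoding for a single categorical column. It is illustrative only: the function name `one_hot_encode` and the toy `education` column are assumptions made for this example, not taken from the thesis or from the CTGAN code base, and the encoding-decoding structure proposed in the thesis is not reproduced here.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1.

    Hypothetical helper for illustration; not the thesis's implementation.
    Returns the sorted list of categories and one vector per input value.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)   # one slot per distinct category
        row[index[value]] = 1         # mark the slot of the observed category
        vectors.append(row)
    return categories, vectors


# Toy column (assumed data, for illustration only).
education = ["bachelor", "master", "phd", "master"]
categories, encoded = one_hot_encode(education)

for value, row in zip(education, encoded):
    print(value, row)
```

Each encoded row is as wide as the number of distinct categories in the column, so the representation grows with the column's cardinality.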