Fairness in supervised classification: investigation of three different techniques

(2022)

Files

BEGHEIN_KNEIP_65101700_25831700_2022.pdf
  • UCLouvain restricted access
  • Adobe PDF
  • 2.39 MB

Details

Abstract
As society becomes increasingly data-driven, classification models are more and more often used to support decision-making. However, one must be aware that their predictions may cause unfairness, especially if the models are built on biased data. Indeed, algorithms detect any link between variables without being able to distinguish acceptable links from unfair ones. As a result, they are likely to reproduce or even reinforce discrimination. These concerns have been observed in various contexts, such as the selection of job applicants or the allocation of loans. It is therefore necessary to find ways to ensure fairness in the decisions made by classification models. In this thesis, we analyse three techniques for decorrelating the predictions of various models from the sensitive variable (ethnicity, gender, age, etc.). The first method investigated is a pre-processing step that projects the data onto the subspace orthogonal to the sensitive variable before training the model on the projected data. Second, our in-processing technique integrates fairness constraints into the models' learning phase. Finally, the post-processing technique studied is a simple least-squares correction with fairness constraints that modifies the models' predictions to make them fairer. The results obtained show that these techniques reach a good trade-off between fairness and accuracy for some classification models. They therefore appear to mitigate the unfairness present in the different datasets.
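The pre-processing step described above can be illustrated with a minimal sketch. This is not the thesis's implementation; it is a standard linear version of the idea, assuming a numeric feature matrix X and a single numeric sensitive variable s: each feature column is projected onto the orthogonal complement of the (centred) sensitive variable, which removes its linear correlation with s.

```python
import numpy as np

def decorrelate(X, s):
    """Project the columns of X onto the orthogonal complement of the
    (centred) sensitive variable s, removing their linear correlation
    with s. A linear sketch of the pre-processing idea, not the
    thesis's exact method."""
    s = s.reshape(-1, 1).astype(float)
    s = s - s.mean()              # centre the sensitive variable
    X = X - X.mean(axis=0)        # centre the features
    # Component of each column of X explained (linearly) by s
    proj = s @ (s.T @ X) / (s.T @ s)
    return X - proj

# Toy usage with synthetic data correlated with a binary sensitive variable
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 3)) + s[:, None]   # features shifted by group
X_fair = decorrelate(X, s)
# Each column of X_fair is now uncorrelated (linearly) with s
```

A model trained on X_fair can no longer exploit linear dependence on s, at the cost of discarding any predictive signal that s carried.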