Fairness in multi-class classification: investigation of post-processing techniques with insights from the Datagotchi project
Files
Giahi_09692100_2024.pdf
- Adobe PDF, 2.24 MB
- Embargoed access until 2025-03-03
Details
- Abstract
- As machine learning increasingly influences decision-making across domains, fair and unbiased classification models have become essential. However, these models, often trained on biased data, can inadvertently perpetuate or even amplify existing societal inequalities. This concern has been observed in critical areas such as criminal justice, loan approvals, and employment screening, underscoring the urgent need for fairness-aware machine learning techniques. This thesis investigates fairness in multi-class classification, with a specific focus on post-processing techniques designed to mitigate bias in machine learning models. We address a significant research gap by developing and evaluating novel fairness interventions for multi-class scenarios, moving beyond the binary classification focus that is prevalent in the existing literature. The research uses the Datagotchi dataset, an innovative tool for predicting political party preferences based on lifestyle characteristics. This dataset provides a unique multi-class classification challenge in the context of political science, allowing us to explore fairness implications in a complex, real-world scenario. We introduce two multi-class post-processing methods: Simultaneous Post-processing and the EquiClass Fairness Optimizer (EFO). These techniques are compared against an existing multi-class Black-box post-processing approach from the literature. We use Friedman and Nemenyi tests to compare the effectiveness of different classifier and post-processing combinations. Our findings reveal that the proposed methods outperform the Black-box approach in balancing fairness and performance across multi-class settings. We also observe a clear trade-off between fairness enhancements and classification performance, which illustrates how difficult it is to optimize both simultaneously.
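
For readers unfamiliar with the statistical comparison mentioned in the abstract, the following is a minimal sketch of how a Friedman test statistic and a Nemenyi critical difference are commonly computed when several methods are scored on the same set of folds or datasets. It is not the thesis's implementation. The names `toy_scores`, `average_ranks`, `friedman_statistic`, and `nemenyi_cd` are hypothetical, the scores are made-up illustrative numbers rather than results from the thesis, and the three columns stand for three arbitrary methods. The value 2.343 is the standard tabulated Nemenyi critical value for three methods at a significance level of 0.05.

```python
import math

# Made-up illustrative scores (NOT thesis results): one row per fold or
# dataset, one column per method being compared. Higher is better.
toy_scores = [
    [0.80, 0.75, 0.70],
    [0.82, 0.78, 0.71],
    [0.79, 0.80, 0.72],
    [0.85, 0.77, 0.74],
    [0.81, 0.76, 0.73],
    [0.83, 0.79, 0.70],
]


def average_ranks(scores):
    """Rank methods within each row (1 = best) and average the ranks per method.

    Tied scores within a row share the mean of the ranks they span.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the group of methods tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += shared_rank
            i = j + 1
    return [s / n for s in rank_sums]


def friedman_statistic(scores):
    """Friedman chi-square statistic from average ranks (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )


def nemenyi_cd(k, n, q_alpha):
    """Critical difference: two methods differ if their average ranks
    differ by more than this value."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


if __name__ == "__main__":
    ranks = average_ranks(toy_scores)
    chi2 = friedman_statistic(toy_scores)
    cd = nemenyi_cd(k=3, n=len(toy_scores), q_alpha=2.343)
    print("average ranks:", ranks)
    print("Friedman chi-square:", chi2)
    print("Nemenyi critical difference:", cd)
```

In this setup the Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom. Only if that test rejects the hypothesis of equal performance is the Nemenyi critical difference used to decide which pairs of methods differ.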