Nikraftar_42611900_2024.pdf (open access, Adobe PDF, 25.33 MB)
Abstract
The expert-advice Multi-Armed Bandit (MAB) is a framework in which a task is performed by relying on advice gathered from a set of experts. In this thesis, we discuss a classification problem in which the experts serve as classifiers. The goal is to optimize an objective function that captures both the correct classification rate and the cost of consulting the experts over a series of turns. We study the case where a set of stochastic experts, each with different costs and classification capabilities, must be used to perform binary classification on a large group of patients. The expert-advice MAB problem is modeled as a Reinforcement Learning (RL) problem, in which a deep-learning agent decides which experts to consult at each turn based on the data available about the experts. The RL algorithms are trained to maximize the cumulative reward signal, and thereby the objective function. Different implementations of the MAB as RL environments are designed to exploit the symmetries of the problem and to incorporate RL techniques, such as reward shaping, that improve the performance of RL-based algorithms. The performance of these RL-based approaches is then compared to that of a pre-existing algorithm for this type of problem. The results indicate that while RL-based approaches can effectively optimize decision-making in small-scale problems under strong assumptions, they encounter challenges related to scalability and robustness. The RL methods successfully identified local optima with higher expected rewards, but at the cost of increased computational demands. These findings contribute to the understanding of how RL can be applied to expert-advice MAB problems and suggest areas for future research to address these limitations and improve the practical applicability of RL-based solutions.
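To make the objective concrete, the setup described in the abstract can be sketched as a toy simulation: each expert is a stochastic binary classifier with a fixed accuracy and a per-consultation cost, and the per-turn reward is 1 for a correct label minus the consulted expert's cost. The accuracies, costs, and the simple epsilon-greedy policy below are illustrative assumptions for exposition only; the thesis itself uses deep RL agents, not this baseline.

```python
import random

# Hypothetical experts: (accuracy, cost) pairs. These values are
# illustrative and are not taken from the thesis.
EXPERTS = [
    {"accuracy": 0.70, "cost": 0.05},
    {"accuracy": 0.85, "cost": 0.20},
    {"accuracy": 0.95, "cost": 0.50},
]


def consult(expert, true_label, rng):
    """Return the expert's (possibly wrong) binary label for one patient."""
    if rng.random() < expert["accuracy"]:
        return true_label
    return 1 - true_label


def run_epsilon_greedy(n_patients=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy baseline on the expert-advice bandit.

    Per-turn reward = 1 if the chosen expert labels the patient
    correctly, else 0, minus that expert's consultation cost.
    Returns the average per-turn reward and the estimated value
    (running mean reward) of each expert.
    """
    rng = random.Random(seed)
    counts = [0] * len(EXPERTS)
    values = [0.0] * len(EXPERTS)  # running mean reward per expert
    total = 0.0
    for _ in range(n_patients):
        true_label = rng.randint(0, 1)
        # Explore with probability epsilon, otherwise exploit the
        # expert with the highest estimated value so far.
        if rng.random() < epsilon:
            arm = rng.randrange(len(EXPERTS))
        else:
            arm = max(range(len(EXPERTS)), key=lambda i: values[i])
        predicted = consult(EXPERTS[arm], true_label, rng)
        reward = (1.0 if predicted == true_label else 0.0) - EXPERTS[arm]["cost"]
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / n_patients, values
```

With these illustrative numbers, the expected per-turn reward of expert `i` is `accuracy_i - cost_i` (0.65, 0.65, and 0.45 respectively), so a reasonable policy should converge toward one of the first two experts. The thesis replaces this tabular baseline with deep RL agents and richer environment designs, including reward shaping.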