Details
- Supervisors
- Faculty
- Degree label
- Abstract
- This thesis studies how reinforcement learning uses the structured representation of an environment offered by Markov decision processes (MDPs) and how MDPs, in turn, benefit from the learning capabilities of reinforcement learning algorithms. First, we cover the basics of reinforcement learning by explaining important concepts such as the bandit problem and the exploration-exploitation trade-off. Then, we present MDPs, their properties, and their relationship with the Bellman equation and value functions. We also present some of the most popular reinforcement learning algorithms, which form part of the fundamental knowledge on the relationship between reinforcement learning and MDPs. Next, we apply these algorithms to real-life use cases, training intelligent systems with each of them and evaluating their performance. Finally, we review the literature to understand how reinforcement learning is used in finance and apply reinforcement learning to option pricing. The thesis aims to provide the reader with the tools and knowledge needed to understand the fundamental concepts of reinforcement learning and MDPs. It also aims to analyze how reinforcement learning can be used in finance and whether it can be an alternative to the well-known Black & Scholes (B&S) model. After reading the thesis, the reader should therefore be equipped to deepen their knowledge of reinforcement learning and its use of Markov decision processes.