Files
Aji_88991600_2024.pdf
Open access - Adobe PDF
- 2.38 MB
Details
- Abstract
- This master's thesis studies high-availability solutions for the Slurm workload manager in an HPC cluster, using the Kubernetes container orchestration framework. The primary objective is to determine whether Kubernetes can ensure high availability for Slurm. This involves both integrating Slurm into a Kubernetes cluster and assessing Kubernetes' capability to guarantee high availability for it. The study focuses on integrating Slurm's controller daemon and database daemon into the Kubernetes environment and examines how Kubernetes can provide high availability to Slurm components running in the cluster. The thesis also covers the selection of a Kubernetes-Slurm cluster implementation and of the high-availability strategy, and presents the results of high-availability tests, including simulations of various failure scenarios, which evaluate Kubernetes' capability to ensure high availability and data persistence for Slurm.