
High availability solution for slurm in an HPC cluster using kubernetes

(2024)

Files

Aji_88991600_2024.pdf
  • Open access
  • Adobe PDF
  • 2.38 MB

Details

Abstract
This master's thesis studies high-availability solutions for the Slurm workload manager in an HPC cluster, using the Kubernetes container orchestration framework. The primary objective is to determine whether Kubernetes can ensure high availability for Slurm. This involves both integrating Slurm into a Kubernetes cluster and assessing Kubernetes' capability to guarantee high availability for it. The study focuses on integrating Slurm's controller daemon (slurmctld) and database daemon (slurmdbd) into the Kubernetes environment, and examines how Kubernetes can provide high availability to the Slurm components running in the cluster. The thesis also covers the choice of a Kubernetes-Slurm cluster implementation and of a high-availability strategy, and presents the results of high-availability tests, including simulations of various failure scenarios, that evaluate Kubernetes' capability to ensure high availability and data persistence for Slurm.