Rareness quantification of groups of cells and application to large-scale single-cell (bi)clustering
Files
Dubuisson_63021700_2024.pdf
Details
- Abstract
- Identifying rare subpopulations of cells is critical in medicine. It enhances our understanding of diseases, enables more accurate diagnostics, and facilitates earlier detection of illnesses. Over the past decade, single-cell RNA sequencing (scRNA-seq) techniques have emerged. They measure the gene expression profile of individual cells, information that was previously observable only for bulk samples of cells. Researchers have used scRNA-seq data to develop algorithms that assess the rareness of cells by comparing their gene expression with that of a given population of cells. Finder of rare entities (FiRE) is a method that assigns a rareness score to each cell by comparing its gene expression against the expression profile of the rest of the population. MicroCellClust 2 (MCC2) is a beam search-based algorithm that returns a small subpopulation of cells that express highly specific genes. MCC2 uses the results of FiRE to prune the cells it considers, which makes the search more efficient. In this thesis, we propose FiRE-n, an algorithm that extends the FiRE methodology by enabling it to assign a rareness score to a group of 1 to n cells. This score is based both on the rareness of the cells forming the group and on their relative homogeneity. The results show that FiRE-n can correctly identify homogeneous subpopulations of rare cells. Following the development of FiRE-n, we introduce MCC2*, a version of the MCC2 algorithm that uses FiRE-n to prune the groups at each level of the beam search. MCC2* returns slightly better solutions than MCC2 with a runtime reduced by 20% to 25%. Finally, we propose a novel method that retrieves homogeneous groups made of rare cells. This method is called "retriever of critical clusters" (ReCC). Results show that ReCC effectively returns rare and homogeneous clusters.
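To make the idea of a group-level rareness score more concrete, here is a minimal Python sketch. The abstract does not give the FiRE-n formula, so everything below is an assumption made for illustration: the function name `group_rareness`, the use of precomputed per-cell rareness scores, and the choice to multiply the mean rareness by a homogeneity term derived from the mean pairwise Euclidean distance. It is not the scoring used in the thesis; it only shows how a score can reward groups whose cells are both individually rare and similar to one another.

```python
import numpy as np

def group_rareness(expr, rareness, group):
    """Toy group score (hypothetical, NOT the FiRE-n formula).

    expr     : (cells x genes) expression matrix
    rareness : per-cell rareness scores, assumed to be precomputed
    group    : indices of the cells forming the group
    """
    expr = np.asarray(expr, dtype=float)
    rareness = np.asarray(rareness, dtype=float)
    idx = np.asarray(group)

    # Rareness term: average of the members' individual scores.
    mean_rareness = rareness[idx].mean()

    # Homogeneity term: 1 for a single cell, otherwise it shrinks
    # as the mean pairwise distance between members grows.
    if len(idx) == 1:
        homogeneity = 1.0
    else:
        sub = expr[idx]
        diffs = sub[:, None, :] - sub[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        upper = np.triu_indices(len(idx), k=1)  # each pair counted once
        homogeneity = 1.0 / (1.0 + dists[upper].mean())

    return mean_rareness * homogeneity

# Made-up data: cells 0 and 1 are close together, cell 2 is far from both.
expr = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [10.0, 0.0]])
rareness = np.array([0.9, 0.8, 0.9, 0.8])

tight = group_rareness(expr, rareness, [0, 1])
spread = group_rareness(expr, rareness, [0, 2])
single = group_rareness(expr, rareness, [0])
```

With this toy data the tight pair scores higher than the spread pair even though the spread pair has the higher mean rareness, which is the qualitative behaviour the abstract attributes to a score combining rareness and homogeneity.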