Files
- Ballarini_33581100_2015.pdf (Open access, Adobe PDF, 982.42 KB)
- Ballarini_33581100_2015_Annexe1.pdf (Open access, Adobe PDF, 106.33 KB)
- Ballarini_33581100_2015_Annexe2.pdf (Open access, Adobe PDF, 103.51 KB)
- Ballarini_33581100_2015_Annexe3.pdf (Open access, Adobe PDF, 272.59 KB)
- Ballarini_33581100_2015_Annexe4.tar.gz (Open access, unknown format, 24.85 KB)
Details
Abstract
In machine learning classification, searching for informative interactions in large, high-dimensional datasets is computationally intensive. Most algorithms that attempt this start with an empty set of variables and greedily add variables to it. The drawback is that such approaches tend to miss some informative interactions because of their greedy behaviour. The brute-force approach, on the other hand, does not exhibit this greedy behaviour but has a computational cost so high that even problems with a moderate number of variables become infeasible. In 2014, Rajen Dinesh Shah and Nicolai Meinshausen published an article on an alternative approach called Random Intersection Trees. This approach starts from the full set of variables and prunes it by taking intersections with randomly chosen instances. The algorithm has a lower computational cost than the brute-force method while avoiding the drawbacks of greedy behaviour. As described in the article, however, it remains limited to datasets with binary variables. In this thesis, we will propose modifications to the algorithm that generalise it to problems with both continuous and categorical variables. We will also propose classification rules, inspired by Random Ferns and Naive Bayes, that make use of the interactions found by the Random Intersection Trees algorithm. Each variant proposed in this thesis will be analysed from both a classification and a feature selection perspective. The focus will be on how these algorithms perform on high-dimensional (genomic) datasets and why some of them perform better or worse than others.
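The pruning step of Random Intersection Trees is easy to sketch for the binary case covered by the original article. The snippet below is a minimal, simplified illustration rather than the thesis's implementation: it assumes binary data encoded as sets of active variable indices, fixes the branching factor instead of drawing it at random as Shah and Meinshausen do, and uses hypothetical names such as `random_intersection_tree`.

```python
import random


def random_intersection_tree(instances, depth=4, branching=2):
    """Grow one Random Intersection Tree over binary data (simplified sketch).

    `instances` is a list of sets, each holding the indices of the
    variables that are active (equal to 1) in one class-1 observation.
    The root holds the full active set of a random instance; every
    child node intersects its parent's set with another randomly
    chosen instance, and the non-empty sets surviving at the leaves
    are returned as candidate interactions.
    """
    candidates = set()

    def grow(current, level):
        if not current:          # the interaction died out on this branch
            return
        if level == depth:       # leaf reached: keep the surviving set
            candidates.add(frozenset(current))
            return
        for _ in range(branching):
            other = random.choice(instances)
            grow(current & other, level + 1)

    grow(set(random.choice(instances)), 0)
    return candidates


# Toy example: variables 0 and 3 co-occur in every class-1 observation,
# so {0, 3} should appear among the returned candidates most of the time.
if __name__ == "__main__":
    data = [{0, 3, 5}, {0, 3, 7}, {0, 1, 3}, {0, 2, 3, 9}]
    print(random_intersection_tree(data))
```

In practice many such trees are grown and the candidate sets are aggregated and filtered by prevalence; the thesis's generalisation to continuous and categorical variables replaces the plain set intersection with a step suited to those variable types.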