Files
- Ballarini_33581100_2015.pdf (Open access, Adobe PDF, 982.42 KB)
- Ballarini_33581100_2015_Annexe1.pdf (Open access, Adobe PDF, 106.33 KB)
- Ballarini_33581100_2015_Annexe2.pdf (Open access, Adobe PDF, 103.51 KB)
- Ballarini_33581100_2015_Annexe3.pdf (Open access, Adobe PDF, 272.59 KB)
- Ballarini_33581100_2015_Annexe4.tar.gz (Open access, unknown format, 24.85 KB)
Details
Abstract
In machine learning classification, searching for informative interactions in large, high-dimensional datasets is computationally intensive. Most algorithms that attempt this start with an empty set of variables and greedily add variables to it. The drawback is that such approaches tend to miss some informative interactions because of their greedy behaviour. The brute-force approach, on the other hand, does not exhibit this greedy behaviour but has a computational cost so high that even problems with a moderate number of variables become infeasible. In 2014, Rajen Dinesh Shah and Nicolai Meinshausen published an article on an alternative approach called Random Intersection Trees. This approach starts from the full set of variables and prunes it by taking intersections with randomly chosen instances. The algorithm has a lower computational cost than the brute-force method while avoiding the drawbacks of greedy behaviour. As described in the article, however, it remains limited to datasets with binary variables. In this thesis, we will propose modifications to the algorithm that generalise it to problems with both continuous and categorical variables. We will also propose classification rules, inspired by Random Ferns and Naive Bayes, that make use of the interactions found by the Random Intersection Trees algorithm. Each variant proposed in this thesis will be analysed from both a classification and a feature selection perspective. The focus will be on how these algorithms perform on high-dimensional (genomic) datasets and why some of them perform better or worse than others.
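The pruning step of Random Intersection Trees is easy to sketch for the binary case covered by the original article. The snippet below is a minimal, simplified illustration rather than the thesis's implementation: it assumes binary data encoded as sets of active variable indices, fixes the branching factor instead of drawing it at random as Shah and Meinshausen do, and uses hypothetical names such as `random_intersection_tree`.

```python
import random


def random_intersection_tree(instances, depth=4, branching=2):
    """Grow one Random Intersection Tree over binary data (simplified sketch).

    `instances` is a list of sets, each holding the indices of the
    variables that are active (equal to 1) in one class-1 observation.
    The root holds the full active set of a random instance; every
    child node intersects its parent's set with another randomly
    chosen instance, and the non-empty sets surviving at the leaves
    are returned as candidate interactions.
    """
    candidates = set()

    def grow(current, level):
        if not current:          # the interaction died out on this branch
            return
        if level == depth:       # leaf reached: keep the surviving set
            candidates.add(frozenset(current))
            return
        for _ in range(branching):
            other = random.choice(instances)
            grow(current & other, level + 1)

    grow(set(random.choice(instances)), 0)
    return candidates


# Toy example: variables 0 and 3 co-occur in every class-1 observation,
# so {0, 3} should appear among the returned candidates most of the time.
if __name__ == "__main__":
    data = [{0, 3, 5}, {0, 3, 7}, {0, 1, 3}, {0, 2, 3, 9}]
    print(random_intersection_tree(data))
```

In practice many such trees are grown and the candidate sets are aggregated and filtered by prevalence; the thesis's generalisation to continuous and categorical variables replaces the plain set intersection with a step suited to those variable types.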