
Resolution of the big-data problem related to a dimension reduction algorithm based on multi-scale similarities in stochastic neighbor embedding

(2015)

Files

Souris_32970900_2015.pdf
  • Open access
  • Adobe PDF
  • 1.79 MB

Details

Abstract
Data visualization has always been a necessity, which makes dimension reduction an important part of machine learning. One of the best algorithms for data visualization is multi-scale stochastic neighbor embedding (Ms. SNE). However, its time complexity of O(N^2 log N) makes it unsuitable for large databases. To solve this Big Data problem, this work proposes an accelerated version of Ms. SNE. It uses metric trees to approximate the data cloud by clusters, which reduces the time complexity to O(N log^2 N). This is new research and the resulting solution is not yet perfect, but the results show that the approximations added to the original algorithm allow the code to run on larger databases with minimal loss of precision.
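
To give a rough sense of what the two complexity classes quoted in the abstract mean in practice, the following sketch compares their leading terms for a few database sizes. It is only an illustration under stated assumptions: constant factors are ignored, the logarithm base (2 here) is an arbitrary choice, and the function names exact_cost, accelerated_cost and speedup are hypothetical and do not come from the thesis. Under these assumptions the ratio of the two costs simplifies to N / log N.

```python
import math


def exact_cost(n: int) -> float:
    """Leading term of the original Ms. SNE cost, N^2 * log(N) (constants ignored)."""
    return n * n * math.log2(n)


def accelerated_cost(n: int) -> float:
    """Leading term of the accelerated cost, N * log(N)^2 (constants ignored)."""
    return n * math.log2(n) ** 2


def speedup(n: int) -> float:
    """Ratio of the two leading terms; algebraically equal to N / log2(N)."""
    return exact_cost(n) / accelerated_cost(n)


if __name__ == "__main__":
    # Assumed example sizes, chosen only for illustration.
    for n in (1_000, 100_000, 10_000_000):
        print(f"N = {n:>10,d}   idealized speedup ~ {speedup(n):,.0f}x")
```

The ratio grows almost linearly with N, which is consistent with the abstract's point that the accelerated version matters most for large databases. Real running times also depend on constant factors and on the quality of the cluster approximation, which this sketch does not model.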