On the transferability of adversarial examples in Machine Learning: the case of email spam detection

(2021)

Files

Jacquet_05121500_2021.pdf
  • Open access
  • Adobe PDF
  • 1.97 MB

Jacquet_05121500_2021_Appendix.zip
  • Open access
  • Unknown
  • 638.03 KB

Details

Abstract
Nowadays, machine learning techniques are widely used. As they become more widely deployed, the question of security arises: are machine learning techniques robust to attacks from malicious users? It has been shown that they are not. In particular, a number of articles have shown that even state-of-the-art machine learning models are vulnerable to adversarial-example attacks. So-called adversarial examples are inputs modified so as to be misclassified by a model. In image classification, these adversarial examples have also been shown to transfer across models: adversarial examples crafted by attacking a model A often fool another model B. This phenomenon, called transferability, has been less studied in the case of text classification. In this thesis, we study the transferability of adversarial examples for email spam detection. For this purpose, we select six models: (i) multinomial naive Bayes classifier, (ii) logistic regression, (iii) linear support vector machine, (iv) random forest, (v) k-nearest neighbors and (vi) convolutional neural networks. We also focus on two attacks: (i) TextFooler, a black-box attack, and (ii) HotFlip, a white-box attack. As our main result, we show that, for our two chosen attacks, adversarial examples do not transfer when one takes into account natural constraints on the transformations that an email can undergo.
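The transfer measurement described in the abstract can be sketched as follows. This is an illustrative scikit-learn snippet, not the thesis's actual pipeline: the toy corpus is invented, and a hand-made word substitution stands in for a TextFooler-style attack.

```python
# Illustrative sketch of cross-model transferability for spam detection.
# The corpus, the word swap, and all names below are assumptions for the
# sake of the example, not material from the thesis.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

# Tiny toy corpus: spam = 1, ham = 0.
emails = [
    "free money", "win prize", "free prize now",                  # spam
    "meeting schedule", "lunch report", "project meeting notes",  # ham
]
labels = [1, 1, 1, 0, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(emails)

# Surrogate model A (the one attacked) and target model B (transfer victim).
model_a = MultinomialNB().fit(X, labels)
model_b = LogisticRegression().fit(X, labels)

# Hand-crafted stand-in for a TextFooler-style word substitution: the spam
# message "free money" rewritten with innocuous-looking words.
adversarial = "lunch meeting money"
X_adv = vec.transform([adversarial])

fooled_a = model_a.predict(X_adv)[0] == 0   # does the attack fool A?
transfers = model_b.predict(X_adv)[0] == 0  # does it also fool B?
print(f"fools surrogate A: {fooled_a}, transfers to target B: {transfers}")
```

In a full experiment, the same idea is repeated over many adversarial examples and model pairs, and the transfer rate is the fraction of examples that fool the surrogate and also fool the target.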