Unsupervised clustering machine learning on packed executable

(2022)

Files

WATHELET_94351800_2022.pdf
  • Open access
  • Adobe PDF
  • 1016.52 KB

Details

Supervisors
Faculty
Degree label
Abstract
In today’s threat landscape, malware authors increasingly use packers to hinder static analysis. Analysing packed samples with a specialised tool for each packer family is becoming essential, so packed executables must be classified correctly in order to apply the right unpacking tool. While supervised machine learning achieves good results on already known packers, it struggles with new ones. In this report, we test two new features and two unsupervised clustering algorithms, DBSCAN and OPTICS. We start by creating a dataset of packed executables together with its ground truth. Building on previous work, we extract fifty-six features and determine which combination gives the best classification results. Using this result, we test our two new features. The first is the semantic sequence, which represents the semantics of the first 100 mnemonics executed by each sample. The semantics of a mnemonic is one of four categories: Data movement, Control, Arithmetic/logic and Other. We also experiment with the length of these mnemonic and semantic sequences. The second feature looks for hidden imports of 16 malicious API functions. Finally, we compare OPTICS with the results from DBSCAN. In our tests, neither of the new features improved our classification results, and OPTICS gave worse results than DBSCAN while being much slower.
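
The semantic sequence described in the abstract can be illustrated with a minimal sketch. The mnemonic-to-category table, the single-letter codes and the function names below are assumptions made for illustration only; the abstract does not give the actual mapping used in the thesis, and it does not say how the executed mnemonics are collected.

```python
# Hypothetical mnemonic-to-category tables (illustrative, not the thesis's mapping).
DATA_MOVEMENT = {"mov", "push", "pop", "lea", "xchg", "movzx", "movsx"}
CONTROL = {"jmp", "call", "ret", "je", "jne", "jz", "jnz", "loop"}
ARITHMETIC_LOGIC = {
    "add", "sub", "mul", "imul", "div", "inc", "dec",
    "and", "or", "xor", "not", "shl", "shr", "cmp", "test",
}


def semantic_category(mnemonic):
    """Map one mnemonic to a one-letter code for its semantic category."""
    m = mnemonic.lower()
    if m in DATA_MOVEMENT:
        return "D"  # Data movement
    if m in CONTROL:
        return "C"  # Control
    if m in ARITHMETIC_LOGIC:
        return "A"  # Arithmetic/logic
    return "O"      # Other (anything not listed above)


def semantic_sequence(mnemonics, length=100):
    """Encode the first `length` mnemonics as a string of category codes."""
    return "".join(semantic_category(m) for m in mnemonics[:length])


if __name__ == "__main__":
    # Made-up trace of executed mnemonics, for demonstration only.
    trace = ["push", "mov", "call", "xor", "nop", "jmp"]
    print(semantic_sequence(trace))  # prints "DDCAOC"
```

The `length` parameter corresponds to the sequence length the abstract says was varied; 100 is the value it mentions for the first executed mnemonics.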