Unsupervised clustering machine learning on packed executable

Wathelet, Jolan

Files

WATHELET_94351800_2022.pdf

Open access
Adobe PDF
1016.52 KB


Details

Supervisors: Legay, Axel
Faculty: Ecole polytechnique de Louvain
Degree label: Master [120] in Cybersecurity, professional focus: systems design and analysis
Abstract: In today’s threat landscape, malware creators increasingly use packers to hinder static analysis. Analysing packed samples with a specialised tool for each packer family is becoming essential, and applying the right unpacking tool requires classifying packed executables correctly. While supervised machine learning achieves good results on already known packers, it struggles with new ones. In this report, we test two new features and two unsupervised clustering algorithms, DBSCAN and OPTICS. We start by creating a dataset of packed executables with its ground truth. Building on previous work, we extract fifty-six features and determine which combination gives the best classification results. Using this result, we test our two new features. The first is the semantic sequence, which represents the semantics of the first 100 mnemonics executed by each sample. The semantics of a mnemonic is one of four categories: Data movement, Control, Arithmetic/logic and Other. We also experiment with the length of these mnemonic and semantic sequences. The second feature looks for hidden imports of 16 malicious API functions. Finally, we compare OPTICS with the results from DBSCAN. In our tests, neither of the new features improved our classification results, and OPTICS gave worse results than DBSCAN while being much slower.
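The semantic sequence described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's mnemonic-to-category table, so the three sets below, the single-letter codes and the function names are all assumptions made for illustration only. The sketch shows the general idea: map each executed mnemonic to one of the four categories and keep the first 100.

```python
from itertools import islice
from typing import Iterable, List

# Hypothetical category tables: the thesis's real mapping is not given in the
# abstract, so these sets are only illustrative.
DATA_MOVEMENT = {"mov", "push", "pop", "lea", "xchg", "movzx", "movsx"}
CONTROL = {"jmp", "call", "ret", "je", "jne", "jz", "jnz", "loop"}
ARITHMETIC_LOGIC = {"add", "sub", "inc", "dec", "mul", "imul", "div",
                    "xor", "and", "or", "not", "shl", "shr", "cmp", "test"}


def semantic_category(mnemonic: str) -> str:
    """Return a one-letter code (assumed encoding) for a mnemonic's category."""
    m = mnemonic.strip().lower()
    if m in DATA_MOVEMENT:
        return "D"  # Data movement
    if m in CONTROL:
        return "C"  # Control
    if m in ARITHMETIC_LOGIC:
        return "A"  # Arithmetic/logic
    return "O"      # Other: anything not listed above


def semantic_sequence(mnemonics: Iterable[str], length: int = 100) -> List[str]:
    """Map the first `length` mnemonics to their category codes."""
    return [semantic_category(m) for m in islice(mnemonics, length)]


if __name__ == "__main__":
    # A made-up trace, not taken from any real sample.
    trace = ["PUSH", "mov", "xor", "jmp", "nop"]
    print(semantic_sequence(trace))
```

The `length` parameter reflects the abstract's remark that the sequence length was also varied; 100 is the only value the abstract states.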
