Andro-AutoPsy: Anti-malware system based on similarity matching of malware and malware creator-centric information

1. Introduction

Andro-AutoPsy is an anti-malware system based on similarity matching of malware-centric and malware creator-centric information. Our system classifies malware samples into similar subgroups by exploiting the profiles extracted from integrated footprints, which are implicitly equivalent to distinct behavior characteristics. Andro-AutoPsy is capable of distinguishing benign and malicious applications and classifying malicious applications into similar behavior groups. Furthermore, Andro-AutoPsy is capable of detecting zero-day threats, which are missed by antivirus scanners.
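The idea of grouping samples by profile similarity can be illustrated with a minimal sketch. This is not the scoring method from the paper: the profiles here are plain sets of hypothetical footprint tokens (permission names, a certificate identifier), Jaccard similarity stands in for the actual similarity measure, and the threshold of 0.5 is an arbitrary assumption chosen for the example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(profiles: dict, threshold: float = 0.5) -> list:
    """Single-link grouping: samples whose profiles score >= threshold share a group."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical footprints, invented for illustration only.
profiles = {
    "sample_a": {"SEND_SMS", "READ_CONTACTS", "cert:1234"},
    "sample_b": {"SEND_SMS", "READ_CONTACTS", "cert:1234", "INTERNET"},
    "sample_c": {"CAMERA", "cert:9999"},
}
groups = group_by_similarity(profiles)
print(groups)
```

With these toy profiles, sample_a and sample_b share three of four tokens and fall into one group, while sample_c shares nothing with either and stays alone.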

2. Publication

Jae-wook Jang, Hyunjae Kang, Jiyoung Woo, Aziz Mohaisen, and Huy Kang Kim, “Andro-AutoPsy: Anti-malware system based on similarity matching of malware and malware creator-centric information,” Digital Investigation, vol. 14, pp. 17–35, 2015.

3. Dataset Release

For academic purposes, we are happy to release our dataset. However, to avoid indiscriminate distribution of mobile malware, the dataset is password-protected. Please send us a request for the password from your official email account. If you use our dataset in your experiments, please cite our paper.

Contact : Huy Kang Kim (cenda at

    • Please use the link below to download the dataset.

        • Before downloading it, please read the following instructions carefully.

          • (1) Most of the samples are compressed with 7-Zip.

          • (2) Then send an e-mail to cenda at to get the decompression password. (Please state your name, affiliation, and purpose.)

          • (3) Please use these samples at your own risk.

    • Dataset Download Link: Download
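Once the password has been received, unpacking might look like the sketch below. It assumes the `7z` command-line tool is installed and on the PATH; the archive name, password, and output directory are placeholders, not the real values for this dataset.

```python
import subprocess


def build_7z_command(archive: str, password: str, out_dir: str) -> list:
    """Build the argument list for extracting a password-protected archive.

    7-Zip switches take their value with no space: -p<password>, -o<dir>.
    """
    return ["7z", "x", f"-p{password}", f"-o{out_dir}", archive]


def extract(archive: str, password: str, out_dir: str) -> None:
    """Run the extraction; raises CalledProcessError if 7z reports a failure."""
    subprocess.run(build_7z_command(archive, password, out_dir), check=True)


# Placeholder values for illustration; nothing is executed here.
command = build_7z_command("samples.7z", "PASSWORD_FROM_EMAIL", "extracted")
print(" ".join(command))
```

Because the archives contain live malware samples, `out_dir` is best placed inside an isolated analysis environment.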

4. Acknowledgement

Andro-AutoPsy was developed by the Hacking and Countermeasure Research Lab in the Graduate School of Information Security at Korea University.

Please contact “Huy Kang Kim” (cenda at if you have any questions.

This paper's dataset was used for the AI/ML-based malicious app detection track in the '2017 Information Security R&D dataset challenge' in South Korea.

You can find additional resources and tutorials (written in Korean) in the above URLs.

Textual description of dataset .csv