HCRL - APIMDS-dataset

APIMDS (API-based malware detection system)

Related paper: Ki et al., "A Novel Approach to Detect Malware Based on API Call Sequence Analysis", International Journal of Distributed Sensor Networks, vol. 2015, Article ID 659101, 9 pages, 2015. doi:10.1155/2015/659101

1. Introduction

Various approaches have been proposed for malware detection [1–3]. Earlier detection techniques were based on static analysis, which examines the binary code, analyzes all possible execution paths, and identifies malicious code without executing it [4]. However, analyzing binary code has become difficult. As obfuscation techniques grow more sophisticated, static analysis can be bypassed through polymorphism, encryption, or packing [4]. In addition, because static analysis relies on a prebuilt signature database, it cannot easily detect new, unknown malware until the signature database is updated [4, 5]. Moreover, some execution paths can be explored only at run time [4, 5]. To overcome these limitations and complement static analysis, dynamic analysis has been proposed and is now widely used to achieve more effective malware detection.

Techniques based on dynamic analysis execute malware and trace its behavior. The two major approaches in dynamic analysis are control flow analysis and API call analysis. Both detect malware by measuring the similarity between the behavior of a new sample and that of known malware. However, malware authors try to circumvent these techniques by inserting meaningless code or shuffling the order of program operations.

Many currently available API call analysis techniques fail to detect malware that uses such circumvention. Some techniques focus on extracting the APIs that are frequently observed in each malware class [6, 7]. They monitor which APIs are called and calculate how often, and how many times in total, each API function is called. Although they quickly reveal the characteristics shared by malware in the same class, they do not capture the sequence of malware behavior and can easily be evaded by malware authors who insert and execute dummy or redundant API calls.
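The weakness of frequency-based features can be illustrated with a minimal Python sketch. It is not taken from the cited work: the function names, the cosine similarity measure, and the API traces are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def api_frequency_profile(calls):
    """Return the relative frequency of each API name in a call trace."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles (dicts)."""
    dot = sum(p[name] * q.get(name, 0.0) for name in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical trace of a known sample (API names are illustrative only).
known = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW"]

# Same calls in reverse order: the behavior differs, the profile does not.
reordered = list(reversed(known))

# Same behavior padded with dummy calls: the profile shifts substantially.
padded = known + ["GetTickCount"] * 12

same_profile = cosine_similarity(api_frequency_profile(known),
                                 api_frequency_profile(reordered))
padded_score = cosine_similarity(api_frequency_profile(known),
                                 api_frequency_profile(padded))
print(same_profile, padded_score)
```

In this sketch the reordered trace is indistinguishable from the original, while the padded trace scores far lower even though its meaningful calls are unchanged.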

Others extract an API call sequence for each class and develop a static signature based on it [8–11]. These are better from a semantic point of view because they monitor the sequence of calls and the flow of the program. However, simply creating signatures from the call sequences frequently found in each malware class does not allow them to detect polymorphic or unknown malware. They can also be evaded by tricks such as inserting redundant API calls. This creates a need for new approaches to API call sequence analysis.

A few recent studies build on the observation that, as long as the main purpose or functions of the malware do not change, the critical low-level system call sequence does not change either. Therefore, instead of extracting an API call sequence for each malware class, they propose focusing on the API call sequences for particular malware functions [12–14]. However, this approach has not been well studied empirically in comparison with previously proposed API call sequence analysis techniques. In this study, using a large data set, we empirically examine whether this approach produces better results than the previous ones.

In this study, we adopt a sequence alignment algorithm, which is known to perform well at extracting similar subsequences from different sequences. Sequence alignment makes detection less susceptible to meaningless code inserted into malware. Sequence alignment algorithms have been applied in various areas such as natural language processing and biometrics and have proven effective [15]. In this paper, we propose a new approach to API call sequence analysis that introduces a sequence alignment algorithm. The rest of the paper is organized as follows. In Section 2, we review the related literature. In Section 3, we present our methodology and experiment. In Section 4, we conclude our research and suggest future research directions.
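To give a feel for why alignment tolerates inserted calls, here is a minimal Python sketch based on the longest common subsequence (LCS), one simple form of pairwise alignment. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the normalization by the shorter sequence, and the API traces are all hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two call sequences.

    Classic dynamic programming, keeping only the previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length normalized by the shorter sequence (an assumed scoring choice)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


# Hypothetical traces; the API names are illustrative only.
known = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW"]
obfuscated = ["CreateFileW", "GetTickCount", "WriteFile", "GetTickCount",
              "Sleep", "RegSetValueExW", "CreateProcessW", "GetTickCount"]
unrelated = ["GetSystemTime", "MessageBoxW", "ExitProcess"]

print(lcs_similarity(known, obfuscated))  # inserted calls are skipped over
print(lcs_similarity(known, unrelated))   # no calls in common
```

Because the subsequence need not be contiguous, the dummy calls in the obfuscated trace do not reduce the score. Normalizing by the shorter sequence means very short traces can score high by chance, so a real system would likely need a minimum length or a different normalization.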

2. Publication

How to Cite this Article

Youngjoon Ki, Eunjin Kim, and Huy Kang Kim, “A Novel Approach to Detect Malware Based on API Call Sequence Analysis,” International Journal of Distributed Sensor Networks, vol. 2015, Article ID 659101, 9 pages, 2015. doi:10.1155/2015/659101

Download full paper
Download citation as EndNote

3. Dataset Release

For academic purposes, we are happy to release our dataset. However, to avoid indiscriminate distribution of malware, you need a password to unzip the dataset. Please send us a request from your official email account. If you use our dataset in your experiments, please cite our paper.

We do not provide the malware files themselves; instead, we provide the full list of API sequences and hash information. You can download the original malware files from VirusTotal or malwares.com by using the provided hash information. In addition, there are many crawlers for downloading malware (e.g., https://github.com/Xen0ph0n/VirusTotal_API_Tool/).
The dataset does not include API sequences for benign files, but you can easily generate API call sequences for benign files by using the NtTrace utility (https://rogerorr.github.io/NtTrace/).
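When fetching original samples by hash, it can be useful to confirm that a downloaded file really matches the hash listed in the dataset. The following Python sketch shows one way to do that. The hash algorithm used in the dataset is not stated here, so the sketch infers it from the length of the hex digest (32 characters for MD5, 40 for SHA-1, 64 for SHA-256); the function name and this inference are assumptions, not part of the dataset documentation.

```python
import hashlib

# Hex digest length -> hashlib algorithm name (assumed mapping for this sketch).
_ALGO_BY_HEX_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def file_matches_hash(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` hashes to `expected_hex`.

    The algorithm is guessed from the digest length; the file is read in
    chunks so that large samples do not need to fit in memory.
    """
    expected_hex = expected_hex.strip().lower()
    algo = _ALGO_BY_HEX_LENGTH.get(len(expected_hex))
    if algo is None:
        raise ValueError("unrecognized hash length: %d" % len(expected_hex))
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Samples should only be handled inside an isolated analysis environment; this check reads the file as bytes and never executes it.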

└ Dataset Download Link: Download

Contact: Huy Kang Kim (cenda at korea.ac.kr)

4. Acknowledgement

APIMDS was developed by the Hacking and Countermeasure Research Lab in the Graduate School of Information Security at Korea University, Seoul, Korea.

Please contact Huy Kang Kim if you have any questions.

5. References

[1] N. Idika and A. P. Mathur, A survey of malware detection techniques [Predoctoral Fellowship, and Purdue Doctoral Fellowship], Purdue University, 2007.
[2] P. Vinod, R. Jaipur, V. Laxmi, and M. S. Gaur, “Survey on malware detection methods,” in Proceedings of the 3rd Hackers' Workshop on Computer and Internet Security (IITKHACK '09), 2009.
[3] S. Cesare and Y. Xiang, Software Similarity and Classification, Springer Science & Business Media, 2012.
[4] P. Okane, S. Sezer, and K. McLaughlin, “Obfuscation: the hidden malware,” IEEE Security & Privacy, vol. 9, no. 5, pp. 41–47, 2011.
[5] A. Moser, C. Kruegel, and E. Kirda, “Limits of static analysis for malware detection,” in Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC '07), pp. 421–430, December 2007.
[6] V. S. Sathyanarayan, P. Kohli, and B. Bruhadeshwar, “Signature generation and detection of malware families,” in Information Security and Privacy, Springer, Berlin, Germany, 2008.
[7] R. Tian, M. R. Islam, L. Batten, and S. Versteeg, “Differentiating malware from cleanware using behavioural analysis,” in Proceedings of the 5th International Conference on Malicious and Unwanted Software (MALWARE '10), pp. 23–30, Nancy, France, October 2010.
[8] M. Shankarapani, K. Kancherla, S. Ramammoorthy, R. Movva, and S. Mukkamala, “Kernel machines for malware classification and similarity analysis,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–6, July 2010.
[9] M. K. Shankarapani, S. Ramamoorthy, R. S. Movva, and S. Mukkamala, “Malware detection using assembly and API call sequences,” Journal in Computer Virology, vol. 7, no. 2, pp. 107–119, 2011.
[10] A. Sami, B. Yadegari, H. Rahimi, N. Peiravian, S. Hashemi, and A. Hamze, “Malware detection based on mining API calls,” in Proceedings of the 25th Annual ACM Symposium on Applied Computing (SAC '10), pp. 1020–1025, ACM, March 2010.
[11] Y. Ye, D. Wang, T. Li, and D. Ye, “IMDS: intelligent malware detection system,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1043–1047, ACM, August 2007.
[12] S. Peisert, M. Bishop, S. Karin, and K. Marzullo, “Analysis of computer intrusions using sequences of function calls,” IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 2, pp. 137–150, 2007.
[13] J. Bergeron, M. Debbabi, J. Desharnais, M. M. Erhioui, Y. Lavoie, and N. Tawbi, “Static detection of malicious code in executable programs,” in Proceedings of the Symposium on Requirements Engineering for Information Security (SREIS '01), 2001.
[14] H.-M. Sun, Y.-H. Lin, and M.-F. Wu, “API monitoring system for defeating worms and exploits in MS-Windows system,” in Information Security and Privacy, vol. 4058 of Lecture Notes in Computer Science, pp. 159–170, Springer, Berlin, Germany, 2006.
[15] Sequence Alignment, http://en.wikipedia.org/wiki/Sequence_alignment.