APIMDS (API-based malware detection system)

Related paper: Ki et al., "A Novel Approach to Detect Malware Based on API Call Sequence Analysis", International Journal of Distributed Sensor Networks, vol. 2015, Article ID 659101, 9 pages, 2015. doi:10.1155/2015/659101


1. Introduction

Various approaches have been proposed for malware detection [1–3]. The earliest detection techniques were based on static analysis, which examines the binary code, analyzes all possible execution paths, and identifies malicious code without executing it [4]. However, analyzing binary code has become increasingly difficult. As obfuscation techniques grow more sophisticated, static analysis can be bypassed through polymorphism, encryption, or packing [4]. In addition, because static analysis relies on a prebuilt signature database, it cannot easily detect new, unknown malware until the signature database is updated [4, 5]. Moreover, some execution paths can only be explored at run time [4, 5]. To overcome these limitations and complement static analysis, dynamic analysis has been proposed and is widely used to achieve more effective malware detection.

Techniques based on dynamic analysis execute malware and trace its behavior. The two major approaches in dynamic analysis are control flow analysis and API call analysis. Both detect malware by analyzing the similarity between the behavior of a new sample and that of known malware. However, malware authors try to circumvent these techniques by inserting meaningless code or shuffling the order of operations in the program.

Many currently available API call analysis techniques fail to detect malware that uses such circumvention. Some techniques focus on extracting the APIs that are frequently observed in the malware of each class [6, 7]. They monitor which APIs are called and calculate the frequency and total number of calls to particular API functions. Although they quickly reveal the characteristics of malware in the same class, they do not capture the sequence of malware behavior and can be easily evaded by malware authors who insert and execute dummy and redundant API calls.
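
To illustrate why frequency-based features are easy to evade, the following is a minimal sketch and not the method of the cited works [6, 7]. The API names, the traces, and the helpers `api_frequency_profile` and `cosine_similarity` are hypothetical and chosen only for illustration. The sketch builds a relative-frequency profile of the API calls in a trace and shows how padding the trace with dummy calls shifts that profile even though the underlying behavior is unchanged.

```python
from collections import Counter
import math


def api_frequency_profile(trace):
    """Return the relative frequency of each API name in a call trace."""
    counts = Counter(trace)
    total = len(trace)
    return {api: n / total for api, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical trace of a known sample (illustrative API names only).
known = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

# Same behavior, but five dummy calls are inserted after every real call.
padded = []
for call in known:
    padded.append(call)
    padded.extend(["GetTickCount"] * 5)

sim_same = cosine_similarity(api_frequency_profile(known), api_frequency_profile(known))
sim_padded = cosine_similarity(api_frequency_profile(known), api_frequency_profile(padded))

print(f"identical trace: {sim_same:.2f}")
print(f"padded trace:    {sim_padded:.2f}")
```

In this toy setting the padded trace scores far lower than the identical trace, although all four original calls are still present and still in the same order.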

Others extract the API call sequence for each class and develop a static signature based on it [8–11]. They are better from a semantic point of view because they monitor the sequence of calls and the flow of the program. However, simply creating signatures from the call sequences frequently found in the malware of each class does not allow them to detect polymorphic or unknown malware. They can also be evaded by tricks such as inserting redundant API calls. This creates the need for new approaches to API call sequence analysis.
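
As a rough illustration of this weakness, the sketch below treats a static signature as a contiguous run of API calls. This is a simplifying assumption and not the signature format of the cited works [8–11]. The helper `contains_signature` and all API names are hypothetical. A single redundant call inserted in the middle of the run is enough to break the match.

```python
def contains_signature(trace, signature):
    """Return True if `signature` occurs as a contiguous run inside `trace`."""
    n, m = len(trace), len(signature)
    return any(trace[i:i + m] == signature for i in range(n - m + 1))


# Hypothetical call-sequence signature (illustrative API names only).
signature = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

# Trace that contains the signature unchanged.
original = ["GetModuleHandleA"] + signature + ["CloseHandle"]

# Same behavior with one redundant call inserted inside the signature.
evasive = [
    "GetModuleHandleA",
    "OpenProcess",
    "VirtualAllocEx",
    "GetTickCount",  # redundant call added to break the contiguous match
    "WriteProcessMemory",
    "CreateRemoteThread",
    "CloseHandle",
]

match_original = contains_signature(original, signature)
match_evasive = contains_signature(evasive, signature)

print("original matches:", match_original)
print("evasive matches: ", match_evasive)
```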

A few recent studies focus on the fact that, unless the main purpose or functions of the malware change, the critical low-level system call sequence does not change. Therefore, instead of extracting the API call sequence for the malware in each class, they propose focusing on the API call sequences for particular malware functions [12–14]. However, this approach has not been studied empirically as thoroughly as the previously proposed API call sequence analysis techniques. In this study, using a large data set, we empirically examine whether this approach yields better results than the previous ones.

In this study, we adopt a sequence alignment algorithm, which is known to perform well at extracting similar subsequences from different sequences. Sequence alignment makes detection less susceptible to meaningless code inserted in malware. Sequence alignment algorithms have been applied in various areas, such as natural language processing and biometrics, and have proven effective [15]. In this paper, we propose a new approach to API call sequence analysis that introduces a sequence alignment algorithm. The rest of the paper is organized as follows. In Section 2, we review the related literature. In Section 3, we present our methodology and experiment. In Section 4, we conclude our research and suggest future research directions.
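
This introduction does not say which alignment algorithm is used, so the sketch below substitutes a longest common subsequence (LCS) computation as a simple stand-in for alignment-style matching. It is an assumption made for illustration, not the paper's implementation. The helpers `lcs_length` and `coverage` and all API names are hypothetical. The point is only that order-preserving, gap-tolerant matching still recovers a call pattern after dummy calls are inserted.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def coverage(pattern, trace):
    """Fraction of `pattern` that can be recovered, in order, from `trace`."""
    return lcs_length(pattern, trace) / len(pattern) if pattern else 0.0


# Hypothetical critical call pattern (illustrative API names only).
pattern = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

# Trace with dummy calls interleaved between the critical calls.
evasive = [
    "GetModuleHandleA", "OpenProcess", "GetTickCount", "VirtualAllocEx",
    "GetTickCount", "WriteProcessMemory", "Sleep", "CreateRemoteThread",
    "CloseHandle",
]

# Unrelated trace that shares none of the critical calls.
benign = ["GetModuleHandleA", "CreateFileW", "ReadFile", "CloseHandle"]

score_evasive = coverage(pattern, evasive)
score_benign = coverage(pattern, benign)

print("evasive trace coverage:", score_evasive)
print("benign trace coverage: ", score_benign)
```

A plain LCS places no limit on gap length, so on very long traces it can match a pattern by chance. A scored alignment with gap penalties would be the natural refinement, but that choice is outside what this section states.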


2. Publication

How to Cite this Article

Youngjoon Ki, Eunjin Kim, and Huy Kang Kim, “A Novel Approach to Detect Malware Based on API Call Sequence Analysis,” International Journal of Distributed Sensor Networks, vol. 2015, Article ID 659101, 9 pages, 2015. doi:10.1155/2015/659101



3. Dataset Release

For academic purposes, we are happy to release our dataset. However, to avoid indiscriminate distribution of malware, the dataset archive is password protected. To obtain the password, please send us a request from your official email account. If you use our dataset in your experiments, please cite our paper.

Dataset Download Link: Download

Contact: Huy Kang Kim (cenda at korea.ac.kr)


4. Acknowledgement

APIMDS was developed by the Hacking and Countermeasure Research Lab in the Graduate School of Information Security at Korea University, Seoul, Korea.

Please contact Huy Kang Kim if you have any questions.


5. References



benign_program_dataset_WinXP_SP3.zip
malware_dataset.zip
md5digest_benign_programs.txt