
Web-Hacking Dataset

Web-Hacking Dataset for Cyber Criminal Profiling


As in real-world criminal investigation, cyber criminal profiling is important for attributing cyber attacks. Cyber crimes committed by the same hacker or hacking group share distinctive characteristics, such as the purpose of the attack, the attack methods, and the profile of the targets. A thorough analysis of a hacker’s activities can therefore give investigators strong evidence for attributing attacks and identifying the criminals. To foster further research, we release the web-hacking case dataset we have collected.

1. Dataset

We built a large hacking-case database of 212,093 web-hacking cases from the past 15 years, collected automatically from the source site. On that site, some information is stored in defined formats in a case-centric database; this mostly consists of the date, domain, IP address, system, and web server of each attack. Other information, found in the mirror pages, is stored as HTML source. Because the HTML code carries features such as the encoding, fonts, and other tags, this information is used in the case vector design after the HTML content has been parsed and processed.
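The kind of HTML processing described above can be illustrated with a minimal sketch. It is not the extraction code used to build the dataset: the class name `MirrorPageFeatures`, the function `extract_features`, and the choice of features (declared charset, `<font face>` values, tag counts) are assumptions made for illustration, using only Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser


class MirrorPageFeatures(HTMLParser):
    """Collect simple features from a mirror page's HTML source (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # how often each tag appears
        self.fonts = set()           # values of <font face="...">
        self.charset = None          # declared character encoding, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        self.tag_counts[tag] += 1
        if tag == "font" and attrs.get("face"):
            self.fonts.add(attrs["face"].lower())
        if tag == "meta":
            content = (attrs.get("content") or "").lower()
            if attrs.get("charset"):
                # <meta charset="...">
                self.charset = attrs["charset"].lower()
            elif "charset=" in content:
                # <meta http-equiv="Content-Type" content="...; charset=...">
                self.charset = content.split("charset=")[-1].strip()


def extract_features(html_source):
    """Return a small feature dictionary for one mirror page (hypothetical helper)."""
    parser = MirrorPageFeatures()
    parser.feed(html_source)
    parser.close()
    return {
        "charset": parser.charset,
        "fonts": sorted(parser.fonts),
        "tag_counts": dict(parser.tag_counts),
    }
```

A dictionary like this could then be flattened into tokens or numeric fields for a case vector; which features actually matter for profiling is a design decision this sketch does not settle.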
With this dataset, researchers can perform clustering and in-depth analysis to discover relationships between hackers or hacker groups. In our work, we attempted to analyze the relationship between the DarkSeoul group's attacks and another set of attacks that includes the Sony Pictures Entertainment attack.
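As one possible starting point for such a clustering analysis, the sketch below groups cases whose feature sets overlap strongly. It is an assumption-laden illustration, not the method used in our work: the Jaccard measure, the 0.5 threshold, the single-link grouping, and the feature tokens in the usage example are all hypothetical choices.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of feature tokens."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_cases(cases, threshold=0.5):
    """Group case IDs whose feature sets are similar (single-link, union-find).

    cases: dict mapping a case ID to an iterable of feature tokens.
    Returns a list of sets of case IDs.
    """
    ids = list(cases)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pairwise comparison is quadratic in the number of cases.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(cases[a], cases[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# Hypothetical cases with made-up feature tokens.
example = {
    "c1": {"font:impact", "charset:utf-8", "tag:marquee"},
    "c2": {"font:impact", "charset:utf-8", "tag:embed"},
    "c3": {"font:arial", "charset:windows-1254"},
}
groups = group_cases(example)
```

Because every pair of cases is compared, this approach is practical only on a sample; applying it to all 212,093 cases would need blocking or an approximate similarity index.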

    1-1. Dataset Release

            We are happy to release our dataset for academic purposes. If you have any questions, please contact "Mee Lan Han" or "Huy Kang Kim".

2. Publication
  • Preliminary version (2-page poster)
           Han, M. L., Han, H. C., Kang, A. R., Kwak, B. I., Mohaisen, A., & Kim, H. K. (2016). IEEE Conference on Communications and Network Security, Philadelphia, PA, USA.
3. Contact
  • Mee Lan Han (blosst at or Huy Kang Kim (cenda at