Review with Security Concern Dataset

1. Introduction

Giving potential users pertinent information about security concerns drawn from user feedback is essential to raising the security level of the entire mobile ecosystem. According to a usable security survey, star ratings and reviews can inform users' download and permission decisions. However, many apps with high star ratings still have security problems. To confirm this problem, we built a dataset by collecting 56,439,878 star ratings and reviews for 8,999 popular game apps on the Google Play Store. This study proposes a model that uses an active learning method to distinguish Reviews with Security Concern (RSC) from other user reviews.


2. Publication


3. Dataset

This dataset consists of reviews of popular Android game apps that have each been downloaded more than 10,000,000 times from Google Play.


3-1. Data attributes

            • all_reviews

              • All collected reviews (raw version)

              • Covers a total of 8,999 apps

            • train_reviews

              • Review data for training

              • Consists of a total of 6,502 reviews labeled according to the three security concern types in Section 3-2.


3-2. Security Concern type

  1. Data Leakage

  2. Excessive Advertising

  3. Inexpedient Payment
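
As a rough illustration of how the three types above might be handled in code, the sketch below maps them to an enumeration and counts labeled reviews per type. The integer codes, the (review_text, label_code) row shape, and the names SecurityConcern and count_by_concern are assumptions made for this example; they are not taken from the released files, whose actual format may differ.

```python
from collections import Counter
from enum import Enum


class SecurityConcern(Enum):
    """The three concern types listed above; the integer codes are assumed."""
    DATA_LEAKAGE = 1
    EXCESSIVE_ADVERTISING = 2
    INEXPEDIENT_PAYMENT = 3


def count_by_concern(rows):
    """Count labeled reviews per concern type.

    `rows` is an iterable of (review_text, label_code) pairs. This shape
    is hypothetical and may not match the released files.
    """
    counts = Counter(SecurityConcern(code) for _, code in rows)
    # Report every type, including those with zero reviews.
    return {concern.name: counts.get(concern, 0) for concern in SecurityConcern}


# Made-up placeholder rows, not taken from the dataset.
sample_rows = [
    ("placeholder review A", 1),
    ("placeholder review B", 2),
    ("placeholder review C", 2),
]
print(count_by_concern(sample_rows))
```

A per-type count like this is a quick way to check class balance in train_reviews before training a classifier.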



4. Download

We share this data for academic purposes. The data was collected from the publicly available Google Play Store, and the dataset was labeled manually. If you have any feedback regarding the dataset, please contact us. If you use our dataset in your experiments, please cite our paper.


Dataset Download Link:


5. Contact

Sangho Lee (lee35@korea.ac.kr), Huy Kang Kim (cenda@korea.ac.kr)