Survival Analysis Dataset for automobile IDS

Anomaly intrusion detection method for vehicular networks based on survival analysis


In recent years, along with the convergence of in-vehicle network (IVN) and wireless communication technology, vehicle communication technology has been steadily progressing. Furthermore, communication with various external networks (such as cloud, vehicle-to-vehicle (V2V), and vehicle-to-infrastructure (V2I) communication networks) further reinforces the connectivity between the inside and outside of a vehicle. However, this also means that the functions of existing vehicles using computer-assisted mechanical mechanisms can be manipulated and controlled by a malicious packet attack. Therefore, diversified and advanced architectures of vehicle systems can significantly increase the accessibility of the system to hackers and the possibility of an attack. This paper proposes an intrusion detection method for vehicular networks based on the survival analysis model. Our main aims were to identify malicious CAN messages and accurately detect the normality and abnormality of a vehicle network without semantic knowledge of the CAN ID function. To this end, normal and abnormal driving data were extracted from three different types of vehicles. We evaluated the performance of our proposed method by measuring the accuracy and the time complexity of anomaly detection, considering three attack scenarios and the periodic characteristics of CAN IDs. Based on the results, we concluded that a CAN ID with a long cycle affects the detection accuracy and that the number of CAN IDs affects the detection speed. The difference in detection accuracy between applying all CAN IDs and applying only CAN IDs with a short cycle is not considerable, although some differences were observed depending on the chunk size and the specific attack type. High detection accuracy and low computational cost will be essential factors for real-time processing in IVN security.
Taken together, the results of the present study contribute to the current understanding of how to correctly manage vehicle communications for vehicle security and driver safety.


1        Dataset

In the present study, we focused on the following three attack scenarios that can immediately and severely impair in-vehicle functions or deepen the intensity of an attack and the degree of damage: Flooding, Fuzzy, and Malfunction. To substantiate the three attack scenarios, two different datasets were produced. One of the datasets contained normal driving data without an attack. The other dataset included the abnormal driving data that occurred when an attack was performed. In particular, we generated attack data in which attack packets were injected for five seconds every 20 seconds for the three attack scenarios. The following figure shows the three typical attack scenarios against an In-vehicle network (IVN).
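The injection schedule described above can be sketched as a simple periodic check. This is only an illustration under stated assumptions: it reads "five seconds every 20 seconds" as a 20-second period that opens with a 5-second burst, and the function name, the `offset` parameter, and the choice of burst position within the period are hypothetical, not taken from the dataset.

```python
def in_attack_window(t, period=20.0, burst=5.0, offset=0.0):
    """Return True if time t (seconds since capture start) falls in an injection burst.

    Assumption: each period of `period` seconds begins with `burst` seconds of
    injection, and the first burst starts at `offset` seconds.
    """
    return ((t - offset) % period) < burst


# Fraction of the capture that contains injected traffic under this reading.
attack_fraction = 5.0 / 20.0
```

Under this reading, a quarter of each abnormal capture contains injected packets, with the remaining time carrying only normal traffic.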

The flooding attack allows an ECU node to occupy many of the resources allocated to the CAN bus by maintaining a dominant status on the CAN bus. This attack can limit the communications among ECU nodes and disrupt normal driving. We conducted the flooding attack by injecting a large number of messages with the CAN ID set to 0x000 into the vehicle networks.
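Why an ID of 0x000 is effective follows from CAN arbitration: a 0 bit is dominant on the bus, so when several frames compete, the numerically lowest ID wins. The toy model below illustrates this. It is a simplified sketch, not the injection tool used for the dataset; the legitimate IDs, the one-frame-per-slot scheduling, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle


def arbitration_winner(pending_ids):
    """Return the ID that wins arbitration: the lowest value, since 0 bits are dominant."""
    return min(pending_ids)


def simulate_bus(slots, legit_ids, flood_id=None):
    """Count which ID obtains each bus slot.

    Toy assumption: legitimate nodes take turns offering one frame per slot,
    while a flooding node (if any) offers a frame in every slot.
    """
    wins = Counter()
    turn = cycle(legit_ids)
    for _ in range(slots):
        pending = [next(turn)]
        if flood_id is not None:
            pending.append(flood_id)
        wins[arbitration_winner(pending)] += 1
    return wins


# Hypothetical legitimate IDs, chosen only for the example.
normal = simulate_bus(6, [0x100, 0x200, 0x300])
flooded = simulate_bus(6, [0x100, 0x200, 0x300], flood_id=0x000)
```

In this model the legitimate IDs share the slots evenly when no attacker is present, and lose every slot once frames with ID 0x000 are injected continuously.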

In the case of the fuzzy attack, the attacker performs indiscriminate attacks by iteratively injecting random CAN packets. For the fuzzy attack, we generated random numbers with the “randint” function, which returns a random integer within a specified range. Messages were sent to the vehicle once every 0.0003 seconds. This process was conducted for both the ID field and the Data field. The randomly generated CAN IDs ranged from 0x000 to 0x7FF and included both CAN IDs originally extracted from the vehicle and CAN IDs that were not.
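The generation step can be sketched with Python's standard `random.randint`, which returns an integer with both endpoints included. This is a minimal illustration rather than the original attack script: the function name `fuzzy_frames`, the fixed 8-byte payload length, and the computed timestamps are assumptions, and nothing is actually sent to a bus.

```python
import random

SEND_INTERVAL_S = 0.0003  # one message every 0.0003 seconds, as stated above


def fuzzy_frames(count, rng, interval=SEND_INTERVAL_S):
    """Yield (timestamp, can_id, data) tuples with a random ID and random data bytes.

    Assumption: every frame carries 8 data bytes.
    """
    for i in range(count):
        can_id = rng.randint(0x000, 0x7FF)                  # 11-bit ID range, inclusive
        data = [rng.randint(0x00, 0xFF) for _ in range(8)]  # random Data field
        yield i * interval, can_id, data


# A seeded generator keeps the example reproducible.
for timestamp, can_id, data in fuzzy_frames(3, random.Random(0)):
    print(f"{timestamp:.4f} {can_id:03X} {' '.join(f'{b:02X}' for b in data)}")
```

Because the IDs are drawn uniformly from the whole 11-bit range, most generated IDs will not correspond to any ID the vehicle normally uses, which matches the description above.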

The malfunction attack targets a selected CAN ID from among the extractable CAN IDs of a certain vehicle. As CAN IDs for the malfunction attack, we chose 0x316, 0x153 and 0x18E from the HYUNDAI YF Sonata, KIA Soul, and CHEVROLET Spark vehicles, respectively. For a malfunction attack, the manipulation of the data field has to be simultaneously accompanied by the injection attack of randomly selected CAN IDs. When the values in the data field consisting of 8 bytes were manipulated using 00 or a random value, the vehicles reacted abnormally.
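The data-field manipulation can be illustrated as follows. The target IDs come from the text above; the function names, the `mode` strings, and the choice to overwrite all 8 bytes at once are hypothetical and serve only to show the two manipulations mentioned (00 or a random value).

```python
import random

# Target CAN IDs per vehicle, as listed above.
TARGET_IDS = {
    "HYUNDAI YF Sonata": 0x316,
    "KIA Soul": 0x153,
    "CHEVROLET Spark": 0x18E,
}


def tampered_payload(mode, rng=None, length=8):
    """Build a manipulated Data field: all 00 bytes, or all random bytes."""
    if mode == "zero":
        return [0x00] * length
    if mode == "random":
        rng = rng or random.Random()
        return [rng.randint(0x00, 0xFF) for _ in range(length)]
    raise ValueError(f"unknown mode: {mode!r}")


def malfunction_frame(vehicle, mode, rng=None):
    """Return (can_id, data) for a frame aimed at the vehicle's target ID."""
    return TARGET_IDS[vehicle], tampered_payload(mode, rng)
```

For example, `malfunction_frame("KIA Soul", "zero")` yields the ID 0x153 with eight 00 bytes.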

CAN messages that occurred during normal driving


1.1      Data attributes

Timestamp, CAN ID, DLC, DATA [0], DATA [1], DATA [2], DATA [3], DATA [4], DATA [5], DATA [6], DATA [7], flag
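A record with these attributes could be read along the following lines. The on-disk layout is not specified here, so this sketch assumes one comma-separated record per line, a hexadecimal CAN ID and data bytes, a decimal DLC, and the flag as the last field. The `CanRecord` type, the `parse_record` function, and the sample line (including its flag value) are invented for illustration and should be adapted to the actual files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanRecord:
    timestamp: float
    can_id: int
    dlc: int
    data: List[int]
    flag: str  # kept as an opaque string; its meaning is not defined in this sketch


def parse_record(line):
    """Parse one record, assuming comma-separated fields in the attribute order above."""
    fields = [f.strip() for f in line.strip().split(",")]
    dlc = int(fields[2])
    return CanRecord(
        timestamp=float(fields[0]),
        can_id=int(fields[1], 16),                    # assumed hexadecimal
        dlc=dlc,
        data=[int(b, 16) for b in fields[3:3 + dlc]],  # read only DLC bytes
        flag=fields[-1],
    )


# Made-up sample line, not taken from the dataset.
sample = "1513468793.651700,0316,8,05,20,ea,0a,20,1a,00,7f,R"
record = parse_record(sample)
```

Reading only `dlc` data bytes and taking the flag from the last field lets the same parser handle records that carry fewer than 8 data bytes.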

1.2     Summary of our dataset 


For academic purposes, we are happy to release our datasets.

Dataset Download Link: Download (PW: ai.spera!+)

2       Publication


3       Contact

This dataset is used for the intrusion detection system for automobiles in the '2019 Information Security R&D dataset challenge' in South Korea.

4       See also

Please see the page [HCRL/Datasets] to find more in-vehicle IDS datasets and other datasets that we have.

DataSets [Survival Analysis].pdf