CAN Signal Extraction and Translation Dataset
1. Dataset
This dataset is intended to support CAN analysis research such as signal extraction and translation. It consists of 40 CAN traffic logs collected by periodically sending OBD queries while driving in a controlled environment. Each log file is named after the PID that was queried while that log was collected.
1.1 Collecting Data
Many vehicle parameters, especially those related to the powertrain, take meaningful values only when the vehicle is in motion. A typical example is vehicle speed. If a diagnostic request with PID ‘0D’h (13 in decimal), which represents vehicle speed, is made while the vehicle is stationary, the response returns ‘00’h. Such a response is of little use for analysis, because CAN traffic already contains many ‘00’h bytes: most bytes take a nonzero value only while the corresponding function is active. The data were therefore collected while driving. Since the CAN communication system has no authentication procedure for connecting nodes, we can access the CAN bus simply through the OBD-II port. In this work, we used a Kvaser CAN interface device, one of the widely used CAN interfaces. Many other commercial CAN interface devices are available, and third-party platforms such as Raspberry Pi and Arduino can also be used.
Diagnostic responses and normal CAN messages have to be gathered at the same time for cross-analysis. We logged all CAN traffic while sending diagnostic request messages at regular intervals. Although a shorter request interval yields higher-resolution response data, in our experiment the normal CAN messages were slightly delayed when diagnostic requests were transmitted with a 100 ms period. We therefore chose a transmission interval of 200 ms to avoid affecting normal CAN traffic. In addition, since only one PID request can be made at a time, CAN traffic was collected separately for each PID.
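The periodic request described above can be sketched as follows. This is only an illustration, not the collection script used for the dataset: the functional request ID 0x7DF, service 01 ("show current data"), and zero padding are assumptions taken from common OBD-II practice on 11-bit CAN, and the helper names `build_obd_request` and `request_schedule` are hypothetical. The code builds the frame and the send times only; actually transmitting it requires a CAN interface library.

```python
from typing import List, Tuple

# Assumption: 11-bit functional (broadcast) OBD-II request ID.
OBD_FUNCTIONAL_REQUEST_ID = 0x7DF
# Assumption: service (mode) 01, "show current data".
SHOW_CURRENT_DATA = 0x01


def build_obd_request(pid: int) -> Tuple[int, bytes]:
    """Return (CAN ID, 8-byte payload) for a single-PID request."""
    if not 0 <= pid <= 0xFF:
        raise ValueError("PID must fit in one byte")
    # Byte 0 = number of meaningful bytes that follow (service + PID).
    # The remaining five bytes are padding; 0x00 is an assumption.
    payload = bytes([0x02, SHOW_CURRENT_DATA, pid]) + bytes(5)
    return OBD_FUNCTIONAL_REQUEST_ID, payload


def request_schedule(duration_s: float, period_s: float = 0.2) -> List[float]:
    """Return the send times (s) for requests issued every period_s seconds."""
    count = int(round(duration_s / period_s))
    return [round(i * period_s, 6) for i in range(count)]


if __name__ == "__main__":
    can_id, payload = build_obd_request(0x0D)  # 0x0D = vehicle speed
    print(hex(can_id), payload.hex(" "))
    print(request_schedule(1.0))
```

At a 200 ms period, 300 requests span about 60 s of logging.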
1.2 Data Attributes
Timestamp, CAN ID, DLC, DATA[0], DATA[1], DATA[2], DATA[3], DATA[4], DATA[5], DATA[6], DATA[7]
1. Timestamp : recorded time (s)
2. CAN ID : identifier of the CAN message in hexadecimal (e.g., 043f)
3. DLC : number of data bytes, from 0 to 8
4. DATA[0~7] : data value (byte)
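A record with the attributes above could be parsed along these lines. This is a minimal sketch under stated assumptions: fields are comma-separated, the data bytes are written in hexadecimal like the CAN ID, and only the first DLC data fields are meaningful. The `CanRecord` type, the `parse_record` helper, and the sample line are all made up for illustration; check them against the actual files before relying on them.

```python
from dataclasses import dataclass


@dataclass
class CanRecord:
    timestamp: float  # recorded time in seconds
    can_id: int       # CAN identifier
    dlc: int          # number of data bytes, 0 to 8
    data: bytes       # the DLC data bytes


def parse_record(line: str) -> CanRecord:
    """Parse one 'Timestamp, CAN ID, DLC, DATA[0..7]' line."""
    fields = [f.strip() for f in line.split(",")]
    timestamp = float(fields[0])
    can_id = int(fields[1], 16)  # CAN ID is given in hex, e.g. 043f
    dlc = int(fields[2])
    if not 0 <= dlc <= 8:
        raise ValueError(f"invalid DLC: {dlc}")
    # Assumption: data bytes are hex strings; keep only the first DLC of them.
    data = bytes(int(b, 16) for b in fields[3:3 + dlc])
    return CanRecord(timestamp, can_id, dlc, data)


if __name__ == "__main__":
    # Made-up sample line, not taken from the dataset.
    sample = "1.234567, 043f, 8, 00, 40, 60, ff, 5a, 1c, 08, 00"
    print(parse_record(sample))
```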
The table below lists all PIDs supported by the test vehicle. We collected CAN traffic for each PID and obtained 40 dump files, one per PID. Each dump file contains about 127k–128k CAN messages, including 300 diagnostic response messages.
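For cross-analysis, the diagnostic responses in a dump file have to be separated from the normal CAN messages. The sketch below shows one way to do that. It assumes 11-bit identifiers with responses in the range 0x7E8 to 0x7EF, requests on 0x7DF, and single-frame service 01 responses whose second byte is 0x41 and whose third byte echoes the PID; none of this is stated in the dataset description, and `is_obd_response` and `split_traffic` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Frame = Tuple[int, bytes]  # (CAN ID, data bytes)

OBD_REQUEST_ID = 0x7DF  # assumption: functional request ID


def is_obd_response(can_id: int, data: bytes, pid: int) -> bool:
    """True if the frame looks like a service 01 response for the given PID."""
    return (
        0x7E8 <= can_id <= 0x7EF  # assumption: physical response ID range
        and len(data) >= 4
        and data[1] == 0x41       # 0x40 + service 01
        and data[2] == pid        # response echoes the requested PID
    )


def split_traffic(frames: Iterable[Frame], pid: int) -> Tuple[List[Frame], List[Frame]]:
    """Split frames into (diagnostic responses, normal CAN messages)."""
    responses: List[Frame] = []
    normal: List[Frame] = []
    for can_id, data in frames:
        if is_obd_response(can_id, data, pid):
            responses.append((can_id, data))
        elif can_id != OBD_REQUEST_ID:  # drop our own request frames
            normal.append((can_id, data))
    return responses, normal


if __name__ == "__main__":
    # Made-up frames for illustration only.
    frames = [
        (0x7DF, bytes([0x02, 0x01, 0x0D, 0, 0, 0, 0, 0])),
        (0x7E8, bytes([0x03, 0x41, 0x0D, 0x32, 0, 0, 0, 0])),
        (0x43F, bytes(8)),
    ]
    responses, normal = split_traffic(frames, pid=0x0D)
    # For PID 0x0D the first value byte is the speed in km/h.
    print([data[3] for _, data in responses], len(normal))
```

The response values extracted this way can then be compared against byte sequences in the normal messages recorded at nearby timestamps.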
Table. List of PIDs Supported by the Test Vehicle
1.3 Downloads
For academic purposes, we are happy to release our datasets. If you use our dataset in your experiments, please cite our paper.
Please use the link below to download the dataset.
Dataset Download Link: Google Forms
2. Publication
H. M. Song and H. K. Kim, "Discovering CAN Specification Using On-Board Diagnostics," in IEEE Design & Test, vol. 38, no. 3, pp. 93-103, June 2021, doi: 10.1109/MDAT.2020.3011036 Full paper download
3. Contact
Huy Kang Kim (cenda at korea.ac.kr)
4. See Also
Please see the [HCRL/Datasets] page for more in-vehicle IDS datasets and our other datasets.