UniTED: A Unified Time Series Event Detection Repository
Abstract
Event detection in time series is essential for numerous real-world applications, from monitoring industrial systems to identifying health anomalies. Public annotated datasets are crucial for benchmarking, training, and validating detection models. Despite recent advances in the field, there is no standardized, unified repository for evaluating different event types, which hinders reproducibility, comparability, and model development. Existing datasets suffer from poor standardization, a lack of annotation guidelines, limited support for different event types, and difficulty automating performance evaluation. This paper presents UniTED, a unified event detection dataset for time series. UniTED consolidates annotated series from diverse domains and offers a common format and protocol for evaluation. The repository supports three event types: anomalies, change points, and motifs. UniTED provides a harmonized extract-transform-load (ETL) process, label and annotation conventions, and an open-source implementation. By fostering reusability and reproducibility, it contributes to better performance assessment and model generalization across data analysis tasks. Three use cases demonstrate the applicability of the dataset.
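To make the idea of a common format and an automated evaluation protocol more concrete, the Python sketch below shows one way annotated series of the three event types could be harmonized into a single point-wise labeled record and then scored. The record layout (`LabeledSeries`), the helpers `harmonize` and `precision_recall`, and the use of inclusive `(start, end)` index intervals as raw annotations are all hypothetical illustrations, not UniTED's actual schema or API. Scoring uses strict exact-position matching, the simplest possible protocol.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical event-type vocabulary mirroring the three types UniTED supports.
EVENT_TYPES = {"anomaly", "change_point", "motif"}


@dataclass
class LabeledSeries:
    """Hypothetical unified record: one univariate series plus point-wise event labels."""
    name: str
    event_type: str
    values: List[float]
    events: List[bool] = field(default_factory=list)


def harmonize(name: str, event_type: str, values: List[float],
              intervals: List[Tuple[int, int]]) -> LabeledSeries:
    """Hypothetical ETL step: convert inclusive (start, end) index intervals
    from a source annotation into one boolean label per observation."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    labels = [False] * len(values)
    for start, end in intervals:
        # Clamp each interval to the series bounds before marking it.
        for i in range(max(start, 0), min(end, len(values) - 1) + 1):
            labels[i] = True
    return LabeledSeries(name, event_type, values, labels)


def precision_recall(truth: List[bool], detected: List[bool]) -> Tuple[float, float]:
    """Exact-position precision and recall over point-wise labels."""
    tp = sum(t and d for t, d in zip(truth, detected))
    fp = sum((not t) and d for t, d in zip(truth, detected))
    fn = sum(t and (not d) for t, d in zip(truth, detected))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage: a toy anomaly series with annotated events at indices 2 and 5.
series = harmonize("toy", "anomaly", [1.0, 1.1, 9.0, 1.0, 0.9, 8.5], [(2, 2), (5, 5)])
detected = [False, False, True, False, True, False]
print(precision_recall(series.events, detected))  # (0.5, 0.5)
```

Exact-position matching penalizes a detection that is off by a single step, as at index 4 in the usage example; metrics that give partial credit to nearby detections are a common alternative when events are only approximately located.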
