A Spectrogram Vision Transformer (ViT) Approach for Cross-Domain Bearing Fault Diagnosis on the UORED-VAFCLS Dataset

  • Ana Beatriz Cardoso (IFES)
  • Francisco de Assis Boldt (IFES)
  • Adriano Santos (IFES)
  • Mert Sehri (University of Ottawa)
  • Patrick Dumond (University of Ottawa)

Abstract


This paper addresses the limited cross-domain generalization of bearing fault diagnosis methods that rely on traditional time-series data. It proposes a spectrogram-based approach using Vision Transformer (ViT) models (ViT, DeiT, DINOv2, SwinV2, and MAE), validated on accelerometer-derived spectrogram images from the UORED-VAFCLS dataset. An existing domain-splitting strategy is iterated upon to evaluate model performance across varying fault severities. Results show that the proposed ViT-driven spectrogram method substantially outperforms the state-of-the-art CNN-LSTM approach, offering a promising path toward robust cross-domain bearing fault diagnosis.
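As an illustration of the signal-to-image step that a spectrogram-based pipeline of this kind relies on, the following is a minimal sketch and not the authors' implementation. The abstract does not state the STFT parameters or the image resolution, so the window length, hop size, 80 dB dynamic range, and 224x224 output (a common ViT input size) are all assumptions, as is the hypothetical function name `spectrogram_image`. The test tone and sampling rate are synthetic and are not taken from the UORED-VAFCLS dataset.

```python
import numpy as np


def spectrogram_image(signal, fs, win_len=256, hop=128, size=224):
    """Return a (size, size) uint8 log-magnitude spectrogram and the
    frequency (Hz) represented by each image row.

    All default parameters are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop

    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + win_len] * window
        for i in range(n_frames)
    ])

    # One-sided FFT magnitude: rows are frequency bins, columns are frames.
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    log_mag = 20.0 * np.log10(mag + 1e-10)

    # Clip to an 80 dB dynamic range below the peak, then scale to 0..255.
    top = log_mag.max()
    log_mag = np.clip(log_mag, top - 80.0, top)
    scaled = (log_mag - (top - 80.0)) / 80.0 * 255.0

    # Nearest-neighbour resampling to a square image.
    rows = np.linspace(0, scaled.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, scaled.shape[1] - 1, size).round().astype(int)
    image = scaled[np.ix_(rows, cols)].astype(np.uint8)

    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return image, freqs[rows]


# Synthetic example: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
image, row_freqs = spectrogram_image(np.sin(2 * np.pi * 1000 * t), fs)
```

An image produced this way would still need to be replicated to three channels and normalized before being passed to a pretrained Vision Transformer; those steps depend on the specific model and are left out of this sketch.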

Published
2025-07-20
CARDOSO, Ana Beatriz; BOLDT, Francisco de Assis; SANTOS, Adriano; SEHRI, Mert; DUMOND, Patrick. A Spectrogram Vision Transformer (ViT) Approach for Cross-Domain Bearing Fault Diagnosis on the UORED-VAFCLS Dataset. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 52., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 441-452. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2025.9037.