Feature Selection for Remaining Useful Life Prediction in Hard Disk Drives with Missing Data

  • Gabriel L. S. Felix, Universidade Federal do Ceará (UFC)
  • Francisco L. F. Pereira, Universidade Federal do Ceará (UFC)
  • Francisco D. B. S. Praciano, Universidade Federal do Ceará (UFC)
  • João P. P. Gomes, Universidade Federal do Ceará (UFC)
  • Javam C. Machado, Universidade Federal do Ceará (UFC)


This paper proposes a two-stage feature selection approach for Remaining Useful Life (RUL) prediction in Hard Disk Drives (HDDs) with missing data. In the first stage, a wrapper method uses a regression estimator to identify the most informative features for RUL prediction. In the second stage, the selected feature set is evaluated with a neural network model, with a focus on imputation performance for missing data. The goal is to find a feature subset that improves RUL prediction accuracy and remains robust when data are missing. The approach thus addresses the challenges of missing data and indicates which features are most relevant for accurate RUL prediction.
Keywords: HDD, RUL, Failure prediction, Deep Learning, Feature Selection
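
To make the first stage concrete, the following is a minimal sketch of a wrapper-style feature selection loop, not the authors' implementation. The abstract does not name the regression estimator, so a small k-nearest-neighbour regressor stands in for it here. The greedy forward search, the hold-out split, the synthetic data, and all function names (`knn_predict`, `holdout_mse`, `forward_select`) are assumptions made for illustration. The second stage (neural network evaluation with imputation) is not shown.

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def holdout_mse(X, y, features, split):
    """Score a feature subset: fit on the first `split` rows, test on the rest."""
    proj = [[row[j] for j in features] for row in X]
    train_X, train_y = proj[:split], y[:split]
    errors = [(knn_predict(train_X, train_y, x) - t) ** 2
              for x, t in zip(proj[split:], y[split:])]
    return sum(errors) / len(errors)

def forward_select(X, y, split, max_features):
    """Greedy wrapper: keep adding the feature that most lowers hold-out MSE."""
    selected, best_score = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {j: holdout_mse(X, y, selected + [j], split) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score

# Synthetic stand-in for drive telemetry (hypothetical data, not from the paper):
# four feature columns, with the target depending only on column 0.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [100.0 * (1.0 - row[0]) for row in X]

selected, score = forward_select(X, y, split=150, max_features=3)
print("selected feature indices:", selected)
print("hold-out MSE:", round(score, 3))
```

The defining property of a wrapper method is that each candidate subset is scored by actually fitting and evaluating the estimator, as `holdout_mse` does here. This costs more than a filter criterion but ties the selection directly to prediction error.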


FELIX, Gabriel L. S.; PEREIRA, Francisco L. F.; PRACIANO, Francisco D. B. S.; GOMES, João P. P.; MACHADO, Javam C. Feature Selection for Remaining Useful Life Prediction in Hard Disk Drives with Missing Data. In: WORKSHOP DE TRABALHOS DE ALUNOS DA GRADUAÇÃO (WTAG) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 57-63. DOI: https://doi.org/10.5753/sbbd_estendido.2023.233372.