Health Levels Modeling for SSD Failure Prediction

  • Gustavo W. M. Valença Universidade Federal do Ceará (UFC)
  • Francisco L. F. Pereira Universidade Federal do Ceará (UFC)
  • Felipe T. Brito Universidade Federal do Ceará (UFC)
  • Victor A. E. de Farias Universidade Federal do Ceará (UFC)
  • Javam C. Machado Universidade Federal do Ceará (UFC)

Abstract


The growing adoption of solid-state drives (SSDs), driven by their high performance and reliability, has made failure prediction crucial for ensuring data integrity and availability. Self-Monitoring, Analysis, and Reporting Technology (SMART) is a drive monitoring system that periodically reports operational parameters, which facilitates early detection of potential issues. Although many studies have used SMART attributes to treat failure prediction as a binary problem, we test new ways of predicting SSD failures that consider multiple health levels. In this paper, we first apply feature selection to choose the best SMART attributes as learning features. Then, we evaluate the selected features on several classification models and two prediction horizons: one month and one year ahead of the failure. The preliminary results support our approach, mainly for the shorter prediction horizon with non-linear models.
Keywords: SSD, Machine Learning, LSTM, Failure Prediction
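
To make the idea of multiple health levels concrete, here is a minimal sketch of one possible labeling scheme. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the number of levels or how they are delimited, so the function name `health_level`, the default of four levels, and the equal-width split of the prediction horizon are all hypothetical. The 30-day and 365-day horizons stand in for the one-month and one-year horizons mentioned above.

```python
from typing import Optional


def health_level(
    days_to_failure: Optional[int],
    horizon_days: int = 30,   # assumed stand-in for the one-month horizon
    n_levels: int = 4,        # assumed; the abstract does not give a count
) -> int:
    """Map the time remaining before failure to an ordinal health level.

    Level n_levels - 1 means "healthy" (no failure within the horizon).
    Levels 0 .. n_levels - 2 split the horizon into equal-width bins,
    with level 0 being the closest to failure.
    """
    if n_levels < 2:
        raise ValueError("need at least one healthy and one degraded level")
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")

    # Drives that never fail, or fail beyond the horizon, count as healthy.
    if days_to_failure is None or days_to_failure >= horizon_days:
        return n_levels - 1
    if days_to_failure < 0:
        raise ValueError("days_to_failure must be non-negative")

    # Integer arithmetic keeps the bin boundaries exact.
    degraded_levels = n_levels - 1
    return days_to_failure * degraded_levels // horizon_days


if __name__ == "__main__":
    for days in (0, 10, 29, 30, None):
        print(days, health_level(days))
```

Under this scheme the binary formulation is the special case `n_levels=2`, where every sample inside the horizon receives level 0 and every other sample receives level 1. The resulting labels could then serve as targets for any multi-class classifier trained on the selected SMART attributes.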

Published
14/10/2024
VALENÇA, Gustavo W. M.; PEREIRA, Francisco L. F.; BRITO, Felipe T.; DE FARIAS, Victor A. E.; MACHADO, Javam C. Health Levels Modeling for SSD Failure Prediction. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 764-770. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243219.