FACTO Dataset: A Dataset of User Reports for Faulty Computer Components

  • Maria de Lourdes M. Silva Universidade Federal do Ceará (UFC)
  • André L. C. Mendonça Universidade Federal do Ceará (UFC)
  • Eduardo R. D. Neto Universidade Federal do Ceará (UFC)
  • Iago C. Chaves Universidade Federal do Ceará (UFC)
  • Carlos Caminha Universidade Federal do Ceará (UFC)
  • Felipe T. Brito Universidade Federal do Ceará (UFC)
  • Victor A. E Farias Universidade Federal do Ceará (UFC)
  • Javam C. Machado Universidade Federal do Ceará (UFC)

Abstract


Advances in electronic fabrication technologies have enabled the large-scale production of computer components, which remain prone to faults over time. Although hardware manufacturers provide fault-reporting tools, the textual reports they collect remain underused because such data is scarce. In this paper, we introduce the FACTO dataset, a comprehensive collection of user reports on faulty computer components such as video cards, storage devices, motherboards, and memory, among others. The data was gathered through a survey of hardware specialists, web scraping of internet forums, and synthetic text generation from real manufacturer data using large language models. The dataset is intended to support correlating user reports with faulty components, with the aim of enhancing diagnostic capabilities and improving hardware reliability and customer satisfaction.
Keywords: Faulty computer components, user reports, diagnostics

Published
14 October 2024
SILVA, Maria de Lourdes M.; MENDONÇA, André L. C.; D. NETO, Eduardo R.; CHAVES, Iago C.; CAMINHA, Carlos; BRITO, Felipe T.; FARIAS, Victor A. E; MACHADO, Javam C. FACTO Dataset: A Dataset of User Reports for Faulty Computer Components. In: DATASET SHOWCASE WORKSHOP (DSW), 6., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 91-102. DOI: https://doi.org/10.5753/dsw.2024.243802.