Feature Selection: supporting the mining process on cyber-physical systems result datasets

  • Hebert Silva UNICAMP
  • Tania Basso UNICAMP
  • Regina Moraes UNICAMP

Abstract


Cyber-physical systems (CPS) often generate large datasets during monitoring or testing. Analyzing these results manually is impractical because it demands great human effort. Machine learning can support the analysis and help the responsible professional make urgent decisions. However, these datasets frequently contain missing, extreme, duplicate, or defective values that can bias general classification methods, a problem that feature selection techniques can mitigate. Yet identifying all possible combinations of features and selecting the best set is not an easy task. In this work, we present a feature selection study to automate the analysis of CPS test result datasets with the support of machine learning. The idea is to automatically identify a set of attributes that optimizes the accuracy of the chosen machine learning model. Three scenarios that use large amounts of data from cyber-physical systems were evaluated, and the results of feature selection were surprising in some cases.
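
As an illustration of why searching over feature combinations is costly, the following minimal sketch evaluates every non-empty subset of features and keeps the one with the highest accuracy. It is not the method used in the paper: the 1-nearest-neighbour classifier, the leave-one-out evaluation, the function names, and the toy data are all assumptions chosen to keep the example self-contained. With n features the loop visits 2^n - 1 subsets, which is why exhaustive search does not scale to real datasets.

```python
from itertools import combinations


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices (illustrative model)."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a sample with itself
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def best_subset(rows, labels):
    """Exhaustively score all 2^n - 1 non-empty feature subsets."""
    n = len(rows[0])
    best = (-1.0, ())
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            score = loo_accuracy(rows, labels, subset)
            if score > best[0]:  # strict '>' keeps the earliest, smallest subset on ties
                best = (score, subset)
    return best


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    (0.1, 5.0, 9.0),
    (0.2, 1.0, 2.0),
    (0.3, 8.0, 4.0),
    (0.9, 5.1, 2.1),
    (1.0, 1.2, 9.2),
    (1.1, 7.9, 4.2),
]
labels = [0, 0, 0, 1, 1, 1]

score, subset = best_subset(rows, labels)
print(score, subset)  # prints: 1.0 (0,)
```

On this toy data the single informative feature reaches perfect accuracy, while using all three features scores lower because the noisy columns dominate the distance computation.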

Published
2021-08-16
SILVA, Hebert; BASSO, Tania; MORAES, Regina. Feature Selection: supporting the mining process on cyber-physical systems result datasets. In: FAULT TOLERANCE WORKSHOP (WTF), 22., 2021, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 15-28. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2021.17201.