Data Anonymization for Artificial Intelligence using Gorilla Troop Algorithm
Abstract
Data about environments and individuals is now routinely collected through the Internet of Things (IoT) and consumed by innovative solutions based on Artificial Intelligence (AI). However, particularly in the healthcare domain, the handling of user data must comply with privacy laws. The challenge is therefore to keep the data useful for AI solutions while adhering to legal requirements, for instance by anonymizing the data. Traditional anonymization methods tend to reduce the performance of AI models. In this context, this article proposes the GOK-Privacy algorithm, which combines a metaheuristic inspired by gorilla troop behavior with clustering techniques to preserve privacy without sacrificing the performance of analytical models. Experiments conducted on real healthcare data demonstrate the effectiveness of the proposed solution in real-world scenarios.
Keywords:
Anonymization, Machine Learning
References
Abdollahzadeh, B., Gharehchopogh, F. S., and Mirjalili, S. (2021). A novel metaheuristic optimization algorithm inspired by gorilla troops’ behaviors. Expert Systems with Applications, 182:115083.
Chiu, C. C. and Tsai, C. Y. (2007). Weighted feature c-means clustering algorithm for data mining in intelligent transportation systems. Expert Systems with Applications, 33(1).
Choudhury, O., Gkoulalas-Divanis, A., Salonidis, T., Sylla, I., Park, Y., Hsu, G., and Das, A. (2020). Anonymizing data for privacy-preserving federated learning. arXiv preprint, arXiv:2002.09096.
El Mestari, S. Z., Lenzini, G., and Demirci, H. (2024). Preserving data privacy in machine learning systems. Computers & Security, 137:103605.
Ferreira, M. C., Ribeiro, S. E., Nobre, F. V., Linhares, M. L., Araújo, T. P., and Gomes, R. L. (2024). Mitigating measurement failures in throughput performance forecasting. In 2024 20th International Conference on Network and Service Management (CNSM), pages 1–7.
Gomes, R. L., Bittencourt, L. F., and Madeira, E. R. M. (2014a). A similarity model for virtual networks negotiation. In Proceedings of the 29th Annual ACM Symposium on Applied Computing, SAC ’14, pages 489–494, New York, NY, USA. Association for Computing Machinery.
Gomes, R. L., Bittencourt, L. F., Madeira, E. R. M., Cerqueira, E., and Gerla, M. (2014b). An architecture for dynamic resource adjustment in VSDNs based on traffic demand. In 2014 IEEE Global Communications Conference, pages 2005–2010.
He, X., Chen, H., Chen, Y., Dong, Y., Wang, P., and Huang, Z. (2012). Clustering-based k-anonymity. In Advances in Knowledge Discovery and Data Mining: 16th Pacific-Asia Conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29-June 1, 2012, Proceedings, Part I 16, pages 405–417. Springer.
Hussain, F., Abbas, S. G., Shah, G. A., Pires, I. M., Fayyaz, U. U., Shahzad, F., Garcia, N. M., and Zdravevski, E. (2021). A framework for malicious traffic detection in IoT healthcare environment. Sensors, 21(9):3025.
Kacha, L., Zitouni, A., and Djoudi, M. (2021). KAB: A new k-anonymity approach based on black hole algorithm. Journal of King Saud University - Computer and Information Sciences.
Kumar, R., Chen, W., and Smith, S. (2024). Privacy-preserving machine learning through k-anonymity: A novel approach for healthcare data protection. Journal of Medical Systems, 48(1):1–15.
Langari, R. K., Sardar, S., Mousavi, S. A. A., and Radfar, R. (2020). Combined fuzzy clustering and firefly algorithm for privacy preserving in social networks. Expert Systems with Applications, 141:112968.
LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. (2006). Mondrian multidimensional k-anonymity. In 22nd International Conference on Data Engineering (ICDE’06), pages 25–25. IEEE.
Ni, C., Cang, L. S., Gope, P., and Min, G. (2022). Data anonymization evaluation for big data and IoT environment. Information Sciences, 605:381–392.
Pimenta, I. A., Silva, D. A., Moura, E. S., Silveira, M. M., and Gomes, R. L. (2024). Impact of data anonymization in machine learning models. In 13th Latin-American Symposium on Dependable and Secure Computing (LADC 2024), pages 1–4, Recife, Brazil.
Portela, A. L., Menezes, R. A., Costa, W. L., Silveira, M. M., Bittencourt, L. F., and Gomes, R. L. (2023). Detection of IoT devices and network anomalies based on anonymized network traffic. In NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, pages 1–6.
Portela, A. L. C., Ribeiro, S. E. S. B., Menezes, R. A., de Araujo, T., and Gomes, R. L. (2024). T-FOR: An adaptable forecasting model for throughput performance. IEEE Transactions on Network and Service Management, pages 1–1.
Seh, A. H., Zarour, M., Alenezi, M., Sarkar, A. K., Agrawal, A., Kumar, R., and Khan, R. A. (2020). Healthcare data breaches: Insights and implications. Healthcare, 8(2):133.
Silva, M., Ribeiro, S., Carvalho, V., Cardoso, F., and Gomes, R. L. (2023). Scalable detection of SQL injection in cyber physical systems. In Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, LADC ’23, pages 220–225, New York, NY, USA. Association for Computing Machinery.
Silva, M. V., Mosca, E. E., and Gomes, R. L. (2022). Green industrial internet of things through data compression. International Journal of Embedded Systems, 15(6):457–466.
Silveira, M., Santos, D., Souza, M., Silva, D., Mesquita, M., Neto, J., and Gomes, R. L. (2023a). An anonymization service for privacy in data mining. In Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, LADC ’23, pages 214–219, New York, NY, USA. Association for Computing Machinery.
Silveira, M. M., Portela, A. L., Menezes, R. A., Souza, M. S., Silva, D. S., Mesquita, M. C., and Gomes, R. L. (2023b). Data protection based on searchable encryption and anonymization techniques. In NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, pages 1–5.
Silveira, M. M., Silva, D. S., Rodriguez, S. J. R., and Gomes, R. L. (2023c). Searchable symmetric encryption for private data protection in cloud environments. In Proceedings of the 11th Latin-American Symposium on Dependable Computing, LADC ’22, pages 95–98, New York, NY, USA. Association for Computing Machinery.
Slijepčević, D., Henzl, M., Klausner, L. D., Dam, T., Kieseberg, P., and Zeppelzauer, M. (2021). k-anonymity in practice: How generalisation and suppression affect machine learning classifiers. Computers & Security, 111:102488.
Souza, M. S., Ribeiro, S. E. S. B., Lima, V. C., Cardoso, F. J., and Gomes, R. L. (2024). Combining regular expressions and machine learning for SQL injection detection in urban computing. Journal of Internet Services and Applications, 15(1):103–111.
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., and Fu, A. W.-C. (2006). Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–790. ACM.
Yuan, S. and Wu, X. (2022). Trustworthy anomaly detection: A survey. arXiv preprint, arXiv:2202.07787.
Published
2025-05-19
How to Cite
PIMENTA, Ivo A.; ARAÚJO, Ramon S.; RODRIGUES, Renann L.; SILVEIRA, Matheus M.; GOMES, Rafael L. Data Anonymization for Artificial Intelligence using Gorilla Troop Algorithm. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 43., 2025, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 448-461. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2025.6252.
