Advances in the processing of textual health data with Artificial Intelligence techniques: An algorithm for data clustering

Alisson I. Dias; Denise S. de Sousa; Josimar A. de Oliveira; Larissa G. Cardoso; Sara L. de Farias; Alan R. dos Santos; Elton C. S. Morais

doi:10.5753/erigo.2024.4849

Alisson I. Dias UEG
Denise S. de Sousa UEG
Josimar A. de Oliveira UEG
Larissa G. Cardoso UEG
Sara L. de Farias IFGoiano
Alan R. dos Santos UEG
Elton C. S. Morais UEG

Abstract

The spread of Information Technology (IT) in healthcare has generated a large volume of data, much of which is never adequately processed. Artificial Intelligence (AI) helps put these data to use, but handling free-form, heterogeneous clinical text remains challenging. This study developed an algorithm in Python to pre-process and cluster 217 thousand clinical diagnoses by structural similarity, focusing on terms related to Dengue and COVID-19. Preliminary results show that this approach organizes the data effectively, which facilitates subsequent analyses. Despite this initial success, challenges such as the configuration of terms and the heterogeneity of the texts indicate that refinements are needed to improve the precision of the process.

Keywords: Artificial Intelligence, data processing, clustering, health, clinical texts
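
As a rough illustration of the kind of pipeline the abstract describes, the sketch below normalizes free-text diagnoses and groups them by string similarity. It is not the authors' algorithm, whose details are not given here. The term map `TARGET_TERMS`, the 0.8 threshold, the greedy grouping strategy, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical term map (assumption): the study's real term configuration is not given.
TARGET_TERMS = {
    "dengue": "dengue",
    "covid": "covid-19",
    "coronavirus": "covid-19",
}


def normalize(text):
    """Lowercase, strip accents, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def label_for(clean_text):
    """Return the label of the first target keyword found in the text, else None."""
    for keyword, label in TARGET_TERMS.items():
        if keyword in clean_text:
            return label
    return None


def group_by_similarity(texts, threshold=0.8):
    """Greedy grouping (assumed strategy): each text joins the first group
    whose normalized representative is at least `threshold` similar."""
    groups = []  # list of (normalized representative, original members)
    for original in texts:
        clean = normalize(original)
        for representative, members in groups:
            if SequenceMatcher(None, representative, clean).ratio() >= threshold:
                members.append(original)
                break
        else:
            groups.append((clean, [original]))
    return groups


if __name__ == "__main__":
    sample = ["Dengue clássica", "dengue classica.", "COVID-19 suspeita"]
    for representative, members in group_by_similarity(sample):
        print(label_for(representative), members)
```

Comparing every record against every group representative scales poorly, so on a corpus of the size mentioned in the abstract a sketch like this would probably need blocking by keyword or a vectorized similarity measure first.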
