Machine Learning Approaches for Automating Steps of the Meta-Analysis Process in Health Care
Abstract
This work evaluates machine learning approaches for automating steps in the conduct of meta-analyses in health care, with emphasis on PICO entity extraction and study screening. A BioELECTRA model fine-tuned for NER was compared with Large Language Models (LLMs), namely Llama 3 and Gemini, on the task of extracting PICO entities for use in article screening, using the EBM-NLP dataset and a dataset derived from 55 meta-analyses. BioELECTRA achieved higher recall and F1-score in PICO entity extraction, whereas the LLMs outperformed the baseline in article ranking (Precision@1: 1.00 vs. 0.84), indicating that hybrid NER–LLM pipelines are promising for automating the conduct of meta-analyses in health care.
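As an illustration of the ranking metric reported above, the following minimal sketch computes Precision@k for a single ranked list and averages it over several queries. It assumes the standard definition (the fraction of the top-k ranked items that are relevant); the function names, the article identifiers, and the averaging over queries are hypothetical choices for this example and are not taken from the paper's implementation.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 1) -> float:
    """Fraction of the top-k ranked items that are relevant (standard Precision@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count how many of the first k ranked articles are in the relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(
    rankings: Sequence[Sequence[str]],
    relevant_sets: Sequence[Set[str]],
    k: int = 1,
) -> float:
    """Average Precision@k over several queries (assumed here: one query per meta-analysis)."""
    scores = [precision_at_k(r, rel, k) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: two queries, each with a ranked list of article IDs.
rankings = [["a1", "a2", "a3"], ["b2", "b1", "b3"]]
relevant = [{"a1"}, {"b1"}]
print(mean_precision_at_k(rankings, relevant, k=1))
```

Under this definition, a Precision@1 of 1.00 means that the top-ranked article was relevant for every query evaluated.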
