A LLM-Based Approach for Analyzing Vaccine Misinformation

  • Athus Cavalini IFES / UFES
  • Leandro Furlam Turi UFES
  • André G. C. Pacheco UFES
  • Giovanni Comarela UFES

Abstract


This article proposes a fully LLM-based approach to extract insights from large-scale data on vaccine misinformation. Using BERT and GPT models, we examined millions of anti-vaccine messages from Telegram. The analysis identified 12 key misinformation narratives, highlighting alarmist concerns and conspiracy theories, while also suggesting mitigation strategies. The results demonstrate the potential of LLMs for large-scale automated misinformation analysis, mapping its dynamics and providing insights for the development of agile and evidence-based strategic responses.

Published
2025-06-09
CAVALINI, Athus; TURI, Leandro Furlam; PACHECO, André G. C.; COMARELA, Giovanni. A LLM-Based Approach for Analyzing Vaccine Misinformation. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 272-283. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.7041.
