RoBIn Chatbot: Leveraging LLMs for Automated Risk of Bias Assessment in Clinical Studies

Abel Corrêa Dias; Viviane Pereira Moreira; João Luiz Dihl Comba

doi:10.5753/sbcas_estendido.2025.7734

Abel Corrêa Dias UFRGS
Viviane Pereira Moreira UFRGS
João Luiz Dihl Comba UFRGS


Abstract

Risk of Bias (RoB) assessment is an essential instrument for evaluating the reliability of clinical studies and identifying systematic errors they may contain. The task is traditionally performed by human reviewers, and only a few works have tried to automate it with machine learning. Recent advances in large language models (LLMs) have transformed natural language processing and information retrieval, making it possible to build applications that chat with documents and perform a wide range of tasks. In this work, we propose the RoBIn chatbot, an LLM application that receives clinical studies as input and classifies their RoB. The RoBIn chatbot uses a model trained on data derived from the Cochrane Database of Systematic Reviews and can perform inference for six bias types. To keep the LLM from generating misleading conclusions, the chatbot applies retrieval-augmented generation to the submitted file to extract the relevant piece of evidence, which it then sends to a pretrained model responsible for RoB inference.
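The retrieve-then-classify flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words retriever, the keyword-based `classify_rob` stand-in, the label names, the example question, and the sample passages are all assumptions chosen only so the example runs without external dependencies. In a real system, an embedding-based retriever and the pretrained RoB model would take their place.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_evidence(passages, question):
    """Retrieval step: return the passage most similar to the question."""
    q = Counter(tokenize(question))
    return max(passages, key=lambda p: cosine(Counter(tokenize(p)), q))


def classify_rob(evidence):
    """Hypothetical stand-in for the pretrained RoB model (NOT the real model).

    A keyword rule is used only so the sketch runs end to end; the label
    names "low" and "high/unclear" are assumptions as well.
    """
    low_risk_cues = ("computer-generated", "random number table")
    text = evidence.lower()
    return "low" if any(cue in text for cue in low_risk_cues) else "high/unclear"


def assess(passages, question):
    """Retrieve the evidence first, then classify only that evidence."""
    evidence = retrieve_evidence(passages, question)
    return {"question": question, "evidence": evidence, "judgment": classify_rob(evidence)}


# Invented passages standing in for chunks of a submitted clinical study.
passages = [
    "Participants were recruited from outpatient clinics.",
    "The allocation sequence was produced with a computer-generated random list.",
    "Outcomes were assessed at twelve months by telephone interview.",
]
result = assess(passages, "Was the allocation sequence random?")
print(result["judgment"], "|", result["evidence"])
```

The point of the design is that the judgment is computed only from text retrieved out of the submitted file, so every answer can be shown alongside the passage that supports it.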
