Strengthening Scientific Integrity: Digital Forensics for Biomedical Research Imaging
Abstract
To counter the growing number of misconduct cases in science, this Ph.D. research addresses the challenge of scientific integrity with a pioneering investigation into digital forensic analysis tailored specifically to biomedical images. The work investigates three key manipulation types (copy-move forgery, image reuse, and AI-generated content) and develops novel, fully explainable, and auditable computational detection methods for each. To promote transparency and encourage further research in the area, these techniques are released as open-source resources. Beyond the individual techniques for each type of forgery, a central contribution is an end-to-end system developed in collaboration with international forensic experts and the U.S. Office of Research Integrity (ORI). This system automates the analysis of scientific publications, taking PDF documents as input and flagging figures with potential integrity concerns.
