ImageFactCk.BR: An Image Repository for Detecting Misinformation Disseminated on Digital Platforms

Yago Santos; Michel M. Silva; Julio C. S. Reis

doi:10.5753/dsw.2023.234267

Yago Santos, Universidade Federal de Viçosa
Michel M. Silva, Universidade Federal de Viçosa, https://orcid.org/0000-0002-2499-9619
Julio C. S. Reis, Universidade Federal de Viçosa, https://orcid.org/0000-0003-0563-0434

Abstract

With the rise of social media and instant messaging applications, sharing textual information through images has become very popular. At the same time, this type of media has been widely exploited to spread misinformation. Images with textual content have several peculiar characteristics that pose numerous challenges for tools focused on identifying, containing, and moderating this type of content. Given this scenario, in this work we present ImageFactCk.BR, a data repository containing 12,209 synthetic images, generated from characteristics commonly found on digital platforms, that carry misinformation written in Portuguese and verified by Brazilian fact-checking agencies. We hope it will be useful for studies in different contexts surrounding the phenomenon of misinformation on digital platforms.

Keywords: Misinformation, Misinformation detection, Images, Digital platforms
