Combining compact news representations generated using DistilBERT and topological features to classify fake news

Carlos Abel Córdova Sáenz; Marcelo Dias; Karin Becker

doi:10.5753/kdmile.2020.11978

Carlos Abel Córdova Sáenz (Universidade Federal do Rio Grande do Sul)
Marcelo Dias (Universidade Federal do Rio Grande do Sul; Instituto Federal de Educação Ciência e Tecnologia Sul-rio-grandense)
Karin Becker (Universidade Federal do Rio Grande do Sul)

Abstract

Fake news (FN) have affected people’s lives in unimaginable ways. The automatic classification of FN is a vital tool to prevent their dissemination and to support fact-checking. Related work has shown that FN spread faster, deeper, and more broadly than the truth on social media. In addition, deep learning has produced state-of-the-art solutions in this field, mainly based on textual attributes. In this paper, we report initial experiments that combine compact representations of the textual properties of news, generated using DistilBERT, with topological metrics extracted from the social propagation network. On a politics-related dataset with five distinct classification algorithms, the results are encouraging. With the textual attributes, we reached results comparable to state-of-the-art solutions using only the news title and content, which is useful for early detection of FN. The topological attributes were not as effective, but the promising results encourage the investigation of alternative architectures for combining the two kinds of attributes.

Keywords: DistilBERT, fake news, fake news classification, topological features
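
The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say which topological metrics are extracted or how the two feature sets are fused, so everything here is an assumption made for illustration: the metrics (cascade size, depth, maximum breadth), the fusion by simple concatenation, the helper names `cascade_metrics` and `combine_features`, and the toy data. The three-element text vector is only a stand-in for a compact DistilBERT representation.

```python
from collections import defaultdict, deque


def cascade_metrics(edges, root):
    """Return size, depth, and maximum breadth of a propagation tree.

    edges: list of (parent, child) pairs, e.g. (post, share).
    root:  node id of the original post.
    These three metrics are illustrative assumptions, not the paper's list.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    level_counts = defaultdict(int)  # number of nodes at each depth level
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        node, level = queue.popleft()
        level_counts[level] += 1
        for child in children[node]:
            if child not in seen:  # guard against malformed input with cycles
                seen.add(child)
                queue.append((child, level + 1))

    return {
        "size": len(seen),                          # nodes reached from root
        "depth": max(level_counts),                 # deepest level reached
        "max_breadth": max(level_counts.values()),  # widest level
    }


def combine_features(text_embedding, metrics):
    """Early fusion (assumed): append topological metrics to the text vector."""
    topo = [float(metrics[k]) for k in ("size", "depth", "max_breadth")]
    return list(text_embedding) + topo


# Hypothetical toy cascade: post 0 is shared by 1 and 2; 1 by 3; 3 by 4.
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
metrics = cascade_metrics(edges, root=0)

# Stand-in for a compact text vector; a real one would come from DistilBERT.
text_embedding = [0.12, -0.40, 0.88]
features = combine_features(text_embedding, metrics)

print(metrics)
print(features)
```

The resulting `features` list is the kind of combined vector that could be passed to any standard classifier. Concatenation is only the simplest option; the abstract itself points to alternative combination architectures as future work.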
