A Comparative Study of Machine Learning Algorithms for the Detection of Fake News on the Internet
Abstract
Context: The growing proliferation of fake news on the Internet has significantly degraded the quality and veracity of the information that reaches society. Problem: The malicious use of information can compromise democracy by manipulating public opinion. In addition, few mechanisms exist to help citizens determine whether a given news item is true. This problem has driven new research into classifying and identifying such news. Methodology: This work compares machine learning algorithms as candidate solutions for detecting fake news in Portuguese. The dataset used for the analysis contains about 12,000 news items. Pre-processing techniques were applied to analyze patterns in the news, reduce noise, and eliminate null entries. The algorithms compared were Logistic Regression, Stochastic Gradient Descent, Support Vector Machine (SVM), and Multilayer Perceptron. Result: The models generated by all four algorithms achieved an accuracy above 90%. To choose the best algorithm, precision, recall, and F-measure were computed for each model. The SVM algorithm performed best, with 96.39% accuracy. Contribution: Beyond the analytical results, this work makes available a database of news in Portuguese and presents a grammatical and structural analysis of the news text, aimed at detecting the patterns that distinguish true news from false news.
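The evaluation metrics named above (accuracy, precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the authors' code: the function name `evaluate`, the `BinaryMetrics` container, the label strings "fake" and "true", and the toy labels are all assumptions made for illustration, and the numbers do not come from the paper's dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the four scores discussed in the abstract."""
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(y_true, y_pred, positive="fake"):
    """Score predictions against gold labels, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # fake, flagged as fake
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # true, flagged as fake
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # fake, missed
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators (e.g. a model that never predicts "fake").
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(correct / len(pairs), precision, recall, f_measure)


# Toy labels, invented for illustration only.
gold = ["fake", "fake", "fake", "true", "true", "true", "true", "fake"]
predicted = ["fake", "fake", "true", "true", "true", "fake", "true", "fake"]
scores = evaluate(gold, predicted)
```

Comparing several classifiers then amounts to calling `evaluate` once per model on the same held-out labels and ranking the resulting scores. Accuracy alone can hide an imbalance between missed fake news and false alarms, which is why precision and recall are reported alongside it.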