Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

  • Danielly Sorato, Universidade Federal de Santa Catarina
  • Renato Fileto, Universidade Federal de Santa Catarina

Abstract


Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other topics. However, social media is also frequently used to promote online disinformation and manipulation. Analyzing the characteristics of such discourse is essential for understanding and countering these actions. Extracting recurrent text fragments, i.e., word sequences, that are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. We therefore aim to use such patterns to capture frequent discourses expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we extract sequences of words that share a similar meaning in their word embedding representation. In the experiments, we investigate the incidence of SSP instances regarding political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. Experimental results show a high prevalence of certain statements by Donald Trump about his adversaries, as well as of expressions that often appeared in such tweets.
Keywords: Natural Language Processing, Linguistic Pattern Recognition, Word Embeddings, Twitter, Data Analysis
Published
20/05/2019
SORATO, Danielly; FILETO, Renato. Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 15., 2019, Aracajú. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 143-150.