Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Danielly Sorato; Renato Fileto

Danielly Sorato, Universidade Federal de Santa Catarina
Renato Fileto, Universidade Federal de Santa Catarina

Abstract

Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other subjects. However, social media is also frequently used to promote online disinformation and manipulation. Analyzing the characteristics of such discourse in social media is essential for understanding and countering these actions. Extracting recurrent text fragments, i.e., word sequences, that are semantically similar can lead to the discovery of linguistic patterns used in certain kinds of discourse. We therefore aim to use such patterns to capture discourses frequently expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we extract sequences of words that share a similar meaning in their word embedding representation. In the experiments, we investigate the incidence of SSP instances regarding political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. Experimental results show a high preponderance of certain statements by Donald Trump about his adversaries, as well as expressions that often appeared in such tweets.

Keywords: Natural Language Processing, Linguistic Pattern Recognition, Word Embeddings, Twitter, Data Analysis
