LIWBC: a bigram algorithm to enhance results in polarity classification

Flavio Carvalho; Rafael G. Rodrigues; Gustavo Paiva Guedes

Flavio Carvalho CEFET/RJ
Rafael G. Rodrigues CEFET/RJ
Gustavo Paiva Guedes CEFET/RJ

Abstract

The text mining literature shows a growing body of work on the automatic identification of sentiment in text, and sentiment polarity classification is one of its most important tasks. The typical approach to polarity classification uses lexicons to count word usage along linguistic or emotional dimensions. One of the most widely used lexicons is the Linguistic Inquiry and Word Count (LIWC). LIWC assigns words to categories (e.g., positive emotion) based on a lexicon of words associated with psycholinguistic categories. It has been widely used in polarity classification tasks with good results. However, it only counts individual words, discarding text structure and ignoring important semantic relationships between words. In this work, we present LIWBC, an algorithm that counts bigrams using the lexicon provided by LIWC. The goal is to incorporate text structure information to improve polarity classification with the LIWC lexicon. We conducted experiments to evaluate LIWBC on two real datasets: the first consists of blogger posts; the second is the movie reviews dataset, which contains full-text movie reviews from IMDB. Both datasets were processed with LIWC and with LIWBC, and we then ran four classification algorithms on each processed version. The SVM algorithm run on LIWBC data yielded the best result on both datasets. Relative to the LIWC data, the F1 score of SVM improved by 2.2% on the blogger posts dataset and by 2.5% on the movie reviews dataset.

Keywords: Text mining, Sentiment analysis, LIWC
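
The contrast the abstract draws between counting words and counting bigrams over a lexicon can be illustrated with a minimal Python sketch. It is not the LIWBC implementation: the abstract does not specify how bigrams are formed, so the sketch assumes that a bigram is the pair of categories of two adjacent words. The lexicon `TOY_LEXICON`, its category labels, and the function names are all hypothetical and stand in for the actual LIWC lexicon.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicon (word -> category labels); NOT the real LIWC lexicon.
TOY_LEXICON = {
    "not": {"negate"},
    "good": {"posemo"},
    "bad": {"negemo"},
    "movie": {"leisure"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_unigram_counts(tokens, lexicon):
    """Word-count style: count each category once per matching word."""
    counts = Counter()
    for tok in tokens:
        for cat in lexicon.get(tok, ()):
            counts[cat] += 1
    return counts


def category_bigram_counts(tokens, lexicon):
    """Assumed bigram style: count category pairs of adjacent words."""
    counts = Counter()
    for left, right in zip(tokens, tokens[1:]):
        left_cats = sorted(lexicon.get(left, ()))
        right_cats = sorted(lexicon.get(right, ()))
        # Words missing from the lexicon contribute no pairs.
        for pair in product(left_cats, right_cats):
            counts[pair] += 1
    return counts


tokens = tokenize("Not good. The movie was bad.")
unigrams = category_unigram_counts(tokens, TOY_LEXICON)
bigrams = category_bigram_counts(tokens, TOY_LEXICON)
print(unigrams)
print(bigrams)
```

Under these assumptions, the unigram counts register "good" as one positive-emotion word regardless of context, while the bigram counts record that it directly follows a negation. This is the kind of structural information that plain word counting discards.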
