An Analysis of the Sentiment Classification of Short Messages Using Word2Vec
Abstract
Sentiment analysis and the polarity classification of texts are among the main tools that companies and organizations currently use for a wide range of purposes. This work analyzes the use of word embeddings, built with Word2Vec, for feature extraction in the polarity classification of short messages written in English. The texts were extracted from Twitter. The results show that, although larger text corpora may be needed to obtain better vectors, Word2Vec is a promising tool for extracting features from textual data and contributes to good classification results.
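The idea of using word embeddings as features for polarity classification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this work: the two-dimensional `EMBEDDINGS` table is a hand-written stand-in for vectors that Word2Vec would learn from a large corpus, averaging word vectors is one common way to turn a message into a fixed-length feature vector, and the nearest-centroid rule is a deliberately simple substitute for whatever classifier is actually trained on those features.

```python
from math import sqrt

# Hypothetical toy embeddings standing in for vectors learned by Word2Vec.
# Real vectors would have many more dimensions and come from a trained model.
EMBEDDINGS = {
    "good":  [0.9, 0.1],
    "great": [0.8, 0.2],
    "love":  [0.7, 0.1],
    "bad":   [0.1, 0.9],
    "awful": [0.2, 0.8],
    "hate":  [0.1, 0.7],
    "movie": [0.5, 0.5],
}
DIM = 2


def message_vector(text):
    """Average the vectors of the known words (assumed pooling strategy)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        # No known word: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled_messages):
    """Compute one centroid per polarity label from (text, label) pairs."""
    by_label = {}
    for text, label in labelled_messages:
        by_label.setdefault(label, []).append(message_vector(text))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest to the message vector."""
    vector = message_vector(text)
    return min(centroids, key=lambda label: distance(centroids[label], vector))


# Tiny hand-made training set, for illustration only.
TRAIN = [
    ("great movie", "positive"),
    ("love it", "positive"),
    ("awful movie", "negative"),
    ("hate it", "negative"),
]
MODEL = train(TRAIN)

print(predict(MODEL, "good movie"))
print(predict(MODEL, "bad movie"))
```

In a realistic setting the embedding table would be replaced by a model trained on a large collection of tweets, which is where corpus size starts to affect the quality of the vectors and, in turn, of the extracted features.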