Polarity Classification of Traffic Related Tweets
Abstract
In this paper we present a study of polarity classification of tweets in the traffic domain. Specifically, we use Portuguese-language data from an account maintained by a traffic management agency. We evaluate the performance of three learning methods: Support Vector Machine (SVM), Naive Bayes, and Maximum Entropy. We also explore how training on a balanced versus an unbalanced corpus affects the models' behavior. The results show that, in this context, a machine learning (ML) classifier obtains better results than those reported in the literature. In our experiments, SVM trained with a balanced corpus outperforms all other tested models, achieving 99% accuracy, average recall, and average precision.
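As a rough illustration of the evaluation setup described above, the following minimal sketch balances a labeled corpus and computes the three reported metrics. It rests on several assumptions that the abstract does not confirm: balancing is done here by random undersampling, the "average" recall and precision are taken to be macro-averages over classes, and the function names (`undersample`, `evaluate`) and the `pos`/`neg` labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, seed=0):
    """Balance (text, label) pairs by randomly undersampling larger classes.

    Assumption: random undersampling stands in for whatever balancing
    strategy the paper actually uses.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


def evaluate(gold, predicted):
    """Return accuracy plus macro-averaged recall and precision.

    Assumption: "average" means an unweighted mean over classes.
    """
    labels = sorted(set(gold) | set(predicted))
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    recalls = [
        correct[l] / gold_counts[l] if gold_counts[l] else 0.0 for l in labels
    ]
    precisions = [
        correct[l] / pred_counts[l] if pred_counts[l] else 0.0 for l in labels
    ]
    return {
        "accuracy": sum(correct.values()) / len(gold),
        "avg_recall": sum(recalls) / len(labels),
        "avg_precision": sum(precisions) / len(labels),
    }


# Hypothetical toy labels: a classifier that always predicts the majority class.
gold = ["neg", "neg", "neg", "pos"]
predicted = ["neg", "neg", "neg", "neg"]
print(evaluate(gold, predicted))
```

On this toy input the majority-class predictor reaches 0.75 accuracy but only 0.5 average recall, which is one reason averaged per-class metrics are reported alongside accuracy when the corpus is unbalanced.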