Analysis of the Evolution of Speeches by Pre-candidates for President through Vector Linguistic Representations

  • Kid Valeriano (UFF)
  • Aline Paes (UFF)
  • Daniel de Oliveira (UFF)

Abstract


Pre-candidates for government office commonly express their opinions and campaign platforms in informal speeches delivered before the official campaign period. This behavior is essential for voters to learn the candidates' ideologies and platforms and thus make an informed voting decision. In that decision-making process, a voter may consider how similar the speeches of different candidates are, how a candidate's speech varies over time, and how well the speech addresses the topics most relevant to society. However, analyzing and capturing such aspects from informal speeches is a difficult task for the voter, given the volume of information made available by various media outlets and the political bias of some of them. Thus, in this article, we propose a political discourse analysis tool based on Linguistic Representation Learning techniques to assist voters in their decision. Results obtained from the speeches of the pre-candidates for President of Brazil in 2018 allow us to verify how the candidates behave with respect to their own speeches and those of their competitors.
Keywords: doc2vec, natural language processing, discourse analysis
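The abstract describes comparing candidate speeches through vector representations (e.g. doc2vec embeddings), where the similarity between two speeches reduces to the cosine similarity of their vectors. A minimal sketch of that core operation, using hypothetical toy embeddings (real doc2vec vectors typically have 100+ dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of two speeches (toy 4-d vectors for illustration).
speech_a = [0.2, 0.7, 0.1, 0.4]
speech_b = [0.25, 0.6, 0.15, 0.5]

print(round(cosine_similarity(speech_a, speech_b), 3))
```

In the paper's setting, tracking how this similarity evolves between successive speeches of the same candidate, or between competitors, gives the temporal and comparative views the tool provides.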

Published: 2018-10-22

VALERIANO, Kid; PAES, Aline; DE OLIVEIRA, Daniel. Analysis of the Evolution of Speeches by Pre-candidates for President through Vector Linguistic Representations. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 6., 2018, São Paulo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 81-88. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2018.27388.