DEBACER: a method for slicing moderated debates

Thomas Palmeira Ferraz; Alexandre Alcoforado; Enzo Bustos; André Seidel Oliveira; Rodrigo Gerber; Naíde Müller; André Corrêa d’Almeida; Bruno Miguel Veloso; Anna Helena Reali Costa

doi:10.5753/eniac.2021.18293

Thomas Palmeira Ferraz USP
Alexandre Alcoforado USP
Enzo Bustos USP
André Seidel Oliveira USP
Rodrigo Gerber USP
Naíde Müller Catholic University of Portugal
André Corrêa d’Almeida Columbia University
Bruno Miguel Veloso Universidade Portucalense / INESC TEC
Anna Helena Reali Costa USP

Abstract

Subjects change frequently in moderated debates with several participants, such as parliamentary sessions, electoral debates, and trials. Partitioning a debate into blocks that each cover a single subject is essential for understanding it. A moderator is often responsible for deciding when a new block begins, so the task of automatically partitioning a moderated debate can focus solely on the moderator's behavior. In this paper, we (i) propose a new algorithm, DEBACER, which partitions moderated debates; (ii) carry out a comparative study between conventional and BERTimbau pipelines; and (iii) validate DEBACER by applying it to the minutes of the Assembly of the Republic of Portugal. Our results show the effectiveness of DEBACER.

Keywords: Language Processing, Political Documents, Spoken Text Processing, Speech Split, Dialogue Partitioning
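
The idea stated in the abstract, cutting a debate only where the moderator opens a new block, can be illustrated with a minimal Python sketch. This is not the DEBACER implementation: the `Utterance` type, the `slice_debate` function, and the `keyword_rule` detector are all hypothetical names introduced here, and the keyword rule is only a stand-in for the trained classifiers (conventional or BERTimbau pipelines) that the paper compares.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    speaker: str  # e.g. "moderator" or a participant's name (assumed labels)
    text: str


def slice_debate(
    utterances: List[Utterance],
    opens_new_block: Callable[[Utterance], bool],
    moderator: str = "moderator",
) -> List[List[Utterance]]:
    """Group utterances into blocks, cutting only at moderator turns
    that the supplied detector flags as opening a new block."""
    blocks: List[List[Utterance]] = []
    for utt in utterances:
        is_boundary = utt.speaker == moderator and opens_new_block(utt)
        # Start a new block at a boundary, or for the very first utterance.
        if is_boundary or not blocks:
            blocks.append([])
        blocks[-1].append(utt)
    return blocks


def keyword_rule(utt: Utterance) -> bool:
    # Hypothetical stand-in for a trained classifier: flags an utterance
    # when it contains one of a few hand-picked transition cues.
    cues = ("next item", "we now move")
    return any(cue in utt.text.lower() for cue in cues)


# Toy debate, invented for illustration only.
debate = [
    Utterance("moderator", "We now move to the budget proposal."),
    Utterance("deputy_a", "The proposal underfunds health."),
    Utterance("moderator", "You have one minute left."),
    Utterance("deputy_b", "I disagree with my colleague."),
    Utterance("moderator", "Next item: the education bill."),
    Utterance("deputy_a", "We support the bill."),
]

blocks = slice_debate(debate, keyword_rule)
block_sizes = [len(block) for block in blocks]
print(block_sizes)
```

Passing the detector as a function keeps the slicing logic independent of how boundaries are recognized, so a learned model could replace `keyword_rule` without changing `slice_debate`. Note that the moderator's time warning does not start a new block, because only flagged moderator turns count as boundaries.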
