Improving Irony Detection by Balancing Methods and Feature Selection

Anthony I. M. Luz; Henrique Santos; Manoel M. P. Medeiros; Rafael T. Anchiêta

doi:10.5753/brasnam.2023.230152

Anthony I. M. Luz IFPI
Henrique Santos UFPI
Manoel M. P. Medeiros IFPI
Rafael T. Anchiêta IFPI

Abstract

Irony is a linguistic phenomenon in which a speaker uses words that say the opposite of what they really mean, often as a joke and with a tone of voice that signals it; it can also be seen as a funny or strange aspect of a situation that differs sharply from what is expected. In text alone, without such vocal cues, detecting irony becomes quite challenging. In this paper, we adopt a three-stage approach (feature extraction, sampling techniques, and feature selection) to detect ironic texts written in Portuguese. We evaluate our strategy on the IDPT corpus and achieve a balanced accuracy of 0.55, outperforming state-of-the-art results. Moreover, we found that both sampling techniques and feature selection may improve the results.
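To make the terms "sampling techniques" and "balanced accuracy" concrete, here is a minimal sketch in plain Python. It is an illustration only: random oversampling stands in for whichever sampling methods the paper actually evaluates, and the function names and toy labels are hypothetical, not taken from the paper or the IDPT corpus.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of smaller classes until every
    class is as large as the largest one (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


# Toy labels (hypothetical): 1 = ironic, 0 = not ironic.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
score = balanced_accuracy(y_true, y_pred)  # (4/4 + 1/2) / 2
```

On imbalanced data, plain accuracy can look high even when the minority class is poorly recognized; averaging per-class recall avoids that, which is presumably why balanced accuracy is the reported metric.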

