Text-to-hashtag Generation using Seq2Seq Learning

Abstract


In this paper, we studied whether models based on BiLSTM and BERT can predict hashtags in Brazilian Portuguese for Ecommerce websites. Hashtags have a sizable financial impact on Ecommerce. We processed a corpus of Ecommerce reviews as inputs and predicted hashtags as outputs. We evaluated the results using four quantitative metrics: NIST, BLEU, METEOR and a crowdsourced score. A word cloud was used as a qualitative metric. While the automatically computed metrics (NIST, BLEU and METEOR) indicated poor results, the crowdsourced evaluation produced high scores. We concluded that the texts predicted by the neural networks are very promising for use as hashtags for products on Ecommerce websites. The code for this work is available at https://github.com/augustocamargo/text-to-hashtag.
Keywords: Hashtags, Ecommerce, BERT, BiLSTM, Portuguese
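
The abstract names NIST, BLEU and METEOR as the automatic metrics. As a rough illustration only, and not the authors' evaluation code, the sketch below scores a hypothetical predicted hashtag text against a hypothetical reference text using NLTK; the example sentences, the n-gram order and the smoothing choice are assumptions.

    # Minimal sketch: scoring a predicted hashtag text against a reference
    # with NIST, BLEU and METEOR via NLTK. Example strings are hypothetical.
    import nltk
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from nltk.translate.nist_score import sentence_nist
    from nltk.translate.meteor_score import meteor_score

    nltk.download("wordnet", quiet=True)   # needed by meteor_score
    nltk.download("omw-1.4", quiet=True)

    # Hypothetical reference (from a review) and a model prediction, tokenized.
    reference = "produto excelente entrega rapida recomendo a todos".split()
    predicted = "produto otimo entrega rapida recomendo muito".split()

    # BLEU with smoothing, since hashtag-length texts are very short.
    bleu = sentence_bleu([reference], predicted,
                         smoothing_function=SmoothingFunction().method1)

    # NIST restricted to 4-grams because the texts have only a few tokens.
    nist = sentence_nist([reference], predicted, n=4)

    # METEOR expects pre-tokenized input in recent NLTK versions.
    meteor = meteor_score([reference], predicted)

    print(f"BLEU:   {bleu:.3f}")
    print(f"NIST:   {nist:.3f}")
    print(f"METEOR: {meteor:.3f}")

On texts this short, n-gram overlap metrics tend to be harsh, which is consistent with the paper's observation that the automatic scores were low while the crowdsourced evaluation rated the generated hashtags highly.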

References

“b2w-reviews01” ([n.d.]). https://bit.ly/3cfPj8u, [accessed on May 30, 2021].

Bahdanau, D., Cho, K. and Bengio, Y. (1 sep 2014). “Neural Machine Translation by Jointly Learning to Align and Translate”. http://arxiv.org/abs/1409.0473. arXiv.

DePaolo, C. A. and Wilkinson, K. (1 may 2014). “Get Your Head into the Clouds: Using Word Clouds for Analyzing Qualitative Assessment Data”. TechTrends, v. 58, n. 3, p. 38–44.

Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (11 oct 2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. http://arxiv.org/abs/1810.04805. arXiv.

Howard, J. and Ruder, S. (18 jan 2018). “Universal Language Model Fine-tuning for Text Classification”. http://arxiv.org/abs/1801.06146. arXiv.

Kaviani, M. and Rahmani, H. (apr 2020). “EmHash: Hashtag Recommendation using Neural Network based on BERT Embedding”. In 2020 6th International Conference on Web Research (ICWR).

Keras Team ([n.d.]). “Keras: the Python deep learning API”. https://keras.io/, [accessed on May 30, 2021].

Li, Y., Liu, T., Hu, J. and Jiang, J. (28 feb 2019). “Topical Co-Attention Networks for hashtag recommendation on microblogs”. Neurocomputing, v. 331, p. 356–365.

Li, Y., Liu, T., Jiang, J. and Zhang, L. (dec 2016). “Hashtag Recommendation with Topical Attention-Based LSTM”. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee.

Liu, Y., Ott, M., Goyal, N., et al. (26 jul 2019). “RoBERTa: A Robustly Optimized BERT Pretraining Approach”. http://arxiv.org/abs/1907.11692. arXiv.

“Max Normalization” ([n.d.]). https://bit.ly/3fZPBkO, [accessed on May 30, 2021].

“Project Jupyter” ([n.d.]). https://jupyter.org/, [accessed on May 30, 2021].

Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I. (2018). “Improving language understanding by generative pre-training”. https://bit.ly/3p6O7cB, [accessed on May 30, 2021].

“TensorFlow” ([n.d.]). https://www.tensorflow.org/, [accessed on May 30, 2021].

Wikipedia contributors (27 nov 2020). “Crowdsourcing”. https://bit.ly/3p4gwjC, [accessed on May 30, 2021].

Wolf, T., Debut, L., Sanh, V., et al. (9 oct 2019). “HuggingFace’s Transformers: State-of-the-art Natural Language Processing”. http://arxiv.org/abs/1910.03771. arXiv.

Wołk, K. and Koržinek, D. (12 jan 2016). “Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking”. http://arxiv.org/abs/1601.02789. arXiv.

Yang, D., Zhu, R. and Li, Y. (1 apr 2019). “Self-Attentive Neural Network for Hashtag Recommendation”. Journal of Engineering Science and Technology Review, v. 12, n. 2, p. 104–110.
Published
18/07/2021
CAMARGO, Augusto; CARVALHO, Wesley; PERESSIM, Felipe; BARZILAY, Alan; FINGER, Marcelo. Text-to-hashtag Generation using Seq2Seq Learning. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 15., 2021, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 121-128. ISSN 2763-8774. DOI: https://doi.org/10.5753/bresci.2021.15797.