Generating E-commerce Product Titles in Portuguese

  • Livy Real (B2W Digital)
  • Karina M. Johansson (UFSCar)
  • Júlio C. S. Mendes (USP)
  • Bianca M. Lopes (UFSCar)
  • Márcio T. I. Oshiro (B2W Digital)


This paper explores how Natural Language Processing (NLP) techniques can be integrated to solve real-world problems in e-commerce. We address the problem of offering customers high-quality product information on a marketplace platform made up of thousands of sellers who produce original content in multiple languages and follow different SEO and cultural assumptions. We propose an NLP pipeline to generate high-quality product titles in Portuguese.
Keywords: Natural Language Processing, e-commerce, Portuguese


REAL, Livy; JOHANSSON, Karina M.; MENDES, Júlio C. S.; LOPES, Bianca M.; OSHIRO, Márcio T. I. Generating E-commerce Product Titles in Portuguese. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 48., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 299-304. ISSN 2595-6205. DOI: