Rating Prediction in Brazilian Portuguese Reviews: An Approach Based on Textual Features
Abstract
This paper investigates rating prediction in user reviews from Amazon written in Brazilian Portuguese, leveraging textual features and machine learning models. We propose and analyze different groups of textual features, with experimental results highlighting the crucial role of lexical information in this task. Model performance varies across product categories, with higher accuracy observed in domains with a more homogeneous vocabulary. As a contribution, this study reinforces the significance of textual representations in automated review analysis and advances the understanding of rating prediction within the context of the Portuguese language.
References
A. Semary, N., Ahmed, W., Amin, K., Pławiak, P., and Hammad, M. (2024). Enhancing machine learning-based sentiment analysis through feature extraction techniques. PLOS ONE, 19(2):e0294968.
Anchiêta, R. T., Neto, F. A. R., Marinho, J. C., do Nascimento, K. V., and Moura, R. S. (2021). Piln idpt 2021: Irony detection in Portuguese texts with superficial features and embeddings. In IberLEF@SEPLN, pages 917–924.
Antonio, N., de Almeida, A. M., Nunes, L., Batista, F., and Ribeiro, R. (2018). Hotel online reviews: creating a multi-source aggregated index. International Journal of Contemporary Hospitality Management, 30(12):3574–3591.
Chai, Y., Lei, C., and Yin, C. (2019). Study on the influencing factors of online learning effect based on decision tree and recursive feature elimination. In Proc. 10th Int. Conf. on E-Education, E-Business, E-Management and E-Learning (IC4E’19), pages 52–57.
Chambua, J. and Niu, Z. (2021). Review text based rating prediction approaches: preference knowledge learning, representation and utilization. Artificial Intelligence Review, 54:1171–1200.
Chenlo, J. M. and Losada, D. E. (2014). An empirical study of sentence features for subjectivity and polarity classification. Information Sciences, 280:275–288.
de Almeida Neto, J. A. and de Melo, T. (2023). Exploring supervised learning models for multi-label text classification in Brazilian restaurant reviews. In Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), pages 126–140. SBC.
de Melo, T. (2021). Sentiprodbr: Building domain-specific sentiment lexicons for the Portuguese language. In Anais do XXXVI Simpósio Brasileiro de Bancos de Dados, pages 349–354. SBC.
de Melo, T., da Silva, A. S., de Moura, E. S., and Calado, P. (2019). Opinionlink: Leveraging user opinions for product catalog enrichment. Information Processing & Management, 56(3):823–843.
de Oliveira, M. and de Melo, T. (2021). An empirical study of text features for identifying subjective sentences in Portuguese. In Intelligent Systems: 10th Brazilian Conference, BRACIS 2021, Virtual Event, November 29–December 3, 2021, Proceedings, Part II 10, pages 374–388. Springer.
Hanić, S., Bagić Babac, M., Gledec, G., and Horvat, M. (2024). Comparing machine learning models for sentiment analysis and rating prediction of vegan and vegetarian restaurant reviews. Computers, 13(10):248.
Hogenboom, A., Bal, D., Frasincar, F., Bal, M., De Jong, F., and Kaymak, U. (2015). Exploiting emoticons in polarity classification of text. Journal of Web Engineering, pages 022–040.
Hossain, M. I., Rahman, M., Ahmed, M. T., Rahman, M. S., and Islam, A. T. (2021). Rating prediction of product reviews of Bangla language using machine learning algorithms. In 2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), pages 1–6. IEEE.
Kang, W.-C., Ni, J., Mehta, N., Sathiamoorthy, M., Hong, L., Chi, E., and Cheng, D. Z. (2023). Do LLMs understand user preferences? Evaluating LLMs on user rating prediction. arXiv preprint arXiv:2305.06474.
Khan, R. A., Mannan, A., and Aslam, N. (2022). Prediction of product rating based on polarized reviews using supervised machine learning. VFAST Transactions on Software Engineering, 10(4):01–09.
Kimura, M. and Katsurai, M. (2018). Investigating the consistency of emoji sentiment lexicons constructed using different languages. In Proc. 20th Int. Conf. on Information Integration and Web-based Applications & Services (iiWAS’18), pages 310–313.
Li, J., Wang, Y., and Tao, Z. (2022a). A rating prediction recommendation model combined with the optimizing allocation for information granularity of attributes. Information, 13(1):21.
Li, S., Liu, F., Zhang, Y., Zhu, B., Zhu, H., and Yu, Z. (2022b). Text mining of user-generated content (UGC) for business applications in e-commerce: A systematic review. Mathematics, 10(19):3554.
Pereira, D. A. (2021). A survey of sentiment analysis in the Portuguese language. Artificial Intelligence Review, 54(2):1087–1115.
Stankevičius, L. and Lukoševičius, M. (2024). Extracting sentence embeddings from pretrained transformer models. Applied Sciences, 14(19):8887.
Published
2025-09-29
How to Cite
MARREIRA, Emanuelle; OLIVEIRA, Miguel de; MELO, Tiago de. Rating Prediction in Brazilian Portuguese Reviews: An Approach Based on Textual Features. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1352-1363. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.11794.
