Rating Prediction in Brazilian Portuguese Reviews: An Approach Based on Textual Features
Abstract
This work explores rating prediction for Amazon user reviews written in Brazilian Portuguese, using textual features and machine learning models. Different groups of textual features were proposed and analyzed, and the experiments indicate that lexical information plays a fundamental role in the task. Model effectiveness varies across product categories and is higher in domains with a more homogeneous vocabulary. As a contribution, this study reinforces the importance of textual representations in the automated analysis of reviews and extends what is known about rating prediction for the Portuguese language.
References
Semary, N. A., Ahmed, W., Amin, K., Pławiak, P., and Hammad, M. (2024). Enhancing machine learning-based sentiment analysis through feature extraction techniques. PLOS ONE, 19(2):e0294968.
Anchiêta, R. T., Neto, F. A. R., Marinho, J. C., do Nascimento, K. V., and Moura, R. S. (2021). PiLN IDPT 2021: Irony detection in Portuguese texts with superficial features and embeddings. In IberLEF@SEPLN, pages 917–924.
Antonio, N., de Almeida, A. M., Nunes, L., Batista, F., and Ribeiro, R. (2018). Hotel online reviews: creating a multi-source aggregated index. International Journal of Contemporary Hospitality Management, 30(12):3574–3591.
Chai, Y., Lei, C., and Yin, C. (2019). Study on the influencing factors of online learning effect based on decision tree and recursive feature elimination. In Proc. 10th Int. Conf. on E-Education, E-Business, E-Management and E-Learning (IC4E’19), pages 52–57.
Chambua, J. and Niu, Z. (2021). Review text based rating prediction approaches: preference knowledge learning, representation and utilization. Artificial Intelligence Review, 54:1171–1200.
Chenlo, J. M. and Losada, D. E. (2014). An empirical study of sentence features for subjectivity and polarity classification. Information Sciences, 280:275–288.
de Almeida Neto, J. A. and de Melo, T. (2023). Exploring supervised learning models for multi-label text classification in Brazilian restaurant reviews. In Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), pages 126–140. SBC.
de Melo, T. (2021). SentiProdBR: Building domain-specific sentiment lexicons for the Portuguese language. In Anais do XXXVI Simpósio Brasileiro de Bancos de Dados, pages 349–354. SBC.
de Melo, T., da Silva, A. S., de Moura, E. S., and Calado, P. (2019). OpinionLink: Leveraging user opinions for product catalog enrichment. Information Processing & Management, 56(3):823–843.
de Oliveira, M. and de Melo, T. (2021). An empirical study of text features for identifying subjective sentences in Portuguese. In Intelligent Systems: 10th Brazilian Conference, BRACIS 2021, Virtual Event, November 29–December 3, 2021, Proceedings, Part II 10, pages 374–388. Springer.
Hanić, S., Bagić Babac, M., Gledec, G., and Horvat, M. (2024). Comparing machine learning models for sentiment analysis and rating prediction of vegan and vegetarian restaurant reviews. Computers, 13(10):248.
Hogenboom, A., Bal, D., Frasincar, F., Bal, M., De Jong, F., and Kaymak, U. (2015). Exploiting emoticons in polarity classification of text. Journal of Web Engineering, pages 022–040.
Hossain, M. I., Rahman, M., Ahmed, M. T., Rahman, M. S., and Islam, A. T. (2021). Rating prediction of product reviews of Bangla language using machine learning algorithms. In 2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), pages 1–6. IEEE.
Kang, W.-C., Ni, J., Mehta, N., Sathiamoorthy, M., Hong, L., Chi, E., and Cheng, D. Z. (2023). Do LLMs understand user preferences? Evaluating LLMs on user rating prediction. arXiv preprint arXiv:2305.06474.
Khan, R. A., Mannan, A., and Aslam, N. (2022). Prediction of product rating based on polarized reviews using supervised machine learning. VFAST Transactions on Software Engineering, 10(4):01–09.
Kimura, M. and Katsurai, M. (2018). Investigating the consistency of emoji sentiment lexicons constructed using different languages. In Proc. 20th Int. Conf. on Information Integration and Web-based Applications & Services (iiWAS’18), pages 310–313.
Li, J., Wang, Y., and Tao, Z. (2022a). A rating prediction recommendation model combined with the optimizing allocation for information granularity of attributes. Information, 13(1):21.
Li, S., Liu, F., Zhang, Y., Zhu, B., Zhu, H., and Yu, Z. (2022b). Text mining of user-generated content (UGC) for business applications in e-commerce: A systematic review. Mathematics, 10(19):3554.
Pereira, D. A. (2021). A survey of sentiment analysis in the Portuguese language. Artificial Intelligence Review, 54(2):1087–1115.
Stankevičius, L. and Lukoševičius, M. (2024). Extracting sentence embeddings from pretrained transformer models. Applied Sciences, 14(19):8887.
Published
29/09/2025
How to Cite
MARREIRA, Emanuelle; OLIVEIRA, Miguel de; MELO, Tiago de. Rating Prediction in Brazilian Portuguese Reviews: An Approach Based on Textual Features. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1352-1363. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.11794.
