SteamBR: a dataset for game reviews and evaluation of a state-of-the-art method for helpfulness prediction

  • Germano A. Z. Jorge USP
  • Thiago A. S. Pardo USP


The digital revolution has led to exponential growth in user-generated content, including ratings and reviews, across numerous online platforms. One such platform is Steam, a multifaceted digital distribution network primarily for video games, which also functions as an active social network. As on many e-commerce, travel, and restaurant platforms, Steam users rely heavily on reviews to inform their purchasing decisions. However, the sheer volume of reviews and their varying quality can limit their usefulness, and it is especially difficult to assess the helpfulness of recent reviews or reviews with few votes. This study proposes a method for automatically evaluating review helpfulness, focusing on game reviews written in Brazilian Portuguese. We collected 2,789,893 reviews from over 12,000 games, creating a novel dataset for game reviews. Using feature extraction techniques, we captured the metadata, semantic elements, and distributional characteristics of the reviews. Machine Learning algorithms were then applied to classification and regression tasks, with the objective of distinguishing helpful from unhelpful reviews. The results showed that the method was highly effective in predicting review helpfulness.
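To make the classification and regression framing concrete, the following is a minimal sketch of how helpfulness targets and a few metadata features could be derived from a single review. It is not the authors' pipeline: the `Review` fields, the vote-ratio definition of helpfulness, the `threshold` and `min_votes` values, and the chosen features are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Review:
    """Hypothetical review record; field names are illustrative, not the SteamBR schema."""
    text: str
    votes_up: int          # votes marking the review as helpful (assumed field)
    votes_total: int       # all helpfulness votes received (assumed field)
    playtime_hours: float  # reviewer playtime (assumed field)


def helpfulness_score(review: Review) -> Optional[float]:
    """Regression target (assumption): share of helpful votes, None if no votes."""
    if review.votes_total == 0:
        return None
    return review.votes_up / review.votes_total


def helpfulness_label(review: Review, threshold: float = 0.5,
                      min_votes: int = 5) -> Optional[bool]:
    """Classification target (assumption): helpful if the score reaches `threshold`.

    Reviews with fewer than `min_votes` votes are left unlabeled (None),
    since their vote ratio is too noisy to trust.
    """
    if review.votes_total < min_votes:
        return None
    score = helpfulness_score(review)
    if score is None:
        return None
    return score >= threshold


def metadata_features(review: Review) -> Dict[str, float]:
    """A few simple metadata features computed from the raw review."""
    tokens = review.text.split()
    avg_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return {
        "n_chars": float(len(review.text)),
        "n_tokens": float(len(tokens)),
        "avg_token_len": avg_len,
        "playtime_hours": review.playtime_hours,
    }


if __name__ == "__main__":
    example = Review("Jogo muito bom, recomendo", votes_up=8,
                     votes_total=10, playtime_hours=42.5)
    print(helpfulness_score(example), helpfulness_label(example))
    print(metadata_features(example))
```

Leaving low-vote reviews unlabeled reflects the difficulty noted above with recent or less-voted reviews: these are the cases where a trained model would be most useful, because the vote ratio alone says little.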


JORGE, Germano A. Z.; PARDO, Thiago A. S. SteamBR: a dataset for game reviews and evaluation of a state-of-the-art method for helpfulness prediction. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 12., 2023, João Pessoa/PB. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 210-215. ISSN 2595-6094. DOI:
