MQD-1222: A Brazilian Portuguese Sentiment Analysis Dataset with Gender-Paired Annotations
Abstract
This paper presents MQD-1222, a publicly available Brazilian Portuguese sentiment analysis dataset composed of 1,222 texts from Meu Querido Diário, annotated under a gender-paired protocol. Each text was annotated by four male participants and four female participants, who assigned one of three sentiment labels: negative, neutral, or positive. In addition to the dataset with majority labels for each group, the study provides all 11,704 individual annotations and their response times. Agreement between the two groups fell within the ‘substantial’ range (κ = 0.7664), with matching labels in 84.5% of the instances. In cases of disagreement, an asymmetrical pattern was observed: in 63.5% of them, the female group softened polar judgments toward the neutral class.
References
Al Kuwatly, H., Wich, M., and Groh, G. (2020). Identifying and measuring annotator bias based on annotators’ demographic characteristics. In Akiwowo, S., Vidgen, B., Prabhakaran, V., and Waseem, Z., editors, Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 184–190, Online. Association for Computational Linguistics.
Aroyo, L. and Welty, C. (2015). Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine, 36(1):15–24.
Azevedo, G. d., Pettine, G., Feder, F., Portugal, G., Schocair Mendes, C. O., Castaneda Ribeiro, R., Mauro, R. C., Paschoal Júnior, F., and Guedes, G. (2021). Nat: Towards an emotional agent. In 2021 16th Iberian Conference on Information Systems and Technologies (CISTI), pages 1–4.
Biester, L., Sharma, V., Kazemi, A., Deng, N., Wilson, S., and Mihalcea, R. (2022). Analyzing the effects of annotator gender across 4 NLP tasks. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 10–19. European Language Resources Association.
Brum, H. and Volpe Nunes, M. d. G. (2018). Building a sentiment corpus of tweets in Brazilian Portuguese. In Calzolari, N., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S., and Tokunaga, T., editors, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Ding, Y., You, J., Machulla, T.-K., Jacobs, J., Sen, P., and Höllerer, T. (2022). Impact of annotator demographics on sentiment dataset labeling. Proc. ACM Hum.-Comput. Interact., 6(CSCW2).
dos Santos Silva, L. N., Zandavalle, A. C., Rodrigues, C. F. G., da Silva Gama, T., Souza, F. G., Zaidan, P. D. S., da Silva, A. F. S., Soares, K., and Real, L. (2024). RePro: A benchmark dataset for opinion mining in Brazilian Portuguese. In Gamallo, P., Claro, D., Teixeira, A., Real, L., Garcia, M., Oliveira, H. G., and Amaro, R., editors, Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 432–440, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.
Frenda, S., Basile, V., Caselli, T., and Patti, V. (2024). Perspectivist approaches to natural language processing: A survey. Language Resources and Evaluation.
Geva, M., Goldberg, Y., and Berant, J. (2019). Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP), pages 1161–1166, Hong Kong, China. Association for Computational Linguistics.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
Mostafazadeh Davani, A., Díaz, M., and Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10:92–110.
Nascimento, G., Duarte, F., and Guedes, G. P. (2018). Emoções em português do brasil: um conjunto de dados e resultados de base. In Anais do VII Brazilian Workshop on Social Network Analysis and Mining, pages 223–228, Porto Alegre, RS, Brasil. SBC.
Pei, J. and Jurgens, D. (2023). When do annotator demographics matter? Measuring the influence of annotator demographics with the POPQUORN dataset. In Prange, J. and Friedrich, A., editors, Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), pages 252–265, Toronto, Canada. Association for Computational Linguistics.
Piorino, A. et al. (2025). Sentiment analysis of shared content in Brazilian Reddit communities. Journal on Interactive Systems.
Prabhakaran, V., Mostafazadeh Davani, A., and Diaz, M. (2021). On releasing annotator-level labels and information in datasets. In Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, pages 133–138. Association for Computational Linguistics.
Saha, K., Yousuf, A., Hickman, L., Gupta, P., Tay, L., and De Choudhury, M. (2021). A social media study on demographic differences in perceived job satisfaction. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):167.
Zehr, J. and Schwarz, F. (2018). PennController for Internet Based Experiments (IBEX). OSF.
Published
2026-07-19
How to Cite
FEITOSA, Alexander; FASANO, André; GUEDES, Gustavo. MQD-1222: A Brazilian Portuguese Sentiment Analysis Dataset with Gender-Paired Annotations. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 15., 2026, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 270-276. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2026.23296.
