Gender Representation in Literature: Analysis of Characters' Physical Descriptions

  • Mariana O. Silva, Universidade Federal de Minas Gerais
  • Luiza de Melo-Gomes, Universidade Federal de Minas Gerais
  • Mirella M. Moro, Universidade Federal de Minas Gerais


This study employs Natural Language Processing (NLP) techniques to quantitatively analyze descriptions of male and female body parts in Portuguese literature. Examining a corpus of literary works, we investigate the frequency, specificity, and objectification of these descriptions. The results indicate distinct differences in how male and female bodies are portrayed, revealing evidence of gender bias in the descriptors chosen for specific body parts. This research contributes to the ongoing discourse on gender representation in literature by shedding light on potential biases in textual descriptions. It also shows how NLP techniques can uncover patterns in literary texts, offering insights for data mining. Through this analysis, we deepen the understanding of gender dynamics in literary works and encourage critical discussion of representation in literature.
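The abstract does not spell out the pipeline, so the following is only a minimal sketch of the kind of frequency-based comparison it describes, not the authors' method. It assumes that body-part mentions have already been extracted (for example, by a parser linking an adjective to a body-part noun belonging to a character) into hypothetical `(gender, body_part, descriptor)` triples. It computes each gender's relative frequency of body parts and a smoothed log-ratio indicating which descriptors skew toward one gender. The input format, the function names, and the add-alpha smoothing are all assumptions; specificity and objectification measures are not covered.

```python
from collections import Counter
from math import log

# Hypothetical input: (gender, body_part, descriptor) triples assumed to come
# from an upstream extraction step that is not described in the abstract.
Mention = tuple[str, str, str]


def body_part_frequencies(mentions: list[Mention]) -> dict[str, dict[str, float]]:
    """Relative frequency of each body part within each gender's mentions."""
    counts: dict[str, Counter] = {}
    for gender, part, _ in mentions:
        counts.setdefault(gender, Counter())[part] += 1
    freqs: dict[str, dict[str, float]] = {}
    for gender, c in counts.items():
        total = sum(c.values())
        freqs[gender] = {part: n / total for part, n in c.items()}
    return freqs


def descriptor_log_ratio(
    mentions: list[Mention], group_a: str, group_b: str, alpha: float = 0.5
) -> dict[str, float]:
    """Smoothed log-ratio of each descriptor's probability in group_a vs group_b.

    Positive scores mean the descriptor is relatively more frequent for group_a,
    negative scores for group_b. Add-alpha smoothing keeps descriptors that a
    group never uses from producing log(0).
    """
    ca = Counter(d for g, _, d in mentions if g == group_a)
    cb = Counter(d for g, _, d in mentions if g == group_b)
    vocab = set(ca) | set(cb)
    # Denominators include alpha once per vocabulary item (smoothed totals).
    na = sum(ca.values()) + alpha * len(vocab)
    nb = sum(cb.values()) + alpha * len(vocab)
    return {d: log((ca[d] + alpha) / na) - log((cb[d] + alpha) / nb) for d in vocab}
```

Ranking descriptors by this score, separately for each body part, is one simple way to surface the kind of skew in descriptor choice that the abstract reports; a real study would also need to control for corpus size and character counts.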
Keywords: data mining, natural language processing, gender representation, Portuguese literature

SILVA, Mariana O.; MELO-GOMES, Luiza de; MORO, Mirella M. Gender Representation in Literature: Analysis of Characters' Physical Descriptions. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 17-24. ISSN 2763-8944. DOI: