TY  - JOUR
AU  - P. Braga, M. Luisa
AU  - G. Nakamura, Fabíola
AU  - F. Nakamura, Eduardo
PY  - 2021/08/20
Y2  - 2024/03/29
TI  - Creation and Characterization of a Sexist Discourse Corpus in Portuguese
JF  - iSys - Brazilian Journal of Information Systems
JA  - iSys
VL  - 14
IS  - 2
SE  - Extended versions of selected articles
DO  - 10.5753/isys.2021.1797
UR  - https://sol.sbc.org.br/journals/index.php/isys/article/view/1797
SP  - 79-95
AB  - Sexism is a topic whose social interest has grown as the female figure overcomes barriers of gender inequality. Sexist discourse propagates and encourages derogatory and abusive behavior against women. Accurate characterization and identification are key for treating and mitigating violence. In this work, we present a corpus of sexist discourse in Portuguese collected from news portals of great popular acceptance. The paper presents three main contributions: (1) the process of creating the corpus and labeling comments (sexist / non-sexist); (2) the characterization and analysis of the corpus and the behavior of anonymous labelers; (3) an initial assessment of machine learning techniques for classifying sexist / non-sexist comments. Preliminary results show that, when using support vector machine, it is possible to identify sexist comments with an F1 measure above 0.8, precision above 0.9 and recall close to 0.8.
ER  - 