A BERT-Based Approach for Gender Inference from Place Reviews: Applications to Urban Representativeness
Abstract
User-generated content on location-based platforms provides valuable information about urban behavior, but the lack of explicit demographic information limits analyses of social representativeness. In particular, understanding gender differences in the use of urban space remains a challenge because structured user attributes are absent. In this work, we investigate the use of natural language processing techniques to infer binary gender labels from textual reviews on Google Places. We evaluate two Transformer-based approaches for gender classification from review text: a fine-tuned BERT classifier and a BERT-based model augmented with linguistic features. Experiments on a large-scale dataset of place reviews show that the augmented BERT model achieves high performance, reaching an average F1 score of 0.95. Beyond predictive performance, we explore how the inferred proxy gender labels can support the analysis of urban representativeness. Using New York City as a case study, we analyze the spatial distribution of gender imbalance across ZIP codes and assess the extent to which these patterns align with an external benchmark (Foursquare). These findings highlight both the potential and the limitations of using inferred demographic attributes to study urban representativeness. Our results demonstrate that contextual language models can support demographic inference in location-based social data, enabling new perspectives on urban behavior while raising important considerations about bias, uncertainty, and representativeness.
References
Alekseev, A. and Nikolenko, S. (2017). Word embeddings for user profiling in online social networks. Computacion y Sistemas, 21(2):203–226.
Aletras, N. and Chamberlain, B. P. (2018). Predicting Twitter User Socioeconomic Attributes with Network and Language Information. In Proceedings of the 29th on Hypertext and Social Media, pages 20–24, New York, NY, USA. ACM.
Bernabeu-Bautista, A., Serrano-Estrada, L., Perez-Sanchez, V. R., and Marti, P. (2021). The geography of social media data in urban areas: Representativeness and complementarity. ISPRS International Journal of Geo-Information, 10(11).
Blank, G. and Lutz, C. (2017). Representativeness of social media in Great Britain: Investigating Facebook, LinkedIn, Twitter, Pinterest, Google+, and Instagram. American Behavioral Scientist, 61(7):741–756.
de Souza e Silva, A. (2007). Cell phones and places: The use of mobile technologies in Brazil. In Societies and cities in the age of instant access, pages 295–310. Springer.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, volume 1, pages 4171–4186.
Eke, C. I., Norman, A. A., Shuib, L., and Nweke, H. F. (2019). A survey of user profiling: State-of-the-art, challenges, and solutions. IEEE Access, 7:144907–144924.
Gardazi, N. M., Daud, A., Malik, M. K., Bukhari, A., Alsahfi, T., and Alshemaimri, B. (2025). BERT applications in natural language processing: a review. Artificial Intelligence Review, 58(6):166.
Gubert, F. R., Santos, G. H., Delgado, M., Silver, D., and Silva, T. H. (2024). Culture Fingerprint: Identification of Culturally Similar Urban Areas Using Google Places Data. In Proc. of ASONAM, Rende, Calabria, Italy.
Gómez, J.-C., Moreno, J., Manzano, M. A. I., and Ojeda, D. L. A. (2023). Reconstructive classification for age and gender identification in social networks. IEEE Trans. on Comput. Social Sys., 11(2):2291–2301.
Hargittai, E. (2020). Potential biases in big data: Omitted voices on social media. Social Science Computer Review, 38(1):10–24.
He, R., Kang, W.-C., and McAuley, J. (2017). Translation-based recommendation. In Proc. of RecSys, pages 161–169, New York, NY, USA. Association for Computing Machinery.
Himdi, H. and Shaalan, K. (2024). Advancing author gender identification in Modern Standard Arabic with innovative deep learning and textual feature techniques. Information, 15(12).
Ikae, C. and Savoy, J. (2022). Gender identification on Twitter. Journal of the Association for Information Science and Technology, 73(1):58–69.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Mueller, W., Silva, T. H., Almeida, J. M., and Loureiro, A. A. (2017). Gender matters! analyzing global cultural gender preferences for venues using social sensing. EPJ Data Science, 6(1):5.
Muhammad, R., Zhao, Y., and Liu, F. (2019). Spatiotemporal analysis to observe gender based check-in behavior by using social media big data: A case study of Guangzhou, China. Sustainability, 11(10).
O’Connor, K., Golder, S., Weissenbacher, D., Klein, A. Z., Magge, A., and Gonzalez-Hernandez, G. (2024). Methods and annotated data sets used to predict the gender and age of Twitter users: Scoping review. Journal of Medical Internet Research, 26:e47923.
Pasricha, R. and McAuley, J. (2018). Translation-based factorization machines for sequential recommendation. In Proc. of RecSys, pages 63–71, New York, NY, USA. Association for Computing Machinery.
Rogers, D., Preece, A., Innes, M., and Spasić, I. (2022). Real-time text classification of user-generated content on social media: Systematic review. IEEE Transactions on Computational Social Systems, 9(4):1154–1166.
Sanderson, R., Franklin, R., MacKinnon, D., et al. (2024). Left out and invisible?: Exploring social media representation of ‘left behind places’. GeoJournal, 89:37.
Santos, G., Gubert, F., Delgado, M., and Silva, T. (2024). Redes de interesse: comparando o Google Places e Foursquare na captura da escolha de usuários por áreas urbanas. In Proc. of CoUrb, pages 99–112, Niterói/RJ. SBC.
Sarwar, R., An Ha, L., Teh, P. S., Sabah, F., Nawaz, R., Hameed, I. A., and Hassan, M. U. (2024). AGI-P: A gender identification framework for authorship analysis using customized fine-tuning of multilingual language model. IEEE Access, 12:15399–15409.
Silva, T. H. and Silver, D. (2025). Using graph neural networks to predict local culture. Environment and Planning B: Urban Analytics and City Science, 52(2):355–376.
Thome, B., Hertweck, F., and Conrad, S. (2025). Predicting perceived text complexity: The role of person-related features in profile-based models. Journal of Educational Data Mining, 17(1):276–307.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, pages 5998–6008.
Yuan, Y., Wei, G., and Lu, Y. (2018). Evaluating gender representativeness of location-based social media: a case study of Weibo. Annals of GIS, 24(3):163–176.
Published
May 25, 2026
How to Cite
ABATE, Jemal; PEIXOTO, Felipe; BALD, João; DELGADO, Myriam; SILVA, Thiago H. A BERT-Based Approach for Gender Inference from Place Reviews: Applications to Urban Representativeness. In: WORKSHOP DE COMPUTAÇÃO URBANA (COURB), 10., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1-14. ISSN 2595-2706. DOI: https://doi.org/10.5753/courb.2026.24116.
