Identifying urban occurrences for cities without open data using web scraping, local news and social media: a case study in Bauru, SP
Abstract
Smart cities rely on technologies and digital platforms to turn data into a better quality of life for citizens. When data on urban occurrences is lacking, problems are harder to identify and effective public policies are harder to formulate. The city of Bauru, for example, faces challenges in adopting data-driven urban management initiatives. This work proposes a methodology that uses web scraping to collect content from local information sources, such as local news and social media, and identify urban occurrences. These occurrences are categorized and geolocated, enabling spatiotemporal analysis through maps and charts. The methodology supports data-driven decision-making and can be applied to any city that has local information sources. The case study in Bauru reveals a large number of occurrences related to infrastructure and crime.
References
Abernathy, P. M. (2020). Will local news survive? News deserts and ghost newspapers. Technical report, University of North Carolina.
Agonafir, C., Pabon, A. R., Lakhankar, T., Khanbilvardi, R., and Devineni, N. (2022). Understanding New York City street flooding through 311 complaints. Journal of Hydrology, 605:127300.
Anantharam, P., Barnaghi, P., Thirunarayan, K., and Sheth, A. (2015). Extracting city traffic events from social streams. ACM Transactions on Intelligent Systems and Technology, 6(4).
Boeing, G. and Waddell, P. (2017). New insights into rental housing markets across the United States: Web scraping and analyzing Craigslist rental listings. Journal of Planning Education and Research, 37(4):457–476.
Bolta, V. and Hassani, M. (2023). Using human mobility patterns to forecast outliers in citizen complaints data. In Proceedings of the 2023 IEEE International Conference on Big Data, BigData 2023, pages 5166–5175.
Bondielli, A., Ducange, P., and Marcelloni, F. (2020). Exploiting categorization of online news for profiling city areas. In Proceedings of the 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2020, pages 1–8.
Brasil (2011). Lei nº 12.527, de 18 de novembro de 2011 – institui a Lei de Acesso à Informação.
D’Andrea, E., Ducange, P., Loffreno, D., Marcelloni, F., and Zaccone, T. (2018). Smart profiling of city areas based on web data. In Proceedings of the 2018 IEEE International Conference on Smart Computing, SMARTCOMP 2018, pages 226–233.
Dias, R. S., Cioni, J. C., Kaiser, I. M., Peixoto, A. S. P., and Manzato, G. G. (2015). Cadastramento de informações urbanas do município de Bauru-SP utilizando sistemas de informação geográfica. In Anais do 8º Congresso de Extensão Universitária da UNESP. Universidade Estadual Paulista.
Dongo, I., Cardinale, Y., Aguilera, A., Martínez, F., Quintero, Y., and Barrios, S. (2021). Web scraping versus Twitter API: A comparison for a credibility analysis. In Proceedings of the 22nd International Conference on Information Integration and Web-Based Applications & Services, iiWAS ’20, pages 263–273, New York, NY, USA.
Eisenstein, J. (2018). Introduction to Natural Language Processing (NLP). MIT Press.
Eshleman, R. and Yang, H. (2014). “Hey 311, come clean my street!”: A spatio-temporal sentiment analysis of Twitter data and 311 civil complaints. In Proceedings of the 2014 IEEE 14th International Conference on Big Data and Cloud Computing, BDCloud 2014, pages 477–484.
Gao, S., Janowicz, K., and Couclelis, H. (2017). Extracting urban functional regions from points of interest and human activities on location-based social networks. Transactions in GIS, 21(3):446–467.
Gulinelli, É. L. (2016). O saneamento e as águas de Bauru: uma perspectiva histórica (1896-1940). Master’s thesis, Universidade Estadual Paulista (Unesp).
Harrison, C., Eckman, B., Hamilton, R., Hartswick, P., Kalagnanam, J., Paraszczak, J., and Williams, P. (2010). Foundations for smarter cities. IBM Journal of Research and Development, 54(4):1–16.
He, J., Zhang, W., and Yang, M. (2024). The spatial and temporal characteristics of urban public safety under the residents’ complaints: Evidence from 12345 data in Beijing, China. Journal of Urban Management, 13(2):217–231.
Hong, Z., Wang, H., Lyu, W., Wang, H., Liu, Y., Wang, G., He, T., and Zhang, D. (2023). Urban-scale POI updating with crowd intelligence. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, pages 4631–4638.
Jiao, Y., Li, C., Yao, Z., Weng, C., Lian, A., and Dong, R. (2024). How can online citizen complaints provide solutions to refine environmental management: A spatio-temporal perspective. Information Processing & Management, 61(2):103611.
Krause, A. B. P. (2020). Intervenções públicas em ocupações irregulares: um estudo de caso sobre a Favela Ferradura na cidade de Bauru. Projectare: Revista de Arquitetura e Urbanismo, 1(10).
Laudon, K. C. and Laudon, J. P. (2022). Management information systems: Managing the digital firm – 17th ed. Pearson Education.
Lee, W., Gross, K. J., Yong, C., Chelmis, C., and Zois, D.-S. (2025). Who reaps the benefits of smart management of neighborhood complaints? Impact of online participatory forums on neighborhood equity. Cities, 158:105716.
Macedo, E. T. d., Salles, M. C. T., Nunes, E. R., Martins, M. d. F., and Ribeiro, R. O. (2018). Problemas urbanos que interferem na (in)sustentabilidade de cidades: um estudo no município de Serra Redonda - PB. Revista Brasileira de Planejamento e Desenvolvimento, 7(3):1–24.
Magagnin, R. C. and da Silva, A. N. R. (2008). Reflexos da dependência do transporte motorizado individual em cidades brasileiras de médio porte: a questão da mobilidade no município de Bauru. Olhares sobre Bauru, 1:159–170.
Mitchell, R. (2018). Web Scraping with Python. O’Reilly Media, Sebastopol, CA.
Osorio-Arjona, J., Horak, J., Svoboda, R., and García-Ruíz, Y. (2021). Social media semantic perceptions on Madrid Metro system: Using Twitter data to link complaints to space. Sustainable Cities and Society, 64:102530.
Páez, A. and Boisjoly, G. (2022). Exploratory Data Analysis, pages 25–64. Springer International Publishing, Cham.
Ramos, F. J. d. C. (2020). Indicadores socioeconômicos locais para a cidade de Bauru: um diagnóstico sob a ótica da competência em informação e midiática. Master’s thesis, Universidade Estadual Paulista (Unesp).
Sta, H. B. (2017). Quality and the efficiency of data in “smart-cities”. Future Generation Computer Systems, 74:409–416.
Taveira, M. F., Mariano, R. d. A., Trinta, P. Q., Bento, M. B. C., and Rocha, C. (2026). Resiliência urbana em grandes eventos: a atuação do COR-Rio no G20. Revista de Administração Pública, 60:e2025-0579.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, 1st edition.
United Nations (2025). World urbanization prospects 2025: Summary of results. UN DESA/POP/2025/TR/ NO. 12, Department of Economic and Social Affairs – Population Division, New York.
Vincenzi, A. M. R., Delamaro, M. E., Dias Neto, A. C., Fabbri, S. C. P. F., Jino, M., and Maldonado, J. C. (2018). Automatização de teste de software com ferramentas de software livre. Elsevier.
Published
25 May 2026
How to Cite
FERREIRA, Felipe Augusto; SOUZA, Higor Amario de. Identifying urban occurrences for cities without open data using web scraping, local news and social media: a case study in Bauru, SP. In: WORKSHOP DE COMPUTAÇÃO URBANA (COURB), 10., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 127-140. ISSN 2595-2706. DOI: https://doi.org/10.5753/courb.2026.23144.
