Use of Bots for Automatic Collection of Public Data using Web Crawling and Web Scraping Techniques

Abstract


The Escola Virtual.Gov (EV.G) receives resources from partner institutions to provide the courses they require. To promote active transparency, and in compliance with the Lei de Acesso à Informação (the Brazilian Access to Information Law), accountability for how these resources are applied must be available to the general public. EV.G manages the application of these resources through its own system, which is currently fed manually. To address this need, this paper simplifies the process of updating the Portal em Números by automating the manual data-entry activities that EV.G performs today and publishing the information obtained to the portal's data source.

Keywords: Escola Virtual.Gov, Lei de Acesso à Informação, Web Crawling, Web Scraping

Published
2020-06-30
GALDINO, Igor Martins; GALLINDO, Erica de Lima; MOREIRA, Mário W. L. Use of Bots for Automatic Collection of Public Data using Web Crawling and Web Scraping Techniques. In: LATIN AMERICAN SYMPOSIUM ON DIGITAL GOVERNMENT (LASDIGOV), 8., 2020, Cuiabá. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 172-179. ISSN 2763-8723. DOI: https://doi.org/10.5753/wcge.2020.11269.