Use of Bots for Automatic Collection of Public Data using Web Crawling and Web Scraping Techniques
Abstract
The Escola Virtual.Gov (EV.G) receives resources from partner institutions to provide the courses those institutions require. To promote active transparency and comply with the Lei de Acesso à Informação (the Brazilian Access to Information Law), the accounting of how these resources are applied must be available to the general public. EV.G manages the application of these resources through its own system, which is currently fed manually. To address this need, this paper simplifies the process of updating the Portal em Números by automating the manual data-entry activities that EV.G performs today and publishing the collected information to the portal's data source.
