A Dataset Enriched with Web-Data for Georeferencing Applications

  • Clovis S. Junior, Federal University of Rondonópolis (UFR)
  • Carina F. Dorneles, Federal University of Santa Catarina (UFSC)

Abstract


Agricultural and environmental applications depend heavily on georeferenced data. Acquiring this type of data is costly, requiring both dedicated hardware and specialized personnel. Extracting data from the Web is a viable alternative for building datasets that meet this demand: public Web repositories can be used to create or complement datasets in the agricultural and environmental domains, whether for delimiting agricultural areas or for identifying and monitoring environmental areas. This paper proposes an approach for extracting geo-coordinates from public repositories on the Web to create a dataset for agricultural and environmental use.

Keywords: georeferencing, data extraction, environment, agriculture, dataset

Published
2022-09-19
S. JUNIOR, Clovis; DORNELES, Carina F. A Dataset Enriched with Web-Data for Georeferencing Applications. In: DATASET SHOWCASE WORKSHOP (DSW), 4., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 57-67. DOI: https://doi.org/10.5753/dsw.2022.226242.