Automated Integration and Labeling of Data on the Cnidarian Physalia physalis, Using Geolocation as a Reference

  • Lisiane Reips, Universidade Federal do Paraná
  • Carmem Satie Hara, Universidade Federal do Paraná

Abstract

Machine learning classification techniques have been applied effectively to text and image recognition. In every application, however, the model must be trained and tested on data. To achieve good classification performance, these data must be reliably labeled, which makes the process expensive and time-consuming. In this paper, we propose an approach to reduce the cost of manually labeling a database of Portuguese man-of-war (Physalia physalis) sightings on Brazilian beaches. The technique integrates Instagram posts with newspaper articles according to their temporal and spatial proximity. The ultimate goal is to use the labeled data to train a machine learning classifier.

Keywords: Portuguese man-of-war, data integration, data labeling, social networks, geolocation
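
The integration step described in the abstract, matching Instagram posts to newspaper articles by temporal and spatial proximity, could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Record` type, the `label_posts` function, and the thresholds of 50 km and 3 days are all assumptions made for the example, and the abstract does not state which distance measure or thresholds the paper uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal record: a timestamp plus a geolocation."""
    when: datetime
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Record, b: Record) -> float:
    """Great-circle distance between two records, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def label_posts(posts, articles, max_km=50.0, max_gap=timedelta(days=3)):
    """Return one bool per post: True if some article is close in time AND space.

    Thresholds are illustrative assumptions. False only means that no
    supporting article was found, not that the post is a confirmed negative.
    """
    labels = []
    for post in posts:
        supported = any(
            abs(post.when - art.when) <= max_gap and haversine_km(post, art) <= max_km
            for art in articles
        )
        labels.append(supported)
    return labels
```

Under these assumptions, a post is labeled as supported only when at least one article falls inside both the time window and the distance radius. Posts without a match remain candidates for manual review rather than confirmed negatives.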

Published
19/09/2022
REIPS, Lisiane; HARA, Carmem Satie. Integração e Rotulação Automatizada de Dados sobre o Cnidário Physalia physalis, usando a Geolocalização como Referência. In: WORKSHOP DE TESES E DISSERTAÇÕES (WTDBD) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 105-111. DOI: https://doi.org/10.5753/sbbd_estendido.2022.21851.