Integration and Automated Labeling of Data on the Cnidarian Physalia physalis Using Geolocation as a Reference
Abstract
Classification techniques in machine learning have been applied effectively to text and image recognition. In every application, however, models must be trained and tested on data, and good classification performance requires that these data be reliably labeled, which makes the process expensive and time-consuming. In this paper, we propose an approach to reduce the cost of manually labeling a database of Portuguese man-of-war (Physalia physalis) sightings on Brazilian beaches. The technique integrates Instagram posts with newspaper articles according to their temporal and spatial proximity. The ultimate goal is to use the resulting labeled data to train a machine learning classifier.
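The abstract does not state the exact matching rule, so the following is only a minimal sketch of spatio-temporal matching under stated assumptions: each Instagram post and each newspaper article is assumed to already carry a latitude, a longitude and a date; distance is measured with the haversine formula; and a post is labeled as a sighting when at least one article falls within a hypothetical distance threshold (`max_km`) and time window (`max_days`). The `Record` schema, the function names, the threshold values and the sample records are all illustrative assumptions, not the authors' implementation or data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Optional

@dataclass
class Record:
    """A geotagged, dated item: an Instagram post or a newspaper article (hypothetical schema)."""
    item_id: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees
    when: datetime

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

def label_posts(
    posts: List[Record],
    articles: List[Record],
    max_km: float = 20.0,  # hypothetical spatial threshold
    max_days: int = 3,     # hypothetical temporal window
) -> Dict[str, Optional[int]]:
    """Label a post 1 (sighting) if any article is near it in both space and time.

    Posts with no matching article get None (left unlabeled), not 0.
    """
    window = timedelta(days=max_days)
    labels: Dict[str, Optional[int]] = {}
    for post in posts:
        matched = any(
            abs(post.when - art.when) <= window
            and haversine_km(post.lat, post.lon, art.lat, art.lon) <= max_km
            for art in articles
        )
        labels[post.item_id] = 1 if matched else None
    return labels

# Illustrative records with made-up coordinates and dates (not data from the paper).
articles = [Record("a1", -8.12, -34.90, datetime(2020, 1, 10))]
posts = [
    Record("p1", -8.05, -34.90, datetime(2020, 1, 11)),   # about 8 km away, one day later
    Record("p2", -12.97, -38.50, datetime(2020, 1, 11)),  # hundreds of km away
    Record("p3", -8.06, -34.89, datetime(2020, 2, 20)),   # nearby but weeks later
]
print(label_posts(posts, articles))  # {'p1': 1, 'p2': None, 'p3': None}
```

In this sketch, unmatched posts are left unlabeled rather than marked as negatives, since the absence of a nearby article does not by itself show that a post is not a sighting. Tightening `max_km` or `max_days` would likely trade label coverage for precision.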