Using a labeling function for automatic classification of agribusiness news: A weak supervisory approach
Abstract
The large volume of news generated on the internet has increased the use of machine learning applications. Predictive models need labeled samples in large quantity and of high quality to ensure good accuracy in classification tasks. However, labeling news is a manual task that demands time from domain experts. In this work, a function is proposed to label agribusiness news. Oscillations in soybean price series in the domestic and international markets, together with the US dollar exchange rate, are the input to the labeling function. Different learning paradigms and text representations are used in the evaluation stage. Neural language models showed the best performance, and the results indicate that the proposal can be an alternative for real-time applications.
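To make the idea of a price-driven labeling function concrete, the following is a minimal sketch in Python. It is not the function proposed in this work, whose exact rule is not given in the abstract. The three-class label set, the 0.5% threshold, the majority-vote combination, and all names (`pct_change`, `direction`, `label_news`) are assumptions made only for illustration.

```python
from typing import Tuple

# Hypothetical label set; the abstract does not state the actual classes.
POSITIVE, NEGATIVE, NEUTRAL = 1, -1, 0

Pair = Tuple[float, float]  # (previous value, current value) of one series


def pct_change(previous: float, current: float) -> float:
    """Relative change between two consecutive observations of a series."""
    return (current - previous) / previous


def direction(change: float, threshold: float) -> int:
    """Map a relative change to +1 (up), -1 (down), or 0 (within threshold)."""
    if change > threshold:
        return 1
    if change < -threshold:
        return -1
    return 0


def label_news(domestic: Pair, international: Pair, usd_rate: Pair,
               threshold: float = 0.005) -> int:
    """Assign a weak label to a news item from the price movement on its date.

    Each series casts one vote for its direction; the sign of the vote sum
    decides the label. Both the threshold and the voting rule are assumed.
    """
    votes = [direction(pct_change(prev, cur), threshold)
             for prev, cur in (domestic, international, usd_rate)]
    total = sum(votes)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return NEUTRAL
```

Under these assumptions, a news item published on a day when both soybean series rise and the exchange rate is flat, for example `label_news((100.0, 102.0), (50.0, 50.5), (5.0, 5.0))`, would receive the positive label, with no manual annotation involved.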