A Proposal for Reducing the Training Set Using Active Learning

  • Maicon Brandão (UFFS)
  • Marcelo Acordi (UFFS)
  • Guilherme Dal Bianco (UFFS)

Abstract


Supervised methods are widely used for tasks such as classification, but they depend on a labeled training set that represents the patterns in the data. Labeling is costly, and that cost can be reduced by labeling only informative and representative instances. Active learning addresses this by selecting the most informative instances to be labeled, which reduces the training set size. This paper proposes weights for an active learning algorithm in order to reduce the number of labeled instances. Specifically, the weights are intended to reduce the impact of class imbalance on the active learning method. Preliminary experiments indicate that the labeled set can be made smaller without harming the method's effectiveness.
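The abstract does not specify how the weights are computed or which active learning algorithm is used, so the following is only a minimal sketch of the general idea, under stated assumptions: a binary classifier that outputs a probability for the positive class, uncertainty sampling as the query strategy, and inverse-frequency class weights. The function names (`class_weights`, `weighted_uncertainty`, `select_queries`) and the weighting formula are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


def weighted_uncertainty(prob_pos, weights):
    """Uncertainty of a binary prediction, scaled by the weight of the predicted class."""
    predicted = 1 if prob_pos >= 0.5 else 0
    # 1.0 when the model is undecided (p = 0.5), 0.0 when it is certain (p = 0 or 1).
    uncertainty = 1.0 - abs(2.0 * prob_pos - 1.0)
    # Classes not yet seen in the labeled set fall back to a neutral weight of 1.0.
    return weights.get(predicted, 1.0) * uncertainty


def select_queries(pool_probs, labeled_labels, k):
    """Return the indices of the k unlabeled instances with the highest weighted uncertainty."""
    weights = class_weights(labeled_labels)
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: weighted_uncertainty(pool_probs[i], weights),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    labeled = [0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced labeled set: six negatives, two positives
    pool = [0.45, 0.55, 0.95, 0.10]      # hypothetical positive-class probabilities
    print(select_queries(pool, labeled, k=2))
```

In this sketch, two instances that are equally uncertain are ranked differently when one is predicted to belong to the minority class, so the labeling budget is steered toward the under-represented class.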

Published
2023-04-11
BRANDÃO, Maicon; ACORDI, Marcelo; DAL BIANCO, Guilherme. Uma Proposta para Redução do Conjunto de Treinamento Utilizando Aprendizagem Ativa. In: REGIONAL DATABASE SCHOOL (ERBD), 18., 2023, Palmas/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 41-50. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2023.229494.