Combining active learning and graph-based semi-supervised learning

  • Jhonatan Candao, Universidade Federal de São Paulo
  • Lilian Berton, Universidade Federal de São Paulo


The scarcity of labeled data is a common problem in many applications. Semi-supervised learning (SSL) aims to minimize the need for human annotation by combining a small set of labeled data with a large amount of unlabeled data. Like SSL, active learning (AL) reduces annotation effort, in its case by selecting the most informative points for annotation. Few works explore AL together with graph-based SSL. In this work, we combine both strategies and explore different techniques: two graph-based SSL algorithms and two AL query strategies in a pool-based scenario. Experimental results on artificial and real datasets indicate that our approach requires significantly fewer labeled instances to reach the same performance as random label selection.
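As a rough illustration of how a pool-based AL loop can wrap a graph-based SSL classifier, the sketch below may help. It is not the authors' implementation: the abstract does not name the SSL algorithms or query strategies used, so the label-spreading iteration F ← αSF + (1 − α)Y, the entropy-based uncertainty query, the RBF graph, the toy two-cluster data, and every function name and parameter value here are assumptions chosen for illustration.

```python
import math
import random

def rbf_affinity(points, sigma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (assumed graph construction)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def label_spreading(W, labels, n_classes, alpha=0.9, n_iter=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2.

    labels[i] is a class index, or None for an unlabeled point.
    Returns one row-normalized score vector per point.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] * deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    probs = []
    for row in F:
        total = sum(row)
        probs.append([v / total for v in row] if total > 0
                     else [1.0 / n_classes] * n_classes)
    return probs

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    return -sum(v * math.log(v) for v in p if v > 0)

def query_most_uncertain(probs, labels):
    """Uncertainty sampling: pick the unlabeled point with the highest entropy."""
    pool = [i for i, y in enumerate(labels) if y is None]
    return max(pool, key=lambda i: entropy(probs[i]))

def active_learning(points, oracle, seed_idx, n_classes, budget):
    """Pool-based loop: propagate labels, query one point, reveal its label, repeat."""
    labels = [None] * len(points)
    for i in seed_idx:
        labels[i] = oracle[i]
    W = rbf_affinity(points)
    for _ in range(budget):
        probs = label_spreading(W, labels, n_classes)
        q = query_most_uncertain(probs, labels)
        labels[q] = oracle[q]  # the oracle stands in for the human annotator
    probs = label_spreading(W, labels, n_classes)
    preds = [max(range(n_classes), key=lambda c: p[c]) for p in probs]
    return labels, preds

# Toy pool: two well-separated Gaussian clusters (hypothetical data).
rng = random.Random(0)
points = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
          + [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)])
oracle = [0] * 20 + [1] * 20
labels, preds = active_learning(points, oracle, seed_idx=[0, 20], n_classes=2, budget=5)
accuracy = sum(p == y for p, y in zip(preds, oracle)) / len(oracle)
print(f"labeled: {sum(y is not None for y in labels)}, accuracy: {accuracy:.2f}")
```

Replacing `query_most_uncertain` with a function that draws a random index from the unlabeled pool gives the random-selection baseline against which the abstract compares.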

Keywords: Machine learning, Semi-supervised learning, Active learning, Label propagation


CANDAO, Jhonatan; BERTON, Lilian. Combining active learning and graph-based semi-supervised learning. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 694-704. ISSN 2763-9061.