A Study on Example Labeling in Multiview Semi-Supervised Learning

  • Ígor Assis Braga USP
  • Edson Takashi Matsubara USP
  • Maria Carolina Monard USP

Abstract


Semi-supervised learning combines labeled and unlabeled data during the training phase. CO-TRAINING is a widely used semi-supervised learning algorithm that applies to domains in which training examples can be described by two different views; during the labeling step, it combines the decisions of the two classifiers, one trained on each view. It is therefore important to make as few errors as possible while labeling examples during the training phase, in order to avoid degrading the generated model. Because CO-TRAINING treats the two classifiers independently, some examples may not receive the same label from both classifiers. This work proposes an alternative method for combining the decisions of the two classifiers that aims to delay the labeling of such examples. The method is illustrated using a well-known dataset.
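The agreement-based labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes one simple nearest-centroid classifier per view over one-dimensional features, and all function and variable names (`cotrain_label_step`, `u_view1`, etc.) are hypothetical.

```python
# Hedged sketch: label an unlabeled example only when the classifiers
# trained on each view agree; otherwise delay its labeling, as the
# abstract proposes. Nearest-centroid classifiers stand in for the
# view classifiers purely for illustration.
from collections import defaultdict


def nearest_centroid_fit(features, labels):
    """Return per-class centroids for a 1-D feature view."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in zip(features, labels):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cotrain_label_step(view1, view2, labels, u_view1, u_view2):
    """One labeling step over the unlabeled pool.

    Returns (newly_labeled, delayed):
      newly_labeled -- list of (index, label) where both views agree
      delayed       -- indices whose labeling is postponed (disagreement)
    """
    c1 = nearest_centroid_fit(view1, labels)
    c2 = nearest_centroid_fit(view2, labels)
    newly_labeled, delayed = [], []
    for i, (x1, x2) in enumerate(zip(u_view1, u_view2)):
        p1 = nearest_centroid_predict(c1, x1)
        p2 = nearest_centroid_predict(c2, x2)
        if p1 == p2:
            newly_labeled.append((i, p1))   # both classifiers agree
        else:
            delayed.append(i)               # disagreement: delay labeling
    return newly_labeled, delayed
```

For example, with labeled views `[0.0, 1.0, 9.0, 10.0]` and `[0.5, 1.5, 8.5, 9.5]` carrying labels `['neg', 'neg', 'pos', 'pos']`, an unlabeled example whose two views fall near different class centroids is delayed rather than labeled.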

References

Balcan, M.-F. and Blum, A. (2006). An augmented PAC model for semi-supervised learning. In Semi-Supervised Learning (Adaptive Computation and Machine Learning), pages 397–420.

Balcan, M.-F., Blum, A., and Yang, K. (2005). CO-TRAINING and expansion: Towards bridging theory and practice. In NIPS ’04: Advances in Neural Information Processing Systems 17, pages 89–96.

Blum, A. and Mitchell, T. (1998). Combining labeled and unlabeled data with CO-TRAINING. In COLT ’98: Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92–100.

Fawcett, T. (2004). ROC graphs: Notes and practical considerations for researchers. Technical report, HP Laboratories. [link].

Gupta, S., Kim, J., Grauman, K., and Mooney, R. (2008). Watch, listen & learn: CO-TRAINING on captioned images and videos. In ECML/PKDD ’08: Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 457–472.

Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165.

Matsubara, E. T. (2004). O algoritmo de aprendizado semi-supervisionado CO-TRAINING e sua aplicação na rotulação de documentos. Master’s thesis, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. [link].

Matsubara, E. T., Monard, M. C., and Prati, R. C. (2006). On the class distribution labelling step sensitivity of CO-TRAINING. In IFIP AI ’06: Artificial Intelligence in Theory and Practice, pages 199–208.

Mitchell, T. M. (1999). The role of unlabeled data in supervised learning. In Proceedings of the 6th International Colloquium on Cognitive Science, pages 1–8.

Muslea, I., Minton, S., and Knoblock, C. (2002). Active + semi-supervised learning = robust multi-view learning. In ICML ’02: Proceedings of the 19th International Conference on Machine Learning, pages 435–442.

Nigam, K. and Ghani, R. (2000). Analyzing the effectiveness and applicability of CO-TRAINING. In CIKM ’00: Proceedings of the 9th International Conference on Information and Knowledge Management, pages 86–93.
Published
2009-07-20
BRAGA, Ígor Assis; MATSUBARA, Edson Takashi; MONARD, Maria Carolina. A Study on Example Labeling in Multiview Semi-Supervised Learning. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 7., 2009, Bento Gonçalves/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2009. p. 432-441. ISSN 2763-9061.