An Adaptation of Binary Relevance for Multi-Label Classification applied to Functional Genomics

  • Erica Akemi Tanaka USP
  • José Augusto Baranauskas USP

Abstract


Many classification problems, especially in bioinformatics, involve instances associated with more than one class; these are known as multi-label classification problems. In this study, we propose a new adaptation of the Binary Relevance method that takes into account the correlation among labels, focusing on the interpretability of the model and not only on its performance. The experimental results show that our proposal performs comparably to other methods while also providing an interpretable model of the multi-label problem.
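
For readers unfamiliar with the baseline, plain Binary Relevance trains one independent binary classifier per label and reports the labels whose classifiers answer "yes". The sketch below illustrates only that standard baseline, not the adaptation proposed in this paper, which additionally accounts for correlation among labels. The decision-stump base learner, the toy data, and all function names are assumptions chosen for illustration; a stump is used only because it keeps each per-label model easy to read.

```python
from typing import List, Sequence, Tuple

# Hypothetical base learner: (feature index, threshold, label if <=, label if >)
Stump = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Pick the single-feature threshold split with the fewest training errors."""
    majority = int(sum(y) * 2 >= len(y))
    # Fallback: always predict the majority class.
    best: Stump = (0, float("inf"), majority, majority)
    best_errors = sum(1 for v in y if v != majority)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    1
                    for row, v in zip(X, y)
                    if (left if row[j] <= t else right) != v
                )
                if errors < best_errors:
                    best, best_errors = (j, t, left, right), errors
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, left, right = stump
    return left if row[j] <= t else right


def fit_binary_relevance(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]
) -> List[Stump]:
    """Train one independent binary classifier per label column of Y."""
    n_labels = len(Y[0])
    return [fit_stump(X, [labels[k] for labels in Y]) for k in range(n_labels)]


def predict_binary_relevance(models: List[Stump], row: Sequence[float]) -> List[int]:
    """Label k is predicted if and only if the k-th binary model outputs 1."""
    return [predict_stump(m, row) for m in models]


# Toy data (invented for illustration): two features, two labels.
X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0]]
Y = [[0, 1], [0, 0], [1, 1], [1, 0]]

models = fit_binary_relevance(X, Y)
print(predict_binary_relevance(models, [0.85, 1.0]))  # [1, 1]
print(predict_binary_relevance(models, [0.15, 0.0]))  # [0, 0]
```

Because each label is learned in isolation, this baseline ignores any dependence between labels, which is the limitation the proposed adaptation sets out to address.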

Published
16/07/2012
TANAKA, Erica Akemi; BARANAUSKAS, José Augusto. An Adaptation of Binary Relevance for Multi-Label Classification applied to Functional Genomics. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 12., 2012, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 31-40. ISSN 2763-8952.