A study on the selection of local training sets for hierarchical classification tasks
Abstract
In hierarchical classification tasks that use the local approach, an important decision concerns the selection of training examples to build the local classifiers. To this end, several policies that take the class taxonomy information into account have been proposed. However, a comprehensive comparison of the performance of these policies is still lacking. This paper presents a comprehensive empirical evaluation of eight different policies using 13 datasets. The results show that three of these policies outperformed the other five, with statistically significant differences.
Published
19/07/2011
How to Cite
METZ, Jean; FREITAS, Alex A.; MONARD, Maria Carolina; CHERMAN, Everton Alvares. A study on the selection of local training sets for hierarchical classification tasks. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 8., 2011, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 572-583. ISSN 2763-9061.