Aggressive and Effective Attribute Selection using Genetic Programming
Abstract
A major challenge in automatic classification is dealing with high-dimensional data. Several feature selection (FS) strategies have been proposed for dimensionality reduction. However, they may perform poorly in the face of unbalanced data. In this work, we propose an FS strategy based on Genetic Programming to overcome this issue. The proposed strategy combines the feature sets selected by distinct FS metrics in order to obtain a more effective set of the most discriminative features. We show that our proposal is able to dramatically reduce the data dimensionality while achieving more accurate classification.
References
Danziger, S. A., Baronio, R., Ho, L., Hall, L., Salmon, K., Hatfield, G. W., Kaiser, P., and Lathrop, R. H. (2009). Predicting positive p53 cancer rescue regions using most informative positive (mip) active learning. PLoS Comput Biol, 5(9):e1000498.
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research, 3:1289–1305.
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.
Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems). MIT Press, Cambridge, MA, USA.
Lewis, D. D. (1995). Evaluating and optimizing autonomous text classification systems. In Eighteenth Annual International ACM-SIGIR Conference, pages 246–254.
Mladenic, D. (1998). Machine learning on non-homogeneous, distributed text data. PhD thesis, University of Ljubljana, Faculty of Computer and Information Science.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34:1–47.
Weinbrenner, T. (1997). Genetic programming techniques applied to measurement data. Diploma Thesis.
Zheng, Z., Wu, X., and Srihari, R. (2004). Feature selection for text categorization on imbalanced data. ACM SIGKDD Explorations Newsletter, 6:80–89.
Published
2012-07-16
How to Cite
VIEGAS, Felipe; SANDIN, Isac; SALLES, Thiago; ROCHA, Leonardo. Aggressive and Effective Attribute Selection using Genetic Programming. In: SBC UNDERGRADUATE RESEARCH CONTEST (CTIC-SBC), 31., 2012, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 71-80.