Multi-objective Genetic Algorithms for Feature Selection
Abstract
Irrelevant and/or redundant features in databases can degrade the performance of computational knowledge-extraction processes, which motivates the Feature Selection task. Multi-objective Genetic Algorithms can help identify feature subsets that optimize combinations of different, possibly conflicting, feature-importance measures or criteria. This work presents the use of Multi-objective Genetic Algorithms for Feature Selection and investigates distinct combinations of feature-importance criteria on labeled and unlabeled data.
References
Arauzo-Azofra, A., Benitez, J. M., and Castro, J. L. (2008). Consistency measures for feature selection. Journal of Intelligent Information Systems, 30(3):273–292.
Asuncion, A. and Newman, D. (2007). UCI machine learning repository.
Bleuler, S., Laumanns, M., Thiele, L., and Zitzler, E. (2003). PISA — a platform and programming language independent interface for search algorithms. In Evolutionary Multi-Criterion Optimization, pages 494–508.
Bruzzone, L. and Persello, C. (2009). A novel approach to the selection of spatially invariant features for the classification of hyperspectral images with improved generalization capability. IEEE Transactions on Geoscience and Remote Sensing, 47:3180–3191.
Deb, K., Agrawal, S., Pratap, A., and Meyarivan, T. (2000). A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. Technical report, Indian Institute of Technology Kanpur, India.
Filho, I. G. C. (2003). Comparative analysis of clustering methods for gene expression data. Master's thesis, Universidade Federal de Pernambuco.
Han, J. and Kamber, M. (2006). Data mining: concepts and techniques. Morgan Kaufmann.
He, X., Cai, D., and Niyogi, P. (2005). Laplacian score for feature selection. In Advances in Neural Information Processing Systems, pages 507–514.
Jain, A. K. and Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall, Inc., New Jersey, United States.
Kruskal, W. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47:583–621.
Lee, H. D., Monard, M. C., and Wu, F. C. (2006). A fractal dimension based filter algorithm to select features for supervised learning. In Ibero-American Conference on Artificial Intelligence - Brazilian Symposium on Artificial Intelligence, pages 278–288.
Liu, H. and Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers.
Liu, H. and Motoda, H. (2008). Computational Methods of Feature Selection. Chapman & Hall/CRC.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
Morey, L. C., Blashfield, R. K., and Skinner, H. A. (1983). A comparison of cluster analysis techniques within a sequential validation framework. Multivariate Behavioral Research, 18:309–329.
Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1:317–328.
Santana, L. E. A., Silva, L., and Canuto, A. M. P. (2009). Feature selection in heterogeneous structure of ensembles: a genetic algorithm approach. In International Joint Conference on Neural Networks, pages 1491–1498.
Spolaôr, N. (2010). Aplicação de algoritmos genéticos multiobjetivo ao problema de seleção de atributos. Master's thesis, Universidade Federal do ABC.
Spolaôr, N., Lorena, A. C., and Lee, H. D. (2011). Multiobjective genetic algorithm evaluation in feature selection. In Takahashi, R. H. C., Deb, K., Wanner, E. F., and Greco, S., editors, Lecture Notes in Computer Science (Evolutionary Multi-criterion Optimization Proceedings), pages 462–476. Springer-Verlag.
Wang, C.-M. and Huang, Y.-F. (2009). Evolutionary-based feature selection approaches with new criteria for data mining: A case study of credit approval data. Expert Systems with Applications, 36(3):5900–5908.
Wilson, D. R. and Martinez, T. R. (1997). Improved heterogeneous distance functions. Journal of Artificial Intelligence Research, 6:1–34.
Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Yan, W. (2007). Fusion in multi-criterion feature ranking. In International Conference on Information Fusion, pages 1–6.
Zaharie, D., Holban, S., Lungeanu, D., and Navolan, D. (2007). A computational intelligence approach for ranking risk factors in preterm birth. In International Symposium on Applied Computational Intelligence and Informatics, pages 135–140.
Zeleny, M. (1973). An introduction to multiobjective optimization. In Cochrane, J. L. and Zeleny, M., editors, Multiple criteria decision making, pages 262–301. University of South Carolina Press.
Published
19/07/2011
How to Cite
SPOLAÔR, Newton; LORENA, Ana Carolina; LEE, Huei Diana. Algoritmos Genéticos Multi-objetivo para a Seleção de Atributos. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 8., 2011, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 938-949. ISSN 2763-9061.