Interpretable Components Using Genetic Programming Employing Instruction-like Structure

  • Arthur Hiratsuka Rezende USP
  • Thiago Ambiel USP
  • Rafael Souza e Silva USP
  • André C. P. L. F. de Carvalho USP

Abstract


This paper introduces IGP, a novel feature extraction method that uses Genetic Programming (GP) to generate components from both linear and non-linear combinations of features. Unlike traditional GP approaches, which represent programs as expression trees, IGP uses an instruction-like structure. The study compares IGP against five established feature extraction methods on 23 datasets covering binary and multiclass classification tasks. The results show that IGP outperforms the other methods in several cases, particularly in binary classification. A further analysis explores how the number of classes, features, and instances relates to its performance. Directions for future work on IGP are also discussed.
Keywords: Genetic Programming, Interpretable Components, Dimensionality Reduction
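
To make the contrast with expression trees concrete, the following is a minimal sketch of how a component could be encoded as a flat list of instructions over registers. It is an illustration only, not the authors' implementation: the instruction format `(op, dest, src_a, src_b)`, the operator set `OPS`, and the helpers `run_program` and `to_text` are all hypothetical, and the evolutionary search itself (selection, mutation, fitness) is omitted.

```python
import numpy as np

# Hypothetical operator set; the operators actually used by IGP are not stated here.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run_program(program, X, n_scratch=2):
    """Evaluate an instruction list on X of shape (n_samples, n_features).

    Registers r0..r(n_features-1) hold the input features; the remaining
    n_scratch registers start at zero. The last register is the component.
    """
    n_samples = X.shape[0]
    regs = np.hstack([X, np.zeros((n_samples, n_scratch))])
    for op, dest, a, b in program:
        regs[:, dest] = OPS[op](regs[:, a], regs[:, b])
    return regs[:, -1]


def to_text(program):
    """Render each instruction as a readable line, e.g. 'r2 = mul(r0, r1)'."""
    return [f"r{dest} = {op}(r{a}, r{b})" for op, dest, a, b in program]


# Example program (assumed encoding): component = x0 * x1 + x0
program = [
    ("mul", 2, 0, 1),  # r2 = r0 * r1  (non-linear term)
    ("add", 3, 2, 0),  # r3 = r2 + r0  (linear term added)
]
X = np.array([[1.0, 2.0], [3.0, 4.0]])
component = run_program(program, X)
```

Because each instruction reads as a single assignment, the output of `to_text(program)` can be inspected line by line, which is one plausible reason a flat encoding of this kind lends itself to interpretation.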

Published
17 November 2024
REZENDE, Arthur Hiratsuka; AMBIEL, Thiago; SOUZA E SILVA, Rafael; CARVALHO, André C. P. L. F. de. Interpretable Components Using Genetic Programming Employing Instruction-like Structure. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 565-576. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.244561.
