FeatGeNN: Improving Model Performance for Tabular Data with Correlation-Based Feature Extraction

Silva, Sammuel Ramos; Silva, Rodrigo

doi:10.1007/978-3-031-45368-7_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14195)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

Automated Feature Engineering (AutoFE) has become an important task for any machine learning project, as it can help improve model performance and provide more information for statistical analysis. However, most current approaches for AutoFE rely on manual feature creation or use methods that can generate a large number of features, which can be computationally intensive and lead to overfitting. To address these challenges, we propose a novel convolutional method called FeatGeNN that extracts and creates new features using correlation as a pooling function. Unlike traditional pooling functions such as max-pooling, correlation-based pooling considers the linear relationship between the features in the data matrix, making it more suitable for tabular data. We evaluate our method on various benchmark datasets and demonstrate that FeatGeNN outperforms existing AutoFE approaches in terms of model performance. Our results suggest that correlation-based pooling can be a promising alternative to max-pooling for AutoFE in tabular data applications.
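To make the contrast between max-pooling and a correlation-aware pooling step concrete, here is a minimal NumPy sketch. The abstract does not specify how FeatGeNN computes its pooling, so this is not the authors' method: the function names `max_pool` and `correlation_pool`, the non-overlapping column windows, and the selection rule (keep, per window, the column with the largest mean absolute Pearson correlation to the other columns in that window) are all assumptions chosen only for illustration.

```python
import numpy as np


def max_pool(X, window):
    """Row-wise max over non-overlapping windows of adjacent columns."""
    n_rows, n_cols = X.shape
    n_win = n_cols // window  # trailing columns that do not fill a window are dropped
    return X[:, : n_win * window].reshape(n_rows, n_win, window).max(axis=2)


def correlation_pool(X, window):
    """Hypothetical correlation-based pooling (an assumption, not the paper's rule).

    For each non-overlapping window of adjacent columns, keep the single column
    whose mean absolute Pearson correlation with the other columns in the
    window is largest. Requires window >= 2 and non-constant columns.
    """
    n_rows, n_cols = X.shape
    n_win = n_cols // window
    kept = []
    for w in range(n_win):
        block = X[:, w * window : (w + 1) * window]
        # Pearson correlation between columns (rowvar=False: columns are variables).
        corr = np.abs(np.corrcoef(block, rowvar=False))
        np.fill_diagonal(corr, 0.0)  # ignore each column's correlation with itself
        score = corr.sum(axis=1) / (window - 1)
        kept.append(block[:, np.argmax(score)])
    return np.column_stack(kept)


# Toy data: b is almost a linear function of a, while c is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)
X = np.column_stack([c, a, b])

print(max_pool(X, 3).shape, correlation_pool(X, 3).shape)
```

In this toy setup max-pooling mixes values from all three columns row by row, whereas the correlation-based rule keeps one of the two linearly related columns intact and discards the unrelated noise column, which is the kind of column-level relationship that max-pooling ignores.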

The authors would like to acknowledge FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), UFOP (Universidade Federal de Ouro Preto) and Cloudwalk, Inc., for the financial support, which has been instrumental in the successful execution of this research.

Author information

Authors and Affiliations

Universidade Federal de Ouro Preto, Ouro Preto, 35402-163, Brazil
Sammuel Ramos Silva & Rodrigo Silva
Cloudwalk, Inc., São Paulo, São Paulo, 05425-070, Brazil
Sammuel Ramos Silva

Corresponding author

Correspondence to Sammuel Ramos Silva.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, Brazil
Murilo C. Naldi
Centro Universitario da FEI, São Bernardo do Campo, Brazil
Reinaldo A. C. Bianchi

About this paper

Cite this paper

Silva, S.R., Silva, R. (2023). FeatGeNN: Improving Model Performance for Tabular Data with Correlation-Based Feature Extraction. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14195. Springer, Cham. https://doi.org/10.1007/978-3-031-45368-7_17

DOI: https://doi.org/10.1007/978-3-031-45368-7_17
Published: 12 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45367-0
Online ISBN: 978-3-031-45368-7
eBook Packages: Computer Science, Computer Science (R0)