
FeatGeNN: Improving Model Performance for Tabular Data with Correlation-Based Feature Extraction

  • Conference paper
  • Intelligent Systems (BRACIS 2023)

Abstract

Automated Feature Engineering (AutoFE) has become an important step in machine learning projects, as it can improve model performance and provide additional information for statistical analysis. However, most current AutoFE approaches either rely on manual feature creation or use methods that can generate a large number of features, which can be computationally expensive and lead to overfitting. To address these challenges, we propose FeatGeNN, a novel convolutional method that extracts and creates new features using correlation as its pooling function. Unlike traditional pooling functions such as max-pooling, correlation-based pooling takes into account the linear relationship between the features in the data matrix, which makes it better suited to tabular data. We evaluate our method on several benchmark datasets and show that FeatGeNN outperforms existing AutoFE approaches in terms of model performance. Our results suggest that correlation-based pooling is a promising alternative to max-pooling for AutoFE in tabular data applications.
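The abstract contrasts correlation-based pooling with max-pooling but does not state the exact pooling rule. The NumPy sketch below is only an illustration built on our own assumptions and is not FeatGeNN's published method. A 1-D convolution slides across adjacent columns to generate candidate features (`conv1d_features`). The hypothetical `correlation_pool` then keeps, in each window, the whole column whose mean absolute Pearson correlation with the other columns in that window is highest. All function names are hypothetical.

```python
import numpy as np


def conv1d_features(X: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 1-D kernel across the feature axis ("valid" mode).

    Each new feature is a weighted sum of len(kernel) adjacent columns.
    """
    n, d = X.shape
    k = len(kernel)
    return np.stack([X[:, j:j + k] @ kernel for j in range(d - k + 1)], axis=1)


def max_pool(F: np.ndarray, size: int) -> np.ndarray:
    """Row-wise max over non-overlapping windows of `size` columns.

    Trailing columns that do not fill a full window are dropped.
    """
    n, m = F.shape
    m_trim = (m // size) * size
    return F[:, :m_trim].reshape(n, m_trim // size, size).max(axis=2)


def correlation_pool(F: np.ndarray, size: int) -> np.ndarray:
    """Hypothetical correlation-based pooling (an assumption, not the paper's rule).

    In each non-overlapping window, keep the column with the highest mean
    absolute Pearson correlation to the other columns of that window.
    """
    if size < 2:
        raise ValueError("window size must be at least 2")
    n, m = F.shape
    pooled = []
    for start in range(0, m - size + 1, size):
        window = F[:, start:start + size]
        corr = np.abs(np.corrcoef(window, rowvar=False))  # size x size matrix
        np.fill_diagonal(corr, 0.0)                        # ignore self-correlation
        score = corr.sum(axis=1) / (size - 1)              # mean |r| to the others
        pooled.append(window[:, np.argmax(score)])         # keep one intact column
    return np.stack(pooled, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))                       # toy tabular data
    F = conv1d_features(X, np.array([0.5, 0.5]))         # 9 candidate features
    print(max_pool(F, 3).shape, correlation_pool(F, 3).shape)
```

The design difference this sketch highlights: with max-pooling, each row of a pooled feature can come from a different source column, so the result mixes unrelated columns. The correlation-based variant always returns one intact column per window. This is one possible reading of the abstract's claim that correlation pooling better suits tabular data.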

The authors would like to acknowledge FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), UFOP (Universidade Federal de Ouro Preto) and Cloudwalk, Inc. for their financial support, which was instrumental to this research.



Author information

Corresponding author: Sammuel Ramos Silva.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Silva, S.R., Silva, R. (2023). FeatGeNN: Improving Model Performance for Tabular Data with Correlation-Based Feature Extraction. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14195. Springer, Cham. https://doi.org/10.1007/978-3-031-45368-7_17


  • DOI: https://doi.org/10.1007/978-3-031-45368-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45367-0

  • Online ISBN: 978-3-031-45368-7

  • eBook Packages: Computer Science, Computer Science (R0)
