Threshold Feature Selection PCA

Felipe de Melo Battisti; Tiago Buarque Assunção de Carvalho

doi:10.5753/kdmile.2022.227718

Felipe de Melo Battisti, Universidade Federal do Agreste de Pernambuco
Tiago Buarque Assunção de Carvalho, Universidade Federal do Agreste de Pernambuco

Abstract

Classification algorithms have difficulty learning when the data contain non-discriminant features. Dimensionality reduction techniques such as principal component analysis (PCA) are commonly applied to address this. However, PCA is an unsupervised method, so it ignores the class information available in the data. This paper therefore proposes the Threshold Feature Selector (TFS), a new supervised dimensionality reduction method that uses class thresholds to select the most relevant features. We also present Threshold PCA (TPCA), which combines our supervised technique with standard PCA. In our experiments, TFS achieved higher accuracy than the original data on 90% of the datasets. The second proposed technique, TPCA, outperformed standard PCA in accuracy gain on 70% of the datasets.

Keywords: dimensionality reduction, machine learning, principal component analysis, feature selection, classification problems
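
The abstract does not spell out how the class thresholds are computed, so the following is only a rough sketch of the general idea of filtering features with a supervised threshold criterion before running PCA. It is not the authors' TFS or TPCA algorithm. The scoring rule (best accuracy of a single cut on one feature, for binary labels), the function names `threshold_score`, `select_features` and `pca`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def threshold_score(x, y):
    """Best accuracy of a one-threshold rule on a single feature (labels 0/1).

    Assumed scoring rule for illustration; the paper's criterion may differ.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    # Baseline: always predict the majority class (no threshold at all).
    best = max(np.mean(ys == 0), np.mean(ys == 1))
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no cut fits between two equal values
        # Cut between xs[i-1] and xs[i]: predict 0 on the left, 1 on the right.
        acc = (np.sum(ys[:i] == 0) + np.sum(ys[i:] == 1)) / n
        best = max(best, acc, 1.0 - acc)  # also try the flipped orientation
    return best


def select_features(X, y, k):
    """Return the indices of the k highest-scoring columns, plus all scores."""
    scores = np.array([threshold_score(X[:, j], y) for j in range(X.shape[1])])
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return keep, scores


def pca(X, n_components):
    """Plain PCA via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Synthetic data (assumed): 5 features, only columns 1 and 3 depend on the class.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 1] += 6 * y
X[:, 3] -= 6 * y

keep, scores = select_features(X, y, k=2)   # supervised filtering step
Z = pca(X[:, keep], n_components=1)         # unsupervised PCA on the kept columns
print("kept columns:", keep)
print("projected shape:", Z.shape)
```

In this sketch the supervised step discards the noise columns before PCA sees them, which illustrates the motivation given in the abstract: PCA on its own ranks directions by variance and has no access to the class labels.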
