Adoption of Feature Selection as Anti-phishing Mechanism: Applicability and Impacts

  • Mateus Barros (UFRPE)
  • Carlo Silva (UFPE)
  • Péricles de Miranda (UFRPE)

Abstract


Phishing websites are fake pages that deceive victims by passing themselves off as legitimate pages of banks or companies in order to obtain personal information without the victims' consent. Although learning algorithms have been widely used for phishing detection, there is no consensus on which attributes best describe a malicious page. This article presents an experimental study that investigates and analyzes the relevance of attributes across different phishing databases. The results show that a suitable attribute-selection methodology can reduce the computational cost of the classification process while still achieving satisfactory accuracy and F1 score.
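The abstract does not name the selection method the study used, so the following is only a minimal, stdlib-only sketch of the general idea under stated assumptions: it swaps in information gain as a filter criterion, keeps the top-k attributes, fits a hypothetical lookup-table classifier (`fit_lookup`) on the reduced attribute set, and scores it with accuracy and F1. The toy rows, labels, and every function name here are made up for illustration and are not the authors' data or code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Indices of the k attributes with the highest information gain (ties by index)."""
    scores = [(information_gain([r[j] for r in rows], labels), j) for j in range(len(rows[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def fit_lookup(rows, labels, selected):
    """Hypothetical stand-in classifier: majority label per value pattern of the selected attributes."""
    table = {}
    for r, y in zip(rows, labels):
        table.setdefault(tuple(r[j] for j in selected), Counter())[y] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # used for unseen patterns

    def predict(row):
        counts = table.get(tuple(row[j] for j in selected))
        return counts.most_common(1)[0][0] if counts else fallback

    return predict


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (phishing) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy data: three ternary attributes, label 1 = phishing, 0 = legitimate.
rows = [[1, 1, 0], [1, -1, 1], [1, 0, -1], [-1, 1, 1], [-1, -1, 0], [-1, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]
test_rows, test_labels = [[1, 0, 1], [-1, 1, -1]], [1, 0]

print(select_top_k(rows, labels, 2))  # → [0, 2]
selected = select_top_k(rows, labels, 1)  # keep only the most informative attribute
predict = fit_lookup(rows, labels, selected)
preds = [predict(r) for r in test_rows]
print(accuracy(test_labels, preds), f1_score(test_labels, preds))  # → 1.0 1.0
```

In this contrived example one attribute alone separates the classes, so dropping the other two shrinks the input the classifier has to process without hurting accuracy or F1; on real phishing data the trade-off has to be measured, which is what the study reports doing.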

Keywords: anti-phishing, feature selection, classification

Published: 2019-10-15
BARROS, Mateus; SILVA, Carlo; MIRANDA, Péricles de. Adoption of Feature Selection as Anti-phishing Mechanism: Applicability and Impacts. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 16., 2019, Salvador. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 214-225. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9285.