Comparative Analysis of Machine Learning Approaches for Automatic Classification of ICT Professionals' Resumes
Abstract
Resume screening plays a crucial role in corporate talent recruitment. However, handling a large volume of resumes is time-consuming and complex. To automate this task, several works have explored natural language processing techniques and machine learning algorithms. In this context, this paper presents a comparative analysis of different approaches for automatically classifying the resumes of ICT professionals. The investigated approaches include traditional algorithms, models based on deep neural networks, and pre-trained neural language models. Experiments were conducted on a set of 27,405 resumes divided into eight categories related to ICT professionals. The results show that, in general, the pre-trained models achieved the best performance, especially the RoBERTa-base model, which scored above 93.00% on all the evaluation measures used.
