Comparative Analysis of Machine Learning Approaches for the Automatic Classification of ICT Professionals' Résumés

  • Renato Santos Pereira, Instituto Federal do Espírito Santo
  • Hilário Tomaz Alves de Oliveira, Instituto Federal do Espírito Santo

Abstract


Résumé screening plays a crucial role in how companies recruit talent. However, handling a large volume of résumés can be time-consuming and complex. To automate this task, several studies have explored natural language processing techniques and machine learning algorithms. In this context, this work presents a comparative analysis of different approaches for the automatic classification of résumés of Information and Communication Technology (ICT) professionals. The approaches investigated include traditional algorithms, deep neural network models, and pre-trained neural language models. Experiments were carried out on a set of 27,405 résumés distributed across eight categories of ICT professionals. The results show that, overall, the pre-trained models achieved the best performance, especially RoBERTa-base, which scored above 93.00% on every evaluation measure used.
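The abstract names the three families of approaches but not their pipelines or which evaluation measures were used. As a rough illustration of what a "traditional algorithm" baseline for this task can look like, the sketch below trains a bag-of-words multinomial Naive Bayes classifier and scores it with accuracy and macro-averaged precision, recall, and F1. Everything here is an assumption for illustration: the choice of Naive Bayes, the tokenizer, the metric set, and the three category names and toy snippets are hypothetical and are not taken from the paper's dataset of 27,405 résumés in eight categories.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on word characters (Unicode-aware, so accented
    # letters such as "ç" and "ã" stay inside tokens).
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(docs)
        return self

    def predict_one(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum over tokens of log P(token | c), smoothed
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[c] / self.n_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    prec, rec, f1 = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(prec) / k,
        "macro_recall": sum(rec) / k,
        "macro_f1": sum(f1) / k,
    }


# Hypothetical toy data: three made-up categories, not the paper's eight.
train_set = [
    ("java spring developer building rest api microservices", "backend"),
    ("python django developer rest api and sql databases", "backend"),
    ("data scientist python pandas machine learning statistics", "data"),
    ("machine learning models statistics and data analysis", "data"),
    ("network administrator linux servers firewall routing", "infra"),
    ("infrastructure support linux servers cisco networks", "infra"),
]
test_set = [
    ("backend developer with java and rest api experience", "backend"),
    ("python machine learning and statistics", "data"),
    ("linux servers and firewall", "infra"),
]

model = MultinomialNB().fit([d for d, _ in train_set], [y for _, y in train_set])
predictions = model.predict([d for d, _ in test_set])
metrics = evaluate([y for _, y in test_set], predictions)
print(predictions)
print(metrics)
```

Replacing `MultinomialNB` with a neural or pre-trained model would leave `evaluate` unchanged. Macro averaging weights every category equally, so a rare category counts as much as a frequent one in the final score.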

Keywords: Résumé screening, Talent recruitment, Natural language processing, Machine learning algorithms, Neural networks

Published: 25/09/2023
PEREIRA, Renato Santos; OLIVEIRA, Hilário Tomaz Alves de. Análise Comparativa entre Abordagens de Aprendizado de Máquina para Classificação Automática de Currículos de Profissionais de TIC. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 359-373. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.234140.