CROSSAGE: A cross-attentional graph and Transformer architecture for skill and knowledge recognition in job descriptions
Abstract
Automatically extracting skills and knowledge from job descriptions supports recruitment, reskilling, and labor market analysis, yet traditional NER models struggle with ambiguous and syntactically complex spans. This work proposes CROSSAGE, a lightweight hybrid architecture that combines contextual embeddings from Transformers with structural features from dependency graphs via cross-attention. Results on the SKILLSPAN dataset show that CROSSAGE with JobSpanBERT achieves the highest F1 for SKILL entities (49.8), while CROSSAGE (BERT) matches the best baseline for KNOWLEDGE (64.1) and improves recall (68.8). Gains are especially notable in complex domains such as HOUSE, where CROSSAGE reaches 51.5 F1 for SKILL. These findings highlight CROSSAGE’s potential as an effective alternative to heavier hybrid models.
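The abstract does not describe the implementation, so the following is only a minimal PyTorch sketch of the kind of cross-attention fusion it outlines: contextual token embeddings from a Transformer attending over per-token features from a dependency-graph encoder. The class name CrossAttentionFusion, the dimensions, and the residual-plus-normalization design are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical cross-attention block: contextual token embeddings
    (queries) attend over per-token dependency-graph features (keys/values)."""

    def __init__(self, hidden_dim=768, graph_dim=128, num_heads=8):
        super().__init__()
        # Project graph features into the Transformer hidden space.
        self.graph_proj = nn.Linear(graph_dim, hidden_dim)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, token_emb, graph_emb):
        # token_emb: (batch, seq_len, hidden_dim) from BERT/JobSpanBERT
        # graph_emb: (batch, seq_len, graph_dim) from a GNN over the dependency parse
        g = self.graph_proj(graph_emb)
        fused, _ = self.cross_attn(query=token_emb, key=g, value=g)
        # Residual connection keeps the contextual signal dominant.
        return self.norm(token_emb + fused)

# Usage with random tensors standing in for the real encoders.
fusion = CrossAttentionFusion()
tokens = torch.randn(2, 16, 768)   # contextual token embeddings
graph = torch.randn(2, 16, 128)    # per-token dependency-graph features
fused = fusion(tokens, graph)      # shape: (2, 16, 768)

In a pipeline of this kind, the fused representation would feed a token-level classification head for BIO tagging of SKILL and KNOWLEDGE spans, as in standard span-extraction setups on SKILLSPAN.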
References
Abbas, F., Zhang, F., Ismail, M., Khan, G., Iqbal, J., Alrefaei, A., and Albeshr, M. (2023). Optimizing machine learning algorithms for landslide susceptibility mapping along the Karakoram Highway, Gilgit-Baltistan, Pakistan: a comparative study of baseline, Bayesian, and metaheuristic hyperparameter optimization techniques. Sensors, 23:6843.
Bajestani, S., Khalilzadeh, M., Azarnoosh, M., and Kobravi, H. (2024). Transentgat: a sentiment-based lexical psycholinguistic graph attention network for personality prediction. IEEE Access, 12:59630–59642.
Carbonell, M., Riba, P., Villegas, M., Fornés, A., and Lladós, J. (2021). Named entity recognition and relation extraction with graph neural networks in semi-structured documents. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 9622–9627. IEEE.
Clavié, B. and Soulié, G. (2023). Large language models as batteries-included zero-shot ESCO skills matchers. arXiv preprint arXiv:2307.03539.
Decorte, J.-J., Van Hautte, J., Demeester, T., and Develder, C. (2021). JobBERT: Understanding job titles through skills. arXiv preprint arXiv:2109.09605.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
Dong, X., Chowdhury, S., Qian, L., Li, X., Guan, Y., Yang, J., and Yu, Q. (2019). Deep learning for named entity recognition on Chinese electronic medical records: combining deep transfer learning with multitask bi-directional LSTM RNN. PLoS ONE, 14:e0216046.
Gao, X., Yan, M., Zhang, C., Wu, G., Shang, J., Zhang, C., and Yang, K. (2025). MDNNDTA: a multimodal deep neural network for drug-target affinity prediction. Frontiers in Genetics, 16.
Google (2019). Google Colaboratory. [link]. Accessed: 2025-05-24.
Honnibal, M. and Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.
Hu, S. and Weng, Q. (2025). Graph-based deep fusion for architectural text representation. PeerJ Computer Science, 11:e2735.
Jensen, K., Zhang, M., and Plank, B. (2021). De-identification of privacy-related entities in job postings. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), United States. Association for Computational Linguistics.
Lai, P.-T. and Lu, Z. (2020). BERT-GT: cross-sentence n-ary relation extraction with BERT and graph transformer. Bioinformatics, 36(24):5678–5685.
Li, J., Sun, A., Han, J., and Li, C. (2022). A survey on deep learning for named entity recognition. IEEE Transactions on Knowledge and Data Engineering, 34:50–70.
Li, Q., Han, Z., and Wu, X. (2018). Deeper insights into graph convolutional networks for semi-supervised learning. Proceedings of the AAAI Conference on Artificial Intelligence, 32.
Liu, N., Feng, Q., and Hu, X. (2022). Interpretability in graph neural networks. pages 121–147.
Long, J., Li, Z., Xuan, Q., Fu, C., Peng, S., and Min, Y. (2023). Social media opinion analysis model based on fusion of text and structural features. Applied Sciences, 13:7221.
Nguyen, K. C., Zhang, M., Montariol, S., and Bosselut, A. (2024). Rethinking skill extraction in the job market domain using large language models. arXiv preprint arXiv:2402.03832.
Nikolentzos, G., Tixier, A., and Vazirgiannis, M. (2020). Message passing attention networks for document understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 34:8544–8551.
Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C. D., Pyysalo, S., Schuster, S., Tyers, F., and Zeman, D. (2020). Universal Dependencies v2: An ever-growing multilingual treebank collection. In Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4034–4043, Marseille, France. European Language Resources Association.
Optuna (2025). Optuna: A hyperparameter optimization framework. [link]. Accessed: 2025-05-18.
Senger, E., Zhang, M., van der Goot, R., and Plank, B. (2024). Deep learning-based computational job market analysis: A survey on skill extraction and classification from job postings. arXiv preprint arXiv:2402.05617.
Shaaban, Y., Korashy, H., and Medhat, W. (2022). Arabic emotion cause extraction using deep learning. The Egyptian Journal of Language Engineering, 0:0–0.
Sun, C., Qiu, X., Xu, Y., and Huang, X. (2019). How to fine-tune BERT for text classification? pages 194–206.
Tamburri, D. A., Van Den Heuvel, W.-J., and Garriga, M. (2020). Dataops for societal intelligence: a data pipeline for labor market skills extraction and matching. In 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), pages 391–394. IEEE.
Wang, W. and Yan, X. (2018). Early stopping criterion combining probability density function with validation error for improving the generalization capability of the backpropagation neural network. DEStech Transactions on Engineering and Technology Research.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. (2021). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32:4–24.
Xiang, X., Jing, K., and Xu, J. (2024). A neural architecture predictor based on GNN-enhanced transformer. In International Conference on Artificial Intelligence and Statistics, pages 1729–1737. PMLR.
Yang, Y. and Cui, X. (2021). BERT-enhanced text graph neural network for classification. Entropy, 23:1536.
Zhang, M. (2024). Computational job market analysis with natural language processing. arXiv preprint arXiv:2404.18977.
Zhang, M., Jensen, K. N., and Plank, B. (2022a). Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning. arXiv preprint arXiv:2205.01381.
Zhang, M., Jensen, K. N., Sonniks, S., and Plank, B. (2022b). SkillSpan: Hard and soft skill extraction from English job postings. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4962–4984, Seattle, United States. Association for Computational Linguistics.
Zhang, M., van der Goot, R., Kan, M.-Y., and Plank, B. (2024). NNOSE: Nearest neighbor occupational skill extraction. arXiv preprint arXiv:2401.17092.
Zhang, M., Van Der Goot, R., and Plank, B. (2023). ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain. arXiv preprint arXiv:2305.12092.
Zhang, Z., Liu, D., Zhang, M., and Qin, X. (2021). Combining data augmentation and domain information with TENER model for clinical event detection. BMC Medical Informatics and Decision Making, 21.
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M. (2020). Graph neural networks: a review of methods and applications. AI Open, 1:57–81.
Published
2025-09-29
How to Cite
RAMOS NETO, Antônio dos Santos; FELIX, João Paulo; SANTOS, Wylliams; BEZERRA, Byron Leite Dantas; RODRIGUES, Cleyton Mário de Oliveira. CROSSAGE: A cross-attentional graph and Transformer architecture for skill and knowledge recognition in job descriptions. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 361-373. DOI: https://doi.org/10.5753/stil.2025.37838.
