Pre-trained Language Models for Multi-Label Text Classification of Competitive Programming Problems
Abstract
This paper explores the use of pre-trained language models for classifying programming problems from online judges according to topics commonly addressed in competitive programming. State-of-the-art language models were employed as text classifiers: Long Short-Term Memory (LSTM) and Bidirectional LSTM networks with pre-trained Word2Vec embeddings, Bidirectional Encoder Representations from Transformers (BERT), and Llama3.1-8B. Experiments were conducted using two representations of each programming problem: the statement alone and the statement combined with source code. The results showed that Llama3.1-8B achieved the best overall Macro F1-Score, outperforming the other models by a considerable margin.
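To make the classification setup concrete, the sketch below shows one common way to fine-tune a pre-trained BERT model for multi-label tagging of problem statements using the Hugging Face Transformers library. This is an illustrative assumption, not the authors' implementation: the tag set, example statement, and 0.5 decision threshold are hypothetical.

# Minimal sketch (assumed setup, not the paper's code): multi-label classification
# of a problem statement with a pre-trained BERT encoder via Hugging Face Transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TAGS = ["graphs", "dynamic programming", "greedy", "math"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(TAGS),
    problem_type="multi_label_classification",  # applies BCEWithLogitsLoss internally
)

# One training example: a problem statement and its multi-hot topic labels.
statement = "Given a weighted graph with N nodes, find the shortest path from 1 to N."
labels = torch.tensor([[1.0, 0.0, 0.0, 0.0]])  # only "graphs" applies here

inputs = tokenizer(statement, truncation=True, max_length=512, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # a real run would iterate over batches with an optimizer

# Inference: an independent sigmoid per tag, thresholded at 0.5.
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]
predicted = [tag for tag, p in zip(TAGS, probs) if p > 0.5]
print(predicted)

The "statement with source code" representation described in the abstract would simply concatenate a reference solution to the statement text before tokenization, subject to the model's input-length limit.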
Keywords:
online judges, problem solving, multi-label text classification, language models, LLM
References
Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebron, F., and Sanghai, S. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA. Association for Computational Linguistics.
Brown, T. B. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.
Fonseca, S. C., Pereira, F. D., Oliveira, E. H., Oliveira, D. B., Carvalho, L. S., and Cristea, A. I. (2020). Automatic subject-based contextualisation of programming assignment lists. International Educational Data Mining Society.
GenAI Meta (2023). Llama 2: Open foundation and fine-tuned chat models.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Huang, T., Sun, Z., Jin, Z., Li, G., and Lyu, C. (2024). Knowledge-aware code generation with large language models. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pages 52–63.
Iancu, B., Mazzola, G., Psarakis, K., and Soilis, P. (2019). Multi-label classification for automatic tag prediction in the context of programming challenges. arXiv preprint arXiv:1911.12224.
Kim, J., Cho, E., Kim, D., and Na, D. (2023). Problem-solving guide: Predicting the algorithm tags and difficulty for competitive programming problems.
Llama Team (2024). The Llama 3 herd of models.
Lobanov, A., Bogomolov, E., Golubev, Y., Mirzayanov, M., and Bryksin, T. (2023). Predicting tags for programming tasks by combining textual and source code data.
Moreira, J., Silva, C., Santos, A., Ferreira, L., and Reis, J. (2024). Abordagem não-supervisionada para inferência do tópico de um exercício de programação a partir do código solução. In Anais do XXXII Workshop sobre Educação em Computação, pages 842–853, Porto Alegre, RS, Brasil. SBC.
Mountantonakis, M., Mertzanis, L., Bastakis, M., and Tzitzikas, Y. (2024). A comparative evaluation for question answering over Greek texts by using machine translation and BERT. Language Resources and Evaluation.
Pinnow, N., Ramadan, T., Islam, T. Z., Phelps, C., and Thiagarajan, J. J. (2021). Comparative code structure analysis using deep learning for performance prediction. arXiv preprint arXiv:2102.07660.
Qiao, Y., Xiong, C., Liu, Z., and Liu, Z. (2019). Understanding the behaviors of BERT in ranking.
Shalaby, M., Mehrez, T., El Mougy, A., Abdulnasser, K., and Al-Safty, A. (2017). Automatic algorithm recognition of source-code using machine learning. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 170–177.
Shao, Z., Yu, Z., Wang, M., and Yu, J. (2023). Prompting large language models with answer heuristics for knowledge-based visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14974–14983.
Suciu, V., Giang, I., Zhao, B., Runandy, J., and Dang, M. (2021). Generating hints for programming problems without a solution. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pages 1382–1382.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2023). Attention is all you need.
Wilkho, R. S., Chang, S., and Gharaibeh, N. G. (2024). FF-BERT: A BERT-based ensemble for automated classification of web-based text on flash flood events. Advanced Engineering Informatics, 59:102293.
Yilmaz, R. and Yilmaz, F. G. K. (2023). The effect of generative artificial intelligence (ai)-based tool use on students’ computational thinking skills, programming self-efficacy and motivation. Computers and Education: Artificial Intelligence, 4:100147.
Zhang, B., Haddow, B., and Birch, A. (2023). Prompting large language model for machine translation: A case study. In International Conference on Machine Learning, pages 41092–41110. PMLR.
Zhang, H., Yu, P. S., and Zhang, J. (2024a). A systematic survey of text summarization: From statistical methods to large language models. arXiv preprint arXiv:2406.11289.
Zhang, J., Wang, X., Zhang, H., Sun, H., Wang, K., and Liu, X. (2019). A novel neural source code representation based on abstract syntax tree. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 783–794. IEEE.
Zhang, Z., Dong, Z., Shi, Y., Price, T., Matsuda, N., and Xu, D. (2024b). Students’ perceptions and preferences of generative artificial intelligence feedback for programming. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 23250–23258.
Zhao, C., Feng, R., Sun, X., Shen, L., Gao, J., and Wang, Y. (2024). Enhancing aspect-based sentiment analysis with bert-driven context generation and quality filtering. Natural Language Processing Journal, 7:100077.
Zhou, Y. and Tao, C. (2020). Multi-task BERT for problem difficulty prediction. In 2020 International Conference on Communications, Information System and Computer Engineering (CISCE), pages 213–216. IEEE.
Published
17/11/2024
How to Cite
SOUZA, Bruno Vargas de; SILVESTRE, Ana Sofia S.; LISBOA, Victor Hugo F.; BORGES, Vinicius R. P. Pre-trained Language Models for Multi-Label Text Classification of Competitive Programming Problems. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 73-84. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245222.