Evaluation of Neural Models for Source Code Summarization

  • Leandro Baêta Lustosa Pontes (IFES)
  • Hilário Tomaz Alves de Oliveira (IFES)
  • Francisco de Assis Boldt (IFES)

Abstract

Source code summarization is the task of automatically generating a natural language description from a source code snippet. In recent years, several models based on Deep Learning algorithms have been proposed in the literature for this task. In this work, we carry out a comparative analysis of four state-of-the-art neural models (CodeBERT, CodeT5, CodeTrans, and PLBART) using two datasets commonly employed for the Java programming language. The experimental results show that CodeTrans achieved the best performance across different evaluation measures and that there is considerable variability in the descriptions generated by the evaluated models.
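
To make the evaluation pipeline concrete, the sketch below generates a description for a toy Java method with a pre-trained summarization model and scores it against a reference description using smoothed sentence-level BLEU. It assumes the Hugging Face transformers and NLTK libraries and the Salesforce/codet5-base-multi-sum checkpoint (a CodeT5 variant fine-tuned for code summarization); the paper does not disclose its exact checkpoints, decoding settings, or preprocessing, so all of these details are illustrative assumptions, not the authors' setup.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    MODEL_NAME = "Salesforce/codet5-base-multi-sum"  # assumed checkpoint

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    # Toy Java method; real evaluations use thousands of method/comment pairs.
    java_code = "public int max(int a, int b) { return a > b ? a : b; }"

    # Encode the snippet and generate a candidate natural-language description.
    inputs = tokenizer(java_code, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=32, num_beams=4)
    candidate = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Score the candidate against a reference with smoothed sentence-level
    # BLEU; smoothing avoids zero scores on short outputs, in the spirit of
    # Lin and Och (2004).
    reference = "returns the maximum of two integers"
    smooth = SmoothingFunction().method4
    score = sentence_bleu([reference.split()], candidate.split(),
                          smoothing_function=smooth)

    print("candidate:", candidate)
    print("BLEU:", round(score, 4))

METEOR (Banerjee and Lavie, 2005) and ROUGE (Lin, 2004) scores, also standard in this line of work, can be computed analogously over the same candidate/reference pairs.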

Keywords: Source code summarization, Neural models, Deep Learning

References

Ahmad, W., Chakraborty, S., Ray, B., and Chang, K.-W. (2021). Unified pre-training for program understanding and generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2668, Online. Association for Computational Linguistics.

Banerjee, S. and Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.

Elnaggar, A., Ding, W., Jones, L., Gibbs, T., Feher, T., Angerer, C., Severini, S., Matthes, F., and Rost, B. (2021). Codetrans: Towards cracking the language of silicone’s code through self-supervised deep learning and high performance computing. CoRR, abs/2104.02443.

Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., and Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547, Online. Association for Computational Linguistics.

Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. (2010). On the use of automated text summarization techniques for summarizing source code. In 2010 17th Working Conference on Reverse Engineering, pages 35–44. IEEE.

Hu, X., Li, G., Xia, X., Lo, D., and Jin, Z. (2018). Deep code comment generation. In Proceedings of the 26th Conference on Program Comprehension, ICPC '18, pages 200–210, New York, NY, USA. Association for Computing Machinery.

Husain, H., Wu, H., Gazit, T., Allamanis, M., and Brockschmidt, M. (2019). CodeSearchNet challenge: Evaluating the state of semantic code search. CoRR, abs/1909.09436.

Iyer, S., Konstas, I., Cheung, A., and Zettlemoyer, L. (2016). Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2073–2083, Berlin, Germany. Association for Computational Linguistics.

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Moens, M.-F. and Szpakowicz, S., editors, Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Lin, C.-Y. and Och, F. J. (2004). ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 501–507, Geneva, Switzerland. COLING.

Liu, S., Chen, Y., Xie, X., Siow, J. K., and Liu, Y. (2020). Automatic code summarization via multi-dimensional semantic fusing in GNN. CoRR, abs/2006.05405.

Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C., Drain, D., Jiang, D., Tang, D., Li, G., Zhou, L., Shou, L., Zhou, L., Tufano, M., Gong, M., Zhou, M., Duan, N., Sundaresan, N., Deng, S. K., Fu, S., and Liu, S. (2021). CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.

Rodeghero, P., McMillan, C., McBurney, P. W., Bosch, N., and D'Mello, S. (2014). Improving automated source code summarization via an eye-tracking study of programmers. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 390–401, New York, NY, USA. Association for Computing Machinery.

Sommerville, I. (2011). Engenharia de software. Pearson Prentice Hall.

Sridhara, G., Hill, E., Muppaneni, D., Pollock, L., and Vijay-Shanker, K. (2010). Towards automatically generating summary comments for Java methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, ASE '10, pages 43–52, New York, NY, USA. Association for Computing Machinery.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Wan, Y., Zhao, Z., Yang, M., Xu, G., Ying, H., Wu, J., and Yu, P. S. (2018). Improving automatic source code summarization via deep reinforcement learning. CoRR, abs/1811.07234.

Wang, Y., Wang, W., Joty, S., and Hoi, S. C. (2021). CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E., and Li, S. (2018). Measuring program comprehension: A large-scale field study with professionals. In Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, page 584, New York, NY, USA. Association for Computing Machinery.

Yang, G., Chen, X., Cao, J., Xu, S., Cui, Z., Yu, C., and Liu, K. (2021). ComFormer: Code comment generation via transformer and fusion method-based hybrid code representation. In 8th International Conference on Dependable Systems and Their Applications, DSA 2021, Yinchuan, China, August 5-6, 2021, pages 30–41. IEEE.

Zhang, J., Wang, X., Zhang, H., Sun, H., and Liu, X. (2020). Retrieval-based neural source code summarization. In Proceedings of the 42nd International Conference on Software Engineering. IEEE.

Zhu, Y. and Pan, M. (2019). Automatic code summarization: A systematic literature review. CoRR, abs/1909.04352.
Published
31/07/2022
How to Cite

PONTES, Leandro Baêta Lustosa; OLIVEIRA, Hilário Tomaz Alves de; BOLDT, Francisco de Assis. Avaliação de Modelos Neurais para Sumarização de Código-fonte. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 49., 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 140-151. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2022.223154.