Divinator: A Visual Studio Code Extension to Source Code Summarization

Rafael S. Durelli; Vinicius H. S. Durelli; Raphael W. Bettio; Diego R. C. Dias; Alfredo Goldman

doi:10.5753/vem.2022.226187

Rafael S. Durelli UFLA
Vinicius H. S. Durelli UFSJ
Raphael W. Bettio UFLA
Diego R. C. Dias UFSJ
Alfredo Goldman USP

DOI: https://doi.org/10.5753/vem.2022.226187

Resumo

Software developers spend a substantial amount of time reading and understanding code. Research has shown that code comprehension tasks can be expedited by reading the available documentation. However, documentation is expensive to generate and maintain, so the available documentation is often missing or outdated. Thus, automated generation of brief natural language descriptions for source code is desirable and has the potential to play a key role in source code comprehension and development. In particular, recent advances in deep learning have led to sophisticated summary generation techniques. Nevertheless, to the best of our knowledge, no study has fully integrated a state-of-the-art code summarization technique into an integrated development environment (IDE). In hopes of filling this gap, we developed a VS Code extension that allows developers to take advantage of state-of-the-art code summarization from within the IDE. This paper describes Divinator, our IDE-integrated tool for source code summarization.

Palavras-chave: learning, source-code summarization, deep-learning

Referências

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020).

Utkarsh Desai, Giriprasad Sridhara, and Srikanth Tamilselvam. 2021. Advances in Code Summarization. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 330-331.

Ahmed Elnaggar, Wei Ding, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Silvia Severini, Florian Matthes, and Burkhard Rost. 2021. CodeTrans: Towards Cracking the Language of Silicone's Code Through Self-Supervised Deep Learning and High Performance Computing. CoRR abs/2104.02443 (2021).

Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In 2010 17th Working Conference on Reverse Engineering. IEEE, 35-44.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2073-2083.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. CoRR abs/1909.11942 (2019). arXiv:1909.11942 http://arxiv.org/abs/1909.11942.

J Devlin M Chang K Lee and K Toutanova. 2018. Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019).

Paul W. McBurney and Collin McMillan. 2014. Automatic Documentation Generation via Source Code Summarization of Method Context. In Proceedings of the 22nd International Conference on Program Comprehension. ACM, 279-290.

Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K Vijay-Shanker. 2013. Automatic generation of natural language summaries for java classes. In 2013 21st International Conference on Program Comprehension (ICPC). IEEE, 23-32.

Paige Rodeghero, Collin McMillan, Paul W McBurney, Nigel Bosch, and Sidney D'Mello. 2014. Improving automated source code summarization via an eye-tracking study of programmers. In Proceedings of the 36th international conference on Software engineering. 390-401.

Lin Shi, Hao Zhong, Tao Xie, and Mingshu Li. 2011. An Empirical Study on Evolution of API Documentation. In Proceedings of the 14th International Conference on Fundamental Approaches to Software Engineering: Part of the Joint European Conferences on Theory and Practice of Software. Springer-Verlag, 416-431.

Ian Sommerville. 2015. Software engineering 10th Edition. ISBN-10 137035152 (2015), 18.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Åukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc., 15-25.

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2020. mT5: A massively multilingual pre-trained text-to-text transformer. CoRR abs/2010.11934 (2020).

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. CoRR abs/1906.08237 (2019). arXiv:1906.08237 http://arxiv.org/abs/1906.08237

Hao Zhong and Zhendong Su. 2013. Detecting API Documentation Errors. SIGPLAN Notices 48, 10 (2013), 803-816.

Yuxiang Zhu and Minxue Pan. 2019. Automatic Code Summarization: A Systematic Literature Review. CoRR abs/1909.04352 (2019).