On the Challenges of Using Large Language Models for NCL Code Generation

Daniel de Sousa Moraes; Polyana Bezerra da Costa; Antonio J. G. Busson; José Matheus Carvalho Boaro; Carlos de Salles Soares Neto; Sergio Colcher

doi:10.5753/webmedia_estendido.2023.236175

Daniel de Sousa Moraes PUC-Rio
Polyana Bezerra da Costa PUC-Rio
Antonio J. G. Busson BTG Pactual
José Matheus Carvalho Boaro PUC-Rio
Carlos de Salles Soares Neto UFMA
Sergio Colcher PUC-Rio

Abstract

A significant concern in the domain of authoring tools for interactive Digital TV (iDTV) has been their usability for the target audience, which typically consists of content creators rather than programmers. NCL (Nested Context Language) is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. Although NCL is simple, it is not an easy tool for non-technical authors. The proliferation of Large Language Models (LLMs) has recently driven substantial transformations across several domains, including code synthesis, where they show remarkable potential. This paper investigates the challenges of using LLMs to support automatic NCL code generation in authoring tools for iDTV content production. It presents initial evidence that current pre-trained LLMs cannot synthesize NCL code of satisfactory quality. In this context, we outline the main challenges for NCL code generation using LLMs, as well as issues related to good practices for prompt engineering and for integrating pre-trained LLMs into multimedia authoring tools.

Keywords: NCL, LLMs, Code Generation, Authoring
