Indentation-Sensitive Parsers for Free-Form Languages
Abstract
Consistent source code indentation is crucial for code readability, even in free-form languages, whose syntax ignores whitespace. However, because a free-form language lets programmers format code as they wish, indentation styles can end up mismatched. Automatic code formatters address this problem by rewriting code to a standard layout, but they impose a specific style and only work on syntactically valid code. We describe an extension of Parsing Expression Grammars (PEGs) that can model indentation information, and we show that it can be used to specify different indentation styles. A parser based on such an extension can detect indentation inconsistencies and report them to the developer. To evaluate our approach, we also implemented a full parser for the Lua programming language and used it to parse a well-known Lua codebase. We observed that 98% of the source code lines were well indented according to our specification.
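The abstract does not spell out the grammar extension, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' PEG extension or their Lua grammar. It is a hand-written, PEG-style recursive-descent parser (ordered choice, greedy repetition, backtracking) for a hypothetical toy subset of Lua in which every token records its column. Indentation never causes a parse failure; mismatches are collected as warnings instead, mirroring a parser that reports indentation inconsistencies to the developer. The names IndentParser, tokenize, expect_col and the indent parameter are hypothetical, and the encoded style (a block body indented exactly indent columns deeper than its opening keyword, with end aligned to that keyword) is just one illustrative style.

```python
import re
from dataclasses import dataclass

KEYWORDS = {"if", "then", "while", "do", "end"}
NAME_RE = re.compile(r"[A-Za-z_]\w*")


@dataclass
class Tok:
    text: str
    line: int  # 1-based line number
    col: int   # 0-based column in characters (tabs are not expanded)


def tokenize(src):
    """Split source into identifiers, numbers and one-character symbols."""
    return [Tok(m.group(), lineno, m.start())
            for lineno, line in enumerate(src.splitlines(), start=1)
            for m in re.finditer(r"[A-Za-z_]\w*|\d+|\S", line)]


class IndentParser:
    """PEG-style recursive-descent parser for a hypothetical toy subset:
        block <- stat*
        stat  <- 'if' exp 'then' block 'end'
               / 'while' exp 'do' block 'end'
               / NAME '=' exp
        exp   <- NAME / NUMBER
    Indentation never makes parsing fail; mismatches become warnings."""

    def __init__(self, src, indent=2):
        self.toks = tokenize(src)
        self.pos = 0
        self.indent = indent  # hypothetical indentation unit, in columns
        self.warnings = []    # (line, message) pairs

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, pred):
        """Consume the next token if pred(text) holds; return it, else None."""
        tok = self.peek()
        if tok is not None and pred(tok.text):
            self.pos += 1
            return tok
        return None

    def match(self, text):
        return self.take(lambda t: t == text)

    def name(self):
        return self.take(lambda t: NAME_RE.fullmatch(t) is not None and t not in KEYWORDS)

    def exp(self):
        return self.name() or self.take(str.isdigit)  # NAME / NUMBER

    def expect_col(self, tok, col, what):
        if tok.col != col:
            self.warnings.append((tok.line, f"{what} at column {tok.col}, expected {col}"))

    def backtrack(self, pos, nwarn):
        # A failed alternative must leave neither consumed input nor warnings.
        self.pos = pos
        del self.warnings[nwarn:]

    def block(self, col):
        while True:  # stat*: greedy repetition until stat fails
            tok, nwarn = self.peek(), len(self.warnings)
            if tok is None:
                return
            self.expect_col(tok, col, "statement")
            if not self.stat():
                del self.warnings[nwarn:]  # not a statement: drop the check
                return

    def stat(self):
        start, nwarn = self.pos, len(self.warnings)
        for opener, sep in (("if", "then"), ("while", "do")):
            kw = self.match(opener)
            if kw and self.exp() and self.match(sep):
                self.block(kw.col + self.indent)  # body one unit deeper
                end = self.match("end")
                if end:
                    self.expect_col(end, kw.col, f"'end' of '{opener}'")
                    return True
            self.backtrack(start, nwarn)  # try the next alternative
        lhs = self.name()
        if lhs and self.match("=") and self.exp():
            return True
        self.backtrack(start, nwarn)
        return False

    def parse(self):
        self.block(0)  # top-level statements start at column 0
        if self.pos != len(self.toks):
            tok = self.toks[self.pos]
            raise SyntaxError(f"line {tok.line}: unexpected {tok.text!r}")
        return self.warnings
```

Because a PEG alternative can fail after partially parsing a nested block, the sketch discards warnings together with the input position when it backtracks, so only warnings from the successful parse survive. For instance, on a small if block where one body line starts one column too far right and the closing end sits one column to the right of its if, it accepts the program and reports exactly those two lines. Supporting a different style would mean changing the column expected for the nested block and for end.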