Pest control: A formal model of the Pest parser generator
Resumo
Parsing expressions grammars (PEGs) are a recognition-based formalism for parsers, which has been gaining popularity because they avoid some problems present in context-free grammar based specifications. In the context of the Rust programming language, the most used PEG-based parser generator is Pest. An interesting feature of Pest grammars is that they support new operators for repetition and handling an implicit stack during parsing. In this work, we propose a formalization of these new constructs and extend a type inference algorithm which guarantee that all well-typed Pest-style grammars do not loop on inputs.
Referências
Anonymous authors. [n. d.]. SECRET Library - online repository. [link].
Casper Bach Poulsen, Arjen Rouvoet, Andrew Tolmach, Robbert Krebbers, and Eelco Visser. 2017. Intrinsically-typed definitional interpreters for imperative languages. Proc. ACM Program. Lang. 2, POPL, Article 16 (Dec. 2017), 34 pages. DOI: 10.1145/3158104
Jean-Philippe Bernardy and Patrik Jansson. 2016. Certified Context-Free Parsing: A formalisation of Valiant’s Algorithm in Agda. Logical Methods in Computer Science 12 (2016). Issue 2. DOI: 10.2168/LMCS-12(2:6)2016
Elton M. Cardoso, Regina Sarah Monferrari Amorim De Paula, Daniel Freitas Pereira, Leonardo Vieira dos Santos Reis, and Rodrigo Geraldo Ribeiro. 2023. Type-based Termination Analysis for Parsing Expression Grammars. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023, Tallinn, Estonia, March 27-31, 2023, Jiman Hong, Maart Lanperne, JuwWon Park, Tomás Cerný, and Hossain Shahriar (Eds.). ACM, 1372–1379. DOI: 10.1145/3555776.3577620
Ryan Culpepper, Matthias Felleisen, Matthew Flatt, and Shriram Krishnamurthi. 2019. From Macros to DSLs: The Evolution of Racket. In 3rd Summit on Advances in Programming Languages, SNAPL 2019, May 16-17, 2019, Providence, RI, USA (LIPIcs, Vol. 136), Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 5:1–5:19. DOI: 10.4230/LIPICS.SNAPL.2019.5
Denis Firsov and Tarmo Uustalu. 2013. Certified Parsing of Regular Languages. In Certified Programs and Proofs - Third International Conference, CPP 2013, Melbourne, VIC, Australia, December 11-13, 2013, Proceedings (Lecture Notes in Computer Science, Vol. 8307), Georges Gonthier and Michael Norrish (Eds.). Springer, 98–113. DOI: 10.1007/978-3-319-03545-1_7
Denis Firsov and Tarmo Uustalu. 2014. Certified CYK parsing of context-free languages. J. Log. Algebraic Methods Program. 83, 5-6 (2014), 459–468. DOI: 10.1016/j.jlamp.2014.09.002
Bryan Ford. 2004. Parsing Expression Grammars: A Recognition-Based Syntactic Foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Venice, Italy) (POPL ’04). Association for Computing Machinery, New York, NY, USA, 111–122. DOI: 10.1145/964001.964011
Dick Grune and Ceriel J.H. Jacobs. 1990. Parsing Techniques - A Practical Guide. Ellis Horwood, Chichester, England. 300+ pages. [link] PDF online free.
Trevor Jim, Yitzhak Mandelbaum, and David Walker. 2010. Semantics and algorithms for data-dependent grammars. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Madrid, Spain) (POPL ’10). Association for Computing Machinery, New York, NY, USA, 417–430. DOI: 10.1145/1706299.1706347
Steve Klabnik and Carol Nichols. 2018. The Rust Programming Language. No Starch Press, USA.
Neelakantan R. Krishnaswami and Jeremy Yallop. 2019. A Typed, Algebraic Approach to Parsing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, NewYork, NY, USA, 379–393. DOI: 10.1145/3314221.3314625
Raul Lopes, Rodrigo Geraldo Ribeiro, and Carlos Camarão. 2016. Certified Derivative-Based Parsing of Regular Expressions. In Programming Languages - 20th Brazilian Symposium, SBLP 2016, Maringá, Brazil, September 22-23, 2016, Proceedings (Lecture Notes in Computer Science, Vol. 9889), Fernando Castor and Yu David Liu (Eds.). Springer, 95–109. DOI: 10.1007/978-3-319-45279-1_7
Stefan Lucks, Norina Marie Grosch, and Joshua König. 2017. Taming the Length Field in Binary Data: Calc-Regular Languages. In 2017 IEEE Security and Privacy Workshops (SPW). 66–79. DOI: 10.1109/SPW.2017.33
Ulf Norell. 2008. Dependently Typed Programming in Agda. In Proceedings of the 6th International Conference on Advanced Functional Programming (Heijen, The Netherlands) (AFP’08). Springer-Verlag, Berlin, Heidelberg, 230–266.
François Pottier and Didier Rémy. 2005. The Essence of ML Type Inference. In Advanced Topics in Types and Programming Languages, Benjamin C. Pierce (Ed.). MIT Press, Chapter 10, 389–489.
Rodrigo Ribeiro, Leonardo V. S. Reis, Samuel Feitosa, and Elton M. Cardoso. 2019. Towards Typed Semantics for Parsing Expression Grammars. In Proceedings of the XXIII Brazilian Symposium on Programming Languages (Salvador, Brazil) (SBLP 2019). Association for Computing Machinery, New York, NY, USA, 70–77. DOI: 10.1145/3355378.3355388
Tartasprint. [n. d.]. On the hunt of grammars making the parser run indefinitely.
Pest Development Team. [n. d.]. Pest - online repository. [link].
Jialun Zhang, Greg Morrisett, and Gang Tan. 2023. Interval Parsing Grammars for File Format Parsing. Proc. ACM Program. Lang. 7, PLDI, Article 150 (jun 2023), 23 pages. DOI: 10.1145/3591264
