ABSTRACT
Regular expressions (REs) are pervasive in computing. We use REs in text editors, string search tools (like GNU-Grep) and lexical analysers generators. Most of these tools rely on converting REs to their corresponding finite state machines or use REs derivatives to directly parse an input string. In this work, we investigate the suitability of another approach: instead of using derivatives or generating a finite state machine for a given RE, we developed an operational semantics for parsing regular languages, in such a way that a RE is merely a program executed according to the semantics over the input string. We show that the proposed semantics is sound w.r.t. a standard inductive semantics for RE and that the evidence produced by it denotes a valid parsing result. All of our results are formalized using Coq proof assistant.
- Andrea Asperti, Claudio Sacerdoti Coen, and Enrico Tassi. 2010. Regular Expressions, au point. CoRR abs/1010.2604 (2010). arxiv:1010.2604http://arxiv.org/abs/1010.2604Google Scholar
- Fahad Ausaf, Roy Dyckhoff, and Christian Urban. 2016. POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl). In Interactive Theorem Proving, Jasmin Christian Blanchette and Stephan Merz (Eds.). Springer International Publishing, Cham, 69–86.Google Scholar
- Adam Chlipala. 2013. Certified Programming with Dependent Types: A Pragmatic Introduction to the Coq Proof Assistant. The MIT Press.Google ScholarDigital Library
- Russ Cox. 2009. Regular Expression Matching: the Virtual Machine Approach. (2009). https://swtch.com/ rsc/regexp/regexp2.htmlGoogle Scholar
- Thales Delfino and Rodrigo Ribeiro. 2018. Towards certified virtual machine-based regular expression parsing — On-line repository. https://github.com/lives-group/regex-transformation.Google Scholar
- Christian Doczkal, Jan Oliver Kaiser, and Gert Smolka. 2013. A constructive theory of regular languages in Coq. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8307 LNCS (2013), 82–97. https://doi.org/10.1007/978-3-319-03545-1_6Google ScholarDigital Library
- Danny Dubé and Marc Feeley. 2000. Efficiently Building a Parse Tree from a Regular Expression. Acta Inf. 37, 2 (Oct. 2000), 121–144. https://doi.org/10.1007/s002360000037Google ScholarDigital Library
- Denis Firsov and Tarmo Uustalu. 2013. Certified Parsing of Regular Languages. In Proceedings of the Third International Conference on Certified Programs and Proofs - Volume 8307. Springer-Verlag New York, Inc., New York, NY, USA, 98–113. https://doi.org/10.1007/978-3-319-03545-1_7Google ScholarDigital Library
- Sebastian Fischer, Frank Huch, and Thomas Wilke. 2010. A play on regular expressions. ACM SIGPLAN Notices 45, 9 (2010), 357. https://doi.org/10.1145/1932681.1863594Google ScholarDigital Library
- Bryan Ford. 2004. Parsing Expression Grammars: A Recognition-based Syntactic Foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Venice, Italy) (POPL ’04). ACM, New York, NY, USA, 111–122. https://doi.org/10.1145/964001.964011Google ScholarDigital Library
- Alain Frisch and Luca Cardelli. 2004. Greedy Regular Expression Matching. ICALP 2004 - International Colloquium on Automata, Languages and Programming 3142 (2004), 618–629.Google Scholar
- Fritz Henglein and Lasse Nielsen. 2011. Regular Expression Containment: Coinductive Axiomatization and Computational Interpretation. SIGPLAN Not. 46, 1 (Jan. 2011), 385–398. https://doi.org/10.1145/1925844.1926429Google ScholarDigital Library
- Roberto Ierusalimschy. 2009. A text patternmatching tool based on parsing expression grammars. Software - Practice and Experience (2009). https://doi.org/10.1002/spe.892Google ScholarCross Ref
- Simon Peyton Jones (Ed.). 2002. Haskell 98 Language and Libraries: The Revised Report. http://haskell.org/. 277 pages. http://haskell.org/definition/haskell98-report.pdfGoogle Scholar
- Donald E. Knuth. 1971. Top-down Syntax Analysis. Acta Inf. 1, 2 (June 1971), 79–110. https://doi.org/10.1007/BF00289517Google ScholarDigital Library
- Raul Lopes, Rodrigo Ribeiro, and Carlos Camarão. 2016. Certified Derivative-Based Parsing of Regular Expressions. In Programming Languages — Lecture Notes in Computer Science 9889. Springer, 95–109.Google Scholar
- Raul Felipe Pimenta Lopes. 2018. Certified Derivative-based Parsing of Regular Expressions — Master Thesis. (2018).Google Scholar
- Sérgio Medeiros and Roberto Ierusalimschy. 2008. A parsing machine for PEGs. Proceedings of the 2008 symposium on Dynamic languages - DLS ’08 (2008), 1–12. https://doi.org/10.1145/1408681.1408683Google ScholarDigital Library
- Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014. From Regexes to Parsing Expression Grammars. Sci. Comput. Program. 93 (Nov. 2014), 3–18. https://doi.org/10.1016/j.scico.2012.11.006Google ScholarDigital Library
- Lasse Nielsen and Fritz Henglein. 2011. Bit-coded Regular Expression Parsing. In Language and Automata Theory and Applications, Adrian-Horia Dediu, Shunsuke Inenaga, and Carlos Martín-Vide (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 402–413.Google Scholar
- Asiri Rathnayake and Hayo Thielecke. 2011. Regular Expression Matching and Operational Semantics. Electronic Proceedings in Theoretical Computer Science 62, Sos (2011), 31–45. https://doi.org/10.4204/EPTCS.62.3 arxiv:1108.3126Google ScholarCross Ref
- Rodrigo Ribeiro and André Du Bois. 2017. Certified Bit-Coded Regular Expression Parsing. Proceedings of the 21st Brazilian Symposium on Programming Languages - SBLP 2017 (2017), 1–8. http://dl.acm.org/citation.cfm?doid=3125374.3125381Google ScholarDigital Library
- Martin Sulzmann and Kenny Zhuo Ming Lu. 2014. POSIX Regular Expression Parsing with Derivatives. In Functional and Logic Programming, Michael Codish and Eijiro Sumii (Eds.). Springer International Publishing, Cham, 203–220.Google Scholar
Recommendations
Towards certified virtual machine-based regular expression parsing
SBLP '18: Proceedings of the XXII Brazilian Symposium on Programming LanguagesRegular expressions (REs) are pervasive in computing. We use REs in text editors, string search tools (like GNU-Grep) and lexical analysers generators. Most of these tools rely on converting regular expressions to its corresponding finite state machine ...
Regular right part grammars and their parsers
This paper introduces an alternative to context-free grammars called regular right part (RRP) grammars, which resemble PASCAL syntax diagrams. Formally, RRP grammars have production right parts, which are nondeterministic finite state machines (FSMs), ...
Parsing expression grammars: a recognition-based syntactic foundation
POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languagesFor decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ...
Comments