skip to main content
10.1145/3624309.3624317acmotherconferencesArticle/Chapter ViewAbstractPublication PagessblpConference Proceedingsconference-collections
research-article

A Verified Operational Semantics for Regular Expression Parsing

Published:02 November 2023Publication History

ABSTRACT

Regular expressions (REs) are pervasive in computing. We use REs in text editors, string search tools (like GNU-Grep) and lexical analysers generators. Most of these tools rely on converting REs to their corresponding finite state machines or use REs derivatives to directly parse an input string. In this work, we investigate the suitability of another approach: instead of using derivatives or generating a finite state machine for a given RE, we developed an operational semantics for parsing regular languages, in such a way that a RE is merely a program executed according to the semantics over the input string. We show that the proposed semantics is sound w.r.t. a standard inductive semantics for RE and that the evidence produced by it denotes a valid parsing result. All of our results are formalized using Coq proof assistant.

References

  1. Andrea Asperti, Claudio Sacerdoti Coen, and Enrico Tassi. 2010. Regular Expressions, au point. CoRR abs/1010.2604 (2010). arxiv:1010.2604http://arxiv.org/abs/1010.2604Google ScholarGoogle Scholar
  2. Fahad Ausaf, Roy Dyckhoff, and Christian Urban. 2016. POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl). In Interactive Theorem Proving, Jasmin Christian Blanchette and Stephan Merz (Eds.). Springer International Publishing, Cham, 69–86.Google ScholarGoogle Scholar
  3. Adam Chlipala. 2013. Certified Programming with Dependent Types: A Pragmatic Introduction to the Coq Proof Assistant. The MIT Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Russ Cox. 2009. Regular Expression Matching: the Virtual Machine Approach. (2009). https://swtch.com/ rsc/regexp/regexp2.htmlGoogle ScholarGoogle Scholar
  5. Thales Delfino and Rodrigo Ribeiro. 2018. Towards certified virtual machine-based regular expression parsing — On-line repository. https://github.com/lives-group/regex-transformation.Google ScholarGoogle Scholar
  6. Christian Doczkal, Jan Oliver Kaiser, and Gert Smolka. 2013. A constructive theory of regular languages in Coq. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8307 LNCS (2013), 82–97. https://doi.org/10.1007/978-3-319-03545-1_6Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Danny Dubé and Marc Feeley. 2000. Efficiently Building a Parse Tree from a Regular Expression. Acta Inf. 37, 2 (Oct. 2000), 121–144. https://doi.org/10.1007/s002360000037Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Denis Firsov and Tarmo Uustalu. 2013. Certified Parsing of Regular Languages. In Proceedings of the Third International Conference on Certified Programs and Proofs - Volume 8307. Springer-Verlag New York, Inc., New York, NY, USA, 98–113. https://doi.org/10.1007/978-3-319-03545-1_7Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Sebastian Fischer, Frank Huch, and Thomas Wilke. 2010. A play on regular expressions. ACM SIGPLAN Notices 45, 9 (2010), 357. https://doi.org/10.1145/1932681.1863594Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bryan Ford. 2004. Parsing Expression Grammars: A Recognition-based Syntactic Foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Venice, Italy) (POPL ’04). ACM, New York, NY, USA, 111–122. https://doi.org/10.1145/964001.964011Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Alain Frisch and Luca Cardelli. 2004. Greedy Regular Expression Matching. ICALP 2004 - International Colloquium on Automata, Languages and Programming 3142 (2004), 618–629.Google ScholarGoogle Scholar
  12. Fritz Henglein and Lasse Nielsen. 2011. Regular Expression Containment: Coinductive Axiomatization and Computational Interpretation. SIGPLAN Not. 46, 1 (Jan. 2011), 385–398. https://doi.org/10.1145/1925844.1926429Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Roberto Ierusalimschy. 2009. A text patternmatching tool based on parsing expression grammars. Software - Practice and Experience (2009). https://doi.org/10.1002/spe.892Google ScholarGoogle ScholarCross RefCross Ref
  14. Simon Peyton Jones (Ed.). 2002. Haskell 98 Language and Libraries: The Revised Report. http://haskell.org/. 277 pages. http://haskell.org/definition/haskell98-report.pdfGoogle ScholarGoogle Scholar
  15. Donald E. Knuth. 1971. Top-down Syntax Analysis. Acta Inf. 1, 2 (June 1971), 79–110. https://doi.org/10.1007/BF00289517Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Raul Lopes, Rodrigo Ribeiro, and Carlos Camarão. 2016. Certified Derivative-Based Parsing of Regular Expressions. In Programming Languages — Lecture Notes in Computer Science 9889. Springer, 95–109.Google ScholarGoogle Scholar
  17. Raul Felipe Pimenta Lopes. 2018. Certified Derivative-based Parsing of Regular Expressions — Master Thesis. (2018).Google ScholarGoogle Scholar
  18. Sérgio Medeiros and Roberto Ierusalimschy. 2008. A parsing machine for PEGs. Proceedings of the 2008 symposium on Dynamic languages - DLS ’08 (2008), 1–12. https://doi.org/10.1145/1408681.1408683Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014. From Regexes to Parsing Expression Grammars. Sci. Comput. Program. 93 (Nov. 2014), 3–18. https://doi.org/10.1016/j.scico.2012.11.006Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Lasse Nielsen and Fritz Henglein. 2011. Bit-coded Regular Expression Parsing. In Language and Automata Theory and Applications, Adrian-Horia Dediu, Shunsuke Inenaga, and Carlos Martín-Vide (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 402–413.Google ScholarGoogle Scholar
  21. Asiri Rathnayake and Hayo Thielecke. 2011. Regular Expression Matching and Operational Semantics. Electronic Proceedings in Theoretical Computer Science 62, Sos (2011), 31–45. https://doi.org/10.4204/EPTCS.62.3 arxiv:1108.3126Google ScholarGoogle ScholarCross RefCross Ref
  22. Rodrigo Ribeiro and André Du Bois. 2017. Certified Bit-Coded Regular Expression Parsing. Proceedings of the 21st Brazilian Symposium on Programming Languages - SBLP 2017 (2017), 1–8. http://dl.acm.org/citation.cfm?doid=3125374.3125381Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Martin Sulzmann and Kenny Zhuo Ming Lu. 2014. POSIX Regular Expression Parsing with Derivatives. In Functional and Logic Programming, Michael Codish and Eijiro Sumii (Eds.). Springer International Publishing, Cham, 203–220.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    SBLP '23: Proceedings of the XXVII Brazilian Symposium on Programming Languages
    September 2023
    110 pages
    ISBN:9798400716287
    DOI:10.1145/3624309

    Copyright © 2023 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 2 November 2023

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate22of50submissions,44%
  • Article Metrics

    • Downloads (Last 12 months)21
    • Downloads (Last 6 weeks)8

    Other Metrics

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format