Sedlex: Unicode-friendly lexer generator for OCaml

sedlex is a lexer generator for OCaml, similar to ocamllex, but supporting Unicode. It is the successor of the ulex project. Unlike ulex, which was implemented as a Camlp4 syntax extension, sedlex is based on the new "-ppx" technology of OCaml, which allows rewriting OCaml parse trees through external rewriters. (And what better name than "sed" for a rewriter?)

Like any -ppx rewriter, sedlex does not touch the concrete syntax of the language: lexer specifications are written in source files that comply with the standard grammar of OCaml programs. sedlex reuses the syntax for pattern matching to describe lexers (regular expressions are encoded within OCaml patterns). A nice consequence is that your editor (vi, emacs, ...) won't get confused (indentation, coloring) and you don't need to learn new priority rules. Moreover, sedlex is compatible with any front-end parsing technology: it works fine even if you use camlp4 or camlp5, with the standard or revised syntax.
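To illustrate how regular expressions are encoded as ordinary OCaml patterns, here is a minimal sketch of a small tokenizer. It assumes a current sedlex release (the `match%sedlex` extension, named regexps written as `[%sedlex.regexp? ...]`, and the `Sedlexing.Utf8` module), whose syntax may differ from the first version described here. The `token` type and the `tokens` and `lex` functions are hypothetical names chosen for this example. It would be compiled with the sedlex ppx enabled, for instance via `(preprocess (pps sedlex.ppx))` and `(libraries sedlex)` in a dune file.

```ocaml
(* Sketch assuming a recent sedlex; build with the sedlex ppx enabled. *)

(* Hypothetical token type for this example *)
type token =
  | Int of int
  | Ident of string
  | Op of string

(* Named regular expressions, written with OCaml pattern syntax:
   ranges use '..', sequences use ',' and alternatives use '|'. *)
let digit = [%sedlex.regexp? '0' .. '9']

(* xml_letter is a predefined Unicode class, so "café" is one identifier *)
let ident = [%sedlex.regexp? xml_letter, Star (xml_letter | digit | '_')]

(* Accumulate tokens from a UTF-8 lexer buffer, in reverse order *)
let rec tokens buf acc =
  match%sedlex buf with
  | Plus xml_blank -> tokens buf acc (* skip spaces, tabs, newlines *)
  | Plus digit ->
      tokens buf (Int (int_of_string (Sedlexing.Utf8.lexeme buf)) :: acc)
  | ident -> tokens buf (Ident (Sedlexing.Utf8.lexeme buf) :: acc)
  | Chars "+-*/=" -> tokens buf (Op (Sedlexing.Utf8.lexeme buf) :: acc)
  | eof -> List.rev acc
  (* sedlex requires the last branch to be a catch-all *)
  | _ -> failwith "unexpected character"

(* Lex a whole UTF-8 string into a list of tokens *)
let lex s = tokens (Sedlexing.Utf8.from_string s) []
```

With this sketch, `lex "x = 42 + café"` should return `[Ident "x"; Op "="; Int 42; Op "+"; Ident "café"]`. As with ocamllex, the longest match wins and ties go to the earlier branch, so no extra priority rules are needed beyond the usual top-to-bottom reading of a `match`.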

Disclaimer: this is the first version of sedlex, and the syntax of lexer specifications might change in the future. This release is mostly intended to gather feedback from the community and to demonstrate the use of the -ppx technology. Let me know if you plan to use sedlex for a serious project.

Main contact: Alain Frisch.

Download

sedlex is available as an OPAM package (called, surprisingly, "sedlex").

Licensing

The package sedlex is released under the terms of an MIT-like license (see the LICENSE file in the package).

Requirements

sedlex relies on recent additions to OCaml. Currently, it will not work with any released version of OCaml. You will need to install a development version of OCaml from its SVN repository.

sedlex has no external dependencies beyond the OCaml toolchain.