Syntax extensions without Camlp4

Alain Frisch 2011-04-20

In this post, I’d like to propose a new approach to extending the syntax of OCaml that does not depend on Camlp4.

This proposal is the result of discussions with Xavier Clerc, Fabrice Le Fessant, Maxence Guesdon, and has been presented to the latest meeting of the Caml Consortium. It was also quickly mentioned by Xavier Leroy during his talk at the latest OCaml Users Meeting.

Camlp4 makes it possible to alter the concrete grammar recognized by the OCaml parser. New constructions can be added to the language provided that they can be mapped to the standard language in a purely syntactic way. Camlp4 is a complex beast. It implements its own unique parsing technology, it defines its own interpretation of the OCaml Abstract Syntax Tree (AST), and it allows code generators to use the concrete syntax of OCaml. These features might be needed or wanted in some cases, but I believe that a lot of existing syntax extensions, currently implemented with Camlp4, don’t really need any of these complex things.

The proposal can be explained in two points:

Extend the OCaml syntax once and for all with attributes. Syntax extensions are simply functions mapping an AST to another AST.

Attributes

What is an attribute? Just some new syntax to allow generic data to be attached to various kinds of nodes in the AST: expressions, type declarations, type expressions, methods, … Somehow like Java annotations http://download.oracle.com/javase/tutorial/java/javaOO/annotations.html, or C# attributes (http://msdn.microsoft.com/en-us/library/aa288059.aspx, http://msdn.microsoft.com/en-us/library/z0w1kczw.aspx). One should decide what kind of data an attribute defines and what is the concrete syntax for attributes. I don’t have a concrete proposal here, but for the sake of the discussion, let’s assume that attributes share the same syntax as OCaml expressions and that they can be attached to a node in the AST with a @@ syntax. For instance, a type declaration with some attributes could be written:

 type my_type @@ [with_xml] =
     | A of int
     | B of string @@ {xml=base64}
     | C @@ {xml=ignore}

Note that [with_xml], {xml=base64} and {xml=ignore} are three syntactically valid OCaml expressions. They are here used as attributes, attached respectively to the type declaration, the argument of the B constructor and the C constructor itself.

Imagine that the OCaml parser is extended to accept such attributes. I mean, the real OCaml parser, the one implemented in parsing/parser.mly that produces AST described in parsing/parsetree.mli. This is quite easy to do. Just add some constructors or record fields to the definition of AST types and a few new rules in the parser itself. Once and for all. For now, one can assume that the OCaml type-checker simply ignores attributes (although one could also argue that it should propagate them to the typed tree, but this is another story).

Preprocessors = AST-transforming functions

The OCaml compiler has an option -pp to plug an external preprocessor. A preprocessor is an arbitrary command which will be called by the compiler; it receives the name of the source file as a command-line argument and it is supposed to dump its result to its standard output, either as source code in text form, or as a marshaled version of the OCaml AST. I propose to add a new command-line option -ppx. It also comes with an external command to be executed, but this command does not take a file name as argument. Instead, it receives an already parsed OCaml AST (as a marshaled value) and it must return an OCaml AST as well. Several -ppx commands can be used at the same time. The compiler parses the source file itself and then pipes it through each external commands specified with -ppx. Of course, the OCaml AST and parser have been extended to recognise attributes.

A typical use of Camlp4 is to generate some code based on type declarations. As an example, consider a syntax extension that generates XML parsers and pretty-printers from type declarations. This could be implemented quite simply as a -ppx preprocessor. Basically, the task is to write an OCaml function of type Parsetree.structure -> Parsetree.structure (one must of course give access to parsetree.cmi in order to compile this piece of code). The function would simply traverse the AST, detect some specific attributes that trigger or control its code generation (like the attributes in the example above) and produce new nodes in the AST. There is also some driver code to read a marshaled value from the standard input, convert it to a value of type Parsetree.structure, call the function above, and send a marshaled version of the resulting Parsetree.structure to the standard output. Probably a few lines of code to be robust (check magic numbers, etc). These lines and the traversal code is quite generic; I’m sure people will quickly define some reusable routines (as functions or classes).

And that’s all.

I lied a little bit. The “XML parser and pretty-printer code generator” above does not really extend any syntax. It simply interprets some attributes already parsed but otherwise ignored by the regular OCaml compiler. Still, for many syntax extensions currently implemented with Camlp4, this would be good enough. And the ones that generate code from type declarations tend to be amongst the most interesting ones, in my opinion.

For a decent OCaml developper, reading parsetree.mli (from the OCaml distribution) and becoming familiar with the OCaml AST should be quite easy. An order of magnitude easier than learning Camp4 anyway. No need to learn a new parsing technology and other advanced concepts either.

Quotations

A long time ago, before the local-open was available in OCaml, I wrote a tiny Camlp4 extension that would simulate this feature. The extension would rewrite let open M in e into something like:

let module XXX = struct open M let r = e end in XXX.r

How would such an extension be implemented using the proposal above? Well, one simply needs to encode the new construction with attributes. For instance, instead of let open M in e, one could write (() @@ open_in M) e, using an attribute on an expression node. One cannot use open because it is a keyword and the syntax looks quite ugly. Here is an extra little proposal. In addition to extending the OCaml syntax with attributes, one could also extend various kinds of syntactic items with a new “quotation” construction. Again, the syntax needs to be decided. For instance, let’s use double curly braces. So the OCaml AST and parser would accept a new kind of expression (and other items), written {{…}}. The … is not parsed by OCaml (one simply needs to agree on lexical conventions to detect its end), and it is represented simply as a string in the AST.

Contrary to attributes, the type-checker has no option but to reject quotations. But an external preprocessor could detect quotations and eliminate them (and part of their context). For instance, a preprocessor could detect {{open M}}; e and do the rewriting as above. Not as nice as a full fledged syntax extension, but maybe good enough.

Conclusion

A lot of questions still need to be addressed before this proposal could even be considered for inclusion into OCaml: what exactly are attributes (OCaml expressions, s-exp, …), how they are attached to syntactic components and which syntactic categories are supported, whether quotations should be introduced in addition to attributes. The order in which preprocessors are applied is also an issue (one could imagine iterating until a fix point is reached).

Still, I wanted to describe these preliminaries ideas. Any feedback will be appreciated!

LexiFi • 892 rue Yves Kermen • F-92100 Boulogne-Billancourt • France