Syntax extensions without Camlp4: let's do it!

In a previous post, I've described a proposal on how to get rid of Camlp4 to write simple syntax extensions. This proposal slowly becomes a reality:

  • compilerlibs are now installed in a standard way by OCaml, which give developers everything they need to write AST mappers;
  • the -ppx flag is now available in the trunk version of OCaml (likely to become OCaml 4.01).

The only missing part is to add to the OCaml syntax a number of generic extension points: attributes, quotations, and maybe a few others (for instance, I'm thinking about a "customizable" let-like construction). As for any syntax decision, this will probably require some long, difficult, and boring discussions, but I'm confident this can be achieved!

Nevertheless, even without new extension points, it is already possible to use the -ppx flag to write some interesting AST mappers.

I've started to write a generic AST mapper as a class, with one method per syntactic category or construction in the OCaml grammar. The idea is that a concrete AST mapper can inherit from this class and override some methods. This way, it benefits from the generic traversal, implemented once and for all. The current version of the generic class, which is not finished yet, can be found in the OCaml SVN repository ([edit 2012-01-28] in OCaml's trunk, Ast_mapper is now part of the compiler-libs; the code below has been adapted to the current version). Note that this generic class is just one way to traverse the AST. I expect people to come up with other kinds of helper functions; this is just user-land.

Here is the complete source code for a small "syntax extension":

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
(* This filter implements the following rewriting on module expressions:
 
   IFDEF(X)(<m1>)(<m2>)
               ---> <m1>      if the environment variable X is defined
               ---> <m2>      otherwise
 
   And, on expressions:
 
   GETENV X    ---> the string literal representing the compile-time value
                    of environment variable X
*)
 
open Ast_mapper
open Parsetree
open Longident
open Location
 
let getenv s = try Sys.getenv s with Not_found -> ""
 
let ifdef =
  object(this)
    inherit Ast_mapper.create as super
 
    method! module_expr = function
      | {pmod_desc = Pmod_apply(
         {pmod_desc = Pmod_apply(
          {pmod_desc = Pmod_apply(
           {pmod_desc = Pmod_ident {txt = Lident "IFDEF"}},
           {pmod_desc = Pmod_ident {txt = Lident sym}}
          )},
          body_def)}, body_not_def)} ->
            if getenv sym <> "" then this # module_expr body_def
           else this # module_expr body_not_def
 
      | {pmod_desc = Pmod_ident {txt = Lident "IFDEF"}; pmod_loc = loc} ->
          Format.printf "%a@.Improper use of IFDEF. The correct form is: IFDEF(<var_name:uident>)(<then:modtype>)(<body:modtype>)@."
            Location.print_loc loc;
          exit 2
      | x -> super # module_expr x
 
    method! expr = function
      | {pexp_desc = Pexp_construct (
         {txt = Lident "GETENV"},
         Some {pexp_loc = loc; pexp_desc = Pexp_construct ({txt = Lident sym}, None, _)}, _
        )} ->
          E.strconst ~loc (getenv sym)
      | x -> super # expr x
  end
 
let () = ifdef # main

This extension allows the programmer to use environment variables at compile time, either for conditional compilation:

1
2
3
4
5
6
include IFDEF(DEBUGMODE)(struct
  let debug = print_endline
end)
(struct
  let debug _ = ()
end)

or for using the compile-time value of an environment variable as a string literal:

1
2
3
4
let () =
  Printf.printf "compiled by user %s in directory %s\n%!"
    (GETENV USER)
    (GETENV PWD)

The code for the extension can be compiled by:

  ocamlc -o ifdef.exe ocamlcommon.cma ast_mapper.cmo ifdef.ml

and used by adding -ppx ./ifdef.exe to the compiler's command-line.

Let's compare the code for ifdef.ml with a similar feature implemented as a Camlp4 syntax extension. First, let's consider some drawbacks of our approach:

  • We need to reuse existing syntax: this can create conflicts with existing code that use IFDEF as a module name, or GETENV as a constructor name. With Camlp4, one could create new ad hoc syntax for the new constructions, to avoid such conflicts.
  • The patterns used to detect specific constructions to be rewritten in the AST are rather large, and somehow less pretty than Camlp4's quotations.

The second drawback can be addressed to some extent if we tell the OCaml compiler to focus automatically on the interesting fields like pmod_desc: see this proposal. Note that the proposal is not really specific to "syntax extensions". A lot of code in the OCaml itself compiler, in LexiFi's codebase, and in other projects would benefit from such a proposal as well (or from a deeper and less ad hoc solution). I think this is a nice consequence of avoiding Camlp4: instead of growing new concepts like quotations, and suffer from their complexity, we are encouraged to think about how to improve OCaml itself, which is good. In particular, OCaml is supposed to shine at symbolic manipulation; the fact that writing simple AST transformers require cumbersome patterns deserves some attention at the level of the base language!

Now, some advantages of this approach compared to Camlp4:

  • It piggy-backs existing syntax. This ensures that syntactic tools, like your favorite IDE or text editor, are not confused by the extension.
  • Even with the "cumbersome" pattern, the code is not too complex, and one can argue that any decent programmer, with some understanding of the OCaml AST, is able to read it and understand precisely what it does. No need to understand new concepts (quotations, anti-quotations), to learn new syntax (for extending the OCaml grammar) or how a complex parsing engine works. The control of locations is explicit.
  • No dependency on a complex code base. The extension is built directly on the core OCaml compiler, plus a tiny re-usable component (Ast_mapper), written completely in "user-land". No complex bootstrapping required if a new feature is introduced to OCaml! If you maintain your own version of OCaml, with local extensions (like we do at LexiFi), you'll certainly appreciate that!

All in all, I feel such an approach is much more future-proof than relying on a complex beast like Camlp4. I hope the OCaml community will agree and progressively adopt this more direct approach.