OCaml extensions at LexiFi, semi-implicit laziness.

Alain Frisch

Being lazy is sometimes good, especially if you don’t have to work too hard to be lazy.

[Edit 2014-06-05]: We have finally decided to drop our local extension for lazy record fields. Our experience is that it was just too dangerous to forget that a field is lazy and force it inadvertently. So we first restricted the feature to very limited situations and finally decided to drop it, both to be on the safe side and to reduce our diff with OCaml. The extension for the lazy let is much more widely used in our code base and we are quite happy with it. Some interesting programming patterns, esp. for GUI definitions, would not be realistic without it, and this feature is much more local than lazy record fields, and thus a lot less dangerous. [/Edit]

In programming languages, laziness refers to an evaluation strategy where a computation is performed only when its result is actually needed (for the first time). OCaml makes the introduction of laziness explicit. If e is an expression of type 'a, one can write lazy e to produce immediately a value of type 'a Lazy.t without evaluating e. One can then force the computation by applying the function Lazy.force to the lazy value. The result is then memoized for later use: the expression e is evaluated at most once.

OCaml has actually quite a good support for laziness. For instance, the runtime system uses a custom representation to avoid some of the overhead associated with laziness. Since version 3.11, OCaml has a way to force lazy values within a pattern, so that Lazy.force could actually be implemented as (fun (lazy x) -> x). This makes it easier to use laziness in data structures.

At LexiFi, we have adapted the compiler to make it even easier to work with laziness. I’ll describe in this post two extensions, which are very easy to add to the compiler. They are purely cosmetic, but they cannot be implemented as syntax extensions. I think they encourage to use laziness in situations where it is useful and thus create some useful idioms. They need to be used with caution, though; they make some uses of laziness more implicit and lightweight, but one should not forget that laziness is involved.

Lazy record fields

By default, OCaml data structures are immutable. Mutability is introduced explicitly by marking record fields as mutable. We apply the same approach to laziness:

type t =
  {
    x: int;
    lazy s: string;
  }

let make x = {x; s = string_of_int x}

let get_s_1 r = r.s
let get_s_2 {s} = s
let get_s_3 = function
  | {s = "0"} -> 0
  | {s = "1"} -> 1
  | _ -> max_int

let get_lazy_1 r = lazy r.s
let get_lazy_2 {lazy s} = s

The field s is marked lazy. The effect is that the expression for this field (in a record literal expression) is implicitly wrapped in a lazy construction. In the example above, the function string_of_int is not evaluated when the record is built, but only if/when its field s is forced. Forcing can be done with the usual dot notation (as in get_s_1) or by matching the field (as in get_s_2 or get_s_3). It is also possible to extract the underlying lazy thunk, either by using the lazy keyword on the field pattern (get_lazy_2), or by wrapping the field projection with an explicit lazy (we have a special optimisation that extracts the lazy thunk instead of creating a new one in that case).

The code above is thus equivalent to:

type t =
  {
    x: int;
    s: string Lazy.t;
  }

let make x = {x; s = lazy (string_of_int x)}

let get_s_1 r = Lazy.force r.s
let get_s_2 {s} = Lazy.force s
let get_s_3 = function
  | {s = lazy "0"} -> 0
  | {s = lazy "1"} -> 1
  | _ -> max_int

let get_lazy_1 r = r.s
let get_lazy_2 {s} = s

This idea of declaring some record fields lazy is not new! It was actually how laziness was implemented in early versions of Caml. See the Section 5 of the INRIA technical report 137 by Michel Mauny: http://www.mauny.net/data/papers/mauny-1992a.pdf. Note that we have only implemented lazy record fields at LexiFi, not (yet?) lazy constructor arguments.

A typical use of lazy record fields in our code base is to bundle a value with some derived data that can be expensive to compute. This is a lightweight approach to implementing memoization/cache for unary functions. Here is an example:

type contract =
{
  name: string;
  definition: contract_definition;
  lazy pending_options: pending_option list;
  lazy pending_fixings: pending_fixing list;
}

let mk_contract name definition =
{
  name;
  definition;
  pending_options = compute_pending_options definition;
  pending_fixings = compute_pending_fixings definition;
}

let dump_fixings ppf {name; pending_fixings} =
  Format.fprintf ppf "Pending fixings for contract %s: %a@."
    name
    (pp_fixings pending_fixings)

The functions compute_pending_* can take some time, so it is a good idea to avoid applying them when not needed and also to cache their result.

Lazy let-bindings

We apply the same treatment to local bindings as to record fields. Here is an artificial example:

let foobar x y =
  lazy let s = try int_of_string x with _ -> failwith "foobar: x is not an integer" in
  if y > 0 then s + y
  else if y = 0 then s * s
  else 0

Note the lazy keyword on the let binding. The code is equivalent to:

let foobar x y =
  let s = lazy (try int_of_string x with _ -> failwith "foobar: x is not an integer") in
  if y > 0 then Lazy.force s + y
  else if y = 0 then Lazy.force s * Lazy.force s
  else 0

The rule is simple: when a let-binding is marked lazy, the bound expression is implicitly wrapped in a lazy construction, and any occurrence of the bound variable forces the lazy thunk. This also works for multiple and/or recursive bindings. All the binding patterns must be simple variables.

A similar system is used in F#; see this paper by Don Syme at the ML 2005 workshop: http://research.microsoft.com/apps/pubs/default.aspx?id=79951. In our extension, however, lazy let-bindings are marked with an explicit keyword, they don’t need to be recursive, and we don’t force bound value immediately after the binding.

A well-known fact, made explicit in Don’s paper with a lot of good examples, is that that laziness allows to compile some useful recursive definitions that would otherwise require explicit mutation and/or less abstract APIs. A good illustration for that is the use of lazy recursive bindings to define GUI components. Typical GUI APIs forces the programmer to name individual GUI widgets and attach callbacks to react to GUI events after their creation. Relying on laziness allows for a more functional API, where callbacks are passed directly to the widget constructors. Here is an example code using such an imaginary API:

lazy let rec button1 =
  button ~click:(fun () -> button2 # disable) "Button1"
and button2 =
  button ~click:(fun () -> button1 # disable) "Button2"
and my_form =
  form ~title:"Dialog example"
    (hbox
       [
         button1;
         button2;
         button ~click:(fun () -> my_form # close) "Close";
       ]
    )
in
my_form # run_modal

Imagine the desugared version of it, with explicit introduction of laziness and explicit Lazy.force all over the place. It’s quite ugly, isn’t it? The simple extension for lazy let-bindings makes nicer and more functional GUI APIs practical.

Our two extensions related to laziness play well together. Our example of using lazy record fields as memorization slots could be written as:

let mk_contract name definition =
  lazy let pending_options = ...
  and pending_fixings = ... in
  {
    name;
    definition;
    pending_options;
    pending_fixings;
  }

By the way, a further extension could be to treat simple patterns on lazy record fields in a special way. Instead of forcing the value, they could produce the equivalent of a lazy local binding. For instance:

let foo {name; pending_options} =
  if name = "" then [] else pending_options
  (* pending_options is not forced when name = "" *)

I don’t really know how useful this would be in practice.