OCaml extensions at LexiFi: semi-implicit laziness

[Edit 2014-06-05]: We have finally decided to drop our local extension for lazy record fields. Our experience is that it was just too dangerous to forget that a field is lazy and force it inadvertently. So we first restricted the feature to very limited situations and finally decided to drop it, both to be on the safe side and to reduce our diff with OCaml. The extension for the lazy let is much more widely used in our code base and we are quite happy with it. Some interesting programming patterns, esp. for GUI definitions, would not be realistic without it, and this feature is much more local than lazy record fields, and thus a lot less dangerous. [/Edit]

Being lazy is sometimes good, especially if you don't have to work too hard to be lazy.

In programming languages, laziness refers to an evaluation strategy where a computation is performed only when its result is actually needed (for the first time). OCaml makes the introduction of laziness explicit. If e is an expression of type 'a, one can write lazy e to produce immediately a value of type 'a Lazy.t without evaluating e. One can then force the computation by applying the function Lazy.force to the lazy value. The result is then memoized for later use: the expression e is evaluated at most once.

OCaml has actually quite a good support for laziness. For instance, the runtime system uses a custom representation to avoid some of the overhead associated with laziness. Since version 3.11, OCaml has a way to force lazy values within a pattern, so that Lazy.force could actually be implemented as (fun (lazy x) -> x). This makes it easier to use laziness in data structures.

At LexiFi, we have adapted the compiler to make it even easier to work with laziness. I'll describe in this post two extensions, which are very easy to add to the compiler. They are purely cosmetic, but they cannot be implemented as syntax extensions. I think they encourage to use laziness in situations where it is useful and thus create some useful idioms. They need to be used with caution, though; they make some uses of laziness more implicit and lightweight, but one should not forget that laziness is involved.

Lazy record fields

By default, OCaml data structures are immutable. Mutability is introduced explicitly by marking record fields as mutable. We apply the same approach to laziness:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
type t =
  {
    x: int;
    lazy s: string;
  }
 
let make x = {x; s = string_of_int x}
 
let get_s_1 r = r.s
let get_s_2 {s} = s
let get_s_3 = function
  | {s = "0"} -> 0
  | {s = "1"} -> 1
  | _ -> max_int
 
let get_lazy_1 r = lazy r.s
let get_lazy_2 {lazy s} = s

The field s is marked lazy. The effect is that the expression for this field (in a record literal expression) is implicitly wrapped in a lazy construction. In the example above, the function string_of_int is not evaluated when the record is built, but only if/when its field s is forced. Forcing can be done with the usual dot notation (as in get_s_1) or by matching the field (as in get_s_2 or get_s_3). It is also possible to extract the underlying lazy thunk, either by using the lazy keyword on the field pattern (get_lazy_2), or by wrapping the field projection with an explicit lazy (we have a special optimization that extracts the lazy thunk instead of creating a new one in that case).

The code above is thus equivalent to:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
type t =
  {
    x: int;
    s: string Lazy.t;
  }
 
let make x = {x; s = lazy (string_of_int x)}
 
let get_s_1 r = Lazy.force r.s
let get_s_2 {s} = Lazy.force s
let get_s_3 = function
  | {s = lazy "0"} -> 0
  | {s = lazy "1"} -> 1
  | _ -> max_int
 
let get_lazy_1 r = r.s
let get_lazy_2 {s} = s

This idea of declaring some record fields lazy is not new! It was actually how laziness was implemented in early versions of Caml. See the Section 5 of the INRIA technical report 137 by Michel Mauny: http://www.mauny.net/data/papers/mauny-1992a.pdf. Note that we have only implemented lazy record fields at LexiFi, not (yet?) lazy constructor arguments.

A typical use of lazy record fields in our code base is to bundle a value with some derived data that can be expensive to compute. This is a lightweight approach to implementing memoization/cache for unary functions. Here is an example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
type contract =
  {
    name: string;
    definition: contract_definition;
    lazy pending_options: pending_option list;
    lazy pending_fixings: pending_fixing list;
  }
 
let mk_contract name definition =
  {
    name;
    definition;
    pending_options = compute_pending_options definition;
    pending_fixings = compute_pending_fixings definition;
  }
 
let dump_fixings ppf {name; pending_fixings} =
  Format.fprintf ppf "Pending fixings for contract %s: %a@."
    name
    (pp_fixings pending_fixings)

The functions compute_pending_* can take some time, so it is a good idea to avoid applying them when not needed and also to cache their result.

Lazy let-bindings

We apply the same treatment to local bindings as to record fields. Here is an artifical example:

1
2
3
4
5
let foobar x y =
  lazy let s = try int_of_string x with _ -> failwith "foobar: x is not an integer" in
  if y > 0 then s + y
  else if y = 0 then s * s
  else 0

Note the lazy keyword on the let binding. The code is equivalent to:

1
2
3
4
5
let foobar x y =
  let s = lazy (try int_of_string x with _ -> failwith "foobar: x is not an integer") in
  if y > 0 then Lazy.force s + y
  else if y = 0 then Lazy.force s * Lazy.force s
  else 0

The rule is simple: when a let-binding is marked lazy, the bound expression is implicitly wrapped in a lazy construction, and any occurence of the bound variable forces the lazy thunk. This also works for multiple and/or recursive bindings. All the binding patterns must be simple variables.

A similar system is used in F#; see this paper by Don Syme at the ML 2005 workshop: http://research.microsoft.com/apps/pubs/default.aspx?id=79951. In our extension, however, lazy let-bindings are marked with an explicit keyword, they don't need to be recursive, and we don't force bound value immediatly after the binding.

A well-known fact, made explicit in Don's paper with a lot of good examples, is that that laziness allows to compile some useful recursive definitions that would otherwise require explicit mutation and/or less abstract APIs. A good illustration for that is the use of lazy recursive bindings to define GUI components. Typical GUI APIs forces the programmer to name invidiual GUI widgets and attach callbacks to react to GUI events after their creation. Relying on laziness allows for a more functional API, where callbacks are passed directly to the widget constructors. Here is an example code using such an imaginary API:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
lazy let rec button1 =
  button ~click:(fun () -> button2 # disable) "Button1"
and button2 =
  button ~click:(fun () -> button1 # disable) "Button2"
and my_form =
  form ~title:"Dialog example"
   (hbox
     [
       button1;
       button2;
       button ~click:(fun () -> my_form # close) "Close";
     ]
   )
in
my_form # run_modal

Imagine the desugared version of it, with explicit introduction of laziness and explicit Lazy.force all over the place. It's quite ugly, isn't it? The simple extension for lazy let-bindings makes nicer and more functional GUI APIs practical.

Our two extensions related to laziness play well together. Our example of using lazy record fields as memoization slots could be written as:

1
2
3
4
5
6
7
8
9
let mk_contract name definition =
  lazy let pending_options = ...
  and pending_fixings = ... in
  {
    name;
    definition;
    pending_options;
    pending_fixings;
  }

By the way, a further extension could be to treat simple patterns on lazy record fields in a special way. Instead of forcing the value, they could produce the equivalent of a lazy local binding. For instance:

1
2
3
let foo {name; pending_options} =
  if name = "" then [] else pending_options
(* pending_options is not forced when name = "" *)

I don't really know how useful this would be in practice.

Interesting! Typos: in

Interesting!
Typos:
in get_s_2 or check should be in get_s_2 or get_s_3
lazy let s = try int_of_string x should be lazy let x = try int_of_string s

Thanks!

Thanks!

Nice

I agree with Anil --- this seems like a great extension. I normally tend to be on the side of things being more explicit, but this change seems just right; the amount of explicitness you give up is well compensated for by the much lighter syntax.

Given that explicit record-level laziness was once part of caml, do you know why it didn't make it into OCaml?

In any case, it seems like a great patch to submit upstream.

ORM laziness

Great post!

Another use-case where this is really useful would be in our ORM database syntax extension. The generated code queries a database iteratively, sometimes to construct recursive records for cyclic types.

This is very hard to do without laziness or (as we currently do) Obj.magic. Having lazy record support in there would be a great solution. Are you planning to submit your patch upstream? :)

ORM laziness

Interesting that you mention the use of lazy fields for ORM. We have developed a library which seems quite close to your ORM project (based on our dynamic types instead of a syntax extension). This library has a special treatment for lazy record fields. By default, query functions (which generates SQL queries) only retrieve non-lazy fields. The lazy fields are extracted from the database only when needed (it is also possible to ask the query functions to retrieve all fields immediately to avoid extra queries, when we know in advance that those fields will be needed). We also have caches, versioning of DB objects, and notifications of updates; this makes a nice programming model for databases.