Inlined records in constructors

I'd like to introduce a new language feature, inlined record arguments on constructors, which I propose for inclusion in OCaml. In a nutshell, it allows you to define a record directly within a sum type constructor:

1
2
3
  type t =
     | A of { x : int; y: string }
     | ...

The argument of the constructor is a full-fledged record type. All features of records are available: dot notation, mutable fields, polymorphic fields, punning syntax, record override. You can write:

1
2
3
4
5
6
7
8
9
10
11
  type t =
     A of {mutable x : int; mutable y : string; f: 'a. 'a -> 'a}
  |  B
 
  let f = function
    | A ({x; y; f} as r) -> r.x <- f x; r.y <- f y
    | B -> ()
 
  let g = function
    | A r -> A {r with y = ""}
    | z -> z

Advantages over using normal records

Compared to declaring explicitly a record type and using it as an argument of the constructor, the new form has two main advantages:

  • In terms of code readability: the new form is syntactically lighter and it brings the definition of fields closer to the constructor itself, which can make the type declaration easier to read (think of a long list of constructors, most of them requiring a record declaration).
  • In terms of performance, the new form uses the same "packed" representation as for n-ary constructors: the fields are stored directly in the block of the constructor. Thanks to mutable fields, this enables some more compact mutable data structures, which would otherwise require either unsafe features or extra indirections/allocations.

Advantages over using n-ary constructors

Now, compared to using normal n-ary constructors, using this new record syntax brings the following advantages:

  • Naming arguments is a good way to document their meaning.
  • Punning on record labels makes the code often as compact as with n-ary constructors, while encouraging to use uniform names to bind fields throughout the code base.
  • The record syntax is much lighter on patterns which ignore many arguments.
  • Extending the constructor with more fields does not break existing patterns (at least those of the form {...; _} when warning 9 is enabled).
  • The full power of records is available, in particular mutable fields.
  • Manipulating the constructor's argument as a first-class value lifts some restriction which seem weird to beginners (which are not helped by the pseudo-tuple syntax on n-ary constructors) and enable more code refactoring.

The inner record type

I've explained that the inner record declared together with a constructor is a full-fledged record type. This enables code such as:

1
2
3
4
5
  type t = A of { x:int; y:string } | B
 
  let f1 = function A r -> r | _ -> assert false
 
  let f2 r = A r

To give the type of those two functions, one needs to refer to the inner record type. The system provides a name for it automatically: t.A. The functions above have type t -> t.A and t.A -> t. The programmer can use t.A directly, in type annotations or in signatures.

Currently, it is not possible to use it in the type declaration itself:

1
2
3
  type t =
    | A of { x : int; y : string }
    | B of t.A list

This is because the inner type t.A is added to the environment at the same time as the constructor A itself, i.e. after the type declaration has been processed. I don't think that allowing this would be really useful, though.

Runtime representation

The type t.A behaves just like a normal (nominal) record type except that internally, it is marked so that its values are allocated directly with the tag for A, not 0 as for normal records. Thanks to it, the A constructor itself is just the identity at runtime.

Parametric types

Parametric types are supported:

1
2
3
  type 'a t =
    | Node of {mutable left: 'a t; mutable right: 'a t; value: 'a}
    | Empty

In such a declaration, the inner type t.Node needs to be parametric as well. The declaration is "morally" equivalent to:

1
2
3
4
5
  type 'a t =
    | Node of 'a t.Node
    | Empty
 
   and 'a t.Node = {...}

In general, the type parameters of the inner record type are the free variables in the record definition, in the order in which they appear syntactically (the first occurrence is considered for each variable). For instance, consider:

1
2
3
4
5
  type ('a, 'b) t =
    | A of {x : 'a}
    | B of {x : 'b}
    | C of {x : int}
    | D of {x : 'b; y : 'a}

In this example, the types t.A and t.B each have one type parameter; t.C has no type parameter; and t.D has two type parameters, which are "swapped" compared to those of t. We thus have:

1
2
# fun (D r) -> r;;
- : ('a, 'b) t -> ('b, 'a) t.D = <fun>

A previous version used the same type parameters for the inner record type as for the sum type, but there was no natural generalization of this for GADT constructors (see below), and record-constructors which don't use all the type parameters introduced "phantom" types which could be annoying.

GADT constructors

GADTs constructors are supported as well:

1
2
3
4
5
6
7
  type t =
    | A: {x : 'a; f : 'a -> unit} -> t
    | B
 
  type _ s =
    | A: {x : 'a; y : 'b} -> ('a * 'b) t
    | B

Again, as for regular constructors with records, the type parameters for t.A are the free variables which appear in the record definition (in order of apperance, the first occurrence of each variable being taking into account). The declarations above are morally equivalent to:

1
2
3
4
5
6
7
8
9
  type t =
    | A: 'a t.A -> t
    | B
  and 'a t.A = {x : 'a; f : 'a -> unit}
 
  type _ s =
    | A: ('a, 'b) s.A -> ('a * b) t
    | B
  and ('a, 'b) s.A = {x : 'a; y : 'b}

Exceptions

Exceptions declarations also benefit from the new record syntax:

1
     exception A of {x : int; mutable y : string}

The inner record type is called exn.A and value of that type directly get the runtime representation of the A exception constructor (so that the constructor itself is the identity at runtime). No extra indirection is involved, as would be the case with a normal record type.

The existing limitation that exception constructors cannot be GADT constructors has not been lifted (but this is an unrelated limitation).

Type-based selection

A recent feature of OCaml is that type information is used to select record labels when multiple record types define the same label names. Thanks to it, there is no problem having multiple constructors defining fields with the same name:

1
2
3
4
5
6
  type t =
    | A of {loc: loc; x: int; y: int}
    | B of {loc: loc; s: string}
 
  let loc = function
    | A {loc; _} | B {loc; _} -> loc

One actually depends crucially on that type-based selection feature, since the labels of the inlined record can only be used through it (technically, they are not inserted in the typing environment). This avoids bad situation where such labels would shadow the ones from "real" records. Most of the time, those inner labels are used in contexts where the type information is readily available, as in the loc function above.

Debate

The interested and patient reader can have a look at the long discussion on this feature on OCaml's bug tracker #5528: Inline records for constructor arguments . The interested but less patient one will have to hope that Gabriel Scherer produces a nice report on the whole discussion :-)

One point under discussion was whether one should only use the record syntax but not allow manipulating the inner records as first-class types and values. My opinion is that allowing this is sometimes useful, not very difficult from an implementation point of view (and perhaps simpler than disallowing it) and moreover, it avoids some bad properties of n-ary constructors with the pseudo-tuple syntax. The only "drawback" is that it requires to extend the notion of type identifier so that the user can write t.A; not a big deal in terms of language complexity, in my opinion. So yes, I'm strongly in favor of seeing inner record types as first-class and full-fledged ones.

Implementation

I've started to work on implementing this proposal. This is available in the branch constructors_with_record3 branch of the OCaml Subversion repository. A local checkout can be obtained with:

  svn co http://caml.inria.fr/svn/ocaml/branches/constructors_with_record3

and a simple way to get it through an OPAM switch should be available soon. Note that ocamldoc hasn't been extended yet, and it needs to disabled for compiling the branch (./configure -no-ocamldoc).

There is also an OPAM switch corresponding to the branch:

opam switch 4.02.0dev+record_constructors

The branch includes all the features mentioned in this post, but it needs some more testing, polishing and code review. Comments and opinions on the proposal are of course much welcome! (Probably to be sent to Mantis for technical points and here for more general comments.)

[Edit 2014-04-18] Specify type parameters of inline record types + their labels are not inserted in the environment.

[Edit 2014-04-22] Added the reference to the OPAM switch.

[Edit 2014-04-22] Referring to t.A from the declaration of t itself does not work (contrary to the previous, more syntactic implementation).

Couple of questions:Is this

Couple of questions: Is this allowed?
1
2
3
4
5
type t = 
| A 
| B of { foo : int }
 
type t2 = t.B = { foo : int } 
If yes, I assume t2 also has non-zero tag as t.B. Is this right? Can I do this
1
let v = { t.B.foo = 5 }

This requires extending syntax since t.B is not a module path.

In your example let f2 r = A r, where f2 has type t.A -> A, is f2 a no-op in runtime?

I like this proposal, I think it can be very useful.

1 type t2 = t.B = { foo : int

1
type t2 = t.B = { foo : int }
is not allowed, since this would allow to export t2 as a normal record type in an interface (but the runtime representation is different).
1
let v = { t.B.foo = 5 }
is not supported either, but you can write:
1
2
let v : t.B = { foo = 5 }
let v = ({ foo = 5 } : t.B)
and of course, simply:
1
let x = B {foo = 5}
Concerning your last question: indeed, wrapping an inlined record with the corresponding constructor is a no-op, and so is the reverse projection.

Typo

I think there is a typo (in « The inner record type »): A.t -> t.A

Fixed! Thanks.

Fixed!

Thanks.

Thank you

For this nice presentation of your proposal and for always providing with your ideas working codes!!

This language feature is great. For the type of the record I'm more in favor of using only the type variable of the constructor, not of the whole type because otherwise that can create phantom type too easily (here t.B is preferable to 'a t.B).

type 'a t =
| A of ... 'a ...
| B of {head = int; tail = int}

Done!

This is indeed a good argument, and it makes the system more regular between GADT and non-GADT constructors with record. I've adapted the implementation accordingly, and also updated the blog post.

It is true that it might become a little bit difficult for users to predict the type variables (although the criterion is quite well-defined), but at least for simple cases, they won't really have to write those types themselves.

Thanks for your feedback!