Dynamic types

Thanks to an ongoing work by Jacques Le Normand, there is some hope to see Generalized Algebraic Datatypes (GADTs) in OCaml soon, so it's time to get familiar with them! In this post, I'll show how to use this new feature to represent OCaml types at runtime and define generic type-driven operations. All the examples in this post can be compiled with the current version of the GADT branch in the OCaml Subversion repository (http://caml.inria.fr/svn/ocaml/branches/gadts).

Types as runtime values

We want to define a type 'a ty so that a value of type 'a ty is a runtime representation of the type 'a. We can start with a few cases:

1
2
3
4
5
type 'a ty =
  | Int: int ty
  | String: string ty
  | List: 'a ty -> 'a list ty
  | Pair: ('a ty * 'b ty) -> ('a * 'b) ty

The power of GADTs is that when we pattern-match a value of type 'a ty and find, for instance, that the constructor is Int, we learn that within the current branch, the type variable 'a is actually equals to int.

With this representation of types at runtime, we can do some generic programming. As an example, let's create a universal untyped representation of OCaml values and functions to operate on them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
type variant =
  | VInt of int
  | VString of string
  | VList of variant list
  | VPair of variant * variant
 
let rec variantize: type t. t ty -> t -> variant =
  fun ty x ->
    (* type t is abstract here *)
    match ty with
    | Int -> VInt x  (* in this branch: t = int *)
    | String -> VString x (* t = string *)
    | List ty1 ->
        VList (List.map (variantize ty1) x)
        (* t = 'a list for some 'a *)
    | Pair (ty1, ty2) ->
        VPair (variantize ty1 (fst x), variantize ty2 (snd x))
        (* t = ('a, 'b) for some 'a and 'b *)
 
exception VariantMismatch
 
let rec devariantize: type t. t ty -> variant -> t =
  fun ty v ->
    match ty, v with
    | Int, VInt x -> x
    | String, VString x -> x
    | List ty1, VList vl ->
        List.map (devariantize ty1) vl
    | Pair (ty1, ty2), VPair (x1, x2) ->
        (devariantize ty1 x1, devariantize ty2 x2)
    | _ -> raise VariantMismatch

To transform a value to a variant, we can write: variantize (List Int) [10; 20]. With something like what I described in my recent Implicit values and Implicit arguments posts, we could even let the compiler build the type representation List Int for us, and simply write variantize [10; 20].

Dealing with records

Our current definition of 'a ty supports only a limited number of built-in types. We could trivially extend it to support more basic types, arrow types and tuples of arity n for a fixed n. Dealing with tuples of arbitrary arity and with user-defined record and sum types is more tricky. Let's build a solution for records. First, we define a type to represent a field of type 'b in a record of type 'a:

1
2
3
4
5
6
type ('a, 'b) field =
    {
     label: string;
     field_type: 'b ty;
     get: ('a -> 'b);
    }

For a given field, we keep its label, the runtime representation of its type, and a function to extract it from a record value. We can extend the type 'a ty with a new constructor to represent records:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
type 'a ty =
  | ...
  | Record: 'a record -> 'a ty
 
and 'a record =
    {
     path: string;
     fields: 'a field_ list;
    }
 
and 'a field_ =
  | Field: ('a, 'b) field -> 'a field_
 
and ('a, 'b) field = ...

A record type is represented by a string (intended to be its fully qualified path) and the list of fields. Since all fields can have different types, the second type variable 'b must be quantified existantially, and we can use GADTs again to do that.

Note that we have lost a nice property of the original version of 'a ty. It was the case that a value of type int ty could only be Int; that a value of type 'a list ty could only be of the form List .. and so on. The compiler is clever enough to make use of this information and understand that a pattern matching with a single branch Int -> ... is exhaustive when the matched expression has type int ty. Now this property is lost because there is no constraint on the result type for the Record constructor. This is not going to be a problem for what we want to do here.

The variantize function can be extended with a new branch for records:

1
2
3
4
5
6
7
8
9
10
11
12
type variant =
  | ...
  | VRecord of (string * variant) list
 
let rec variantize: type t. t ty -> t -> variant =
  fun ty x ->
    match ty with
    | ...
    | Record {fields} ->
        VRecord
          (List.map (fun (Field{field_type; label; get}) ->
                       (label, variantize field_type (get x))) fields)

Our current representation of record fields allows us to extract a field from a record value, but not to build a new record value from its fields. If we want to extend devariantize with records, we need to add this capability to our representation of records. We will say to each record type comes with a notion of builder: a builder holds a partial version of a record value that we are currently building. A record type representation must be able to create a fresh builder and to produce a record value from a fully filled builder; the representation for each field must be able to fill the corresponding slot of the builder.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
type 'a ty =
  | Int: int ty
  | String: string ty
  | List: 'a ty -> 'a list ty
  | Pair: ('a ty * 'b ty) -> ('a * 'b) ty
  | Record: ('a, 'builder) record -> 'a ty
 
and ('a, 'builder) record =
    {
     path: string;
     fields: ('a, 'builder) field list;
     create_builder: (unit -> 'builder);
     of_builder: ('builder -> 'a);
    }
 
and ('a, 'builder) field =
  | Field: ('a, 'builder, 'b) field_ -> ('a, 'builder) field
 
and ('a, 'builder, 'b) field_ =
  {
   label: string;
   field_type: 'b ty;
   get: ('a -> 'b);
   set: ('builder -> 'b -> unit);
  }
 
let rec devariantize: type t. t ty -> variant -> t =
  fun ty v ->
    match ty, v with
    | ...
    | Record {fields; create_builder; of_builder}, VRecord fl ->
        if List.length fields <> List.length fl then raise VariantMismatch;
        let builder = create_builder () in
        List.iter2
          (fun (Field {label; field_type; set}) (lab, v) ->
            if label <> lab then raise VariantMismatch;
            set builder (devariantize field_type v)
          )
          fields fl;
        of_builder builder
    | _ -> raise VariantMismatch

Now of course, for each record type declaration for which we will want to use a runtime representation, we need to create such a representation. Here is an example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
type my_record  =
    {
     a: int;
     b: string list;
    }
 
let my_record =
  let fields =
    [
     Field {label = "a"; field_type = Int;
            get = (fun {a} -> a);
            set = (fun (r, _) x -> r := Some x)};
     Field {label = "b"; field_type = List String;
            get = (fun {b} -> b);
            set = (fun (_, r) x -> r := Some x)};
    ]
  in
  let create_builder () = (ref None, ref None) in
  let of_builder (a, b) =
    match !a, !b with
    | Some a, Some b -> {a; b}
    | _ -> failwith "Some fields are missing in record of type my_record"
  in
  Record {path = "My_module.my_record"; fields; create_builder; of_builder}

We implement builders as tuples of references that can hold optional values (one such slot for each field in the record). The code to build the representation of my_record is quite tedious to write by hand; luckily, it would be quite simple to have a syntax extension that generates it automatically from the type declaration. This would be even simpler to write if the expressions to build the representation of the field types (Int and List String in this example) could be inferred by the compiler. And this is precisely what the implicit values thing is all about!

Dealing with tuples of arbitrary arity can be done with the same approach as for records (only without labels). Choosing a way to represent sum types is not more difficult. I'll leave it as an exercise to the reader!

Type-level operations

Once we have a runtime representation for types, we can define new type-level operations. For instance, we can invent a new notion of subtyping that allows to forget some record fields, is covariant on pairs, lists and record fields, and converts integers to strings. Here is an implementation of a function which checks this subtyping relation and creates a witness for subtyping (a function from the subtype to the supertype).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
exception NotASubtype
 
let rec subtype: type t s. t ty -> s ty -> (t -> s) =
  fun t s ->
    match t, s with
    | Int, Int -> (fun x -> x)
    | String, String -> (fun x -> x)
    | Int, String -> string_of_int
    | List t1, List s1 -> let sub = subtype t1 s1 in List.map sub
    | Pair (t1, t2), Pair (s1, s2) ->
        let sub1 = subtype t1 s1
        and sub2 = subtype t2 s2 in
        (fun (x1, x2) -> sub1 x1, sub2 x2)
    | Record r1, Record r2 ->
        (* for each field in r2, find a matching field in r1 *)
        let fields =
          List.map
            (fun (Field {label; field_type = s1; set}) ->
              let (Field {field_type = t1; get}) =
                try List.find (fun (Field f) -> f.label = label) r1.fields
                with Not_found -> raise NotASubtype
              in
              let sub = subtype t1 s1 in
              (fun x b -> set b (sub (get x)))
            )
            r2.fields
        in
        (fun x ->
          let b = r2.create_builder () in
          List.iter (fun f -> f x b) fields;
          r2.of_builder b
        )
    | _ -> raise NotASubtype
 
 
let subtype t s =
  try Some (subtype t s)
  with NotASubtype -> None

The branch for records is a little bit complex, but it follows closely the definition: a record type t is a subtype of another record s if we can find for every field in s a matching field in t with the same label and whose type is a subtype.

Of course, the choice of a representation depends on the operations we want to implement. If we had to implement, for instance, a function which traverses a value to set to None all optional mutable fields, we would need to use a richer representation (which exposes mutability and functions to change the value of mutable fields in place).