Implicit values

In this post, I'll not describe an existing extension to OCaml that we use at LexiFi, but rather some ideas for a future possible extension.

During last year's meeting of the Caml Consortium, I presented LexiFi's system for dynamic types (which I might describe in another post). Yaron Minsky, from Jane Street, suggested that there might be a more primitive system that could serve to implement dynamic types on top of it, and other type-driven code generation tricks as well.

Being able to implement our system of dynamic types on top of something more primitive would be quite nice. My colleague François Maurel and I gave some thought about this approach. It turns out that something similar to Scala's implicit parameters might be what we are looking for. Today I've created a new branch on OCaml's Subversion repository (http://caml.inria.fr/svn/ocaml/branches/implicits) to play with this idea.

There is a new expression, written _ (underscore), which tells the compiler to produce an implicit value of the type inferred for this expression. The compiler can pick such a value from a set of eligible values in the local environment. For this first experiment, the convention is that any value identifier starting with an underscore is eligible. Here is an example:

1
2
# let _x = 3 and _s = "XXX" in _ ^ string_of_int _;;
- : string = "XXX3"

The compiler can also use eligible functions from the environment in order to produce the requested implicit value. Example:

1
2
# let _x = 3 in let _f = string_of_int in (_ : string);;
- : string = "3"

Here, the compiler must produce a value of type string. It finds a function _f whose result type matches string, so it continues trying to build an argument (of type int) for this function, and it finds _x.

The compiler can also take instances of polymorphic functions in the environment:

1
2
3
4
# fun ((_i, _s) : int * string) ->
     let _lift_opt x = Some x in
     (_ : int option), (_ : string option);;
- : int * string -> int option * string option = <fun>

And values of tuple types can be built implicitly from their components:

1
2
# fun (_i : int) (_s : string) -> (_ : int * string);;
- : int -> string -> int * string = <fun>

The main difficulty is to define the resolution process precisely. (This is, of course, quite similar to type class resolution, and we face the same problems.) In the branch, the approach is simplistic:

  • The definition in the environment are traversed in an unspecified ordering; the first matching instance is returned.
  • The target type must be closed (no type variables). Note that it can contain newtypes (the locally abstract types introduced with the fun (type t) -> ... notation.
  • After instantiating a polymorphic function to match the desired target type, the type for the argument must be closed as well.
  • When the same type is encountered a second time during a recursive descent, this traversal branch fails. There is also a fixed limit to the recursion depth; otherwise, implicit functions of types such as 'a list -> 'a would create infinite loops.

Now, let us look at more interesting examples. First, let's define printers:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
module Printers : sig
 
  type 'a t
  val apply: 'a t -> 'a -> string
  val make: ('a -> string) -> 'a t
 
  val _int: int t
  val _string: string t
  val _list: 'a t -> 'a list t
  val _tuple2: ('a t * 'b t) -> ('a * 'b) t
  val _tuple3: ('a t * 'b t * 'c t) -> ('a * 'b * 'c) t
 
end = struct
 
  type 'a t = ('a -> string)
  let apply f x = f x
  let make f = f
 
  let _int = string_of_int
  let _string = String.escaped
  let _list f l =
    Printf.sprintf "[%s]" (String.concat "; " (List.map f l))
  let _tuple2 (f1, f2) (x1, x2) =
    Printf.sprintf "(%s, %s)" (f1 x1) (f2 x2)
  let _tuple3 (f1, f2, f3) (x1, x2, x3) =
    Printf.sprintf "(%s, %s, %s)" (f1 x1) (f2 x2) (f3 x3)
 
end
 
 
type 'a my_type =
  | A of ('a * 'a) list
  | B
 
let _my_type (type a) (_f : a Printers.t) : a my_type Printers.t =
  Printers.make
    (function
      | A x -> Printf.sprintf "A(%s)" Printers.(apply _ x)
      | B -> "B"
    )
 
 
type 'a my_type =
  | A of ('a * 'a) list
  | B
 
let _my_type (type a) (_f : a Printers.t) : a my_type Printers.t =
  Printers.make
    (function
      | A x -> Printf.sprintf "A(%s)" Printers.(apply _ x)
      | B -> "B"
    )
 
let () =
  print_endline Printers.(apply _ [(A [(1,1)], "X"); (B, "Y")])
This produces:
1
[(A([(1, 1)]), X); (B, Y)]

One could imagine a syntax extension to generate automatically the definition for the my_type printer from the type declaration. Note the use of the local open construction Printers.(...) to bring all the _XXX values from the module Printers into the local scope and makes them eligible for resolution. The _ in the _my_type function is automatically replaced by:

1
 _list (_tuple2 (_f, _f))
and the _ on the last line is replaced by:
1
   _list (_tuple2 (_my_type _int, _string))
One can deal with recursive type definitions, including non-regular ones:
1
2
3
4
5
6
7
8
9
10
11
type 'a my_type =
  | A of ('a * 'a) list
  | B of ('a * 'a) my_type
 
let rec _my_type: 'a. 'a Printers.t -> 'a my_type Printers.t =
  fun (type a) (_f : a Printers.t) ->
    (Printers.make
       (function
         | A x -> Printf.sprintf "A(%s)" Printers.(apply _ x)
         | B x -> Printf.sprintf "B(%s)" Printers.(apply _ x)
       ) : a my_type Printers.t)
Type classes can be encoded directly:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
type 'a ops =
    {
     dump: ('a -> unit);
     succ: ('a -> 'a);
    }
 
let _ops_tuple2 (ops1, ops2) =
  {
   dump = (fun (x, y) -> ops1.dump x; ops2.dump y);
   succ = (fun (x, y) -> ops1.succ x, ops2.succ y);
  }
 
let _ops_int =
  {
   dump = (fun x -> print_endline (string_of_int x));
   succ;
  }
 
 
let f ops x =
  ops.dump (ops.succ (ops.succ x))
 
let () =
  f _ (3, 4)
Or even:
1
2
3
4
5
6
7
8
let dump ops x = ops.dump x
let succ ops x = ops.succ x
 
let f (type a) (_ops : a ops) (x : a) =
  dump _ (succ _ (succ _ x))
 
let () =
  f _ (3, 4)
Combined with GADTs, one could do something like (not tested!):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
module Ty = struct
  type 'a ty =
    | Int: int ty
    | String: string ty
    | List: 'a ty -> 'a list ty
    | Tuple2: ('a ty * 'b ty) -> ('a * 'b) ty
 
  let _int = Int
  let _string = String
  let _list t = List t
  let _tuple2 (t1, t2) = Tuple2 (t1, t2)
end
 
let rec print: 'a. 'a Ty.ty -> 'a -> unit =
  fun (type t) (t : t Ty.ty) x ->
      ....
 
let () = print _ (3, ["A";"B"])

And this begings to look very close to what we do with our dynamic type extensions.

Feel free to play with the branch!

Edit (2010-12-02): A follow-up post about implicit arguments.

Monads

This would be a fantastic addition to OCaml. But implicits would not have all the power of Haskell's type classes before higher kinds are added to the language, right? Only then could we model container structures like monads.

Shouldn't it be your editor's job ?

What I like in OCaml is that, in general, you know by reading a program what the program does. When I read your examples, I fill like I will never be able to understand this code at the first glance, and need hours to be sure that the _ is exactly the value I wanted it to be.

I think the problem you are trying to solve is the absence of good refactoring tools: what you would like is to type _ in your editor, and the editor would indeed immediatly replace _ by a value found using the type expected at that point, probably with an annotation telling the editor that it should always display an _ and, if the environment is modified, it should complain if the value is not anymore the one expected.

I am actually running in some similar problem, in the sense that I need to build a map function for values from a huge recursive type, so I was thinking that it would be nice to have some construct in the language for that (a generic map on any type), when actually, it would be much better to have a tool to generate _once_ a map function from the type, in the editor.

Shouldn't it be your editor's job ?

I think it's better and simpler to include support for type-driven resolution in the compiler itself. It's not so much a question of refactoring; you want to be able to do write code in any text editor. If you feel like you'd want to see more explicitly how a _ value is created, it would be easy to have the compiler output this information on request (and bind that to your editor).

Camlp4 has a convenient

Camlp4 has a convenient meta-programmatic generator of Map, Fold functions ( see http://brion.inria.fr/gallium/index.php/Camlp4MapGenerator ). You are free to use them to output the code once, and paste it in your code, though a compile-time generation won't need to be adapted to match any change in your type (hence increased maintainability).

Syntax

The extension is really interesting.

I think the syntax is terrible. _-beginning parameters are already quite overloaded:
- they are used we want to give a name to a non-relevant parameter, suppressing the "unusued variable" warning
- they are used by some people to suggest a "private" identifier that is exposed but should probably not be considered by the user; see for example http://martin.jambon.free.fr/tophide-readme.txt
- Camlp4 users also sometimes use __-identifier as a private namespace where name-conflicts are supposed not to occur; this is a bad practice imho, but still quite popular.

All those uses are subtly different, and adding more confusion to the mix certainly won't help. Debugging code that mixes different uses would be hell : variables that should be private or ignored would get picked for elaboration.

I think it would be much better to have an "implicit" keyword like in Scala.
- on the producer side side : `let implicit int = string_of_int`, `val implicit int : int t`
- potentially¹, on the consumer side `let f (implicit ops) ...`

¹ : your proposed extension does not use implicit annotations on the consumer side for now, but it is a natural extension, which is already present in your "consumer argument" post. Btw., why restrict implicit arguments to labelled parameters ?

Regarding the expression syntax, I mean the use of "_" in the expression syntax to mean "there you need to elaborate something", I think it's ok. It is already used as such in Coq, and I can't see any much better syntax anyway : it really needs to be light to type so packing more meaning is difficult.

I think a syntax for implicit parameters needs to be as explicit as possible, because it may have surprising result so a user using it needs to be sure what she's doing.

Syntax

I certainly agree that a more explicit syntax is desirable. But since this would require a lot more changes to the compiler, including maybe the creation of a new keyword (which is always problematic), I've preferred to start implementing something less intrusive for this first experiment.