References with physical equality.

Alain Frisch

In OCaml, generic functions like comparison and hashing are quite useful but not very flexible. In this post, I’ll explain some tricks that might be useful when you want to tweak the behavior of these generic functions for your own data structures.

References with physical equality

The ‘a ref type is pre-defined as:

 type 'a ref = { mutable contents: 'a }

As for any record type, references are thus compared structurally. Two references are equal is their current content are equal. Similarly, the hash value (computed by Hashtbl.hash) depends on the current content of the reference. This is sometimes not the desired behavior, and one would rather want to see the references as boxes with their own identity: equality should be the physical equality, and ordering/hashing can be arbitrary but stable when the content of the references change. How can we define such references?

The first solution is to rely on objects. Indeed, objects are given a unique identifier (an integer) when they are created and this integer is used to create a total ordering between objects and to compute a hash value. (The unique id can be obtained with the function Oo.id.) This is exactly the behavior we want, and we can thus build physical references on top of objects:

module PhysicalRef : sig
  type 'a t
  val create: 'a -> 'a t
  val set: 'a t -> 'a -> unit
  val get: 'a t -> 'a
end = struct
  class ['a] t x0 = object
    val mutable contents : 'a = x0
    method get = contents
    method set x = contents <- x
  end

  let create x0 = new t x0
  let set r x = r # set x
  let get r = r # get
end

The structural and physical equalities coincide for values of type ‘a PhysicalReferences.t (unless you marshal/unmarshal them in two different processes), and ordering/hashing do not depend on the current content.

You might be concerned by the runtime overhead of using objects and method calls. Also, this implementation of physical references does not allow pattern-matching on the content of the references, which is sometimes useful. I’m going to describe another solution which addresses these issues.

Another solution

You might want to prepare a sick bag. If your initials are X.L., you’d better stop reading this post right now.

Ok, you have been warned.

The solution is based on the fact that the nice behavior of generic comparison/hashing functions on object values is driven by the GC tag for these values. For instance, the relevant piece of code for the comparison function is:

case Object_tag: {
      intnat oid1 = Oid_val(v1);
      intnat oid2 = Oid_val(v2);
      if (oid1 != oid2) return oid1 - oid2;
      break;
    }

Record types always have GC tag 0. What happens if create a record type, but force its values to have a GC tag set to Object_tag (=248)? Well, the comparison function will then believe these values are objects, and compare their unique ids. For objects, the id is stored in the second word (extracted by the Oid_val macro), so we need to arrange the layout of the record type to store such an id in the second word as well.

module PhysicalRef : sig
  type 'a t = private
        {
         mutable contents: 'a;
         id: int;
        }

  val create: 'a -> 'a t
  val set: 'a t -> 'a -> unit
  val get: 'a t -> 'a
end = struct
  type 'a t = {
      mutable contents: 'a;
      id: int;
     }

  let uid = ref 0
  let create x0 =
    let r = {contents = x0; id = (incr uid; !uid)} in
    Obj.(set_tag (repr r) object_tag);
    r

  let set r x = r.contents <- x
  let get r = r.contents
end

The private annotation ensures that values of type ‘a PhysicalRef are only created by the create function (or by the unmarshalling functions…). The generation of unique id is not thread safe. If this is a problem for you, you can replace (incr uid; !uid) with (Oo.id (object end)). Who said objects were not useful in OCaml?

Notice the use of the unsafe Obj module. I’m not aware of any collateral damage of using this trick with the GC tag; the comparison and hashing function behave as expected, and pattern matching/field access don’t look at the GC tag for records. But I want to stress that this solution relies on unspecified behaviors of the language, and that it can easily break in a future version of OCaml. Or maybe it’s already broken. Again, you have been warned!

Other uses of the trick

In the section above, we have seen how to fooling the generic comparison/hash functions by setting the GC tag of record values to Object_tag and managing a (unique) identifier in the second word of the records. We can use the same trick to achieve other effects than references with physical identity.

Hiding parts of data structures to the generic functions Imagine you have a complex data structure, for which the generic functions are just fine. Now you add some kind of extra data which should not participate to the generic comparison/hashing. This can be achieved by wrapping this extra data in a record to which the GC tag trick is applied (using an id of fixed value, say 0, in the second field).

Hash-consing Hash-consing is a technique to improve sharing by making structurally equal values physical equal. With the GC tag trick, it is possible to let the system know about this fact, so that generic comparison behave efficiently on hash-consed values. Here is an implementation of a hash-consing functor that uses this trick.

module HashCons(X : Hashtbl.HashedType) : sig
  type t = private
        {
         contents: X.t;
         uid: int;
        }
  val make: X.t -> t
end = struct
  type t =
      {
       contents: X.t;
       uid: int;
      }
  type s = t

  module W = Weak.Make(struct
    type t = s
    let equal r1 r2 = X.equal r1.contents r2.contents
    let hash r = X.hash r.contents
  end)

  let memo = W.create 256

  let uid = ref 0

  let make x =
    let r = {contents = x; uid = !uid} in
    let r2 = W.merge memo r in
    if r == r2 then begin
      incr uid;
      Obj.(set_tag (repr r) object_tag);
    end;
    r2
end

Thanks to the GC tag trick, one can use generic comparison/hashing on hash-consed values even if the underlyings values themselves don’t support these operations (the contents field is just ignored by these operations).