This post is about the special representation that OCaml uses for arrays of floating point values. My claim is that this special representation is useful, but also harmful in some contexts. There are better alternatives that achieve the same benefits without the drawbacks.

If you are familiar about the OCaml representation of value and the special representation for arrays of floats, you can skip the first two sections.

## A quick reminder about OCaml uniform value representation

OCaml follows a uniform runtime representation: all values can be kept in a single machine word (32- or 64-bit); a value is either an unboxed integer or a pointer, and the two cases are distinguished by the lower bit: 0 for pointers -- which are thus assumed to be 2-aligned; and 1 for integers. Unboxed integers are used to represent OCaml values of type int (encoded as 2*X+1, which restricts them to 31/63 bits), but also values of type bool, char and constant constructors (in sum types and polymorphic variants). Pointers usually refer to blocks in the OCaml heap, which have one header word followed by one or several data words. The header encodes the block's size, 2 bits reserved for the Garbage Collector, and a 8-bit tag, which categorizes the nature of the block (e.g. 0 for tuples, records, arrays; 247 for function closures; 252 for strings) or identifies non-constant constructors.

This representation has many advantages. In particular, it allows simple support for data abstraction, separate compilation, polymorphic code and generic operations (such as the Garbage Collector, generic equality/hash/marshaling functions).

## Floats and float arrays

OCaml has a single type for floating point values (called float) that are double-precision (64-bit) values. Since 64-bit cannot be represented as an unboxed integer even on 64-bit machines (because unbox integers are only 63-bit), a float value is represented by a pointer to a block (tag 253) holding one data word (64-bit) or two (32-bit). Floats are thus boxed, and a single float value requires three or four words (one for representing the pointer, one for the block header, and one or two for the float bits). Note that this representation is only required when floats are stored in data structures, or passed from one function to another. Within a single function's body, local float values can be represented unboxed in memory or in registers, and the OCaml compiler apply such optimizations. There are also plans to allow unboxed calling conventions between functions.

OCaml arrays are normally represented by a pointer to a block with tag 0 holding one data word per array element. The length of the array is thus obtained directly from the size field of the block header, and accessing the nth-element of the array is very simple.

With this representation, an array of N floats requires around 3*N words on the heap (on 64-bit machines, assuming no sharing between boxed floats in the array), which is quite bad. Also, a simple operation such as adding 1 to an element of the array requires one extra memory indirection to read the float value, one memory allocation to hold the result of the arithmetic operation, and several memory writes. Numerical code would suffer quite a bit from such a representation.

To avoid such memory and runtime overhead, OCaml uses a special representation for arrays of floats. They are represented by blocks with a dedicated tag (254) that hold the unboxed floats. So an array of N floats requires around N words (on 64-bit machines), and a simple operation such as adding 1 to an element does not require any allocation.

While introducing the special unboxed representation for arrays of floats might seem a clever and useful hack, it has some negative consequences.

When a function accesses an array (to read or write an element), it typically needs to account for the two possible representations of the array. For instance, the OCaml functions

 1 2 3  let f arr = arr.(0) let g arr = arr.(0) <- arr.(1)   

are compiled into this cmm code (I'm using '-unsafe' mode to avoid cluttering the code with bound checks):

 1 2 3 4 5 6 7 8 9 10 11 12 13 14  (function camlFoo__f_1215 (arr/1216: addr)   (if (!= (load unsigned int8 (+a arr/1216 -8)) 254) (load arr/1216)     (alloc 1277 (load float64u arr/1216))))   (function camlFoo__g_1217 (arr/1218: addr)   (let     newval/1224     (if (!= (load unsigned int8 (+a arr/1218 -8)) 254)       (load (+a arr/1218 8)) (alloc 1277 (load float64u (+a arr/1218 8))))       (if (!= (load unsigned int8 (+a arr/1218 -8)) 254)         (extcall "caml_modify" arr/1218 newval/1224 unit)         (store float64u arr/1218 (load float64u newval/1224))))    1a)   

You can see the explicit checks on the array header. They have a runtime performance hit, and the generated code size greatly increases.

The checks are removed if the compiler knows statically whether the elements in the array are floats or not. Unfortunately, it is quite hard to know statically that something is not a float, since the type system cannot express the fact that an abstract type is not an alias for float or that a type variable cannot be instantiated to float. So in some cases, even adding type annotations is not enough to get rid of the dynamic checks, even if you know that the array doesn't contain floats.

If the compiler knows that the array contains floats, it shortcuts the dynamic check. This means that an array of floats is not allowed to use the standard representation: it has to use the unboxed form. The consequence is that when a new array is created, one must decide it is a float array or not. Again, type abstract makes it generally impossible to have this information statically, so one needs to rely on a runtime check. Concretely, Array.make observes the initial value and bases its decision on whether this value is a float or not (looking at the header tag).

The consequence of the paragraph above is a very strong constraint on the runtime system: a value which is represented at runtime by a float (i.e. a block with tag 253) must be of static type float (or an alias of it). This interacts badly with other (potential) optimizations. The existing optimized representation of lazy values is an example. When a lazy value value is forced, the thunk is replaced to a forward pointer (tag 250) to the result and when the GC runs, it short-circuits such forward pointers (i.e. the forced lazy value becomes physically equal to the result). This adds a tiny bit of extra work when forcing a value, because several cases need to be accounted for (an unforced thunk, with tag 256; a forward pointer, with tag 250; a block with any other tag, or an immediate integer: the result after short-circuiting); but this avoids one memory read (when the value is forced and after short-circuiting -- checking the tag has to be done anyway) and reduces the number of blocks in the heap (and thus time spent in the GC). Anyway, this blog post is not about this optimization on lazy values. The point is that this optimization doesn't work well when the lazy result is a float: short-circuiting the forward pointer would mean that the value of type float Lazy.t would be represented as a float, and this would break the invariant described above. Not a big deal, as it is easy enough to take this into account and disable short-circuiting when the lazy result is a float. But we don't benefit from the nice optimization on lazy floats. And more importantly, this shows that we need to be careful when introducing clever representation for seemingly unrelated types (here, Lazy.t).

For instance, one might want, some day, to let developers specify inline attributes on sum type constructors, as in:

 1 2 3 4 5 6 7 8      type value =     | Str [@inline] of string     | Int [@inline] of int       type tree =     | Leaf [@inline] of int     | Node of int * tree * tree   

The meaning of this attribute is that the constructor behaves as the identity on runtime values: no allocation is required to wrap the argument inside the constructor.

The compiler would accept such declarations if enough information remains at runtime to distinguish between constructors (e.g. defining two different inline constructors with a string argument in a single sum type would be rejected). Perhaps not the kind of things you want to expose to beginners, but certainly something that could bring interesting performance gains in tight data structures and algorithms. Unfortunately, even if floats have their own runtime tag, this nice optimization wouldn't be available for them, only because of unboxed float arrays. Too bad if you write an interpreter for a language whose value type has a "Float of float" case, or if you manipulate binary trees with float leaves.

Another related interesting optimization would be a special representation for option types that would avoid the wrapping/allocation for the Some constructor (with some special representation for None, for Some None, etc). This could give huge gains in some contexts, and many people would love to see this implemented. But again, the story becomes more complex because of floats. One would have to plan for extra runtime checks and extra code paths to disable the optimization on floats.

The fact that all kinds of potential nice optimizations are made more complex and less effective because of the automatic special representation for float arrays is my main criticism against that special representation (even more than the existing overhead on array accesses).

## Why only floats?

People doing numerical code certainly appreciate that their float arrays are unboxed. But other element types would also benefit from a custom array representation:

• Arrays of ints: the potential gain here is not about avoiding boxing (because ints are not boxed), but about avoiding useless GC scans over large int arrays. Concretely, one could use the same header tag as for strings/bytes.
• Arrays of bools: since a bool requires only one bit, using a full word for each bool in an array is a clear waste of space; and one could also let the GC know that it doesn't need to scan the array (as for ints).
• Arrays of chars: using the normal array representation means that storing N bytes takes around N machine words (OCaml chars are really bytes).

Of course, nobody uses char array in OCaml, when there is a much better alternative: bytes (previously, string). And does anyone complain that char array is not optimized automatically? I don't think so!

## Proposal

Considering the bad consequences of the automatic custom representation for float arrays, the long-term goals of having more clever representations for e.g. option types and other inlined constructors, and the usefulness of custom representation for other array types (ints, bools, chars), here is my proposal:

• Define a module type:
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17  module type ARRAY = sig   type elt   type t     val make: int -> elt -> t   val length: t -> int     val get: t -> int -> elt   val (.()): t -> int -> elt     val set: t -> int -> elt -> unit   val (.()<-): t -> int -> elt -> unit     val sub: t -> int -> int -> t   val max_length: int   (* ... *) end
• Ensure that Bytes satisfies the ARRAY with type elt = char interface.
• Define a module FloatArray : ARRAY with type elt = float (unboxed representation), and similarly for int and bool (bit field), ensuring that int and bool arrays are not scanned by the GC.
• Similarly, one could expose bigarrays as another available implementation for these modules.
• Expose a functor Array.MakeImpl: functor (X : sig type t end) -> ARRAY with type elt = X.t which generates array implementations using the standard representation.
• In order to support polymorphic algorithms that require arrays, and without turning them into functors, define:
 1 2 3  type ('a, 'b) array_impl =    (module ARRAY with type elt = 'a and type t = 'b)   
so that functions can be given types such as:
 1 2 3  val sort: ('a, 'b) array_impl -> ('a -> 'a -> int) -> 'b -> 'b val array_map: ('a, 'b) array_impl -> ('c, 'd) array_impl -> ('a -> 'c) -> 'b -> 'd   
It could also be useful to have another abbreviation:
 1 2  type 'a array_impl' = (module ARRAY with type elt = 'a)   
This would be useful for algorithms that use arrays only internally, and also for:
 1 2  val mk_impl: unit -> 'a array_impl'  (* same as Array.MakeImpl *)   
• Other nice functors/functions can be implemented e.g. for arrays on t1 * t2 from implementations of arrays on type t1 and on type t2 (i.e. keeping a pair of arrays).
• Of course, get rid of the current automatic representation of float arrays, and simplify the code generator and runtime system accordingly.

Thanks to the new array access operators, one can simply write FloatArray.(a.(i) <- b.(i)), or more realistically let open FloatAray in ... in functions which are mostly dealing with float arrays.

The main consequence of such a move is that numerical code that manipulates float arrays and need to remain efficient would need to switch from float array to FloatArray.t. I don't think this would be too bad in practice. It will also avoid that a forgotten type annotation (required to turn a polymorphic function into a function on float array) results in a huge performance loss.

The parametric 'a array type constructors remains, and it's perfectly fine to use it directly in any code where you don't need special representations. As a matter of fact, I don't think there will be a lot of uses for the Array.MakeImpl functor or for the array_impl type. In my experience, arrays are most often used in monomorphic contexts, and when they don't, performance might not matter too much (so we don't need the custom representations).

## Migration path

If we skip the last bullet point of the proposal above (dropping the automatic representation of float arrays), all of it can be implemented now, and quite easily.

Then, the legacy automatic representation of float arrays could be controlled through a configure-time switch, enabled by default for some versions of OCaml (to give time to people to go through their code base and use the new style), then disabled by default, and finally completely removed.

To help people find occurrence of generic float array in their code base, one could combine static and dynamic approaches:

• Parsing .cmt files, it is easy enough to spot occurrences of float array.
• If needed, a configure-time switch could trigger a warning on stderr e.g. when Array.make is called on float.

I'd like to hear the opinion of the community of OCaml developers about that proposal!

### Interesting idea

This certainly dovetails with the direction OCaml is taking but I would stress that .NET and HLVM already adopted a better solution to this problem and several other problems touched upon here (generic pretty printing, equality, comparison, hashing, serialization, GC overheads, all-float records) by monomorphizing at JIT time (i.e. reified generics). If at all possible, OCaml should adopt reification rather than going further down this road.

I also have some concerns. You're using modules and functors to abstract, which is great, but last I looked they impede inlining so operations like float array access would be heavyweight function calls. You're also describing solutions that introduce non-atomic operations (e.g. bool array as a bitvector) just as OCaml may finally go multicore.

### really nice

I think it's a very good idea. I don't use float arrays, but I could use better (read: easy) bitvectors and more efficient polymorphic arrays (in particular in Hashtbl?). I think the abstract signature of arrays could also be very useful to share code that works on arrays, bigarrays and bytes (e.g. resizable arrays, circular buffers, etc.)

### Good for polymorphism

This is just a reminder that the special representation of float arrays is also the reason we cannot have a Top type in OCaml. Recovering Top would allow to generalize not only covariant type variables but also contravariant ones.

### Isn't this essentially the

Isn't this essentially the solution used in Standard ML (or, more
properly, in the Standard ML basis library)? I'm referrring to the
MONO_ARRAY signature and the corresponding types:

BoolArray
CharArray
IntArray
RealArray
WordArray
...

I don't know OCaml well enough to get the sort and array_map examples.
Can you write a function (as opposed to functor) that takes a module
argument?

### Re: SML basis library

I don't know the SML basis library, but I wouldn't be surprised; the idea of using a signature and several specialized implementations is certainly not very new!

One interesting aspect is that thanks to OCaml's first-class modules, you can indeed pass these structures to polymorphic functions such as sort.

### Good for Coq

Rather than making a useful comment, I will just mention that the subject of float array comes back regularly in the internal discussion of the development team of the Coq proof assistant. It so happens that Coq uses arrays in a number of place and that the existence of float arrays has a noticeable effect on performances. I'm pretty I can speak for everyone in the Coq dev team when I'm saying that the world would be a better place were float arrays not implemented.

### Re: Good for Coq

It's certainly useful to know that the topic under consideration has a direct impact on the "mainstream" use of the language (i.e. symbolic processing). How did you estimate the performance hit?

### I had an idea of this cost by

I had an idea of this cost by inlining and monomorphizing some uses of Array.map and the like in critical cases, and it was indeed noticeable. I strongly believe that we pay a similar cost everywhere in the codebase of Coq...

Actually, I even tried to make a branch of the OCaml compiler without the unboxed array of floats optimization to check this. Yet, I did not succeed because there is quite a lot of assumptions that depend on it, and the task was daunting.

### Since I didn't do these

Since I didn't do these estimations myself, I cannot be very precise. But I believe there has been some playing around with monomorphising arrays by initializing them with [ref 0] and a bit of ugly [Obj.magic], and that proved to be noticeably (though not hugely) more efficient on some performance critical code.

### Records of float ?

IIRC, records where all fields are of type float are also represented as an unboxed array. How would they be handled with your proposal ?

### Re: Records of float

Records of floats are a different story. My proposal would not affect them at all. The main difference is that for records, the decision to unbox is made purely statically, when the type definition is processed. The compiler decides if the "float record" representation should be used based on its local knowledge; the decision is then attached to the internal representation of the type definition, which is available to the code generator everytime the program accesses the record (to build new values, or read/write some fields). No dynamic check is involved.

That said, I'm also not a big fan of this automatic optimization on records of floats. The deep reason is that I believe that the memory layout customization should be fully explicit, with a very simple and regular default. This is to avoid bad surprises (e.g. adding one non-float field destroys your application performance) and to support FFI.

There are also unfriendly practical consequences, which might not matter too much in practice, but which are not very satisfactory, such as:

 1 2    module X : sig    type t           type s = {x : t} end =              struct type t = float   type s = {x : t}

which gives the following error message:

 1 2 3 4 5 6        Type declarations do not match:         type s = { x : t; }       is not included in         type s = { x : t; }       Their internal representations differ:       the first declaration uses unboxed float representation.

I'd be in favor, again, of a more explicit approach to choosing the memory layout, such as:

 1    type s = { x : float } [@@unboxed]

This would avoid bad surprises (if non-float fields are added, the type-checker would complain) and perhaps lead naturally to a more generic treatment.

Or maybe the unboxed attribute should on the individual fields:

 1    type s = { x [@unboxed] : float }

and we could for instance allow records with only some unboxed floats field (and other boxed fields). This could be implemented by moving all unboxed fields e.g. at the beginning of the block and keeping in the block header the number of bytes occupied by unboxed fields. If we are ok to loose some support for polymorphic comparison on such records, one could e.g. allow to store also "packed" char or bool fields (they would require a single byte or bit).

But just to sum up: the proposal under discussion in this blog post wouldn't affect records of floats!

### A great move

I think this is a great improvement. I've become more and more convinced over the last few years that the float optimization adds too much complexity and special-casing to OCaml's value representation. This will simplify OCaml in a number of different ways, and will make way for useful representation tricks of the kind you describe.

I completely agree about removing the float array representation. In addition to the above suggestions, I think that it would be useful to add a new type declaration kind (in addition to records, variants, and extensible variants) for arrays. Something like:

type 'a array = [| 'a |]

Then the existing float representation can be implemented as:

type float_array = [| float |] [@inline]

and similarly for other cases where unboxing in an array would be useful:

type ('a, 'b) pair_array = [| 'a * 'b |] [@inline]

Possibly bytes could be represented the same way. If we distinguished mutable and immutable arrays, then maybe string could as well.

This would probably also involve providing type-based disambiguation for some syntax (e.g. [| ... |]), since all these types would be essentially sharing the same constructors.