A note about the performance of Printf and Format

The goal is to display the following to stdout:

(0,0)
(1,1)
(2,2)
...
(1000000,1000000)

How would you implement that in OCaml? For such a simple task, we probably expect the program to be IO bound, right? Ok, let's try with the idiomatic way, which is to use format strings as provided by the Printf module from OCaml standard library:

1
2
3
4
let printf () =
  for i = 1 to n do
    Printf.printf "(%d,%d)\n" i i
  done

Or maybe the same with the Format module:

1
2
3
4
let formatf () =
  for i = 1 to n do
    Format.printf "(%d,%d)@." i i
  done

On my machine, compiling with ocamlopt and running these functions (redirecting stdout to a local file) take respectively 0.94s and 3.33s. So the first lesson is:

If performance matters, don't use Format if Printf is enough.

Ok, can we beat the Printf version? This version parses and interprets the format string at runtime, and then maps to low-level output functions. So let's try to use directly those functions:

1
2
3
4
5
6
7
8
9
10
let direct () =
  let oc = stdout in
  for i = 1 to n do
    output_char oc '(';
    output_string oc (string_of_int i);
    output_char oc ',';
    output_string oc (string_of_int i);
    output_char oc ')';
    output_char oc '\n';
  done

This version takes 0.51s (0.94s for printf).

If performance matters, use direct output functions instead of Printf.

Sometimes, you are stuck with the Format module, because you must call existing printers of type Format.formatter -> foo -> unit, or you indeed make use of formatting boxes. Still, nothing forces you to rely on format strings to use Format. Here a more direct version:

1
2
3
4
5
6
7
8
9
10
let format () =
  let open Format in
  for i = 1 to n do
    print_char '(';
    print_int i;
    print_char ',';
    print_int i;
    print_char ')';
    print_flush ()
  done

This one takes 2.58s in native code (3.33s for formatf).

If performance matters and you are stuck with using Format, avoid the use of format strings.

Note that the direct version could still be optimized a little bit if we had a direct output_int which did not allocate its result as a string. Also note that in bytecode, the relative slowdowns are even more impressive.

Here is a table which summarizes those results (also showing the number of bytes allocated, as returned by Gc.allocated_bytes):

[              direct]:     0.51 sec,             32000128 bytes
[              printf]:     0.94 sec,           1704075840 bytes
[              format]:     2.58 sec,            728008488 bytes
[             formatf]:     3.33 sec,           2664063504 bytes

How often does the performance of printing textual data matters? Well actually, quite rarely, but sometimes it does, and the numbers above suggest that it might pay to rewrite the most intensively used pieces of the pretty-printing code. For instance, we've observed a total speedup of about 15% for compiling our entire code tree by optimizing the -annot printer of ocamlc (by avoiding some use of Format and format strings). I suspect that people doing textual logs could also be interested in those performance remarks.

There is also a small speedup

There is also a small speedup to be had by turning the last two output_char calls into a single output_string ")\n", at least here.

This matter is discussed a little in this bug report:

Bug 5443

Thanks

Thanks for the link. Good to see that I'm not the only interested in the performance of basic textual output :-)

What about readability?

And what about long term readability and maintainability of the code? Are those 15% gained immediately worth them? I'm sceptical.

Regards,
d.

Readability

For the specific case, I'd say yes, it's well worth it! We are talking about minutes saved every day for each developer (and our continuous integration server can find problems more quickly after each commit). The modified code is very local and it's a piece of code of the compiler which does not change a lot. Moreover, it's not really less readable to do 'pp_print_string ppf s' than 'Format.fprintf ppf "%s" foo', for instance. Sometime replacing more complex format strings with box specification yield slightly longer code, but it's not so much less readable.

Anyway, I believe people should be aware of performance implications. Sometimes, rewriting a small amount of code can lead to noticeable speedups. Everyone can decide on his own.

the two first examples

I don't think the first benchmark is correct. The "@." indications in the format string triggers a flush, so a system call is performed at each iteration while it is not in the printf example (you need to add a "%!" for that). There is still a noticeable difference when adding "%!" but it is smaller.

Thanks!

Indeed, thanks for pointing this out. I'd rather adapt all examples to avoid flushing after each line. For the formatf, is it correct to replace @. by @\n (print_newline -> force_newline) i.e. is it exactly the same semantics, in terms of boxes, etc)? This indeed gives a good speedup. Updated timing:
[              direct]:     0.51 sec,             32000128 bytes
[              printf]:     0.89 sec,           1704075840 bytes
[              format]:     1.08 sec,            688004736 bytes
[             formatf]:     1.52 sec,           2624064472 bytes
Note that the .annot printer of OCaml used to rely on @. quite intensively. Just replacing with @\n might already have given a decent performance boost.

Box semantics & using buffers

Note that besides emitting a newline and flushing, "@." also closes all open boxes, which may or may not be what the developer wants.

If you only want to add a newline (without closing any boxes or flushing), then use "@\n". If you want to merely close all boxes and flush, use "@?". If you want to close all boxes, print a newline, and then flush, use "@.".

You may also want to consider using a "Buffer.t" for creating an output batch. Both "Printf" and "Format" can print to buffers, one can also freely mix various approaches. That way you may be able to reduce the number of system calls.

The Jane Street Core library also offers a "Bigbuffer" module, which uses a Bigarray (Bigstring) internally to store data and is otherwise equivalent to the Buffer module. Since Bigarrays live outside of the OCaml heap, I/O performance is even better. The Bigbuffer module offers "bprintf" functions for both Printf and Format-style use.

override flush

Another solution is to set the flush function of the formatter to [ignore] and flush the output channel at the end.

Format strings

Thanks!

The timings I got seem to indicate that parsing format strings introduce some serious slow down (I assumed it was negligible compared to the actual IO, but it's not the case), probably related to the increased pressure on the GC. I'll now seriously consider using printing functions instead of format strings, even when using the Format module, when performance matters. At least, for the most commonly used branches of the printer code.