Between May and August 2022, Uma Kothuri from the US interned with us at LexiFi
and worked on adding native Windows support for Dune’s watch mode (dune build -w). The internship was a big success, but unfortunately we ran out of time
before the contribution could be polished for submission upstream, which delayed
its integration in Dune.
This is now done, and the support has been included in the latest release of Dune. To mark the occasion, we are writing this note to give a semi-technical overview of how the feature is implemented. And, of course, we warmly invite all Windows Dune users to give it a go and report any issues that come up in the official bug tracker.
Dune’s watch mode (triggered with dune build -w) causes Dune to continue to
run and monitor for changes in the sources after the initial build. When a
change is detected, Dune triggers a rebuild. If another change is detected
before the rebuild has finished, the rebuild is aborted and a new one is
triggered, and so on.
Apart from being convenient when iterating on a codebase, this mode of operation can be more efficient, as the initial “startup” cost of launching Dune (scanning the file system, interpreting Dune files, etc.) is paid only once, at the start, and later only the incremental building cost is incurred. Another big upside is that in this mode Dune acts as an “RPC server”, allowing it to interface with external tools: this is how the integration with OCaml-LSP for showing live diagnostics works, for example.
Before we go on to the technical portion of this note, let us mention that Dune
already had a generic fallback to make watch mode work on Windows by using the
fswatch command-line tool. However,
this backend had a number of downsides: it was less efficient than having
something built into Dune itself, one needed to install an external tool in
addition to Dune, the event filtering mechanism implemented by the tool was
unreliable at times, etc.
For all these reasons, it was important to have Dune support watch mode
natively on Windows (that is, by integrating directly with the operating system
API), especially since this support already existed for the other usual
operating systems such as Linux (using the inotify API) and macOS (using the
FSEvents API).
Adding “native” support for watch mode means hooking into the operating system’s own API for file watching, so that Dune can register a set of directories to be monitored and be notified of any changes in them.
On Windows, there are two file-watching APIs: FindFirstChangeNotification and
ReadDirectoryChangesW.
The first API, FindFirstChangeNotification, does not provide information
about which change caused the notification (i.e., whether the file was added,
removed, modified, etc.). Since this information is very much needed by Dune, we
quickly focused on ReadDirectoryChangesW instead. Having settled on a choice
of API, we had to figure out how to receive notifications from the operating
system. On this point, there is a panoply of choices: various flavors of
synchronous, as well as asynchronous, mechanisms.
Below we will explain our final design, but for those who would like to understand the possible choices better, we recommend reading the blog post Understanding ReadDirectoryChangesW, by Jim Beveridge, a veritable treasure trove of information about this API (even if in the end we did not use any of the approaches discussed in that article!).
A quick search on the excellent grep.app revealed two existing bindings of this API in OCaml:
The popular file syncing tool Unison: https://github.com/bcpierce00/unison/blob/master/src/lwt/win/lwt_win.mli
Facebook’s static type checker for JavaScript, Flow: https://github.com/facebook/flow/blob/main/src/hack_forked/fsnotify_win/fsnotify.ml
However, each project makes some specific technical choices: Unison uses Asynchronous Procedure Calls, while Flow uses the API synchronously by spinning up one native thread per watched directory. As we didn’t have a good picture of Dune’s requirements at the outset of the project, we preferred to start afresh instead of basing the work on these existing implementations. That said, they were an invaluable source of inspiration.
Our final design uses ReadDirectoryChangesW in combination with I/O completion
ports. I/O completion ports are Windows' native high-performance mechanism for
asynchronous I/O.
I/O completion ports can be used in many of the same contexts where one would
use select, poll or epoll under Linux. The main difference
in usage is that after registering a request for an I/O operation, I/O
completion ports notify the program on completion of the operation, while the
Linux APIs notify the program when the system is ready to perform the
operation. In that sense, I/O completion ports are closer in spirit to the more
modern io_uring API.
In any case, for us the main attraction of I/O completion ports was that they are highly performant and they have a very simple API (if you have never encountered them before, check out this short and sweet tutorial).
fswatch_win
To interface Dune with ReadDirectoryChangesW we added a small library to
Dune called fswatch_win. Below is the main part of the interface, with
comments:
type t
(* A value of type `t` represents a "file watcher". Each file watcher can watch
an arbitrary number of directories, and multiple file watchers may be used
simultaneously (this possibility is not used by Dune). *)
val create : unit -> t
(** Create a file watcher. *)
val add : t -> string -> unit
(** Add a new directory to the "watched set" of the given file watcher. *)
val wait : t -> sleep:int -> Event.t list
(** Wait for notifications from the given file watcher. The [sleep] argument is
the number of milliseconds to wait before returning when a notification is
received, to reduce the number of times Dune is woken up when there are many
notifications that arrive close together. *)
val shutdown : t -> unit
(** Shut down the file watcher and release all used resources. *)
The wait function itself is blocking; Dune runs it in a separate thread. The
~sleep parameter helps “batch” many notifications in one go, to avoid Dune
repeatedly restarting a build when many notifications arrive very close
together. Currently it is set to 500 ms in Dune. The downside is that after
making a change to a file, there is a small lag before Dune reacts.
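The effect of the delay can be illustrated with a small simulation. This is only a sketch under an assumed reading of the semantics (each batch is returned sleep milliseconds after its first notification); the function count_wakeups and the timestamps are hypothetical and are not part of fswatch_win.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of fswatch_win: given the arrival times (in
   milliseconds, sorted in increasing order) of n notifications, count how
   many times the caller of wait would be woken up if every batch is returned
   sleep_ms milliseconds after its first notification. */
size_t count_wakeups(const long *arrival_ms, size_t n, long sleep_ms) {
    size_t wakeups = 0;
    size_t i = 0;
    assert(sleep_ms >= 0);
    while (i < n) {
        /* The first pending notification opens a new batch... */
        long deadline = arrival_ms[i] + sleep_ms;
        /* ...and everything arriving up to the deadline joins it. */
        while (i < n && arrival_ms[i] <= deadline) {
            i++;
        }
        wakeups++;
    }
    return wakeups;
}
```

Under these assumptions, six notifications arriving at 0, 10, 20, 400, 900 and 910 ms are delivered in two batches with a 500 ms delay, instead of six separate wake-ups with no delay at all.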
As we will explain further in the next section, most of the work is done in a
native Windows thread created by the fswatch_win library that runs completely
independently of the OCaml runtime. This thread takes care, in particular, of
adding new directories to the watched set and waiting for notifications from the
operating system. When the notifications arrive they are stored, waiting to be
retrieved by the OCaml thread.
The sequence of operations that need to happen to start watching a new directory
for changes using ReadDirectoryChangesW and I/O completion ports is as
follows:
Create a HANDLE pointing to the directory that you want to watch using CreateFileW.
Register the handle with the I/O completion port using CreateIoCompletionPort. As a side-effect this tells the operating system that future asynchronous I/O operations on this handle should be reported through the I/O completion port.
Request to be notified of any changes in the directory by passing the handle to ReadDirectoryChangesW. As mentioned in the previous point, this call will return immediately, and the actual notifications will arrive through the I/O completion port.
The rest of the time, the native Windows thread repeatedly runs a “notification loop” that works as follows:
Wait for notifications to arrive on the I/O completion port by using GetQueuedCompletionStatus.
When notifications arrive, they are stored in a list waiting to be retrieved by the OCaml thread.
The request to receive notifications on the directory is resubmitted using ReadDirectoryChangesW in order to receive further notifications.
The GetQueuedCompletionStatus API is actually flexible enough to be used to
receive messages from other threads in addition to the file change notifications
from the operating system, and we use it in this way, to receive messages from
the OCaml thread, as we explain in the next section.
To see the gory details, you can read the implementation: it is self-contained, not very long and, we hope, readable.
As mentioned in the previous section, most of the work takes place in a separate native Windows thread that is completely independent of the OCaml runtime. The main advantage of this approach is simplicity: no need to acquire and release the OCaml runtime lock, to register and unregister roots with the garbage collector for values that are used from C, etc.
The main downside is that one needs to implement a communication mechanism
between the OCaml thread and the native Windows thread. But since we were
already using GetQueuedCompletionStatus to receive notifications from the
operating system about file system changes, and this same API could be used to
receive messages from other threads, in our case, the “cost” of this design was
substantially reduced.
Concretely, when calling the OCaml functions Fswatch_win.add,
Fswatch_win.wait, etc., the OCaml thread does not actually do any work; it
just sends a corresponding message to the native Windows thread using
PostQueuedCompletionStatus. Upon receipt of the message, the Windows thread
performs the actual work: adding a new directory to the “watched set” or
returning the list of notifications that have been accumulated to the OCaml
thread.
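The message-passing scheme can be modelled without any Windows API as a single queue carrying tagged messages. What follows is an illustrative, single-threaded sketch only: the names msg_kind, post and drain and the fixed-size queue are hypothetical. In the design described here the queue is the I/O completion port, messages are sent with PostQueuedCompletionStatus and received with GetQueuedCompletionStatus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message tags; this only models the idea and does not mirror
   how the real library distinguishes its messages. */
enum msg_kind { MSG_FS_CHANGE, MSG_ADD_DIR, MSG_GET_EVENTS };

struct msg {
    enum msg_kind kind;
    const char *path; /* file or directory concerned, NULL for MSG_GET_EVENTS */
};

#define QUEUE_CAP 16

/* Stand-in for the completion port: one FIFO for every kind of message.
   There is no wrap-around, so at most QUEUE_CAP messages can ever be posted. */
struct port {
    struct msg items[QUEUE_CAP];
    size_t head, tail;
};

/* State owned by the watcher thread. */
struct watcher {
    size_t watched_dirs;   /* size of the "watched set" */
    size_t pending_events; /* notifications not yet handed over */
    size_t delivered;      /* notifications already handed to the client */
};

/* Post a message to the port, whatever its origin (OS or client thread). */
void post(struct port *p, enum msg_kind kind, const char *path) {
    assert(p->tail < QUEUE_CAP);
    p->items[p->tail].kind = kind;
    p->items[p->tail].path = path;
    p->tail++;
}

/* Watcher loop body: receive every queued message and dispatch on its tag. */
void drain(struct port *p, struct watcher *w) {
    while (p->head < p->tail) {
        struct msg m = p->items[p->head++];
        switch (m.kind) {
        case MSG_ADD_DIR:
            w->watched_dirs++;
            break;
        case MSG_FS_CHANGE:
            w->pending_events++;
            break;
        case MSG_GET_EVENTS:
            w->delivered += w->pending_events;
            w->pending_events = 0;
            break;
        }
    }
}
```

The point of the design is visible in drain: since file system notifications and client requests travel through the same queue, the watcher thread needs only one blocking receive call and no second wake-up mechanism.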
This last point needs a bit of care since the data structure that keeps track of received notifications is mutated from two different threads: new notifications are added from the Windows thread, and existing notifications are removed from the OCaml thread. This means that access to this data structure must be synchronized. One usual way of doing this is by using mutexes, but instead we used a form of “lockless” synchronization that we learned from Flow’s bindings mentioned above:
/* Retrieve the list of events from the shared list. */
static struct events* pop_events(struct fsenv* fsenv) {
  struct events* res;
  /* Perform [res = fsenv->events; fsenv->events = NULL] atomically */
  do {
    res = fsenv->events;
  } while (InterlockedCompareExchangePointer(&(fsenv->events), NULL, res) != res);
  return res;
}
The idea behind this code is that it performs res = fsenv->events followed by
fsenv->events = NULL atomically, by repeatedly setting res = fsenv->events
and then setting fsenv->events = NULL, but only if the value of
fsenv->events has not changed in between. Tricky code, but rather pleasant
when it works!
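For readers without a Windows toolchain at hand, the same idea can be sketched with portable C11 atomics. This is not the code used in Dune: the struct event type, the struct shared wrapper and the push_event helper are hypothetical, and atomic_exchange stands in for the compare-and-swap loop, since unconditionally swapping in NULL has the same net effect as retrying until the pointer is unchanged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical event node; a real implementation would store richer data. */
struct event {
    int id;
    struct event *next;
};

/* Head of the shared list, extended by the producer, emptied by the consumer. */
struct shared {
    _Atomic(struct event *) events;
};

/* Producer side: prepend one event. When the head has changed since it was
   last read, the compare-and-swap fails, refreshes old, and the loop retries. */
void push_event(struct shared *s, struct event *e) {
    struct event *old;
    assert(e != NULL);
    old = atomic_load(&s->events);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak(&s->events, &old, e));
}

/* Consumer side: take the whole list and leave NULL behind in one atomic step. */
struct event *pop_events(struct shared *s) {
    return atomic_exchange(&s->events, (struct event *)NULL);
}
```

In this sketch the list comes back in reverse order of insertion, so a consumer that cares about ordering has to reverse it after popping.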
In this post we gave a technical overview of the work of Uma Kothuri during her
internship at LexiFi between May and August 2022, which consisted of adding
native Windows support for Dune’s watch mode. This support is included in the
recently released Dune 3.7.0, and was achieved by leveraging the
ReadDirectoryChangesW API in conjunction with I/O completion ports.
We hope you enjoyed this technical note. Do not hesitate to get in touch at nicolas.ojeda.bar AT lexifi.com if you have any questions. Thanks for reading!