I recently committed some code to the Flect repository that adds support for full-blown distributed compilation. This means that you can e.g. run a Flect server on one machine and then initiate a build from another machine and it’ll automagically distribute work to the server. This is just a quick and dirty post describing how it works.
It’s invoked this way:
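Roughly like so — the `flect` subcommand names here are my assumptions; only the flags come from the descriptions below:

```shell
# Server side -- the positional argument is the server's node name:
$ flect server --names short --group flect_compilers --cookie secret server1

# Client side -- the argument to --dist is the server group to use:
$ flect build --dist flect_compilers --cookie secret --name client1
```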
There are a few things to note here:

- The `--names` flag defaults to `short`, which is OK for pretty much any use case. It can be set to `long` if long names are desired (see the Erlang documentation on distribution for more info on this).
- The `--group` flag defaults to `flect_compilers`. The purpose of this flag is to allow creating compiler server groups such as `android_arm`, etc. The argument to `--dist` on the client end is the group name to use.
- The `--cookie` flag is used for authentication. A client must use the same cookie as the server(s) it intends to talk to. It defaults to `nocookie` on both the server and the client.
- The `--name` flag on the client sets the client’s node name. This name must be unique in the Erlang network. The first (and only) argument to the server is the server’s node name, which must also be unique. The default name for a client is `nonode` (which is nonsensical for long-name networks).
So assuming we don’t need advanced configuration, we could shorten the above to:
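With the defaults in effect (`short` names, the `flect_compilers` group, the `nocookie` cookie), the same hypothetical invocations shrink to:

```shell
$ flect server server1
$ flect build --dist flect_compilers
```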
Servers can be run on the same machine that the client is going to run on, and multiple servers can run on the same machine (they must have different node names).
So how does the client distribute work?
The current ‘algorithm’ isn’t particularly clever, but it works. It effectively
parallelizes each stage in the compilation pipeline. For example, when the
lexing stage starts, the client uses Erlang’s
`pg2` module to select a bunch of
random servers that it sends lexing work off to. After doing that, it then
loops until it receives all the results back.
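The scatter/gather pattern at work here can be sketched in a few lines of Elixir. This is not Flect’s actual code — the “server” below is a local stand-in process that fakes lexing by upcasing the file name, where the real client would instead pick random members of the `pg2` group — but the fan-out and the blocking gather loop are the same shape:

```elixir
defmodule FanOutSketch do
  # Hypothetical stand-in for a compiler server process; real servers
  # would be remote nodes looked up via the pg2 group.
  defp spawn_server do
    spawn(fn ->
      receive do
        {:lex, from, file} -> send(from, {:lexed, file, String.upcase(file)})
      end
    end)
  end

  def lex_files(files) do
    # Scatter: pick a server per file and send the work off.
    Enum.each(files, fn file ->
      send(spawn_server(), {:lex, self(), file})
    end)

    # Gather: loop until every result is back -- this is the
    # synchronization barrier between pipeline stages.
    for _ <- files do
      receive do
        {:lexed, file, tokens} -> {file, tokens}
      end
    end
  end
end

FanOutSketch.lex_files(["a.fl", "b.fl"]) |> Enum.sort() |> IO.inspect()
```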
This means that there is a synchronization barrier between each stage in the
pipeline at the moment. We could definitely do better, since parsing of a file
`a.fl` in no way depends on lexing of a file `b.fl`, and so on. However,
interleaving compiler stages in this way would complicate things a lot. It also
gets hairy once we get to the semantic and code generation stages, since
information from one file could be needed to analyze another, or a CTE
(compile-time evaluation) operation in one module could depend on values in
another.
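For contrast, interleaving would mean pipelining per file rather than per stage, so only the final collection step waits. A minimal sketch of that idea, with dummy stage functions standing in for the real ones:

```elixir
# Per-file interleaving: each file flows through lex -> parse on its own,
# so lexing b.fl never waits on parsing a.fl. The stage functions are
# illustrative dummies, not Flect's.
lex = fn file -> {:tokens, file} end
parse = fn {:tokens, file} -> {:ast, file} end

["a.fl", "b.fl"]
|> Enum.map(fn file -> Task.async(fn -> file |> lex.() |> parse.() end) end)
|> Enum.map(&Task.await/1)
|> IO.inspect()
```

The dependency problems described above are exactly why this doesn’t generalize: once one file’s semantic analysis needs facts from another file, the per-file tasks would have to talk to each other again.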
The Flect compiler is still young, so this isn’t particularly relevant yet, but in the future, stages like object code generation and linking will require querying servers to make sure that they have the expected backing C99 compiler and linker, and that they target the expected OS/architecture/ABI/endianness combination.
But is something like this going to be needed?
It’s hard to say at this point. The Flect compiler is implemented in Elixir, which runs on the Erlang VM. It isn’t the fastest language in existence, and the immutability model also means some things will be slightly slower than if the compiler had been written in a traditional, imperative language. It also doesn’t help that Flect’s type system is relatively complicated and takes a significant amount of time to semantically analyze.
I can almost certainly guarantee that the Flect compiler will not deliver instant compilation times for any non-trivial project. Still, I expect the compilation speeds to be significantly better than those for e.g. C++ (as little as that means) and Rust.
I implemented the Flect compiler in Elixir primarily because the Erlang VM runs on many architectures and operating systems. That meant a trade-off with regards to raw speed. At least the easy concurrency model of the Erlang VM makes parallelization of Flect builds trivial (the distributed compiler code is only around 250 lines).