This release fixes a fairly large number of bugs (by updating the debugger libraries SDB depends on) and improves the behavior of some commands to be a bit more intuitive.
If you’ve read my previous post on .NET atomics, you’ll know I’m not a big fan of those APIs in the .NET Framework. But it’s easy to sit back and complain about things, so I decided to actually try to do something about this sucky situation.
My attempt to sanitize atomics in .NET is Atomique, a library I just released on GitHub and NuGet. It provides atomic operations based on the C++11 memory model, with some stuff removed and simplified for the Common Language Infrastructure.
In particular, Atomique does not support these C++11 features:

- 8-bit and 16-bit add, subtract, and compare-and-exchange operations.
- AND, OR, and XOR atomic read-modify-write operations.
- Weak (spuriously failing) compare-and-exchange operations.
- Separate success/failure barriers for compare-and-exchange operations.
- Signal/compiler memory barriers (i.e. C++11's `atomic_signal_fence`).
- Load operations with consume semantics (i.e. `memory_order_consume`).

If you’re curious why these aren’t supported, refer to the documentation.
Atomique attempts to make you think about the memory barriers you want by way of method naming: every atomic operation method ends in a suffix naming its barrier semantics, the strongest being Sequential. If you have no idea what these terms mean, going Sequential is almost always safe.
Atomique will, in some cases, use barriers that are much stronger than strictly necessary. This is a result of the very limited APIs in the .NET Framework. While this can have a negative impact on performance, it won’t be any worse than using the .NET Framework APIs directly. Also, stronger barriers are not actually problematic in terms of semantics; your code will still behave the way you expect it to. So, by using Atomique, you’re just making it clearer in your code what semantics you want. As the .NET Framework and Atomique evolve, your code will just end up being faster when these overly strong barriers are eventually removed, and you won’t have to change anything.
Atomique is built as a Portable Class Library (PCL). It should work on the majority of platforms that aren’t Silverlight-based. It also has no particular processor architecture dependencies, so it’ll just work on e.g. ARM and MIPS.
The only user-visible change (ideally) in this release is that trailing `_` argument patterns can now be omitted on `on_load` functions, too. Other than
that, nothing has really changed in this release. The main purpose is an update
from Elixir 0.12.4 to 0.15.0. A lot of stuff has changed between those two
versions so it’s entirely possible that bugs are lurking after the update. If
you run into one, please open an issue.
This release is mainly about improving the user experience:
- Line numbers are now shown in output from the `src` command, making it much easier to manipulate breakpoints as you go along. No more switching to a text editor to get line numbers.
- When no source code is available, SDB will now show disassembly for any such stack frames, instead of showing nothing.
- You can now define command aliases to optimize your workflow.
- A default location can now be set for the debugger database, making the `db` command significantly less tedious to use.
- The debugger database can now be saved automatically on shutdown and loaded automatically on startup.
- The environment variable table (and thus `env list` output) is now sorted.
However, this release also contains a fancy new feature: the ability to change the current instruction pointer. This is commonly known as ‘set next statement’ in IDEs. SDB supports it both at the source line level and at the instruction level.
This is just a short blog post to clarify the actual semantics of the atomic
functions exposed in the .NET Framework - those on the `Thread`, `Interlocked`, and `Volatile` classes - as well as the concrete behavior of C#’s `volatile`
keyword and CIL’s `volatile.` prefix instruction. These are more complicated
and subtle than it may seem at first, and work very differently from how most
people expect them to.
This post will probably be easier to understand if you know the C++11 memory model.
Barriers and Atomicity
First, let’s define the three kinds of memory barriers that we’ll be dealing with in order to describe the semantics:
- Acquire: Synchronizes with release (or stronger) barriers from other threads. No atomic loads in the current thread can be reordered before this barrier.
- Release: Synchronizes with acquire (or stronger) barriers from other threads. No atomic stores in the current thread can be reordered after this barrier.
- Sequential consistency: Strongest barrier, synchronizing with acquire and release operations in other threads. This barrier means that both atomic loads and stores cannot be reordered in either direction across it.
These are almost always spoken of in the context of a memory operation. We say that a load with acquire semantics is a `load-acq`, while a store with release semantics is a `store-rel`. Sequential consistency can be applied to both loads and stores, giving `load-seq` and `store-seq`.
It’s important to note at this point that memory barriers are strictly speaking not related to atomic operations at all. Just because memory barriers are used, it doesn’t mean that an operation is actually atomic - it just imposes limits on the ability of the compiler and CPU to rearrange loads and stores.
For example, this is not atomic on a 32-bit CPU:
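The original listing here was lost to formatting; a minimal sketch of the kind of code being described - a barrier-wrapped but still non-atomic 64-bit store, with hypothetical names - might be:

```csharp
using System.Threading;

class Example
{
    public long v; // 64 bits: stored as two 32-bit words on a 32-bit CPU

    public void Write()
    {
        Thread.MemoryBarrier();  // barrier before the store
        v = 0x0123456789ABCDEF;  // plain store: can tear on a 32-bit CPU
        Thread.MemoryBarrier();  // barrier after the store
    }
}
```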
There can be word tearing if another thread writes to `v` in the same way. This is because the barriers don’t ensure that the two 32-bit words of the `long` are written at the same time.
The proper way to write this code would be:
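Again, the listing is missing; based on the surrounding text, the corrected version would look roughly like this (names are hypothetical):

```csharp
using System.Threading;

class Example
{
    public long v;

    public void Write()
    {
        // Atomic 64-bit store, even on 32-bit CPUs; also implies
        // full memory barriers around the operation.
        Interlocked.Exchange(ref v, 0x0123456789ABCDEF);
    }
}
```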
(`Interlocked.Exchange` implies the two `Thread.MemoryBarrier` calls in the previous example.)
It’s important to know what atomicity guarantees the Common Language Infrastructure provides. Sometimes you can get away with doing something atomically without having to call any special methods in the framework if you are familiar with these guarantees. The relevant bits are in section I.12.6.6 of Ecma 335.
The guarantees can be summarized as follows: any read or write of a size that is equal to or lower than `IntPtr.Size` shall be atomic. In practice, what this means is that reading or writing e.g. `int` values will always be atomic, while operations involving `long` values will only be atomic on a 64-bit CPU.
Note, however, that these rules only hold so long as the memory location being
operated on is properly aligned with respect to the size of the operation - if
it isn’t, all bets are off as far as atomicity goes. Also, if you obtain a
pointer (whether managed or otherwise) to, say, an `int` variable, reinterpret it as a `short` pointer, and then read or write through that pointer, that read or write will not be atomic with respect to normal `int`-size reads or writes
to that same variable.
Finally, no memory barriers are guaranteed in any of the above. Again, keep in mind: Atomicity and memory barriers are two separate (but related) things.
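To make the size-based guarantee concrete, here is a small sketch (a hypothetical class; whether the `long` store is atomic depends on `IntPtr.Size` at run time):

```csharp
class Guarantees
{
    public int _i;  // 4 bytes: never exceeds IntPtr.Size, so accesses are atomic
    public long _l; // 8 bytes: atomic only in a 64-bit process

    public void Demo()
    {
        _i = 123; // atomic everywhere - but implies no memory barrier
        _l = 456; // atomic only when IntPtr.Size == 8; may tear otherwise
    }
}
```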
The one atomics-related method in the framework that actually does what it says on the tin, without anything left to guessing, is `Thread.MemoryBarrier`. It inserts a full load and store barrier, i.e. it is a combined `load-seq` and `store-seq` barrier. That’s all there is to it. Typically, you’d place this on
both sides of an operation to ensure sequential consistency in the sense of the
C++11 memory model. It’s worth noting, though, that adding explicit memory
barriers by hand is almost always a bad idea. Prefer using atomic methods that
imply barriers as they are easier to reason about in the grand scheme of things.

The methods that aren’t quite obvious are `Thread.VolatileRead` and `Thread.VolatileWrite`. The names would have you believe that these methods perform atomic operations. They do not. What actually happens is that `Thread.VolatileRead` inserts an acquire barrier after the read, while `Thread.VolatileWrite` inserts a release barrier before the write. In other words, they are `load-acq` and `store-rel` operations, respectively. They do not guarantee anything about atomicity beyond what the CLI specification does.
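As a sketch of the barrier placement just described (the field and method names are made up):

```csharp
using System.Threading;

class Flag
{
    private int _flag; // hypothetical field

    public int ReadFlag()
    {
        // Plain read followed by an acquire barrier: a load-acq.
        // No atomicity guarantees beyond those of the CLI spec.
        return Thread.VolatileRead(ref _flag);
    }

    public void SetFlag(int value)
    {
        // Release barrier followed by a plain write: a store-rel.
        Thread.VolatileWrite(ref _flag, value);
    }
}
```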
And now for a WTF: on Microsoft .NET, `Thread.VolatileRead` and `Thread.VolatileWrite` actually result in an `mfence` x86 instruction after and before the operation,
respectively. This is a much stronger barrier than is needed. Unfortunately,
many programs now rely on this behavior, so Microsoft can’t remove the overly
strong barriers. This, in part, is why the
Volatile class (explained below)
now exists. In Mono, however, these two methods will result in the expected
acquire and release semantics as per Ecma 335. We encourage developers to fix
their programs to not rely on implementation quirks of Microsoft .NET instead
of us duplicating those quirks.
At this point, you may be thinking that `Thread.VolatileRead` and `Thread.VolatileWrite` are just glorified, confusing, and fairly broken wrappers around `Volatile.Read` and `Volatile.Write`, and should be avoided. You would be right. Read on.
We’re approaching sanity. The methods exposed on this class are analogous to the `Thread.VolatileRead` and `Thread.VolatileWrite` methods, but provide more sensible guarantees.
As far as barriers go, `Volatile.Read` results in an acquire barrier, while `Volatile.Write` results in a release barrier. In other words, the first is a `load-acq` and the latter is a `store-rel`, just like the methods on the `Thread` class.
These methods do not have quirks relating to barrier strength on Microsoft
.NET, and actually result in the barriers they’re supposed to produce.
These methods also guarantee atomicity for all data sizes regardless of CPU bitness. That said, there is a minor detail to be aware of when dealing with 64-bit overloads of this class’s methods on a 32-bit CPU: They are only atomic with respect to each other, not with respect to regular reads and writes as defined by the CLI specification. This means that if you’re accessing 64-bit data with the methods on this class, you should never access it with regular reads or writes, if you want to be safe on 32-bit systems.
If you’re looking for atomic reads and writes with acquire and release memory model semantics, respectively, this class is what you should be using.
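For illustration, a typical publish/consume handshake using this class might look like this (all names are made up):

```csharp
using System.Threading;

class Publisher
{
    private int _data;
    private bool _published;

    public void Publish(int value)
    {
        _data = value;
        // store-rel: the _data write cannot be reordered after this.
        Volatile.Write(ref _published, true);
    }

    public bool TryConsume(out int value)
    {
        // load-acq: the _data read below cannot be reordered before this.
        if (Volatile.Read(ref _published))
        {
            value = _data;
            return true;
        }

        value = 0;
        return false;
    }
}
```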
Finally, it’s worth noting that despite what the MSDN documentation of this class would have you believe, the C# compiler does not emit calls to the methods in this class (see below for more details). This is not even the case on a conceptual level, due to the Microsoft .NET quirks detailed above.
This is where things actually become sane. All operations provided on this
class guarantee sequential consistency and are atomic everywhere. This claim
may seem odd, because the MSDN documentation for the `Interlocked` class doesn’t mention the ordering guarantees at all. However, the MSDN documentation for the native `InterlockedExchange` function states that it inserts a full memory barrier. This is the same for all other native functions that the `Interlocked` methods map to.
There’s one odd method out: `Interlocked.Read`, which only works with 64-bit values.
If you recall, the CLI only guarantees atomicity for 64-bit values on 64-bit
CPUs. This method gives that guarantee on both bitnesses. In other words, it
works similarly to the 64-bit overloads of `Volatile.Read`, but results in a `load-seq` barrier instead of a `load-acq` one.
You may be wondering why there’s no
Interlocked.Write method. This is strange
indeed, and I can only speculate as to why this is the case. One possibility is
that whoever designed the
Interlocked API reasoned that the
Interlocked.Exchange method is good enough for doing atomic 64-bit writes.
Another possibility is a misunderstanding of the x86-64 memory model: thinking that a 64-bit write always implies a `store-seq` barrier (it does not). In any case, `Interlocked.Exchange` can be used in place of an `Interlocked.Write` method, although it is a bit more expensive than a simple write would have been on some platforms.
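So a 64-bit cell that is safe even on 32-bit CPUs could be sketched like this (a hypothetical wrapper):

```csharp
using System.Threading;

class AtomicLong
{
    private long _value;

    public void Store(long value)
    {
        // There is no Interlocked.Write, so use Exchange as an atomic
        // store-seq and simply discard the returned old value.
        Interlocked.Exchange(ref _value, value);
    }

    public long Load()
    {
        // Atomic load-seq, even on 32-bit CPUs.
        return Interlocked.Read(ref _value);
    }
}
```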
As with the methods on the `Volatile` class, all 64-bit overloads on `Interlocked` are only atomic with respect to other calls to the 64-bit overloads, and the same safety guidelines apply.
If you’re aiming for simple code,
Interlocked should be your go-to class.
By the way, in case you’re wondering:
Interlocked.MemoryBarrier is just an
alias for the
Thread.MemoryBarrier method. Nothing subtle here.
You may be wondering what guarantees you have when you mix calls to `Volatile` and `Interlocked` methods. For example, you might want to use `Volatile.Read` to perform a `load-acq` while using `Interlocked.Exchange` to perform a `store-seq`, or something similar.
Intuitively, this would just work. But the reality is that making this work is
quite subtle: On some older 32-bit systems (especially on ARM), the 64-bit
overloads on these classes are often implemented with plain old locks because
there’s no better alternative. But then, who’s to say that `Interlocked` and `Volatile` synchronize on the same lock?
The answer is: Nobody. There is no documentation anywhere to suggest that this is the case. Both Mono and Microsoft .NET happen to use the same lock, mostly for implementation simplicity. Still, this is an implementation detail, and I can’t say I recommend relying on it.
All this being said, if you’re only using the 32-bit overloads on these classes, mixing them will work just fine. This is because they consider each other ‘regular reads or writes’ and so must be atomic with respect to each other.
So how does the `volatile` keyword tie into all this? It’s often frowned upon since its semantics are unclear, but it’s actually simple: it compiles down to `Thread.VolatileRead` and `Thread.VolatileWrite` calls. That’s it. There’s nothing else to it. Since the compiler ensures that you can’t use it on types that are larger than 32 bits, you can’t shoot yourself in the foot with regards to 64-bit CPUs.
However, you have to be aware that since
volatile compiles down to those
methods, it results in the same semantics (and Microsoft .NET quirks) that
those have. This doesn’t make
volatile unusable, but it should make you
think twice before using it.
Note that you can shoot yourself in the foot by somehow reading or writing a volatile field in a way that makes the C# compiler unable to insert the calls to `Thread.VolatileRead` and `Thread.VolatileWrite`. This can happen through reflection, but also by passing the field as a `ref` or `out` parameter. Any production-quality C# compiler will warn you about the latter, though.
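The `ref` pitfall can be seen in a snippet like this (hypothetical field; Microsoft’s compiler reports warning CS0420 on the by-ref access):

```csharp
using System.Threading;

class Pitfall
{
    private volatile int _state;

    public int State => _state;

    public void Demo()
    {
        // Direct access: the compiler emits the volatile semantics.
        _state = 1;

        // Passing the field by ref bypasses them; the C# compiler
        // warns about this (CS0420 in Microsoft's compiler).
        Interlocked.Increment(ref _state);
    }
}
```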
To complicate matters further, CIL also provides an instruction that deals
with memory barriers. Sections I.12.6.7 and III.2.6 have the gory details, but
in short, prefixing an instruction with `volatile.` just turns reads into `load-acq` operations and writes into `store-rel` operations. In other words, this is just a CIL-level equivalent to `Thread.VolatileRead` and `Thread.VolatileWrite`, except that it doesn’t have the quirks that those methods have on Microsoft .NET.
My honest opinion is that atomics in the .NET Framework are a disaster. There
is way too much crap left over from past tried-and-failed attempts to provide
understandable APIs. In an ideal world, a
System.Threading.Atomic class would
be introduced, exposing methods that roughly model the C++11 memory model (with
the exception of signal barriers).
In the meantime, my advice would be this:
- Avoid `Thread.VolatileRead` and the related `Thread.VolatileWrite`. Prefer `Volatile`, which has clearer semantics - unless you actually don’t want atomicity, in which case the `Thread` methods can be fine (if only a bit inefficient on Microsoft .NET).
- Avoid explicit `Thread.MemoryBarrier` calls. Associate barriers with actual atomic operations by using `Interlocked` or `Volatile` methods, or use the `Thread` methods if point 1 doesn’t apply.
- Avoid C#’s `volatile` keyword. Not because it has unclear semantics - it hopefully doesn’t after you’ve read this post - but because it obscures intentions: the semantics should be specified at each point where a variable is read or written, not where it’s declared.
- If you’re writing a compiler that supports volatile operations somehow, please use the `volatile.` CIL prefix instruction. It has actual acquire and release semantics on both Microsoft .NET and Mono without any overly strong barriers. Don’t use the `Thread` methods.
This blog post turned out quite a bit longer than I expected when I started writing it. Atomics are hard. Who knew?
This release is mostly about incorporating some feedback I got on Reddit. In particular:
- Phony targets are now declared with the `task` directive. The old name was obscure and only made sense if you’d dealt with POSIX `make` in the past.
- There is now support for `fallback` tasks. These run when an invalid target is given on the command line. They are ExMake’s equivalent to `make`’s last-resort rules.
- Modules such as `System` are now automatically imported in all build scripts. The functions in these modules are used commonly enough that there really was no excuse for ExMake not importing them.
- Trailing `_` argument patterns in rules/tasks can now be omitted, reducing the syntax noise a bit.
It’s that time of the month. That time where some random person on the Internet announces a new, fancy build system that’s better than all the other build systems and will take over the world.
On a more serious note, I just finished version 0.3.1 of ExMake, a build tool similar in nature to the venerable `make` tool, but generally more modern. This version is more or less usable for building actual things.
So what is this thing, why does it exist, and how is it better than all the other build systems?
As mentioned, ExMake is a
make-like tool. It’s not something meant to replace
monolithic build system suites like Autotools. It’s simply a dependency-driven
build tool that executes recipes to produce output files. The main reason I
created it was that I was sick and tired of
make’s scripting ‘language’ which
is, at best, a text processor with macros. Other things that bothered me were
make’s lack of support for libraries, its broken recursion model, its lack of
caching of any kind, its fragile and inefficient parallel job scheduler, and
its nightmare-inducing, shell-based recipes.
ExMake is written in Elixir, a general-purpose programming language, and also uses it as its scripting language. This does mean that ExMake requires Erlang to work. This is unlikely to be a problem, however, as Erlang is available for the vast majority of operating systems and architectures.
Let me introduce ExMake by example. A simple
Exmakefile might look like this:
To build `myprog`, you just invoke `exmake`, which will invoke the default target since no other target has been named. As with `make`, you can pass `-j 4` to build up to 4 things in parallel (though it obviously won’t matter in this case, since we’re just building one rule). ExMake does one thing very differently from `make`, however: it produces no console output at all by default. This may seem odd, but it is in line with the Unix philosophy of shutting up unless you have something interesting to say. ExMake will of course produce output if an error occurs. You can ask ExMake to be loud (print all shell invocations) by passing the appropriate command-line option.
You’re probably wondering about the explicit `./` prefixes added to that shell invocation. This happens because ExMake only invokes recipes relative to the directory it was started in. This may seem weird if you’re coming from good old `make`, but there’s a very good reason that ExMake does this: its recursion
model. If you’ve ever maintained a non-trivial application using
make as a
build tool, you’ve probably dealt with
make’s insane approach to recursion.
There’s even a famous paper explaining why it’s evil and broken. ExMake does
away with this traditional approach to recursion and instead treats recursion as
a first-class citizen.
Recursion in ExMake is done with a simple directive that tells ExMake where to
go and pick up another
`Exmakefile`. Let’s see an example:
There’s a bit to take in here. Let me explain:
- The `recurse` directive tells ExMake to go into `utils` and pick up the `Exmakefile` that’s in there. ExMake then adds all of the rules in that file to the dependency graph.
- The top-level `clean` target now depends on `utils/clean`. This notation may seem odd, but it’s just that ExMake uses path separators for phony targets too. This dependency ensures that a top-level `exmake clean` will also clean up in `utils`.
- `myprog` now depends on `utils/stuff.o`, which is built in `utils`. This means that you could `cd utils && exmake` to build `utils/stuff.o` separately if you wanted, and then `cd .. && exmake` to build `myprog`.
- The argument list (or more accurately, argument pattern list) for `myprog` now contains two identifiers in the sources list. This just means that we’re expecting the two dependencies (among them `utils/stuff.o`) that we declared, and therefore pattern match on that list to conveniently get them.
We can now build:
Notice how ExMake didn’t change into `utils` while building. But because we wrote our recipes to use the paths passed to them instead of writing the paths literally, everything worked out, as ExMake simply passed the full paths to the recipes.
Now, why is this way of dealing with recursion a good idea? The reason is quite
simple: Since ExMake does not invoke itself recursively – for performance
reasons – changing the current directory would be dangerous. Sure, it would
work fine when executing a build script serially, but if you add
-j 2 to the
mix, suddenly some rules will start executing in the wrong directories! One way
to work around that would be to add locks around directory changes, but then
the performance gained by parallelizing the build is undermined.
So, it’s a tradeoff. You have to write rules more carefully, but on the other
hand, ExMake doesn’t have to jump through ridiculous hoops such as invoking
itself recursively and using IPC to communicate between the processes. This
means that ExMake’s parallel job scheduler is much more robust than that of, say, `make`.
One performance-related thing that ExMake puts a lot of effort into is caching.
To see just how much this matters, let’s try timing non-cached and cached
builds of the recursion example above. We do this by passing ExMake’s timing flag. This will print a bunch of stuff after the build is done, but we’re only interested in the total time.
After this build, everything has been cached. Let’s try a cached build:
So we went from 578 milliseconds to 211 milliseconds. This doesn’t seem like a whole lot given the small test case, but on large projects with huge dependency graphs, just loading the graph from disk is significantly faster than computing all of it on every ExMake invocation. Similarly, loading the compiled build scripts is way faster than lexing, parsing, analyzing, and emitting them over and over.
I hate how, when I use
make, I have to reinvent so much stuff to invoke the
tools I need. There’s no standard way to have libraries that can be installed
and used in makefiles. In ExMake, there are standard mechanisms to construct
and use libraries.
Let’s suppose I want to build a simple C# console application. If I use the C#
library that ships with ExMake, my
Exmakefile will look like this:
Let’s run it:
And that’s it. We can add source files to the sources list as we go along. The
C# library picked up a C# compiler automatically, set up all the arguments as
needed, and created the
myprog.exe rule and its recipe (with all its somewhat
complicated internal logic to handle a lot of different use cases).
Note that information such as the C# compiler to use is cached.
Of course, it doesn’t end with just the standard libraries. You can write your own libraries that can be installed globally (or locally) and included in build scripts.
For example, a super simple C library (
c.ex) might look like this:
You can use ExMake to build the library with an
Exmakefile like so:
Once compiled to `Elixir.ExMake.Lib.C.beam`, it can be copied to the system so that the `load_lib` directive will be able to load it. For example, you could drop it into a directory on the Erlang code path.
ExMake does away with some features of GNU
make that I personally consider
either broken, insane, or actively harmful. Or any combination of those.
Last-resort rules and the
.DEFAULT target are unsupported. I have never come
across a sensible use case for these, and it seems to me that if you find
yourself needing these, you’ve done something horrifically wrong.
ExMake can’t create and update archive files. This is an odd feature that is better left to explicit invocations of `ar`.
The `.LOW_RESOLUTION_TIME` target does not exist in any form. I don’t know of any remotely relevant operating system or file system that needs this. I suspect it made sense back when it was introduced, but highly doubt its value today.
The `.EXPORT_ALL_VARIABLES` feature is unsupported, as I strongly view this as a glaring recipe for disaster - it could affect all sorts of things in the programs being executed. It’s much better to explicitly export the variables that are needed.
There is no support for
.NOTPARALLEL (or anything like it). This is another
case of hiding build system bugs intentionally. If two rules must not execute
in parallel, make one depend on the other.
.NOTPARALLEL is a shotgun solution
that works, but harms parallelism of the entire build script.
.ONESHELL makes little sense in ExMake since recipes are regular Elixir
expressions, not shell invocations.
In short, ExMake is a
make-like tool using a modern, general-purpose
programming language as its scripting language. It features extensive caching,
reusable libraries, sane recursion, and better parallel job scheduling.
If you’re interested, you can find more info on
the ExMake wiki, though I have yet to
write the manual. Still, this post and the API documentation for
ExMake.Lib should get you going easily.
The only user-visible change in this release is that the `args` command to set arguments for the inferior process actually works. This was spotted and fixed by a contributor.