alexrp’s blog

ramblings usually related to software

.NET Atomics and Memory Model Semantics

This is just a short blog post to clarify the actual semantics of the atomic functions exposed in the .NET Framework - those on the Thread, Interlocked, and Volatile classes, as well as the concrete behavior of C#’s volatile keyword and CIL’s volatile. prefix instruction. These are more complicated and subtle than it may seem at first, and work very differently from how most people expect them to.

This post will probably be easier to understand if you know the C++11 memory model.

Barriers and Atomicity

First, let’s define the three kinds of memory barriers that we’ll be dealing with in order to describe the semantics:

  • Acquire: Synchronizes with release (or stronger) barriers from other threads. No atomic loads in the current thread can be reordered before this barrier.
  • Release: Synchronizes with acquire (or stronger) barriers from other threads. No atomic stores in the current thread can be reordered after this barrier.
  • Sequential consistency: Strongest barrier, synchronizing with acquire and release operations in other threads. This barrier means that both atomic loads and stores cannot be reordered in either direction across it.

These are almost always spoken of in the context of a memory operation. We say that a load with acquire semantics is a load-acq, while a store with release semantics is a store-rel. Sequential consistency can be applied to both loads and stores; load-seq and store-seq respectively.

It’s important to note at this point that memory barriers are strictly speaking not related to atomic operations at all. Just because memory barriers are used, it doesn’t mean that an operation is actually atomic - it just imposes limits on the ability of the compiler and CPU to rearrange loads and stores.

For example, this is not atomic on a 32-bit CPU:

void Set(ref long x)
{
    Thread.MemoryBarrier();
    x = 42;
    Thread.MemoryBarrier();
}

There can be word tearing if another thread writes to x in the same way. This is because the barriers don’t ensure that the two 32-bit words of the long are written at the same time.

The proper way to write this code would be:

void Set(ref long x)
{
    Interlocked.Exchange(ref x, 42);
}

(Note that Interlocked.Exchange implies the two Thread.MemoryBarrier calls in the previous example.)

CLI Atomicity

It’s important to know what atomicity guarantees the Common Language Infrastructure provides. Sometimes you can get away with doing something atomically without having to call any special methods in the framework if you are familiar with these guarantees. The relevant bits are in section I.12.6.6 of Ecma 335.

The guarantees can be summarized as follows: Any read or write of a size no larger than IntPtr.Size shall be atomic. In practice, what this means is that reading or writing byte, short, and int values will always be atomic, while operations involving long values will only be atomic on a 64-bit system.

Note, however, that these rules only hold so long as the memory location being operated on is properly aligned with respect to the size of the operation - if it isn’t, all bets are off as far as atomicity goes. Also, if you obtain a pointer (whether managed or otherwise) to, say, an int variable, reinterpret it as a short pointer, and then read or write through that pointer, that read or write will not be atomic with respect to normal int-size reads or writes to that same variable.

Finally, no memory barriers are guaranteed in any of the above. Again, keep in mind: Atomicity and memory barriers are two separate (but related) things.
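As a rough illustration of the size rule above, the check can be written down directly. This is only a sketch: CliAtomicity and IsPlainAccessAtomic are hypothetical names made up for this example, and the helper assumes the memory location is properly aligned, since it cannot verify that.

```csharp
using System;

// Hypothetical helper, not part of the framework.
static class CliAtomicity
{
    // Returns true when a plain, properly aligned read or write of
    // `sizeInBytes` bytes falls under the CLI atomicity guarantee,
    // i.e. when it is no larger than the native word size.
    // Alignment is assumed here; it is not checked.
    public static bool IsPlainAccessAtomic(int sizeInBytes)
    {
        return sizeInBytes <= IntPtr.Size;
    }
}
```

For instance, CliAtomicity.IsPlainAccessAtomic(sizeof(int)) holds everywhere, while CliAtomicity.IsPlainAccessAtomic(sizeof(long)) only holds in a 64-bit process. Remember that a true result says nothing about ordering.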

Thread Methods

The one atomics-related method in the framework that actually does what it says on the tin without anything left to guessing is Thread.MemoryBarrier. It inserts a full load and store barrier, i.e. it is a combined load-seq and store-seq barrier. That’s all there is to it. Typically, you’d place this on both sides of an operation to ensure sequential consistency in the sense of the C++11 memory model. It’s worth noting, though, that adding explicit memory barriers by hand is almost always a bad idea. Prefer using atomic methods that imply barriers as they are easier to reason about in the grand scheme of things.

The methods that aren’t quite obvious are Thread.VolatileRead and Thread.VolatileWrite. The MSDN documentation would have you believe that these methods perform atomic operations. They do not. What actually happens is that VolatileRead inserts an acquire barrier after the read, while VolatileWrite inserts a release barrier before the write. In other words, they are load-acq and store-rel operations, respectively. They do not guarantee anything about atomicity beyond what the CLI specification does.

And now for a WTF: On Microsoft .NET, VolatileRead and VolatileWrite actually result in an mfence x86 instruction after and before the operation, respectively. This is a much stronger barrier than is needed. Unfortunately, many programs now rely on this behavior, so Microsoft can’t remove the overly strong barriers. This, in part, is why the Volatile class (explained below) now exists. In Mono, however, these two methods will result in the expected acquire and release semantics as per Ecma 335. We encourage developers to fix their programs to not rely on implementation quirks of Microsoft .NET instead of us duplicating those quirks.
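To make the shape of these two methods concrete, here is a minimal sketch written in terms of Thread.MemoryBarrier. It is not the framework’s source code: the class and method names are made up, and a full barrier is assumed as a stand-in because it is the only explicit barrier the framework exposes (which also makes the sketch stronger than plain acquire and release).

```csharp
using System.Threading;

// Hypothetical names; an illustration of the barrier placement only.
static class ThreadVolatileSketch
{
    public static int VolatileReadSketch(ref int location)
    {
        int value = location;    // plain read, atomic only per the CLI rules
        Thread.MemoryBarrier();  // barrier AFTER the read (acquire side)
        return value;
    }

    public static void VolatileWriteSketch(ref int location, int value)
    {
        Thread.MemoryBarrier();  // barrier BEFORE the write (release side)
        location = value;        // plain write, atomic only per the CLI rules
    }
}
```

Note that nothing in the sketch makes the read or write itself atomic; with a long in place of the int, tearing would still be possible on a 32-bit CPU.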

At this point, you may be thinking that VolatileWrite and VolatileRead are just glorified, confusing, and fairly broken wrappers around MemoryBarrier and should be avoided. You would be right. Read on.

Volatile Methods

We’re approaching sanity. The methods exposed on this class are analogous to the Thread.VolatileRead and Thread.VolatileWrite methods, but provide more useful guarantees.

As far as barriers go, Volatile.Read results in an acquire barrier, while Volatile.Write results in a release barrier. In other words, the first is load-acq and the latter is store-rel, just like the methods on Thread. These methods do not have quirks relating to barrier strength on Microsoft .NET, and actually result in the barriers they’re supposed to produce.

These methods also guarantee atomicity for all data sizes regardless of CPU bitness. That said, there is a minor detail to be aware of when dealing with 64-bit overloads of this class’s methods on a 32-bit CPU: They are only atomic with respect to each other, not with respect to regular reads and writes as defined by the CLI specification. This means that if you’re accessing 64-bit data with the methods on this class, you should never access it with regular reads or writes, if you want to be safe on 32-bit systems.

If you’re looking for atomic reads and writes with acquire and release memory model semantics, respectively, this class is what you should be using.
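A typical use of this pairing is publishing data from one thread to another. The following is a minimal sketch, assuming a single writer; Mailbox and its members are hypothetical names chosen for this example.

```csharp
using System.Threading;

// Hypothetical example type: one thread publishes, others poll.
sealed class Mailbox
{
    private int _payload; // plain data, written before the flag
    private int _ready;   // 0 = empty, 1 = published

    public void Publish(int value)
    {
        _payload = value;
        // store-rel: the payload write above cannot move below this write.
        Volatile.Write(ref _ready, 1);
    }

    public bool TryRead(out int value)
    {
        // load-acq: the payload read below cannot move above this read.
        if (Volatile.Read(ref _ready) == 1)
        {
            value = _payload;
            return true;
        }

        value = 0;
        return false;
    }
}
```

A reader that observes _ready == 1 is thereby guaranteed to see the payload that was written before the flag was set.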

Finally, it’s worth noting that despite what the MSDN documentation of this class would have you believe, the C# compiler does not emit calls to the methods in this class (see below for more details). This is not even the case on a conceptual level, due to the Microsoft .NET quirks detailed above.

Interlocked Methods

This is where things actually become sane. All operations provided on this class guarantee sequential consistency and are atomic everywhere. This claim may seem odd, because the MSDN documentation doesn’t mention the ordering guarantees at all. However, the MSDN documentation for the native InterlockedExchange function states that it inserts a full memory barrier. This is the same for all other native functions that the methods on Interlocked mirror.

There’s one odd method: Interlocked.Read, which only works with 64-bit values. If you recall, the CLI only guarantees atomicity for 64-bit values on 64-bit CPUs. This method gives that guarantee on both bitnesses. In other words, it works similarly to the 64-bit overloads of Volatile.Read, but results in a load-seq barrier instead of load-acq.

You may be wondering why there’s no Interlocked.Write method. This is strange indeed, and I can only speculate as to why this is the case. One possibility is that whoever designed the Interlocked API reasoned that the Interlocked.Exchange method is good enough for doing atomic 64-bit writes. Another possibility is a misunderstanding of the x86-64 memory model: Thinking that a 64-bit write always implies a store-seq barrier (it does not). In any case, Interlocked.Exchange can be used in place of an Interlocked.Write method, although it is a bit more expensive than a simple write would have been on some platforms.

As with methods on the Volatile class, all 64-bit overloads on Interlocked are only atomic with respect to other calls to the 64-bit overloads, and the same safety guidelines apply.
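One way to follow these guidelines is to keep the 64-bit field private and route every access through Interlocked. The sketch below does that; Counter64 is a hypothetical name, and Write simply stands in for the missing Interlocked.Write by discarding the result of Interlocked.Exchange.

```csharp
using System.Threading;

// Hypothetical wrapper: every access to _value goes through Interlocked,
// so no plain 64-bit read or write can tear on a 32-bit CPU.
sealed class Counter64
{
    private long _value;

    // Stand-in for the nonexistent Interlocked.Write: the old value
    // returned by Exchange is simply ignored.
    public void Write(long value)
    {
        Interlocked.Exchange(ref _value, value);
    }

    // Atomic 64-bit read on both bitnesses.
    public long Read()
    {
        return Interlocked.Read(ref _value);
    }

    // Atomic read-modify-write; returns the incremented value.
    public long Increment()
    {
        return Interlocked.Increment(ref _value);
    }
}
```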

If you’re aiming for simple code, Interlocked should be your go-to class.

By the way, in case you’re wondering: Interlocked.MemoryBarrier is just an alias for the Thread.MemoryBarrier method. Nothing subtle here.

Mixing Volatile and Interlocked

You may be wondering what guarantees you have when you mix calls to Volatile and Interlocked methods. For example, you might want to use Volatile.Read to perform a load-acq while using Interlocked.Exchange to perform a store-seq, or something similar.

Intuitively, this would just work. But the reality is that making this work is quite subtle: On some older 32-bit systems (especially on ARM), the 64-bit overloads on these classes are often implemented with plain old locks because there’s no better alternative. But then, who’s to say that Interlocked and Volatile synchronize on the same lock?

The answer is: Nobody. There is no documentation anywhere to suggest that this is the case. Both Mono and Microsoft .NET happen to use the same lock, mostly for implementation simplicity. Still, this is an implementation detail, and I can’t say I recommend relying on it.

All this being said, if you’re only using the 32-bit overloads on these classes, mixing them will work just fine. This is because they consider each other ‘regular reads or writes’ and so must be atomic with respect to each other.
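As a sketch of the safe 32-bit case, here is a flag that is set with Interlocked.Exchange and polled with Volatile.Read. OneShotFlag is a hypothetical name for this example; the important part is that the field is an int, so no lock-based fallback is involved.

```csharp
using System.Threading;

// Hypothetical example type mixing the 32-bit overloads of both classes.
sealed class OneShotFlag
{
    private int _state; // 0 = not set, 1 = set

    // store-seq via Interlocked. Returns true only for the single
    // caller that actually flipped the flag from 0 to 1.
    public bool TrySet()
    {
        return Interlocked.Exchange(ref _state, 1) == 0;
    }

    // load-acq via Volatile.
    public bool IsSet
    {
        get { return Volatile.Read(ref _state) == 1; }
    }
}
```

Doing the same with a long field would rely on the shared-lock implementation detail described above on systems that need the lock-based fallback.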

C#’s volatile Keyword

So how does the volatile keyword tie into all this? It’s often frowned upon since its semantics are unclear, but it’s actually simple: It compiles down to Thread.VolatileRead and Thread.VolatileWrite calls. That’s it. There’s nothing else to it. Since the compiler rejects volatile on 64-bit types such as long and double, you can’t shoot yourself in the foot with torn reads or writes on 32-bit CPUs.

However, you have to be aware that since volatile compiles down to those methods, it results in the same semantics (and Microsoft .NET quirks) that those have. This doesn’t make volatile unusable, but it should make you think twice before using it.

Note that you can shoot yourself in the foot by somehow reading or writing to a volatile field in a way that makes the C# compiler unable to insert the calls to Thread.VolatileRead and Thread.VolatileWrite. This can happen through reflection, but also by passing the field as a ref or out parameter. Any production-quality C# compiler will warn you about the latter, though.
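For reference, this is roughly what the keyword looks like in use. The sketch is a plain stop flag; Worker and its members are hypothetical names. The comment marks the ref pitfall mentioned above, which Microsoft’s compiler reports as warning CS0420.

```csharp
// Hypothetical example type: a stop flag declared with the volatile keyword.
sealed class Worker
{
    // Every direct read or write of this field gets volatile semantics.
    private volatile bool _stopRequested;

    public void RequestStop()
    {
        _stopRequested = true;  // volatile write
    }

    public bool ShouldStop
    {
        get { return _stopRequested; }  // volatile read
    }

    // Pitfall: passing `ref _stopRequested` to another method would compile
    // (with warning CS0420), but accesses made through that reference
    // would no longer be treated as volatile.
}
```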

CIL’s volatile. Prefix

To complicate matters further, CIL also provides an instruction that deals with memory barriers. Sections I.12.6.7 and III.2.6 have the gory details, but in short, prefixing an instruction with volatile. just turns reads into load-acq and writes into store-rel.

In other words, this is just a CIL-level equivalent to Thread.VolatileRead and Thread.VolatileWrite, except that it doesn’t have the quirks that those methods have on Microsoft .NET.

Conclusion

My honest opinion is that atomics in the .NET Framework are a disaster. There is way too much crap left over from past tried-and-failed attempts to provide understandable APIs. In an ideal world, a System.Threading.Atomic class would be introduced, exposing methods that roughly model the C++11 memory model (with the exception of signal barriers).
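To give an idea of what such an API could look like, here is a very rough sketch for int values only. Everything in it is hypothetical: neither AtomicSketch nor MemoryOrder exists in the framework, and the mapping onto Volatile and Interlocked is just one possible choice. It sticks to 32-bit overloads so that mixing the two classes is safe, and it uses Interlocked.CompareExchange with identical comparand and value as a sequentially consistent load.

```csharp
using System;
using System.Threading;

// Hypothetical: loosely modeled on the C++11 memory orders.
enum MemoryOrder { Acquire, Release, SequentiallyConsistent }

// Hypothetical: not a framework class.
static class AtomicSketch
{
    public static int Load(ref int location, MemoryOrder order)
    {
        switch (order)
        {
            case MemoryOrder.Acquire:
                return Volatile.Read(ref location);  // load-acq
            case MemoryOrder.SequentiallyConsistent:
                // Replaces 0 with 0 or does nothing; either way the stored
                // value is unchanged and the current value is returned.
                return Interlocked.CompareExchange(ref location, 0, 0);
            default:
                throw new ArgumentException("A load cannot use release ordering.", nameof(order));
        }
    }

    public static void Store(ref int location, int value, MemoryOrder order)
    {
        switch (order)
        {
            case MemoryOrder.Release:
                Volatile.Write(ref location, value);  // store-rel
                break;
            case MemoryOrder.SequentiallyConsistent:
                Interlocked.Exchange(ref location, value);  // store-seq
                break;
            default:
                throw new ArgumentException("A store cannot use acquire ordering.", nameof(order));
        }
    }
}
```

The point of the design is that the ordering is spelled out at each call site rather than at the declaration of the variable.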

In the meantime, my advice would be this:

  1. Avoid volatile and the related Thread methods. Prefer Volatile which has clearer semantics, unless you actually don’t want atomicity, in which case, the Thread methods can be fine (if only a bit inefficient on Microsoft .NET).
  2. Avoid explicit Thread.MemoryBarrier calls. Associate barriers with actual atomic operations by using Interlocked or Volatile methods, or use the Thread methods if point 1 doesn’t apply.
  3. Avoid C#’s volatile keyword. It’s not because it has unclear semantics – it hopefully doesn’t after you’ve read this post – but because it obscures intentions: The semantics should be specified at each point where a variable is read or written, not where it’s declared.
  4. If you’re writing a compiler that supports volatile operations somehow, please use the volatile. CIL prefix instruction. It has actual acquire and release semantics on both Microsoft .NET and Mono without any overly strong barriers. Don’t use the Thread methods.

This blog post turned out quite a bit longer than I expected when I started writing it. Atomics are hard. Who knew?