As part of my ‘make Mono awesome on ARM’ quest, I’ve made some improvements to Mono recently, which may be of interest to anyone using Mono on ARM boards or embedded devices.
All of these features have landed in Git, though not all of them are present in
the latest release. If you need/want these features, you will need to build the
master branch in Git.
Build System Sanitation
Mono’s build system has traditionally done some fairly insane things when the target is set to ARM.
We used to detect the target FPU by attempting to execute a program using VFP instructions in the configure process. While this works fine on native ARM hardware, it doesn’t work at all when cross-compiling. It is also not what most people want because it completely ignores what FPU the GCC toolchain is set up to use (so it could easily screw things up for Linux distro package maintainers for instance).
We now detect the target’s FPU using preprocessor logic that checks the macros GCC predefines for the configured floating point ABI and FPU.
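A minimal sketch of this kind of preprocessor logic follows. It assumes GCC's predefined ARM macros, and the enum and function names are hypothetical; this is not Mono's actual configure code. The ordering is the subtle part. As far as I know, __VFP_FP__ describes the in-memory format of doubles rather than the presence of a VFP unit, and GCC may define it even under -mfloat-abi=soft, so __SOFTFP__ has to be tested first.

```c
/* Hypothetical names: the floating point setup the toolchain targets. */
typedef enum {
    TARGET_FP_NOT_ARM, /* not compiling for 32-bit ARM (e.g. x86) */
    TARGET_FP_SOFT,    /* -mfloat-abi=soft: no VFP instructions at all */
    TARGET_FP_SOFTFP,  /* -mfloat-abi=softfp: VFP code, soft calling convention */
    TARGET_FP_HARD     /* -mfloat-abi=hard: VFP code, VFP argument registers */
} target_fp;

static target_fp detect_target_fp(void)
{
#if !defined(__arm__)
    return TARGET_FP_NOT_ARM;
#elif defined(__APPLE__)
    /* iOS special case: always assume softfp; every iOS device has VFP. */
    return TARGET_FP_SOFTFP;
#elif defined(__ARM_PCS_VFP)
    /* Defined when floating point arguments are passed in VFP registers. */
    return TARGET_FP_HARD;
#elif defined(__SOFTFP__)
    /* Must come before __VFP_FP__, which soft builds may also define. */
    return TARGET_FP_SOFT;
#elif defined(__VFP_FP__)
    return TARGET_FP_SOFTFP;
#else
    /* Anything unrecognized: assume no usable VFP unit. */
    return TARGET_FP_SOFT;
#endif
}
```

Because the answer comes purely from the compiler's own macros, querying the cross compiler gives the target's configuration rather than the build machine's.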
It looks a bit convoluted, but it’s necessary due to the way GCC behaves with the various FPU configurations. We also have a special case for iOS, because the GCC shipped for it doesn’t follow the same logic as the regular GCC. On iOS, we always assume -mfloat-abi=softfp because all iOS devices have a VFP unit.
(For an explanation of the floating point ABIs, see the section on dynamic VFP further into the article.)
Another problem we had when detecting the FPU is that we simply executed plain
gcc to do so. This is very wrong because the compiler executable name can be
different – especially when cross-compiling – and so we could end up querying
the host compiler instead of the target compiler. This is now fixed.
In a similar vein to the old FPU detection, we used to detect ARM v6+ by
executing a small program that uses an ARM v6
mcr instruction. Again, this is
very hostile to cross-compilation and also completely disregards the toolchain
configuration. We now check the various
__ARM_ARCH_...__ preprocessor symbols
defined by the compiler. For example,
__ARM_ARCH_6ZK__ means ARM v6, while
__ARM_ARCH_7A__ means ARM v7, and so on (there are many variations).
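To illustrate, here is a minimal sketch of turning those macros into a version number. target_arm_version is a hypothetical name, and the macro list covers common GCC and Clang spellings rather than every variant.

```c
/* Hypothetical helper: the ARM architecture version the compiler targets,
   derived from the __ARM_ARCH_*__ macros. Not every variant is listed. */
static int target_arm_version(void)
{
#if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || \
    defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || \
    defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7S__)
    return 7;
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
      defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \
      defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__)
    return 6;
#elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5T__) || \
      defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) || \
      defined(__ARM_ARCH_5TEJ__)
    return 5;
#elif defined(__arm__)
    /* Older or unrecognized ARM target: be conservative. */
    return 4;
#else
    return 0; /* not compiling for 32-bit ARM */
#endif
}
```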
Finally, various explicit
arguments that we passed depending on the target triple have been removed in
favor of the sanitized detection of target FPU and ARM version.
Improved Hardware Feature Detection
It used to be that Mono would do a couple of checks via /proc/cpuinfo to get an idea of what ARM version is in use, and that’s about it. And this was only done on Linux.
First of all, using
/proc/cpuinfo to detect hardware features is always
problematic because it doesn’t work under QEMU. That file is generated by the
host kernel, so if you’re running Mono in an ARM
chroot by using QEMU and
the host kernel is, say, an x86 kernel, the values Mono will find will seem
like complete garbage.
So, instead, Mono now uses the Linux auxiliary vector to detect hardware
features. This is the
/proc/self/auxv file. And don’t try to
cat it (it’s
binary) - use
LD_SHOW_AUXV=1 /bin/true instead. Using the auxiliary vector
is better because QEMU gets a chance to decide what information is provided.
If you execute the aforementioned command on an x86 machine, you’ll see a list of AT_* entries, including an AT_PLATFORM entry naming the x86 platform and an AT_HWCAP entry with x86 feature bits.
Now if we switch to an ARM chroot using QEMU and run the same command, the list looks different.
Two things are worth noting:
- No AT_PLATFORM entry is present.
- The AT_HWCAP entry has ARM-specific info.
Ideally, QEMU would have provided an AT_PLATFORM entry saying something like v7 (as would be the case on real ARM hardware), but not providing it at all also works, in that Mono will just be conservative and only generate ARM v4 code. The AT_HWCAP entry is also provided by QEMU and is usually consistent with the hardware features QEMU is capable of emulating, so we can safely look at it, see that it supports e.g. VFP, and therefore generate VFP code.
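Turning those two entries into decisions could look roughly like the sketch below. The struct, function and ARM_HWCAP_* names are hypothetical; the bit positions are those of HWCAP_VFP and HWCAP_NEON in the Linux kernel's ARM <asm/hwcap.h>, redefined locally so the sketch also compiles on non-ARM hosts. A missing AT_PLATFORM falls back to v4, mirroring the conservative behavior described above.

```c
#include <stdbool.h>

/* Bit values of HWCAP_VFP and HWCAP_NEON from the kernel's ARM
   <asm/hwcap.h>, under hypothetical names so this compiles anywhere. */
#define ARM_HWCAP_VFP  (1UL << 6)
#define ARM_HWCAP_NEON (1UL << 12)

/* Hypothetical summary of what the code generator may use. */
typedef struct {
    int arch_version; /* from AT_PLATFORM, e.g. "v7l" -> 7 */
    bool has_vfp;
    bool has_neon;
} arm_features;

static arm_features decode_arm_features(const char *platform, unsigned long hwcap)
{
    arm_features f;

    /* No (or unrecognized) AT_PLATFORM, as under QEMU: assume ARM v4. */
    f.arch_version = 4;
    if (platform && platform[0] == 'v' && platform[1] >= '4' && platform[1] <= '9')
        f.arch_version = platform[1] - '0';

    /* AT_HWCAP is trusted even without AT_PLATFORM. */
    f.has_vfp = (hwcap & ARM_HWCAP_VFP) != 0;
    f.has_neon = (hwcap & ARM_HWCAP_NEON) != 0;
    return f;
}
```

Under QEMU this yields "v4, but with VFP", which is exactly the situation described above.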
Another thing that’s been improved is that Mono now detects the ARM version on iOS too. This means that a Mono binary compiled for ARM v6 that executes on v7 or v7s hardware will use v7 and v7s instructions. Similarly, a v7 binary will use v7s instructions if executed on a v7s device. This has always been the case on Linux and Android - we just do it on iOS too, now.
Finally, a new MONO_VERBOSE_HWCAP environment variable has been added to Mono that makes it print the hardware features it has detected.
This is useful for finding out whether Mono is making full use of your hardware.
Dynamic Use of Vector Floating Point
On ARM, there are two software floating point ABIs: soft and softfp. The former ABI uses full software emulation for all floating point operations, and passes floating point values in core registers and on the stack. The latter adheres to the soft ABI's calling convention but uses Vector Floating Point (VFP) instructions to perform floating point computations, resulting in a significant performance improvement.
Previously, if Mono was compiled for the soft ABI, it would always use software floating point, even if the hardware it was executed on actually had a VFP unit.
You might wonder why that’s a problem in practice - surely people will just do
the right thing and compile Mono for the ABI they need, right? It turns out
that it’s not that simple. For example, Debian’s
armel distribution ships
with a GCC that is configured for the
soft ABI. Similarly, all packages are
compiled for that ABI. Most people don’t think about this, and end up with a
Mono that performs worse than it has to. Another example is Android. Here at
Xamarin, we ship two versions of Mono compiled for ARM in the Xamarin.Android
product: One for ARM v5 using the
soft ABI and one for ARM v7 using the
softfp ABI. The v5 build will be used on v5 and v6 devices and even if they
have a VFP unit, it won’t be used. Only Android for ARM v7 guarantees that a
VFP unit is present.
This is now fixed. The improved feature detection described above, combined with an overhaul of the way the JIT treats software floating point targets, means that we can now detect whether the hardware provides a VFP unit and use it if it does, even if Mono is compiled for the soft ABI.
The way this actually works is that Mono now always assumes that the hardware
has VFP, even if compiled for the
soft ABI. During initialization, Mono
checks if the host hardware actually has a VFP unit, and if not, falls back to
the software floating point code paths. That is to say, software floating point
is now a ‘second-class citizen’ that is only compiled in as a fallback
mechanism in case no VFP unit could be found. Note, however, that Mono compiled for softfp will not contain a fallback (it would make no sense, since Mono itself wouldn’t run on hardware without a VFP unit).
Initially assuming that VFP is present may seem like a somewhat odd way to do things, but as it turns out, the vast majority of ARM devices have VFP, so it’s actually a fairly sane default - like how we assume all x86 devices in this day and age have an FPU even though it’s not actually guaranteed on i386 machines per the Intel manual.
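The start-up decision can be sketched as follows. This is a hypothetical illustration, not the JIT's real code: fp_mode, choose_fp_mode, init_fp_mode and SOFT_FLOAT_FALLBACK_COMPILED_IN are made-up names, and the real JIT selects instruction sequences rather than setting a C enum. It assumes the hardware check (for instance, the VFP bit of AT_HWCAP) has already been done.

```c
#include <stdbool.h>

/* Hypothetical: which floating point code path the JIT emits. */
typedef enum { FP_MODE_VFP, FP_MODE_SOFT_FLOAT } fp_mode;

/* Hypothetical: the software fallback only exists in soft-ABI builds.
   A softfp or hard-float build cannot run without a VFP unit anyway. */
#if defined(__arm__) && defined(__SOFTFP__)
#define SOFT_FLOAT_FALLBACK_COMPILED_IN true
#else
#define SOFT_FLOAT_FALLBACK_COMPILED_IN false
#endif

/* VFP is the default assumption; software floating point is used only
   when the hardware lacks VFP and a fallback was compiled in. */
static fp_mode choose_fp_mode(bool hardware_has_vfp, bool fallback_available)
{
    if (!hardware_has_vfp && fallback_available)
        return FP_MODE_SOFT_FLOAT;
    return FP_MODE_VFP;
}

static fp_mode jit_fp_mode;

/* Called once during initialization, after hardware feature detection. */
static void init_fp_mode(bool hardware_has_vfp)
{
    jit_fp_mode = choose_fp_mode(hardware_has_vfp, SOFT_FLOAT_FALLBACK_COMPILED_IN);
}
```

Keeping choose_fp_mode free of global state makes the rule easy to check in isolation: VFP wins unless the hardware lacks it and a fallback was actually compiled in.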
Atomics and Architecture Version
ARM is a bit of a mess as far as SMP support goes. The first architecture version to have proper SMP support was ARM v6. ARM v7 added some convenient new instructions, such as dmb to issue a memory barrier (which was only possible via the somewhat convoluted mcr instruction in v6). However, v4 and v5 did not
have any support for SMP at all. What this means in practical terms is that, in
the past, if you compiled Mono for ARM v4 or v5 and then executed it on an SMP
system with multiple cores, everything would blow up because Mono wouldn’t know
how to do atomic operations since it’s compiled for an architecture version
that just doesn’t have them.
Generally, the problem here is that it’s very common to compile Mono for an old
version of the ARM architecture and execute it on much newer hardware. For
example, as I mentioned earlier, the GCC included in Debian’s armel
distribution targets ARM v4 by default. Another example is the Android NDK,
which targets v5 by default. In practice, most ARM hardware today is v6 or v7.
There are a couple of Android devices out there that are actually v5 (believe
it or not), but Mono will work fine on those.
So the solution to this problem is to do something similar to what we do for VFP: Detect the actual architecture version we’re running on and use newer SMP instructions if available.
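Previously, Mono’s atomics picked an implementation using only preprocessor checks on the target architecture version. Since checks like HAVE_ARMV7 are compile-time things, this would result in the v4/v5 code being used if the compiler was configured for anything below ARM v6, regardless of the hardware Mono actually ran on.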
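What we want instead is to keep those compile-time checks for v6 and v7 builds, and to check the architecture version we are actually running on only when Mono is compiled for v4 or v5. (We still keep the #if because we don’t want a pointless branch on ARM v6 and v7, where we always have SMP instructions available.)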
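In C, that shape might look like the sketch below. It is a hypothetical illustration: HAVE_ARMV6 and HAVE_ARMV7 stand in for the build-time macros, runtime_arm_version for the version reported by hardware detection, and instead of emitting dmb or mcr the function returns which barrier would be used, so it compiles on any host.

```c
/* Hypothetical: which memory barrier the atomics would use. */
typedef enum {
    BARRIER_DMB,      /* ARM v7: dedicated dmb instruction */
    BARRIER_CP15_MCR, /* ARM v6: barrier via an mcr to coprocessor 15 */
    BARRIER_NONE      /* ARM v4/v5: no SMP, a compiler barrier suffices */
} barrier_kind;

static barrier_kind select_memory_barrier(int runtime_arm_version)
{
#if defined(HAVE_ARMV7)
    (void)runtime_arm_version; /* built for v7: no run-time branch */
    return BARRIER_DMB;
#elif defined(HAVE_ARMV6)
    (void)runtime_arm_version; /* built for v6: no run-time branch */
    return BARRIER_CP15_MCR;
#else
    /* Built for v4/v5: only now look at the hardware we are running on. */
    if (runtime_arm_version >= 7)
        return BARRIER_DMB;
    if (runtime_arm_version == 6)
        return BARRIER_CP15_MCR;
    return BARRIER_NONE;
#endif
}
```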
We could have easily implemented this in Mono, but as it turns out, GCC has a bunch of convenient intrinsics that already do that for us! Those are the __sync_* functions, which you can read about in the GCC manual.
Although it isn’t clearly documented, GCC makes sure that the code it emits for
those functions works on both the target ARM version and all newer ARM
versions. So on ARM v6 and up, GCC will just generate the obvious code using
the native instructions available in that architecture version. For any older
architecture version, it delegates to various helpers in the kernel, such as __kuser_memory_barrier, which sits at 0xffff0fa0. These functions are compiled for the actual ARM version the hardware is running, since they are part of the kernel itself, so they can do the right thing for the architecture version that is actually in use. They are provided through the Linux kernel’s user helpers interface (code mapped at a fixed address near the top of every process’s address space) and are therefore fairly efficient to call.
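As an illustration, thin wrappers over these intrinsics might look like the sketch below. The wrapper names are hypothetical and are not Mono's actual functions; __sync_add_and_fetch, __sync_val_compare_and_swap and __sync_synchronize are documented GCC builtins that Clang also supports. The compare-and-swap wrapper takes (destination, new value, comparand), mirroring the Win32 InterlockedCompareExchange argument order.

```c
/* Hypothetical wrapper names over GCC's legacy __sync builtins. */
static inline int atomic_add_i32(volatile int *dest, int add)
{
    /* Atomically adds and returns the resulting (new) value. */
    return __sync_add_and_fetch(dest, add);
}

static inline int atomic_cas_i32(volatile int *dest, int exch, int comp)
{
    /* If *dest == comp, stores exch; always returns the previous *dest. */
    return __sync_val_compare_and_swap(dest, comp, exch);
}

static inline void full_memory_barrier(void)
{
    /* Full barrier; on ARM the compiler picks the sequence for the target. */
    __sync_synchronize();
}
```

A reference count, for example, would be bumped with atomic_add_i32(&obj->refcount, 1), where obj is a hypothetical object. Roughly speaking, on an ARM v5 build GCC turns these builtins into calls to libgcc routines that go through the kernel helpers described above, while v6 and later builds get ldrex/strex sequences inline.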
For the above reasons, Mono uses GCC’s
__sync_* intrinsics for atomics on
most targets today (x86, ARM, PowerPC, MIPS).
The TL;DR of all of the above is:
- configure.in now respects the toolchain configuration with regard to target architecture version and target FPU, and invokes the correct compiler executable to detect this information.
- Hardware feature detection is now done via the Linux auxiliary vector instead of /proc/cpuinfo, so Mono works under QEMU. Also, ARM version detection is now done on iOS too. A new MONO_VERBOSE_HWCAP environment variable has been added to print hardware feature information.
- The JIT will now actively make use of a VFP unit even when Mono is compiled for the soft ABI, falling back to software floating point on systems that don’t have one. This results in significantly better floating point performance on systems that don’t yet use the softfp ABI.
- Mono will now work properly on SMP-capable ARM systems even when compiled for non-SMP architecture versions such as ARM v4 and v5.
Just build from the Git
master branch to get all of the above.
But what about hard float support?
We (Xamarin) are aware that hard float support in Mono is very important for platforms like the Raspberry Pi and, generally, all new ARM boards.
Hard float support is coming. In fact, I’m working on it as I publish this article, so it shouldn’t take long before it lands in master.