alexrp’s blog

ramblings usually related to software

Elixir Isn’t Hipster

I often hear people asking what kind of language and/or framework they should use for their distributed or parallel workload. In 99% of cases, I reply with “Erlang, Elixir, or anything that runs on the Erlang VM”.

Given that Elixir is not widely known yet, they often have to go and look it up. What they find is a language that is syntactically very similar to Ruby. This often results in skepticism - Ruby is not exactly known for its great performance, reliability, or language design (there, I said it). That is, they perceive Elixir as a language with neat syntax but with a rather poor ecosystem/infrastructure - it’s ‘hipster’.

But this is not so. Let me assure you: Elixir is very much a real world programming language created to solve real problems.

Performance

Elixir is based on the Erlang virtual machine - the BEAM (Bogdan/Björn’s Erlang Abstract Machine). By default, the BEAM interprets compiled BEAM bytecode in a similar fashion to CPython and most JavaScript implementations. It uses the so-called ‘threaded code interpretation’ approach. This is one of the fastest possible ways to interpret bytecode as every instruction is a direct pointer to executable C code. This kind of dispatch mechanism results in such a high performance increase that the Erlang developers were willing to go to great lengths (question number five) to make it work on Windows.

But even if an extremely fast interpreter is too slow for you, the BEAM offers a full-blown native code compiler these days, called HiPE (High Performance Erlang). HiPE works in a similar fashion to software such as PyPy and IonMonkey with the exception that it compiles ahead-of-time (AOT) instead of just-in-time (JIT). HiPE often outperforms the interpreter but there are cases where the interpreter is faster. As with anything involving performance, benchmark it and find out what’s best for your use case.

It’s worth mentioning that in many cases, performance isn’t everything in a distributed system. Of course we’d like things to go faster than an instruction per second (they do, just for the record), but it turns out that scalability and reliability are usually significantly more important. There certainly exists software where this is not the case. It depends™.

The great thing about the Erlang ecosystem is that it makes writing distributed systems incredibly easy thanks to its processes and message passing. An Erlang process is a sort of lightweight thread with an extremely low footprint - millions of them can run on the same system. Processes communicate by sending messages (arbitrary data) back and forth. Sending a message to a process on another node (a node is an instance of the BEAM) looks exactly like sending one to a local process, which makes distribution across machines trivial.
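To make the process model concrete, here is a minimal sketch of two processes exchanging messages. The module name PingPong and the message shapes are made up for illustration; only spawn, send, receive, and self come from Elixir itself.

```elixir
defmodule PingPong do
  # Spawn a process that waits for one {:ping, sender} message and replies.
  def start do
    spawn(fn ->
      receive do
        {:ping, sender} -> send(sender, {:pong, self()})
      end
    end)
  end

  # Send a ping and block until the reply arrives (or give up after one second).
  def ping(pid) do
    send(pid, {:ping, self()})

    receive do
      {:pong, ^pid} -> :pong
    after
      1_000 -> :timeout
    end
  end
end

pid = PingPong.start()
IO.inspect(PingPong.ping(pid))  # prints :pong
```

Nothing in ping/1 cares where the other process lives. If pid referred to a process on another connected node, the same send and receive calls would be used.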

Reliability

As with performance, reliability is mostly a property of the virtual machine and standard library, and not so much the language. The BEAM and OTP (the Open Telecom Platform, Erlang's standard set of libraries) provide many ways to handle errors in a distributed system:

  • Linked processes: If process A is linked to process B and traps exit signals, A will receive a message if B fails (without trapping, A is taken down along with B). A can then decide to restart B or perform some other kind of recovery. This ensures that even if some isolated part of a system crashes, it can be immediately restarted and serve requests again, without the rest of the system being affected.
  • Coordinated restarts: Through supervisor processes, OTP makes it easy to restart a set of processes (referred to as supervised children) if a single child process fails. Even supervisors can be supervised, resulting in highly reliable supervision trees.
  • Failover nodes: Should an entire node die (usually because of hardware or operating system problems), another node can be notified and immediately take over execution of the application the original node was running. Similarly, another node can take over execution on demand if a node is going to be shut down gracefully.
  • Hot code reloading: Modules (which contain executable code) can be reloaded at run-time even while the old code is still executing. OTP makes code changes especially easy to handle through the gen_server behavior's code_change/3 callback. This feature means that systems do not have to be taken down for upgrades and downgrades.
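The linked-process mechanism from the list above can be sketched in a few lines. CrashWatcher is a hypothetical module name. The sketch assumes the watching process traps exits, which is what turns an exit signal from a linked process into an ordinary message.

```elixir
defmodule CrashWatcher do
  # Run `fun` in a linked process and report how that process exited.
  def watch(fun) do
    # Trap exits so that an exit signal from the linked process arrives as a
    # message. Without this flag, an abnormal exit would kill the caller too.
    # Note that the flag stays set on the calling process afterwards.
    Process.flag(:trap_exit, true)
    pid = spawn_link(fun)

    receive do
      {:EXIT, ^pid, reason} -> {:exited, reason}
    after
      1_000 -> :still_running
    end
  end
end

IO.inspect(CrashWatcher.watch(fn -> exit(:boom) end))  # prints {:exited, :boom}
IO.inspect(CrashWatcher.watch(fn -> :ok end))          # prints {:exited, :normal}
```

A supervisor is essentially this pattern plus a restart policy, which is why it is usually better to reach for OTP's supervisors than to hand-roll the receive loop.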

Elixir gives you full access to the OTP libraries so you can do anything that you can do in Erlang when it comes to building distributed systems.

Beyond handling software and hardware failure, there is the concern that the virtual machine itself – i.e. the code written in C – should be extremely robust. While errors in pure Erlang code can be handled, errors in C code are usually fatal and will kill the virtual machine.

Robustness of code can of course not be measured in any useful way, but it is worth considering that the BEAM is a battle-tested virtual machine that was written almost 20 years ago. Countless man hours have gone into improving it, and it is used in many very large distributed systems such as Riak, CouchDB, RabbitMQ, and WhatsApp. Its robustness could reasonably be compared to that of HotSpot and the .NET CLR. You have to try very hard to break it.

Language Design

While Elixir takes some concepts from Ruby’s syntax (do blocks, optional parentheses, def for functions, etc.), the semantics of Elixir are rather far removed from Ruby. It does not have classes/objects, global state, monkey patching, and so on, since it is built on the Erlang VM and mostly follows the Erlang philosophy: functional-style programming, no shared mutable state, hot code reloading, and so on.

Some notable highlights of the language:

  • Expression syntax: Everything is an expression. if, case, cond, try, unless, receive, and so on are all expressions that result in a value when evaluated. This makes for super easy composition of code instead of littering it with mutable variables.
  • Pattern matching: Any value can be matched against using a syntax similar to (but slightly saner than) Erlang. This avoids if/else forests and makes expressing alternate code paths easy and elegant.
  • First-class functions: Functions are values in Elixir, and can be passed to other functions. This feature is at the core of functional programming and is what makes functions like foldl and foldr useful.
  • Closures: Elixir has full-fledged lexical closures as seen in other functional programming languages, making higher-order operations like map and reduce easy to use.
  • Records: Elixir's records are significantly more useful than Erlang's. In Elixir, a record consists of three things: a module, a list of fields, and a set of functions to manipulate and retrieve fields. Records have the same tuple representation as in Erlang, but record functions can be called directly on record values (similar to methods on classes, but certainly not the same concept).
  • Protocols: A common problem in Erlang is that extending APIs for new types is close to impossible if the API doesn’t allow passing in functions to handle custom types. In object-oriented languages, interfaces usually solve this problem. In Elixir, protocols can be used to dispatch function calls dynamically based on the type of a value.
  • Metaprogramming: Instead of Erlang’s C-like preprocessor, Elixir has Lisp-style hygienic macros. Such a macro system is significantly less error-prone and makes AST manipulation at compile time trivial. In addition, all Elixir code inside macros and in module/record definitions is executed at compile time, making possibilities for code generation practically limitless.
  • Unicode strings: All strings in Elixir are encoded in UTF-8 as Erlang binaries. Similarly, all functions in the String module assume UTF-8 encoding. Globalization is much less of a pain than in other languages thanks to this.
  • Immutability: Everything is immutable - more or less. While all data structures are entirely immutable, state can be maintained on a per-process level. Each process also has a so-called process dictionary, which can be used as mutable storage local to that process if really necessary. Its use is generally frowned upon.
  • Variable rebinding: In Elixir, variables can be rebound to different values, even though everything is immutable. It turns out that this is useful in practice and doesn’t actually violate immutability (single assignment is not immutability). The compiler rewrites a variable rebinding as creating a new version of the variable, effectively transforming code into SSA (static single assignment) form.
  • Erlang interop: Calling Erlang/OTP functions from Elixir has no overhead and does not look much different from regular function calls. Elixir code can also use behaviors - a feature that helps in writing modules conforming to a certain interface.
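A few of these highlights are easier to judge in code than in prose. The following is a small sketch, not a tour of the language. The names Shapes and Describable are invented for the example, and it is written against a 1.x release of Elixir, so details may differ on older versions.

```elixir
defmodule Shapes do
  # Pattern matching in function heads replaces an if/else forest.
  def area({:circle, r}), do: :math.pi() * r * r
  def area({:rect, w, h}), do: w * h

  # `case` is an expression: its value is the return value of the function.
  def describe(shape) do
    case shape do
      {:circle, _} -> "round"
      # Using `side` twice only matches when both values are equal.
      {:rect, side, side} -> "square"
      {:rect, _, _} -> "rectangular"
    end
  end
end

# A protocol dispatches on the type of its first argument.
defprotocol Describable do
  def label(value)
end

defimpl Describable, for: Integer do
  def label(n), do: "integer #{n}"
end

defimpl Describable, for: BitString do
  def label(s), do: "string #{s}"
end

# Closure: `scale` captures the current value of `factor`.
factor = 3
scale = fn x -> x * factor end

# Rebinding: `factor` now names a new value, but the closure keeps the old one.
factor = 10

IO.inspect(Enum.map([1, 2, 3], scale))                          # prints [3, 6, 9]
IO.inspect(factor)                                              # prints 10
IO.inspect(List.foldl([1, 2, 3], 0, fn x, acc -> x + acc end))  # prints 6
IO.inspect(Describable.label(5))                                # prints "integer 5"
```

The rebinding lines show why rebinding does not violate immutability: the value captured by scale never changes, only the name factor is pointed at something new.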

Elixir actually has very few ‘original’ ideas. I think it’s reasonable to think of Elixir as taking various proven features from other programming languages, adapting them to the Erlang ecosystem, and putting them together. What this means is that you are unlikely to run into rough edges where the language seems unorthogonal and impractical to work with.

Speaking of orthogonality, one interesting thing to consider is that everything (barring literals such as strings, numbers, etc.) is a function call. Let’s look at a simple module:

defmodule Math do
  def mul x, y do
    x * y
  end
end

Looks like a normal Elixir module. But here’s a twist: It’s all function calls. It is actually interpreted as this:

defmodule(Math, do:
  def(:mul, (quote do: [x, y]), [], do:
    quote do: x * y
  )
)

Further, even defmodule and def are macros! That’s about as orthogonal as it gets. Granted, there is some compiler magic to make defmodule and def actually work.
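The exact shape of the expanded call depends on the Elixir version. As a hedged sketch, assuming a 1.x release, the module can be written with every call fully parenthesized as shown below, and quote can be used to look at the AST that macros receive. MathCalls is a made-up name chosen so it does not clash with the Math module above.

```elixir
# The same kind of module with every call parenthesized. Parentheses are
# optional in Elixir, so this is an ordinary definition spelled out in full.
defmodule(MathCalls, do:
  def(mul(x, y), do:
    x * y
  )
)

# `quote` returns the AST of an expression as a {name, metadata, arguments}
# tuple. The metadata varies between versions, so it is ignored here.
{:*, _meta, [{:x, _, _}, {:y, _, _}]} = quote(do: x * y)

IO.inspect(MathCalls.mul(6, 7))  # prints 42
```

The pattern match on the quoted expression succeeds because x * y is itself just a call to the * operator with two variable nodes as arguments.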

Functional Programming

At this point, if you’re mostly familiar with imperative and/or object-oriented languages, you may be a bit intimidated by the prospect of everything being immutable, and the general lack of anything resembling classes. That’s entirely understandable. Functional programming is very different from programming in languages like Ruby, Python, and Perl. A general rule of thumb is that you should focus on the data flowing through your program rather than focusing on your program’s high-level behavior. In other words, instead of coupling data tightly with algorithms, separate the two. Prefer recursion over mutable local state. Use higher-order functions or protocols where you would have used interfaces or classes.
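As a small illustration of preferring recursion over mutable local state, here is a sketch that sums a list twice: once with explicit recursion and once with a higher-order function. The module name Totals is hypothetical.

```elixir
defmodule Totals do
  # Recursion: the running total is passed along as an argument instead of
  # being stored in a mutable local variable.
  def sum(list), do: sum(list, 0)

  defp sum([], acc), do: acc
  defp sum([head | tail], acc), do: sum(tail, acc + head)

  # The same fold expressed with a higher-order function.
  def sum_with_reduce(list) do
    Enum.reduce(list, 0, fn x, acc -> x + acc end)
  end
end

IO.inspect(Totals.sum([1, 2, 3, 4]))              # prints 10
IO.inspect(Totals.sum_with_reduce([1, 2, 3, 4]))  # prints 10
```

In practice the Enum.reduce version is what most code would use. The recursive one is spelled out only to show where the loop variable of an imperative language ends up.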

It takes time to get used to functional programming, but once you stop thinking about objects and methods entirely, it becomes incredibly easy to solve real problems in functional languages.

Conclusion

Elixir is not a toy programming language. It’s built to be a better Erlang while using the same battle-tested virtual machine. It’s well-designed, incorporating proven language features, and especially friendly to metaprogramming and concurrency.

I hope I’ve managed to dispel the myth that Elixir is a ‘hipster’ language.

All of this being said, Elixir is still in development; that is, 1.0 hasn’t been reached yet. Don’t let this stop you from using it, however - the language and standard libraries (which complement OTP) rarely change in ways that break existing code. If you stick to released versions of Elixir, you shouldn’t run into any trouble other than the occasional name change between major versions.