alexrp’s blog

ramblings usually related to software

Flect Versus Conditional Compilation

Or, “Justifying the Use of a Preprocessor in 2013”.

I’ve spent a long time thinking about how I wanted to implement conditional compilation in Flect. Conditional compilation is a reality of portable systems programming languages, almost by definition: you program close to the system (or the machine), so you need some way to have one piece of code for one system and a different piece for another.

Some would argue that this can all be handled in the build system. I disagree with this idea for two reasons:

  • It’s more robust to rely on the compiler to know what the target is rather than having a disaster such as the Autoconf ecosystem. The build system should ask the compiler what the target is - not the other way around.
  • The assumption that a low-level module can expose the exact same API for all targets (such that callers don’t need conditional compilation) is invalid most of the time, in my experience.

But what system should be used for conditional compilation? There are many different ways to deal with this problem, but in a language that has C linkage semantics, options are somewhat limited. In general, I think there are three ways to deal with the problem:

  1. Use a preprocessor that runs before parsing.
  2. Build conditional compilation statements directly into the language.
  3. Utilize Lisp-style macros and some compiler query magic.

The first approach is what C and most other C-family languages use. The second approach is what D and Nimrod do. The third approach is not all that common from what I’ve seen, but it is what the Boo language does.

I dislike the second approach because it complicates the parsing stage rather significantly. The grammar has to be extended in so many places to allow conditional compilation that it’s just not worth the implementation effort.

The third approach would probably be OK if done right (but I confess I’m not sure what “right” means here). In the case of Flect, though, it wouldn’t be sufficient because Flect only allows macro expansion at expression level.

So, evil as it may seem, I settled on using a preprocessor for Flect. I want to explain in this post why I made this decision and how Flect’s preprocessor is much saner than that of e.g. C.

How It Works

Other than using a different directive character, Flect’s preprocessor syntax looks very similar at first glance to what C# and F# use. But as we’ll see, it has some significant differences. A simple example:

pub mod my_mod {
    pub fn my_fn() -> i32 {
\if Flect_CPU_ARM
        42;
\else
        0;
\endif
    }
}

When compiled for an ARM processor, the above code returns 42, while for all other processors, it returns 0. The reason for the somewhat odd \ character is that # is already used for comments.

We can do more elaborate checks:

pub mod my_mod {
    pub fn my_fn() -> i32 {
\if (Flect_CPU_ARM || Flect_CPU_MIPS) && !Flect_OS_FreeBSD
        42;
\else
        0;
\endif
    }
}

This returns 42 on all ARM and MIPS target processors when the target OS is not FreeBSD, and 0 in every other case.

We can also use an arbitrary number of \elif directives before an \else or \endif directive is reached:

pub mod my_mod {
    pub fn my_fn() -> i32 {
\if Flect_CPU_ARM
        42;
\elif Flect_CPU_X86
        21;
\elif Flect_CPU_PowerPC
        11;
\else
        0;
\endif
    }
}

This code will return 42 on ARM processors, 21 on x86 processors, 11 on PowerPC processors, and 0 everywhere else.

Arbitrary Boolean expressions can be used in conditions. ! negates a Boolean value. || is a short-circuiting conditional OR. && is a short-circuiting conditional AND. Parentheses can be used to control precedence. Identifiers can be used to refer to names defined by the compiler, passed via the --define flag, or set/unset with \define and \undef. A reference to an identifier evaluates to true if it is set, and false if it is not. Finally, the literals true and false are allowed.
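To make the evaluation rules above concrete, here is a minimal sketch of a condition evaluator. It is written in Python purely for illustration and is not Flect’s actual implementation. The names `tokenize` and `evaluate` are made up, the identifier syntax is a guess, and the operator precedence (`!` binds tighter than `&&`, which binds tighter than `||`) is assumed to follow C.

```python
import re

# Assumed lexical grammar: the three operators, parentheses, and identifiers.
TOKEN = re.compile(r"\s*(\|\||&&|!|\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Split a condition into operator, parenthesis, and identifier tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, defined):
    """Evaluate a condition against the set of currently defined identifiers."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "||":
            advance()
            # The right-hand side is always parsed so its tokens are consumed;
            # the result is the same as short-circuiting because identifiers
            # have no side effects.
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&&":
            advance()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = peek()
        if tok == "!":
            advance()
            return not parse_unary()
        if tok == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is None or tok in ("||", "&&", ")"):
            raise SyntaxError("expected an operand")
        advance()
        if tok == "true":
            return True
        if tok == "false":
            return False
        # An identifier is true if it is set and false if it is not.
        return tok in defined

    result = parse_or()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Calling `evaluate("(Flect_CPU_ARM || Flect_CPU_MIPS) && !Flect_OS_FreeBSD", {"Flect_CPU_ARM"})` walks the same condition used in the earlier example, with the set standing in for whatever the compiler and `--define` have provided.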

\define and \undef are pretty straightforward:

pub mod my_mod {
\if (Flect_CPU_ARM || Flect_CPU_MIPS) && !Flect_OS_FreeBSD
\define Forty_Two
\endif

    pub fn my_fn() -> i32 {
\if Forty_Two
        42;
\else
        0;
\endif
    }

\undef Forty_Two
}

There are a few rules pertaining to \define and \undef:

  • Defining an already-defined identifier is an error.
  • Undefining an already-undefined identifier is an error.
  • Defining or undefining an identifier starting with Flect_ is an error.

The last rule is to prevent interfering with compiler-provided identifiers (which tell the code what processor, operating system, application binary interface, etc. is being targeted). There is a well-defined set of identifiers that the Flect compiler will use. The list is available here but will of course be in the specification once I get around to updating it.
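The three rules are simple enough to express as a small symbol table. The following Python sketch is only an illustration of the rules, not code from the Flect compiler. The `Symbols` class, its method names, and `PreprocessorError` are hypothetical, and the order in which the checks run is an assumption.

```python
class PreprocessorError(Exception):
    """Raised when a directive violates one of the rules (hypothetical name)."""


RESERVED_PREFIX = "Flect_"


class Symbols:
    def __init__(self, compiler_provided=(), command_line=()):
        # Identifiers set by the compiler and by --define exist before any
        # \define directive in the source is seen.
        self._defined = set(compiler_provided) | set(command_line)

    def is_defined(self, name):
        return name in self._defined

    def define(self, name):
        # Rule 3: Flect_ identifiers belong to the compiler.
        if name.startswith(RESERVED_PREFIX):
            raise PreprocessorError(f"cannot define reserved identifier {name}")
        # Rule 1: defining an already-defined identifier is an error.
        if name in self._defined:
            raise PreprocessorError(f"{name} is already defined")
        self._defined.add(name)

    def undef(self, name):
        # Rule 3 again, this time for \undef.
        if name.startswith(RESERVED_PREFIX):
            raise PreprocessorError(f"cannot undefine reserved identifier {name}")
        # Rule 2: undefining an identifier that is not defined is an error.
        if name not in self._defined:
            raise PreprocessorError(f"{name} is not defined")
        self._defined.remove(name)
```

With this in place, `\define Forty_Two` followed by `\undef Forty_Two` maps to one `define` call and one `undef` call, and a second `\undef Forty_Two` would raise.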

Finally, the \error directive can be used to flag a code path as invalid:

pub mod my_mod {
    pub fn my_fn() -> i32 {
\if Flect_CPU_ARM
        42;
\elif Flect_CPU_X86
        21;
\elif Flect_CPU_PowerPC
        11;
\else
\error "Unsupported CPU!"
\endif
    }
}

The \error directive triggers a compile-time error if the path that it’s located in is reached during preprocessing.

Questions and Concerns

So no macros?

Nope! If you need macros, Flect already has a built-in, hygienic/sane, Lisp-style mechanism for that. The preprocessor is meant for conditional compilation. Nothing else.

The support for macros is probably the main reason that the C preprocessor has gotten such a bad name among programming language enthusiasts and working programmers alike. There are good reasons why that is, and there’s no reason to repeat the mistakes of C.

What about #include?

No. Flect has a proper module system, so there is no need for such a construct in its preprocessor.

Having said that, #include is sometimes used to import arbitrary text into strings at compile time, which is a sort-of-reasonable thing to do. Flect will probably get some other mechanism to do that in its CTE (compile-time evaluation) engine at some point.

Can the preprocessor in any way affect the program text?

No. It does not perform textual transformations at all. It only operates on directives starting with a \ (a character which is not used anywhere else in the language other than inside strings, where the lexer won’t treat it as a directive).

The preprocessor cannot do anything other than decide code paths statically.

How is the preprocessor implemented?

Unlike the preprocessors of most other languages, Flect’s preprocessor runs after lexing and before parsing. This makes the name somewhat of a misnomer, but it’ll be instantly familiar to C-family programmers.

The preprocessor constructs a simple statement/expression AST by parsing directives as statements, conditions as expressions, sections of tokens as section statements, and non-directive tokens as token statements. It then recursively evaluates the AST and throws away sections that are not live. The code is remarkably simple compared to traditional preprocessors. It sits here and here.

In other words, the preprocessor basically consists of a simplistic parser and an AST evaluator.
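For a rough idea of what “a simplistic parser and an AST evaluator” can look like, here is a sketch in Python. It is not the real Flect code and makes several simplifying assumptions: the lexer is assumed to have already produced a flat list in which ordinary tokens are strings and directives are tuples such as `("if", "Flect_CPU_ARM")`, conditions are reduced to a single identifier, and `\define`/`\undef` are left out. All names (`If`, `Error`, `parse`, `expand`) are invented for the example.

```python
from dataclasses import dataclass


class PreprocessorError(Exception):
    pass


@dataclass
class If:
    # (condition identifier, body) pairs for \if and each \elif, in order.
    branches: list
    # Body of the \else section, or None when there is no \else.
    otherwise: list = None


@dataclass
class Error:
    message: str


def parse(tokens):
    """Build the AST from a flat stream of tokens and directive tuples."""
    pos = 0
    closers = {"elif", "else", "endif"}

    def parse_section(terminators):
        nonlocal pos
        body = []
        while pos < len(tokens):
            tok = tokens[pos]
            if isinstance(tok, tuple) and tok[0] in terminators:
                return body  # leave the terminator for the caller
            pos += 1
            if isinstance(tok, tuple) and tok[0] == "if":
                body.append(parse_if(tok[1]))
            elif isinstance(tok, tuple) and tok[0] == "error":
                body.append(Error(tok[1]))
            elif isinstance(tok, tuple):
                raise PreprocessorError("unexpected directive: " + tok[0])
            else:
                body.append(tok)  # ordinary token, kept as-is
        if terminators:
            raise PreprocessorError("missing endif directive")
        return body

    def parse_if(condition):
        nonlocal pos
        branches = [(condition, parse_section(closers))]
        while True:
            directive = tokens[pos]  # always one of elif / else / endif here
            pos += 1
            if directive[0] == "elif":
                branches.append((directive[1], parse_section(closers)))
            elif directive[0] == "else":
                otherwise = parse_section({"endif"})
                pos += 1  # consume the endif
                return If(branches, otherwise)
            else:
                return If(branches)

    return parse_section(set())


def expand(nodes, defined):
    """Recursively evaluate the AST, keeping only tokens in live sections."""
    out = []
    for node in nodes:
        if isinstance(node, If):
            for condition, body in node.branches:
                if condition in defined:
                    out.extend(expand(body, defined))
                    break
            else:
                # No branch matched, so fall back to the \else section if any.
                if node.otherwise is not None:
                    out.extend(expand(node.otherwise, defined))
        elif isinstance(node, Error):
            # Only reached when the \error sits in a live section.
            raise PreprocessorError(node.message)
        else:
            out.append(node)
    return out
```

The parser only ever sees whole tokens, so a section can be dropped without touching the text of anything that survives. That is the property the real design relies on, whatever its actual data structures look like.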

What about static analysis complication?

A preprocessor does make static analysis harder because the static analyzer has to do one of two things:

  1. Attempt to evaluate all possible code paths and unify them or analyze them independently of each other.
  2. Run the preprocessor for the target configuration and just not analyze code paths that aren’t relevant to the target.

The first approach, while not impossible, is very complicated and confusing. It may even be pointless, since it is unlikely that the programmer can fix the code in a target-specific path without actually being able to compile and test for that particular target.

The second approach is significantly easier (trivial, even). It’s what most static analyzers for languages with conditional compilation actually do.

OK, but what about program refactoring?

This is where things get a little hairy. It’s easy to deal with the case where a function is wrapped in an \if directive, or the case where a function has an \if directive in its body that also terminates in its body. But what about the case where the text inside the \if goes beyond the function body? Or the \if directive starts before the function declaration and only covers part of its body?

Arguably, you’re a terrible person if you write code like that, but it’s still a valid concern. The right thing to do here depends entirely on what kind of code we’re looking at and what kind of transformation we intend to do. Most refactoring tools I have dealt with simply erase the entire directive, which is obviously not ideal.

I admit that I don’t know what the best solution here is. This is the only real problem with Flect’s preprocessor that I have no definite solution to. Still, most other conditional compilation systems have problems similar to this one, so I wouldn’t actually call it a weakness of the preprocessor model.

Conclusion

Hopefully I’ve made it clear that Flect’s preprocessor is significantly saner than that of C, and that it doesn’t complicate things too much. It still has problems when it comes to refactoring, but those are mostly inherent to conditional compilation in general.

I admit that I wish a more elegant solution could have been engineered, but I’m not sure what it would look like and how it would actually work. I decided on a preprocessor in the interest of moving forward with the language and because it’s a pragmatic enough approach to conditional compilation.