08 January 2014

Abusing the C Preprocessor

On a whim, I thought of this idea after reading Cory Li's article on bytecode hacking for Battlecode, MIT's premier Independent Activities Period competition, and after reading some of the code for his bot, which won Battlecode 2012. Battlecode bots are written in Java, and optimized code gets real messy real fast, so wouldn't it be nice if it could be (just a little) easier to write?

C and its sibling languages have a wonderful feature called preprocessing. Before compilation, the preprocessor pastes snippets of text verbatim into a file, which saves the time and space of writing redundant code. The most common use of the preprocessor is including header files:

#include <stdio.h>

int
main()
{
    printf("%s\n", "Hello, world!");
    return 0;
}

Another use of the preprocessor is defining macros. For example, in competitive programming one iterates from 0 (inclusive) to N (exclusive) very often, so why spend the time typing

for (int i = 0; i < N; ++i)
{
   // ...
}

when it can be compressed to

F(i, N)
{
   // ...
}

all with the simple macro

#define F(i, N) for (int i = 0; i < N; ++i)

Now it's great that C offers this, but what about languages that don't, such as Java? Is there a way to make the C preprocessor do this for us?

The answer, unsurprisingly, is yes! We can run cpp (that's the C preprocessor) by itself:

cpp test.c

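which finds every macro defined with #define, replaces each use of it with the text it was defined as, removes the #define lines themselves, and prints the result to standard output. However, it also leaves some junk in the output: linemarkers, which are lines starting with # (such as # 1 "test.c") that record which file and line the following code came from, mostly near the beginning of the output. A C compiler understands these, but javac does not, so we have to remove them before giving the preprocessed file to javac.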

So how do we put all of this together? We first write some C-ified Java code, Test.cjava, containing our macro:

import java.util.*;

#define F(i, N) for (int i = 0; i < N; ++i)

public class Test {
    public static void main(String[] args) {
        F(i, 5) {
            System.out.println(i);
        }
        System.exit(0);
    }
}

and preprocess it with cpp to generate almost-legitimate Java code. To strip the aforementioned lines that start with #, we pipe the output of cpp to sed with the command /^#/d, which deletes every line that starts with #, and redirect the sanitized output to our desired Java file:

cpp Test.cjava | sed '/^#/d' > Test.java

Finally, we compile Test.java with javac Test.java and run it with java Test, which produces the desired output:

0
1
2
3
4