befunjit-86 is a just-in-time compiler for Befunge-93. It works on Linux x86-64 and it only generates code that follows the System V AMD64 ABI.
This is an educational/for-fun project - I do not recommend using this for any critical infrastructure!
I've made this project because 1) I wanted to learn how a JIT compiler works and 2) Befunge-93 was designed to be as hard as possible to write a compiler for due to its reflexive nature.
Indeed, a fully AOT compiler is impractical so a JIT compiler is the next best thing.
The befunjit-86 JIT traverses a file, following ^<v>#"_|?@ and creates a graph of "static" paths.
Machine code is generated for each path, after which they're linked against each other. Execution starts by calling into the generated machine code.
The generated code for the p instruction writes back to the playfield and if it alters cells that are part of any static path then it triggers a re-compile.
By default befunjit-86 starts by interpreting the first hundred or so steps. If the program did not terminate then the optimizing JIT is called and execution continues. The JIT is called every time the program self mutates up to some threshold; when this is reached the optimizations are turned off. If the program continues to self-mutate beyond another threhold then execution continues in the interpreter.
The JIT outperforms the interpreter for programs that seldom self-mutate. Otherwise the interpreter is kicks in automatically.
Take for example a hello world program:
v
v"Hello World!"<
>v
,:
^_@
The graph contains 3 paths:
- starting from <0, 0, east>, changing directions, going through "!dlroW olleH",
:and until_ - starting from the location of
_, going west and coming back to_ - starting from the location of
_, going east and ending immediately at@
All 3 paths can be compiled separately; the jump targets can be patched in after the machine code for each path is laid in memory.
^<v>#" and do not generate any code and already a compiled path requires less steps/operations than an interpreter.
The first optimizing pass performs constant folding:
- sequences like
12+rewrite to3; the same is done for-and* 1+2+becomes3+; the same is done for combinations of+and-2*3*becomes6*abcg*becomesbcga*; this allows strength reduction in a later pass:+becomes2*0\-becomes(-1)*1:becomes11; even though1:does not typically appear in befunge source code, it can result from previous fold passes\\becomes ``:$becomes ``
These are followed by a pass that performs strength reduction:
abgwhereaandbare known at compile time generate much less code than in the general caseabpwhereaandbare known at compile time bypass a few checksa+whereais known generates fewer instructions than even a simple+a-whereais known generates fewer instructions than even a simple-a*whereais known to be -1, 0, 1, 3, 5 and any powers of two avoids multiplication\-generates faster "reverse subtraction" code\`generates as much code as a simple`\`!generates as much code as a simple``!generates as much code as a simple`
Finally, one more pass aims to bypass the stack alltogether when executing some operations consecutively: add, sub, sub-reverse, mul, square.
The script ./bench/run.sh <befunge-source-file> runs perf stat -r 100 on the JIT with a few arguments and the interpreter.
The mandelbrot and snowflake programs were found in the wild: mandelbrot and snowflake.
The count-down-mutate and count-down-mutate-long programs were crafted to execute as slow as possible in the JIT. They were both designed to trigger as many recompilations as possible.
For the two "natural" programs the JIT is ~23 times and ~15 times faster than the interpreter. The JIT with optimizations is ~2.8 and ~2.5 faster than without.
For the two JIT-breaking programs the external interpreter is ~21 and ~35 times faster than the JIT (pure, called with --no-interpreter).
cmake -S . -B build-release/ -D CMAKE_BUILD_TYPE=Release
cmake --build build-release/
b86 <befunge-source-file>
b86 --stack-size <size> <befunge-source-file>
Some options and subcommands that help with debugging and are used by the test runner:
b86 --never-opt <befunge-source-file>
never uses the optimizing code generator.
b86 --always-opt <befunge-source-file>
always uses the optimizing code generator.
b86 --no-interp <befunge-source-file>
only uses the jit.
b86 read-playfield <befunge-source-file>
reads a befunge source file and outputs the playfield as it's represented in memory.
b86 find-pathlet <befunge-source-file>
finds the path starting from 0,0 going east and outputs it.
b86 find-graph <befunge-source-file>
traverses the entire source and outputs all reachable paths.
b86 find-graph-optimize <befunge-source-file>
traverses the entire source, optimize and output all reachable paths.
b86 run-line <befunge-source-file>
runs a single line disregarding any of ^<v>#.
All lines should end with @; any of _|? should not be used as there is no graph and no paths.
The test runner uses node/JS. Install the test runner dependency via npm i and run the tests via npm test.