This repo contains the code for a C compiler project written in Rust, based on the Writing a C Compiler book by Nora Sandler.
The project began in a Rust study group at Trend Micro while I was interning there. After the internship was over, I completed the project independently, and am still iterating to make the compiler generate more efficient code.
Performance Achievement: The compiler now beats or ties GCC -O0 on 4 out of 5 benchmarks (array_sum, matmul, bitwise, struct_bench), with only fib being ~5% slower. This suggests that effective mid-level IR optimizations and smart register allocation can match or exceed GCC's unoptimized (-O0) baseline.
Disclaimer: the vast majority of the project is vibe-coded using a variety of Agentic IDEs like Copilot and Antigravity.
That makes the project quite competitive with Anthropic's version, since I'm spending $0, while they spent $20,000. My only remaining work is to fully cover the C language, and use the compiler on bigger projects, like Linux.
The most exciting part. The compiler currently works only on Windows; Linux compatibility is coming soon. Run release.exe and point it at any C file. The compiler is still rather buggy, so only primitive programs work despite the long list of "features"; treat it as a big WIP. The ultimate stress test may not be Linux, but rather donut.c.
To build a new release from the existing files, run the following and copy the created target/release/driver.exe into the home directory.
cargo build --bin driver --release
The compiler is built in Rust using a multi-stage pipeline. It orchestrates preprocessing via GCC, followed by custom lexing, parsing, and semantic analysis to ensure code validity.
The backend lowers the Abstract Syntax Tree into a Static Single Assignment (SSA) based Intermediate Representation. This IR is then optimized and converted into x86-64 assembly, which is finally assembled and linked using standard system tools.
Tokenizes C source code using regex-based patterns. Key function: tokenize() converts input text into a vector of tokens (identifiers, keywords, operators, literals).
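To make the lexing stage concrete, here is a minimal sketch of what a tokenize()-style function could look like. It is not the project's code: the Token enum, the keyword list, and the operator tables are hypothetical, and it swaps the regex-based patterns for a hand-written character scanner so the example needs only the standard library.

```rust
// Hypothetical token type; the project's real token definition is not shown in this README.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Keyword(String),
    Ident(String),
    IntLit(i64),
    Op(String),
}

// Illustrative subsets only, not the compiler's full tables.
const KEYWORDS: &[&str] = &["int", "return", "if", "else", "while", "for"];
// Two-character operators are tried before their one-character prefixes.
const TWO_CHAR_OPS: &[&str] = &["==", "!=", "<=", ">=", "&&", "||", "->", "<<", ">>"];
const ONE_CHAR_OPS: &str = "+-*/%<>=!&|^~(){}[];,.";

fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Identifier or keyword: [A-Za-z_][A-Za-z0-9_]*
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            if KEYWORDS.contains(&word.as_str()) {
                tokens.push(Token::Keyword(word));
            } else {
                tokens.push(Token::Ident(word));
            }
        } else if c.is_ascii_digit() {
            // Decimal integer literal: [0-9]+
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<i64>()
                .map_err(|e| format!("bad integer literal {}: {}", text, e))?;
            tokens.push(Token::IntLit(value));
        } else {
            // Operator or punctuation, longest match first.
            let end = (i + 2).min(chars.len());
            let two: String = chars[i..end].iter().collect();
            if TWO_CHAR_OPS.contains(&two.as_str()) {
                tokens.push(Token::Op(two));
                i += 2;
            } else if ONE_CHAR_OPS.contains(c) {
                tokens.push(Token::Op(c.to_string()));
                i += 1;
            } else {
                return Err(format!("unexpected character '{}' at offset {}", c, i));
            }
        }
    }
    Ok(tokens)
}
```

Calling tokenize("int x = 42;") on this sketch yields a keyword, an identifier, an operator, an integer literal, and a final operator token; an unknown character such as @ produces an error instead of a token.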
Implements recursive descent parsing to build an Abstract Syntax Tree (AST). Key function: parse_program() consumes tokens and produces a structured tree of statements and expressions.
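Recursive descent means one function per grammar rule, each calling the rules below it. The sketch below shows the idea on arithmetic expressions only. The Tok and Expr types and the parse_expr() entry point are hypothetical names for illustration; the project's parse_program() handles full statements and declarations and may be structured differently.

```rust
// Hypothetical token and AST types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok { Num(i64), Plus, Minus, Star, Slash, LParen, RParen }

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    // expr := term (('+' | '-') term)*
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        loop {
            let op = match self.peek() {
                Some(Tok::Plus) => '+',
                Some(Tok::Minus) => '-',
                _ => return Ok(lhs),
            };
            self.pos += 1;
            let rhs = self.term()?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
    }

    // term := factor (('*' | '/') factor)*
    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.factor()?;
        loop {
            let op = match self.peek() {
                Some(Tok::Star) => '*',
                Some(Tok::Slash) => '/',
                _ => return Ok(lhs),
            };
            self.pos += 1;
            let rhs = self.factor()?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
    }

    // factor := NUMBER | '(' expr ')'
    fn factor(&mut self) -> Result<Expr, String> {
        match self.peek().cloned() {
            Some(Tok::Num(n)) => {
                self.pos += 1;
                Ok(Expr::Num(n))
            }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.expr()?;
                if self.peek() == Some(&Tok::RParen) {
                    self.pos += 1;
                    Ok(inner)
                } else {
                    Err("expected ')'".to_string())
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

// Parses a whole token stream and rejects leftover tokens.
fn parse_expr(toks: Vec<Tok>) -> Result<Expr, String> {
    let mut p = Parser { toks, pos: 0 };
    let e = p.expr()?;
    if p.pos == p.toks.len() { Ok(e) } else { Err("trailing tokens".to_string()) }
}
```

Because term() sits below expr() in the call chain, multiplication binds tighter than addition without any explicit precedence table.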
Validates program semantics including type checking, symbol resolution, and scope validation. Key function: analyze_program() traverses the AST and reports semantic errors before code generation.
Converts AST to Static Single Assignment (SSA) form with basic blocks and phi nodes. Key function: lower_program() transforms high-level constructs into a linear IR suitable for optimization.
Applies optimization passes including constant folding, dead code elimination, and strength reduction. Key functions: strength_reduce_function() replaces expensive operations with cheaper equivalents (e.g., multiply by power-of-2 becomes shift), optimize_function() performs constant propagation and DCE.
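The two rewrites named above, constant folding and the multiply-to-shift strength reduction, can be sketched on a single instruction. The Operand and Inst types and the simplify() function are hypothetical stand-ins; the project's IR and its strength_reduce_function() / optimize_function() passes operate on whole functions and are not reproduced here.

```rust
// Hypothetical three-address operands and instructions, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Const(i64),
    Var(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Add(Operand, Operand),
    Mul(Operand, Operand),
    Shl(Operand, u32),
    Copy(Operand),
}

// True for 2, 4, 8, ... (1 is excluded: x * 1 is better handled as a copy).
fn is_shiftable(c: i64) -> bool {
    c > 1 && (c & (c - 1)) == 0
}

// Rewrites one instruction into a cheaper equivalent when a rule applies.
fn simplify(inst: Inst) -> Inst {
    match inst {
        // Constant folding: both operands are known at compile time.
        Inst::Add(Operand::Const(a), Operand::Const(b)) => {
            Inst::Copy(Operand::Const(a.wrapping_add(b)))
        }
        Inst::Mul(Operand::Const(a), Operand::Const(b)) => {
            Inst::Copy(Operand::Const(a.wrapping_mul(b)))
        }
        // Strength reduction: x * 2^k becomes x << k (either operand order).
        Inst::Mul(x, Operand::Const(c)) if is_shiftable(c) => Inst::Shl(x, c.trailing_zeros()),
        Inst::Mul(Operand::Const(c), x) if is_shiftable(c) => Inst::Shl(x, c.trailing_zeros()),
        // Anything else is left unchanged.
        other => other,
    }
}
```

In this sketch x * 8 becomes x << 3, 2 + 3 becomes a copy of the constant 5, and x * 6 is left alone because 6 is not a power of two.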
Generates x86-64 assembly with register allocation using graph coloring. Key functions: allocate_registers() assigns physical registers to SSA variables via interference graph coloring, gen_program() emits AT&T syntax assembly from IR.
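As a rough illustration of interference-graph coloring, the sketch below assigns registers greedily: variables that are live at the same time share an edge, and each variable takes the lowest-numbered register not used by an already-colored neighbor, spilling when none is free. The Location type, the color() signature, and the highest-degree-first ordering are assumptions made for this example; allocate_registers() may use a different heuristic.

```rust
use std::collections::BTreeSet;

// Hypothetical result type: a physical register index or a stack spill.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Location {
    Reg(usize),
    Spill,
}

// Greedy coloring of an interference graph.
// `edges` lists pairs of virtual registers that are live at the same time.
fn color(num_vregs: usize, edges: &[(usize, usize)], num_regs: usize) -> Vec<Location> {
    // Build adjacency sets from the edge list.
    let mut adj: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); num_vregs];
    for &(a, b) in edges {
        if a != b {
            adj[a].insert(b);
            adj[b].insert(a);
        }
    }

    // Visit the most constrained (highest-degree) nodes first.
    // The sort is stable, so ties keep their original index order.
    let mut order: Vec<usize> = (0..num_vregs).collect();
    order.sort_by_key(|&v| std::cmp::Reverse(adj[v].len()));

    let mut result: Vec<Option<Location>> = vec![None; num_vregs];
    for v in order {
        // Registers already taken by colored neighbors.
        let taken: BTreeSet<usize> = adj[v]
            .iter()
            .filter_map(|&n| match result[n] {
                Some(Location::Reg(r)) => Some(r),
                _ => None,
            })
            .collect();
        // Lowest free register, or a spill if every register is taken.
        let free = (0..num_regs).find(|r| !taken.contains(r));
        result[v] = Some(free.map_or(Location::Spill, Location::Reg));
    }
    result.into_iter().map(|l| l.unwrap_or(Location::Spill)).collect()
}
```

With three mutually interfering variables and only two registers, one of the three must spill; a fourth variable that interferes with nothing can reuse any register.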
The compiler supports a substantial subset of the C language including:
- Basic types:
  - Standard types: int, char, void, float, double, and pointers
  - Unsigned types: unsigned int, unsigned char, unsigned short, unsigned long, unsigned long long
  - Long types: short, long, long long with proper size semantics (char=1, short=2, int=4, long=8 bytes)
  - Complex type specifiers: unsigned long long, signed short, etc.
- Function pointers: Full support for function pointer types, assignment, and indirect calls
- Structs: Full support for struct definitions, field access (.), and pointer member access (->)
- Union types: Full support for union definitions with overlapping memory layout where all fields share the same offset
- Arrays: Single and multi-dimensional array indexing with automatic decay to pointers
- Pointer arithmetic: Full support including:
  - Array decay to pointers (e.g., int *p = arr)
  - Pointer subscripting with proper scaling (p[i] correctly advances by element size)
  - Pointer arithmetic operations (p + n, p - q)
  - Pointer comparisons (p < q, p == NULL)
  - Address-of and dereference operators (&x, *p)
  - Note: For arithmetic expressions, use subscript notation p[i] rather than *(p + i)
- Control flow:
  - if, else - conditional execution
  - while, for, do-while - loops
  - switch, case, default - multi-way branching with fallthrough support
  - break, continue - loop control
- Expressions: Arithmetic, relational, logical, and bitwise operations
- Functions: Definitions, declarations, and recursive calls
- Global variables: Initialized and uninitialized globals with proper RIP-relative addressing
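Several items in the list above come down to simple size arithmetic: the type sizes, pointer scaling for p[i] and p - q, and the shared offset of union fields. The sketch below spells that arithmetic out. The CType enum and all function names are hypothetical, the sizes are the ones stated in the list, and the union case assumes scalar or pointer fields so that no extra padding is needed.

```rust
// Hypothetical type model, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum CType {
    Char,
    Short,
    Int,
    Long,
    Pointer(Box<CType>),
    Union(Vec<CType>),
}

// Sizes in bytes, following the list above (char=1, short=2, int=4, long=8).
fn size_of(ty: &CType) -> usize {
    match ty {
        CType::Char => 1,
        CType::Short => 2,
        CType::Int => 4,
        CType::Long => 8,
        // Pointers are 8 bytes on x86-64.
        CType::Pointer(_) => 8,
        // A union is as large as its largest field (assumes scalar/pointer fields).
        CType::Union(fields) => fields.iter().map(size_of).max().unwrap_or(0),
    }
}

// Every union field starts at offset 0, so the fields overlap in memory.
fn union_field_offsets(fields: &[CType]) -> Vec<usize> {
    vec![0; fields.len()]
}

// p[i] and p + i: the index is scaled by the element size to get a byte offset.
fn element_offset(index: i64, elem: &CType) -> i64 {
    index * size_of(elem) as i64
}

// p - q: the byte distance is divided by the element size to get an element count.
fn pointer_diff(p: i64, q: i64, elem: &CType) -> i64 {
    (p - q) / size_of(elem) as i64
}
```

For example, indexing an int pointer at i = 3 moves 12 bytes forward, and two int pointers 16 bytes apart differ by 4 elements.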
The compiler generates position-independent x86-64 assembly compatible with Windows (MinGW) and targets modern Intel/AMD processors.
Run the full test suite with:
cargo test
Individual test files are located in the testing/ directory. Each test file uses a // EXPECT: <exit_code> annotation to specify the expected program exit code.
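A test harness needs to pull the expected exit code out of that annotation. The helper below is one minimal way to do it; expected_exit_code() is a hypothetical name, and the project's actual harness may parse the annotation differently.

```rust
// Returns the exit code from the first well-formed `// EXPECT: <exit_code>` line, if any.
// Hypothetical helper; not taken from the project's test harness.
fn expected_exit_code(source: &str) -> Option<i32> {
    source.lines().find_map(|line| {
        line.trim()
            .strip_prefix("// EXPECT:")?
            .trim()
            .parse::<i32>()
            .ok()
    })
}
```

A harness built on this would compile and run each file in testing/, then compare the process exit code against the returned value, skipping or failing files where the function returns None.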