# Halcyon: A C Compiler in Haskell

Halcyon is a work-in-progress compiler for a large subset of C, written in Haskell. It targets the x86_64 instruction set architecture. This project focuses on implementing the core compiler functionality while leveraging existing system tools for preprocessing, assembly, and linking.

## Current Status

The compiler currently handles only the simplest C programs: a single function that returns an integer constant. For example:

```c
int main(void) {
    return 42;
}
```
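
For this subset, the lexer only needs a handful of token kinds. The following is a minimal sketch of that stage, not the code in `Lexer.hs`: the `Token` constructors and the `tokenize` function are hypothetical names, and it uses `String` and `Either String` to stay within `base`.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- Hypothetical token type for the current subset (an assumption, not
-- the definitions in Tokens.hs).
data Token
  = TKwInt
  | TKwVoid
  | TKwReturn
  | TIdent String
  | TConst Int
  | TOpenParen
  | TCloseParen
  | TOpenBrace
  | TCloseBrace
  | TSemicolon
  deriving (Eq, Show)

-- Classify a word as a keyword or an ordinary identifier.
keywordOrIdent :: String -> Token
keywordOrIdent "int" = TKwInt
keywordOrIdent "void" = TKwVoid
keywordOrIdent "return" = TKwReturn
keywordOrIdent word = TIdent word

-- Single-character punctuation tokens.
punctuation :: Char -> Maybe Token
punctuation '(' = Just TOpenParen
punctuation ')' = Just TCloseParen
punctuation '{' = Just TOpenBrace
punctuation '}' = Just TCloseBrace
punctuation ';' = Just TSemicolon
punctuation _ = Nothing

isIdentStart, isIdentChar :: Char -> Bool
isIdentStart c = isAlpha c || c == '_'
isIdentChar c = isAlphaNum c || c == '_'

-- Turn source text into tokens, or report the first offending input.
tokenize :: String -> Either String [Token]
tokenize [] = Right []
tokenize input@(c : rest)
  | isSpace c = tokenize rest
  | Just tok <- punctuation c = (tok :) <$> tokenize rest
  | isDigit c = lexNumber input
  | isIdentStart c =
      let (word, after) = span isIdentChar input
       in (keywordOrIdent word :) <$> tokenize after
  | otherwise = Left ("invalid character: " ++ [c])

-- Lex a run of digits; a letter directly after the digits (as in "1a")
-- is reported as a malformed number.
lexNumber :: String -> Either String [Token]
lexNumber input =
  case span isDigit input of
    (digits, next : _)
      | isIdentStart next -> Left ("malformed number: " ++ digits ++ [next])
    (digits, after) -> (TConst (read digits) :) <$> tokenize after
```

Applied to the example program, `tokenize` produces the keyword, identifier, punctuation, and constant tokens in source order. Input such as `return 1a;` is rejected as a malformed number, and a character like `@` is rejected as invalid.
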

### Compilation Pipeline

The compiler processes source code through the following stages:

1. **Lexical Analysis**: Breaks source code into a sequence of tokens
2. **Parsing**: Converts tokens into an Abstract Syntax Tree (AST)
3. **Code Generation**: Transforms the AST into an x86_64 assembly AST
4. **Code Emission**: Writes the assembly as text, which the system assembler and linker then turn into an executable

### Internal Representations

Programs are represented internally using a series of increasingly lower-level data structures:

1. **Abstract Syntax Tree (AST)**:
   ```haskell
   data Program = Program FunctionDef
   data FunctionDef = Function
     { name :: Text
     , body :: Statement
     }
   data Statement = Return Expr
   data Expr = Constant Int
   ```

2. **Assembly AST**:
   ```haskell
   data Program = Program FunctionDef
   data FunctionDef = Function
     { name :: Text
     , instructions :: [Instruction]
     }
   data Instruction = Mov Operand Operand | Ret
   data Operand = Imm Int | Register
   ```
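
To show how the two representations relate, here is a minimal sketch of lowering the AST to the assembly AST and printing it. It is an illustration under assumptions, not the code in `Codegen.hs` or `Emit.hs`: the `Asm` prefixes exist only so both sets of types fit in one file, `String` stands in for `Text`, `Register` is assumed to mean `%eax` (where the System V x86_64 ABI returns an `int`), and the output uses AT&T syntax with Linux-style symbol names.

```haskell
-- Source-level AST, mirroring the first listing (String stands in for Text).
data Program = Program FunctionDef
data FunctionDef = Function { name :: String, body :: Statement }
data Statement = Return Expr
data Expr = Constant Int

-- Assembly AST, mirroring the second listing. The "Asm" prefixes are
-- hypothetical and only avoid name clashes within a single file.
data AsmProgram = AsmProgram AsmFunction deriving (Eq, Show)
data AsmFunction = AsmFunction
  { asmName :: String
  , instructions :: [Instruction]
  } deriving (Eq, Show)
data Instruction = Mov Operand Operand | Ret deriving (Eq, Show)
data Operand = Imm Int | Register deriving (Eq, Show)

-- Lower `return <constant>` to: move the constant into the return
-- register, then return.
codegen :: Program -> AsmProgram
codegen (Program (Function fname (Return (Constant n)))) =
  AsmProgram (AsmFunction fname [Mov (Imm n) Register, Ret])

-- Assumption: the single Register operand is %eax.
emitOperand :: Operand -> String
emitOperand (Imm n) = "$" ++ show n
emitOperand Register = "%eax"

emitInstruction :: Instruction -> String
emitInstruction (Mov src dst) =
  "    movl " ++ emitOperand src ++ ", " ++ emitOperand dst
emitInstruction Ret = "    ret"

-- Render the whole program as assembly text, one line per directive,
-- label, or instruction.
emit :: AsmProgram -> String
emit (AsmProgram (AsmFunction fname instrs)) =
  unlines (["    .globl " ++ fname, fname ++ ":"] ++ map emitInstruction instrs)
```

For the `return 42` example, `emit (codegen program)` yields a `.globl main` directive, a `main:` label, `movl $42, %eax`, and `ret`. Keeping `codegen` and `emit` as pure functions means both can be tested without touching the file system.
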

## Project Structure

```
lib/
├── Halcyon/
│   ├── Backend/           # Code generation and emission
│   │   ├── Codegen.hs     # AST to Assembly conversion
│   │   └── Emit.hs        # Assembly to text output
│   ├── Core/              # Core data types and utilities
│   │   ├── Assembly.hs    # Assembly representation
│   │   ├── Ast.hs         # C language AST
│   │   ├── Monad.hs       # Compiler monad stack
│   │   └── Settings.hs    # Compiler settings and types
│   ├── Driver/            # Compiler driver
│   │   ├── Cli.hs         # Command line interface
│   │   └── Pipeline.hs    # Compilation pipeline
│   └── Frontend/          # Parsing and analysis
│       ├── Lexer.hs       # Lexical analysis
│       ├── Parse.hs       # Parsing
│       └── Tokens.hs      # Token definitions
```

### Architecture

The compiler uses a monad transformer stack to handle IO operations and error management:

```haskell
newtype CompilerT m a = CompilerT
  { unCompilerT :: ExceptT CompilerError m a }

type Compiler = CompilerT IO
```

This provides:
- Error handling through `ExceptT`
- IO capabilities through the underlying monad
- Clean separation of pure and effectful code
- Structured error reporting and recovery
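
A stack like this is typically paired with derived instances and a few helpers. The sketch below is one way to do that, assuming `ExceptT` from the `transformers` package and the `GeneralizedNewtypeDeriving` extension. The placeholder `CompilerError` type and the helpers `throwCompilerError`, `liftStage`, `runCompiler`, and `checkNonEmpty` are hypothetical names, not necessarily what `Monad.hs` defines.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Placeholder error type (an assumption); the real CompilerError
-- presumably carries more structure.
newtype CompilerError = CompilerError String
  deriving (Eq, Show)

newtype CompilerT m a = CompilerT
  { unCompilerT :: ExceptT CompilerError m a }
  deriving (Functor, Applicative, Monad, MonadIO)

type Compiler = CompilerT IO

-- Abort the current compilation with an error.
throwCompilerError :: Monad m => CompilerError -> CompilerT m a
throwCompilerError = CompilerT . throwE

-- Lift the result of a pure stage into the stack.
liftStage :: Monad m => Either CompilerError a -> CompilerT m a
liftStage = either throwCompilerError pure

-- Run a compilation and expose the outcome as a plain Either.
runCompiler :: Compiler a -> IO (Either CompilerError a)
runCompiler = runExceptT . unCompilerT

-- Hypothetical effectful step: reject empty input, otherwise log and continue.
checkNonEmpty :: String -> Compiler String
checkNonEmpty src
  | null src = throwCompilerError (CompilerError "empty input")
  | otherwise = do
      liftIO (putStrLn "input accepted")
      pure src
```

With this shape, pure stages can return `Either CompilerError a` and be lifted with `liftStage`, while only the driver needs `IO`. Running `runCompiler (checkNonEmpty "")` returns a `Left` value instead of throwing an exception.
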

## Command Line Interface

```bash
halcyon [OPTIONS] FILE

Options:
  --lex        Run lexical analysis only
  --parse      Run parsing only
  --codegen    Run through code generation
  -S           Stop after assembly generation
  -h,--help    Show help text
```
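
The stage flags above amount to choosing how far the pipeline runs. As a rough sketch of that idea using only `base`, the `Stage` and `Options` types and the `parseArgs` function below are hypothetical and say nothing about how `Cli.hs` actually parses arguments. Help handling is left out, and a later stage flag overrides an earlier one.

```haskell
-- How far the pipeline should run. Constructor names are assumptions.
data Stage
  = StopAfterLex
  | StopAfterParse
  | StopAfterCodegen
  | StopAfterAssembly
  | Executable
  deriving (Eq, Show)

data Options = Options
  { stage :: Stage
  , inputFile :: FilePath
  } deriving (Eq, Show)

-- Map each documented stage flag to a stage.
flagStage :: String -> Maybe Stage
flagStage "--lex" = Just StopAfterLex
flagStage "--parse" = Just StopAfterParse
flagStage "--codegen" = Just StopAfterCodegen
flagStage "-S" = Just StopAfterAssembly
flagStage _ = Nothing

-- Parse `[OPTIONS] FILE`; with no stage flag, compile to an executable.
parseArgs :: [String] -> Either String Options
parseArgs = go Executable Nothing
  where
    go st (Just file) [] = Right (Options st file)
    go _ Nothing [] = Left "missing input FILE"
    go st file (arg : rest)
      | Just st' <- flagStage arg = go st' file rest
      | take 1 arg == "-" = Left ("unknown option: " ++ arg)
      | Just _ <- file = Left ("unexpected extra argument: " ++ arg)
      | otherwise = go st (Just arg) rest
```

The driver can then pattern match on `stage` after each pipeline step and stop early when the requested stage has been reached.
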

### Build and Run

```bash
# Build the project
cabal build

# Run the compiler
cabal run halcyon -- [OPTIONS] input.c

# Example: Compile a file
cabal run halcyon -- input.c

# Example: Run only the lexer
cabal run halcyon -- --lex input.c
```

## External Dependencies

Halcyon relies on the following system tools:
- **GCC**: For preprocessing C source files (`gcc -E`)
- **Assembler**: For converting assembly to object files
- **Linker**: For producing final executables

Make sure these tools are installed and available on your system `PATH`.
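
As an illustration of how a driver might call these tools, the sketch below builds the command lines as plain data and runs them with `callProcess` from the `process` package. The function names and the `.i` intermediate file are assumptions, as is the use of `gcc` as the front end for the assembler and linker; the actual driver may invoke the tools differently.

```haskell
import System.FilePath (dropExtension, replaceExtension)
import System.Process (callProcess)

-- A command is a program name plus its arguments.
type Command = (FilePath, [String])

-- Preprocess input.c into input.i using `gcc -E`.
preprocessCommand :: FilePath -> Command
preprocessCommand src =
  ("gcc", ["-E", src, "-o", replaceExtension src "i"])

-- Assemble and link input.s into an executable named input.
-- Assumption: gcc drives the system assembler and linker here.
assembleAndLinkCommand :: FilePath -> Command
assembleAndLinkCommand asm =
  ("gcc", [asm, "-o", dropExtension asm])

-- Run a command and wait for it; callProcess raises an exception if
-- the tool exits with a non-zero status.
runCommand :: Command -> IO ()
runCommand (prog, args) = callProcess prog args
```

Separating command construction from execution keeps the argument lists easy to inspect and test without having the tools installed.
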

## Error Handling

The compiler provides detailed error reporting for:
- Lexical errors (invalid characters, malformed numbers)
- Syntax errors (invalid program structure)
- Semantic errors (coming soon)
- System errors (file I/O, external tool failures)
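
One way to model these categories is a sum type with one constructor per category and a single rendering function. The constructor names and message formats below are hypothetical and only show the shape; the real error type may carry source positions or other detail.

```haskell
-- One constructor per error category listed above (names are assumptions).
data CompilerError
  = LexError String
  | ParseError String
  | SemanticError String
  | SystemError String
  deriving (Eq, Show)

-- Produce a user-facing message that names the category.
renderError :: CompilerError -> String
renderError err = case err of
  LexError msg -> "lexical error: " ++ msg
  ParseError msg -> "syntax error: " ++ msg
  SemanticError msg -> "semantic error: " ++ msg
  SystemError msg -> "system error: " ++ msg
```

Because every failure flows through one type, the driver can report all of them in one place, for example by printing `renderError` to standard error before exiting.
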

## Future Plans

### The Basics
- [x] A minimal compiler
- [ ] Unary operators
- [ ] Binary operators
- [ ] Logical and relational operators
- [ ] Local variables
- [ ] `if` statements and conditional expressions
- [ ] Compound statements
- [ ] Loops
- [ ] Functions
- [ ] File scope variable declarations and storage-class specifiers

### Types Beyond Int
- [ ] Long integers
- [ ] Unsigned integers
- [ ] Floating-point numbers
- [ ] Pointers
- [ ] Arrays and pointer arithmetic
- [ ] Characters and strings
- [ ] Supporting dynamic memory
- [ ] Structures

### Optimizations
- [ ] Optimizing TACKY programs
- [ ] Register allocation

## Contributing

This is a personal learning project following the book "Writing a C Compiler" by Nora Sandler. While it's not currently open for contributions, feel free to use it as a reference for your own compiler projects.
