Nilan is a programming language I am currently developing for fun 🚀, implemented in Go. My goal is to learn more about how programming languages work under the hood and to explore the different pipelines involved — from taking source code as input to making the CPU execute instructions 🤖.
ℹ️ At the moment, I have decided to stop working on the tree-walk interpreter and parser. Instead, I am starting to develop the compiler and the VM: the compiler will turn tokens into bytecode, and a virtual machine will then execute that bytecode. For the moment, the Nilan language will no longer have an Abstract Syntax Tree (AST).
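To make the bytecode and VM idea concrete, here is a minimal sketch of a stack-based virtual machine in Go. The opcodes (`OpConst`, `OpAdd`, `OpMul`), the constant pool layout, and the `run` function are assumptions made for illustration; they are not Nilan's actual instruction set or API.

```go
package main

import "fmt"

// Hypothetical opcodes for illustration only; Nilan's real instruction set may differ.
const (
	OpConst byte = iota // push constants[operand] onto the stack
	OpAdd               // pop two values, push their sum
	OpMul               // pop two values, push their product
)

// run executes the bytecode on a simple operand stack and returns the top value.
func run(code []byte, constants []int) int {
	stack := []int{}
	for ip := 0; ip < len(code); ip++ {
		switch code[ip] {
		case OpConst:
			ip++ // the next byte is the index into the constant pool
			stack = append(stack, constants[code[ip]])
		case OpAdd, OpMul:
			b, a := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if code[ip] == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		}
	}
	return stack[len(stack)-1]
}

func main() {
	// Bytecode for 1 + 2 * 3: operands are pushed first, operators follow.
	constants := []int{1, 2, 3}
	code := []byte{OpConst, 0, OpConst, 1, OpConst, 2, OpMul, OpAdd}
	fmt.Println(run(code, constants)) // prints 7
}
```

Because the multiplication is emitted before the addition, the VM needs no knowledge of operator precedence; the ordering of the instructions already encodes it.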
✅ Arithmetic expressions: +, -, *, /
✅ Lexical scope
✅ Block scope: {}
✅ Comparison operators: >, >=, <, <=, ==, !=
✅ Boolean literals: true, false
✅ String literals: "hello world"
✅ Control flow: if, else
✅ Logical operators: and, or
✅ null literal
✅ Parenthesized expressions
✅ Variable identifiers and names
✅ Assignment statements (e.g., var a = 2)
✅ Unary operations: logical not !, negation -
✅ REPL (Read-Eval-Print Loop) for interactive testing
✅ Execute source code from a file.
The following are not supported yet:
🔴 Functions and function calls
🔴 Classes, structs, interfaces
🔴 Inheritance
🔴 Arrays or other complex data structures
🔴 Control flow: loops, else if, break
🔴 Logical operators: not
🔴 Exponentiation or other advanced operators
🔴 Tree-Walk interpreter
🔴 Complex features such as Module/package imports, etc ...
- Implement existing language features in tree-walk interpreter and parser (AST generator) into the compiler and VM 👷 (In progress)
- Add support for `else if`, `break`
- Add support for functions and function calls.
- Add support for structs.
Nilan’s syntactic grammar is defined using ISO Extended Backus–Naur Form (ISO EBNF), conforming to ISO/IEC 14977. It represents the rules used to parse a sequence of tokens into an Abstract Syntax Tree (AST).
program = { declaration }, EOF ;
declaration = variable-declaration | statement ;
variable-declaration = identifier , "=" , expression ;
statement = expression
| if-statement
| print-statement
| while-statement
| block-statement ;
if-statement = "if" , expression , statement , [ "else" , statement ] ;
print-statement = "print" , expression ;
while-statement = "while", expression, statement ;
block-statement = "{" , { declaration } , "}" ;
expression = assignment-expression ;
assignment-expression = IDENTIFIER, "=", assignment-expression
| or-expression ;
or-expression = and-expression , { "or" , and-expression } ;
and-expression = equality-expression, { "and", equality-expression } ;
equality-expression = comparison-expression, { ("!=" | "=="), comparison-expression } ;
comparison-expression = term-expression, { (">" | ">=" | "<" | "<="), term-expression } ;
term-expression = factor-expression, { ("+" | "-"), factor-expression } ;
factor-expression = unary-expression, { ("*" | "/"), unary-expression } ;
unary-expression = ("!" | "-"), unary-expression
| primary-expression ;
primary-expression = FLOAT
| INT
| IDENTIFIER
| "true"
| "false"
| "null"
| "(", expression, ")" ;
This grammar is not left-recursive because no nonterminal begins its own production with itself. Each rule starts with a different nonterminal or a terminal before any recursion happens. For example, `equality` starts with `comparison`, `comparison` starts with `term`, and so on.
Each line defines a production rule in the form:

```
nonterminal = definition ;
```

- A nonterminal (e.g., `term`, `factor`) is a named syntactic category made of other rules.
- A definition consists of terminal symbols (token literals), other nonterminals, and notation operators.
| Type | Example | Description |
|---|---|---|
| Nonterminal | `term` | Named construct that expands into other rules |
| Terminal | `'+'`, `'true'` | Fixed token literals enclosed in single quotes |
💡 Note: Tokens like `'INT'` and `'FLOAT'` are token types returned by the lexer, not literal characters.
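To illustrate the difference between a token type and the literal characters it was read from, here is a small sketch of how a lexer might classify a numeric lexeme as `INT` or `FLOAT`. The `TokenType` type, the `ILLEGAL` constant, and the `numberType` function are hypothetical names chosen for this example; Nilan's `token.go` and `lexer.go` may be organized differently.

```go
package main

import "fmt"

// TokenType is a hypothetical token category; Nilan's token.go may define it differently.
type TokenType string

const (
	INT     TokenType = "INT"
	FLOAT   TokenType = "FLOAT"
	ILLEGAL TokenType = "ILLEGAL"
)

// numberType classifies a numeric lexeme: digits only is INT,
// digits with a single interior dot is FLOAT, anything else is ILLEGAL.
func numberType(lexeme string) TokenType {
	dots := 0
	for i, ch := range lexeme {
		switch {
		case ch >= '0' && ch <= '9':
			// digit: keep scanning
		case ch == '.' && i > 0 && i < len(lexeme)-1:
			dots++
		default:
			return ILLEGAL
		}
	}
	switch {
	case lexeme == "" || dots > 1:
		return ILLEGAL
	case dots == 1:
		return FLOAT
	}
	return INT
}

func main() {
	for _, lexeme := range []string{"42", "3.14", "1.2.3"} {
		fmt.Println(lexeme, numberType(lexeme))
	}
}
```

The grammar only ever sees the resulting type (`INT` or `FLOAT`), which is why those names appear in the rules without describing individual digits.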
| Symbol | Meaning | Example |
|---|---|---|
| `=` | Rule definition | `term = factor , { ( '+' \| '-' ) , factor } ;` |
| `;` | End of rule | Every rule ends in a semicolon |
| `,` | Sequence | `a , b` means `a` followed by `b` |
| `\|` | Alternatives | `a \| b` means either `a` or `b` |
| `{ ... }` | Zero or more repetitions | `{ a }` means repeat `a` zero or more times |
| `( ... )` | Grouping | Used to group alternatives or sequences |
| `[ ... ]` | Optional | Used to specify an optional part, for example an `else` clause |
Example rule:

```
term-expression = factor-expression , { ( '+' | '-' ) , factor-expression } ;
```

Means:

- A `term-expression` consists of:
  - A `factor-expression`, followed by
  - Zero or more repetitions of:
    - Either `'+'` or `'-'`, and
    - Another `factor-expression`

```
33 + 53 - 4 + 2
```
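The `{ ... }` repetition in this rule maps naturally onto a loop in a hand-written parser. The sketch below assumes a pre-tokenized input of strings and simplifies a `factor-expression` to a plain integer literal; `parseTerm` is a hypothetical name, not a function from Nilan's parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTerm evaluates: factor , { ( "+" | "-" ) , factor }
// Simplification (assumption): a factor here is just an integer literal.
func parseTerm(tokens []string) (int, error) {
	if len(tokens) == 0 {
		return 0, fmt.Errorf("expected a factor, got end of input")
	}
	result, err := strconv.Atoi(tokens[0])
	if err != nil {
		return 0, err
	}
	pos := 1
	// The loop mirrors the { ... } repetition: zero or more (operator, factor) pairs.
	for pos < len(tokens) {
		op := tokens[pos]
		if op != "+" && op != "-" {
			return 0, fmt.Errorf("unexpected token %q", op)
		}
		if pos+1 >= len(tokens) {
			return 0, fmt.Errorf("expected a factor after %q", op)
		}
		right, err := strconv.Atoi(tokens[pos+1])
		if err != nil {
			return 0, err
		}
		if op == "+" {
			result += right
		} else {
			result -= right
		}
		pos += 2
	}
	return result, nil
}

func main() {
	value, err := parseTerm([]string{"33", "+", "53", "-", "4", "+", "2"})
	fmt.Println(value, err) // prints 84 <nil>
}
```

Because the loop folds each new operand into the running result, the operators are applied left to right, and an input that ends on an operator such as `1 +` is reported as an error.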
Precedence from lowest to highest is encoded in the grammar structure itself:
| Precedence Level | Operators | Grammar Rule |
|---|---|---|
| Lowest | Equality: `==`, `!=` | equality |
| | Comparison: `>`, `<`, etc. | comparison |
| | Additive: `+`, `-` | term |
| | Multiplicative: `*`, `/` | factor |
| | Unary: `-`, `!` | unary |
| Highest | Parentheses, literals | primary |
💡 Lower-precedence rules contain (as components) higher-precedence expressions. This structure ensures operators like `*` bind more tightly than `+`. For example, the expression `5 * 5 + 10 + 2` is parsed as `(5 * 5) + 10 + 2`.
The `term` rule handles addition and subtraction: `+`, `-`

Example: `3 + 5 - 2`

The `factor` rule handles multiplication and division: `*`, `/`

Example: `4 * 2 / 8`

The `unary` rule handles unary operations like logical not and negation, with recursive chaining: `!`, `-`

Examples: `--5`, `!(-3)`

The `primary` rule handles literals, identifiers, and parenthesized expressions:

`(FLOAT | INT | IDENTIFIER | true | false | null | '(' expression ')')`

Examples: `(5 + 3)`, `true`
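The way these four rules nest can be sketched as a small recursive-descent evaluator in which each rule becomes one method that calls the next higher-precedence rule. This is an illustrative, integer-only sketch working on a pre-tokenized input: the `parser` type and its method names are hypothetical, the logical `!` operator is left out, and error handling is kept to a minimum.

```go
package main

import (
	"fmt"
	"strconv"
)

// parser walks a pre-tokenized input. All names here are hypothetical.
type parser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// next consumes and returns the current token.
func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// term = factor , { ( "+" | "-" ) , factor }
func (p *parser) term() int {
	left := p.factor()
	for p.peek() == "+" || p.peek() == "-" {
		if p.next() == "+" {
			left += p.factor()
		} else {
			left -= p.factor()
		}
	}
	return left
}

// factor = unary , { ( "*" | "/" ) , unary }
func (p *parser) factor() int {
	left := p.unary()
	for p.peek() == "*" || p.peek() == "/" {
		if p.next() == "*" {
			left *= p.unary()
		} else {
			left /= p.unary() // integer division; panics on division by zero
		}
	}
	return left
}

// unary = "-" , unary | primary  (logical "!" is omitted in this integer-only sketch)
func (p *parser) unary() int {
	if p.peek() == "-" {
		p.next()
		return -p.unary()
	}
	return p.primary()
}

// primary = INT | "(" , term , ")"
func (p *parser) primary() int {
	tok := p.next()
	if tok == "(" {
		value := p.term()
		p.next() // consume ")" (not validated in this sketch)
		return value
	}
	n, err := strconv.Atoi(tok)
	if err != nil {
		panic(fmt.Sprintf("unexpected token %q", tok))
	}
	return n
}

func main() {
	p := &parser{tokens: []string{"5", "*", "5", "+", "10", "+", "2"}}
	fmt.Println(p.term()) // prints 37
}
```

Since `term` only ever sees the results of `factor`, the multiplication is finished before any addition happens, which is exactly the precedence ordering the grammar encodes.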
Parsing order according to precedence, for the expression `1 + 2 * 3`:

- Multiplication `*` by the `factor` rule
- Addition `+` by the `term` rule

Result:

- Multiply `2 * 3` first
- Add `1 + (2 * 3)`

```
    +
   / \
  1   *
     / \
    2   3
```
Expressed as:

```
Binary(
  Left=Literal(1),
  Op='+',
  Right=Binary(
    Left=Literal(2),
    Op='*',
    Right=Literal(3)
  )
)
```

| Expression | Grammar Rule |
|---|---|
| `1` | primary → INT |
| `2 * 3` | factor (multiplication) |
| `1 + (...)` | term (addition) |
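The `Binary(...)` notation above could be modelled in Go with two small struct types. This is a sketch under assumptions: the type names mirror the notation, but the field layout and the `Eval` method are illustrative and are not taken from Nilan's `expressions.go`.

```go
package main

import "fmt"

// Expr is a hypothetical AST node interface for this sketch.
type Expr interface {
	Eval() int
}

// Literal holds an integer value.
type Literal struct {
	Value int
}

// Binary holds an operator and its two operand sub-trees.
type Binary struct {
	Left  Expr
	Op    string
	Right Expr
}

// Eval returns the literal's value.
func (l Literal) Eval() int { return l.Value }

// Eval evaluates both children first, so deeper nodes are computed before their parents.
func (b Binary) Eval() int {
	left, right := b.Left.Eval(), b.Right.Eval()
	switch b.Op {
	case "+":
		return left + right
	case "-":
		return left - right
	case "*":
		return left * right
	case "/":
		return left / right
	}
	panic("unknown operator " + b.Op)
}

func main() {
	// AST for 1 + 2 * 3
	tree := Binary{
		Left:  Literal{1},
		Op:    "+",
		Right: Binary{Left: Literal{2}, Op: "*", Right: Literal{3}},
	}
	fmt.Println(tree.Eval()) // prints 7
}
```

The multiplication sits deeper in the tree than the addition, so evaluating children before parents computes `2 * 3` first.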
These examples will not parse correctly with the current grammar:
1 +
2++
2--
2+=
2-=
2**2
- **Update the Lexer (optional)**: If new token types need to be introduced, `token.go` and `lexer.go` need to be modified.
- **Extend the `ExpressionVisitor` or `StmtVisitor` interfaces**: Depending on the type of new syntax introduced, make sure to add the corresponding visit method to one of the interfaces.
- **Add a new AST node to `expressions.go` or `statements.go`**: Add the corresponding AST node `struct` to `expressions.go` or `statements.go`, depending on whether it is an expression or a statement node.
- **Extend the Parser**: Extend the parser to handle the new syntax grammar by adding a method which creates an AST node.
- **Extend the Interpreter**: Extend the interpreter to execute the new AST node returned by the parser. This involves implementing the method added to the `ExpressionVisitor` or `StmtVisitor` interfaces.
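The visitor, AST node, and interpreter steps above can be sketched for a made-up exponentiation node. Only the interface name `ExpressionVisitor` comes from the text; the method names, signatures, the `Expression` interface, and the `Power` node are assumptions for illustration and will not match Nilan's real definitions exactly.

```go
package main

import "fmt"

// ExpressionVisitor is named after the interface mentioned above; the method
// names and signatures are assumptions made for this sketch.
type ExpressionVisitor interface {
	VisitLiteral(l Literal) int
	VisitPower(p Power) int // the visit method added for the new node
}

// Expression is a hypothetical node interface using double dispatch.
type Expression interface {
	Accept(v ExpressionVisitor) int
}

// Literal is an integer literal node.
type Literal struct{ Value int }

// Power is a hypothetical new AST node for an exponentiation operator.
type Power struct {
	Base, Exponent Expression
}

func (l Literal) Accept(v ExpressionVisitor) int { return v.VisitLiteral(l) }
func (p Power) Accept(v ExpressionVisitor) int   { return v.VisitPower(p) }

// Interpreter implements the visitor by evaluating each node type.
type Interpreter struct{}

func (Interpreter) VisitLiteral(l Literal) int { return l.Value }

func (i Interpreter) VisitPower(p Power) int {
	base, exp := p.Base.Accept(i), p.Exponent.Accept(i)
	result := 1
	for n := 0; n < exp; n++ { // negative exponents are not handled here
		result *= base
	}
	return result
}

func main() {
	node := Power{Base: Literal{2}, Exponent: Literal{10}}
	fmt.Println(node.Accept(Interpreter{})) // prints 1024
}
```

Adding `VisitPower` to the interface makes the Go compiler flag every visitor that does not yet handle the new node, which is a convenient checklist when extending the language.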
```
git clone https://github.com/informatter/nilan.git
cd nilan
go install .
```

Once installed, there are three main commands that can be used. The first two are still based on the tree-walk interpreter, and the third uses the compiled version of Nilan that is under development.
1. REPL

Start a REPL session:

```
nilan repl
```

2. Run

Compiles the specified file and executes it directly:

```
nilan run hellow_world.ni
```

💡 If changes are made to the code, run `go install .` again so that a new binary is built with the changes.

For iterative development, it is recommended to simply run:

`go run . -- repl` or `go run . -- run <file-name>`
1. Emit

Emits the bytecode representation, or the disassembled bytecode representation, of a Nilan source code file. This command is useful for debugging when developing the compiler.

```
nilan emit arithmetic.ni
```

2. REPL

Start a REPL session, optionally writing the encoded bytecode as hexadecimal to a `.nic` file or disassembling the bytecode and dumping it to a `.dnic` file:

```
nilan cRepl
```

To see all available flags:

```
nilan cRepl --help
```

💡 If changes are made to the code, run `go install .` again so that a new binary is built with the changes.

For iterative development, it is recommended to simply run:

`go run . -- cRepl` or `go run . -- emit <file-name>`
Run tests for a specific package, e.g., lexer:

```
go test ./lexer
```

Run all unit tests recursively:

```
go test ./...
```

Format a particular package:

```
go fmt ./lexer
```

Format all Go files:

```
go fmt ./...
```