A simple compiler implemented in Java using ANTLR and LLVM IR.
This project follows the structure of an IntelliJ-based Maven project.
- src/ – contains manually written source files
- src/main/java/SimpleCompiler.java – main class for launching the application
- src/main/resources/ – contains auxiliary resource files
- testdata/prog/ – example source programs written in the Simple language for testing the compiler:
- max.smpl – finds the maximum of values
- sort.smpl – sorts a list of strings
- shortest.smpl – shortest path in a graph (Floyd-Warshall algorithm)
- specification.pdf – The Simple language specification document
The project is built with Maven. The root directory contains a pom.xml with the necessary configuration, including the maven-assembly-plugin, which creates an executable JAR containing all dependencies.
To compile, run
mvn install
which generates target/SimpleCompiler-1.0-SNAPSHOT-jar-with-dependencies.jar. Resource files are loaded using ClassLoader (location-independent access), so if you compile manually, you need to add the src/main/resources/ directory to the CLASSPATH. (The pom.xml handles this automatically.)
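As an illustration of the location-independent resource access mentioned above, here is a minimal sketch of reading a classpath resource through ClassLoader. The class name ResourceSketch and the method readResource are hypothetical and are not taken from the project; the sketch only shows why src/main/resources/ has to be on the CLASSPATH when compiling manually.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the project sources.
final class ResourceSketch {

    /**
     * Reads a resource as UTF-8 text. The name is resolved relative to the
     * classpath root, so it works both from the JAR and from a manually
     * compiled tree, provided the resources directory is on the classpath.
     */
    static String readResource(String name) throws IOException {
        ClassLoader loader = ResourceSketch.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource is not found
                throw new IOException("resource not found on classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

If the resources directory is missing from the classpath, the lookup returns null, which this sketch turns into an IOException with a readable message.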
Java 16 or higher is required.
To run the compiler:
java -jar target/SimpleCompiler-1.0-SNAPSHOT-jar-with-dependencies.jar <input_file> [-o <output_file>]
<input_file> – required, the source code file written in the Simple language
-o <output_file> – optional, specifies the output file to store the generated LLVM IR (defaults to a.ll)
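The command-line contract above (one required input file, an optional -o flag, and a default of a.ll) could be handled roughly as follows. This is a sketch under assumptions: the class CliSketch, the record Options and the error messages are hypothetical, and the project's actual argument handling may differ.

```java
// Hypothetical argument parser, not taken from SimpleCompiler.java.
final class CliSketch {

    /** Parsed command-line options. */
    record Options(String inputFile, String outputFile) {}

    static Options parse(String[] args) {
        String input = null;
        String output = "a.ll"; // default output file, as documented above
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-o")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-o requires a file name");
                }
                output = args[++i]; // consume the value following -o
            } else if (input == null) {
                input = args[i];
            } else {
                throw new IllegalArgumentException("unexpected argument: " + args[i]);
            }
        }
        if (input == null) {
            throw new IllegalArgumentException("missing <input_file>");
        }
        return new Options(input, output);
    }
}
```

For example, parsing the arguments max.smpl -o max.ll yields the input file max.smpl and the output file max.ll, while max.smpl alone falls back to a.ll.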
The output file contains the program translated into LLVM IR. You can compile it to an executable using clang:
clang -o <output> input.ll
The compiler has some important limitations that make it unsuitable for production use. It was created for educational purposes only and lacks production-level safety and robustness.
- syntax error handling:
- If the input source code is syntactically incorrect, ANTLR may produce a parse tree with null non-terminal nodes. These can cause a NullPointerException during compilation. Such exceptions are caught in main() and the stack trace is printed to the console, but compilation is aborted.
- Semantic errors are handled manually and therefore do not throw runtime exceptions.
- unhandled malloc return values:
- The Simple language defines strings as primitive types, but the compiler implements them as heap-allocated values so that they survive function returns. Although all strings are correctly freed (the compiler was tested using valgrind), the return value of malloc is not checked. If memory allocation fails (i.e., malloc returns a null pointer), the compiled program will crash (or worse).
- use of uninitialized strings:
- Using an uninitialized string in an expression will result in a crash due to null dereferencing. The specification explicitly states that such a use of uninitialized strings is undefined, so this behavior is compliant.
- missing return in string-returning functions:
- If a function that returns a string has a control-flow path without a return statement, the program will crash because the caller will try to deallocate an uninitialized string. Again, this is considered undefined behavior by the specification.
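The syntax error handling described in the first limitation (a NullPointerException caught in main(), stack trace printed, compilation aborted) follows a pattern that can be sketched as below. The class ErrorSketch, the method runGuarded and the exit codes 0 and 1 are assumptions made for illustration; the Runnable stands in for the real compilation step.

```java
// Hypothetical sketch of the catch-and-abort pattern, not the project's main().
final class ErrorSketch {

    /**
     * Runs a compilation step and converts a NullPointerException (for
     * example, one caused by a null node in the parse tree) into a
     * non-zero exit code.
     */
    static int runGuarded(Runnable compilation) {
        try {
            compilation.run();
            return 0; // assumed success code
        } catch (NullPointerException e) {
            e.printStackTrace(); // stack trace goes to the console
            return 1; // assumed failure code: compilation is aborted
        }
    }
}
```

A main() method could then pass the result of runGuarded to System.exit, so that scripts invoking the compiler can detect the aborted compilation.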