NOTICE: This project is in motion. Since AWS popularized working backwards, I am no longer ashamed of writing full-fledged READMEs and the like before doing any real implementation. If I did it the other way around, my codebase would very likely get messy, and I would prioritize the wrong things on the way to a stable and productive state. Hence, what you are reading is an outline of what is going to happen in the future. This is a really exciting project for me; maybe the excitement is infectious.
This program is an implementation of a ZeroJava source code to zMIPS assembly
compiler and is based on the TrustworthyComputing
ZeroJava-compiler
reference implementation. zMIPS assembly serves as input for the
TrustworthyComputing Zilch zero-knowledge proof framework.
First off, it's not really a compiler; in my opinion it's actually a transpiler, because zMIPS assembly is not executable and only an abstraction of bytecode. "Compiler" is a pretty big word. Getting from an abstract syntax tree (AST) to optimized zMIPS assembly is a big challenge for me, but it is not what I think of as compilation when I look at true bytecode compilers.
I found the dependency tree and build environment of the ZeroJava-compiler
reference implementation to be rather fragile and wasn't even able to build it
with JDK21. Also, I'm not a fan of Java programs - not because of the language,
but because of the ecosystem, which is highly opinionated and puts policies over
mechanisms. Basically I feel forced to work a certain way, even though that is
in no way necessary to get some functional output. In addition, I consider
ZeroJava-compiler
to be a reference implementation, so I'm not expecting it to
be production-grade.
The ZeroJava-compiler
is considered the frontend of the Zilch framework. My
long-term goal is to also rewrite the backend and port its CUDA kernels to
OpenCL, and overall make its GPGPU facilities optional. That way the framework
would be uniform and easily portable onto different platforms, especially legacy
hardware. I recently bought a 10-year-old Tesla K80 GPU at a very low price and
was astounded by how powerful it is, even by today's standards. That
cost-to-performance ratio inspired me to focus on legacy support. Moving GPGPU away
from Nvidia's proprietary CUDA SDK avoids vendor lock-in and extends the
framework's compatibility onto a larger set of GPU hardware.
As a PoC, my stretch goal is to port the entire Zilch framework to the ESP32 platform. It's going to be (really) slow, probably not even possible (due to memory constraints when emulating the zMIPS architecture), but figuring this out is going to be fun anyway. I enjoy working with constrained execution environments.
By tackling these hardware constraints, I aim to broaden my understanding of cryptographic design principles. This is a personal exploration of cryptographic applications and of the workflow required to hand source code to someone who knows the ins and outs and can audit it from an academic standpoint. I have some ideas for decentralized autonomous organization (DAO) applications floating around in my head, but I want to familiarize myself with the fundamentals first, so that I'm not dependent on hearsay.
I considered writing down the ZeroJava grammar as EBNF and using flex and bison as parser generators, but decided to implement an idiomatic recursive-descent parser from scratch instead, since integrating flex and bison into Rust and maintaining that integration seemed rather tedious. Also, it's always a good exercise to write a parser from scratch.
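To illustrate what "recursive descent" means here, this is a minimal sketch in Rust. It parses a toy arithmetic grammar, not the ZeroJava grammar, and all names (`Expr`, `Parser`) are placeholders I made up for illustration; the real parser will likely look different.

```rust
// Hypothetical toy grammar (NOT the real ZeroJava grammar):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'

/// AST node of the toy grammar (placeholder type).
#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Bin(Box<Expr>, char, Box<Expr>),
}

/// Parser state: the input bytes and the current read position.
pub struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    /// Skip whitespace, then return the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    /// Parse the whole input; trailing garbage is an error.
    pub fn parse(mut self) -> Result<Expr, String> {
        let e = self.expr()?;
        match self.peek() {
            None => Ok(e),
            Some(c) => Err(format!("unexpected '{}' at byte {}", c as char, self.pos)),
        }
    }

    // One function per grammar rule: this is the "recursive descent" part.
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            let rhs = self.term()?;
            lhs = Expr::Bin(Box::new(lhs), op as char, Box::new(rhs));
        }
        Ok(lhs)
    }

    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.factor()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            let rhs = self.factor()?;
            lhs = Expr::Bin(Box::new(lhs), op as char, Box::new(rhs));
        }
        Ok(lhs)
    }

    fn factor(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let e = self.expr()?;
                if self.peek() == Some(b')') {
                    self.pos += 1;
                    Ok(e)
                } else {
                    Err(format!("expected ')' at byte {}", self.pos))
                }
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
                    self.pos += 1;
                }
                // Only ASCII digits were consumed, so this slice is valid UTF-8.
                let text = std::str::from_utf8(&self.src[start..self.pos]).unwrap();
                text.parse::<i64>().map(Expr::Num).map_err(|e| e.to_string())
            }
            _ => Err(format!("expected number or '(' at byte {}", self.pos)),
        }
    }
}
```

Calling `Parser::new("1 + 2 * 3").parse()` yields a tree in which `*` binds tighter than `+`, because `expr` delegates to `term` before it looks at additive operators. Precedence falls out of the call structure, which is what makes this style easy to write and maintain by hand.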
I am currently taking a closer look at the initial release of GCC (0.9) and some C99 compiler implementations to get an alternate perspective on how a simple, yet robust compiler design might be achieved, in comparison to what's considered modern nowadays.
This is the rough roadmap. I will implement most of it by hand, since it is a very nice exercise for me, and I see potential to greatly improve on the reference implementation and allow for further collaboration on zero-knowledge proof implementations.
- set up an end-to-end build environment with continuous integration (CI)
- implement ZeroJava tokenizer
- implement ZeroJava abstract syntax tree (AST) parser
- implement zMIPS intermediate representation (IR) generator
- implement zMIPS assembly code generator
- implement zMIPS IR optimizer
- implement lightweight comparative fuzzing and benchmarking framework
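As a rough idea of the tokenizer step, here is a minimal sketch in Rust. The token kinds, the keyword subset and the punctuation set are assumptions chosen for illustration (ZeroJava is Java-like, but this is not its actual lexical specification), and `Token` and `tokenize` are hypothetical names.

```rust
/// Token kinds of the sketch (assumed, not the real ZeroJava token set).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Int(i64),
    Keyword(&'static str),
    Punct(char),
}

// Assumed keyword subset, for illustration only.
const KEYWORDS: [&str; 5] = ["class", "int", "if", "else", "while"];

// Assumed single-character punctuation, for illustration only.
const PUNCTUATION: &str = "{}()[];=+-*/<>!.,";

/// Turn source text into a flat token list, or report the first bad byte offset.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();

    while let Some(&(i, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Integer literal: consume the longest run of digits.
            let mut end = i;
            while let Some(&(j, d)) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                end = j + d.len_utf8();
                chars.next();
            }
            let n = src[i..end]
                .parse::<i64>()
                .map_err(|e| format!("byte {}: {}", i, e))?;
            tokens.push(Token::Int(n));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Identifier or keyword: letters, digits and underscores.
            let mut end = i;
            while let Some(&(j, d)) = chars.peek() {
                if !(d.is_ascii_alphanumeric() || d == '_') {
                    break;
                }
                end = j + d.len_utf8();
                chars.next();
            }
            let word = &src[i..end];
            match KEYWORDS.iter().copied().find(|k| *k == word) {
                Some(k) => tokens.push(Token::Keyword(k)),
                None => tokens.push(Token::Ident(word.to_string())),
            }
        } else if PUNCTUATION.contains(c) {
            tokens.push(Token::Punct(c));
            chars.next();
        } else {
            return Err(format!("byte {}: unexpected character '{}'", i, c));
        }
    }
    Ok(tokens)
}
```

For example, `tokenize("int x = 42;")` produces a keyword, an identifier, a punctuation token, an integer and another punctuation token. Keeping the tokenizer a plain function over `&str` means it can be fuzzed and benchmarked in isolation, which fits the last roadmap item.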
Starting with CI should make it easier to understand the workflow I am aiming for, without having to describe everything in detail right away. CI forces me to define a clear interface for automation, which incidentally might help others form a mental map of my workflow. Also, it's nice not having to worry about distribution later on.
This is a short description of all the files and directories in use by this project.
File name | Description |
---|---|
.gitignore | Git-specific globbing patterns for excluding files and directories from being tracked by Git. |
.vimrc | Configuration of the Vim text editor. I'll be working within different environments and am not always going to configure them from scratch. Vim luckily allows for defining configurations on a per-directory basis, which is why I'm shipping the configuration with this repository. |
ARCHITECTURE.md | Detailed description of design choices and implementation details |
bitbucket-pipelines.yaml | Bitbucket Pipelines CI service specification. Defines a sequence of shell commands to be executed when commits are pushed to the remote Git server. |
Cargo.toml | Rust project specification (for cargo build driver) |
Cargo.lock | Lockfile of cargo build driver (autogenerated) |
configure | Generated GNU Autoconf autoconfiguration script to initialize the build environment. This is intended for use by CI services, so that I can be flexible when choosing/switching build environments (basically a Linux container image) and get some verbosity for quickly debugging misconfigurations. |
configure.ac | Source file for GNU Autoconf for generating a (POSIX) shell autoconfiguration script to initialize the build environment. Every time the file changes, autoconf must be executed in order to regenerate the configure file. |
CHANGELOG.md | Log of changes (additions, fixes, removals) that happen in this project. It follows the Keep a Changelog convention. |
CONTRIBUTING.md | Describes the development workflow, as well as maintenance tasks and general conventions to follow when contributing to this project, which will only be me though (for the time being). |
docs/ | Collection of assets for documentation purposes. |
examples/ | Examples of actually applying the project to use-cases. |
LICENSE | License applicable to this project. |
Makefile | GNU Make specification, which serves as an interface to the cargo build driver (and other things) for CI services. It can also aid in getting an overview of common build/maintenance tasks. |
README.md | Project introduction and overview (what you are currently viewing). |
scripts/test.sh | Wrapper for cargo test that combines test execution and reporting in one step. |
scripts/todo.sh | Roughly quantifies (known) technical debt. This is a guard rail, should I get too excited with implementing and forget about resolving issues first. |
src/bin/ | Source code for auxiliary binaries. This is to provide a CLI to the program. |
src/lib.rs | Code entrypoint of this project. The project is intended as a library, so that it does not require a CLI when using it on headless platforms. All public modules of the library are defined in this file. |
tests/ | Directory containing unit and integration tests. I'm breaking with Rust convention a little, since I want a stronger separation of code and test code than is intended by the Rust team. Therefore I'm wrapping cargo test to make it behave the way I want it to, which may result in some confusing message output. I will adapt my approach over time. |
Kudos go to the creators of the Zilch framework. I'm just a UNIX sysadmin who is intrigued by the research and not a fan of Oracle.
D. Mouris and N. G. Tsoutsos, "Zilch: A Framework for Deploying Transparent Zero-Knowledge Proofs,"
in IEEE Transactions on Information Forensics and Security (TIFS), 2021, DOI: 10.1109/TIFS.2021.3074869
I stumbled upon this research while looking at benchmarks for zero-knowledge proof algorithms and wondering why the Zilch framework was (considerably) faster than other implementations. I got an answer, but honestly I no longer care about it. I'm just excited to do something with the Zilch framework, on my own terms.