Description
Currently the ProgramArtifact and ContractFunctionArtifact contain the base64-encoded bytecode of a Program, where the serialisation is done by bincode-encoding the Program and gzip-compressing the bytes.
The Noir repo generates C++ code for the program and the witness stack using serde machinery, which is used in Barretenberg to deserialise the artifact.
The Program has two fields: a list of ACIR circuits and a list of Brillig bytecodes. Of these two, Barretenberg (bb) only needs the ACIR circuits; it never touches the unconstrained Brillig part. However, it still deserialises it, and therefore any change in the Brillig opcodes is a breaking change, as bincode provides no backwards compatibility. For example, a new opcode won't be recognised, resulting in an error, and a deleted opcode (unless it's the last one) ends up shifting the index of all subsequent enum members, which older readers do not expect.
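To make the failure mode concrete, here is a minimal sketch of the variant-index problem, assuming bincode 1.x and serde derive (the opcode names are illustrative):
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
enum NewOpcode { Add, Mul } // a newer writer that removed `Sub`

#[derive(Debug, PartialEq, Deserialize)]
enum OldOpcode { Add, Sub, Mul } // what an older reader still expects

fn main() {
    // bincode encodes the variant as its index: `Mul` is index 1 to the writer...
    let bytes = bincode::serialize(&NewOpcode::Mul).unwrap();
    // ...but index 1 still means `Sub` to the older reader.
    let decoded: OldOpcode = bincode::deserialize(&bytes).unwrap();
    assert_eq!(decoded, OldOpcode::Sub); // silently decodes to the wrong opcode
}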
Ideally we want to be able to:
- Add/remove Brillig opcodes without breaking bb
- Ignore the Brillig part in bb completely
- Remove ACIR opcodes without breaking bb (obviously we cannot add new opcodes it can't handle; it should error in that case)
We can consider the formats mentioned in #1125.
Protobuf
Protobuf requires us to maintain a .proto schema file, based on which we can generate Rust and C++ bindings with a library such as prost or protoc itself. In Protobuf everything is optional, which is how it maintains its backwards/forwards compatibility. We would have to manually maintain mappings between the Protobuf DTO structures and our domain models. To achieve the result where bb doesn't have to deserialise the data it doesn't care about, we can either a) have an intermediary shim that treats circuits and Brillig as vectors of bytes, and then do a second deserialisation step on the circuits, or b) have a reader message type that simply doesn't have a field for the Brillig part, thus ignoring it completely (Protobuf uses field numbers in its binary format). Both options are sketched below.
message ProgramBytes {
  repeated CircuitBytes functions = 1;
  repeated BrilligBytes unconstrained_functions = 2;
}

message CircuitBytes {
  bytes circuit = 1;
}

message BrilligBytes {
  bytes brillig = 1;
}

message AcirProgramBytes {
  repeated CircuitBytes functions = 1;
  reserved 2;
}
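For option a), the Rust side of the second deserialisation step could look roughly like this; ProgramBytes/CircuitBytes are assumed to be prost-generated from the schema above, and Circuit stands in for the existing serde-based domain type:
use prost::Message;

fn decode_circuits(bytes: &[u8]) -> Result<Vec<Circuit>, Box<dyn std::error::Error>> {
    // Step 1: decode the Protobuf envelope, which stays forwards compatible.
    let program = ProgramBytes::decode(bytes)?;
    // Step 2: deserialise only the circuits; the Brillig bytes stay opaque.
    program
        .functions
        .into_iter()
        .map(|f| Ok(bincode::deserialize::<Circuit>(&f.circuit)?))
        .collect()
}
For option b), bb would decode AcirProgramBytes from the same bytes; the Protobuf reader skips field 2 as an unknown field, so no extra code is needed.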
The drawback is that even if we keep generating domain types using the serde machinery to keep the Rust and the C++ models in sync, we would have to maintain the mappings in C++ as well.
Another general drawback of Protobuf is that it deserialises into DTOs, which we end up dropping after mapping them to domain types.
FlatBuffers
FlatBuffers is similar to Protobuf in that we have to maintain an .fbs schema file and everything tends to be optional, but it doesn't need to deserialise into DTOs; instead it offers an API to read from the byte slice directly, similarly to Cap'n Proto. The steps to serialise data look more involved in the examples than in Protobuf, like writing a String field to the builder first and then using the return value to make a struct, which is how it stitches together the references internally.
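The builder pattern looks roughly like this with the Rust flatbuffers crate; the table creation itself would come from .fbs-generated code, so it is only indicated in a comment:
use flatbuffers::FlatBufferBuilder;

fn main() {
    let mut builder = FlatBufferBuilder::new();
    // Children are written into the buffer first; the builder returns
    // offsets that the parent table refers to later.
    let name = builder.create_string("main");
    let code = builder.create_vector(&[0u8, 1, 2][..]);
    // With generated code there would now be something like:
    // let func = Function::create(&mut builder, &FunctionArgs {
    //     name: Some(name), code: Some(code) });
    builder.finish(code, None);
    let _bytes = builder.finished_data();
    let _ = name;
}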
The drawback again is having to maintain a schema and either mapping it to domain types on the C++ side (which loses some of the benefit of reading from byte slices directly), or rewriting all the methods that read the ACIR bytecode and convert it to constraints to work with the FlatBuffers API and deal with optionality.
The good news is that there is a schema-less version called FlexBuffers, which actually works with serde, so we can potentially have our cake and eat it too. It works by serialising the field names/enum tags, which inflates the bytecode size, but we can potentially reclaim that through the compression already in place.
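Since FlexBuffers plugs into serde, the existing derives should be enough; a minimal round-trip sketch, assuming the flexbuffers crate and an illustrative Program struct:
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Program {
    functions: Vec<Vec<u8>>,
    unconstrained_functions: Vec<Vec<u8>>,
}

fn main() {
    let program = Program { functions: vec![vec![1]], unconstrained_functions: vec![] };
    // Field names travel with the data, which is what buys schema-less
    // backwards compatibility (at the cost of size, pre-compression).
    let bytes = flexbuffers::to_vec(&program).unwrap();
    let _back: Program = flexbuffers::from_slice(&bytes).unwrap();
}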
CBOR
CBOR is like JSON but binary, a fork of MessagePack. It typically uses a BTreeMap to represent objects, so the field names are preserved, which makes it backwards compatible without requiring a schema file. serde would tag enums with their variant names, which would allow us to remove a variant without breaking older readers.
Similarly to FlexBuffers, we pay for this with increased bytecode size, with the compression potentially compensating for that. FlexBuffers is supposed to be better in that it reuses strings within the file rather than including them repeatedly.
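The behaviour bb needs falls out of serde's default of ignoring unknown fields in self-describing formats; a sketch assuming the ciborium crate, with illustrative types:
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct Program {
    functions: Vec<Vec<u8>>,
    unconstrained_functions: Vec<Vec<u8>>,
}

// A reader-side type with no Brillig field: the named
// `unconstrained_functions` entry is simply skipped on decode.
#[derive(Deserialize)]
struct AcirProgram {
    functions: Vec<Vec<u8>>,
}

fn main() {
    let program = Program { functions: vec![vec![1]], unconstrained_functions: vec![vec![2]] };
    let mut bytes = Vec::new();
    ciborium::into_writer(&program, &mut bytes).unwrap();
    let acir: AcirProgram = ciborium::from_reader(bytes.as_slice()).unwrap();
    assert_eq!(acir.functions, vec![vec![1]]);
}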
Two-step bincode
A simple approach that preserves most of the status quo is to introduce technical DTO structs such as these:
pub struct ProgramBytes {
    pub functions: Vec<CircuitBytes>,
    pub unconstrained_functions: Vec<BrilligBytes>,
}

pub struct CircuitBytes(Vec<u8>);
pub struct BrilligBytes(Vec<u8>);
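The two-step decode on the bb-facing side could then look like this sketch, assuming these structs derive Serialize/Deserialize, bincode 1.x, and an existing Circuit domain type:
fn decode_program(bytes: &[u8]) -> bincode::Result<Vec<Circuit>> {
    // Step 1: decode the envelope; the Brillig bytes are never interpreted.
    let program: ProgramBytes = bincode::deserialize(bytes)?;
    // Step 2: decode only the circuits.
    program
        .functions
        .into_iter()
        .map(|CircuitBytes(inner)| bincode::deserialize(&inner))
        .collect()
}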
Once again the drawback is not being able to remove an ACIR opcode without breaking readers.
128-bit integers
Note that out of the above, the CBOR standard doesn't cover 128-bit integers, nor do Protobuf or FlatBuffers have a type for them. By the looks of it the serde codegen already treats them as strings, but in Protobuf we would have to decide what to map them to (e.g. bytes).
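If we mapped them to bytes, one plausible choice (an assumption, not something settled here) is a fixed-width big-endian encoding:
fn u128_to_proto_bytes(x: u128) -> Vec<u8> {
    // 16 big-endian bytes, so numeric order matches lexicographic byte order.
    x.to_be_bytes().to_vec()
}

fn u128_from_proto_bytes(bytes: &[u8]) -> Option<u128> {
    Some(u128::from_be_bytes(bytes.try_into().ok()?))
}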
Steps
I'll start by comparing the bytecode sizes in aztec-packages using the baseline bincode vs self-describing formats such as CBOR and FlexBuffers.