Description
Currently the ProgramArtifact and ContractFunctionArtifact contain the base64-encoded bytecode of a Program, where the serialisation is done by bincode-encoding the Program and gzip-compressing the bytes.
The Noir repo generates C++ code for the program and the witness stack using serde machinery, which is used in Barretenberg to deserialise the artifact.
The Program has two fields: a list of ACIR circuits and a list of Brillig bytecodes. Of these two, Barretenberg (bb) only needs the ACIR circuits; it never touches the unconstrained Brillig part. However, it still deserialises it, and therefore any change in the Brillig opcodes is a breaking change, as bincode provides no backwards compatibility. For example, a new opcode won't be recognised, resulting in an error, and a deleted opcode (unless it's the last one) ends up shifting the index of all subsequent enum members, which older readers do not expect.
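To make the failure mode concrete, here is a minimal sketch of the variant-index problem, assuming bincode 1.x and serde derive (the opcode names are illustrative):
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
enum NewOpcode { Add, Mul } // a newer writer that removed `Sub`

#[derive(Debug, PartialEq, Deserialize)]
enum OldOpcode { Add, Sub, Mul } // what an older reader still expects

fn main() {
    // bincode encodes the variant as its index: `Mul` is index 1 to the writer...
    let bytes = bincode::serialize(&NewOpcode::Mul).unwrap();
    // ...but index 1 still means `Sub` to the older reader.
    let decoded: OldOpcode = bincode::deserialize(&bytes).unwrap();
    assert_eq!(decoded, OldOpcode::Sub); // silently decodes to the wrong opcode
}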
Ideally we want to be able to:
- Add/remove Brillig opcodes without breaking bb
- Ignore the Brillig part in bb completely
- Remove ACIR opcodes without breaking bb (obviously we cannot add new opcodes it can't handle; it should error in that case)
We can consider the formats mentioned in #1125.
Protobuf
Protobuf requires us to maintain a .proto schema file, based on which we can generate Rust and C++ bindings with a library such as prost or protoc itself. In Protobuf everything is optional, which is how it maintains its backwards/forwards compatibility. We would have to manually maintain mappings between the Protobuf DTO structures and our domain models. To achieve the result where bb doesn't have to deserialise the data it doesn't care about, we can either a) have an intermediary shim that treats circuits and Brillig as vectors of bytes, and then do a second deserialisation step on the circuits, or b) have a reader message type that simply doesn't have a field for the Brillig part, thus ignoring it completely (Protobuf uses field numbers in its binary format). Both options are sketched below.
message ProgramBytes {
  repeated CircuitBytes functions = 1;
  repeated BrilligBytes unconstrained_functions = 2;
}

message CircuitBytes {
  bytes circuit = 1;
}

message BrilligBytes {
  bytes brillig = 1;
}

message AcirProgramBytes {
  repeated CircuitBytes functions = 1;
  reserved 2;
}
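For option a), the Rust side of the second deserialisation step could look roughly like this; ProgramBytes/CircuitBytes are assumed to be prost-generated from the schema above, and Circuit stands in for the existing serde-based domain type:
use prost::Message;

fn decode_circuits(bytes: &[u8]) -> Result<Vec<Circuit>, Box<dyn std::error::Error>> {
    // Step 1: decode the Protobuf envelope, which stays forwards compatible.
    let program = ProgramBytes::decode(bytes)?;
    // Step 2: deserialise only the circuits; the Brillig bytes stay opaque.
    program
        .functions
        .into_iter()
        .map(|f| Ok(bincode::deserialize::<Circuit>(&f.circuit)?))
        .collect()
}
For option b), bb would decode AcirProgramBytes from the same bytes; the Protobuf reader skips field 2 as an unknown field, so no extra code is needed.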
The drawback is that even if we keep generating domain types using the serde machinery to keep the Rust and the C++ models in sync, we would have to maintain the mappings in C++ as well.
Another general drawback of Protobuf is that it deserialises into DTOs, which we end up dropping after mapping them to domain types.
FlatBuffers
FlatBuffers is similar to Protobuf in that we have to maintain an .fbs schema file and everything tends to be optional, but it doesn't need to deserialise into DTOs; instead it offers an API to read from the byte slice directly, similarly to Cap'n Proto. The steps to serialise data look more involved in the examples than in Protobuf, like writing a String field to the builder first and then using the return value to make a struct, which is how it stitches together the references internally.
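The builder pattern looks roughly like this with the Rust flatbuffers crate; the table creation itself would come from .fbs-generated code, so it is only indicated in a comment:
use flatbuffers::FlatBufferBuilder;

fn main() {
    let mut builder = FlatBufferBuilder::new();
    // Children are written into the buffer first; the builder returns
    // offsets that the parent table refers to later.
    let name = builder.create_string("main");
    let code = builder.create_vector(&[0u8, 1, 2][..]);
    // With generated code there would now be something like:
    // let func = Function::create(&mut builder, &FunctionArgs {
    //     name: Some(name), code: Some(code) });
    builder.finish(code, None);
    let _bytes = builder.finished_data();
    let _ = name;
}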
The drawback again is having to maintain a schema and either mapping it to domain types on the C++ side (which loses some of the benefit of reading from byte slices directly), or rewriting all the methods that read the ACIR bytecode and convert it to constraints to work with the FlatBuffers API and deal with optionality.
The good news is that there is a schema-less version called FlexBuffers, which actually works with serde, so we can potentially have our cake and eat it too. It works by serialising the field names/enum tags, which inflates the bytecode size, but we can potentially reclaim that through the compression already in place.
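Since FlexBuffers plugs into serde, the existing derives should be enough; a minimal round-trip sketch, assuming the flexbuffers crate and an illustrative Program struct:
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Program {
    functions: Vec<Vec<u8>>,
    unconstrained_functions: Vec<Vec<u8>>,
}

fn main() {
    let program = Program { functions: vec![vec![1]], unconstrained_functions: vec![] };
    // Field names travel with the data, which is what buys schema-less
    // backwards compatibility (at the cost of size, pre-compression).
    let bytes = flexbuffers::to_vec(&program).unwrap();
    let _back: Program = flexbuffers::from_slice(&bytes).unwrap();
}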
CBOR
CBOR is like JSON but binary, a fork of MessagePack. It typically uses a BTreeMap to represent objects, so the field names are preserved, which makes it backwards compatible without requiring a schema file. serde would tag enums with their variant names, which would allow us to remove a variant without breaking older readers.
Similarly to FlexBuffers, we pay for this with increased bytecode size, with the compression potentially compensating for that. FlexBuffers is supposed to be better in that it reuses strings within the file rather than including them repeatedly.
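The behaviour bb needs falls out of serde's default of ignoring unknown fields in self-describing formats; a sketch assuming the ciborium crate, with illustrative types:
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct Program {
    functions: Vec<Vec<u8>>,
    unconstrained_functions: Vec<Vec<u8>>,
}

// A reader-side type with no Brillig field: the named
// `unconstrained_functions` entry is simply skipped on decode.
#[derive(Deserialize)]
struct AcirProgram {
    functions: Vec<Vec<u8>>,
}

fn main() {
    let program = Program { functions: vec![vec![1]], unconstrained_functions: vec![vec![2]] };
    let mut bytes = Vec::new();
    ciborium::into_writer(&program, &mut bytes).unwrap();
    let acir: AcirProgram = ciborium::from_reader(bytes.as_slice()).unwrap();
    assert_eq!(acir.functions, vec![vec![1]]);
}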
Two-step bincode
A simple approach that preserves most of the status quo is to introduce technical DTO structs such as these:
pub struct ProgramBytes {
    pub functions: Vec<CircuitBytes>,
    pub unconstrained_functions: Vec<BrilligBytes>,
}

pub struct CircuitBytes(Vec<u8>);
pub struct BrilligBytes(Vec<u8>);
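The two-step decode on the bb-facing side could then look like this sketch, assuming these structs derive Serialize/Deserialize, bincode 1.x, and an existing Circuit domain type:
fn decode_program(bytes: &[u8]) -> bincode::Result<Vec<Circuit>> {
    // Step 1: decode the envelope; the Brillig bytes are never interpreted.
    let program: ProgramBytes = bincode::deserialize(bytes)?;
    // Step 2: decode only the circuits.
    program
        .functions
        .into_iter()
        .map(|CircuitBytes(inner)| bincode::deserialize(&inner))
        .collect()
}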
Once again the drawback is not being able to remove an ACIR opcode without breaking readers.
128-bit integers
Note that out of the above, the CBOR standard doesn't cover 128-bit integers, nor do Protobuf or FlatBuffers have a type for them. By the looks of it the serde codegen already treats them as strings, but in Protobuf we would have to decide what to map them to (e.g. bytes).
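If we mapped them to bytes, one plausible choice (an assumption, not something settled here) is a fixed-width big-endian encoding:
fn u128_to_proto_bytes(x: u128) -> Vec<u8> {
    // 16 big-endian bytes, so numeric order matches lexicographic byte order.
    x.to_be_bytes().to_vec()
}

fn u128_from_proto_bytes(bytes: &[u8]) -> Option<u128> {
    Some(u128::from_be_bytes(bytes.try_into().ok()?))
}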
Steps
I'll start by comparing the bytecode sizes in aztec-packages using the baseline bincode vs self-describing formats such as CBOR and FlexBuffers.