| 1 | +# RFC-0139: Faster Erasure Coding |
| 2 | + |
| 3 | +| | | |
| 4 | +| --------------- | ------------------------------------------------------------------------------------------- | |
| 5 | +| **Start Date** | 7 March 2025 | |
| 6 | +| **Description** | Faster algorithm for Data Availability Layer | |
| 7 | +| **Authors** | ordian | |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +This RFC proposes changes to the erasure coding algorithm and the method for computing the erasure root on Polkadot to improve performance of both processes. |
| 12 | + |
| 13 | +## Motivation |
| 14 | + |
| 15 | +The Data Availability (DA) Layer in Polkadot provides a foundation for |
| 16 | +shared security, enabling Approval Checkers and Collators to download |
| 17 | +Proofs-of-Validity (PoV) for security and liveness purposes respectively. |
| 18 | +As the number of parachains and PoV sizes increase, optimizing the performance |
| 19 | +of the DA layer becomes increasingly critical. |
| 20 | + |
| 21 | +[RFC-47](https://github.com/polkadot-fellows/RFCs/blob/main/text/0047-assignment-of-availability-chunks.md) |
| 22 | +proposed enabling systematic chunk recovery for Polkadot's DA to improve |
| 23 | +efficiency and reduce CPU overhead. However, it only helps when there is |
| 24 | +good network connectivity to a specific one-third of validators (modulo some |
| 25 | +backup tolerance on backers), and it still requires re-encoding. We therefore |
| 26 | +need to ensure the system can handle the load in the worst-case scenario. |
| 27 | +The proposed change is orthogonal to RFC-47 and can be used in conjunction with it. |
| 28 | + |
| 29 | +Since RFC-47 already requires a breaking protocol change (including changes to |
| 30 | +collator nodes), we propose bundling another performance-enhancing breaking |
| 31 | +change that addresses the CPU bottleneck in the erasure coding process, while using |
| 32 | +a separate node feature (the `NodeFeatures` part of `HostConfiguration`) for its activation. |
| 33 | + |
| 34 | +## Stakeholders |
| 35 | + |
| 36 | +- Infrastructure providers (operators of validator/collator nodes) |
| 37 | + will need to upgrade their client version in a timely manner |
| 38 | + |
| 39 | +## Explanation |
| 40 | + |
| 41 | +We propose two specific changes: |
| 42 | + |
| 43 | +1. Switch to the erasure coding algorithm described in the Graypaper, |
| 44 | +Appendix H. SIMD implementations of this algorithm are available in: |
| 45 | + |
| 46 | + - [Rust](https://github.com/AndersTrier/reed-solomon-simd) |
| 47 | + - [C++](https://github.com/catid/leopard) |
| 48 | + - [Go](https://github.com/celestiaorg/go-leopard) |
| 49 | + |
| 50 | +2. Replace the Merkle Patricia Trie with a Binary Merkle Tree for computing the erasure root. |
| 51 | + |
| 52 | +The reference root merklization implementation can be found [here](https://github.com/paritytech/erasure-coding/blob/512e77472beb877fe0881a857623d54d97b82bc4/src/merklize.rs#L9-L197). |
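
To illustrate the second change, here is a minimal sketch of a binary Merkle tree over erasure-coded chunks, with proof generation and verification. It is not the reference implementation linked above: all names (`build_levels`, `prove`, `verify`) are hypothetical, the 8-byte `DefaultHasher` digest is only a std-only stand-in for a real cryptographic hash, and promoting an unpaired node unchanged to the next level is an assumed padding rule that may differ from the reference.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// ASSUMPTION: stand-in digest. A real erasure root would use a
/// cryptographic hash with a wider output, not this 8-byte std hasher.
type Hash = [u8; 8];

/// Hash a chunk as a leaf (prefix byte 0 separates leaves from inner nodes).
fn leaf_hash(chunk: &[u8]) -> Hash {
    let mut h = DefaultHasher::new();
    h.write(&[0u8]);
    h.write(chunk);
    h.finish().to_be_bytes()
}

/// Hash two children into their parent (prefix byte 1).
fn node_hash(left: &Hash, right: &Hash) -> Hash {
    let mut h = DefaultHasher::new();
    h.write(&[1u8]);
    h.write(left);
    h.write(right);
    h.finish().to_be_bytes()
}

/// Build every level of the tree: leaves first, root last.
/// ASSUMPTION: an unpaired node is promoted unchanged to the next level.
fn build_levels(chunks: &[Vec<u8>]) -> Vec<Vec<Hash>> {
    assert!(!chunks.is_empty(), "need at least one chunk");
    let mut levels = vec![chunks.iter().map(|c| leaf_hash(c)).collect::<Vec<Hash>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<Hash> = prev
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => node_hash(l, r),
                [single] => *single,
                _ => unreachable!(),
            })
            .collect();
        levels.push(next);
    }
    levels
}

/// Collect the sibling hashes on the path from leaf `index` to the root.
fn prove(levels: &[Vec<Hash>], mut index: usize) -> Vec<Hash> {
    let mut proof = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = index ^ 1;
        if sibling < level.len() {
            proof.push(level[sibling]);
        }
        index /= 2;
    }
    proof
}

/// Recompute the root from a chunk and its proof; `width` is the leaf count.
fn verify(root: &Hash, chunk: &[u8], mut index: usize, mut width: usize, proof: &[Hash]) -> bool {
    if index >= width {
        return false;
    }
    let mut acc = leaf_hash(chunk);
    let mut siblings = proof.iter();
    while width > 1 {
        if (index ^ 1) < width {
            let sib = match siblings.next() {
                Some(s) => s,
                None => return false,
            };
            acc = if index % 2 == 0 { node_hash(&acc, sib) } else { node_hash(sib, &acc) };
        }
        index /= 2;
        width = (width + 1) / 2;
    }
    siblings.next().is_none() && acc == *root
}
```

In this sketch a proof holds at most one sibling hash per level, so its length is bounded by the base-2 logarithm of the chunk count, rounded up.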
| 53 | + |
| 54 | +### Upgrade path |
| 55 | + |
| 56 | +We propose adding support for the new erasure coding scheme on both validator and collator sides without activating it until: |
| 57 | +1. All validators have upgraded |
| 58 | +2. Most collators have upgraded |
| 59 | + |
| 60 | +Block-authoring collators that remain on the old version will be unable to produce valid candidates until they upgrade. Parachain full nodes will continue to function normally without changes. |
| 61 | + |
| 62 | +An alternative approach would be to allow collators to opt-in to the new erasure |
| 63 | +coding scheme using a reserved field in the candidate receipt. This would allow |
| 64 | +faster deployment for most parachains but would add complexity. |
| 65 | + |
| 66 | +Given that there is currently no urgent demand for supporting larger PoVs, we recommend prioritizing simplicity while leaving room for future-proofing changes. |
| 67 | + |
| 68 | +In short, the following steps are proposed: |
| 69 | +1. Implement the changes and wait for most collators to upgrade. |
| 70 | +2. Activate RFC-47 via `Configuration::set_node_feature` runtime change. |
| 71 | +3. Activate the new erasure coding scheme using another `Configuration::set_node_feature` runtime change. |
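
The activation in step 3 could be consumed on the node side roughly as sketched below. This is only an illustration: the bit index `FAST_ERASURE_CODING_BIT`, the `ErasureScheme` enum, and the representation of node features as a slice of booleans are all hypothetical and are not taken from the Polkadot codebase.

```rust
/// Which erasure coding scheme a node should use for a given session.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ErasureScheme {
    /// The scheme in use before this RFC is activated.
    Legacy,
    /// The scheme proposed in this RFC.
    Fast,
}

/// HYPOTHETICAL bit position; the real index would be assigned in the runtime.
const FAST_ERASURE_CODING_BIT: usize = 7;

/// A missing bit (shorter feature vector) is treated as "not enabled".
fn is_enabled(node_features: &[bool], bit: usize) -> bool {
    node_features.get(bit).copied().unwrap_or(false)
}

/// Pick the scheme from the session's node features (modelled as a bool slice).
fn scheme_for_session(node_features: &[bool]) -> ErasureScheme {
    if is_enabled(node_features, FAST_ERASURE_CODING_BIT) {
        ErasureScheme::Fast
    } else {
        ErasureScheme::Legacy
    }
}
```

Treating an absent bit as disabled means that nodes keep using the old scheme until the runtime change explicitly sets the feature.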
| 72 | + |
| 73 | +## Drawbacks |
| 74 | + |
| 75 | +Bundling this breaking change with RFC-47 might reset progress in updating collators. However, the omni node initiative should help mitigate this issue. |
| 76 | + |
| 77 | +## Testing, Security, and Privacy |
| 78 | + |
| 79 | +Testing is needed to ensure binary compatibility across implementations in multiple languages. |
| 80 | + |
| 81 | +## Performance and Compatibility |
| 82 | + |
| 83 | +### Performance |
| 84 | + |
| 85 | +According to [benchmarks](https://gist.github.com/ordian/0af2822e20bf905d53410a48dc122fd0): |
| 86 | +- A proper SIMD implementation of Reed-Solomon is 3-4× faster for encoding and up to 9× faster for full decoding |
| 87 | +- Binary Merkle Trees produce proofs that are 4× smaller and slightly faster to generate and verify |
| 88 | + |
| 89 | +### Compatibility |
| 90 | + |
| 91 | +This requires a breaking change that can be coordinated following the same approach as in RFC-47. |
| 92 | + |
| 93 | +## Prior Art and References |
| 94 | + |
| 95 | +JAM already utilizes the same optimizations described in the Graypaper. |
| 96 | + |
| 97 | +## Unresolved Questions |
| 98 | + |
| 99 | +None. |
| 100 | + |
| 101 | +## Future Directions and Related Material |
| 102 | + |
| 103 | +Future improvements could include: |
| 104 | +- Using ZK proofs to eliminate the need for re-encoding data to verify correct encoding |
| 105 | +- Removing the requirement for collators to compute the erasure root for the collator protocol |