Proposal: Bipartite Supervalidator Architecture #224
Open
elliottdehn wants to merge 8 commits into canton-foundation:main from
Separates CPU-bound crypto from IO-bound state management into two independently scalable machine classes. PoC benchmarks demonstrate 4x throughput and 7x latency improvement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Measured CPU breakdown via JDK Flight Recorder: crypto is 38.2% of CPU (Ed25519 signing 21.5%, ECDH 15.7%), while PostgreSQL is only 1.2% (IO-wait, not CPU). Validates the premise that crypto is the dominant offloadable bottleneck. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Isolated the SV workload by running the submitting participant in a separate JVM. Crypto rises to 42.7% of SV CPU (vs 38.2% all-in-one) because Daml engine/protobuf work lives on the submitter, not the SV. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The key insight: DB operations aren't slow (they're IO-wait), but they can't start until a core is free. On a monolithic node with cores saturated by ECIES, DB threads sit in the run queue. The bipartite split eliminates this contention — B's cores are almost idle, so DB threads schedule instantly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarking showed that JVM thread pool separation does NOT improve latency: even under CPU saturation, the OS scheduler doesn't enforce isolation between pools. The bipartite speedup comes from having more total CPU (8 cpus vs 2 cpus = 4x), not from eliminating scheduling contention between crypto and DB threads.

Updated the motivation, "Why Not Just Bigger Machines", and benchmark results sections to accurately describe:
- The throughput gain is proportional to added CPU capacity
- The latency improvement follows from reduced queueing with more cores
- The architectural value is cost-efficient horizontal scaling of stateless crypto capacity, not thread scheduling optimization
- Thread pool separation within a single JVM was tested and doesn't work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The smallest useful bipartite configuration (2 A nodes + 1 B node, 3 cpus total) delivers 6.4x throughput over 1-cpu monolithic — better than the CPU ratio alone. The split eliminates cache thrashing between crypto and IO workloads on the same core. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f75bbd0 to 2aa3f94
2aa3f94 to a620c64
Development Fund Proposal Submission
Proposal file: /proposals/bipartite-supervalidator-architecture.md
Summary
This proposal separates CPU-bound cryptographic operations (ECIES view encryption/decryption, Speedy reinterpretation) from IO-bound state management (BFT consensus, sequencer DB writes, ACS updates) into two independently scalable machine classes. A proof-of-concept using Canton's actual crypto primitives demonstrates 4x throughput improvement and 7x latency reduction at 32 concurrent transactions. The architecture is backward-compatible and opt-in — existing Daml applications require no changes.
Checklist
/proposals/
Notes for Reviewers
bipartite-poc/. It uses real JDK 21 crypto (ECDH P-256, AES-256-GCM, Ed25519) and real PostgreSQL writes — not simulated delays.
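For reviewers unfamiliar with the primitives named above, here is a minimal sketch of how ECDH P-256, AES-256-GCM, and Ed25519 can be combined using only standard JDK APIs. It is not taken from the PoC: the class name `CryptoSketch`, the method names, and the use of SHA-256 as a stand-in key derivation step are all assumptions made for illustration, and Canton's actual ECIES construction may differ.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.*;
import java.security.spec.ECGenParameterSpec;
import java.util.Arrays;

// Hypothetical illustration only; not the PoC's code.
public class CryptoSketch {

    // Generate an EC key pair on the P-256 curve (JDK name: secp256r1).
    static KeyPair p256() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
        gen.initialize(new ECGenParameterSpec("secp256r1"));
        return gen.generateKeyPair();
    }

    // ECDH key agreement, then hash the shared secret into a 256-bit AES key.
    // Assumption: SHA-256 stands in for a proper KDF such as HKDF.
    static SecretKeySpec deriveKey(PrivateKey priv, PublicKey pub) throws GeneralSecurityException {
        KeyAgreement ka = KeyAgreement.getInstance("ECDH");
        ka.init(priv);
        ka.doPhase(pub, true);
        byte[] key = MessageDigest.getInstance("SHA-256").digest(ka.generateSecret());
        return new SecretKeySpec(key, "AES");
    }

    // AES-256-GCM with a 128-bit authentication tag.
    static byte[] aesGcm(int mode, SecretKeySpec key, byte[] iv, byte[] input) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(input);
    }

    static byte[] sign(PrivateKey key, byte[] msg) throws GeneralSecurityException {
        Signature s = Signature.getInstance("Ed25519");
        s.initSign(key);
        s.update(msg);
        return s.sign();
    }

    static boolean verify(PublicKey key, byte[] msg, byte[] sig) throws GeneralSecurityException {
        Signature s = Signature.getInstance("Ed25519");
        s.initVerify(key);
        s.update(msg);
        return s.verify(sig);
    }

    // Encrypt to a recipient with an ephemeral key, sign the ciphertext,
    // then verify and decrypt on the recipient side.
    public static boolean roundTrip(String text) throws GeneralSecurityException {
        KeyPair recipient = p256();
        KeyPair ephemeral = p256();
        KeyPair signer = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);

        SecretKeySpec senderKey = deriveKey(ephemeral.getPrivate(), recipient.getPublic());
        byte[] ciphertext = aesGcm(Cipher.ENCRYPT_MODE, senderKey, iv, plain);
        byte[] signature = sign(signer.getPrivate(), ciphertext);

        boolean signatureOk = verify(signer.getPublic(), ciphertext, signature);
        SecretKeySpec recipientKey = deriveKey(recipient.getPrivate(), ephemeral.getPublic());
        byte[] decrypted = aesGcm(Cipher.DECRYPT_MODE, recipientKey, iv, ciphertext);

        return signatureOk && Arrays.equals(plain, decrypted);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("view payload"));
    }
}
```

Every step in `roundTrip` is pure computation over its inputs with no shared state, which is the property the proposal relies on when it describes crypto capacity as stateless and horizontally scalable.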