Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
a15321e
update spec: vote even after noarization and finalization
dnkolegov-ar Apr 15, 2026
9dbe61c
poc of the replayer with local support
dnkolegov-ar Apr 16, 2026
9976f76
fix replayer: remove selv votes adding
dnkolegov-ar Apr 16, 2026
d6586f1
fix makefile
dnkolegov-ar Apr 16, 2026
f118317
new event based tracer poc
dnkolegov-ar Apr 17, 2026
b1ef7b9
stage 1: wire up canonical bins
dnkolegov-ar Apr 17, 2026
447b99e
stage 2: add canonical Trace-recording pipeline alongside legacy
dnkolegov-ar Apr 17, 2026
82661a3
stage 3: port static_honest to emit canonical Trace
dnkolegov-ar Apr 17, 2026
50754f0
stage 4: port itf_to_trace + trace_to_quint to canonical Trace
dnkolegov-ar Apr 17, 2026
853527a
stage 5: implement Trace-native trace_mutator driver
dnkolegov-ar Apr 17, 2026
67a804a
stage 6: delete TraceData; migrate TLC-honest + ist/apalache/tlc_chec…
dnkolegov-ar Apr 17, 2026
82cc1e7
stage 7: rename canonical bins + symbols to plain names
dnkolegov-ar Apr 17, 2026
b2ee1df
stage 8: regenerate canonical fixtures + add bootstrap hook
dnkolegov-ar Apr 17, 2026
f9c1008
stage 9: update makefiles for Trace-native pipeline
dnkolegov-ar Apr 17, 2026
2f0d500
encoder: fix Quint rendering for non-round-robin leaders,
dnkolegov-ar Apr 17, 2026
0f10ca2
quint_model: extract ITF state from split store_* variables
dnkolegov-ar Apr 17, 2026
20823f8
record: derive keys via Runner::seeded(0); add replay gate in validat…
dnkolegov-ar Apr 17, 2026
6c6c570
record: apply seeded-fixture fix to Byzantine/twins paths + style nits
dnkolegov-ar Apr 17, 2026
1a1fb0a
tlc.sh: bump default JVM heap to 16g; make heap/stack env-overridable
dnkolegov-ar Apr 17, 2026
03c5a8b
tlc: append {"reset": true} terminator on every /execute POST
dnkolegov-ar Apr 17, 2026
69b4117
docs: add trace.md explaining the canonical Trace format + Quint enco…
dnkolegov-ar Apr 17, 2026
755837b
restore trace selection strategies; harden mutator refill
dnkolegov-ar Apr 17, 2026
bc4342f
trace_mutator: graceful queue exhaustion; fair per-seed budget
dnkolegov-ar Apr 17, 2026
70792d1
trace_mutator: port 10 legacy mutations + fold canonical.rs into mod.rs
dnkolegov-ar Apr 17, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 16 additions & 1 deletion consensus/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ commonware-utils.workspace = true
futures.workspace = true
rand.workspace = true
rand_core.workspace = true
serde = { workspace = true, optional = true, features = ["derive"] }
serde_json = { workspace = true, optional = true }
thiserror.workspace = true

[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
Expand All @@ -45,11 +47,13 @@ tracing.workspace = true

[dev-dependencies]
commonware-conformance.workspace = true
commonware-consensus = { path = ".", features = ["mocks"] }
commonware-consensus = { path = ".", features = ["mocks", "replay"] }
commonware-resolver = { workspace = true, features = ["mocks"] }
commonware-runtime = { workspace = true, features = ["test-utils"] }
rand_chacha.workspace = true
rstest.workspace = true
serde = { workspace = true, features = ["derive"] }
serde_json.workspace = true
tracing-subscriber.workspace = true

[package.metadata.cargo-udeps.ignore]
Expand All @@ -64,6 +68,17 @@ mocks = [
"commonware-p2p/mocks",
"commonware-resolver/mocks",
]
# Record and replay simplex traces as unit tests.
#
# Enables `src/simplex/replay/` and the cfg-gated hooks it drives
# (`voter::Message::{Proposed,Replayed}`, `Mailbox::{proposed,replayed}`,
# `Engine::voter_mailbox`). Pulls in `mocks` so the mock reporter is
# available to the replay driver.
replay = [
"mocks",
"dep:serde",
"dep:serde_json",
]
arbitrary = [
"commonware-codec/arbitrary",
"commonware-coding/arbitrary",
Expand Down
40 changes: 31 additions & 9 deletions consensus/fuzz/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,9 @@ cargo-fuzz = true

[dependencies]
arbitrary = { workspace = true, features = ["derive"] }
bytes.workspace = true
commonware-codec.workspace = true
commonware-consensus = { workspace = true, features = ["mocks", "arbitrary"] }
commonware-consensus = { workspace = true, features = ["mocks", "arbitrary", "replay"] }
commonware-cryptography.workspace = true
commonware-macros.workspace = true
commonware-math.workspace = true
Expand Down Expand Up @@ -177,29 +178,50 @@ doc = false
bench = false

[[bin]]
name = "replay_trace"
path = "src/bin/replay_trace.rs"
name = "itf_to_trace"
path = "src/bin/itf_to_trace.rs"
test = false
doc = false
bench = false

[[bin]]
name = "validate_trace_corpus"
path = "src/bin/validate_trace_corpus.rs"
name = "ist"
path = "src/bin/ist.rs"
test = false
doc = false
bench = false

[[bin]]
name = "ist"
path = "src/bin/ist.rs"
name = "tlc_check"
path = "src/bin/tlc_check.rs"
test = false
doc = false
bench = false

[[bin]]
name = "itf_to_trace"
path = "src/bin/itf_to_trace.rs"
name = "trace_mutator"
path = "src/bin/trace_mutator.rs"
test = false
doc = false
bench = false

[[bin]]
name = "replay_trace"
path = "src/bin/replay_trace.rs"
test = false
doc = false
bench = false

[[bin]]
name = "validate_trace_corpus"
path = "src/bin/validate_trace_corpus.rs"
test = false
doc = false
bench = false

[[bin]]
name = "generate_seeds"
path = "src/bin/generate_seeds.rs"
test = false
doc = false
bench = false
Expand Down
267 changes: 267 additions & 0 deletions consensus/fuzz/docs/trace.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
# The canonical `Trace` format and how it becomes Quint

## 1. On-disk shape

A `Trace` is a JSON document with three top-level fields:

```jsonc
{
"topology": { "n": 4, "faults": 0, "epoch": 0,
"namespace": "636f6e73656e7375735f66757a7a",
"timing": { "leader_timeout_ms": 1000, /* ... */ } },
"events": [ /* ... ordered event stream ... */ ],
"expected": { "nodes": [ /* per-correct-replica observable state */ ] }
}
```

- **`topology`** pins down the world: how many replicas (`n`), how many
are Byzantine (`faults` — indices `0..faults`), the signing namespace,
the epoch, and the timeout knobs. This is everything `replay(&Trace)`
and `rehydrate_keys(&Topology)` need to reconstruct an
identical-signing replica set.
- **`events`** is the entire causal history — not just network traffic,
but also locally-authored actions and timeouts. Four variants, in this
order of abstraction (low → high):
- `Deliver { to, from, msg: Wire }` — a message arrived at `to`. `msg`
is either `Wire::Vote(signed vote)` or `Wire::Cert(signed
certificate)`, both with real cryptographic payloads as hex.
- `Construct { node, vote }` — `node` locally produced a signed vote
(notarize/nullify/finalize). This is fundamentally different from
`Deliver`: it says "this replica authored this vote", not "this
replica received a vote".
- `Propose { leader, proposal }` — the leader of a view produced a
fresh proposal. The `Construct(Notarize)` that immediately follows
carries the leader's own notarize for that proposal.
- `Timeout { node, view, reason }` — a replica's leader-timeout or
certification-timeout fired.
- **`expected`** is a `Snapshot`: per-correct-replica observable state
(`notarizations`, `nullifications`, `finalizations`, `certified`,
per-view signer sets for each vote kind, and `last_finalized`).
Byzantine indices contribute no `NodeSnapshot`. The snapshot lets
`replay_trace` check replay-equivalence without re-running Quint.

## 2. How events are produced

- `fuzz/src/tracing/record.rs` wraps a live 4-node run:
- A `Recorder::new(participants)` is shared across all engines.
- Each replica's vote receiver/sender is wrapped by
`RecordingReceiver` / `RecordingSender`; its application is wrapped
by `RecordingApp`.
- `RecordingSender::send` pushes an `Event::Construct` before
forwarding (this replica authored this vote).
- `RecordingReceiver::recv` pushes an `Event::Deliver` after decoding
(this replica received this message from that peer).
- `RecordingApp::on_propose` pushes an `Event::Propose` when the app
emits a proposal.
- The keyset must match what `rehydrate_keys` computes — we
pre-derive via a nested `Runner::seeded(0)` so the fuzz runtime's
FuzzRng doesn't bias key derivation.
- `recorder.freeze(topology, snapshot)` bakes the collected events +
provided snapshot into a `Trace`.
- `static_honest.rs` synthesizes the same event shape analytically
without running the engine (for fast fuzz-seed generation).

## 3. Quint encoding is a three-stage pipeline

All three stages live in `fuzz/src/tracing/encoder.rs`.

### Stage A: `build_block_map_from_events(&events)`

Scans every `Event::Propose` and
`Event::Deliver { msg: Vote(Notarize|Finalize) | Cert(Notarization|Finalization) }`,
collects each distinct proposal-payload hex (`Sha256Digest` → 64-char
hex), and assigns each one the name `val_b0`, `val_b1`, … in first-seen
order. Returns `Vec<(hex, name)>`.

Quint can't work with 64-char hex digests — it has a small finite
`VALID_PAYLOADS` set. This map is the bijection.

### Stage B: `lower_events_to_actions(&events, &block_map, faults) -> Vec<ActionItem>`

A 1:1 lowering — no causal reconstruction, just direct per-event
translation with two mandatory filters for `faults > 0`:

| Event | ActionItem | Notes |
|---|---|---|
| `Propose { leader, proposal }` | `Propose { leader, view, parent_view, payload }` | **Dropped** if `leader < faults` (Quint doesn't model Byzantine replicas; the proposal gets introduced by the following `Construct`). |
| `Construct { node, Vote::Notarize(n) }` | `SendNotarizeVote { view, parent_view, payload, sig }` | Signer is the signed vote's `signer()`, not `node`. Byzantine signers still emit this — it's a pure network barrier. |
| `Construct { node, Vote::Nullify(n) }` | `SendNullifyVote` | |
| `Construct { node, Vote::Finalize(f) }` | `SendFinalizeVote` | |
| `Deliver { to, from, Wire::Vote(Notarize) }` | `OnNotarize { receiver, view, parent_view, payload, sig }` | **Dropped** if `to < faults` (Byzantine receivers have no modeled state; `replica_state.get("n0")` would panic). |
| `Deliver { to, from, Wire::Vote(Nullify) }` | `OnNullify` | Same filter. |
| `Deliver { to, from, Wire::Vote(Finalize) }` | `OnFinalize` | Same filter. |
| `Deliver { to, from, Wire::Cert(c) }` | `SendCertificate` (on first sighting of this `(ghost_sender, cert)`) + `OnCertificate { receiver, cert }` | `SendCertificate` is a network-availability barrier and fires even for Byzantine receivers; only `OnCertificate` is gated. |
| `Timeout { ... }` | — | Quint models timeouts indirectly via the nullify votes that follow; no direct `on_timeout` is emitted. |

### Stage C: `render_quint_from_actions(&cfg, &events, &block_map, &proposals, &actions, &reporter_states) -> String`

Builds the `.qnt` test module. Rough skeleton:

```quint
module tests {
import types.* from "../types"
import defs.* from "../defs"
import option.* from "../option"
import automaton(CERTIFY_DOMAIN = Set("val_b0", ...)) as app from "../automaton"
import replica(
N = 4, F = 1, Q = 3,
CORRECT = Set("n1","n2","n3"),
BYZANTINE = Set("n0"),
REPLICA_KEYS = Map("n0"->"n0", "n1"->"n1", ...),
VIEWS = 1.to(19),
VALID_PAYLOADS = Set("val_b0","val_b1",...),
INVALID_PAYLOADS = Set(),
ACTIVITY_TIMEOUT = 10
).* from "../replica"

// One named value per (view, parent, block_name) seen in events:
pure val proposal_v1_p0_val_b0 = { payload: "val_b0", view: 1, parent: 0 }
pure val proposal_v2_p1_val_b1 = { payload: "val_b1", view: 2, parent: 1 }
// ...

// Per-replica certify-policy map:
pure val CERTIFY_POLICY = Set(GENESIS_PAYLOAD, "val_b0", /* ... */)
pure val CERTIFY_CUSTOM = Replicas.mapBy(_ => CERTIFY_POLICY)

// Helper actions (see below for send_*_vote semantics).
action send_notarize_vote(vote: NotarizeVote) = all { /* ... */ }
action send_nullify_vote(vote: NullifyVote) = all { /* ... */ }
action send_finalize_vote(vote: FinalizeVote) = all { /* ... */ }
action send_certificate(cert: Certificate) = all { /* ... */ }

// Action chain, chunked into trace_part_NN for readability:
action trace_part_00 =
initWithLeaderAndCertify(
Map(0->"n1", 1->"n3", 2->"n3", 3->"n0", 4->"n1", 5->"n2"),
CERTIFY_CUSTOM
)
.then(propose("n3", "val_b0", 0))
.then(send_notarize_vote({ proposal: proposal_v1_p0_val_b0, sig: "n3" }))
.then(on_notarize("n1", { proposal: proposal_v1_p0_val_b0, sig: "n3" }))
.then(on_notarize("n2", { proposal: proposal_v1_p0_val_b0, sig: "n3" }))
// ...
.expect(safe_invariants)

run traceTest =
trace_part_02
.expect(safe_invariants)
.expect(replica_state.get("n1").last_finalized >= 1)
.expect(replica_state.get("n2").last_finalized >= 1)
.expect(replica_state.get("n3").last_finalized >= 1)
}
```

Three non-obvious details in this stage:

1. **Leader map is read from the trace, not computed.**
`build_leader_map_to(cfg, max_view, events)` walks every
`Event::Propose` and records `view -> leader.get()`; it only falls
back to `(epoch + view) % n` for views that have no recorded
`Propose`. This matters because twins scenarios / Random / VRF
electors don't follow round-robin.

2. **`send_*_vote` helpers are not pure network barriers** — they also
mirror the vote into the signer's local vote store when the signer
is correct:
```quint
store_notarize_votes' = if (CORRECT.contains(vote.sig))
store_notarize_votes.set(vote.sig,
store_notarize_votes.get(vote.sig).union(Set(vote)))
else store_notarize_votes
```
This matches the Rust model: `Event::Construct` for a correct node
is "this replica locally authored a vote", which updates its state,
not just the network. Without this, `n1`'s self-nullify would be
invisible to `n1`'s own quorum check.

3. **`send_certificate` is emitted once per
`(ghost_sender, dedup_key)`**, not once per delivery. A cert may be
`Deliver`ed to all four replicas; we want one `send_certificate`
into the network and one `on_certificate` per correct receiver.

## 4. Example: one view of an honest 4-node trace

Recorded events, in order, for view 1 with leader n2:

```
Propose { leader: 2, proposal { view: 1, parent: 0, payload: sha(val_b0) } }
Construct { node: 2, vote: Notarize(n2, ...) }
Deliver { to: 0, from: 2, Vote(Notarize n2) }
Deliver { to: 1, from: 2, Vote(Notarize n2) }
Deliver { to: 3, from: 2, Vote(Notarize n2) }
Construct { node: 0, vote: Notarize(n0, ...) } // n0 echoes after seeing leader vote
Construct { node: 1, vote: Notarize(n1, ...) }
Construct { node: 3, vote: Notarize(n3, ...) }
Deliver { to: 0, from: 1, Vote(Notarize n1) }
// ...all the peer deliveries...
Deliver { to: 0, from: 0, Cert(Notarization) } // cert broadcast
Deliver { to: 1, from: 0, Cert(Notarization) }
// ...finalize votes + cert...
```

Lowers to:

```
Propose leader="n2" view=1 parent=0 payload="val_b0"
SendNotarizeVote view=1 parent=0 payload="val_b0" sig="n2"
OnNotarize receiver="n0" view=1 parent=0 payload="val_b0" sig="n2"
OnNotarize receiver="n1" ...
OnNotarize receiver="n3" ...
SendNotarizeVote sig="n0" # from Construct
SendNotarizeVote sig="n1"
SendNotarizeVote sig="n3"
OnNotarize receiver="n0" sig="n1"
// ...
SendCertificate cert=Notarization(v=1, signers=[n0,n1,n2,n3], ghost="n0")
OnCertificate receiver="n0" ...
OnCertificate receiver="n1" ...
// ...
SendFinalizeVote ...
OnFinalize ...
SendCertificate cert=Finalization(...)
OnCertificate ...
```

Which renders into the Quint chain:

```quint
.then(propose("n2", "val_b0", 0))
.then(send_notarize_vote({ proposal: proposal_v1_p0_val_b0, sig: "n2" }))
.then(on_notarize("n0", { proposal: proposal_v1_p0_val_b0, sig: "n2" }))
.then(on_notarize("n1", /* ... */))
.then(on_notarize("n3", /* ... */))
.then(send_notarize_vote({ proposal: proposal_v1_p0_val_b0, sig: "n0" }))
.then(send_notarize_vote({ proposal: proposal_v1_p0_val_b0, sig: "n1" }))
.then(send_notarize_vote({ proposal: proposal_v1_p0_val_b0, sig: "n3" }))
.then(on_notarize("n0", { proposal: proposal_v1_p0_val_b0, sig: "n1" }))
// ...
.then(send_certificate(notarization(proposal_v1_p0_val_b0,
Set("n0","n1","n2","n3"), "n0")))
.then(on_certificate("n0", notarization(/* ... */)))
// ...
```

The same `Vec<ActionItem>` is what `tlc_encoder::encode_from_trace`
renders into the JSON action sequence posted to tlc-controlled —
identical semantic walk, different target syntax. The JSON variant must
also terminate the action sequence with `{"reset": true}` so the TLC
server's `simulate(..., is_reset=true)` loop exits cleanly; see
`tlc::terminate_actions`.

## 5. Quint's side of the contract

The emitted `.qnt` imports `replica.qnt`, which maintains per-replica
state machines. Our generated file adds a `traceTest` run that:

1. Seeds the initial state with
`initWithLeaderAndCertify(LEADER_MAP, CERTIFY_CUSTOM)`.
2. Fires each `ActionItem` via `.then(...)`.
3. Asserts `safe_invariants` holds throughout, and per-replica
`last_finalized >= required_containers` at the end.

Once Quint accepts the trace, `quint_model::validate_and_extract_expected`
reads the final ITF state (via `--out-itf`) and reconstructs the
expected `Snapshot` by walking `replica_state`, the split
`store_{notarize,nullify,finalize}_votes` maps, and `store_certificates`.
That snapshot is what then gets embedded back into `Trace.expected` for
downstream `replay_trace` equivalence checking.
Loading
Loading