feat(replay): per-step session replay with hash-chained journal #1810
Adds a per-step replay surface on top of lineage v1. Each agent step
is written to a hash-chained journal under .sdd/runtime/journal/
where step_hash = SHA256(canonical_json({prev_hash, input_hash,
model, prompt, tool_call, tool_result})). The chain head is the
run's verifiable identity.
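A minimal sketch of the step-hash formula and a chained append, in Python. Several details are assumptions rather than facts from this PR: `canonical_json` here uses sorted keys and compact separators (journal.py documents the real encoding), the first step chains from a hypothetical all-zero `GENESIS_HASH`, and `ChainedJournal` is an illustrative name. The lock only serialises appends inside one process; the real journal's cross-process locking is not shown.

```python
import hashlib
import json
import tempfile
import threading
from pathlib import Path
from typing import Any


# Assumption: canonical JSON = sorted keys, compact separators, UTF-8.
def canonical_json(obj: Any) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


GENESIS_HASH = "0" * 64  # hypothetical prev_hash for the first step


def compute_step_hash(prev_hash: str, input_hash: str, model: str, prompt: str,
                      tool_call: Any, tool_result: Any) -> str:
    payload = {
        "prev_hash": prev_hash,
        "input_hash": input_hash,
        "model": model,
        "prompt": prompt,
        "tool_call": tool_call,
        "tool_result": tool_result,
    }
    return hashlib.sha256(canonical_json(payload)).hexdigest()


class ChainedJournal:
    """Hypothetical append-only journal: one canonical JSON line per step."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.head = GENESIS_HASH
        self._lock = threading.Lock()  # serialises appends within this process only

    def append(self, *, input_hash: str, model: str, prompt: str,
               tool_call: Any, tool_result: Any) -> str:
        with self._lock:
            step_hash = compute_step_hash(self.head, input_hash, model, prompt, tool_call, tool_result)
            row = {
                "prev_hash": self.head,
                "input_hash": input_hash,
                "model": model,
                "prompt": prompt,
                "tool_call": tool_call,
                "tool_result": tool_result,
                "step_hash": step_hash,
            }
            self.path.parent.mkdir(parents=True, exist_ok=True)
            # One step per line, written with a single write() call.
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(canonical_json(row).decode("utf-8") + "\n")
            self.head = step_hash  # the latest step_hash is the chain head
            return step_hash


# Usage: two chained steps in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    journal = ChainedJournal(Path(tmp) / "agent-1" / "0.jsonl")
    first = journal.append(input_hash="in-0", model="m", prompt="list files",
                           tool_call={"name": "ls"}, tool_result="a.py")
    second = journal.append(input_hash="in-1", model="m", prompt="read a.py",
                            tool_call={"name": "cat", "args": ["a.py"]}, tool_result="print(1)")
    rows = [json.loads(line) for line in journal.path.read_text(encoding="utf-8").splitlines()]
```

Because `prev_hash` is part of each hashed payload, changing any earlier step changes every later `step_hash`, which is what makes the head usable as the run's identity.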
CLI verbs (additive on top of the existing replay command):
- bernstein replay <agent_id> renders the per-step view and verifies
the chain matches the recorded head before any rendering.
- bernstein session fork <session_id> --from-step <n> seeds the fork
worktree journal with the parent prefix [0..N] and records the
parent step_hash so the chain becomes a tree.
- bernstein replay export <agent_id> writes a portable tarball
receipt; verify_receipt walks the chain offline.
- bernstein replay publish <agent_id> --opt-in runs redaction and
re-anchors the chain so the published receipt still verifies.
- bernstein replay diff-journal A B surfaces the first divergent
field rather than a flaky-test signature.
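The chain check that `bernstein replay` runs before rendering, and the prefix copy behind `--from-step`, could look roughly like the sketch below. `verify_chain`, `fork_prefix`, `ChainError`, `make_rows`, the all-zero genesis hash, the canonical encoding, and the inclusive reading of [0..N] are all assumptions for illustration, not the PR's code.

```python
import hashlib
import json
from typing import Any


# Assumed encoding: sorted keys, compact separators, UTF-8.
def canonical_json(obj: Any) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


GENESIS_HASH = "0" * 64  # hypothetical prev_hash of step 0
HASHED_FIELDS = ("prev_hash", "input_hash", "model", "prompt", "tool_call", "tool_result")


class ChainError(Exception):
    """Hypothetical error raised when a journal fails verification."""


def recompute_step_hash(row: dict[str, Any]) -> str:
    return hashlib.sha256(canonical_json({k: row[k] for k in HASHED_FIELDS})).hexdigest()


def verify_chain(rows: list[dict[str, Any]], recorded_head: str, root: str = GENESIS_HASH) -> None:
    """Walk the chain from *root*; raise before anything is rendered if it is broken."""
    prev = root
    for i, row in enumerate(rows):
        if row["prev_hash"] != prev:
            raise ChainError(f"step {i}: prev_hash does not link to the previous step")
        if recompute_step_hash(row) != row["step_hash"]:
            raise ChainError(f"step {i}: step_hash does not match the step contents")
        prev = row["step_hash"]
    if prev != recorded_head:
        raise ChainError("chain head does not match the recorded head")


def fork_prefix(rows: list[dict[str, Any]], from_step: int) -> tuple[list[dict[str, Any]], str]:
    """Copy steps 0..from_step (inclusive, assumed) and return the parent hash the fork branches from."""
    prefix = [dict(row) for row in rows[: from_step + 1]]
    parent = prefix[-1]["step_hash"] if prefix else GENESIS_HASH
    return prefix, parent


def make_rows(n: int) -> list[dict[str, Any]]:
    """Build a small valid chain for the usage example."""
    rows: list[dict[str, Any]] = []
    prev = GENESIS_HASH
    for i in range(n):
        row: dict[str, Any] = {"prev_hash": prev, "input_hash": f"in-{i}", "model": "m",
                               "prompt": f"step {i}", "tool_call": None, "tool_result": str(i)}
        row["step_hash"] = recompute_step_hash(row)
        rows.append(row)
        prev = row["step_hash"]
    return rows
```

A fork that appends steps after the copied prefix chains its first new step from `parent`, so parent and fork share a prefix and branch at that hash; that shared-prefix shape is what turns the linear chain into a tree.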
New audit event-type entries (replay.step, replay.fork,
replay.export, replay.publish) ride the existing HMAC-chained audit
log; existing entries are untouched.
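Why adding event types leaves existing entries untouched: in an HMAC chain each entry's MAC covers its own body plus the previous entry's MAC, so appending only extends the tail. The sketch below assumes an entry layout (`event_type`, `details`, `prev_hmac`, `hmac`) and hypothetical constant names; only the four event-type strings come from this PR, and the real audit.py format may differ.

```python
import hashlib
import hmac
import json
from typing import Any

# Event-type strings from the PR; the constant names are hypothetical.
REPLAY_STEP = "replay.step"
REPLAY_FORK = "replay.fork"
REPLAY_EXPORT = "replay.export"
REPLAY_PUBLISH = "replay.publish"


def _canonical(obj: Any) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_audit(log: list[dict[str, Any]], key: bytes, event_type: str,
                 details: dict[str, Any]) -> dict[str, Any]:
    """MAC each entry over its body plus the previous entry's MAC (assumed layout)."""
    body = {"event_type": event_type, "details": details,
            "prev_hmac": log[-1]["hmac"] if log else ""}
    entry = {**body, "hmac": hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()}
    log.append(entry)
    return entry


def verify_audit(log: list[dict[str, Any]], key: bytes) -> bool:
    prev = ""
    for entry in log:
        body = {k: entry[k] for k in ("event_type", "details", "prev_hmac")}
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        # Constant-time comparison of the recomputed MAC.
        if entry["prev_hmac"] != prev or not hmac.compare_digest(expected, entry["hmac"]):
            return False
        prev = entry["hmac"]
    return True
```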
Backward compatibility: bernstein git undo, plain session fork
(no --from-step), and the audit-slice extractor work unchanged.
Closes #1799
Sonar insights (advisory, no merge-block)
This comment is a soft signal. The Sonar scan runs on push to …
Review-bot acknowledgement summary
All must-address findings are resolved or acknowledged.

bernstein doctor observe for PR #1810:
- sonar: WARN (project bernstein)
- code-scanning: WARN (22 open alerts)
Skipped backends (credentials not configured). See docs/observability/unified-doctor.md for backend setup notes.

Mutation gate (fixed critical paths)
```python
"tool_result": self.tool_result,
"step_hash": self.step_hash,
"ts": self.ts,
"blob_refs": list(self.blob_refs),
```

```python
"steps": self.steps,
"bernstein_version": self.bernstein_version,
"created_at": self.created_at,
"blob_digests": list(self.blob_digests),
```
```python
# Re-export so callers don't need a separate journal import for the symbol.
JournalError = JournalError
```
```python
def _redact_row(row: dict[str, Any], policy: RedactionPolicy) -> dict[str, Any]:
    """Return a copy of *row* with redaction policy applied."""
    redacted = dict(row)
```
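Redaction changes hashed fields, so every `step_hash` from the first redacted step onward stops matching; that is why `replay publish` re-anchors the chain instead of patching rows. The sketch assumes a simplified regex-based policy and an all-zero genesis hash; `RedactionRules`, `redact_row`, and `redact_and_reanchor` are hypothetical names, not the PR's `RedactionPolicy` or `_redact_row`.

```python
import hashlib
import json
import re
from dataclasses import dataclass
from typing import Any

GENESIS_HASH = "0" * 64  # hypothetical prev_hash of step 0
HASHED_FIELDS = ("prev_hash", "input_hash", "model", "prompt", "tool_call", "tool_result")


def step_hash_of(row: dict[str, Any]) -> str:
    body = {k: row[k] for k in HASHED_FIELDS}
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class RedactionRules:
    """Hypothetical stand-in for a redaction policy: regexes applied to string fields."""
    patterns: tuple[str, ...] = ()
    fields: tuple[str, ...] = ("prompt", "tool_result")
    replacement: str = "[REDACTED]"


def redact_row(row: dict[str, Any], rules: RedactionRules) -> dict[str, Any]:
    redacted = dict(row)  # never mutate the caller's row
    for field in rules.fields:
        value = redacted.get(field)
        if isinstance(value, str):
            for pattern in rules.patterns:
                value = re.sub(pattern, rules.replacement, value)
            redacted[field] = value
    return redacted


def redact_and_reanchor(rows: list[dict[str, Any]], rules: RedactionRules) -> list[dict[str, Any]]:
    """Redact every step, then rebuild prev_hash/step_hash so the published chain verifies."""
    published: list[dict[str, Any]] = []
    prev = GENESIS_HASH
    for row in rows:
        new = redact_row(row, rules)
        new["prev_hash"] = prev
        new["step_hash"] = step_hash_of(new)
        published.append(new)
        prev = new["step_hash"]
    return published
```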
Summary
Adds a per-step replay surface on top of lineage v1. Each agent step is written to a hash-chained journal under
.sdd/runtime/journal/<agent_id>/ where step_hash = SHA256(canonical_json({prev_hash, input_hash, model, prompt, tool_call, tool_result})). The chain head is the run's verifiable identity. Closes #1799.
Changes
New modules:
- src/bernstein/core/persistence/journal.py - chained journal + canonical step encoding (load-bearing contract documented inline)
- src/bernstein/core/persistence/journal_diff.py - precise per-field divergence detector
- src/bernstein/core/persistence/journal_export.py - portable, offline-verifiable receipt format
- src/bernstein/core/persistence/journal_publish.py - privacy-redacted publish with chain re-anchoring
- src/bernstein/cli/commands/replay_cmd.py - CLI helpers for the new verbs
- docs/operations/replay.md - operator-facing documentation

Modified:
- src/bernstein/cli/commands/advanced_cmd.py - extended replay to dispatch the new verbs without breaking the legacy replay <run_id> shape
- src/bernstein/cli/commands/session_cmd.py - added --from-step and --prompt to session fork
- src/bernstein/core/sessions/fork.py - fork_session now supports from_step; seeds the fork journal with the parent chain prefix
- src/bernstein/core/security/audit.py - additive new event-type constants (replay.step, replay.fork, replay.export, replay.publish)

Tests:
- tests/unit/test_journal_chain.py - 19 cases for step-hash determinism, chain integrity, atomic append, and reconstruction
- tests/unit/test_journal_divergence.py - precise field-level diff
- tests/unit/test_journal_export.py - receipt format + tamper detection
- tests/unit/test_journal_publish.py - opt-in publish + redaction
- tests/unit/test_replay_journal_cli.py - CLI helper exit-code contract
- tests/integration/test_fork_from_step.py - fork-from-step end-to-end + backward-compat regression net
- tests/integration/test_replay_divergence.py - forced non-determinism in a test adapter -> precise diff
- tests/integration/test_replay_receipt_roundtrip.py - export + offline verify + signed receipt + redacted publish

Acceptance criteria
- .sdd/runtime/journal/<agent_id>/<bucket>.jsonl with atomic append (one line per call under a lock)
- step_hash = SHA256(canonical_json({prev_hash, input_hash, model, prompt, tool_call, tool_result})); head hash is the run identity
- bernstein replay <agent_id> verifies the chain matches the recorded head before rendering
- bernstein session fork <session_id> --from-step <n> materialises a sibling worktree and records the parent step hash as the chain root
- replay diff-journal and the StepDivergence dataclass; orchestrator never silently accepts a divergent replay
- bernstein replay export <agent_id> produces an offline-verifiable receipt (tarball with canonical manifest + chain + CAS blobs)
- bernstein replay publish requires explicit --opt-in; re-anchors the chain to redacted payloads so verification still works post-redaction
- bernstein git undo, plain bernstein session fork (no --from-step), and the audit-slice extractor work unchanged

Test plan
- uv run pytest tests/unit -q --no-cov --timeout=120 -k "journal or replay or fork" (290 passed locally)
- uv run pytest tests/integration/test_fork_from_step.py tests/integration/test_replay_divergence.py tests/integration/test_replay_receipt_roundtrip.py -q --no-cov --timeout=120 (13 passed locally)
- uv run ruff check src/ tests/ (clean)
- uv run ruff format --check src/bernstein/core/persistence/journal*.py src/bernstein/cli/commands/replay_cmd.py (clean)
- uv run pyright --project pyrightconfig.strict.json (strict zone clean)
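The divergence criterion (report the first differing field, never silently accept a divergent replay) can be sketched as a field-by-field walk over two journals. `Divergence`, `first_divergence`, and the compared field list are assumptions for illustration; the PR's `StepDivergence` dataclass and journal_diff.py may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence

# prev_hash/step_hash are skipped: once one step differs, its hash and every later
# hash differ too, so comparing hashes points at the symptom, not the changed field.
COMPARED_FIELDS = ("input_hash", "model", "prompt", "tool_call", "tool_result")


@dataclass(frozen=True)
class Divergence:
    """Hypothetical shape; the PR's StepDivergence may carry different fields."""
    step: int
    field: str
    left: Any
    right: Any


def first_divergence(a: Sequence[dict[str, Any]], b: Sequence[dict[str, Any]]) -> Optional[Divergence]:
    for i, (left, right) in enumerate(zip(a, b)):
        for name in COMPARED_FIELDS:
            if left.get(name) != right.get(name):
                return Divergence(i, name, left.get(name), right.get(name))
    if len(a) != len(b):
        # One journal is a strict prefix of the other.
        return Divergence(min(len(a), len(b)), "<step count>", len(a), len(b))
    return None
```

Returning `None` only when every compared field and the step count match lets a caller treat any non-`None` result as a hard failure rather than a flaky signal.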