feat(replay): per-step session replay with hash-chained journal by chernistry · Pull Request #1810 · sipyourdrink-ltd/bernstein

chernistry · 2026-05-21T20:27:15Z

Summary

Adds a per-step replay surface on top of lineage v1. Each agent step is written to a hash-chained journal under .sdd/runtime/journal/<agent_id>/ where step_hash = SHA256(canonical_json({prev_hash, input_hash, model, prompt, tool_call, tool_result})). The chain head is the run's verifiable identity.
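
A minimal sketch of the step-hash formula above, for reviewers who want to reason about the chain by hand. The canonical encoding here (sorted keys, compact separators, UTF-8) and the all-zero genesis `prev_hash` are assumptions for illustration; the real contract is the one documented inline in journal.py, and the helper names `canonical_json`, `step_hash`, and `chain_head` are hypothetical.

```python
import hashlib
import json
from typing import Any

# Assumed placeholder prev_hash for the first step (not confirmed by the PR).
GENESIS = "0" * 64


def canonical_json(obj: Any) -> bytes:
    """Deterministic encoding: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def step_hash(prev_hash: str, input_hash: str, model: str, prompt: str,
              tool_call: Any, tool_result: Any) -> str:
    """SHA256 over the canonical JSON of the six hashed fields."""
    payload = {
        "prev_hash": prev_hash,
        "input_hash": input_hash,
        "model": model,
        "prompt": prompt,
        "tool_call": tool_call,
        "tool_result": tool_result,
    }
    return hashlib.sha256(canonical_json(payload)).hexdigest()


def chain_head(steps: list[dict[str, Any]]) -> str:
    """Fold the steps into a single head hash; each step commits to its predecessor."""
    head = GENESIS
    for step in steps:
        head = step_hash(prev_hash=head, **step)
    return head
```

Because every step hash includes `prev_hash`, changing any field of any step changes the head, which is why the head can serve as the run's identity.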

Closes #1799.

Changes

New modules:

src/bernstein/core/persistence/journal.py - chained journal + canonical step encoding (load-bearing contract documented inline)
src/bernstein/core/persistence/journal_diff.py - precise per-field divergence detector
src/bernstein/core/persistence/journal_export.py - portable, offline-verifiable receipt format
src/bernstein/core/persistence/journal_publish.py - privacy-redacted publish with chain re-anchoring
src/bernstein/cli/commands/replay_cmd.py - CLI helpers for the new verbs
docs/operations/replay.md - operator-facing documentation
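
To make "precise per-field divergence" concrete, here is a rough sketch of what a first-divergence search over two journals could look like. It is not the journal_diff.py implementation: the `Divergence` dataclass, the `first_divergence` function, and the `<length>` marker are hypothetical names, and rows are assumed to be plain dicts keyed by the hashed field names.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Fields covered by the step hash, compared in this order.
FIELDS = ("prev_hash", "input_hash", "model", "prompt", "tool_call", "tool_result")


@dataclass(frozen=True)
class Divergence:
    step_index: int
    field: str
    left: Any
    right: Any


def first_divergence(a: list[dict[str, Any]], b: list[dict[str, Any]]) -> Optional[Divergence]:
    """Return the first (step, field) at which two journals differ, or None."""
    for index, (row_a, row_b) in enumerate(zip(a, b)):
        for name in FIELDS:
            if row_a.get(name) != row_b.get(name):
                return Divergence(index, name, row_a.get(name), row_b.get(name))
    if len(a) != len(b):
        # One journal is a strict prefix of the other.
        return Divergence(min(len(a), len(b)), "<length>", len(a), len(b))
    return None
```

Reporting the step index and field name, rather than only "hashes differ", is what lets a replay failure point at the non-deterministic input instead of looking like a flaky test.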

Modified:

src/bernstein/cli/commands/advanced_cmd.py - extended replay to dispatch the new verbs without breaking the legacy replay <run_id> shape
src/bernstein/cli/commands/session_cmd.py - added --from-step and --prompt to session fork
src/bernstein/core/sessions/fork.py - fork_session now supports from_step; seeds the fork journal with the parent chain prefix
src/bernstein/core/security/audit.py - additive new event-type constants (replay.step, replay.fork, replay.export, replay.publish)
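
The fork-from-step behaviour described for fork.py can be pictured roughly as below. This is a sketch under assumptions, not the `fork_session` code: `seed_fork_journal` is a hypothetical helper, journal rows are assumed to carry a `step_hash` key, and `from_step` is treated as an inclusive index into the parent journal.

```python
from typing import Any


def seed_fork_journal(parent_steps: list[dict[str, Any]], from_step: int) -> dict[str, Any]:
    """Copy the parent prefix up to and including from_step and record its hash as the root."""
    if not 0 <= from_step < len(parent_steps):
        raise IndexError(f"from_step {from_step} is outside the parent journal")
    # Copy rows so later appends to the fork cannot mutate the parent journal.
    prefix = [dict(row) for row in parent_steps[: from_step + 1]]
    return {
        "parent_step_hash": prefix[-1]["step_hash"],
        "steps": prefix,
    }
```

Since the fork's next step would chain from `parent_step_hash`, two forks taken at the same step share a verifiable common ancestor, which is how the chain becomes a tree.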

Tests:

tests/unit/test_journal_chain.py - 19 cases for step-hash determinism, chain integrity, atomic append, reconstruction
tests/unit/test_journal_divergence.py - precise field-level diff
tests/unit/test_journal_export.py - receipt format + tamper detection
tests/unit/test_journal_publish.py - opt-in publish + redaction
tests/unit/test_replay_journal_cli.py - CLI helper exit-code contract
tests/integration/test_fork_from_step.py - fork-from-step end-to-end + backward-compat regression net
tests/integration/test_replay_divergence.py - forced non-determinism in a test adapter -> precise diff
tests/integration/test_replay_receipt_roundtrip.py - export+verify offline + signed receipt + redacted publish

Acceptance criteria

Per-step journal under .sdd/runtime/journal/<agent_id>/<bucket>.jsonl with atomic append (one line per call under a lock)
step_hash = SHA256(canonical_json({prev_hash, input_hash, model, prompt, tool_call, tool_result})); head hash is the run identity
bernstein replay <agent_id> verifies the chain matches the recorded head before rendering
bernstein session fork <session_id> --from-step <n> materialises a sibling worktree and records the parent step hash as the chain root
Non-determinism surfaces as a precise field diff via replay diff-journal and the StepDivergence dataclass; orchestrator never silently accepts a divergent replay
bernstein replay export <agent_id> produces an offline-verifiable receipt (tarball with canonical manifest + chain + CAS blobs)
Local-only default; bernstein replay publish requires explicit --opt-in; re-anchors the chain to redacted payloads so verification still works post-redaction
Existing bernstein git undo, plain bernstein session fork (no --from-step), and the audit-slice extractor work unchanged
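
For the "atomic append (one line per call under a lock)" criterion, a minimal single-process sketch might look like this. It assumes a `threading.Lock` is sufficient; the PR does not say whether journal.py uses an in-process lock or a file lock, and `append_step` is a hypothetical name.

```python
import json
import os
import threading
from pathlib import Path
from typing import Any

# Assumption: one process-wide lock. Cross-process writers would need a file lock instead.
_append_lock = threading.Lock()


def append_step(path: Path, row: dict[str, Any]) -> None:
    """Append exactly one JSON line per call and flush it to disk before releasing the lock."""
    line = json.dumps(row, sort_keys=True, separators=(",", ":")) + "\n"
    with _append_lock:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(line)
            handle.flush()
            os.fsync(handle.fileno())
```

Serialising the whole write under the lock keeps concurrent callers from interleaving partial lines, so every line in the bucket file parses as one complete step.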

Test plan

uv run pytest tests/unit -q --no-cov --timeout=120 -k "journal or replay or fork" (290 passed locally)
uv run pytest tests/integration/test_fork_from_step.py tests/integration/test_replay_divergence.py tests/integration/test_replay_receipt_roundtrip.py -q --no-cov --timeout=120 (13 passed locally)
uv run ruff check src/ tests/ (clean)
uv run ruff format --check src/bernstein/core/persistence/journal*.py src/bernstein/cli/commands/replay_cmd.py (clean)
uv run pyright --project pyrightconfig.strict.json (strict zone clean)
CI green

Adds a per-step replay surface on top of lineage v1. Each agent step is written to a hash-chained journal under .sdd/runtime/journal/ where step_hash = SHA256(canonical_json({prev_hash, input_hash, model, prompt, tool_call, tool_result})). The chain head is the run's verifiable identity.

CLI verbs (additive on top of the existing replay command):

- bernstein replay <agent_id> renders the per-step view and verifies the chain matches the recorded head before any rendering.
- bernstein session fork <session_id> --from-step <n> seeds the fork worktree journal with the parent prefix [0..N] and records the parent step_hash so the chain becomes a tree.
- bernstein replay export <agent_id> writes a portable tarball receipt; verify_receipt walks the chain offline.
- bernstein replay publish <agent_id> --opt-in runs redaction and re-anchors the chain so the published receipt still verifies.
- bernstein replay diff-journal A B surfaces the first divergent field rather than a flaky-test signature.

New audit event-type entries (replay.step, replay.fork, replay.export, replay.publish) ride the existing HMAC-chained audit log; existing entries are untouched.

Backward compatibility: bernstein git undo, plain session fork (no --from-step), and the audit-slice extractor work unchanged.

Closes #1799
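
The "re-anchors the chain" wording is easier to follow with a small example. The sketch below redacts a field and then recomputes every `prev_hash` and `step_hash`, so the redacted journal verifies against a new head rather than the original one. All names here (`anchor`, `verify`, `redact_and_reanchor`, the `[REDACTED]` marker, the zero genesis hash) are illustrative assumptions and not the API of journal_publish.py or verify_receipt.

```python
import hashlib
import json
from typing import Any

HASHED_FIELDS = ("input_hash", "model", "prompt", "tool_call", "tool_result")
GENESIS = "0" * 64  # assumed starting prev_hash


def _step_hash(prev_hash: str, row: dict[str, Any]) -> str:
    payload = {"prev_hash": prev_hash, **{f: row.get(f) for f in HASHED_FIELDS}}
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def anchor(rows: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], str]:
    """Recompute prev_hash and step_hash for every row; return the rows and the head."""
    head = GENESIS
    anchored = []
    for row in rows:
        new_row = dict(row, prev_hash=head)
        head = _step_hash(head, new_row)
        new_row["step_hash"] = head
        anchored.append(new_row)
    return anchored, head


def verify(rows: list[dict[str, Any]], expected_head: str) -> bool:
    """Walk the chain offline and check every link plus the final head."""
    head = GENESIS
    for row in rows:
        if row.get("prev_hash") != head or _step_hash(head, row) != row.get("step_hash"):
            return False
        head = row["step_hash"]
    return head == expected_head


def redact_and_reanchor(rows: list[dict[str, Any]], fields: tuple[str, ...] = ("prompt",)):
    """Replace the named fields with a marker, then rebuild the chain over the redacted rows."""
    redacted = [{**row, **{f: "[REDACTED]" for f in fields if f in row}} for row in rows]
    return anchor(redacted)
```

In this sketch a verifier holding only the published rows and the published head can confirm integrity without ever seeing the redacted values; the trade-off is that the published head no longer equals the local run's head.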

sourcery-ai

Sorry @chernistry, you have reached your weekly rate limit of 2500000 diff characters.

Please try again later or upgrade to continue using Sourcery

github-actions · 2026-05-21T20:27:41Z

Sonar insights (advisory, no merge-block)

Snapshot of bernstein on the configured Sonar instance:

Metric	Value
Coverage	13.4
Code smells	126
Bugs	11
Vulnerabilities	2
Security hotspots	87

Run bernstein doctor sonar locally for the full surface.

This comment is a soft signal. The Sonar scan runs on push to main; the PR check itself never fails on smells.

github-actions · 2026-05-21T20:27:48Z

Review-bot acknowledgement summary

Must-address findings: 0 (0 acknowledged, 0 open)
Informational findings: 0

All must-address findings are resolved or acknowledged.

github-actions · 2026-05-21T20:28:02Z

bernstein doctor observe for PR #1810 (feat/1799-step-replay-merkle): ok=0, warn=2, fail=0, error=0, skipped=2

sonar -- WARN (project bernstein)

metric	value	delta	threshold	status
coverage_pct	13.4%	new	80.0%	fail
code_smells	126	new	50	warn
bugs	11	new	0	fail
vulnerabilities	2	new	0	warn
security_hotspots	87	new	0	fail

code-scanning -- WARN (22 open alert(s))

metric	value	delta	threshold	status
open_alerts	22	new	0	fail
critical_alerts	0	new	0	ok
high_alerts	2	new	0	warn
medium_alerts	0	new	-	ok
low_alerts	0	new	-	ok

Skipped backends (credentials not configured)

glitchtip: BERNSTEIN_GLITCHTIP_TOKEN not set
dt: DTRACK_URL/TOKEN/PROJECT not set

See docs/observability/unified-doctor.md for backend setup notes.

github-actions · 2026-05-21T20:28:41Z

Mutation gate (fixed critical paths)

Module	Kill rate	Threshold	Killed/Total	Status	Notes
audit_integrity	100.0%	70%	38/38	✅	-
audit_log	100.0%	70%	57/57	✅	-
claim_next	84.9%	70%	45/53	✅	8 survivors
config_seed_parser	96.5%	70%	56/58	✅	budget exceeded; 2 survivors
lineage_gate	80.0%	75%	28/35	✅	7 survivors
lineage_merge	75.0%	75%	6/8	✅	2 survivors
lineage_tips	93.8%	75%	15/16	✅	1 survivors

Gate is advisory while thresholds stabilise. To kill survivors locally:
uv run python scripts/mutmut_critical.py --only <module>

+            "tool_result": self.tool_result,
+            "step_hash": self.step_hash,
+            "ts": self.ts,
+            "blob_refs": list(self.blob_refs),


+            "steps": self.steps,
+            "bernstein_version": self.bernstein_version,
+            "created_at": self.created_at,
+            "blob_digests": list(self.blob_digests),


+                "steps": self.steps,
+                "bernstein_version": self.bernstein_version,
+                "created_at": self.created_at,
+                "blob_digests": list(self.blob_digests),


+
+
+# Re-export so callers don't need a separate journal import for the symbol.
+JournalError = JournalError


+
+def _redact_row(row: dict[str, Any], policy: RedactionPolicy) -> dict[str, Any]:
+    """Return a copy of *row* with redaction policy applied."""
+    redacted = dict(row)


coderabbitai · 2026-05-21T20:31:30Z

Warning

Rate limit exceeded

@chernistry has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 26 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4ab9640-0878-4b17-9297-fae95e001a54

📥 Commits

Reviewing files that changed from the base of the PR and between da8e804 and 0265d4e.

📒 Files selected for processing (18)

docs/operations/replay.md
src/bernstein/cli/commands/advanced_cmd.py
src/bernstein/cli/commands/replay_cmd.py
src/bernstein/cli/commands/session_cmd.py
src/bernstein/core/persistence/journal.py
src/bernstein/core/persistence/journal_diff.py
src/bernstein/core/persistence/journal_export.py
src/bernstein/core/persistence/journal_publish.py
src/bernstein/core/security/audit.py
src/bernstein/core/sessions/fork.py
tests/integration/test_fork_from_step.py
tests/integration/test_replay_divergence.py
tests/integration/test_replay_receipt_roundtrip.py
tests/unit/test_journal_chain.py
tests/unit/test_journal_divergence.py
tests/unit/test_journal_export.py
tests/unit/test_journal_publish.py
tests/unit/test_replay_journal_cli.py

chernistry enabled auto-merge (squash) May 21, 2026 20:27

sourcery-ai Bot reviewed May 21, 2026

github-actions Bot added core cli docs tests size/xl labels May 21, 2026

chore(auto): regenerate AGENTS.md mirrors + ruff format

0265d4e

chernistry merged commit 6160ea6 into main May 21, 2026
25 of 26 checks passed

chernistry deleted the feat/1799-step-replay-merkle branch May 21, 2026 20:28

github-advanced-security AI found potential problems May 21, 2026

chernistry merged 2 commits into main from feat/1799-step-replay-merkle

2 participants


