Commit ece80c8

auto-heal-fixup committed
feat(run): image attachment passthrough with provenance
Adds the operator-facing --attach surface and the spawn-time provenance contract: each attached image is anchored to the run via an HMAC-chained audit event, the content-addressed blob store, and the worker's lineage v1 receipt parents.

Changes:
- Add Task.attachments list field and plumb it through Task.from_dict.
- New module src/bernstein/core/agents/multimodal_attestation.py: build_attachment_context() reads paths, stores bytes in CAS, records the audit event, and returns a MultiModalContext.
- New module src/bernstein/core/security/audit_chain.py: AuditChainStore facade over AuditLog plus the additive multimodal.attach event type and the record_multimodal_attach() helper.
- Additive helpers in core/persistence/lineage_signer.py for the attachment-as-parent URI scheme.
- CLI: --attach option on bernstein run, repeatable, validated path, with capability gating before any process is launched.
- Adapters: Claude and Gemini accept multimodal_context= and inline base64-encoded attachments with the documented wire format.
- YAML plan loader honours an attachments: list on each step.
- Worktree pinning enforced at resolve time; cross-worktree attempts raise WorktreeAccessDenied.
- Documentation in docs/operations/run.md.
- Tests:
  - tests/unit/test_multimodal_attestation.py: 23 cases covering Task model field, capability gating, sha256 stability, audit record shape, lineage parents, worktree isolation, replay, tamper detection, chain continuity, and YAML plan loader.
  - tests/integration/test_run_attach.py: end-to-end stub-adapter round-trip plus CLI option validation.

Closes #1797
1 parent fe6a831 commit ece80c8

12 files changed

Lines changed: 1781 additions & 0 deletions

docs/operations/run.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+# `bernstein run` operator notes
+
+This document covers the `bernstein run` surface and operator-facing
+flags. Other run-related docs:
+
+* [`run_names.md`](run_names.md) -- the memorable deterministic run-name
+  generator.
+* [`runbooks.md`](runbooks.md) -- recovery playbooks for stuck or
+  failed runs.
+
+## Image attachments (`--attach`)
+
+`bernstein run` accepts one or more `--attach <path>` arguments to
+hand operator-supplied images (screenshots, diagrams) to the spawned
+agent. Repeat the flag for multiple files:
+
+```
+bernstein run --goal "Reproduce the failure shown" \
+    --attach ./screenshot.png \
+    --attach ./architecture.svg \
+    --cli claude
+```
+
+### Capable adapters
+
+Only `claude` and `gemini` accept attachments. Selecting any other
+adapter (`codex`, `aider`, `qwen`, ...) with `--attach` aborts the
+run BEFORE any process is launched with a `UsageError` that names
+the capable adapters.
+
+### Wire format
+
+Attached files are read at spawn time and inlined into the prompt
+body as base64-encoded `<attachment>` blocks at the head of the
+prompt:
+
+```
+<attachment mime="image/png" sha256="<64 hex chars>">
+<base64 payload>
+</attachment>
+
+<original prompt body>
+```
+
+Both adapters use the same wire format so a replay path can
+verify exact bytes regardless of provider.
+
+### Provenance
+
+For each `--attach` invocation the orchestrator:
+
+1. Hashes the raw bytes (SHA-256) and stores them once in the
+   content-addressed blob store at `.sdd/cas/`.
+2. Appends a `multimodal.attach` event to the HMAC-chained audit log
+   carrying `(sha256, mime, operator_install_id_sig, worker_id,
+   turn_seq, worktree_id, prev_chain_digest)`. Tampering with the
+   on-disk log fails verification.
+3. Adds the digest to the worker's lineage v1 receipt as a
+   `multimodal-attachment://<sha256>` parent so any artefact produced
+   this turn carries the input image's hash in its lineage.
+
+Replay over the exported chain reproduces the exact bytes the model
+API saw on the original turn. Substituting bytes breaks the chain.
+
+### Worktree pinning
+
+The audit-chain event embeds the worktree id of the attaching
+worker. A worker in a different worktree cannot resolve the
+attachment back to bytes; the resolver raises
+`WorktreeAccessDenied` on cross-worktree attempts. This protects
+session-shared state where multiple worktrees coexist.
+
+### Task YAML
+
+Plan-file steps accept an `attachments:` list mirroring the CLI
+flag:
+
+```yaml
+name: Reproduce failure
+stages:
+  - name: investigate
+    steps:
+      - title: Describe the screenshot
+        role: backend
+        attachments:
+          - ./screenshot.png
+          - ./architecture.svg
+```
+
+The orchestrator builds the same `MultiModalContext` from the
+listed paths and applies the same capability gate.
+
+## References
+
+* `src/bernstein/core/agents/multimodal_attestation.py` -- spawn-time
+  resolver, capability gate, and worktree pinning.
+* `src/bernstein/core/security/audit_chain.py` -- the
+  `multimodal.attach` event type and the `AuditChainStore` facade.
+* `src/bernstein/core/persistence/lineage_signer.py` --
+  `register_attachment_parents` for lineage receipt augmentation.
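
Reviewer sketch for the Provenance section of `docs/operations/run.md`: the doc says each `multimodal.attach` event carries `prev_chain_digest` and that tampering with the on-disk log fails verification. The snippet below is a minimal illustration of how an HMAC chain can give that property. It is not the `AuditChainStore` implementation; `GENESIS`, `chain_digest`, `append_event`, `verify_chain`, the canonical-JSON encoding, and the `chain_digest` field name are all assumptions made for the example.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first event


def chain_digest(key: bytes, prev_digest: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous digest plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    message = prev_digest.encode() + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_event(log: list[dict], key: bytes, payload: dict) -> dict:
    """Append an event whose digest commits to everything before it."""
    prev = log[-1]["chain_digest"] if log else GENESIS
    event = {
        "payload": payload,
        "prev_chain_digest": prev,
        "chain_digest": chain_digest(key, prev, payload),
    }
    log.append(event)
    return event


def verify_chain(log: list[dict], key: bytes) -> bool:
    """Recompute every link; any edited payload or reordered event fails."""
    prev = GENESIS
    for event in log:
        if event["prev_chain_digest"] != prev:
            return False
        expected = chain_digest(key, prev, event["payload"])
        if not hmac.compare_digest(expected, event["chain_digest"]):
            return False
        prev = event["chain_digest"]
    return True
```

Because each digest covers the previous one, changing the recorded `sha256` of an earlier attachment invalidates that event, and an event lifted out of its position no longer links to its predecessor.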

src/bernstein/adapters/base.py

Lines changed: 9 additions & 0 deletions
@@ -503,6 +503,7 @@ def spawn(
         task_scope: str = "medium",
         budget_multiplier: float = 1.0,
         system_addendum: str = "",
+        multimodal_context: Any | None = None,
     ) -> SpawnResult:
         """Launch an agent process with the given prompt.
 
@@ -523,6 +524,14 @@ def spawn(
                 Adapters that support a separate system prompt (e.g. Claude
                 Code's ``--append-system-prompt``) should use it; others
                 may append to the user prompt as a fallback.
+            multimodal_context: Optional
+                :class:`bernstein.core.agents.multimodal.MultiModalContext`
+                carrying base64-encoded attachments to be passed to the
+                model API. Multimodal-capable adapters (Claude, Gemini)
+                encode the attached bytes inline in the request body;
+                other adapters MUST raise :class:`CapabilityRefusal`
+                before any process is launched (see
+                :func:`bernstein.core.agents.multimodal_attestation.refuse_when_incapable`).
         """
         ...
 
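
Reviewer sketch for the `multimodal_context` docstring: the contract says incapable adapters must raise `CapabilityRefusal` before any process is launched. The body of `refuse_when_incapable` is not part of this excerpt, so the following is only a guess at its shape. The keyword-only signature mirrors the call site in `run_bootstrap.py`; the `CAPABLE_ADAPTERS` constant, the local `CapabilityRefusal` class, and the message text are hypothetical stand-ins.

```python
# Assumed capability table; docs/operations/run.md names these two adapters.
CAPABLE_ADAPTERS = frozenset({"claude", "gemini"})


class CapabilityRefusal(Exception):
    """Hypothetical stand-in for the exception named in the docstring."""


def refuse_when_incapable(*, adapter_name: str, attachments: list[str]) -> None:
    """Raise before spawn when attachments target an incapable adapter."""
    if not attachments:
        return  # nothing attached, so every adapter is acceptable
    name = adapter_name.strip().lower()
    if name not in CAPABLE_ADAPTERS:
        capable = ", ".join(sorted(CAPABLE_ADAPTERS))
        raise CapabilityRefusal(
            f"adapter {name!r} does not accept attachments; "
            f"capable adapters: {capable}"
        )
```

Keeping the gate a pure function of the adapter name and the attachment list means the CLI can call it before any worker exists, which matches the "before any process is launched" wording.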

src/bernstein/adapters/claude.py

Lines changed: 57 additions & 0 deletions
@@ -63,6 +63,55 @@ def _task_budgets_opt_in() -> bool:
     return raw.strip().lower() in {"1", "true", "yes", "on"}
 
 
+# ---------------------------------------------------------------------------
+# Multimodal attachment encoding (issue #1797)
+# ---------------------------------------------------------------------------
+
+
+def _inject_multimodal_attachments(prompt: str, multimodal_context: Any) -> str:
+    """Inline encoded attachments at the head of *prompt*.
+
+    The Claude Code CLI does not accept image attachments as separate
+    arguments; we wrap each base64-encoded blob in a structured
+    ``<attachment mime="..." sha256="...">`` block before the user
+    prompt so the model API receives the bytes inline. Tests assert the
+    exact format so downstream replay can reconstruct what was sent.
+
+    Args:
+        prompt: The agent prompt that will be passed to the CLI.
+        multimodal_context: A
+            :class:`bernstein.core.agents.multimodal.MultiModalContext`.
+
+    Returns:
+        Prompt with the encoded blocks prepended, or *prompt* unchanged
+        when the context contains no inputs.
+    """
+    inputs = getattr(multimodal_context, "inputs", ())
+    if not inputs:
+        return prompt
+
+    import hashlib as _hashlib
+
+    blocks: list[str] = []
+    for inp in inputs:
+        b64 = getattr(inp, "content_base64", None) or ""
+        mime = getattr(inp, "mime_type", "application/octet-stream")
+        path = getattr(inp, "content_path", None)
+        if path is not None:
+            try:
+                raw = Path(path).read_bytes()
+                digest = _hashlib.sha256(raw).hexdigest()
+            except OSError:
+                digest = ""
+        else:
+            digest = ""
+        # Format documented in docs/operations/run.md so adapters and
+        # tests share the same wire format.
+        blocks.append(f'<attachment mime="{mime}" sha256="{digest}">\n{b64}\n</attachment>')
+    header = "\n".join(blocks)
+    return f"{header}\n\n{prompt}"
+
+
 # Map short model names to Claude Code CLI model IDs.
 # Last verified against upstream @anthropic-ai/claude-code 2.1.x on 2026-05-05.
 # Opus 4.7 is GA at the same price as 4.6 (Anthropic news, 2026-04-16); Sonnet
@@ -509,8 +558,16 @@ def spawn(
         task_scope: str = "medium",
         budget_multiplier: float = 1.0,
         system_addendum: str = "",
+        multimodal_context: Any | None = None,
     ) -> SpawnResult:
         self.enforce_network_policy()
+        # Issue #1797: encode any attached images into the prompt body
+        # as base64 with the correct MIME type so the upstream model API
+        # sees them. The Claude Code CLI does not accept attachments
+        # directly; we inline them in a structured ``<attachment>`` block
+        # at the head of the prompt so the model picks them up.
+        if multimodal_context is not None:
+            prompt = _inject_multimodal_attachments(prompt, multimodal_context)
         log_path = workdir / ".sdd" / "runtime" / f"{session_id}.log"
         log_path.parent.mkdir(parents=True, exist_ok=True)
 
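
Reviewer sketch for `_inject_multimodal_attachments`: as written, the digest comes from re-reading `content_path` at spawn time, and the block is emitted with `sha256=""` when the path is absent or unreadable, which does not match the `<64 hex chars>` shape documented in `run.md`. One possible alternative, shown here under the assumption that `content_base64` is always populated, is to hash the decoded payload itself so the attribute always describes the bytes actually sent. `attachment_block` and `inject` are hypothetical names, and the `(mime, base64)` tuple input is a simplification of `MultiModalContext.inputs`.

```python
import base64
import hashlib


def attachment_block(mime: str, content_base64: str) -> str:
    """Build one wire-format block, hashing the decoded payload."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(content_base64, validate=True)
    digest = hashlib.sha256(raw).hexdigest()
    return (
        f'<attachment mime="{mime}" sha256="{digest}">\n'
        f"{content_base64}\n</attachment>"
    )


def inject(prompt: str, attachments: list[tuple[str, str]]) -> str:
    """Prepend blocks for (mime, base64) pairs; no-op when the list is empty."""
    if not attachments:
        return prompt
    header = "\n".join(attachment_block(m, b) for m, b in attachments)
    return f"{header}\n\n{prompt}"
```

With this shape the digest cannot drift from the payload if the file on disk changes between context construction and spawn. Whether that trade-off suits the audit contract is a call for the author.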

src/bernstein/adapters/gemini.py

Lines changed: 47 additions & 0 deletions
@@ -151,6 +151,46 @@ def resolve_google_cli_binary(
     return _DISCOVERY_CASCADE[0]
 
 
+# ---------------------------------------------------------------------------
+# Multimodal attachment encoding (issue #1797)
+# ---------------------------------------------------------------------------
+
+
+def _inject_multimodal_attachments(prompt: str, multimodal_context: Any) -> str:
+    """Inline encoded attachments at the head of the Gemini prompt.
+
+    Gemini accepts inline image bytes via ``inline_data`` blocks in the
+    Generative Language API request. The CLI surface here forwards the
+    prompt as a single argument so we serialise attachments as
+    ``<attachment>`` XML-ish blocks that the CLI's prompt processor
+    inlines verbatim. This matches the Claude adapter wire format so
+    downstream replay can verify exact bytes for both providers.
+    """
+    inputs = getattr(multimodal_context, "inputs", ())
+    if not inputs:
+        return prompt
+
+    import hashlib as _hashlib
+    from pathlib import Path as _Path
+
+    blocks: list[str] = []
+    for inp in inputs:
+        b64 = getattr(inp, "content_base64", None) or ""
+        mime = getattr(inp, "mime_type", "application/octet-stream")
+        path = getattr(inp, "content_path", None)
+        if path is not None:
+            try:
+                raw = _Path(path).read_bytes()
+                digest = _hashlib.sha256(raw).hexdigest()
+            except OSError:
+                digest = ""
+        else:
+            digest = ""
+        blocks.append(f'<attachment mime="{mime}" sha256="{digest}">\n{b64}\n</attachment>')
+    header = "\n".join(blocks)
+    return f"{header}\n\n{prompt}"
+
+
 class GeminiAdapter(CLIAdapter):
     """Spawn and monitor Google Gemini / Antigravity CLI sessions."""
 
@@ -171,8 +211,15 @@ def spawn(
         task_scope: str = "medium",
         budget_multiplier: float = 1.0,
         system_addendum: str = "",
+        multimodal_context: Any | None = None,
     ) -> SpawnResult:
         self.enforce_network_policy()
+        # Issue #1797: inline encoded attachments at the head of the
+        # prompt body so the Gemini API receives the bytes alongside
+        # the text. The Antigravity / legacy Gemini CLIs do not accept
+        # attachments as separate arguments.
+        if multimodal_context is not None:
+            prompt = _inject_multimodal_attachments(prompt, multimodal_context)
         log_path = workdir / ".sdd" / "runtime" / f"{session_id}.log"
         log_path.parent.mkdir(parents=True, exist_ok=True)
 
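
Reviewer sketch for the replay claim: both adapter docstrings say the shared wire format lets downstream replay verify exact bytes. The replay code is not in this excerpt, so the following only illustrates what such a check might look like. `verify_attachments` and the `_BLOCK` pattern are hypothetical; the pattern assumes exactly the block layout that the two adapters emit.

```python
import base64
import hashlib
import re

# Matches the block layout emitted by both adapters in this commit.
_BLOCK = re.compile(
    r'<attachment mime="(?P<mime>[^"]*)" sha256="(?P<sha256>[0-9a-f]*)">\n'
    r"(?P<b64>[A-Za-z0-9+/=]*)\n</attachment>"
)


def verify_attachments(prompt: str) -> list[tuple[str, str, bool]]:
    """Return (mime, claimed_sha256, matches) for each block in *prompt*."""
    results: list[tuple[str, str, bool]] = []
    for match in _BLOCK.finditer(prompt):
        try:
            raw = base64.b64decode(match["b64"], validate=True)
        except ValueError:  # binascii.Error subclasses ValueError
            results.append((match["mime"], match["sha256"], False))
            continue
        actual = hashlib.sha256(raw).hexdigest()
        results.append((match["mime"], match["sha256"], actual == match["sha256"]))
    return results
```

A block emitted with an empty `sha256` attribute, which the adapters produce when `content_path` is missing or unreadable, reports `False` under this check.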

src/bernstein/cli/run_bootstrap.py

Lines changed: 46 additions & 0 deletions
@@ -1124,6 +1124,20 @@ def exec_restart() -> None:
         "score 1.0. Off by default; existing runs are unaffected."
     ),
 )
+@click.option(
+    "--attach",
+    "attach",
+    type=click.Path(exists=True, dir_okay=False, path_type=Path),
+    multiple=True,
+    default=(),
+    help=(
+        "Attach an image / diagram to the run (issue #1797). May be repeated "
+        "for multiple files. The orchestrator builds a MultiModalContext at "
+        "spawn time, records a multimodal.attach event in the audit chain, "
+        "and refuses adapters that do not advertise multimodal capability. "
+        "Capable adapters: claude, gemini."
+    ),
+)
 def run(
     plan_file: Path | None,
     goal: str | None,
@@ -1165,6 +1179,7 @@ def run(
     retry_budget_spec: str | None = None,
     criterion_profile: str | None = None,
     max_blast_radius: float | None = None,
+    attach: tuple[Path, ...] = (),
 ) -> None:
     """Parse seed, init workspace, start server, launch agents.
 
@@ -1214,6 +1229,7 @@ def run(
             retry_budget_spec=retry_budget_spec,
             criterion_profile=criterion_profile,
             max_blast_radius=max_blast_radius,
+            attach=attach,
         )
     except (click.UsageError, SystemExit):
         raise
@@ -1263,6 +1279,7 @@ def _run_impl(
     retry_budget_spec: str | None,
     criterion_profile: str | None,
     max_blast_radius: float | None,
+    attach: tuple[Path, ...] = (),
 ) -> None:
     """Concrete ``run`` implementation; wrapped by :func:`run` for hinting.
 
@@ -1324,6 +1341,35 @@ def _run_impl(
             raise click.UsageError(f"--criterion-profile {criterion_profile!r}: {exc}") from None
         os.environ["BERNSTEIN_RUN_CRITERION_PROFILE"] = criterion_profile
 
+    # Issue #1797: capability-gate ``--attach`` BEFORE any process is
+    # launched. When the operator selected an adapter that does not
+    # advertise multimodal capability, surface a structured error that
+    # names capable adapters instead of spawning the run and failing
+    # mid-flight.
+    if attach:
+        from bernstein.core.agents.multimodal_attestation import (
+            CapabilityRefusal,
+            refuse_when_incapable,
+        )
+
+        # ``cli`` may be ``None`` (auto-detect) or "auto". When the
+        # operator has not pinned an adapter, only refuse if every
+        # candidate is incapable; with "claude" / "gemini" auto-detect
+        # is allowed.
+        explicit_adapter = (cli or "").strip().lower()
+        if explicit_adapter and explicit_adapter not in {"", "auto"}:
+            try:
+                refuse_when_incapable(
+                    adapter_name=explicit_adapter,
+                    attachments=[str(p) for p in attach],
+                )
+            except CapabilityRefusal as exc:
+                raise click.UsageError(f"--attach requires a multimodal-capable adapter. {exc!s}") from None
+        # Stash the attachments for downstream consumers via env var so
+        # the orchestrator subprocess can pick them up without
+        # additional argument threading.
+        os.environ["BERNSTEIN_RUN_ATTACHMENTS"] = os.pathsep.join(str(p) for p in attach)
+
     # Issue #1320: ``--budget`` is the friendlier alias of ``--max-cost-usd``
     # and shares the same env var. When both are set, the operator's
     # explicit ``--max-cost-usd`` wins for backward compat.