Commit 835651d
feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs (#375)
* feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs
Adds an eval harness (evals/) that spawns real agents via the broker in a
fixture dir with a fake .integrations/ mount, then scores whether the agent
wrote to the correct path with a valid JSON payload. Covers 6 scenarios
(Slack channel/DM, Linear create/update/comment/delete) across 5 guidance
variants (bare, claude-md, slim-inject, full-inject, prescriptive).
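For illustration, the scoring step described above (correct path plus valid JSON payload) could look roughly like the sketch below. The `WrittenFile`, `ScenarioExpectation`, and `scoreRun` names are hypothetical, and the path pattern in the usage note is an assumption; the harness's real types are not part of this message.

```typescript
// Hypothetical shapes for illustration; not the harness's actual types.
interface WrittenFile {
  path: string; // path relative to the fixture dir
  content: string; // raw file content written by the agent
}

interface ScenarioExpectation {
  pathPattern: RegExp; // pattern the written path must match
  requiredKeys: string[]; // top-level keys the JSON payload must contain
}

interface Score {
  pathOk: boolean;
  payloadOk: boolean;
  passed: boolean;
}

function scoreRun(files: WrittenFile[], expected: ScenarioExpectation): Score {
  // A run can only pass if some file landed on the expected path.
  const match = files.find((f) => expected.pathPattern.test(f.path));
  if (!match) return { pathOk: false, payloadOk: false, passed: false };

  let payloadOk = false;
  try {
    const parsed: unknown = JSON.parse(match.content);
    payloadOk =
      typeof parsed === "object" &&
      parsed !== null &&
      !Array.isArray(parsed) &&
      expected.requiredKeys.every((k) => k in parsed);
  } catch {
    payloadOk = false; // invalid JSON counts as a failed payload
  }
  return { pathOk: true, payloadOk, passed: payloadOk };
}
```

A Slack channel scenario might pass a pattern such as `/^\.integrations\/slack\/channels\/[^/]+\/messages\/[^/]+\.json$/` with `requiredKeys: ["text"]`; both values are assumed here, not taken from the scenario files.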
Key findings from eval runs:
- `prescriptive` variant achieves 18/18 (100%) across all free and Chinese
models (deepseek-v4-flash-free, mimo, nemotron, north-mini-code, gpt-5.4-nano,
gpt-5.4-mini, gpt-5.1-codex-mini, gpt-5.5) — reliable for non-claude CLIs
- `full-inject` and `slim-inject` also reach 100% once absolute paths are
injected for CLIs whose cwd doesn't match the project fixture dir
- `bare` fails universally — no model self-discovers integration paths
Harness changes:
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt`
- Default opencode model is `opencode/deepseek-v4-flash-free` (free, fast)
- Non-claude CLIs receive absolute fixture paths in the task prefix so writes
land in the correct temp dir regardless of CLI cwd detection
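A minimal sketch of the absolute-path task prefix, assuming a hypothetical `buildTaskPrompt` helper; the prefix wording is illustrative and not the harness's actual text.

```typescript
// Hypothetical helper: prefixes the task with absolute fixture paths for
// non-claude CLIs so writes do not depend on the CLI's own cwd detection.
function buildTaskPrompt(cli: string, fixtureDir: string, task: string): string {
  if (cli === "claude") return task; // claude keeps the task unchanged in this sketch

  const root = fixtureDir.replace(/\/+$/, ""); // drop trailing slashes
  const mount = `${root}/.integrations`;
  return [
    `Project directory (absolute): ${root}`,
    `Integrations mount (absolute): ${mount}`,
    "Write files using these absolute paths.",
    "",
    task,
  ].join("\n");
}
```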
Production wiring:
- `IntegrationsManager.prescriptiveSpawnInstructions()`: derives the lookup
table from real `writebackCommandMountPaths` — same data as
`initialSpawnInstructions`, compact format instead of narrative prose
- `broker:spawn-agent` IPC handler routes `cli !== 'claude'` to prescriptive;
`recordSpawnInstructionDelivery` guarded to narrative path only
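The routing rule can be sketched as follows. The method names come from this commit, but the `InstructionSource` interface, the method signatures, and the `agentName` argument are assumptions made for illustration.

```typescript
// Assumed signatures; only the method names appear in the commit message.
interface InstructionSource {
  initialSpawnInstructions(): string;
  prescriptiveSpawnInstructions(): string;
  recordSpawnInstructionDelivery(agentName: string): void;
}

function selectSpawnInstructions(
  cli: string,
  agentName: string,
  manager: InstructionSource,
): string {
  // Non-claude CLIs get the compact lookup table and no delivery record.
  if (cli !== "claude") {
    return manager.prescriptiveSpawnInstructions();
  }
  // claude keeps the narrative instructions, and only this path is recorded.
  const narrative = manager.initialSpawnInstructions();
  manager.recordSpawnInstructionDelivery(agentName);
  return narrative;
}
```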
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: CI test mock + address PR review feedback
CI fix:
- ipc-handlers.test.ts mocked integrationsManager was missing
prescriptiveSpawnInstructions/recordSpawnInstructionDelivery, so the
broker:spawn-agent test (cli: 'codex') threw "not a function". Added the
mocks plus focused routing tests (non-claude → prescriptive + no delivery
record; claude → narrative + delivery record).
Review feedback:
- package.json: depend on published @agent-relay/evals@^8.8.2 instead of
file:../relay/packages/evals so fresh clones / CI can resolve it (codex P2)
- integrations.ts: Linear comment path now references the canonical issue
resource file (<KEY>-<num>__<uuid>.json) instead of a bare <issueId> dir —
the local mount's linearIssueCommentRemotePath rejects UUID-less paths, so
the old instruction produced files that never became visible comments (codex P1)
- integrations.ts + variants.ts: drop redundant "userId" from the Slack DM
payload (path-derived; matches the discovery schema) (gemini)
- integrations.ts: remove the stale <channelDir> note — the emitted path is
already concrete (gemini)
- report.ts: add 'prescriptive' to VARIANT_ORDER so the HTML report column
sorts last instead of first (gemini)
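One plausible reading of the report.ts fix: a comparator built on `indexOf` ranks a variant missing from the order list at -1, which sorts it first. The sketch below shows an explicit-order sort; the list contents mirror the variant names in this commit, while `sortVariants` and the unknown-variant handling are assumptions.

```typescript
// Order mirrors the variants named in this commit; the real constant may differ.
const VARIANT_ORDER = ["bare", "claude-md", "slim-inject", "full-inject", "prescriptive"];

function sortVariants(variants: string[]): string[] {
  const rank = (v: string): number => {
    const i = VARIANT_ORDER.indexOf(v);
    // Variants missing from the list rank after all known ones rather than at -1.
    return i === -1 ? VARIANT_ORDER.length : i;
  };
  // Copy before sorting so the caller's array is not mutated.
  return [...variants].sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}
```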
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore: apply pr-reviewer fixes for #375
* refactor(integrations): make prescriptiveSpawnInstructions fully adapter-driven
Removes the hardcoded per-provider branches (and the interim curated payload
map) from prescriptiveSpawnInstructions. Writable resources + path templates
now come entirely from each provider's discovery `.adapter.md` (shipped by
relayfile-adapters), and payload shape is pointed at that resource's discovery
`.create.example.json`. No per-provider knowledge lives in pear, so a new
integration works with zero code change here.
- Parse the adapter doc's "Writable resources" section (provider-agnostic)
- Resolve each resource's concrete, in-scope path from the integration's
writeback mount roots; preserve {id} placeholders for nested resources
- Point at the adapter's create example for fields instead of inlining payloads
- Graceful fallback to a discovery pointer when the adapter doc isn't mounted
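A rough sketch of the adapter-driven flow listed above. The `.adapter.md` layout assumed here (a "Writable resources" heading followed by `- name: path` bullets) is a guess, and `parseWritableResources` and `prescriptiveLines` are hypothetical names; the real adapter doc format may differ.

```typescript
interface WritableResource {
  name: string;
  pathTemplate: string; // may contain {id}-style placeholders
}

// Assumed doc layout: a markdown heading "Writable resources" followed by
// bullets of the form "- name: `/path/{placeholder}/...`".
function parseWritableResources(adapterDoc: string): WritableResource[] {
  const resources: WritableResource[] = [];
  let inSection = false;
  for (const line of adapterDoc.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      inSection = (heading[1] ?? "").trim().toLowerCase() === "writable resources";
      continue;
    }
    if (!inSection) continue;
    const item = /^\s*[-*]\s+`?([^`:]+?)`?\s*:\s*`?([^`\s]+)`?\s*$/.exec(line);
    if (item) resources.push({ name: (item[1] ?? "").trim(), pathTemplate: item[2] ?? "" });
  }
  return resources;
}

function prescriptiveLines(adapterDoc: string | undefined, mountRoot: string): string[] {
  const root = mountRoot.replace(/\/+$/, "");
  if (adapterDoc === undefined) {
    // Fallback when the adapter doc is not mounted: point at discovery instead.
    return [`See the discovery files under ${root} for writable paths.`];
  }
  return parseWritableResources(adapterDoc).map((r) => {
    const rel = r.pathTemplate.replace(/^\/+/, "");
    return `${r.name}: ${root}/${rel}`; // placeholders are left untouched
  });
}
```

Keeping the parser provider-agnostic means nothing in it mentions Slack or Linear; only the doc content and the mount root vary per integration.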
Known gap tracked upstream: the local mount currently serves discovery inferred
from synced read records (which omits required write fields like Slack's
`text`), so the pointed-at example is imperfect until that's fixed —
AgentWorkforce/relayfile#299. The adapters already publish correct write-shaped
discovery; the fix belongs in the mount/sync pipeline, not pear.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore: remove stray bot-added incident file
agent-relay-code[bot] pushed an unrelated mount-root "incident" note (from its
own Daytona sandbox) onto this PR; it references a non-existent doc and has
nothing to do with the prescriptive-spawn/evals change. Removing it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: agent-relay-code[bot] <agent-relay-code[bot]@users.noreply.github.com>
1 parent 5e2489f commit 835651d
32 files changed
Lines changed: 1831 additions & 3 deletions
File tree
- evals
  - fixtures/discovery
    - linear
      - issues
        - {issueId}/comments
    - slack
      - channels/{channelId}/messages
      - users/{userId}/messages
  - reports
  - scenarios
- src/main