Commit 835651d

khaliqgant, claude, and agent-relay-code[bot] authored
feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs (#375)
* feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs

Adds an eval harness (evals/) that spawns real agents via the broker in a fixture dir with a fake .integrations/ mount, then scores whether the agent wrote to the correct path with a valid JSON payload. Covers 6 scenarios (Slack channel/DM, Linear create/update/comment/delete) across 5 guidance variants (bare, claude-md, slim-inject, full-inject, prescriptive).

Key findings from eval runs:
- `prescriptive` variant achieves 18/18 (100%) across all free and Chinese models (deepseek-v4-flash-free, mimo, nemotron, north-mini-code, gpt-5.4-nano, gpt-5.4-mini, gpt-5.1-codex-mini, gpt-5.5) — reliable for non-claude CLIs
- `full-inject` and `slim-inject` also reach 100% once absolute paths are injected for CLIs whose cwd doesn't match the project fixture dir
- `bare` fails universally — no model self-discovers integration paths

Harness changes:
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt`
- Default opencode model is `opencode/deepseek-v4-flash-free` (free, fast)
- Non-claude CLIs receive absolute fixture paths in the task prefix so writes land in the correct temp dir regardless of CLI cwd detection

Production wiring:
- `IntegrationsManager.prescriptiveSpawnInstructions()`: derives the lookup table from real `writebackCommandMountPaths` — same data as `initialSpawnInstructions`, compact format instead of narrative prose
- `broker:spawn-agent` IPC handler routes `cli !== 'claude'` to prescriptive; `recordSpawnInstructionDelivery` guarded to narrative path only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: CI test mock + address PR review feedback

CI fix:
- ipc-handlers.test.ts mocked integrationsManager was missing prescriptiveSpawnInstructions/recordSpawnInstructionDelivery, so the broker:spawn-agent test (cli: 'codex') threw "not a function". Added the mocks plus focused routing tests (non-claude → prescriptive + no delivery record; claude → narrative + delivery record).

Review feedback:
- package.json: depend on published @agent-relay/evals@^8.8.2 instead of file:../relay/packages/evals so fresh clones / CI can resolve it (codex P2)
- integrations.ts: Linear comment path now references the canonical issue resource file (<KEY>-<num>__<uuid>.json) instead of a bare <issueId> dir — the local mount's linearIssueCommentRemotePath rejects UUID-less paths, so the old instruction produced files that never became visible comments (codex P1)
- integrations.ts + variants.ts: drop redundant "userId" from the Slack DM payload (path-derived; matches the discovery schema) (gemini)
- integrations.ts: remove the stale <channelDir> note — the emitted path is already concrete (gemini)
- report.ts: add 'prescriptive' to VARIANT_ORDER so the HTML report column sorts last instead of first (gemini)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: apply pr-reviewer fixes for #375

* refactor(integrations): make prescriptiveSpawnInstructions fully adapter-driven

Removes the hardcoded per-provider branches (and the interim curated payload map) from prescriptiveSpawnInstructions. Writable resources + path templates now come entirely from each provider's discovery `.adapter.md` (shipped by relayfile-adapters), and payload shape is pointed at that resource's discovery `.create.example.json`. No per-provider knowledge lives in pear, so a new integration works with zero code change here.

- Parse the adapter doc's "Writable resources" section (provider-agnostic)
- Resolve each resource's concrete, in-scope path from the integration's writeback mount roots; preserve {id} placeholders for nested resources
- Point at the adapter's create example for fields instead of inlining payloads
- Graceful fallback to a discovery pointer when the adapter doc isn't mounted

Known gap tracked upstream: the local mount currently serves discovery inferred from synced read records (which omits required write fields like Slack's `text`), so the pointed-at example is imperfect until that's fixed — AgentWorkforce/relayfile#299. The adapters already publish correct write-shaped discovery; the fix belongs in the mount/sync pipeline, not pear.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: remove stray bot-added incident file

agent-relay-code[bot] pushed an unrelated mount-root "incident" note (from its own Daytona sandbox) onto this PR; it references a non-existent doc and has nothing to do with the prescriptive-spawn/evals change. Removing it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: agent-relay-code[bot] <agent-relay-code[bot]@users.noreply.github.com>
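A minimal sketch of the `broker:spawn-agent` routing described under "Production wiring", assuming the three `IntegrationsManager` methods take no arguments and that the instruction methods return strings. The `Integrations` interface and the `buildSpawnInstructions` function are hypothetical; only the method names come from the commit message.

```typescript
// Hypothetical shape of the integrations manager; the real signatures may differ.
interface Integrations {
  prescriptiveSpawnInstructions(): string
  initialSpawnInstructions(): string
  recordSpawnInstructionDelivery(): void
}

// Non-claude CLIs get the compact prescriptive lookup table and no delivery
// record; claude gets the narrative instructions and a recorded delivery.
function buildSpawnInstructions(cli: string, integrations: Integrations): string {
  if (cli !== 'claude') {
    return integrations.prescriptiveSpawnInstructions()
  }
  const narrative = integrations.initialSpawnInstructions()
  integrations.recordSpawnInstructionDelivery()
  return narrative
}
```

The two branches correspond to the two focused routing tests the commit adds (non-claude: prescriptive and no delivery record; claude: narrative and a delivery record).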
1 parent 5e2489f commit 835651d

32 files changed

Lines changed: 1831 additions & 3 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -21,3 +21,6 @@ tests/playwright/screenshots/
 
 # Local factory runtime config (not for commit)
 factory.config.json
+
+evals/reports/*.json
+evals/reports/*.html

evals/fixture.ts

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
/**
 * Creates a fresh temp directory with a fake .integrations/ mount structure.
 *
 * Discovery schemas are copied from evals/fixtures/discovery/ so eval agents
 * see consistent, realistic schemas without depending on a live mount.
 * Provider dirs (slack/channels/..., linear/issues/) are empty and writable.
 */

import { cpSync, existsSync, mkdirSync, mkdtempSync, writeFileSync } from 'node:fs'
import { dirname, join } from 'node:path'
import { fileURLToPath } from 'node:url'
import os from 'node:os'

import { snapshotMount, newMountFiles } from '@agent-relay/evals/scoring/mount'

const __dirname = dirname(fileURLToPath(import.meta.url))
const BUNDLED_DISCOVERY = join(__dirname, 'fixtures', 'discovery')

// Stable fake IDs used across all scenarios
export const EVAL_CHANNEL_ID = 'C12345EVAL'
export const EVAL_CHANNEL_SLUG = 'general'
export const EVAL_CHANNEL_DIR = `${EVAL_CHANNEL_ID}__${EVAL_CHANNEL_SLUG}`
export const EVAL_USER_ID = 'U67890EVAL'
export const EVAL_ISSUE_ID = 'ARC-123EVL'

/**
 * Create a fresh temp directory with the fake mount.
 * Returns the absolute path to the temp dir.
 */
export function createFixture({ claudeMd }: { claudeMd?: string | null } = {}): string {
  const tmpDir = mkdtempSync(join(os.tmpdir(), 'pear-eval-'))

  // Copy bundled discovery schemas
  const discoveryDest = join(tmpDir, '.integrations', 'discovery')
  mkdirSync(discoveryDest, { recursive: true })
  if (existsSync(BUNDLED_DISCOVERY)) {
    cpSync(BUNDLED_DISCOVERY, discoveryDest, { recursive: true })
  }

  // Writable provider dirs — empty, agent creates files here
  mkdirSync(join(tmpDir, '.integrations', 'slack', 'channels', EVAL_CHANNEL_DIR, 'messages'), { recursive: true })
  mkdirSync(join(tmpDir, '.integrations', 'slack', 'users', EVAL_USER_ID, 'messages'), { recursive: true })
  mkdirSync(join(tmpDir, '.integrations', 'linear', 'issues'), { recursive: true })
  // Pre-create the comment subpath so s05 has a valid target dir
  mkdirSync(join(tmpDir, '.integrations', 'linear', 'issues', EVAL_ISSUE_ID, 'comments'), { recursive: true })

  if (claudeMd) {
    writeFileSync(join(tmpDir, 'CLAUDE.md'), claudeMd)
  }

  return tmpDir
}

// Re-export for convenience so callers don't need to import from two places
export { snapshotMount, newMountFiles }
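The harness scores whether the agent wrote to the correct path with a valid JSON payload. The real scoring goes through `snapshotMount` and `newMountFiles` from `@agent-relay/evals/scoring/mount`, whose signatures are not shown in this diff, so the sketch below is a self-contained stand-in under that assumption: `listFiles` and `scoreWrite` are hypothetical helpers that diff a before/after file listing and check the new file's directory and JSON shape.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, statSync, writeFileSync } from 'node:fs'
import { dirname, join, relative, sep } from 'node:path'
import { tmpdir } from 'node:os'

// List every file under `dir` as a '/'-separated path relative to `root`.
function listFiles(root: string, dir: string = root): string[] {
  const out: string[] = []
  for (const name of readdirSync(dir)) {
    const full = join(dir, name)
    if (statSync(full).isDirectory()) out.push(...listFiles(root, full))
    else out.push(relative(root, full).split(sep).join('/'))
  }
  return out
}

interface Score { pathOk: boolean; jsonOk: boolean; file?: string }

// Hypothetical scorer (not the @agent-relay/evals API): consider files that appeared
// since `before`, skip discovery/, and require one directly inside `expectedDir`
// whose contents parse as a JSON object.
function scoreWrite(mountRoot: string, before: Set<string>, expectedDir: string): Score {
  const added = listFiles(mountRoot).filter(f => !before.has(f) && !f.startsWith('discovery/'))
  const hit = added.find(f => dirname(f) === expectedDir)
  if (!hit) return { pathOk: false, jsonOk: false }
  try {
    const parsed = JSON.parse(readFileSync(join(mountRoot, hit), 'utf8'))
    const isObject = typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
    return { pathOk: true, jsonOk: isObject, file: hit }
  } catch {
    return { pathOk: true, jsonOk: false, file: hit }
  }
}

// Usage: simulate an agent posting to the eval Slack channel in a throwaway mount.
const mountRoot = join(mkdtempSync(join(tmpdir(), 'score-demo-')), '.integrations')
const target = 'slack/channels/C12345EVAL__general/messages'
mkdirSync(join(mountRoot, target), { recursive: true })
const before = new Set(listFiles(mountRoot))
writeFileSync(join(mountRoot, target, 'hello.json'), JSON.stringify({ text: 'hi' }))
const demo = scoreWrite(mountRoot, before, target)
```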
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# linear writeback adapter

This provider supports file-native writeback. Before creating or
updating a resource, read its schema and example files so you do not
guess the payload shape.

## Writable resources

### issues — `.integrations/linear/issues`

- schema: `discovery/linear/issues/.schema.json`
- create example: `discovery/linear/issues/.create.example.json`
- update example: `discovery/linear/issues/.update.example.json`
- delete example: `discovery/linear/issues/.delete.example.json`

### comments — `.integrations/linear/issues/{issueId}/comments`

- schema: `discovery/linear/issues/{issueId}/comments/.schema.json`
- create example: `discovery/linear/issues/{issueId}/comments/.create.example.json`

## Actions

### Create
Write a new JSON file (any filename, no leading dot) under the resource path.
Example: `.integrations/linear/issues/new-bug.json`

### Update
Write a JSON file with `"_action": "update"` and the `"id"` of the record to
update, plus the fields to change.
Example: `.integrations/linear/issues/update-ARC-123.json`

### Delete
Write a JSON file with `"_action": "delete"` and the `"id"` of the record.
Example: `.integrations/linear/issues/delete-ARC-123.json`

### Comment
Write a JSON file under the issue's comments path with `"issueId"` and `"body"`.
Example: `.integrations/linear/issues/ARC-123EVL/comments/comment-1.json`

## Contract

- Read `<resource>/.schema.json` (JSON Schema draft 2020-12). Fields
  with `readOnly: true` are server-managed; never set them.
- Read `<resource>/.create.example.json` for a minimal valid create payload.
- Do NOT write files inside `discovery/` — that directory is read-only schema reference.
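Per the commit message, `prescriptiveSpawnInstructions` now parses each adapter doc's "Writable resources" section instead of hardcoding per-provider paths. A minimal sketch of such a parser, assuming the heading format used by the bundled fixture docs (`### <name>`, an em dash, then the backticked path) and the `- create example:` bullet; `parseWritableResources` and `WritableResource` are hypothetical names, not pear's implementation.

```typescript
interface WritableResource { name: string; path: string; createExample?: string }

// Scan only the "## Writable resources" section; any other "## " heading ends it.
function parseWritableResources(doc: string): WritableResource[] {
  const resources: WritableResource[] = []
  let inSection = false
  for (const line of doc.split('\n')) {
    if (line.startsWith('## ')) {
      inSection = line.trim() === '## Writable resources'
      continue
    }
    if (!inSection) continue
    // "### <name> \u2014 `<path>`" starts a new resource entry.
    const heading = line.match(/^### (\S+) \u2014 `([^`]+)`/)
    if (heading) {
      resources.push({ name: heading[1], path: heading[2] })
      continue
    }
    // "- create example: `<path>`" attaches to the most recent resource.
    const example = line.match(/^- create example: `([^`]+)`/)
    if (example && resources.length > 0) {
      resources[resources.length - 1].createExample = example[1]
    }
  }
  return resources
}
```

Stopping at the next `## ` heading keeps the `### Create`/`### Update` entries under "Actions" from being mistaken for resources.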
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
  "title": "Issue title",
  "description": "Describe the issue here",
  "priority": 2
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
{
2+
"id": "ISSUE_ID_HERE",
3+
"_action": "delete"
4+
}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "linear issues",
  "description": "Record shape for /linear/issues. Fields marked readOnly are server-managed; omit them from create payloads.",
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "id": { "type": "string", "readOnly": true },
    "identifier": { "type": "string", "readOnly": true },
    "title": { "type": "string" },
    "description": { "type": "string" },
    "priority": { "type": "integer" },
    "assignee_name": { "type": "string" },
    "labelIds": { "type": "array", "items": { "type": "string" } },
    "createdAt": { "type": "string", "readOnly": true },
    "creatorId": { "type": "string", "readOnly": true }
  }
}
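This schema marks `id`, `identifier`, `createdAt`, and `creatorId` as `readOnly`, and its description says to omit such fields from create payloads. A sketch of that rule, assuming only `properties[*].readOnly` needs to be honored; `stripReadOnly` is a hypothetical helper, and unknown fields pass through because the schema sets `additionalProperties: true`.

```typescript
interface PropertySpec { readOnly?: boolean }
interface DiscoverySchema { properties?: Record<string, PropertySpec> }

// Drop server-managed (readOnly) fields so a synced record can serve as the
// starting point of a create payload; fields the schema does not list are kept.
function stripReadOnly(schema: DiscoverySchema, record: Record<string, unknown>): Record<string, unknown> {
  const props = schema.properties ?? {}
  return Object.fromEntries(
    Object.entries(record).filter(([field]) => props[field]?.readOnly !== true),
  )
}
```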
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
{
2+
"id": "ISSUE_ID_HERE",
3+
"_action": "update",
4+
"title": "Updated title",
5+
"priority": 1
6+
}
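The update and delete examples share one convention: an `"_action"` discriminator plus the target record's `"id"`. The hypothetical builders below (`updatePayload`, `deletePayload`) mirror that shape as a sketch; spreading the changed fields first means a stray `id` or `_action` key in `changes` cannot override the intended values.

```typescript
type UpdatePayload = { id: string; _action: 'update' } & Record<string, unknown>
type DeletePayload = { id: string; _action: 'delete' }

// Changes are spread first, so the explicit id and _action always win.
function updatePayload(id: string, changes: Record<string, unknown>): UpdatePayload {
  return { ...changes, id, _action: 'update' as const }
}

function deletePayload(id: string): DeletePayload {
  return { id, _action: 'delete' }
}
```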
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
{
2+
"issueId": "ISSUE_ID_HERE",
3+
"body": "Your comment text here."
4+
}
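Per the adapter doc, an action is dispatched by writing a new JSON file (any filename, no leading dot) under the resource path. A sketch of that step for the comment example, using a throwaway temp dir; `writeAction` is hypothetical, and the `ARC-123EVL/comments` path mirrors the eval fixture only (the commit notes that the local mount's `linearIssueCommentRemotePath` rejects UUID-less issue paths).

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { basename, join } from 'node:path'
import { tmpdir } from 'node:os'

// Hypothetical helper: dispatch a writeback by writing a JSON file under a resource dir.
function writeAction(resourceDir: string, filename: string, payload: object): string {
  // The adapter contract asks for a plain filename with no leading dot (and no subdirs).
  if (filename.startsWith('.') || basename(filename) !== filename) {
    throw new Error(`invalid writeback filename: ${filename}`)
  }
  mkdirSync(resourceDir, { recursive: true })
  const target = join(resourceDir, filename)
  writeFileSync(target, JSON.stringify(payload, null, 2) + '\n')
  return target
}

// Usage: post a comment on the eval issue inside a throwaway mount.
const mount = join(mkdtempSync(join(tmpdir(), 'comment-demo-')), '.integrations')
const commentsDir = join(mount, 'linear', 'issues', 'ARC-123EVL', 'comments')
const written = writeAction(commentsDir, 'comment-1.json', { issueId: 'ARC-123EVL', body: 'Looks good.' })
const roundTrip = JSON.parse(readFileSync(written, 'utf8'))
```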
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"title": "linear issue comments",
4+
"description": "Record shape for /linear/issues/{issueId}/comments. Fields marked readOnly are server-managed; omit them from create payloads.",
5+
"type": "object",
6+
"additionalProperties": true,
7+
"properties": {
8+
"id": { "type": "string", "readOnly": true },
9+
"issueId": { "type": "string" },
10+
"body": { "type": "string" },
11+
"createdAt": { "type": "string", "readOnly": true },
12+
"userId": { "type": "string", "readOnly": true }
13+
},
14+
"required": ["issueId", "body"]
15+
}
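Unlike the issues schema, this comments schema declares `"required": ["issueId", "body"]`. The hypothetical `checkFields` below is a minimal sketch of checking a payload against it: it reports missing required fields and mismatched `string`/`integer` types only, and is not a JSON Schema validator; anything beyond these two rules would need a full draft 2020-12 implementation.

```typescript
interface PropertySpec { type?: string; readOnly?: boolean }
interface Schema { properties?: Record<string, PropertySpec>; required?: string[] }

// Report absent required fields, then present fields whose primitive type mismatches.
function checkFields(schema: Schema, payload: Record<string, unknown>): string[] {
  const problems: string[] = []
  for (const field of schema.required ?? []) {
    if (payload[field] === undefined) problems.push(`missing: ${field}`)
  }
  for (const [field, spec] of Object.entries(schema.properties ?? {})) {
    const value = payload[field]
    if (value === undefined || spec.type === undefined) continue
    const ok =
      spec.type === 'string' ? typeof value === 'string'
      : spec.type === 'integer' ? typeof value === 'number' && Number.isInteger(value)
      : true // other types are not checked in this sketch
    if (!ok) problems.push(`wrong type: ${field} (expected ${spec.type})`)
  }
  return problems
}
```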
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# slack writeback adapter

This provider supports file-native writeback. Before creating or
updating a resource, read its schema and create example so you do not
guess the payload shape.

## Writable resources

### messages — `/slack/channels/{channelId}/messages`

- schema: `discovery/slack/channels/{channelId}/messages/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/.create.example.json`

### reactions — `/slack/channels/{channelId}/messages/{messageTs}/reactions`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json`

### replies — `/slack/channels/{channelId}/messages/{messageTs}/replies`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json`

### direct-messages — `/slack/users/{userId}/messages`

- schema: `discovery/slack/users/{userId}/messages/.schema.json`
- create example: `discovery/slack/users/{userId}/messages/.create.example.json`

## Contract

- Read `<resource>/.schema.json` (JSON Schema draft 2020-12). Fields
  with `readOnly: true` are server-managed; never set them.
- Read `<resource>/.create.example.json` for a minimal valid create
  payload (read-only fields already omitted).
- Schemas are inferred from synced records and refine on each
  sync; treat unknown extra fields as allowed.
- Write a new JSON file (any filename) under the target resource path to
  dispatch the action. Example: `.integrations/slack/channels/C12345__general/messages/my-message.json`
- Do NOT write files inside `discovery/` — that directory is read-only schema reference.
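The commit message says each resource path is resolved to a concrete, in-scope path while `{id}`-style placeholders for nested resources are preserved, and that non-claude CLIs receive absolute paths. A sketch of that templating, assuming mount-relative templates in the style of this Slack doc, values keyed by placeholder name, and that `{channelId}` maps to the `<id>__<slug>` directory shown in the doc's example; `resolveTemplate` and `absoluteResourcePath` are hypothetical.

```typescript
import { posix } from 'node:path'

// Fill placeholders we have values for; leave the rest (e.g. {messageTs}) intact
// so nested resources still show which id the agent must supply.
function resolveTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => values[key] ?? match)
}

// Join onto an absolute mount root so writes land correctly regardless of the CLI's cwd.
function absoluteResourcePath(mountRoot: string, template: string, values: Record<string, string>): string {
  return posix.join(mountRoot, resolveTemplate(template, values))
}
```

`posix.join` concatenates and normalizes, so the template's leading `/` does not reset the path to the filesystem root.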
