feat(agents): add OpenCode agent adapter (#5001) #701
Add `OpenCodeAgentAdapter` (TypeScript) so scenario can simulate/evaluate the OpenCode coding agent (sst/opencode) via `@opencode-ai/sdk`, mirroring the in-house Claude Code adapter (class + lowercase `openCodeAgent` factory) and the realtime adapter's injection idiom.

- Session-per-threadId: one server-side OpenCode session per thread, reused across turns; sends only the new user delta (not a full-history replay) because OpenCode holds the transcript server-side.
- Completion primitive: awaits `session.prompt()` (resolves only after the assistant finishes; no SSE). Two-layer error handling (transport `result.error` + semantic `info.error`); empty-text fallback so a tool-only turn never yields a silent `""`.
- Testability via a dependency-injected `OpencodeClient` (no SDK module mock).
- Pins `@opencode-ai/sdk@1.17.9` (response envelope verified against the installed package, R-2).

Design recorded in docs/adr/005.

Closes #5001

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
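The two-layer error handling and empty-text fallback can be sketched as follows. This is a minimal sketch under stated assumptions: `PromptResult`, `SemanticError`, and this body of `interpretPromptResult` are hypothetical simplifications of the `{data: {parts, info}, error}` envelope, not the SDK's real types or the PR's actual helper, and the `prompt failed:` prefix and `[tool: …]` summary wording are invented for illustration.

```typescript
// Hypothetical, simplified envelope types shaped after the { data: { info, parts }, error }
// layout described in this PR; the real @opencode-ai/sdk types are richer.
interface PromptPart { type: string; text?: string; tool?: string }
interface SemanticError { name: string; data?: { message?: string } }
interface PromptResult {
  data?: { info: { error?: SemanticError }; parts: PromptPart[] };
  error?: { message?: string; status?: number } | null;
}

function interpretPromptResult(result: PromptResult): string {
  // Layer 1: transport failure; the request itself did not succeed.
  if (result.error || !result.data) {
    throw new Error(`prompt failed: ${result.error?.message ?? "no data in response"}`);
  }
  // Layer 2: semantic failure carried inside an otherwise successful (HTTP 200) reply.
  const infoError = result.data.info.error;
  if (infoError) {
    throw new Error(`${infoError.name}: ${infoError.data?.message ?? "no message"}`);
  }
  // Normal path: concatenate the text parts, ignoring step markers and tool parts.
  const text = result.data.parts
    .filter((p) => p.type === "text" && typeof p.text === "string")
    .map((p) => p.text)
    .join("\n");
  if (text.trim().length > 0) return text;
  // Empty-text fallback: summarize tool parts so a tool-only turn never returns "".
  const tools = result.data.parts
    .filter((p) => p.type === "tool")
    .map((p) => `[tool: ${p.tool ?? "unknown"}]`)
    .join(" ");
  return tools || "[assistant produced no text output]";
}
```

Checking `result.error` first keeps transport failures distinct from provider-side errors such as `MessageOutputLengthError`, which arrive with a successful HTTP status and would otherwise surface as an empty reply.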
Walkthrough
Adds …
Changes
OpenCode Agent Adapter
Sequence Diagram(s)
sequenceDiagram
participant Scenario
participant OpenCodeAgentAdapter
participant OpencodeClient
participant OpenCodeServer
rect rgba(70, 130, 180, 0.5)
note over OpenCodeAgentAdapter,OpenCodeServer: First call — server + session creation
Scenario->>OpenCodeAgentAdapter: call(input, threadId="t1")
OpenCodeAgentAdapter->>OpenCodeServer: spawn opencode serve (memoized)
OpenCodeServer-->>OpenCodeAgentAdapter: OpencodeClient
OpenCodeAgentAdapter->>OpencodeClient: session.create(model)
OpencodeClient-->>OpenCodeAgentAdapter: sessionId
end
rect rgba(60, 179, 113, 0.5)
note over OpenCodeAgentAdapter,OpencodeClient: Prompt dispatch and response parsing
OpenCodeAgentAdapter->>OpencodeClient: session.prompt(sessionId, deltaText, AbortSignal.timeout)
OpencodeClient-->>OpenCodeAgentAdapter: {data: {parts, info}, error}
OpenCodeAgentAdapter->>OpenCodeAgentAdapter: check result.error (transport)
OpenCodeAgentAdapter->>OpenCodeAgentAdapter: check info.error (semantic)
OpenCodeAgentAdapter->>OpenCodeAgentAdapter: partsToText(parts) or renderNonTextPart fallback
OpenCodeAgentAdapter-->>Scenario: assistant text string
end
rect rgba(220, 120, 60, 0.5)
note over OpenCodeAgentAdapter,OpencodeClient: Subsequent turn — session reused
Scenario->>OpenCodeAgentAdapter: call(input, threadId="t1")
OpenCodeAgentAdapter->>OpencodeClient: session.prompt(same sessionId, newDelta)
OpencodeClient-->>OpenCodeAgentAdapter: {data, error}
OpenCodeAgentAdapter-->>Scenario: assistant text string
end
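The first-call and subsequent-turn branches of the diagram (one `session.create`, then `session.prompt` on the same session with only the new delta) can be sketched as below. Everything here is hypothetical: `Message`, `SessionClient`, and `SessionPerThreadSketch` are illustrative stand-ins, not scenario's message types or the real `OpencodeClient`, and the delta rule (user messages after the last assistant turn) is an assumption about what "new user delta" means. This sketch also does not guard against two concurrent first calls on one thread, the race addressed later in this PR.

```typescript
// Hypothetical message and client shapes for illustration only; they are not
// scenario's message types or the real OpencodeClient from @opencode-ai/sdk.
interface TextPart { type: "text"; text: string }
interface OtherPart { type: "image" | "file" }
interface Message {
  role: "user" | "assistant" | "system" | "tool";
  content: string | Array<TextPart | OtherPart>;
}

interface SessionClient {
  create(): Promise<string>; // resolves to a new server-side session id
  prompt(sessionId: string, text: string): Promise<string>; // resolves when the reply is done
}

// Flattens message content to text, skipping non-text parts.
function renderText(content: Message["content"]): string {
  if (typeof content === "string") return content;
  return content
    .filter((p): p is TextPart => p.type === "text")
    .map((p) => p.text)
    .join("\n");
}

// Assumed delta rule: user messages after the last assistant turn are new to the server.
function extractNewUserText(messages: Message[]): string {
  let start = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "assistant") {
      start = i + 1;
      break;
    }
  }
  return messages
    .slice(start)
    .filter((m) => m.role === "user")
    .map((m) => renderText(m.content))
    .filter((t) => t.length > 0)
    .join("\n\n");
}

// One server-side session per threadId, reused across turns.
class SessionPerThreadSketch {
  private readonly sessions = new Map<string, string>();
  private readonly client: SessionClient;

  // Injected client: tests pass a fake instead of mocking the SDK module.
  constructor(client: SessionClient) {
    this.client = client;
  }

  async call(threadId: string, messages: Message[]): Promise<string> {
    let sessionId = this.sessions.get(threadId);
    if (sessionId === undefined) {
      sessionId = await this.client.create();
      this.sessions.set(threadId, sessionId);
    }
    // The server already holds earlier turns, so only the delta is sent.
    return this.client.prompt(sessionId, extractNewUserText(messages));
  }
}
```

Because the client is a constructor argument, a test can hand in any object implementing `SessionClient` and count `create()` calls directly, which is the injection idiom the PR borrows from the realtime adapter.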
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Actionable comments posted: 5
🧹 Nitpick comments (1)
javascript/src/agents/opencode/opencode-agent.adapter.ts (1)
170-317: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Reorder class methods to keep public methods first and private helpers at the bottom.
`call`/`close` (public API) currently appear after private helpers. Reordering improves scanability and aligns with project conventions. As per coding guidelines,
`**/*.ts`: “In TypeScript classes, place public methods first, private methods at the bottom, and group related methods together.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@javascript/src/agents/opencode/opencode-agent.adapter.ts` around lines 170 - 317, The public methods `call()` and `close()` are currently positioned after private helper methods like `logger`, `ensureClient()`, `directoryQuery()`, and `resolveSessionId()`. Reorder the class methods so that the public `call()` and `close()` methods appear first, followed by all private helper methods. This improves code scanability and aligns with the project convention of placing public methods before private ones.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/docs/pages/agent-integration/opencode.mdx`:
- Around line 35-158: The documentation page references several scenario
framework APIs (scenario.run, userSimulatorAgent, judgeAgent, scenario.user,
scenario.agent, scenario.judge) and mentions a comparison to Claude Code without
providing links to their respective documentation pages. Add markdown links to
connect these API references to their documentation: link scenario.run,
userSimulatorAgent(), and judgeAgent() to the Writing Scenarios page, add a
dedicated link to the Judge Agent page when judgeAgent() is first mentioned,
link userSimulatorAgent() to the User Simulator Agent page, and ensure the
Claude Code comparison section references the agent integration guides. Use the
existing markdown link patterns from the codebase to maintain consistency with
other documentation pages.
In `@javascript/src/agents/__tests__/opencode-adapter.test.ts`:
- Around line 351-355: The current try/catch block around adapter.call() allows
the test to silently pass if no error is thrown, since the message-length
assertion inside the catch block would never execute. Replace this try/catch
pattern with a single await expect() assertion using .rejects to strictly verify
that adapter.call() throws an error and that the error message has length
greater than zero. This ensures the test fails if adapter.call() unexpectedly
succeeds without throwing.
In `@javascript/src/agents/opencode/opencode-agent.adapter.ts`:
- Around line 273-275: The condition in the ternary operator for the timeout
configuration uses a truthiness check on this.config.timeout, which treats zero
as falsy and skips timeout wiring when the value is explicitly set to 0. Change
the condition to explicitly check whether this.config.timeout is not undefined
(or not null) instead of relying on truthiness, so that an explicitly configured
timeout value of 0 is properly respected and passed to the AbortSignal.timeout()
method.
- Around line 259-261: The logger.log call in the OpenCode agent adapter is
logging raw sessionId and input.threadId values verbatim, which exposes
sensitive session and conversation metadata if logs are shared or centralized.
To fix this, replace the raw sessionId and input.threadId identifiers with
hashed or truncated versions before including them in the log message. Consider
using a hashing function or displaying only a partial identifier (like the first
few characters) to maintain logging utility while protecting sensitive metadata.
- Around line 206-229: The resolveSessionId method has a race condition where
two concurrent calls with the same threadId can both observe that no session
exists and then both call client.session.create, creating duplicate sessions.
Fix this by adding a private Map to track pending session creation promises
(e.g., pendingSessionCreations). At the start of resolveSessionId, after
checking this.sessions.get(threadId), also check if there is a pending creation
promise in the pendingSessionCreations Map for that threadId and await it if one
exists. When creating a new session, store the creation promise in
pendingSessionCreations before calling client.session.create, then remove it
after the session is successfully stored in this.sessions. This ensures only one
session creation happens per threadId even with concurrent calls.
---
Nitpick comments:
In `@javascript/src/agents/opencode/opencode-agent.adapter.ts`:
- Around line 170-317: The public methods `call()` and `close()` are currently
positioned after private helper methods like `logger`, `ensureClient()`,
`directoryQuery()`, and `resolveSessionId()`. Reorder the class methods so that
the public `call()` and `close()` methods appear first, followed by all private
helper methods. This improves code scanability and aligns with the project
convention of placing public methods before private ones.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec49c418-4f63-4323-accb-d83320fb9c0e
⛔ Files ignored due to path filters (1)
`javascript/pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (8)
- `docs/adr/005-opencode-agent-adapter.md`
- `docs/docs/pages/agent-integration/opencode.mdx`
- `docs/vocs.config.tsx`
- `javascript/package.json`
- `javascript/src/agents/__tests__/opencode-adapter.test.ts`
- `javascript/src/agents/index.ts`
- `javascript/src/agents/opencode/index.ts`
- `javascript/src/agents/opencode/opencode-agent.adapter.ts`
…hable `describeError`

Addresses a github-code-quality finding: the `error == null` guard in `describeInfoError` was provably dead (its only caller is guarded by `if (infoError)`, so `error` is always truthy). Merge the two near-identical `describeTransportError`/`describeInfoError` helpers into a single `describeError`. The null guard is now reachable via the `session.create` call site (`created.error` may be null when only `!data.id` tripped), and the duplication is gone. Output behavior is preserved (transport: `"<msg> (status N)"`; semantic: `"<name>: <msg>"`); all 23 unit tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
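A minimal sketch of a merged helper that produces the two formats this commit names. The `ErrorLike` shape and the `fallback` parameter are assumptions for illustration, not the adapter's actual signature; the point is that one function covers both error kinds and that the `null` branch is genuinely reachable.

```typescript
// Hypothetical union of the two error shapes this commit describes:
// transport errors carry an HTTP status, semantic errors carry a name.
interface ErrorLike {
  name?: string;
  message?: string;
  status?: number;
  data?: { message?: string };
}

function describeError(error: ErrorLike | null | undefined, fallback: string): string {
  // Reachable guard: a create call can fail with error == null when only the id check tripped.
  if (error == null) return fallback;
  const message = error.data?.message ?? error.message ?? fallback;
  if (error.status !== undefined) return `${message} (status ${error.status})`; // transport
  if (error.name) return `${error.name}: ${message}`; // semantic
  return message;
}
```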
…0, strict test, doc links

Addresses CodeRabbit findings on #701:

- Stability (race): `resolveSessionId` was check-then-create, so two concurrent first calls on the same threadId could both create a session. It now stores the in-flight create promise per thread (one `session.create`, evicted on failure).
- Functional (timeout=0): a truthiness check silently ignored an explicit `0`. Add `timeoutSignal()` validation: a non-positive or non-finite timeout throws a clear error before any RPC (mirrors the Claude Code sibling).
- Test quality: the R2 semantic-error test used a try/catch that passed silently if `call()` resolved; it is now a strict `.rejects.toThrow(/MessageOutputLengthError/)`.
- Docs: add See-also cross-links (writing-scenarios, user-simulator, judge-agent).

Adds 3 unit tests (concurrent-create dedup, timeout=0 throws, negative timeout). All 26 unit tests + AC-4 live e2e green; typecheck/lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
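The two runtime fixes in this commit can be sketched as below, assuming a hypothetical `SessionRegistry` class and a plain `() => Promise<string>` creator standing in for `client.session.create`. Storing the promise (not the resolved id) is what lets a second concurrent call for the same `threadId` join the first create; the identity check on eviction only removes the entry if it still holds the failed promise. `timeoutSignal` follows the commit's rule that a non-positive or non-finite timeout is rejected before any RPC, but its error text is an assumption.

```typescript
// Hypothetical creator; stands in for client.session.create in this sketch.
type CreateSession = () => Promise<string>;

class SessionRegistry {
  // Holds the in-flight (or settled) create promise per thread, so concurrent
  // first calls on one threadId share a single create.
  private readonly sessions = new Map<string, Promise<string>>();
  private readonly create: CreateSession;

  constructor(create: CreateSession) {
    this.create = create;
  }

  resolveSessionId(threadId: string): Promise<string> {
    const existing = this.sessions.get(threadId);
    if (existing) return existing;
    const pending = this.create();
    // Registered synchronously, before any await, so a second caller sees it.
    this.sessions.set(threadId, pending);
    // Evict on failure, but only if the map still holds this same promise.
    pending.catch(() => {
      if (this.sessions.get(threadId) === pending) this.sessions.delete(threadId);
    });
    return pending;
  }
}

// Rejects an explicit 0 (and negatives, NaN, Infinity) instead of silently ignoring it.
function timeoutSignal(timeoutMs: number | undefined): AbortSignal | undefined {
  if (timeoutMs === undefined) return undefined;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`timeout must be a positive, finite number of ms (got ${timeoutMs})`);
  }
  return AbortSignal.timeout(timeoutMs);
}
```

Evicting on failure matters because the map caches promises: without it, one failed create would be returned to every later call on that thread.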
Multi-reviewer pass on #701 (principles/hygiene/test + Uncle-Bob/Metz-Beck/Fowler personas; design-soundness PASSed: it correctly uses opencode's stateful session primitive, the opposite of the #687 replay miss):

- `partsToText`: narrow on the `Part` discriminant (`Extract<Part,{type:"text"}>`) instead of a `Record<string,unknown>` cast, restoring the compile-time safety the injection seam exists for.
- `renderContent`: collapse to a text-only flattener; delete `renderContentBlock` (its tool-call/tool-result/reasoning branches were dead, since `extractNewUserText` only renders user-role messages, and duplicated `utils.summarizeToolMessage`).
- `safeStringify`: fix a real bug. The WeakSet was global-seen (added, never removed), mislabeling sibling refs as "[Circular]". Use the repo's try/JSON.stringify/String pattern (true circular → String fallback).
- Rename public `Logger` → `OpenCodeLogger` (collided with utils `Logger` on the package barrel); drop the unused `warn`.
- Extract `interpretPromptResult` + a single identity-guarded `evictSession`; compute `directoryQuery()` once per RPC; harden `close()` against a failed spawn; align domain imports to the no-extension house style.
- Tests: deterministic concurrent-dedup (gated deferred), `/prompt failed/` transport assertion, + a session.create-failure/eviction test.

27 unit tests + AC-4 live e2e green; build/lint:all/typecheck:all clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
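The `partsToText` and `safeStringify` changes can be sketched as below. The `Part` union is a hypothetical, trimmed stand-in for the SDK's real type, and joining text parts with `"\n"` is an assumption. The `safeStringify` body follows the try/JSON.stringify/String pattern the commit names: with no global "seen" set, a shared sibling reference serializes normally, and only a true cycle falls back to `String()`.

```typescript
// Hypothetical, trimmed Part union; the SDK's real Part type has more members and fields.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool"; tool: string }
  | { type: "step-start" }
  | { type: "step-finish" };

// Narrow on the discriminant instead of casting to Record<string, unknown>.
type TextPart = Extract<Part, { type: "text" }>;

function isTextPart(part: Part): part is TextPart {
  return part.type === "text";
}

// Only text parts contribute; anything else (including unknown types at runtime) is skipped.
function partsToText(parts: readonly Part[]): string {
  return parts.filter(isTextPart).map((p) => p.text).join("\n");
}

// A true cycle makes JSON.stringify throw, so it falls back to String().
// JSON.stringify returns undefined for e.g. undefined or functions; String() covers that too.
function safeStringify(value: unknown): string {
  try {
    const json = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch {
    return String(value);
  }
}
```

Note that `reasoning` also carries a `text` field; narrowing on `type === "text"` (rather than "has a `text` property") is what keeps reasoning out of the reply.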
Automated low-risk assessment
This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk. This PR requires a manual review before merging.
Review verdict: READY
Multi-agent review of PR #701 (…)
Headline
Findings addressed (Fix)
Non-blocking (Decide / follow-up) — not gating
Summary
Adds an `OpenCodeAgentAdapter` (TypeScript) so `scenario` can simulate and evaluate the OpenCode coding agent (sst/opencode) via the official `@opencode-ai/sdk`. This brings scenario's evaluation loop to the coding-agent use case for the first time.

Modeled on the in-house Claude Code adapter (a `class extends AgentAdapter` + a lowercase `openCodeAgent(config)` factory) and the realtime adapter's dependency-injection idiom. The genuinely new bit is stateful session-per-thread: OpenCode keeps the transcript server-side, so the adapter creates one session per `threadId` and sends only the new user turn each call, with no full-history replay.

What changed
| File | Change |
|---|---|
| `javascript/src/agents/opencode/opencode-agent.adapter.ts` | `OpenCodeAgentAdapter` class + helpers (`extractNewUserText`, `partsToText`, `renderContent`, …) |
| `javascript/src/agents/opencode/index.ts` | `openCodeAgent(config)` factory |
| `javascript/src/agents/index.ts` | `export * from "./opencode"` (re-exported through `@langwatch/scenario`) |
| `javascript/src/agents/__tests__/opencode-adapter.test.ts` | |
| `javascript/package.json` | `@opencode-ai/sdk@1.17.9` |
| `docs/docs/pages/agent-integration/opencode.mdx` + `docs/vocs.config.tsx` | |
| `docs/adr/005-opencode-agent-adapter.md` | |

Acceptance criteria: all met
- `AgentAdapter` interface: `instanceof AgentAdapter`, `role === AGENT`; `typecheck:all` green; consumer `import { openCodeAgent } from "@langwatch/scenario"` typechecks
- … `call()` per `threadId`; reuse after: … `session.create` per thread across N calls; same `path.id` reused
- … `createOpencode()` + provider key env vars
- … `agent-integration/opencode.mdx` (docs build green)
- `Part[]` → message handles text + skips unknown parts gracefully: `[text, tool, step-start, reasoning, unknown, text]` → only text, no throw

Design (ADR-005): went through `/decide` + devils-advocate

Key calls (full rationale in `docs/adr/005-opencode-agent-adapter.md`):

- … `OpencodeClient` (not `vi.mock` of the SDK): the realtime-adapter idiom; tests run against the real `OpencodeClient` interface, so an SDK envelope change fails the fake's compile.
- … `result.error` (transport) and `result.data.info.error` (an HTTP-200 reply can carry `ProviderAuthError | MessageOutputLengthError | …` with empty text). Continuation failures evict the stale session id.
- … `""` (AC-4 forbids empty/truncated).
- The `session.prompt` envelope (`result.data.parts` for text, `result.data.info` for metadata) was verified against the installed `@opencode-ai/sdk@1.17.9`, then re-confirmed live (a real reply's parts are `["step-start","text","step-finish"]`).
- `model` required: a product choice for reproducible evals (honestly, the SDK's `model` is optional with a server default).

Human verification
To run the live adapter yourself:
The unit tests need no creds: `npx vitest run src/agents/__tests__/opencode-adapter.test.ts`

How I can prove I was successful
AC-4: live multi-turn scenario (the real proof), captured run. A real `scenario.run(...)` drives `openCodeAgent` (auto-spawned `opencode serve`, `openai/gpt-4o-mini`) through a two-turn coding task with a real user-simulator and a real LLM judge. Captured transcript of one run (`success=true`, `messageCount=4`: two user turns, two coherent opencode assistant turns; judge returned PASS):

Run repeatedly (env-gated `RUN_OPENCODE_E2E=1`, real binary + `OPENAI_API_KEY`), all green, no truncated/empty turns across runs:

SDK chain confirmed live (isolates the binary/key/envelope from the framework):
Gates: CI (green at HEAD) + local:

- CI (`ci-checks (24.x)` → the required `javascript-complete`; plus `docs-complete`): `build:all` · `lint:all` · `typecheck:all` · `test:ci` all pass. The opencode unit file runs in CI as `27 tests | 1 skipped` = 26 creds-free unit tests (AC-1/2/3/6) + the 1 env-gated AC-4 e2e (skips in CI).
- Local: `vitest run` across all workspace projects (CI splits these): 875 passed | 2 skipped | 0 failed. This is a superset run for my own verification, not the CI gating number.

Closes #5001
Surface declaration
backend-only, no UI surface: scenario SDK adapter (library internals, alongside the other scenario adapters: Claude Code, realtime, judge, red-team); user proof = the AC-4 live multi-turn run above, NOT a langwatch-app UI change. (This is one of the rare cases where backend-only is valid: an SDK adapter consumed by developers writing scenario tests, not app UI.)