Skip to content

feat(sessions): add multi-agent support and credential injection for interactive sessions#495

Open
nethi wants to merge 4 commits intojonwiggins:mainfrom
nethi:feat/interactive-session-agent-credentials
Open

feat(sessions): add multi-agent support and credential injection for interactive sessions#495
nethi wants to merge 4 commits intojonwiggins:mainfrom
nethi:feat/interactive-session-agent-credentials

Conversation

@nethi
Copy link
Copy Markdown
Contributor

@nethi nethi commented Apr 24, 2026

Add comprehensive agent credential injection and multi-agent support for interactive sessions (terminal + chat), enabling Gemini, Claude Vertex AI, and all other agent types to work in session environments.

Features

Agent Credential Service (new)

  • Centralized credential retrieval for all agent types
  • Supports Claude (api-key, oauth-token, max-subscription, vertex-ai)
  • Supports Gemini, Codex, Copilot, Groq, OpenClaw, OpenCode
  • Handles service account keys as sensitive files (chmod 600)
  • 13 test cases covering all auth modes

Multi-Agent Session Support

  • Terminal and chat now respect repo's defaultAgentType
  • Dynamic model switching mid-session
  • Agent-specific event parsers (Claude vs Gemini)
  • Integrated with existing header model dropdown

WebSocket Improvements

  • Added keepalive ping (20s) to prevent idle disconnects
  • Filtered harmless warnings (stdin, yolo mode)
  • Agent stderr logging for debugging

Shared Utilities

  • buildSetupFilesScript() utility for file injection
  • Eliminates ~30 lines of duplication

Changes

Backend (apps/api/src):

  • services/agent-credential-service.ts (NEW): centralized credential retrieval
  • services/agent-credential-service.test.ts (NEW): 13 test cases
  • utils/setup-files.ts (NEW): shared file injection utility
  • ws/session-terminal.ts: inject credentials based on repo agent type
  • ws/session-chat.ts: multi-agent command building, Gemini parser, keepalive
  • workers/task-worker.ts: refactored to use agent-credential-service

Frontend (apps/web/src):

  • components/session-chat.tsx: model dropdown integration, agent type tracking
  • app/sessions/[id]/page.tsx: wired header dropdown to chat component

Fixes

  • Terminal was hardcoded to claude-code, now uses repo's defaultAgentType
  • Chat was missing Vertex AI support, now supports all auth modes
  • Gemini errors weren't displaying (wrong parser)
  • 30s WebSocket timeout from lack of keepalive
  • Duplicate setup file injection code across 3 files

Testing

✅ All new tests passing (13/13)
✅ Manual testing completed on GKE cluster
✅ Tested with Gemini (Vertex AI) and Claude (Vertex AI)
✅ Model switching working correctly
✅ WebSocket keepalive prevents disconnects

Pre-existing test failures (2) are unrelated to this PR.

Ramesh Nethi and others added 2 commits April 24, 2026 15:53
- Create /workspace/sessions directory in repo-init.sh for interactive session worktrees
- Pass workspaceId to getGitHubToken in interactive-session-service
- Add warning log when GitHub token unavailable instead of silent catch
…interactive sessions

Add comprehensive agent credential injection and multi-agent support for
interactive sessions (terminal + chat), enabling Gemini, Claude Vertex AI,
and all other agent types to work in session environments.

## Features

**Agent Credential Service** (new)
- Centralized credential retrieval for all agent types
- Supports Claude (api-key, oauth-token, max-subscription, vertex-ai)
- Supports Gemini, Codex, Copilot, Groq, OpenClaw, OpenCode
- Handles service account keys as sensitive files (chmod 600)
- 13 test cases covering all auth modes

**Multi-Agent Session Support**
- Terminal and chat now respect repo's defaultAgentType
- Dynamic model switching mid-session
- Agent-specific event parsers (Claude vs Gemini)
- Integrated with existing header model dropdown

**WebSocket Improvements**
- Added keepalive ping (20s) to prevent idle disconnects
- Filtered harmless warnings (stdin, yolo mode)
- Agent stderr logging for debugging

**Shared Utilities**
- buildSetupFilesScript() utility for file injection

## Changes

Backend (apps/api/src):
- services/agent-credential-service.ts (NEW): centralized credential retrieval
- services/agent-credential-service.test.ts (NEW): 13 test cases
- utils/setup-files.ts (NEW): shared file injection utility
- ws/session-terminal.ts: inject credentials based on repo agent type
- ws/session-chat.ts: multi-agent command building, Gemini parser, keepalive
- workers/task-worker.ts: refactored to use agent-credential-service

Frontend (apps/web/src):
- components/session-chat.tsx: model dropdown integration, agent type tracking
- app/sessions/[id]/page.tsx: wired header dropdown to chat component

## Fixes

- Terminal was hardcoded to claude-code, now uses repo's defaultAgentType
- Chat was missing Vertex AI support, now supports all auth modes
- Gemini errors weren't displaying (wrong parser)
- 30s WebSocket timeout from lack of keepalive
- Duplicate setup file injection code across 3 files

## Testing

All new tests passing (13/13). Pre-existing failures unchanged (2).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@jonwiggins
Copy link
Copy Markdown
Owner

Review

Solid PR overall — real fixes (terminal hardcoded to claude-code, chat missing Vertex, WS idle disconnects), good test coverage on agent-credential-service, and the centralization shape is right. A few things to address before merging.

Blockers

1. session-chat.ts runs claude for every non-Gemini agent.
In the new agent-command branch:

} else {
  // claude-code, codex, copilot, etc.
  agentCommand = `claude -p '${escapedPrompt}' ${modelFlag} --output-format stream-json --verbose --dangerously-skip-permissions < /dev/null || true`;
}

The comment lists agents that would silently invoke claude instead of their own binaries. The PR description says this enables "all other agent types to work in session environments," but only Claude and Gemini actually do today. Either add per-agent branches (codex/copilot/groq/opencode/openclaw) or explicitly reject unsupported agent types in the chat WS handler.

2. Six console.log debug statements left in apps/web/src/components/session-chat.tsx.
Search for [SessionChat] — logging external model change, agentType state changes, model option counts, model-validation resets, and incoming WS model/agentType updates. Strip them or gate behind a debug flag.

3. Shell-expansion risk in apps/api/src/utils/setup-files.ts.

`echo "${message}"`,

message is interpolated unescaped inside a double-quoted bash echo, which expands $var, $(...), and backticks. Today's callers pass constants so there's no exploit, but the API is unsafe by construction. Either escape, switch to a single-quoted echo, or inline the constant.

4. Dangling JSDoc at apps/api/src/ws/session-chat.ts:345/** Build auth environment variables for the claude process in the pod. */ is preserved while the buildAuthEnv function it documents is deleted. Drop the comment.

5. The "centralized service" only covers 3 of ~6 call sites.
The PR description says credential retrieval is "centralized," but the same auth-mode logic still lives in:

  • apps/api/src/workers/workflow-worker.ts:366-419 (Standalone Tasks)
  • apps/api/src/workers/pr-review-worker.ts:212+400 (review subtasks)
  • apps/api/src/ws/optio-chat.ts:343 (Optio chat)

Future drift across these sites is exactly what centralization is meant to prevent. Either fold them into getAgentCredentials here or drop "centralized" from the description and file a follow-up so it isn't forgotten.

Should-fix

6. Lost upstream error context in task-worker strict validation.
Old: Max subscription auth failed: ${authResult.error ?? "Token not available"} — surfaced auth-proxy errors.
New: just "Max subscription auth failed: Token not available" (task-worker.ts:743). The real authResult.error is logged as a warn inside getAgentCredentials and never reaches the task-failure surface. Either propagate it via a thrown error from the credential service, or have the service return { env, errors? } so callers can surface it.

7. claudeVertexServiceAccountKey: undefined warrants verification.
task-worker.ts now hardcodes this to undefined on the assumption the credential service handles the SA key via setupFiles. If the adapter still consumes that field for anything other than the file write (e.g., to derive an env var the new service doesn't set), Vertex AI tasks break silently. Worth a comment in buildContainerConfig or an integration test that exercises the Vertex path through task-worker end-to-end (current tests only cover agent-credential-service in isolation).

8. Pervasive as any casts in the new service and WS handlers defeat the type safety this PR is supposed to add. Auth modes should be a discriminated union/enum, and (repoConfig?.defaultAgentType || "claude-code") as any in the WS handlers should use the new exported AgentType.

9. No tests for buildSetupFilesScript. It's a shell-script-builder doing base64+python3 codegen and has the unsafe interpolation noted above; deserves at least a snapshot test. Also note repo-pool-service.ts:1027 already has a similar Python-based file writer with a behavior the new util omits (/opt/optio//home/agent/optio/ path rewrite). Worth a one-liner explaining why the rewrite isn't needed in WS context.

10. Missing readyState guard in handleModelChange (session-chat.tsx):

wsRef.current?.send(JSON.stringify({ type: "set_model", model: newModel }));

Other handlers in this file check socket.readyState === 1 first. If the user clicks the model dropdown while the WS is connecting or closed, .send() throws.

Nits

  • Vertex secret scoping is mixed: CLAUDE_VERTEX_PROJECT_ID / _REGION are fetched without userId, but _SERVICE_ACCOUNT_KEY is fetched with userId. May be intentional (per-user SA keys, shared project) — worth a comment.
  • Pre-flight token validation in task-worker.ts:632-648 is now logically separate from the credential fetch but still tightly coupled to oauth-token mode. Consider folding it into getAgentCredentials("claude-code", …) so workflow-worker/pr-review-worker benefit once they migrate.
  • agentType === "gemini" branching is hardcoded in two places in session-chat.ts (command building, parser selection). Fine for two agents; a small dispatch table will be cleaner once Codex/Copilot follow.
  • Tests don't cover: oauth-token mode with no token (warns vs throws), Gemini GOOGLE_GENAI_API_KEY legacy fallback, non-"Secret not found" errors from the secret service.

Copy link
Copy Markdown
Owner

@jonwiggins jonwiggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for resolving the conflicts — the interactive-session-service.ts error-handling cleanup and the repo-init.sh sessions dir are clean.

Looking at the commits since my last review on 2026-04-25, three blockers from then are still unaddressed and they're each small fixes:

  1. apps/api/src/ws/session-chat.ts:880-882 still falls through to claude -p for codex/copilot/groq/opencode/openclaw. The PR description claims "all other agent types" are supported but in this code path they silently run the wrong binary. Either branch per agent like the run path does, or reject unsupported types in the WS handler with a clear error.

  2. apps/web/src/components/session-chat.tsx:1304-1307handleModelChange calls wsRef.current?.send() with no readyState === 1 guard, but every other send-site in the file checks. This will throw if the user touches the model dropdown during reconnect.

  3. apps/api/src/workers/task-worker.ts:743 — still throws the stripped error and loses authResult.error context, which makes Vertex AI failures painful to debug.

The centralization point I raised earlier (workflow-worker / pr-review-worker / optio-chat still inline auth-mode logic) and the as any cast proliferation are non-blocking; happy to defer those to a follow-up. But (1)–(3) above are tens of lines of work each and worth getting in before this lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants