feat(agent): runtime operations manager + widget host refresh #7166
agent/api/index.ts
- Re-export `matchPluginRoutePath` and `tryHandleRuntimePluginRoute`
from `./runtime-plugin-routes.js`. Plugin authors and their tests
(apps/app-vincent/src/vincent-plugin-dispatch.test.ts is the most
visible) reach for the matcher via `@elizaos/agent` — without the
re-export the tests fail with `TypeError: matchPluginRoutePath is
not a function`.
steward-sidecar.ts + steward-sidecar/process-management.ts +
steward-sidecar/wallet-setup.ts
- Replace `await import("node:fs|node:path|node:child_process")`
with static namespace imports (`import * as fs from "node:fs"`).
These files only run inside the bun process — they manage the
Steward API child process and were never meant to load in the
renderer. The dynamic loading wasn't preventing browser bundling
(other steward modules already use static node:*) and was the
source of the Vite "dynamically imported but also statically
imported" warnings for `node:fs`, `node:path`, and
`node:child_process` in the renderer build.
Net build: 0 warnings (was 11+ across circular-chunk, empty-chunk,
and dynamic↔static collision categories).
Rollup warning: `widgets/index.ts` re-exports `WidgetHost` from `widgets/WidgetHost.tsx`, but the WidgetHost module also pulls other widgets/* code that depends back through the barrel. When two consumers end up in different chunks (e.g. the lazy AutomationsView chunk vs the main shell), the cycle blocks proper splitting:

  Export "WidgetHost" of module ".../widgets/WidgetHost.tsx" was reexported through module ".../widgets/index.ts" while both modules are dependencies of each other and will end up in different chunks by current Rollup settings. This scenario is not well supported at the moment as it will produce a circular dependency between chunks and will likely lead to broken execution order.

Fix: switch the four non-widget callers (AutomationsView, HeartbeatsView, CharacterHubView, TasksEventsPanel) from barrel imports to a direct sub-path import of `widgets/WidgetHost`. The barrel itself stays for non-cyclic exports (resolveWidgetsForSlot, declarations registry, etc.).

Build: still 0 warnings.
…ycle-and-greenup # Conflicts: # packages/app-core/src/components/pages/AppDetailsView.tsx # packages/app-core/src/components/pages/AppsView.tsx
Snapshot of the in-flight work before pulling origin/develop. Covers:
- electrobun-webview tab kit (cursor overlay, realistic events, wallet shims)
- LAUNCHPAD_LAUNCH action + four.meme/flap.sh profile engine
- Solana tx signing through wallet-browser-compat-routes
- cloud balance string|number coercion across three readers
- dev-ui.mjs API supervisor restart-on-clean-exit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…up' into feat/widget-host-cycle-and-greenup
flap.sh launches tokens via Portal.newTokenV6(NewTokenV6Params) on BNB Chain (mainnet chain 56, testnet chain 97). Per https://docs.flap.sh/flap/developers/token-launcher-developers, the website UI wraps that contract call; the resulting eth_sendTransaction flows through our existing browser-wallet bridge and steward approval path. No Solana plumbing was needed.

Renames flap-sh:devnet -> flap-sh:testnet across the action / tests / profile, and adds Portal contract addresses + docs reference to the profile header so future selector tuning has the on-chain context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… health, hot reload

New `packages/agent/src/runtime/operations/` module:
- `manager.ts`: RuntimeOperationManager, the single-flight gate for provider switches, restarts, and reloads. Replaces the ad-hoc `providerSwitchInProgress` boolean.
- `repository.ts`: operation-state store, surfaces pending/active/done.
- `classifier.ts` + `classifier.test.ts`: decides whether an inbound request is a duplicate of an in-flight op via the idempotency key.
- `health.ts` + `health.test.ts` + `health-checks.ts`: runtime health predicates used by the reload/cold strategies.
- `cold-strategy.ts`, `reload-hot.ts`: strategy implementations for full restart vs hot reload.
- `index.ts` + `types.ts`: module barrel and shared types.

Wires the manager through:
- `api/provider-switch-routes.ts`: reads Idempotency-Key header, routes through the manager rather than the legacy boolean gate.
- `api/server.ts`, `runtime/restart.ts`: refactored against the new manager; the old single-flight scaffolding is gone.
- `app-core/scripts/dev-platform.mjs`, `app-core/src/api/client-base.ts`, `client-types-core.ts`, `cli/run-main.ts`, `runtime/error-handlers.ts`, `shared/scripts/generate-keywords.mjs`: companion changes the manager required (dev-platform integration, error surfacing, client-side typing for the new op-status responses).

WIP: preserving in a commit so it doesn't sit as uncommitted state on the feature branch. Squash / split as needed in the PR.
  await applyOnboardingConnectionConfig(config, connection);
  ctx.saveElizaConfig(config);

- const restartReason = `provider switch to ${normalizedProvider}`;
- const restarted = ctx.restartRuntime
-   ? await ctx.restartRuntime(restartReason)
-   : false;
- if (!restarted) {
-   ctx.scheduleRuntimeRestart(restartReason);
+ const intent: ProviderSwitchIntent = {
+   kind: "provider-switch",
+   provider: normalizedProvider,
+   apiKey: trimmedApiKey,
+   primaryModel:
+     typeof body.primaryModel === "string"
+       ? body.primaryModel.trim()
+       : undefined,
+ };
+ const idempotencyKey = readIdempotencyKey(req.headers);
+
+ const outcome = await ctx.runtimeOperationManager.start({
+   intent,
+   idempotencyKey,
+ });
+
+ if (outcome.kind === "accepted") {
+   logger.info(
+     `[api] Provider switch accepted: provider=${normalizedProvider} op=${outcome.operation.id}`,
+   );
+   json(
+     res,
+     {
+       success: true,
+       provider: normalizedProvider,
+       restarting: true,
+       operationId: outcome.operation.id,
+     },
+     202,
+   );
+   return true;
  }

- ctx.setProviderSwitchInProgress(false);
+ if (outcome.kind === "deduped") {
+   const op = outcome.operation;
+   logger.info(
+     `[api] Provider switch deduped: provider=${normalizedProvider} op=${op.id} status=${op.status}`,
+   );
+   json(res, {
+     success: true,
+     provider: normalizedProvider,
+     restarting: op.status === "running" || op.status === "pending",
+     operationId: op.id,
+     deduped: true,
+   });
+   return true;
+ }

- json(res, {
-   success: true,
-   provider: normalizedProvider,
-   restarting: restarted,
- });
+ // outcome.kind === "rejected-busy"
+ json(
+   res,
+   {
+     error: "Provider switch already in progress",
+     activeOperationId: outcome.activeOperationId,
+   },
+   409,
+ );
  return true;
Config written to disk before rejection is checked
applyOnboardingConnectionConfig and saveElizaConfig run at lines 136–137 before the operation manager's outcome is evaluated. When the manager returns "rejected-busy", the route correctly returns 409, but the config file has already been mutated on disk with the new provider settings. The system is now in a split state: the config reflects provider B while the running runtime still uses provider A (and no restart will happen). On the next cold restart the new provider config will be loaded, potentially breaking the running agent.
The config mutation should only be persisted once the manager has accepted the request, or the hot-strategy's defaultApplyProviderEnv should be the sole writer (it's called by the strategy itself on the async execution path).
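The first option can be sketched as a pure mapping from the manager's outcome to both the HTTP response and a "persist now?" decision. This is only an illustration under assumptions: `StartOutcome`, `SwitchPlan`, and `planSwitchResponse` are hypothetical names with simplified shapes, not the PR's actual types.

```typescript
// Simplified, hypothetical shapes; the PR's real types live in the operations module.
type StartOutcome =
  | { kind: "accepted"; operationId: string }
  | { kind: "deduped"; operationId: string; stillRunning: boolean }
  | { kind: "rejected-busy"; activeOperationId: string };

interface SwitchPlan {
  persistConfig: boolean; // whether the caller should write the config now
  status: number;
  body: Record<string, unknown>;
}

// Decide the response and whether config may be persisted, based only on the outcome.
function planSwitchResponse(outcome: StartOutcome, provider: string): SwitchPlan {
  if (outcome.kind === "accepted") {
    return {
      persistConfig: true, // only an accepted operation may touch the config file
      status: 202,
      body: { success: true, provider, restarting: true, operationId: outcome.operationId },
    };
  }
  if (outcome.kind === "deduped") {
    return {
      persistConfig: false, // the original request already carried this change
      status: 200,
      body: {
        success: true,
        provider,
        restarting: outcome.stillRunning,
        operationId: outcome.operationId,
        deduped: true,
      },
    };
  }
  // outcome.kind === "rejected-busy": leave the on-disk config untouched
  return {
    persistConfig: false,
    status: 409,
    body: {
      error: "Provider switch already in progress",
      activeOperationId: outcome.activeOperationId,
    },
  };
}
```

The route would call `saveElizaConfig` only when `persistConfig` is true. If the strategy reads the config from disk as soon as the operation starts, writing after acceptance could race with it, in which case making the strategy the sole writer (the second option) is likely the safer choice.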
  return cachedRuntimeOperationManager;
}

// PluginConfigMutationRejection, resolvePluginConfigMutationRejections,
// WalletExportRejection, resolveWalletExportRejection
warm tier has no registered strategy — same-family provider switches always fail
strategies: { cold: coldStrategy, hot: hotStrategy } omits the warm strategy. The classifier in classifier.ts returns "warm" for same-family provider switches (e.g. openai ↔ openai-subscription). In manager.ts lines 182–188, a missing strategy calls failOperation with "no-strategy-for-tier". Any user switching between providers in the same family will receive a silent internal failure with no clear error surfaced.
Either register a warm strategy here (falling back to the hot strategy is a safe interim choice) or change the classifier to collapse warm to cold until the warm strategy exists.
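The second option (degrading `warm` to `cold` when no warm strategy is registered) could also live in the lookup itself rather than in the classifier. The sketch below assumes a hypothetical `resolveStrategy` helper and a fallback order in which a heavier tier is always an acceptable substitute for a lighter one; neither is taken from the PR.

```typescript
type Tier = "hot" | "warm" | "cold";

// Placeholder for the PR's strategy interface; only identity matters here.
interface Strategy {
  name: string;
}

// Assumed fallback order: try the requested tier, then progressively heavier ones.
const FALLBACK_ORDER: Record<Tier, Tier[]> = {
  hot: ["hot", "warm", "cold"],
  warm: ["warm", "cold"],
  cold: ["cold"],
};

// Return the first registered strategy along the fallback order, or undefined.
function resolveStrategy(
  strategies: Partial<Record<Tier, Strategy>>,
  tier: Tier,
): Strategy | undefined {
  for (const candidate of FALLBACK_ORDER[tier]) {
    const strategy = strategies[candidate];
    if (strategy) return strategy;
  }
  return undefined; // caller can still fail with "no-strategy-for-tier"
}
```

With only `cold` and `hot` registered, a `warm` request would resolve to the cold strategy instead of failing, at the cost of a full restart.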
strategies: { cold: coldStrategy, hot: hotStrategy, warm: hotStrategy },

const intent: ProviderSwitchIntent = {
  kind: "provider-switch",
  provider: normalizedProvider,
  apiKey: trimmedApiKey,
  primaryModel:
    typeof body.primaryModel === "string"
      ? body.primaryModel.trim()
      : undefined,
};
API key written to disk in plaintext via intent serialization
The ProviderSwitchIntent built here includes the raw apiKey value, and the FilesystemRuntimeOperationRepository serializes the full RuntimeOperation (including intent) to <stateDir>/runtime-operations/<id>.json (mode 0600, but still a plain JSON file). Every provider switch stores the user's API key on the filesystem in cleartext for up to the 24-hour retention window.
The API key should be redacted from the persisted intent. One approach is to store only a boolean flag apiKeyProvided: true in the persisted intent while keeping the real key in-memory only for the duration of the operation.
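A minimal sketch of that redaction, assuming a simplified `ProviderSwitchIntent` shape inferred from the diff above. `PersistedIntent` and `redactIntent` are hypothetical names introduced for illustration.

```typescript
// Simplified intent shape inferred from the route code; the real type may differ.
interface ProviderSwitchIntent {
  kind: "provider-switch";
  provider: string;
  apiKey?: string;
  primaryModel?: string;
}

// What would be written to disk: everything except the secret itself.
type PersistedIntent = Omit<ProviderSwitchIntent, "apiKey"> & {
  apiKeyProvided: boolean;
};

// Strip the raw key and record only whether one was supplied.
function redactIntent(intent: ProviderSwitchIntent): PersistedIntent {
  const { apiKey, ...rest } = intent;
  return {
    ...rest,
    apiKeyProvided: typeof apiKey === "string" && apiKey.length > 0,
  };
}
```

The repository would serialize `redactIntent(op.intent)` while the manager keeps the original intent in memory until the operation finishes. A process restart mid-operation would then lose the key, so recovery would need to treat such operations as failed rather than resumable.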
const shutdownStart = Date.now();
await ctx.reportPhase({
  name: "shutdown-old",
  status: "running",
  startedAt: shutdownStart,
});

const startNewStart = Date.now();
await ctx.reportPhase({
  name: "shutdown-old",
  status: "succeeded",
  startedAt: shutdownStart,
  finishedAt: startNewStart,
});
"shutdown-old" phase is appended twice instead of updated once
ctx.reportPhase maps to repository.appendPhase, which always adds a new entry to the phases array. Calling it first with status: "running" and then immediately with status: "succeeded" produces two separate "shutdown-old" entries in the log rather than one entry whose status transitions. No actual shutdown work occurs between the two calls — restartRuntime is invoked in the "start-new" phase below.
The manager's health-check code uses appendPhase (for "running") + updateLastPhase (for the terminal state). The cold strategy should follow the same pattern, or both phases should be a single append recording the final status since shutdown is effectively instantaneous here.
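The append-then-update pattern can be illustrated with an in-memory stand-in for the repository. The method names `appendPhase` and `updateLastPhase` come from the comment above, but their signatures here, along with `PhaseLog` and `recordShutdown`, are assumptions made for the sketch.

```typescript
type PhaseStatus = "running" | "succeeded" | "failed";

interface Phase {
  name: string;
  status: PhaseStatus;
  startedAt: number;
  finishedAt?: number;
}

// In-memory stand-in for the repository's phase list (hypothetical).
class PhaseLog {
  private phases: Phase[] = [];

  // Add a new entry; used once when a phase begins.
  appendPhase(phase: Phase): void {
    this.phases.push({ ...phase });
  }

  // Patch the most recent entry in place; used for the terminal transition.
  updateLastPhase(patch: Partial<Phase>): void {
    const last = this.phases[this.phases.length - 1];
    if (!last) throw new Error("no phase to update");
    Object.assign(last, patch);
  }

  list(): readonly Phase[] {
    return this.phases;
  }
}

// One "shutdown-old" entry whose status transitions, instead of two entries.
function recordShutdown(log: PhaseLog, startedAt: number, finishedAt: number): void {
  log.appendPhase({ name: "shutdown-old", status: "running", startedAt });
  // ...any real shutdown work would run here...
  log.updateLastPhase({ status: "succeeded", finishedAt });
}
```

After `recordShutdown` the log holds a single `shutdown-old` entry with status `succeeded`, which matches what a consumer reading the phases array would likely expect.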
Summary
Bundles in-flight feature work on `feat/widget-host-cycle-and-greenup` so it isn't sitting as uncommitted state. Includes:
- `packages/agent/src/runtime/operations/` module: `RuntimeOperationManager` as the single-flight gate for provider switches / restarts / reloads, plus classifier (idempotency-key dedupe), health predicates, hot-reload + cold-restart strategies. Tests included.
- Provider-switch route reads `Idempotency-Key`, routes through the manager rather than the legacy `providerSwitchInProgress` boolean.

WIP: squash / split as makes sense for review. The runtime-operations module is the largest new surface and the most reviewable as a single unit.
Test plan
- `bun run test`: exercises classifier + health unit tests

🤖 Generated with Claude Code
Greptile Summary
This PR introduces a `RuntimeOperationManager` module as the single-flight gate for provider switches and runtime restarts, replacing the previous `providerSwitchInProgress` boolean with a filesystem-backed operation repository, idempotency-key dedup, tier-classified reload strategies (hot/warm/cold), and health-gated promotion. Three issues need attention before merging:
- A `409 rejected-busy` response leaves the on-disk config mutated with the new provider while the running runtime is unchanged.
- The `warm` tier strategy is never registered in `server.ts`, so same-family provider switches (classified as `warm` by the classifier) always fail internally with "no-strategy-for-tier".
- The raw API key is persisted in plaintext in the `ProviderSwitchIntent` JSON on disk.

Confidence Score: 3/5
Not safe to merge — config-mutation-before-rejection and the missing warm strategy are current defects on the provider switch hot path.
Two P1 behavioral bugs (config written before rejection check, warm strategy unregistered causing silent failures) plus one P1 security issue (API key on disk) pull the score below the P1 ceiling of 4. Multiple P1s in core paths warrant a 3.
packages/agent/src/api/provider-switch-routes.ts and packages/agent/src/api/server.ts
Security Review
`packages/agent/src/api/provider-switch-routes.ts`, `packages/agent/src/runtime/operations/repository.ts`: The `ProviderSwitchIntent` includes the raw `apiKey` and is serialized in full to `<stateDir>/runtime-operations/<id>.json`. Although the file is created with mode `0600`, the API key is readable by any process running as the same user and persists for up to 24 hours. API keys should be stripped from the persisted record.

Important Files Changed
Sequence Diagram
sequenceDiagram
    participant C as Client
    participant R as ProviderSwitchRoute
    participant M as RuntimeOperationManager
    participant Repo as FilesystemRepository
    participant S as ReloadStrategy (hot/cold)
    participant H as HealthChecker
    C->>R: POST /api/provider/switch
    R->>R: saveElizaConfig ⚠️ written before rejection check
    R->>M: start({intent, idempotencyKey})
    M->>Repo: findByIdempotencyKey(key)
    alt key exists
        Repo-->>M: existing op
        M-->>R: deduped
        R-->>C: 200
    end
    M->>Repo: findActive()
    alt op in flight
        Repo-->>M: active op
        M-->>R: rejected-busy ⚠️ config already written
        R-->>C: 409
    end
    M->>Repo: create(op)
    M-->>R: accepted
    R-->>C: 202 + operationId
    Note over M: async execution chain
    M->>S: apply(ctx)
    S-->>M: newRuntime
    M->>H: runForRuntime(newRuntime)
    H-->>M: HealthCheckReport
    alt ok
        M->>Repo: update succeeded
    else failed
        M->>Repo: update failed
    end

Reviews (1): Last reviewed commit: "feat(agent): runtime operations manager ..."