
feat(agent): runtime operations manager + widget host refresh #7166

Merged
lalalune merged 11 commits into elizaOS:develop from Dexploarer:feat/widget-host-cycle-and-greenup on Apr 29, 2026

Dexploarer (Collaborator) commented Apr 28, 2026

Summary

Bundles in-flight feature work on feat/widget-host-cycle-and-greenup so it isn't sitting as uncommitted state. Includes:

  • New packages/agent/src/runtime/operations/ module — RuntimeOperationManager as the single-flight gate for provider switches / restarts / reloads, plus classifier (idempotency-key dedupe), health predicates, hot-reload + cold-restart strategies. Tests included.
  • Provider switch route now reads Idempotency-Key, routes through the manager rather than the legacy providerSwitchInProgress boolean.
  • Restart, server, and dev-platform paths refactored against the new manager.
  • Earlier widget-host cycle/chunking + steward static-import fixes (already on the branch).
  • Splash asset + launchpad fixes (already on the branch).

WIP — squash / split as makes sense for review. The runtime-operations module is the largest new surface and the most reviewable as a single unit.
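The single-flight gate described above can be sketched as a small in-memory class. This is a hypothetical illustration, not the PR's `DefaultRuntimeOperationManager`: the class name `InMemoryOperationGate`, the `Operation` shape, and the status values are assumptions. `start()` is also synchronous here for brevity, whereas the route in this PR awaits it. The three outcome kinds (`accepted`, `deduped`, `rejected-busy`) are the ones the route code handles.

```typescript
// Hypothetical single-flight gate with idempotency-key dedupe.
// All names and shapes here are illustrative assumptions.

type OperationStatus = "pending" | "running" | "succeeded" | "failed";

interface Operation {
  id: string;
  idempotencyKey?: string;
  status: OperationStatus;
}

type StartOutcome =
  | { kind: "accepted"; operation: Operation }
  | { kind: "deduped"; operation: Operation }
  | { kind: "rejected-busy"; activeOperationId: string };

class InMemoryOperationGate {
  private readonly ops = new Map<string, Operation>();
  private nextId = 1;

  start(idempotencyKey?: string): StartOutcome {
    // 1. A retry carrying a known key gets the original operation back.
    //    (Linear scan for brevity; a real store would index by key.)
    if (idempotencyKey !== undefined) {
      for (const op of this.ops.values()) {
        if (op.idempotencyKey === idempotencyKey) {
          return { kind: "deduped", operation: op };
        }
      }
    }
    // 2. Single flight: refuse while another operation is pending or running.
    for (const op of this.ops.values()) {
      if (op.status === "pending" || op.status === "running") {
        return { kind: "rejected-busy", activeOperationId: op.id };
      }
    }
    // 3. Otherwise record a new pending operation and accept it.
    const operation: Operation = {
      id: `op-${this.nextId++}`,
      idempotencyKey,
      status: "pending",
    };
    this.ops.set(operation.id, operation);
    return { kind: "accepted", operation };
  }

  // Called when the async execution chain reaches a terminal state.
  finish(id: string, status: "succeeded" | "failed"): void {
    const op = this.ops.get(id);
    if (op) op.status = status;
  }
}
```

Because the key lookup runs before the busy check (the same order as `findByIdempotencyKey` then `findActive` in the sequence diagram below), a client that retries with the same Idempotency-Key while its own switch is still running gets its original operation back instead of a 409.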

Test plan

  • bun run test — exercises classifier + health unit tests
  • Manual provider switch under load with idempotency key — verify dedupe
  • Hot reload path — verify health gate prevents premature traffic resume

🤖 Generated with Claude Code

Greptile Summary

This PR introduces a RuntimeOperationManager module as the single-flight gate for provider switches and runtime restarts, replacing the previous providerSwitchInProgress boolean with a filesystem-backed operation repository, idempotency-key dedup, tier-classified reload strategies (hot/warm/cold), and health-gated promotion. Three issues need attention before merging:

  • Config is persisted to disk before the manager's rejection check — a 409 rejected-busy response leaves the on-disk config mutated with the new provider while the running runtime is unchanged.
  • The warm tier strategy is never registered in server.ts, so same-family provider switches (classified as warm by the classifier) always fail internally with "no-strategy-for-tier".
  • The raw API key is serialized into the persisted ProviderSwitchIntent JSON on disk.

Confidence Score: 3/5

Not safe to merge — config-mutation-before-rejection and the missing warm strategy are current defects on the provider switch hot path.

Two P1 behavioral bugs (config written before rejection check, warm strategy unregistered causing silent failures) plus one P1 security issue (API key on disk) pull the score below the P1 ceiling of 4. Multiple P1s in core paths warrant a 3.

Files needing attention: packages/agent/src/api/provider-switch-routes.ts and packages/agent/src/api/server.ts

Security Review

  • Plaintext credential storage (packages/agent/src/api/provider-switch-routes.ts, packages/agent/src/runtime/operations/repository.ts): The ProviderSwitchIntent includes the raw apiKey and is serialized in full to <stateDir>/runtime-operations/<id>.json. Although the file is created with mode 0600, the API key is readable by any process running as the same user and persists for up to 24 hours. API keys should be stripped from the persisted record.

Important Files Changed

Filename | Overview
--- | ---
packages/agent/src/runtime/operations/manager.ts | New DefaultRuntimeOperationManager: single-flight gate with idempotency-key dedup, async execution chain, and health-gated promotion. The logic is sound, but the warm-tier strategy gap causes silent failures for same-family switches.
packages/agent/src/runtime/operations/types.ts | Well-typed contracts for operations, phases, repository, health checks, and strategies; clean discriminated-union intent model.
packages/agent/src/runtime/operations/classifier.ts | Pure tier classifier: returns "warm" for same-family switches, but no warm strategy is wired in server.ts, so those operations always fail.
packages/agent/src/runtime/operations/repository.ts | Filesystem-backed repo with atomic writes and an in-memory O(1) cache; abandoned-op reaping on hydrate is solid. File mode 0600 is appropriate, but the intent JSON (including API keys) still lands on disk in cleartext.
packages/agent/src/api/provider-switch-routes.ts | Correctly routes through the new manager, but saves config to disk before checking the manager outcome (so config is mutated even on a 409) and embeds the raw API key in the persisted intent.
packages/agent/src/api/server.ts | Manager wiring looks correct, but the warm strategy is missing from the strategies map, so all same-family provider switches fail with "no-strategy-for-tier".
packages/agent/src/runtime/operations/cold-strategy.ts | Cold restart delegates correctly to the injected restartRuntime closure, but appends the "shutdown-old" phase twice, producing a duplicate entry in the log.
packages/agent/src/runtime/operations/reload-hot.ts | Hot strategy correctly applies env vars and notifies plugins on a best-effort basis; defaultApplyProviderEnv double-writes config because the route also writes it before submitting the operation.
packages/agent/src/runtime/operations/health.ts | HealthChecker with parallel execution, per-check timeouts via Promise.race, and clean required/optional semantics; well implemented.
packages/agent/src/runtime/operations/index.ts | Clean barrel export for the operations module.
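The table credits repository.ts with atomic writes and file mode 0600. A common way to get both in Node is to write a temporary file in the target directory and then `rename` it over the destination. The sketch below assumes that approach; `writeJsonAtomic` and the temp-file naming are hypothetical, not the repository's actual code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: persist a JSON record atomically, readable only by the owner.
function writeJsonAtomic(filePath: string, value: unknown): void {
  const dir = path.dirname(filePath);
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  // The temp file lives in the same directory so the rename below stays on one
  // filesystem (renaming across filesystems fails with EXDEV).
  const tmpPath = path.join(
    dir,
    `.${path.basename(filePath)}.${process.pid}.${Date.now()}.tmp`,
  );
  try {
    // `mode` applies only when the file is created (and is masked by the umask).
    fs.writeFileSync(tmpPath, JSON.stringify(value, null, 2), { mode: 0o600 });
    // rename() replaces the target in one step on POSIX: readers see either the
    // old file or the complete new one, never a partially written file.
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}
```

If crash durability matters, a fuller version would also fsync the temp file before the rename and the directory after it; this sketch only guarantees that readers never observe a half-written record.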

Sequence Diagram

sequenceDiagram
    participant C as Client
    participant R as ProviderSwitchRoute
    participant M as RuntimeOperationManager
    participant Repo as FilesystemRepository
    participant S as ReloadStrategy (hot/cold)
    participant H as HealthChecker

    C->>R: POST /api/provider/switch
    R->>R: saveElizaConfig ⚠️ written before rejection check
    R->>M: start({intent, idempotencyKey})
    M->>Repo: findByIdempotencyKey(key)
    alt key exists
        Repo-->>M: existing op
        M-->>R: deduped
        R-->>C: 200
    end
    M->>Repo: findActive()
    alt op in flight
        Repo-->>M: active op
        M-->>R: rejected-busy ⚠️ config already written
        R-->>C: 409
    end
    M->>Repo: create(op)
    M-->>R: accepted
    R-->>C: 202 + operationId
    Note over M: async execution chain
    M->>S: apply(ctx)
    S-->>M: newRuntime
    M->>H: runForRuntime(newRuntime)
    H-->>M: HealthCheckReport
    alt ok
        M->>Repo: update succeeded
    else failed
        M->>Repo: update failed
    end
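The health-gated promotion at the end of the diagram depends on the HealthChecker the table describes: checks run in parallel, each is bounded by a `Promise.race` timeout, and only required checks can fail the report. A minimal sketch of that idea follows; `HealthCheck`, `HealthReport`, `withTimeout`, and `runHealthChecks` are hypothetical names, not the types exported by health.ts.

```typescript
// Hypothetical shapes; health.ts in the PR defines its own types.
interface HealthCheck {
  name: string;
  required: boolean; // a failed required check blocks promotion
  timeoutMs: number;
  run: () => Promise<void>; // resolves when healthy, rejects otherwise
}

interface HealthResult {
  name: string;
  required: boolean;
  ok: boolean;
  error?: string;
}

interface HealthReport {
  ok: boolean; // true when every required check passed
  results: HealthResult[];
}

// Race a check against a timer; clear the timer whichever side wins.
async function withTimeout<T>(work: Promise<T>, ms: number, name: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, expired]);
  } finally {
    clearTimeout(timer);
  }
}

async function runHealthChecks(checks: HealthCheck[]): Promise<HealthReport> {
  // Start every check at once; each one settles into a result, never a throw.
  const results = await Promise.all(
    checks.map(async (check): Promise<HealthResult> => {
      try {
        await withTimeout(check.run(), check.timeoutMs, check.name);
        return { name: check.name, required: check.required, ok: true };
      } catch (err) {
        return {
          name: check.name,
          required: check.required,
          ok: false,
          error: err instanceof Error ? err.message : String(err),
        };
      }
    }),
  );
  // Optional failures are reported but do not fail the overall report.
  return { ok: results.every((r) => r.ok || !r.required), results };
}
```

Note that `withTimeout` stops waiting for a hung check but does not cancel it; a check that holds resources would need its own cancellation, for example via an `AbortSignal`.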

Reviews (1): Last reviewed commit: "feat(agent): runtime operations manager ..."

Greptile also left 4 inline comments on this PR.

Dexploarer and others added 11 commits April 26, 2026 18:14
agent/api/index.ts
- Re-export `matchPluginRoutePath` and `tryHandleRuntimePluginRoute`
  from `./runtime-plugin-routes.js`. Plugin authors and their tests
  (apps/app-vincent/src/vincent-plugin-dispatch.test.ts is the most
  visible) reach for the matcher via `@elizaos/agent` — without the
  re-export the tests fail with `TypeError: matchPluginRoutePath is
  not a function`.

steward-sidecar.ts + steward-sidecar/process-management.ts +
steward-sidecar/wallet-setup.ts
- Replace `await import("node:fs|node:path|node:child_process")`
  with static namespace imports (`import * as fs from "node:fs"`).
  These files only run inside the bun process — they manage the
  Steward API child process and were never meant to load in the
  renderer. The dynamic loading wasn't preventing browser bundling
  (other steward modules already use static node:*) and was the
  source of the Vite "dynamically imported but also statically
  imported" warnings for `node:fs`, `node:path`, and
  `node:child_process` in the renderer build.

Net build: 0 warnings (was 11+ across circular-chunk, empty-chunk,
and dynamic↔static collision categories).

Rollup warning: `widgets/index.ts` re-exports `WidgetHost` from
`widgets/WidgetHost.tsx`, but the WidgetHost module also pulls other
widgets/* code that depends back through the barrel. When two consumers
end up in different chunks (e.g. the lazy AutomationsView chunk vs the
main shell), the cycle blocks proper splitting:

  Export "WidgetHost" of module ".../widgets/WidgetHost.tsx" was
  reexported through module ".../widgets/index.ts" while both modules
  are dependencies of each other and will end up in different chunks
  by current Rollup settings. This scenario is not well supported at
  the moment as it will produce a circular dependency between chunks
  and will likely lead to broken execution order.

Fix: switch the four non-widget callers (AutomationsView, HeartbeatsView,
CharacterHubView, TasksEventsPanel) from barrel imports to a direct
sub-path import of `widgets/WidgetHost`. The barrel itself stays for
non-cyclic exports (resolveWidgetsForSlot, declarations registry, etc.).

Build: still 0 warnings.

…ycle-and-greenup

# Conflicts:
#	packages/app-core/src/components/pages/AppDetailsView.tsx
#	packages/app-core/src/components/pages/AppsView.tsx

Snapshot of the in-flight work before pulling origin/develop. Covers:
- electrobun-webview tab kit (cursor overlay, realistic events, wallet shims)
- LAUNCHPAD_LAUNCH action + four.meme/flap.sh profile engine
- Solana tx signing through wallet-browser-compat-routes
- cloud balance string|number coercion across three readers
- dev-ui.mjs API supervisor restart-on-clean-exit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

flap.sh launches tokens via Portal.newTokenV6(NewTokenV6Params) on BNB
Chain (mainnet chain 56, testnet chain 97). Per
https://docs.flap.sh/flap/developers/token-launcher-developers — the
website UI wraps that contract call; the resulting eth_sendTransaction
flows through our existing browser-wallet bridge and steward approval
path. No Solana plumbing was needed.

Renames flap-sh:devnet -> flap-sh:testnet across the action / tests /
profile, and adds Portal contract addresses + docs reference to the
profile header so future selector tuning has the on-chain context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… health, hot reload

New `packages/agent/src/runtime/operations/` module:
- `manager.ts` — RuntimeOperationManager, the single-flight gate for
  provider switches, restarts, and reloads. Replaces the ad-hoc
  `providerSwitchInProgress` boolean.
- `repository.ts` — operation-state store, surfaces pending/active/done.
- `classifier.ts` + `classifier.test.ts` — decides whether an inbound
  request is a duplicate of an in-flight op via the idempotency key.
- `health.ts` + `health.test.ts` + `health-checks.ts` — runtime health
  predicates used by the reload/cold strategies.
- `cold-strategy.ts`, `reload-hot.ts` — strategy implementations for
  full restart vs hot reload.
- `index.ts` + `types.ts` — module barrel and shared types.

Wires the manager through:
- `api/provider-switch-routes.ts` — reads Idempotency-Key header,
  routes through the manager rather than the legacy boolean gate.
- `api/server.ts`, `runtime/restart.ts` — refactored against the new
  manager; the old single-flight scaffolding is gone.
- `app-core/scripts/dev-platform.mjs`, `app-core/src/api/client-base.ts`,
  `client-types-core.ts`, `cli/run-main.ts`, `runtime/error-handlers.ts`,
  `shared/scripts/generate-keywords.mjs` — companion changes the
  manager required (dev-platform integration, error surfacing,
  client-side typing for the new op-status responses).

WIP — preserving in a commit so it doesn't sit as uncommitted state on
the feature branch. Squash / split as needed in the PR.

coderabbitai Bot (Contributor) commented Apr 28, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e9179051-18f8-46fe-8f26-464c00b1f2c6


Comment on lines 136 to +196
 await applyOnboardingConnectionConfig(config, connection);
 ctx.saveElizaConfig(config);
 
-const restartReason = `provider switch to ${normalizedProvider}`;
-const restarted = ctx.restartRuntime
-  ? await ctx.restartRuntime(restartReason)
-  : false;
-if (!restarted) {
-  ctx.scheduleRuntimeRestart(restartReason);
+const intent: ProviderSwitchIntent = {
+  kind: "provider-switch",
+  provider: normalizedProvider,
+  apiKey: trimmedApiKey,
+  primaryModel:
+    typeof body.primaryModel === "string"
+      ? body.primaryModel.trim()
+      : undefined,
+};
+const idempotencyKey = readIdempotencyKey(req.headers);
+
+const outcome = await ctx.runtimeOperationManager.start({
+  intent,
+  idempotencyKey,
+});
+
+if (outcome.kind === "accepted") {
+  logger.info(
+    `[api] Provider switch accepted: provider=${normalizedProvider} op=${outcome.operation.id}`,
+  );
+  json(
+    res,
+    {
+      success: true,
+      provider: normalizedProvider,
+      restarting: true,
+      operationId: outcome.operation.id,
+    },
+    202,
+  );
+  return true;
 }
 
-ctx.setProviderSwitchInProgress(false);
+if (outcome.kind === "deduped") {
+  const op = outcome.operation;
+  logger.info(
+    `[api] Provider switch deduped: provider=${normalizedProvider} op=${op.id} status=${op.status}`,
+  );
+  json(res, {
+    success: true,
+    provider: normalizedProvider,
+    restarting: op.status === "running" || op.status === "pending",
+    operationId: op.id,
+    deduped: true,
+  });
+  return true;
+}
 
-json(res, {
-  success: true,
-  provider: normalizedProvider,
-  restarting: restarted,
-});
+// outcome.kind === "rejected-busy"
+json(
+  res,
+  {
+    error: "Provider switch already in progress",
+    activeOperationId: outcome.activeOperationId,
+  },
+  409,
+);
 return true;

P1 Config written to disk before rejection is checked

applyOnboardingConnectionConfig and saveElizaConfig run at lines 136–137 before the operation manager's outcome is evaluated. When the manager returns "rejected-busy", the route correctly returns 409, but the config file has already been mutated on disk with the new provider settings. The system is now in a split state: the config reflects provider B while the running runtime still uses provider A (and no restart will happen). On the next cold restart the new provider config will be loaded, potentially breaking the running agent.

The config mutation should only be persisted once the manager has accepted the request, or the hot-strategy's defaultApplyProviderEnv should be the sole writer (it's called by the strategy itself on the async execution path).
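One way to act on this suggestion is to keep the updated config in memory and persist it only when the manager accepts the switch. The sketch below is hypothetical: `handleProviderSwitch`, its callbacks, and the simplified outcome shapes stand in for the route's real context (`ctx.runtimeOperationManager.start`, `ctx.saveElizaConfig`); the status codes and response fields mirror the diff above.

```typescript
// Hypothetical shapes mirroring the outcome kinds handled in the route.
type StartOutcome =
  | { kind: "accepted"; operationId: string }
  | { kind: "deduped"; operationId: string; status: string }
  | { kind: "rejected-busy"; activeOperationId: string };

interface RouteResult {
  status: number;
  body: Record<string, unknown>;
}

async function handleProviderSwitch(
  provider: string,
  start: () => Promise<StartOutcome>, // stands in for ctx.runtimeOperationManager.start
  persistConfig: () => void, // stands in for ctx.saveElizaConfig(config)
): Promise<RouteResult> {
  // The caller has built the new config in memory but has not written it yet.
  const outcome = await start();
  switch (outcome.kind) {
    case "accepted":
      // Only an accepted switch may touch the on-disk config.
      persistConfig();
      return {
        status: 202,
        body: { success: true, provider, restarting: true, operationId: outcome.operationId },
      };
    case "deduped":
      // The original request already handled persistence; nothing to write.
      return {
        status: 200,
        body: {
          success: true,
          provider,
          restarting: outcome.status === "running" || outcome.status === "pending",
          operationId: outcome.operationId,
          deduped: true,
        },
      };
    case "rejected-busy":
      // Config is untouched, so disk and the running runtime stay in agreement.
      return {
        status: 409,
        body: { error: "Provider switch already in progress", activeOperationId: outcome.activeOperationId },
      };
    default: {
      const unreachable: never = outcome;
      throw new Error(`unexpected outcome: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

This ordering assumes the strategy does not read the config from disk before the route writes it. If it does, the reviewer's other option, making the strategy's defaultApplyProviderEnv the sole writer, avoids the ordering question entirely.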

Comment on lines +1098 to 1102
return cachedRuntimeOperationManager;
}

// PluginConfigMutationRejection, resolvePluginConfigMutationRejections,
// WalletExportRejection, resolveWalletExportRejection

P1 warm tier has no registered strategy — same-family provider switches always fail

strategies: { cold: coldStrategy, hot: hotStrategy } omits the warm strategy. The classifier in classifier.ts returns "warm" for same-family provider switches (e.g. openai → openai-subscription). In manager.ts lines 182–188, a missing strategy calls failOperation with "no-strategy-for-tier". Any user switching between providers in the same family will receive a silent internal failure with no clear error surfaced.

Either register a warm strategy here (falling back to the hot strategy is a safe interim choice) or change the classifier to collapse warm to cold until the warm strategy exists.

strategies: { cold: coldStrategy, hot: hotStrategy, warm: hotStrategy },
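The lookup behind this failure can be sketched as follows, to show why a missing map entry fails the operation and how an explicit fallback would cover it. `Tier`, `ReloadStrategy`, `resolveStrategy`, and the `fallbacks` table are assumptions for illustration; manager.ts may resolve strategies differently.

```typescript
type Tier = "hot" | "warm" | "cold";

// Hypothetical minimal strategy shape; real strategies expose apply(ctx).
interface ReloadStrategy {
  name: string;
}

type StrategyResolution =
  | { ok: true; strategy: ReloadStrategy; tier: Tier }
  | { ok: false; reason: "no-strategy-for-tier"; tier: Tier };

function resolveStrategy(
  tier: Tier,
  strategies: Partial<Record<Tier, ReloadStrategy>>,
  fallbacks: Partial<Record<Tier, Tier>> = {},
): StrategyResolution {
  // Prefer the strategy registered for the classified tier.
  const direct = strategies[tier];
  if (direct) return { ok: true, strategy: direct, tier };
  // Otherwise try an explicitly configured fallback tier (e.g. warm -> hot).
  const fallbackTier = fallbacks[tier];
  const fallback = fallbackTier ? strategies[fallbackTier] : undefined;
  if (fallbackTier && fallback) {
    return { ok: true, strategy: fallback, tier: fallbackTier };
  }
  // Returning a typed failure lets the caller surface a clear error
  // instead of failing the operation silently.
  return { ok: false, reason: "no-strategy-for-tier", tier };
}
```

Registering `warm: hotStrategy` directly, as suggested above, is the simpler fix; a separate fallback table only pays off if several tiers need to share strategies.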

Comment on lines +139 to +147
const intent: ProviderSwitchIntent = {
  kind: "provider-switch",
  provider: normalizedProvider,
  apiKey: trimmedApiKey,
  primaryModel:
    typeof body.primaryModel === "string"
      ? body.primaryModel.trim()
      : undefined,
};

P1 security API key written to disk in plaintext via intent serialization

The ProviderSwitchIntent built here includes the raw apiKey value, and the FilesystemRuntimeOperationRepository serializes the full RuntimeOperation (including intent) to <stateDir>/runtime-operations/<id>.json (mode 0600, but still a plain JSON file). Every provider switch stores the user's API key on the filesystem in cleartext for up to the 24-hour retention window.

The API key should be redacted from the persisted intent. One approach is to store only a boolean flag apiKeyProvided: true in the persisted intent while keeping the real key in-memory only for the duration of the operation.
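A minimal sketch of the suggested redaction, assuming the intent fields visible in the diff (`kind`, `provider`, `apiKey`, `primaryModel`), with `apiKey` treated as optional. The persisted copy carries only an `apiKeyProvided` flag, and the key itself is held in an in-memory map until the strategy consumes it. `redactIntentForStorage`, `rememberApiKey`, and `takeApiKey` are hypothetical names.

```typescript
interface ProviderSwitchIntent {
  kind: "provider-switch";
  provider: string;
  apiKey?: string; // assumption: optional in this sketch
  primaryModel?: string;
}

// What gets written to disk: every field except the secret itself.
type PersistedProviderSwitchIntent = Omit<ProviderSwitchIntent, "apiKey"> & {
  apiKeyProvided: boolean;
};

function redactIntentForStorage(intent: ProviderSwitchIntent): PersistedProviderSwitchIntent {
  // Destructure the key out so it cannot leak through the spread below.
  const { apiKey, ...rest } = intent;
  return { ...rest, apiKeyProvided: typeof apiKey === "string" && apiKey.length > 0 };
}

// In-memory only: keys for in-flight operations, keyed by operation id.
const inMemoryApiKeys = new Map<string, string>();

function rememberApiKey(operationId: string, intent: ProviderSwitchIntent): void {
  if (intent.apiKey) inMemoryApiKeys.set(operationId, intent.apiKey);
}

// Read once by the strategy, then forgotten.
function takeApiKey(operationId: string): string | undefined {
  const key = inMemoryApiKeys.get(operationId);
  inMemoryApiKeys.delete(operationId);
  return key;
}
```

A key held only in memory is lost if the process dies mid-operation. That is consistent with the repository behavior described in the table, which reaps abandoned operations on hydrate.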

Comment on lines +29 to +42
const shutdownStart = Date.now();
await ctx.reportPhase({
  name: "shutdown-old",
  status: "running",
  startedAt: shutdownStart,
});

const startNewStart = Date.now();
await ctx.reportPhase({
  name: "shutdown-old",
  status: "succeeded",
  startedAt: shutdownStart,
  finishedAt: startNewStart,
});

P2 "shutdown-old" phase is appended twice instead of updated once

ctx.reportPhase maps to repository.appendPhase, which always adds a new entry to the phases array. Calling it first with status: "running" and then immediately with status: "succeeded" produces two separate "shutdown-old" entries in the log rather than one entry whose status transitions. No actual shutdown work occurs between the two calls — restartRuntime is invoked in the "start-new" phase below.

The manager's health-check code uses appendPhase (for "running") + updateLastPhase (for the terminal state). The cold strategy should follow the same pattern, or both phases should be a single append recording the final status since shutdown is effectively instantaneous here.
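The append-then-update pattern the reviewer recommends can be sketched against a simple in-memory phase log. `PhaseLog`, `runPhase`, and the method signatures are assumptions named after the `appendPhase` and `updateLastPhase` calls mentioned in the comment, not the repository's actual interface.

```typescript
type PhaseStatus = "running" | "succeeded" | "failed";

interface Phase {
  name: string;
  status: PhaseStatus;
  startedAt: number;
  finishedAt?: number;
}

class PhaseLog {
  readonly phases: Phase[] = [];

  // Always adds a new entry (the behavior the review attributes to appendPhase).
  appendPhase(phase: Phase): void {
    this.phases.push({ ...phase });
  }

  // Mutates the most recent entry in place, so a phase's status transitions
  // instead of producing a duplicate entry.
  updateLastPhase(patch: Partial<Omit<Phase, "name">>): void {
    const last = this.phases[this.phases.length - 1];
    if (!last) throw new Error("no phase to update");
    Object.assign(last, patch);
  }
}

// Run `work` as one named phase: a single entry that goes running -> succeeded/failed.
async function runPhase<T>(log: PhaseLog, name: string, work: () => Promise<T>): Promise<T> {
  log.appendPhase({ name, status: "running", startedAt: Date.now() });
  try {
    const result = await work();
    log.updateLastPhase({ status: "succeeded", finishedAt: Date.now() });
    return result;
  } catch (err) {
    log.updateLastPhase({ status: "failed", finishedAt: Date.now() });
    throw err;
  }
}
```

`updateLastPhase` assumes phases run sequentially, so the most recent entry is the one being finished; concurrent phases would need to update entries by id instead.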

lalalune merged commit eb9528c into elizaOS:develop on Apr 29, 2026
3 checks passed
2 participants