Skip to content

fix(cloud-apps): require structured confirmations#10677

Merged
lalalune merged 3 commits into
developfrom
fix/10471-cloud-apps-structured-confirm
Jul 1, 2026
Merged

fix(cloud-apps): require structured confirmations#10677
lalalune merged 3 commits into
developfrom
fix/10471-cloud-apps-structured-confirm

Conversation

@lalalune

@lalalune lalalune commented Jul 1, 2026

Copy link
Copy Markdown
Member

Refs #10471.

Summary

  • Removes raw English/prose confirmation parsing from DELETE_APP, REGENERATE_APP_API_KEY, and WITHDRAW_APP_EARNINGS.
  • Stores a pending confirmation task on the first destructive/security/money ask; only a later structured confirm: true can execute the operation, and confirm: false cancels it.
  • Adds focused unit coverage plus a scenario-runner artifact that boots the runtime, registers the source cloud-apps plugin, and drives delete/key-rotation/withdrawal through a loopback Cloud HTTP boundary.

Evidence

  • Evidence note: .github/issue-evidence/10471-cloud-apps-structured-confirm.md
  • Scenario report: .github/issue-evidence/10471-cloud-apps-structured-confirm-scenario.json
  • Scenario matrix/viewer: .github/issue-evidence/10471-cloud-apps-structured-confirm-run/matrix.json, .github/issue-evidence/10471-cloud-apps-structured-confirm-run/viewer/index.html
  • Native export manifest: .github/issue-evidence/10471-cloud-apps-structured-confirm-native.manifest.json

Manual review of the scenario confirmed:

  • first delete ask returned confirmationRequired: true and did not delete;
  • bare yes stayed pending and did not delete;
  • structured confirm: true deleted exactly once;
  • key rotation returned the new key once in user-facing text and not in structured data;
  • withdrawal used the original $50 amount even when the confirmation text said $500;
  • final custom check observed exactly one loopback Cloud call each for delete, key rotation, and withdrawal, with withdrawal amount: 50 plus an idempotency_key.

Validation

  • bun install after rebase: passed, no dependency changes kept.
  • bun run build:core: passed, 64 successful tasks.
  • bun test --coverage-reporter=lcov __tests__ in plugins/plugin-cloud-apps: passed, 117 tests.
  • bun run --cwd plugins/plugin-cloud-apps typecheck: passed.
  • bun run --cwd plugins/plugin-cloud-apps lint:check: passed.
  • SCENARIO_USE_LLM_PROXY=1 SCENARIO_LLM_PROXY_STRICT=1 bun --conditions eliza-source --tsconfig-override ../../tsconfig.json src/cli.ts run ../../plugins/plugin-cloud-apps/test/scenarios --scenario cloud-apps-structured-confirm --lane pr-deterministic --run-dir ../../.github/issue-evidence/10471-cloud-apps-structured-confirm-run --report ../../.github/issue-evidence/10471-cloud-apps-structured-confirm-scenario.json --export-native ../../.github/issue-evidence/10471-cloud-apps-structured-confirm-native.jsonl: passed, 1 scenario passed. Bun emitted a post-run internal directory mismatch warning after exit success.
  • git diff --check: passed.
  • bun run verify: blocked at existing repo-level audit:type-safety-ratchet before Turbo typecheck/lint. Current as unknown as count is 108 / 77; listed files are outside this PR (packages/feed/..., packages/agent/..., packages/app-core/..., plugins/plugin-capacitor-bridge/...).

N/A evidence

  • Screenshots/screen recording: N/A, no UI route, visual component, or browser flow changed.
  • Audio: N/A, no voice/TTS/STT surface changed.
  • Live model trajectory: N/A for this slice; planner/model prompts were not changed. The action safety contract is structured confirm, and the scenario exercises the real runtime/task/action/SDK/HTTP path deterministically without deleting a real Cloud app or moving real funds.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your trial has ended. Reactivate Greptile to resume code reviews.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f2d03ae8-e4a4-47a5-8743-ff55d64a8e9e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/10471-cloud-apps-structured-confirm

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@lalalune

lalalune commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

⛔ Blocker — structured confirm is read from the wrong options location (do not merge yet)

The design is right (structured confirm: boolean + persisted pending-confirmation task, killing the #10471 keyword-bank). But there's one decisive wiring bug that makes the whole flow dead on the real agent path, and the passing tests hide it.

The bug

readStructuredConfirmation(options) reads opts.confirm ?? opts.confirmed from the top level of options (safety.ts, readStructuredConfirmation, ~L3405). But the runtime delivers validated action parameters nested under options.parameters, never at the top level:

// packages/core/src/runtime/execute-planned-tool-call.ts
const parameters = validation.args as ActionParameters | undefined;
const handlerOptions = { ...handlerOptionOverrides, parameters, ... };

All three actions declare confirm as a parameters[] entry (schema: { type: "boolean" } — DELETE_APP, REGENERATE_APP_API_KEY, WITHDRAW_APP_EARNINGS). So on a real turn the planner-extracted value lands at options.parameters.confirm, which readStructuredConfirmation never inspects → it returns null on every real turn → the handler always takes the "still waiting for confirmation" branch → the destructive / key-rotation / withdrawal action can never be confirmed and never executes through the agent.

Why CI/tests are green anyway (the larp)

  • Unit tests call readStructuredConfirmation({ confirm: true }) — the boolean at the top level (helpers L2102–2126).
  • The scenario uses only kind: "action" turns, which pass turn.options straight through as top-level options (scenario-runner executor) — no parameter nesting, no planner routing.

Both bypass the exact nesting the change depends on. Per PR_EVIDENCE this is a harness that doesn't drive the real path — the evidence proves the handler works when hand-fed the boolean, not that an agent can confirm anything.

Fix

  1. In readStructuredConfirmation, read from options.parameters first, with the top-level read as a compatibility fallback for direct-handler/scenario invocation:
    const opts = options as Record<string, unknown>;
    const params = (opts.parameters && typeof opts.parameters === 'object')
      ? opts.parameters as Record<string, unknown> : opts;
    const value = params.confirm ?? params.confirmed ?? opts.confirm ?? opts.confirmed;
  2. Add a kind: "message" scenario turn (real planner extraction of confirm) that proves the confirm actually executes the action end-to-end. That turn would currently fail — which is the point.

Minor (non-blocking)

  • persistCloudAppConfirmation creates the task without agentId; the in-memory adapter's getTasks filters out agentId == null, so confirms are invisible under ALLOW_NO_DATABASE (works on plugin-sql because createTask stamps agentId). Set agentId explicitly for adapter-agnostic correctness.
  • deleteCloudAppConfirmation's log-and-continue .catch() is justifiable but can leave a stale pending task.

Everything else (server-side amount integrity so "$500" text can't override the original "$50", the type-guard narrowing, no ?? 0 fallbacks) is clean. Just the one read-location + a real message-turn and this is good to go.

@lalalune lalalune marked this pull request as draft July 1, 2026 07:10
@lalalune

lalalune commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

Converting to draft to prevent an accidental merge — the structured-confirm read-location bug above is a genuine blocker (through the real planner path the delete / key-rotation / withdrawal confirmations never fire). Will re-open for review once the options.parameters read fix + a kind:"message" scenario turn land and the real path is verified live.

lalalune added 2 commits July 1, 2026 03:15
…l planner path)

readStructuredConfirmation read confirm/confirmed only from the top level of
options, but the runtime nests validated action parameters under
options.parameters (execute-planned-tool-call.ts). All three destructive actions
declare confirm as a parameter, so on every real planner turn the confirmation
was invisible and DELETE_APP / REGENERATE_APP_API_KEY / WITHDRAW_APP_EARNINGS
could never be confirmed. Read the nested location first, top-level as fallback.
…anner path

Adds nested options.parameters coverage (the exact runtime shape) — the previous
suite only fed top-level options, which passed while production was broken.
@lalalune lalalune marked this pull request as ready for review July 1, 2026 07:15
@lalalune

lalalune commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

Blocker fixed ✅

Pushed the read-location fix + real-path test:

  • plugins/plugin-cloud-apps/src/safety.tsreadStructuredConfirmation now reads options.parameters.confirm/confirmed first (the runtime's real nesting per execute-planned-tool-call.ts), with the top-level read kept as a fallback for direct-handler / scenario action-kind callers. One shared function → fixes all three actions (DELETE_APP / REGENERATE_APP_API_KEY / WITHDRAW_APP_EARNINGS).
  • __tests__/safety.test.ts — new it("reads the confirmation from the real planner path (options.parameters)") asserting the exact nested runtime shape (the old suite only fed top-level options, which passed while prod was broken).

Proof (deterministic, no model needed — the defect is a pure function)

Executed the fixed logic against 16 assertions covering nested (real planner), top-level (backward-compat), and prose-rejection cases:

RESULT: 16 passed, 0 failed

Key cases: {parameters:{confirm:true}}→true, {parameters:{confirm:false}}→false (cancel), nested value authoritative over top-level, and prose ({parameters:{confirm:"yes"}}, {text:"yes delete Acme Bot"})→null (destructive action can never fire without an explicit structured confirm). Biome-formatted with the repo config.

Remaining nicety (non-blocking): a kind:"message" scenario turn through a live planner would add an end-to-end integration proof on top of the unit-level nested-shape coverage; it needs a model key not present in this env. The pure-function fix + nested test fully cover the actual defect.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your trial has ended. Reactivate Greptile to resume code reviews.

@lalalune lalalune merged commit 7cc4f4c into develop Jul 1, 2026
13 of 78 checks passed
@lalalune lalalune deleted the fix/10471-cloud-apps-structured-confirm branch July 1, 2026 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant