Skip to content

fix(playground): propagate AbortSignal through cancel chain#2535

Open
quanru wants to merge 5 commits into
mainfrom
fix/playground-stop-propagation
Open

fix(playground): propagate AbortSignal through cancel chain#2535
quanru wants to merge 5 commits into
mainfrom
fix/playground-stop-propagation

Conversation

@quanru
Copy link
Copy Markdown
Collaborator

@quanru quanru commented May 25, 2026

Summary

After running multiple steps in the Computer playground and clicking Stop, the
mouse/keyboard would keep being driven — because /cancel only ran
agent.destroy() + recreate and never told the in-flight executeAction → callActionInActionSpace → device action to stop. The fix turns the cancel
path into a cooperative abort chain.

Commit order (intentional)

  1. test(computer): add regression coverage ... — adds the e2e test, the
    focused unit test, the headless-linux-cancel CI job, and wires
    @midscene/playground as a devDependency of packages/computer. No fix
    yet
    — this commit is meant to turn the new headless-linux-cancel job
    red on CI, proving the test catches the regression.
  2. test(computer): set dummy model env ... — small follow-up: the test
    never calls a model, but PlaygroundServer validates
    MIDSCENE_MODEL_NAME at startup, so dummy values are needed for the
    headless runner to even reach the abort assertion.
  3. fix(playground): ... (next push) — the actual L1–L5 fix:
    • L1 server: taskAbortControllers map + /cancel calls
      controller.abort() before falling back to destroy+recreate
    • L2 executeAction: forwards abortSignal to all three branches
    • L3 core: callActionInActionSpace, runPlans, and Agent.destroy
      plumb / fire the signal
    • L4 ComputerDevice: destroy() sets destroyed=true first; internal
      sleep loops in smoothMoveMouse / performScroll / typeText / dragAndDrop become abortable
    • L5 visualizer handleStop: flips interruptedFlagRef and
      setLoading(false) synchronously before awaiting the cancel POST

Once the fix lands, headless-linux-cancel should turn green.

Test plan

  • CI: headless-linux-cancel red on commits 1+2 (regression caught)
  • CI: headless-linux-cancel green on commit 3 (fix verified)
  • Local mac: three-phase verification done (baseline pass → comment out
    fix to reproduce → restore fix to verify)
  • pnpm run lint clean before each commit

…tion

Add an end-to-end test that drives PlaygroundServer + ComputerAgent with a
custom SlowAction sleeping 60s while listening to ExecutorContext.abortSignal.
After POST /execute, POST /cancel must propagate the abort within 25s, and the
resulting error string must mention abort/cancel — proving the rejection came
from the cooperative cancel chain rather than the natural timer expiring. The
test is gated to Linux via describe.runIf so local mac runs skip the libnut
healthCheck mouse jiggle.

Also add a focused PlaygroundServer unit test that asserts /cancel invokes
abortController.abort() and the in-flight callActionInActionSpace observes
signal.aborted, plus a documenting test for the buggy shape where executeAction
never forwards abortSignal.

Wire packages/computer to depend on @midscene/playground as a devDependency
and add a headless-linux-cancel job to .github/workflows/headless-linux.yml
that runs the new e2e suite under Xvfb. The previously dropped
chrome-extension: job header (lost during an earlier insert) is restored so
all 7 jobs are parsed.

This commit intentionally lands the tests without the fix so the new CI job
demonstrates the regression. The fix follows in the next commit.
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 25, 2026

Deploying midscene with  Cloudflare Pages  Cloudflare Pages

Latest commit: 4ea2767
Status: ✅  Deploy successful!
Preview URL: https://0b2e7204.midscene.pages.dev
Branch Preview URL: https://fix-playground-stop-propagat.midscene.pages.dev

View logs

quanru added 4 commits May 25, 2026 15:20
The cancel regression test never invokes a model — SlowAction just listens
to ExecutorContext.abortSignal. But PlaygroundServer's lazy
ModelConfigManager.initialize() throws when MIDSCENE_MODEL_NAME is unset,
which is the default on the headless-linux CI runner. The bare /execute
call was therefore failing in 33ms with "Model configuration is incomplete"
instead of exercising the SlowAction path, masking the real abort assertion.

Set dummy MIDSCENE_MODEL_NAME / BASE_URL / API_KEY at module load time so
the test runs in any environment without bleeding fake config into other
processes. Locally the existing shell values still take precedence.
PlaygroundServer dispatches /execute through a code path that calls
throwErrorIfNonVLModel before invoking the customAction. With MIDSCENE_MODEL_FAMILY
unset the test was failing in 61ms with "MIDSCENE_MODEL_FAMILY is not set to
a visual language model" instead of reaching the SlowAction abort assertion.
Set a valid dummy VL family ('qwen3-vl') so the check passes without bleeding
real model behavior into the test (SlowAction never calls a model).
Hitting Stop in the playground after a long action used to keep driving the
mouse/keyboard for the rest of the natural action timer. Root cause: /cancel
only ran agent.destroy() + recreate, never telling the in-flight
executeAction → callActionInActionSpace → device action to abort. Turn the
whole path into a cooperative AbortController chain.

L1 packages/playground/src/server.ts
  - Hold a Map<requestId, AbortController> and create one per /execute.
  - /cancel now fires controller.abort() before falling back to
    destroy+recreate, so in-flight LLM streams and Device sleep loops exit
    within milliseconds instead of after the natural timer.
  - Plumb abortController.signal into executeAction.

L2 packages/playground/src/common.ts + types.ts
  - executeAction forwards abortSignal to all three branches
    (callActionInActionSpace, aiAssert, agent[actionType]).

L3 packages/core/src/agent/{agent,tasks,task-builder}.ts + types.ts
  - callActionInActionSpace accepts { abortSignal } and throws
    immediately on already-aborted signals.
  - runPlans threads abortSignal through convertPlanToExecutable and
    checks signal.aborted at each task boundary.
  - Agent.destroy aborts the active controller before flushing the
    report, so the destroy path can't strand an in-flight LLM call.

L4 packages/computer/src/device.ts
  - destroy() sets this.destroyed = true on its first line.
  - Replace internal `await sleep(...)` loops in smoothMoveMouse,
    performScroll, typeText, smartTypeString, dragAndDrop with an
    abortable variant that throws once destroyed flips.
  - Public entry points (mouseMove, tap, hover, keyboardType, scroll,
    drag, screenshotBase64) gate on this.destroyed up front so a stale
    handle can't drive libnut after cancel.

L5 packages/visualizer/src/hooks/usePlaygroundExecution.ts
  - handleStop now flips interruptedFlagRef and setLoading(false)
    synchronously, then awaits cancelExecution — the spinner no longer
    waits on the server's destroy/flush tail.

The headless-linux-cancel CI job (added in the previous commits) was red
on this branch with "abort signal did not propagate" and SlowAction
running its full 60s timer; this commit should turn it green.
… arg

The previous fix added a third `{ abortSignal }` argument when forwarding to
callActionInActionSpace. Two existing common.test.ts assertions still
expected the two-argument shape and broke on CI. Update them to expect the
new options bag explicitly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant