Process shell messages concurrently while async cells await #24

Closed
ncoop57 wants to merge 1 commit into main from fix-concurrent-execute

ncoop57 commented Jun 11, 2026

Issue

solveit's load_dialog times out (240s) with ipymini whenever the loaded dialog has code messages. The call is re-entrant: a code cell awaits an HTTP roundtrip to the solveit server, and the server's import_code sends execute_requests back to the same kernel before responding. Subshells handled mailbox items strictly one at a time, so those requests sat queued behind the awaiting cell forever — a deadlock. (ipyku_launcher avoids this by nulling ipykernel's _main_asyncio_lock; stock ipykernel_launcher deadlocks the same way ipymini did.)
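
The re-entrancy problem can be reproduced in plain asyncio, without any kernel involved. The sketch below is only an illustration: `serial_worker`, `outer_cell`, and `inner_cell` are hypothetical names, the queue stands in for the subshell mailbox, and the 0.2 s timeout stands in for the 240 s client timeout.

```python
import asyncio

async def serial_worker(mailbox, log):
    # Strictly one item at a time: the next item is not dequeued
    # until the current handler has returned.
    while True:
        handler = await mailbox.get()
        await handler(mailbox, log)

async def outer_cell(mailbox, log):
    # Stands in for a cell awaiting a server roundtrip. The "server" sends
    # an execute back to the same mailbox and replies only after it has run.
    replied = asyncio.Event()

    async def inner_cell(mailbox, log):
        log.append("inner ran")
        replied.set()

    mailbox.put_nowait(inner_cell)
    await replied.wait()  # never satisfied: inner_cell is stuck in the queue
    log.append("outer finished")

async def demo(timeout=0.2):
    mailbox, log = asyncio.Queue(), []
    mailbox.put_nowait(outer_cell)
    try:
        await asyncio.wait_for(serial_worker(mailbox, log), timeout)
    except asyncio.TimeoutError:
        log.append("timed out")
    return log

deadlock_log = asyncio.run(demo())
print(deadlock_log)
```

The only entry logged is the timeout: the inner request is never dequeued because the worker is still awaiting the outer one.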

Fix

Subshell._handle_actor_item now spawns each shell message as a task on the subshell loop instead of awaiting it inline, so the mailbox keeps draining while an async cell awaits. Sync cells never suspend, so they still run strictly in order, and the stop_on_error abort check runs in each task's first synchronous step. Supporting changes:

  • Execution state uses a counter; idle/executing only flip when no execute is active.
  • Cancel scopes are a per-execution set (tracked via contextvar); interrupt cancels all awaiting async cells.
  • Pending handler tasks are cancelled when the subshell stops.
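
A minimal sketch of the task-per-message idea, assuming a plain asyncio loop. `MiniSubshell` and its methods are hypothetical stand-ins, not ipymini's actual `Subshell` code; the sketch shows only the mailbox draining, the execution counter, and task cancellation on stop, and leaves out cancel scopes and `stop_on_error`.

```python
import asyncio

class MiniSubshell:
    """Hypothetical stand-in for a subshell; not ipymini's real class."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.tasks = set()   # pending handler tasks
        self.active = 0      # number of executes currently in flight
        self.states = []     # recorded busy/idle transitions
        self.log = []        # names of cells, in completion order

    async def _run(self, name, cell):
        # Runs synchronously up to the first await inside the cell.
        if self.active == 0:
            self.states.append("busy")
        self.active += 1
        try:
            await cell(self)
            self.log.append(name)
        finally:
            self.active -= 1
            if self.active == 0:  # flip to idle only when nothing is active
                self.states.append("idle")

    async def serve(self):
        while True:
            name, cell = await self.mailbox.get()
            # Spawn instead of awaiting inline, so the mailbox keeps draining.
            task = asyncio.create_task(self._run(name, cell))
            self.tasks.add(task)
            task.add_done_callback(self.tasks.discard)

    async def stop(self):
        pending = list(self.tasks)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)

async def outer(shell):
    # Re-entrant cell: queues another execute, then waits for it to run.
    replied = asyncio.Event()

    async def inner(_shell):
        replied.set()

    shell.mailbox.put_nowait(("inner", inner))
    await replied.wait()

async def demo():
    shell = MiniSubshell()
    shell.mailbox.put_nowait(("outer", outer))
    server = asyncio.create_task(shell.serve())
    await asyncio.sleep(0.05)
    server.cancel()
    await asyncio.gather(server, return_exceptions=True)
    await shell.stop()
    return shell.log, shell.states

log, states = asyncio.run(demo())
print(log, states)
```

In this sketch the inner cell completes while the outer one is still awaiting, and the state flips to busy once and to idle once, even though two executes overlapped.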

Semantic change

An erroring async cell no longer aborts executes submitted while it was awaiting — they may already have run. This matches ipyku_launcher (solveit's current default). test_subshell_stop_on_error_isolated now uses a sync failing cell, preserving its intent (cross-subshell abort isolation).
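
The difference between a sync and an async failing cell can be illustrated with a simplified model. This is a sketch under assumptions, not ipymini's implementation: `run_cells`, the `aborting` flag, and the cell kinds are hypothetical, and `asyncio.sleep(0)` stands in for any await inside a cell.

```python
import asyncio

async def run_cells(cells):
    """cells: list of (name, kind), kind in {"ok", "sync_fail", "async_fail"}."""
    state = {"aborting": False}
    events = []

    async def handle(name, kind):
        # Abort check in the task's first synchronous step.
        if state["aborting"]:
            events.append(f"{name}:aborted")
            return
        if kind == "async_fail":
            await asyncio.sleep(0)  # suspension point: later tasks run here
        if kind != "ok":
            state["aborting"] = True
            events.append(f"{name}:error")
            return
        events.append(f"{name}:ok")

    # One task per message, created in submission order.
    tasks = [asyncio.create_task(handle(n, k)) for n, k in cells]
    await asyncio.gather(*tasks)
    return events

sync_case = asyncio.run(run_cells([("a", "sync_fail"), ("b", "ok")]))
async_case = asyncio.run(run_cells([("a", "async_fail"), ("b", "ok")]))
print(sync_case)
print(async_case)
```

With a sync failure, `b` sees the abort flag and is skipped. With an async failure, `b` has already run by the time `a` raises, which is the behavior change described above.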

Testing

  • New tests/kernel/test_concurrent_execute.py: reproduces the deadlock (red before this change) and asserts sync cells still serialize.
  • Full suite and slow tests pass, apart from the meta/ notebook test, which needs an uncommitted local file and is unrelated to this change.
  • Verified end-to-end through solveit's SolveitKernelManager/ConKernelClient with IPY_LAUNCHER=ipymini: the import_code pattern completes in ~0.5s; the same script on main reproduces the TimeoutError.

🤖 Generated with Claude Code

Subshells handled mailbox items strictly one at a time, so an execute_request
sent while an async cell was awaiting could not run until that cell finished.
solveit's load_dialog depends on this re-entrancy (a cell awaits a server
roundtrip that sends executes back to the same kernel), so it deadlocked
until the client timeout.

Each shell message now runs in its own task on the subshell loop. Sync cells
never suspend, so they still run strictly in order, and stop_on_error
aborting is checked in each task's first synchronous step. Execution state
uses a counter, cancel scopes are tracked per execution (interrupt cancels
all awaiting cells), and pending tasks are cancelled on subshell shutdown.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jph00 commented Jun 12, 2026

Contributor

Fixed in microio (I think!)

jph00 closed this Jun 12, 2026