-
Notifications
You must be signed in to change notification settings - Fork 108
Description
Issue: aiosqlite>=0.22.0 Causes Deadlock During Application Exit with nonebot-plugin-orm and Alembic
Description
When using aiosqlite>=0.22.0 as the backend for an async SQLAlchemy setup, the application hangs indefinitely upon completion of database operations, such as those performed by Alembic migrations, preventing normal exit. This is a regression introduced in version 0.22.0. Downgrading to aiosqlite<=0.21.0 resolves the issue immediately.
Framework Context & How to Reproduce
This issue is prominently encountered within the NoneBot framework when using the nonebot-plugin-orm plugin, because the plugin’s migration commands essentially invoke Alembic. The provided reproduction steps use this common scenario to demonstrate the problem concretely.
Reproduction Steps (Using a NoneBot Project as a Test Case):
- Create a new NoneBot2 project.
- Install the ORM plugin and its default extras:
nb plugin install nonebot-plugin-ormanduv add nonebot-plugin-orm[default]. - Run an Alembic migration command via the plugin’s CLI wrapper:
uv run nb orm upgrade. - Observed Behavior: The migration logs indicate success, but the process does not terminate. It must be killed manually (e.g., Ctrl+C).
- Expected Behavior: The command should execute the migration and exit cleanly.
- Verification: Pin the dependency to
aiosqlite==0.21.0. Re-run the command. It will now exit successfully.
Evidence and Diagnostics
A stack trace captured via py-spy dump --locals on the hanging process reveals the root of the deadlock:
Thread 0x7B2B1EFF8740 (active): "MainThread"
_shutdown (threading.py:1567)
Locals:
tlock: <_thread.lock at 0x7b2b1e04da80>
locks: [<_thread.lock at 0x7b2b0e443e80>]
lock: <_thread.lock at 0x7b2b0e443e80>
Thread 0x7B2B0E2006C0 (active): "Thread-1 (_connection_worker_thread)"
_connection_worker_thread (aiosqlite/core.py:59)
Arguments:
tx: <_queue.SimpleQueue at 0x7b2b0e45d490>
Locals:
future: <_asyncio.Future at 0x7b2b0e5ec510>
function: <functools.partial at 0x7b2b0e4ee110>
result: None
Key Findings from the Stack Trace:
- The MainThread is stuck in
threading._shutdown(), attempting to join a non-daemon thread (the lock0x7b2b0e443e80). - The aiosqlite
_connection_worker_threadis still active and waiting for work (result: None). - This indicates that the connection's worker thread was not shut down cleanly before the Python interpreter began its main thread shutdown sequence.
Environment
- Python Version: 3.10.16, 3.12.9 (Confirmed)
- aiosqlite Version: 0.22.0, 0.22.1 (BROKEN), 0.21.0 (WORKS)
- Key Dependencies:
nonebot-plugin-orm[default]==0.8.2,SQLAlchemy==2.0.45,alembic==1.17.2 - OS: Ubuntu Linux
Root Cause Analysis
The breaking change is the architectural refactor of the Connection class's shutdown mechanism between 0.21.0 and 0.22.0:
- Thread Model Change: In
0.21.0,Connectionwas aThreadsubclass. In0.22.0, it owns a worker thread (self._thread). This alters lifecycle management semantics that higher-level async frameworks may rely on. - Synchronous to Asynchronous Close: The
_stop_running()method, responsible for stopping the worker, was changed from a synchronous function (sending a sentinel) to an asynchronous function that returns aFutureand must beawaited. close()method change: Consequently,Connection.close()must nowawait self._stop_running().
The Problem: The async engine disposal process in SQLAlchemy (triggered by Alembic or application teardown) calls close() on connections. With the new version, this await may not be completing successfully within the context of a shutting down event loop. The worker thread's Future (0x7b2b0e5ec510 as seen in the dump) never resolves, leaving the thread alive. The Python interpreter then deadlocks in _shutdown() because a non-daemon thread (the aiosqlite worker) is still running.
Proposed Solution
We need a migration path for async frameworks. A potential fix could involve:
- Backwards-Compatible Shutdown Path: Ensure the cleanup in
__del__or provide a mechanism that can finalize the connection without deadlocking, even if the event loop is in a terminal state. - Documentation & Guidance: Clearly document this breaking change for async context managers and provide best practices for framework integrators.
Immediate Impact: This change breaks the shutdown sequence for applications using SQLAlchemy's async extension with Alembic, as exemplified by the NoneBot plugin case. Users are currently forced to pin aiosqlite<0.22.0.
Additional Context
The CHANGELOG for v0.22.0 lists "Wait for transaction queue to complete when closing connection (#305)", which is the PR that introduced this behavioral change. While waiting for transactions to complete is a valuable safety improvement, the new async requirement for close() has created a severe compatibility issue with established async ORM patterns.