Skip to content

Breaking Changes: Alembic(SQLAlchemy) migration hangs on exit with aiosqlite >=0.22.0 #371

@JohnRichard4096

Description

@JohnRichard4096

Issue: aiosqlite>=0.22.0 Causes Deadlock During Application Exit with nonebot-plugin-orm and Alembic

Description

When using aiosqlite>=0.22.0 as the backend for an async SQLAlchemy setup, the application hangs indefinitely upon completion of database operations, such as those performed by Alembic migrations, preventing normal exit. This is a regression introduced in version 0.22.0. Downgrading to aiosqlite<=0.21.0 resolves the issue immediately.

Framework Context & How to Reproduce
This issue is prominently encountered within the NoneBot framework when using the nonebot-plugin-orm plugin, because the plugin’s migration commands essentially invoke Alembic. The provided reproduction steps use this common scenario to demonstrate the problem concretely.

Reproduction Steps (Using a NoneBot Project as a Test Case):

  1. Create a new NoneBot2 project.
  2. Install the ORM plugin and its default extras: nb plugin install nonebot-plugin-orm and uv add nonebot-plugin-orm[default].
  3. Run an Alembic migration command via the plugin’s CLI wrapper: uv run nb orm upgrade.
  4. Observed Behavior: The migration logs indicate success, but the process does not terminate. It must be killed manually (e.g., Ctrl+C).
  5. Expected Behavior: The command should execute the migration and exit cleanly.
  6. Verification: Pin the dependency to aiosqlite==0.21.0. Re-run the command. It will now exit successfully.

Evidence and Diagnostics

A stack trace captured via py-spy dump --locals on the hanging process reveals the root of the deadlock:

Thread 0x7B2B1EFF8740 (active): "MainThread"
    _shutdown (threading.py:1567)
        Locals:
            tlock: <_thread.lock at 0x7b2b1e04da80>
            locks: [<_thread.lock at 0x7b2b0e443e80>]
            lock: <_thread.lock at 0x7b2b0e443e80>
Thread 0x7B2B0E2006C0 (active): "Thread-1 (_connection_worker_thread)"
    _connection_worker_thread (aiosqlite/core.py:59)
        Arguments:
            tx: <_queue.SimpleQueue at 0x7b2b0e45d490>
        Locals:
            future: <_asyncio.Future at 0x7b2b0e5ec510>
            function: <functools.partial at 0x7b2b0e4ee110>
            result: None

Key Findings from the Stack Trace:

  1. The MainThread is stuck in threading._shutdown(), attempting to join a non-daemon thread (the lock 0x7b2b0e443e80).
  2. The aiosqlite _connection_worker_thread is still active and waiting for work (result: None).
  3. This indicates that the connection's worker thread was not shut down cleanly before the Python interpreter began its main thread shutdown sequence.

Environment

  • Python Version: 3.10.16, 3.12.9 (Confirmed)
  • aiosqlite Version: 0.22.0, 0.22.1 (BROKEN), 0.21.0 (WORKS)
  • Key Dependencies: nonebot-plugin-orm[default]==0.8.2, SQLAlchemy==2.0.45, alembic==1.17.2
  • OS: Ubuntu Linux

Root Cause Analysis

The breaking change is the architectural refactor of the Connection class's shutdown mechanism between 0.21.0 and 0.22.0:

  1. Thread Model Change: In 0.21.0, Connection was a Thread subclass. In 0.22.0, it owns a worker thread (self._thread). This alters lifecycle management semantics that higher-level async frameworks may rely on.
  2. Synchronous to Asynchronous Close: The _stop_running() method, responsible for stopping the worker, was changed from a synchronous function (sending a sentinel) to an asynchronous function that returns a Future and must be awaited.
  3. close() method change: Consequently, Connection.close() must now await self._stop_running().

The Problem: The async engine disposal process in SQLAlchemy (triggered by Alembic or application teardown) calls close() on connections. With the new version, this await may not be completing successfully within the context of a shutting down event loop. The worker thread's Future (0x7b2b0e5ec510 as seen in the dump) never resolves, leaving the thread alive. The Python interpreter then deadlocks in _shutdown() because a non-daemon thread (the aiosqlite worker) is still running.

Proposed Solution

We need a migration path for async frameworks. A potential fix could involve:

  1. Backwards-Compatible Shutdown Path: Ensure the cleanup in __del__ or provide a mechanism that can finalize the connection without deadlocking, even if the event loop is in a terminal state.
  2. Documentation & Guidance: Clearly document this breaking change for async context managers and provide best practices for framework integrators.

Immediate Impact: This change breaks the shutdown sequence for applications using SQLAlchemy's async extension with Alembic, as exemplified by the NoneBot plugin case. Users are currently forced to pin aiosqlite<0.22.0.

Additional Context

The CHANGELOG for v0.22.0 lists "Wait for transaction queue to complete when closing connection (#305)", which is the PR that introduced this behavioral change. While waiting for transactions to complete is a valuable safety improvement, the new async requirement for close() has created a severe compatibility issue with established async ORM patterns.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions