fix: make DuckDB attachments logic more robust #509

sgrebnov · 2025-12-05T07:19:06Z

This PR fixes DuckDB errors caused by a race condition when multiple connections attempt to attach databases concurrently.

When multiple connections called query_arrow() simultaneously, each would:

1. Check if attachments exist via PRAGMA database_list
2. If not found, create new DuckDBAttachments with a unique random ID
3. Run ATTACH IF NOT EXISTS '{db}' AS attachment_{random_id}_{i} one by one

Race condition: Between step 1 (check) and step 3 (attaching one by one), another connection could attach the same file or observe only partially attached databases. This results in errors such as:

Spice query failed. Status: 400, body: Execution error: Failed to execute query.\nDuckDB connection failed.\nBinder Error: Unique file handle conflict: Cannot attach "attachment_SDFrVyz5_0" - the database file "/app/.spice/data/accelerated_duckdb.db" is already attached by database "attachment_23yXahuv_0"\nFor details, refer to the DuckDB manual: https://duckdb.org/docs/"}

2025-11-17T19:30:47.876652Z WARN datafusion_table_providers::sql::db_connection_pool::dbconnection::duckdbconn: my_table.duckdb not found among existing attachments
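
The interleaving behind the first error can be replayed deterministically with a toy model. This is only a sketch under assumptions: `SharedCatalog`, `is_attached`, `attach`, `replay_race`, and the aliases `attachment_A_0` / `attachment_B_0` are hypothetical stand-ins, not the crate's types or DuckDB's API. The only rule modeled is that one file cannot be attached under two different aliases.

```rust
use std::collections::HashMap;

/// Toy stand-in for the catalog shared by all connections of one DuckDB instance.
/// Maps an attached database file to the alias it was attached under.
#[derive(Default)]
struct SharedCatalog {
    attached: HashMap<String, String>,
}

impl SharedCatalog {
    /// Step 1 (simplified): is this file already attached under any alias?
    fn is_attached(&self, file: &str) -> bool {
        self.attached.contains_key(file)
    }

    /// Step 3 (simplified): attach `file` under `alias`.
    /// Re-attaching under the same alias is a no-op; a different alias is a conflict.
    fn attach(&mut self, file: &str, alias: &str) -> Result<(), String> {
        if let Some(existing) = self.attached.get(file) {
            if existing.as_str() != alias {
                return Err(format!(
                    "cannot attach \"{alias}\": \"{file}\" is already attached by \"{existing}\""
                ));
            }
            return Ok(());
        }
        self.attached.insert(file.to_string(), alias.to_string());
        Ok(())
    }
}

/// Replays the problematic order: both connections check before either attaches.
fn replay_race() -> (Result<(), String>, Result<(), String>) {
    let mut catalog = SharedCatalog::default();
    let file = "accelerated_duckdb.db";

    // Step 1 on both connections: neither sees the file attached yet.
    let a_must_attach = !catalog.is_attached(file);
    let b_must_attach = !catalog.is_attached(file);

    // Step 3: each connection attaches under its own randomly named alias.
    let a = if a_must_attach { catalog.attach(file, "attachment_A_0") } else { Ok(()) };
    let b = if b_must_attach { catalog.attach(file, "attachment_B_0") } else { Ok(()) };
    (a, b)
}
```

In this model the first connection succeeds and the second gets a conflict error, because its check in step 1 was already stale by the time it reached step 3.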

All DuckDB connections acquired from a single pool (including its clones) or created via try_clone() share the same catalog, including attached databases, but not the search_path, which is a connection-level setting.

┌─────────────────────────────────────────────────────────┐
│                  duckdb_database (self.db)              │
│  ┌─────────────────────────────────────────────────┐    │
│  │              Catalog (shared state)              │    │
│  │  - Tables                                        │    │
│  │  - Attached databases (attachment_xxx_0, etc.)   │    │
│  │  - Search paths (per-connection setting)         │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Connection 1 │  │ Connection 2 │  │ Connection 3 │   │
│  │  (original)  │  │ (try_clone)  │  │ (pool.get()) │   │
│  │              │  │              │  │              │   │
│  │ search_path: │  │ search_path: │  │ search_path: │   │
│  │   "main"     │  │   (default)  │  │   (default)  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
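
The split between shared and per-connection state in the diagram can be sketched with plain Rust types. This is an illustrative model only: `Database`, `Connection`, and their methods are hypothetical and do not call DuckDB; `None` stands for the default search path.

```rust
use std::sync::{Arc, Mutex};

/// Toy model of one DuckDB instance: the catalog is shared by every connection.
#[derive(Default)]
struct Database {
    attached: Mutex<Vec<String>>,
}

/// Toy model of a connection: shares the database, owns its search path.
struct Connection {
    db: Arc<Database>,
    search_path: Option<String>, // None means "default"
}

impl Connection {
    fn open(db: &Arc<Database>) -> Self {
        Connection { db: Arc::clone(db), search_path: None }
    }

    /// Mirrors the idea of try_clone(): same database, fresh connection-level settings.
    fn try_clone(&self) -> Self {
        Connection::open(&self.db)
    }

    /// Attaching changes the shared catalog, so every connection sees it.
    fn attach(&self, alias: &str) {
        self.db.attached.lock().unwrap().push(alias.to_string());
    }

    /// The search path is stored on this connection only.
    fn set_search_path(&mut self, path: &str) {
        self.search_path = Some(path.to_string());
    }

    fn database_list(&self) -> Vec<String> {
        self.db.attached.lock().unwrap().clone()
    }
}
```

In this model, an attachment made through one connection shows up in `database_list()` on its clone, while the clone's `search_path` stays at the default until it is set on that connection.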

Solution

Share DuckDBAttachments across pool clones using Arc<OnceCell<...>> to ensure database attachments are configured exactly once per underlying connection pool. The first set_database_attachments call wins, so all datasets using the same pool have the same attachments.

This guarantees that only a single instance of DuckDBAttachments exists per pool and its clones.
We can't pass attachments as a parameter at pool initialization, because attachments are not always available or calculated when the pool is first created.

Similarly to the pool, use Arc<OnceCell<...>> for search_path to guarantee that the logic that applies attachments is executed only once. This also improves performance: attach runs only once and the cached search path is reused, without executing additional statements to retrieve, verify, or parse the existing configuration before each call.
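
A minimal sketch of the "first set wins, attach once" idea, using only the standard library. Assumptions: `std::sync::OnceLock` is swapped in for the `OnceCell` named above, and `Pool`, `Attachments`, `attach_runs`, and `query_from_many_connections` are hypothetical names for illustration; no real ATTACH statement is executed and the aliases omit the random ID.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical stand-in for DuckDBAttachments: the database files to attach.
#[derive(Debug, PartialEq)]
struct Attachments {
    files: Vec<String>,
}

/// Hypothetical pool handle; clones share the same cells through Arc.
#[derive(Clone, Default)]
struct Pool {
    attachments: Arc<OnceLock<Attachments>>,
    search_path: Arc<OnceLock<String>>,
    attach_runs: Arc<AtomicUsize>, // counts how often the attach logic really ran
}

impl Pool {
    /// First caller wins; later calls on any clone leave the stored value untouched.
    /// Returns true only for the call that actually stored the attachments.
    fn set_attachments(&self, files: &[&str]) -> bool {
        let value = Attachments {
            files: files.iter().map(|f| f.to_string()).collect(),
        };
        self.attachments.set(value).is_ok()
    }

    /// Runs the (simulated) attach logic once and caches the resulting search path.
    fn search_path(&self) -> &str {
        self.search_path.get_or_init(|| {
            self.attach_runs.fetch_add(1, Ordering::SeqCst);
            let mut path = vec!["main".to_string()];
            if let Some(att) = self.attachments.get() {
                for i in 0..att.files.len() {
                    path.push(format!("attachment_{i}"));
                }
            }
            path.join(",")
        })
    }
}

/// Simulates `n` connections from pool clones querying at the same time.
fn query_from_many_connections(pool: &Pool, n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let p = pool.clone();
            thread::spawn(move || p.search_path().to_string())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because `get_or_init` blocks concurrent callers until the first initializer finishes, every simulated connection observes the same cached search path, and the attach logic in this sketch runs a single time regardless of how many clones query concurrently.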

Other alternatives considered:

Apply attachments immediately when set_attachments is called. Unfortunately, at that moment not all tables have been created, so they won't be added to the catalog. The approach above attaches as part of the first query; combined with the ready state, this can be considered very robust (all tables exist).
Apply attachments as part of connection setup rather than query logic. The pool creates connections proactively, so this leads to the same issue as above.
There could still be cases where attach happens before all tables exist. For example, one accelerated table with a different duckdb_file can't connect to its source during initialization to get the schema and create the initial empty table, while the first table is fully accelerated and queried. In this case the first table will initialize attachments, but the actual federated query will fail later (the same applies to a view that can start initialization later). This is not related to this specific implementation or change; it is just a note that this could be improved further, for example with an additional step that updates metadata once all tables are loaded. The current approach was selected as robust, simple to implement, and covering all important cases.

fix: make DuckDB attachments logic more robust

e95b6f8

sgrebnov mentioned this pull request Dec 5, 2025

Make DuckDB attachments logic more robust #508

Closed

Merge remote-tracking branch 'origin/spiceai' into sgrebnov/1204-debu…

e14389b

…g-attachments-lock # Conflicts: # Cargo.lock # core/Cargo.toml # core/src/sql/db_connection_pool/dbconnection/duckdbconn.rs

sgrebnov self-assigned this Dec 5, 2025

sgrebnov marked this pull request as ready for review December 5, 2025 07:27

sgrebnov commented Dec 5, 2025

core/src/sql/db_connection_pool/dbconnection/duckdbconn.rs

sgrebnov commented Dec 5, 2025

core/Cargo.toml

sgrebnov commented Dec 5, 2025

core/src/sql/db_connection_pool/dbconnection/duckdbconn.rs

phillipleblanc approved these changes Dec 5, 2025

Merge branch 'spiceai' into sgrebnov/1204-debug-attachments-lock

e8220c3

sgrebnov merged commit 8b2d747 into spiceai Dec 5, 2025
11 checks passed

sgrebnov deleted the sgrebnov/1204-debug-attachments-lock branch December 5, 2025 17:27

sgrebnov mentioned this pull request Dec 5, 2025

fix: make DuckDB attachments logic more robust spiceai/spiceai#8411

Merged

3 participants
