fix: make DuckDB attachments logic more robust #509
Merged
+99
−37
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR fixes DuckDB errors caused by a race condition when multiple connections attempt to attach databases concurrently.
When multiple connections called
query_arrow()simultaneously, each would:PRAGMA database_listATTACH IF NOT EXISTS '{db}' AS attachment_{random_id}_{i}one by oneRace condition: Between step 1 (check) and steps 3 (attach one by one), another connection could attach the same file or retrieve only partially attached databases. This result into errors
All DuckDB connections acquired from a single pool (including its clones) or created via
try_clone()share the same catalog including attached databases, but not thesearch_pathwhich is connection-level setting.Solution
DuckDBAttachmentsacross pool clones usingArc<OnceCell<...>>to ensure database attachments are configured exactly once per underlying connection pool (with firstset_database_attachmentswins, all datasets using the same pool has the same attachments)DuckDBAttachmentsexist per pool and its clones.Arc<OnceCellforsearch_pathto guarantee that actual logic to apply attachments is executed once. This also leads to better performance as we execute attach only once and re-use cached search path w/o executing additional statements to retrieve/verify/parse existing configuration before each call.Other alternatives considered:
set_attachmentsis called - unfortunately, at that moment not all tables are created and they won't be added to the catalog. Approach above doattachas part of first query, with combination of ready state this could be considered as very robust approach (all tables exist).