Description
When opening a DuckDB database with a corrupted or incomplete WAL (Write-Ahead Log) file, DuckDB crashes due to an assertion failure, causing the entire process to abort. This failure occurs before any error can be propagated to the caller, making the situation unrecoverable at the application level.
This behavior is particularly problematic for embedded or desktop applications, where unexpected shutdowns (e.g., force-quit, power loss) are realistic scenarios.
Environment
- DuckDB Version: Tested on 1.1.x, confirmed present in 1.4.3 (latest as of Dec 2024)
- Bindings / Integration: `duckdb-rs` v1.4.3 via Rust/Tauri desktop application
- OS: macOS 14.x (Apple Silicon), also reproducible on other platforms
Error Message
Assertion failed: (index.IsBound()), function operator(),
file row_group_collection.cpp, line 671.
Steps to Reproduce
- Create a DuckDB database and perform write operations.
- Force-quit the application mid-transaction (simulating a crash or power failure).
- This leaves a WAL file in an inconsistent or partially written state.
- Attempt to reopen the database.
- DuckDB crashes during WAL replay with the assertion failure above.
Expected Behavior
DuckDB should handle this scenario gracefully by one of the following means:
- Returning a recoverable error (e.g., `Result::Err`) that allows the caller to handle the failure.
- Gracefully skipping or invalidating corrupted WAL entries.
- At minimum, avoiding a hard process abort and allowing the application to continue running.
Actual Behavior
- The process is terminated via a C++ `abort()` triggered by an assertion.
- No error is returned to the caller.
- Application-level recovery code never runs.
This is especially problematic because:
- Rust's `catch_unwind` cannot catch C++ `abort()` calls.
- The entire application crashes, not just the database subsystem.
- Users lose all application state and context.
- There is no opportunity for graceful recovery or user-facing error handling.
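To illustrate the first point, here is a minimal sketch of what `catch_unwind` can and cannot intercept. The helper name `run_guarded` is hypothetical and not part of any library; the sketch uses only the Rust standard library and does not call DuckDB.

```rust
use std::panic;

/// Runs `f` and converts a Rust panic into an `Err` carrying the panic message.
/// `run_guarded` is a hypothetical helper used only for this illustration.
fn run_guarded<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    // A Rust panic unwinds the stack, so `catch_unwind` can intercept it.
    let caught = run_guarded(|| panic!("simulated Rust-side failure"));
    println!("{:?}", caught);

    // A failed C++ assertion calls abort(), which terminates the process
    // without unwinding. Uncommenting the next line ends the process
    // immediately, and the guard never gets a chance to return:
    // let _ = run_guarded(|| std::process::abort());
}
```

Because the abort happens inside the native library, the only way for the host application to survive it today is to keep the failing call out of its own process, or to avoid triggering the replay at all.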
Workaround
We implemented a defensive workaround that deletes the WAL file before attempting to open the database:

    use duckdb::Connection;

    // Pre-emptively remove the WAL file BEFORE attempting to open the database.
    // `db_path` is the database file path as a `&str` or `String`.
    let wal_path = format!("{}.wal", db_path);
    if std::path::Path::new(&wal_path).exists() {
        std::fs::remove_file(&wal_path)?;
    }
    // With no WAL file present, DuckDB has nothing to replay on open.
    let conn = Connection::open(&db_path)?;

This avoids the crash, but any changes recorded in the WAL that were not yet checkpointed into the database file are lost, including committed transactions.
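A less destructive variant of the same idea is to move the WAL aside instead of deleting it, so the file remains available for inspection or a later recovery attempt. The sketch below uses only the standard library; the function name `quarantine_wal` and the `.wal.bak` suffix are assumptions made for illustration, not DuckDB conventions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Moves `<db_path>.wal` to `<db_path>.wal.bak` if it exists.
/// Returns the backup path when a WAL was moved, or `None` when there was none.
/// Hypothetical helper: the name and the `.bak` suffix are illustrative only.
fn quarantine_wal(db_path: &Path) -> io::Result<Option<PathBuf>> {
    // Build "<db_path>.wal" by appending to the full path, matching the
    // `format!("{}.wal", db_path)` convention used in the workaround.
    let mut wal = db_path.as_os_str().to_owned();
    wal.push(".wal");
    let wal_path = PathBuf::from(wal);

    if !wal_path.exists() {
        return Ok(None);
    }

    let mut backup = wal_path.as_os_str().to_owned();
    backup.push(".bak");
    let backup_path = PathBuf::from(backup);

    // `fs::rename` overwrites an older backup if one already exists.
    fs::rename(&wal_path, &backup_path)?;
    Ok(Some(backup_path))
}

fn main() -> io::Result<()> {
    let db_path = std::env::temp_dir().join("quarantine_demo.duckdb");
    // Simulate a WAL left behind by an interrupted session.
    fs::write(format!("{}.wal", db_path.display()), b"partial")?;

    match quarantine_wal(&db_path)? {
        Some(backup) => println!("WAL moved to {}", backup.display()),
        None => println!("no WAL present"),
    }
    Ok(())
}
```

Opening the database after this call has the same effect as the delete-based workaround (nothing is replayed), so the changes in the WAL are still not applied; the difference is only that the file is preserved.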
Suggested Fix
- Replace assertions in WAL replay and recovery code paths with proper error handling (return errors instead of asserting).
- Consider adding a configuration option such as:
  - `ignore_corrupted_wal`
  - `wal_recovery_mode = { strict | best_effort | skip }`
- At minimum:
  - Log a warning
  - Skip corrupted WAL entries
  - Avoid calling `abort()` during WAL replay
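The following sketch shows one possible reading of the proposed `wal_recovery_mode` values, written in Rust for consistency with the rest of this report rather than in DuckDB's C++. Everything in it is hypothetical: the type names, the `replay` function, the use of `Option<u32>` as a stand-in for a decoded WAL entry, and the exact semantics assigned to each mode are assumptions, not existing DuckDB behavior.

```rust
use std::str::FromStr;

/// Hypothetical setting mirroring the proposed `wal_recovery_mode` option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WalRecoveryMode {
    Strict,
    BestEffort,
    Skip,
}

impl FromStr for WalRecoveryMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "strict" => Ok(Self::Strict),
            "best_effort" => Ok(Self::BestEffort),
            "skip" => Ok(Self::Skip),
            other => Err(format!("unknown wal_recovery_mode: {other}")),
        }
    }
}

/// Stand-in for a decoded WAL entry; `None` marks an entry that failed validation.
type WalEntry = Option<u32>;

/// Outcome of a replay: which entries were applied and how many were dropped.
#[derive(Debug, PartialEq)]
struct ReplayReport {
    applied: Vec<u32>,
    dropped: usize,
}

/// Replays entries according to `mode`, returning an error instead of aborting.
fn replay(entries: &[WalEntry], mode: WalRecoveryMode) -> Result<ReplayReport, String> {
    let mut applied = Vec::new();

    if mode == WalRecoveryMode::Skip {
        // Assumed semantics: ignore the WAL entirely.
        return Ok(ReplayReport { applied, dropped: entries.len() });
    }

    for (i, entry) in entries.iter().enumerate() {
        match entry {
            Some(value) => applied.push(*value),
            None if mode == WalRecoveryMode::Strict => {
                // Assumed semantics: surface the corruption to the caller.
                return Err(format!("corrupted WAL entry at index {i}"));
            }
            None => {
                // Assumed semantics for best_effort: keep what was replayed so
                // far, warn, and drop the bad entry plus everything after it.
                eprintln!("warning: corrupted WAL entry at index {i}, stopping replay");
                return Ok(ReplayReport { applied, dropped: entries.len() - i });
            }
        }
    }

    Ok(ReplayReport { applied, dropped: 0 })
}

fn main() {
    let wal = [Some(1), Some(2), None, Some(4)];
    let mode: WalRecoveryMode = "best_effort".parse().unwrap();
    println!("{:?}", replay(&wal, mode));
}
```

In all three modes the caller receives a `Result` and decides what to do next, which is the property this issue asks for; how a real implementation should treat entries after the first corrupted one is a design question for the maintainers.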
Relevant Code Path (Approximate)
DuckDBConnection::open()
→ DatabaseInstance::Initialize()
→ StorageManager::LoadDatabase()
→ WAL::Replay()
→ RowGroupCollection::operator() // assertion failure
→ abort()
Impact
- Severity: Critical – causes complete, unrecoverable application crashes
- Data Loss: Potential loss of entire application state (not limited to DB contents)
- User Experience: Extremely poor; application appears to crash randomly on startup
- Affected Users: Anyone using DuckDB in embedded, desktop, or offline-first applications where unexpected shutdowns can occur
Related Areas
This issue may be related to:
- WAL replay and recovery mechanisms
- Row group state management during recovery
- Checkpointing and commit boundary handling