
Assertion failed: (index.IsBound()) crash during corrupted WAL replay - process aborts without recoverable error #649

@arghaffari

Description

When opening a DuckDB database with a corrupted or incomplete WAL (Write-Ahead Log) file, DuckDB crashes due to an assertion failure, causing the entire process to abort. This failure occurs before any error can be propagated to the caller, making the situation unrecoverable at the application level.

This behavior is particularly problematic for embedded or desktop applications, where unexpected shutdowns (e.g., force-quit, power loss) are realistic scenarios.


Environment

  • DuckDB Version: Tested on 1.1.x, confirmed present in 1.4.3 (latest as of Dec 2025)
  • Bindings / Integration: duckdb-rs v1.4.3 via Rust/Tauri desktop application
  • OS: macOS 14.x (Apple Silicon), also reproducible on other platforms

Error Message

Assertion failed: (index.IsBound()), function operator(), 
file row_group_collection.cpp, line 671.

Steps to Reproduce

  1. Create a DuckDB database and perform write operations.
  2. Force-quit the application mid-transaction (simulating a crash or power failure).
  3. This leaves a WAL file in an inconsistent or partially written state.
  4. Attempt to reopen the database.
  5. DuckDB crashes during WAL replay with the assertion failure above.
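Force-quitting is timing-dependent, so the crash does not reproduce on every attempt. One way to make step 3 deterministic is a minimal sketch like the one below. It assumes that a `<db_path>.wal` file already exists (for example, left behind by killing the process after a commit but before a checkpoint), and it simulates a torn write by cutting off the file's tail. `truncate_wal` is a hypothetical test helper. This sketch has not been confirmed to hit the exact `index.IsBound()` assertion; it only reproduces the "partially written tail" case.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical test helper: simulate a partially written WAL by shortening
/// the file to `keep_bytes`. Returns the file's original length.
fn truncate_wal(wal_path: &Path, keep_bytes: u64) -> io::Result<u64> {
    let file = OpenOptions::new().write(true).open(wal_path)?;
    let original_len = file.metadata()?.len();
    // Never grow the file: set_len would zero-extend it instead of truncating.
    let new_len = keep_bytes.min(original_len);
    file.set_len(new_len)?;
    Ok(original_len)
}
```

After truncating, reopening the database (step 4) exercises the WAL replay path on input that ends mid-record.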

Expected Behavior
DuckDB should handle this scenario gracefully by one of the following means:

  • Returning a recoverable error (e.g., Result::Err) that allows the caller to handle the failure.
  • Gracefully skipping or invalidating corrupted WAL entries.
  • At minimum, avoiding a hard process abort and allowing the application to continue running.

Actual Behavior

  • The process is terminated via a C++ abort() triggered by an assertion.
  • No error is returned to the caller.
  • Application-level recovery code never runs.

This is especially problematic because:

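  • Rust’s catch_unwind cannot catch C++ abort() calls: abort() raises SIGABRT and terminates the process without unwinding, so no Rust panic machinery ever runs.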
  • The entire application crashes, not just the database subsystem.
  • Users lose all application state and context.
  • There is no opportunity for graceful recovery or user-facing error handling.
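Because abort() takes down the whole process, the only place an application can observe it is from another process. The sketch below assumes the application relaunches itself in a hypothetical "probe" mode whose only job is to open the database and exit with status 0. The parent then classifies how the child ended. `ProbeOutcome`, `classify`, and `probe` are illustrative names, not part of duckdb-rs. On Unix, abort() shows up as termination by SIGABRT (signal 6 on Linux and macOS). On other platforms the sketch only sees a non-zero exit code.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// How the disposable child process that tried to open the database ended.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    /// Child exited with status 0: the database opened cleanly.
    Opened,
    /// Child exited normally with a non-zero code (e.g. it returned an error).
    ExitedWithError(Option<i32>),
    /// Child was killed by a signal; SIGABRT (6 on Linux/macOS) means abort().
    KilledBySignal(i32),
}

fn classify(status: ExitStatus) -> ProbeOutcome {
    if status.success() {
        return ProbeOutcome::Opened;
    }
    // On Unix, a process terminated by a signal has no exit code,
    // but ExitStatusExt::signal() reports which signal ended it.
    #[cfg(unix)]
    {
        use std::os::unix::process::ExitStatusExt;
        if let Some(sig) = status.signal() {
            return ProbeOutcome::KilledBySignal(sig);
        }
    }
    ProbeOutcome::ExitedWithError(status.code())
}

/// Run `cmd` to completion and classify how it ended.
/// The parent process survives even if the child aborts.
fn probe(mut cmd: Command) -> io::Result<ProbeOutcome> {
    let status = cmd.status()?;
    Ok(classify(status))
}
```

In the application, the command would be something like `Command::new(std::env::current_exe()?)` with a hypothetical `--probe-open <db_path>` argument. A `KilledBySignal(6)` result tells the parent that opening the database crashed, so it can take a recovery path instead of crashing itself. The cost is one extra process launch at startup.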

Workaround
We implemented a defensive workaround that deletes WAL files before attempting to open the database:

use duckdb::Connection;
use std::path::Path;

// Pre-emptively remove the WAL file BEFORE attempting to open.
// DuckDB keeps it next to the database file as `<db_path>.wal`.
let wal_path = format!("{}.wal", db_path);
if Path::new(&wal_path).exists() {
    std::fs::remove_file(&wal_path)?;
}

// Now safe to open: with no WAL present, DuckDB has nothing to replay.
let conn = Connection::open(&db_path)?;

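This avoids the crash, but it discards everything recorded in the WAL. DuckDB writes WAL entries at commit time, so the lost data includes transactions that were committed but not yet checkpointed into the main database file.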
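A less destructive variant of this workaround moves the WAL aside instead of deleting it. DuckDB no longer finds a `<db_path>.wal` to replay, but the bytes are kept for later inspection or a bug report. This is a minimal sketch: the `.corrupt-<unix-seconds>` suffix and the name `quarantine_wal` are hypothetical choices. It is not known whether any data in a quarantined WAL can actually be salvaged. Matching on `NotFound` also removes the race between `exists()` and `remove_file()` in the version above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Rename `<db_path>.wal` (if present) to `<db_path>.wal.corrupt-<secs>`.
/// Returns the new path, or None if there was no WAL to move.
fn quarantine_wal(db_path: &Path) -> io::Result<Option<PathBuf>> {
    let mut wal = db_path.as_os_str().to_owned();
    wal.push(".wal");
    let wal = PathBuf::from(wal);

    // Timestamp suffix so repeated quarantines do not overwrite each other.
    let stamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut target = wal.as_os_str().to_owned();
    target.push(format!(".corrupt-{stamp}"));
    let target = PathBuf::from(target);

    match fs::rename(&wal, &target) {
        Ok(()) => Ok(Some(target)),
        // No WAL on disk: nothing to quarantine, opening is safe as-is.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

Calling `quarantine_wal(Path::new(&db_path))?` before `Connection::open` gives the same crash avoidance as deleting the file. The returned path can be logged or shown to the user.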


Suggested Fix

  • Replace assertions in WAL replay and recovery code paths with proper error handling (return errors instead of asserting).

  • Consider adding a configuration option such as:

    • ignore_corrupted_wal
    • wal_recovery_mode = { strict | best_effort | skip }
  • At minimum:

    • Log a warning
    • Skip corrupted WAL entries
    • Avoid calling abort() during WAL replay
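The proposed modes can be made concrete with a small model. It is written in Rust for consistency with the rest of this report. `WalRecoveryMode`, `ReplayError`, and `replay` are hypothetical names, not DuckDB APIs, and WAL entries are reduced to `Option<u32>`, where `None` stands for an entry that failed validation. The point it illustrates is the one being requested: every failure path returns a value to the caller instead of calling abort().

```rust
/// Hypothetical model of the proposed `wal_recovery_mode` option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WalRecoveryMode {
    Strict,
    BestEffort,
    Skip,
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    CorruptEntry { index: usize },
}

/// Replay `entries` (None = corrupt entry) and return the payloads applied.
fn replay(entries: &[Option<u32>], mode: WalRecoveryMode) -> Result<Vec<u32>, ReplayError> {
    if mode == WalRecoveryMode::Skip {
        eprintln!("warning: skipping WAL replay entirely");
        return Ok(Vec::new());
    }
    let mut applied = Vec::new();
    for (index, entry) in entries.iter().enumerate() {
        match (entry, mode) {
            (Some(v), _) => applied.push(*v),
            // Strict: surface a recoverable error instead of asserting.
            (None, WalRecoveryMode::Strict) => {
                return Err(ReplayError::CorruptEntry { index });
            }
            // Best effort: keep the valid prefix, log, and stop.
            (None, _) => {
                eprintln!("warning: corrupt WAL entry at {index}; discarding the rest");
                break;
            }
        }
    }
    Ok(applied)
}
```

One design choice worth flagging: `BestEffort` here stops at the first corrupt entry rather than skipping it and continuing. A later entry may depend on one that was lost (for example, an insert into a table whose creation was in the corrupt entry). Whether DuckDB's WAL format allows safe per-entry skipping is a question for the maintainers.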

Relevant Code Path (Approximate)

DuckDBConnection::open()
  → DatabaseInstance::Initialize()
    → StorageManager::LoadDatabase()
      → WAL::Replay()
        → RowGroupCollection::operator()   // assertion failure
          → abort()

Impact

  • Severity: Critical – causes complete, unrecoverable application crashes
  • Data Loss: Potential loss of entire application state (not limited to DB contents)
  • User Experience: Extremely poor; application appears to crash randomly on startup
  • Affected Users: Anyone using DuckDB in embedded, desktop, or offline-first applications where unexpected shutdowns can occur

Related Areas
This issue may be related to:

  • WAL replay and recovery mechanisms
  • Row group state management during recovery
  • Checkpointing and commit boundary handling
