Skip to content

Comments

Attach fixes sim#5509

Open
glommer wants to merge 17 commits intotursodatabase:mainfrom
glommer:attach-fixes-sim
Open

Attach fixes sim#5509
glommer wants to merge 17 commits intotursodatabase:mainfrom
glommer:attach-fixes-sim

Conversation

@glommer
Copy link
Contributor

@glommer glommer commented Feb 21, 2026

Description

Add attach to the simulator. Then fix the absolute flurry of bugs that it found =)

Motivation and context

Productizing ATTACH

Description of AI Usage

I reviewed the simulator code closely with Claude, then when happy let it run overnight with a prompt instructing it to run simulations, stop when finding a bug, ignore bugs not related to attach, and then fix, commit, and move on.

Copy link

@turso-bot turso-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review @jussisaurio

glommer and others added 15 commits February 21, 2026 08:28
Add full infrastructure for testing ATTACH databases in the simulator:
- SimulatorEnv tracks attached databases with path helpers and cleanup
- Selective per-file fault injection via inject_fault_selective() on SimIO
- CREATE TABLE redirect to attached DBs (30% probability) in plan generation
- Proper QualifiedName/DoublyQualified handling for dotted table names in
  sql_generation (FROM clauses, column references, result columns)
- Re-ATTACH on database reopen with DatabaseOpts::with_attach(true)
- Fix core/connection.rs attach_database() to reuse parent DB IO instead
  of creating a new PlatformIO (enables simulator fault injection)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multiple interrelated fixes that make ATTACH database writes work
correctly across connections:

- Pager initialization: attach_database() now uses Database::_init()
  instead of init_pager(), ensuring encryption and auto_vacuum config
  are applied to attached pagers.

- Schema visibility: Schema changes on attached DBs are now kept in a
  connection-local copy until WAL commit succeeds, then published to
  the shared Database via publish_attached_schema(). Previously,
  changes were written directly to the shared schema, making
  uncommitted changes visible to other connections.

- Write lock handling: end_attached_write_txns() now iterates ALL
  attached pagers (via holds_write_lock()) instead of only the current
  program's write_databases set. This fixes explicit transactions
  where the COMMIT statement's program differs from the statement that
  acquired the write lock.

- IO yield during commit: Added CommittingAttached commit state to
  properly handle IO yields from commit_dirty_pages on attached
  pagers. Fixed commit_txn() early return that skipped re-entry when
  tx_state was already None (set by main pager commit before attached
  commit completes).

- Read transactions: SELECT on attached databases now emits
  Transaction opcodes to acquire WAL read locks, preventing stale
  reads. Added begin_read_on_database() to ProgramBuilder.

- Connection drop: WAL locks on attached pagers are now released
  when the connection is dropped.

Reproducer: cargo run --bin limbo_sim -- --seed 2 -n 200

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ALTER TABLE RENAME and other operations on attached databases were
updating the wrong sqlite_schema table because the UPDATE statement
wasn't qualified with the database name. Also, the sqlite_schema
btree lookup used the main schema instead of the attached database's
schema via with_schema().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When SchemaUpdated triggers a reprepare, attached database pagers
could retain stale read locks and connection-local schema changes from
the previous prepare. This caused an infinite SchemaUpdated loop
because the pager still saw old page 1 with a stale schema cookie.

Now reprepare() releases locks and discards local schema for attached
databases in write_databases before re-translating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When ALTER TABLE RENAME was executed on an attached database table
(e.g., aux1.foo RENAME TO bar), the shadow model renamed the table
to just bar instead of aux1.bar, losing the database prefix.
Subsequent queries generated by the simulator would use the bare name,
which the database resolver could not find.

Reproducer: cargo run --bin limbo_sim -- --seed 500 -n 200

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
resolve_upsert_target() and get_index() in emit_upsert() were using
resolver.schema() (main DB schema) instead of the attached DB schema,
causing ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE
constraint errors when INSERT ... ON CONFLICT targets a table in an
attached database.

Fix: pass database_id through bind_insert() and use
resolver.with_schema(database_id, ...) for both resolve_upsert_target
and get_index calls.

Reproducer: simulator seeds 22, 36, 58, 71, 73, 75

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reprepare() only iterated write_databases to clean up attached pager
locks, but SELECT queries on attached databases only populate
read_databases. This caused an infinite SchemaUpdated loop when a
DDL change followed by ROLLBACK left stale schema state on the
attached pager used by a subsequent SELECT.

Fix: chain both write_databases and read_databases iterators when
cleaning up attached pager state during reprepare.

Reproducer: simulator seeds 64, 88

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two related issues with attached database write transactions:

1. Statement savepoints were only created for the main pager (db 0).
   When a write to an attached DB triggered ABORT (e.g. UNIQUE
   constraint violation), the rollback had no savepoint to restore
   on the attached pager, leaving dirty pages in place.

   Fix: open savepoints on attached pagers in the Transaction opcode
   and handle release/rollback in end_statement().

2. end_attached_write_txns() silently swallowed commit_dirty_pages
   errors (logging a warning and falling through to success), causing
   data loss when a fault occurred during attached pager commit.

   Fix: propagate the error with return Err(e).

Reproducer: simulator seeds 245 (savepoints), 251 (error propagation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When commit_txn yields on IO during the attached DB commit phase
(CommittingAttached), re-entry into op_auto_commit needs to recognize
this state and continue the commit. Previously only CommitState::Committing
was checked, causing attached DB commits that yielded on IO to fall
through and attempt a new commit instead of resuming.

Reproducer: limbo_sim --seed 16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When an explicit ROLLBACK or error-triggered rollback occurs after DDL
operations (e.g., DROP TABLE) on attached databases, the connection-local
schema cache was not being cleared. This left stale schema entries that
showed tables as dropped even after the rollback restored them on disk.

The fix adds database_schemas().remove() calls alongside
rollback_attached() in both the ROLLBACK opcode path (op_auto_commit)
and the error rollback path (rollback_current_txn), matching the
existing pattern in end_attached_write_txns.

Reproducer: limbo_sim --seed 313 (also fixes seed 202)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The IntegrityCk instruction (used by PRAGMA integrity_check) was always
using the main database pager, even when checking an attached database
(e.g., PRAGMA aux0.integrity_check). This caused it to read pages from
the main DB file instead of the attached DB file, leading to ShortRead
errors or incorrect integrity results.

Add a `db` field to IntegrityCk that carries the database index from
translation, and resolve the correct pager via
get_pager_from_database_index() at execution time.

Found via simulator with ATTACH support enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…OpenWrite

When UPDATE changes the primary key (rowid) on an attached database table
with a UNIQUE index, the PrebuiltEphemeralTable code path emitted OpenWrite
instructions with database_id=0 (main) instead of the attached database's
id. This caused "short read on page" errors (single connection) or
"IdxDelete: no matching index entry found" (multi-connection).

The bug had two locations:
- main_loop.rs: PrebuiltEphemeralTable case used the ephemeral table's
  database_id instead of target_table.database_id for the table cursor
- emitter.rs: target_database_id was derived from plan.table_references
  which contains the ephemeral table (database_id=0) after the optimizer
  replaces it, instead of using target_table.database_id

Found via simulator seed 867.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…th_flags

open_file_with_flags() would attempt to open the file and acquire an
exclusive lock before checking the DATABASE_MANAGER registry for an
already-open Database. When multiple connections ATTACH the same file,
the second ATTACH would fail with "Failed locking file" because the
first connection already holds the lock.

Fix by checking the registry early: if the database is already open
in-process, return the existing Arc<Database> without opening a new
file handle.

Found via simulator seed 37 (n=1000).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a statement writes to the main DB and reads from an attached DB,
step_end_write_txn only released write locks on attached pagers via
end_attached_write_txns, but never released read locks on attached
pagers that only had read transactions. This caused subsequent
statements to reuse a stale WAL snapshot on the attached DB, missing
data committed by other connections.

Add end_attached_read_txns calls to both Done paths in
step_end_write_txn (initial commit and CommittingAttached re-entry).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLite reserves database index 0 for "main" and 1 for "temp"; attached
databases start at index 2. The codebase had raw numeric comparisons
like `database_id >= 2` and `db: 0` scattered across translate/ and
vdbe/, which were unclear without knowing this convention.

Add MAIN_DB_ID, TEMP_DB_ID, FIRST_ATTACHED_DB_ID constants and an
is_attached_db() helper in core/lib.rs. Replace all magic numbers
across 17 files. Remove the duplicate local constants from
Resolver in emitter.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
glommer and others added 2 commits February 21, 2026 10:40
…ator

The differential simulator's SQLite schema cache workaround
(pragma_user_version query) only invalidated the main database's schema.
When ALTER TABLE DROP COLUMN was executed on an attached database, other
rusqlite connections still saw the old column count, causing false
divergences between limbo and rusqlite.

Fix by also querying {db_name}.sqlite_master for each attached database,
which forces SQLite to re-read the attached database's schema.

Discovered via seed: 12173377563160143215

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant