Commit c70a032

Merge 'feat: add PostgreSQL-style sequences and MVCC-safe AUTOINCREMENT' from Glauber Costa

Add CREATE SEQUENCE / DROP SEQUENCE / nextval() / setval() / currval() as first-class schema objects, and reimplement AUTOINCREMENT on top of the same infrastructure so it works correctly under MVCC.

Sequences are the standard mechanism for generating monotonically increasing (or decreasing) identifiers. This commit adds full support for user-created sequences (`CREATE SEQUENCE name [INCREMENT BY n] [MINVALUE n] [MAXVALUE n] [START n] [CYCLE | NO CYCLE]`) and the functions `nextval()`, `setval()`, `currval()`. AUTOINCREMENT tables now implicitly create a sequence named `__turso_internal_autoincrement_<table>` so high-water-mark logic is unified across both features.

## Design: disk is the source of truth

Every sequence is backed by a hidden B-tree table:

```sql
CREATE TABLE "__turso_internal_seq_<name>"(
    value INTEGER PRIMARY KEY, -- rowid alias = the sequence value
    is_called INTEGER,
    start INTEGER,
    inc INTEGER,
    min INTEGER,
    max INTEGER,
    cycle INTEGER)
```

`value` is `INTEGER PRIMARY KEY` (the rowid alias). The B-tree is keyed by the sequence value, so `MIN(value)` / `MAX(value)` is an O(1) seek to the first/last leaf: **the B-tree _is_ the watermark.**

The in-memory `Sequence` is **pure schema data**: `start`, `increment`, `min`, `max`, `cycle`. No `AtomicI64`, no `is_called` flag, no dirty bit, no in-process current-value cache. This is a deliberate change from an earlier in-memory-atomic design, which couldn't be made safe across processes (multiple workers on the same disk would each maintain their own atomic and diverge). Reading from disk on every call is the only design that survives multiprocess WAL, MVCC, and crash-recovery uniformly.

## nextval execution

At compile time, `nextval()` emits `begin_write_on_database`, making the statement a write transaction. At runtime the translator emits an RMW sequence (see `core/translate/sequence.rs::emit_disk_read_nextval`):

1. **`SequenceBeginInnerTx`**: under MVCC, open an autonomous inner Concurrent tx scoped to this RMW. The outer tx (if any) is saved off. Under WAL or when the outer tx is exclusive, this is a no-op and the RMW runs in the outer tx. The single-writer WAL lock provides the same serialization the inner tx provides under MVCC.
2. Cursor RMW: `Rewind`/`Last` on the backing table → read the single watermark row → `SequenceComputeNext` opcode computes the next value from `(current, is_called, start, inc, min, max, cycle)` baked into the program → `Insert` the new row keyed by the new value (since `value` is `INTEGER PRIMARY KEY`, distinct nextvals land on distinct B-tree keys and never key-conflict).
3. **Inline backing-table compaction**: a small loop emitted right after the insert deletes every row except the new watermark. After every successful nextval the backing table holds exactly one row. (This used to run at commit time via `flush_dirty_sequences`, but the implementation issued nested `prepare_internal()` calls that drove `pager.io.step()` synchronously and broke the vdbe async contract: the outer `Statement::step()` would block instead of returning `StepResult::IO`. Moving the compaction inline fixed that.)
4. For AUTOINCREMENT-backing sequences, mirror the watermark into `sqlite_sequence` in the same inner tx so SQLite-compat readers see the high-water mark without needing a checkpoint.
5. **`SequenceCommitInnerTx`**: commit the inner tx. On `WriteWriteConflict` (against another nextval racing on the same backing table), `op_sequence_commit_inner_tx` retries the whole RMW with a bounded budget (`ProgramState::sequence_inner_retry_count`); when the budget is exhausted it surfaces `LimboError::Busy` so the caller can retry at their level. The outer tx (if any) is restored.
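
The compute step in item 2 can be illustrated with a small sketch. This is not the `SequenceComputeNext` opcode from the commit: the `SeqParams` struct, the `SeqError` type, and the `compute_next` function are hypothetical names, and the bound and wrap rules are assumed to follow PostgreSQL's documented behaviour (an un-called sequence returns its stored value as-is; a cycling ascending sequence wraps to `min`, a descending one to `max`). `start` is left out because, under that assumption, it only seeds the initial row.

```rust
/// Static sequence parameters. Hypothetical struct mirroring the
/// `inc`, `min`, `max`, and `cycle` columns of the backing table.
#[derive(Debug, Clone, Copy)]
struct SeqParams {
    inc: i64,
    min: i64,
    max: i64,
    cycle: bool,
}

/// Hypothetical error for a NO CYCLE sequence that ran past its bound.
#[derive(Debug, PartialEq)]
enum SeqError {
    Exhausted,
}

/// Compute the value the next `nextval` would return, given the
/// watermark row `(current, is_called)` read from the backing table.
fn compute_next(current: i64, is_called: bool, p: SeqParams) -> Result<i64, SeqError> {
    // Assumption: a row that was never handed out (fresh sequence, or
    // setval with is_called = false) is returned unchanged.
    if !is_called {
        return Ok(current);
    }
    let in_range = |v: i64| v >= p.min && v <= p.max;
    // checked_add yields None on i64 overflow, which is treated the
    // same as stepping past the configured bound.
    match current.checked_add(p.inc) {
        Some(v) if in_range(v) => Ok(v),
        // Past the bound: wrap to the opposite end when CYCLE is set.
        _ if p.cycle => Ok(if p.inc > 0 { p.min } else { p.max }),
        _ => Err(SeqError::Exhausted),
    }
}
```

For example, with `inc = 1`, `min = 1`, `max = 3`, a called watermark of `3` yields `Err(SeqError::Exhausted)` under NO CYCLE and `Ok(1)` under CYCLE. Keeping the function pure (no I/O, no shared state) matches the design above, where the only mutable state is the row in the backing table.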

Why an inner tx under MVCC: a nextval that aborts because its outer tx rolls back must still leave its watermark advance durable on disk. Otherwise a parallel reader could observe the same value twice (the first reader sees the in-memory state, the rolled-back tx releases its disk write, the next reader recomputes and lands on the same value). The autonomous inner tx commits the backing-table write independent of the outer tx, so a rolled-back nextval still "consumed" the value. This is the standard PostgreSQL semantics (rollback does not reclaim nextval'd values; gaps are expected).

## setval

`setval(seq, value)` can place the watermark anywhere, which can't be expressed as a conflict-free append (two concurrent `setval(s, 5)` followed by `setval(s, 3)` are not commutative). It requires an **exclusive write transaction** and is rejected inside `BEGIN CONCURRENT`.

## currval

`currval(seq)` is per-connection. It returns the value most recently emitted by `nextval` or `setval` on _this_ connection. It's a register lookup populated by the `SetSequenceCurrval` opcode at the tail of every nextval/setval; it does not touch disk.

## Restart / recovery

On database open, sequences are initialized from disk:

- User sequences: read the single watermark row from each `__turso_internal_seq_*` backing table.
- AUTOINCREMENT: read `sqlite_sequence` and seed the implicit sequence to that value.

If a crash occurs after a nextval committed but before its compaction loop deleted the prior row, the backing table may briefly contain multiple rows. The compaction always re-runs on the next nextval, and the open-time read uses `MAX(value)` for ascending or `MIN(value)` for descending sequences, so multi-row state is recovered cleanly.

The MVCC checkpoint state machine also rewrites `sqlite_sequence` from the in-schema sequence list during checkpoint, ensuring the on-disk autoinc state is clean after every checkpoint cycle.
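
The open-time rule for a backing table left with several rows can be sketched as follows. `recover_watermark` is a hypothetical helper, not a function from the commit; it takes the `value` keys as a slice, whereas the engine would seek the B-tree directly, and it does not model a CYCLE sequence that has already wrapped.

```rust
/// Pick the live watermark from the `value` keys found in a backing
/// table on open. Hypothetical helper: `rows` stands in for the
/// B-tree keys and `inc` is the sequence increment.
fn recover_watermark(rows: &[i64], inc: i64) -> Option<i64> {
    if inc > 0 {
        // Ascending: the furthest-advanced row is the largest key.
        rows.iter().copied().max()
    } else {
        // Descending: the furthest-advanced row is the smallest key.
        rows.iter().copied().min()
    }
}
```

With leftover rows `[41, 42]` and `inc = 1` this picks `42`; with `[-7, -8]` and `inc = -1` it picks `-8`. An empty slice returns `None`, which a caller would have to treat as a missing or corrupt backing table.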

## Testing

- Integration tests covering all sequence operations, concurrent nextval from multiple connections, AUTOINCREMENT interaction, ATTACH database sequences, transaction rollback semantics, descending and CYCLE sequences, and restart-recovery scenarios.
- Simulator (`limbo_sim`): sequence-aware query generation (CREATE/DROP SEQUENCE, nextval, setval) and the `SequenceMonotonicity` property, which creates a fresh sequence in a reserved namespace, runs N nextvals, and asserts the engine returned `start + k*increment` (or correct CYCLE wrap).
- Concurrent simulator (`whopper`): dedicated sequence workloads testing MVCC concurrent nextval, setval exclusivity, in-flight setval that survives worker death, autocommit and in-tx nextval interleavings, CYCLE wrap-aware duplicate / wrong-direction tracking, and cross-checkpoint persistence. New simulator coverage includes `InsertSeqDefault` (INSERT into table with `DEFAULT (nextval('s'))`) and tracker handling for sequences advanced through that opaque path.
- SQL tests (sqltest): `sequence.sqltest`, `mvcc_sequence.sqltest`, and `attach/sequence.sqltest` covering the full feature surface.

Closes #5688
Closes #7137

2 parents 7715979 + 20b7c23 commit c70a032

66 files changed

Lines changed: 12828 additions & 758 deletions

bindings/rust/tests/integration_tests.rs

Lines changed: 18 additions & 25 deletions
@@ -1795,39 +1795,32 @@ async fn test_ghost_commits() {
     }
 }
 
-/// AUTOINCREMENT is not supported in MVCC mode. Verify that CREATE TABLE
-/// with AUTOINCREMENT fails with a clear error message.
+/// AUTOINCREMENT is supported in MVCC mode via the sequence-backed
+/// implementation: each AUTOINCREMENT table implicitly creates a
+/// `__turso_internal_seq___turso_internal_autoincrement_<table>` backing
+/// table, and rowid allocation goes through the shared Sequence atomic so
+/// concurrent inserts don't conflict on `sqlite_sequence`. Verify CREATE
+/// TABLE with AUTOINCREMENT succeeds and assigns monotonic rowids.
 #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
-async fn test_autoincrement_blocked_in_mvcc() {
+async fn test_autoincrement_works_in_mvcc() {
     let (db, _dir) = setup_mvcc_db("").await;
     let conn = db.connect().unwrap();
 
-    // CREATE TABLE with AUTOINCREMENT should fail
-    let result = conn
-        .execute(
-            "CREATE TABLE t(a INTEGER PRIMARY KEY AUTOINCREMENT, b TEXT)",
-            (),
-        )
-        .await;
-    assert!(
-        result.is_err(),
-        "CREATE TABLE with AUTOINCREMENT should fail in MVCC mode"
-    );
-    let err = result.unwrap_err().to_string();
-    assert!(
-        err.contains("AUTOINCREMENT is not supported in MVCC mode"),
-        "unexpected error: {err}"
-    );
-
-    // Regular tables without AUTOINCREMENT should still work
-    conn.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)", ())
+    conn.execute(
+        "CREATE TABLE t(a INTEGER PRIMARY KEY AUTOINCREMENT, b TEXT)",
+        (),
+    )
+    .await
+    .unwrap();
+    conn.execute("INSERT INTO t(b) VALUES ('one')", ())
         .await
         .unwrap();
-    conn.execute("INSERT INTO t VALUES (1, 'hello')", ())
+    conn.execute("INSERT INTO t(b) VALUES ('two')", ())
         .await
         .unwrap();
-    let count = query_i64(&conn, "SELECT COUNT(*) FROM t").await;
-    assert_eq!(count, 1);
+
+    let max = query_i64(&conn, "SELECT MAX(a) FROM t").await;
+    assert_eq!(max, 2, "AUTOINCREMENT must assign monotonic rowids");
 }
 
 #[tokio::test]

core/connection.rs

Lines changed: 403 additions & 22 deletions
Large diffs are not rendered by default.

core/error.rs

Lines changed: 7 additions & 0 deletions
@@ -276,3 +276,10 @@ pub const SQLITE_CONSTRAINT_NOTNULL: usize = SQLITE_CONSTRAINT | (5 << 8);
 pub const SQLITE_CONSTRAINT_TRIGGER: usize = SQLITE_CONSTRAINT | (7 << 8);
 pub const SQLITE_FULL: usize = 13; // we want this in autoincrement - incase if user inserts max allowed int
 pub const SQLITE_CONSTRAINT_UNIQUE: usize = 2067;
+// Standard SQLite error code; kept for documentation and potential
+// reuse. The sequence inner-tx wrap used to emit Insn::Halt with this
+// code, but halt()'s constraint catch-all mis-wrapped it; Busy is now
+// returned directly via Err(LimboError::Busy) from
+// op_sequence_commit_inner_tx.
+#[allow(dead_code)]
+pub const SQLITE_BUSY: usize = 5;

core/function.rs

Lines changed: 15 additions & 0 deletions
@@ -835,6 +835,10 @@ pub enum ScalarFunc {
     UnionValueFunc,
     UnionTagFunc,
     UnionExtractFunc,
+    // Sequence functions
+    NextVal,
+    CurrVal,
+    SetVal,
 }
 
 impl Deterministic for ScalarFunc {
@@ -948,6 +952,7 @@ impl Deterministic for ScalarFunc {
             | ScalarFunc::UnionValueFunc
             | ScalarFunc::UnionTagFunc
             | ScalarFunc::UnionExtractFunc => true,
+            ScalarFunc::NextVal | ScalarFunc::CurrVal | ScalarFunc::SetVal => false,
         }
     }
 }
@@ -1086,6 +1091,9 @@ impl Display for ScalarFunc {
             Self::UnionValueFunc => "union_value",
             Self::UnionTagFunc => "union_tag",
             Self::UnionExtractFunc => "union_extract",
+            Self::NextVal => "nextval",
+            Self::CurrVal => "currval",
+            Self::SetVal => "setval",
         };
         write!(f, "{str}")
     }
@@ -1232,6 +1240,9 @@ impl ScalarFunc {
             Self::UnionValueFunc => &[2], // union_value('tag', value)
             Self::UnionTagFunc => &[1], // union_tag(col)
             Self::UnionExtractFunc => &[2], // union_extract(col, 'tag')
+            // Sequence functions
+            Self::NextVal | Self::CurrVal => &[1],
+            Self::SetVal => &[2, 3],
         }
     }
 
@@ -1801,6 +1812,10 @@ impl Func {
             "union_value" => Ok(Some(Self::Scalar(ScalarFunc::UnionValueFunc))),
             "union_tag" => Ok(Some(Self::Scalar(ScalarFunc::UnionTagFunc))),
             "union_extract" => Ok(Some(Self::Scalar(ScalarFunc::UnionExtractFunc))),
+            // Sequence functions
+            "nextval" => Ok(Some(Self::Scalar(ScalarFunc::NextVal))),
+            "currval" => Ok(Some(Self::Scalar(ScalarFunc::CurrVal))),
+            "setval" => Ok(Some(Self::Scalar(ScalarFunc::SetVal))),
             _ => Ok(None),
         }
     }

core/lib.rs

Lines changed: 2 additions & 1 deletion
@@ -123,7 +123,6 @@ use storage::{page_cache::PageCache, sqlite3_ondisk::PageSize};
 use tracing::{instrument, Level};
 use turso_macros::{match_ignore_ascii_case, AtomicEnum};
 use turso_parser::{ast, ast::Cmd, parser::Parser};
-use util::parse_schema_rows;
 
 pub use connection::{resolve_ext_path, Connection, Row, StepResult, SymbolTable};
 pub(crate) use connection::{AtomicTransactionState, TransactionState};
@@ -1973,6 +1972,7 @@ impl Database {
             vdbe_trace: AtomicBool::new(false),
             dml_require_where: AtomicBool::new(false),
             dqs_dml: AtomicBool::new(true),
+            sequence_inner_retries: AtomicU64::new(0),
             mv_tx: RwLock::new(None),
             attached_mv_txs: RwLock::new(HashMap::default()),
             #[cfg(any(test, injected_yields))]
@@ -2008,6 +2008,7 @@ impl Database {
             named_savepoints: RwLock::new(Vec::new()),
             schema_reparse_in_progress: AtomicBool::new(false),
             prepare_context_generation: AtomicU64::new(0),
+            sequence_currvals: RwLock::new(HashMap::default()),
         });
         self.n_connections
             .fetch_add(1, crate::sync::atomic::Ordering::SeqCst);
