Commit c70a032
authored
Merge 'feat: add PostgreSQL-style sequences and MVCC-safe AUTOINCREMENT' from Glauber Costa
Add CREATE SEQUENCE / DROP SEQUENCE / nextval() / setval() / currval()
as first-class schema objects, and reimplement AUTOINCREMENT on top of
the same infrastructure so it works correctly under MVCC.
Sequences are the standard mechanism for generating monotonically
increasing (or decreasing) identifiers. This commit adds full support
for user-created sequences (`CREATE SEQUENCE name [INCREMENT BY n]
[MINVALUE n] [MAXVALUE n] [START n] [CYCLE | NO CYCLE]`) and the
functions `nextval()`, `setval()`, `currval()`. AUTOINCREMENT tables now
implicitly create a sequence named
`__turso_internal_autoincrement_<table>` so high-water-mark logic is
unified across both features.
## Design: disk is the source of truth
Every sequence is backed by a hidden B-tree table:
```sql
CREATE TABLE "__turso_internal_seq_<name>"(
    value     INTEGER PRIMARY KEY, -- rowid alias = the sequence value
    is_called INTEGER,
    start     INTEGER, inc INTEGER,
    min       INTEGER, max INTEGER, cycle INTEGER)
```
`value` is `INTEGER PRIMARY KEY` (the rowid alias). The B-tree is keyed
by the sequence value, so `MIN(value)` / `MAX(value)` is a single seek to
the first/last leaf, effectively O(1) because compaction keeps the table
at one row: **the B-tree _is_ the watermark.**
The in-memory `Sequence` is **pure schema data** — `start`, `increment`,
`min`, `max`, `cycle`. No `AtomicI64`, no `is_called` flag, no dirty
bit, no in-process current-value cache. This is a deliberate change from
an earlier in-memory-atomic design, which couldn't be made safe across
processes (multiple workers on the same disk would each maintain their
own atomic and diverge). Reading from disk on every call is the only
design that survives multiprocess WAL, MVCC, and crash-recovery
uniformly.
## nextval execution
At compile time, `nextval()` emits `begin_write_on_database`, making the
statement a write transaction, and the translator emits a
read-modify-write (RMW) sequence that runs as follows
(see `core/translate/sequence.rs::emit_disk_read_nextval`):
1. **`SequenceBeginInnerTx`** — under MVCC, open an autonomous inner
Concurrent tx scoped to this RMW. The outer tx (if any) is saved off.
Under WAL or when the outer tx is exclusive, this is a no-op — the RMW
runs in the outer tx. The single-writer WAL lock provides the same
serialization the inner tx provides under MVCC.
2. Cursor RMW: `Rewind`/`Last` on the backing table → read the single
watermark row → `SequenceComputeNext` opcode computes the next value
from `(current, is_called, start, inc, min, max, cycle)` baked into the
program → `Insert` the new row keyed by the new value (since `value` is
`INTEGER PRIMARY KEY`, distinct nextvals land on distinct B-tree keys
and never key-conflict).
3. **Inline backing-table compaction**: a small loop emitted right after
the insert deletes every row except the new watermark. After every
successful nextval the backing table holds exactly one row. (This used
to run at commit time via `flush_dirty_sequences`, but the
implementation issued nested `prepare_internal()` calls that drove
`pager.io.step()` synchronously and broke the vdbe async contract — the
outer `Statement::step()` would block instead of returning
`StepResult::IO`. Moving the compaction inline fixed that.)
4. For AUTOINCREMENT-backing sequences, mirror the watermark into
`sqlite_sequence` in the same inner tx so SQLite-compat readers see the
high-water mark without needing a checkpoint.
5. **`SequenceCommitInnerTx`** — commit the inner tx. On
`WriteWriteConflict` (against another nextval racing on the same backing
table), `op_sequence_commit_inner_tx` retries the whole RMW with a
bounded budget (`ProgramState::sequence_inner_retry_count`); when the
budget is exhausted it surfaces `LimboError::Busy` so the caller can
retry at their level. The outer tx (if any) is restored.
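
As a rough illustration of step 2, the arithmetic behind `SequenceComputeNext` might look like the sketch below. This is not the opcode's actual code: `SeqParams`, `SeqError`, and `compute_next` are hypothetical names, and the sketch assumes a fresh sequence stores its `start` value in the watermark row with `is_called = 0`, so `start` is not passed separately. The wrap targets follow PostgreSQL's CYCLE behaviour (ascending wraps to the minimum, descending to the maximum).

```rust
/// Schema parameters of a sequence (hypothetical struct; names mirror the backing-table columns).
#[derive(Debug, Clone, Copy)]
pub struct SeqParams {
    pub inc: i64,
    pub min: i64,
    pub max: i64,
    pub cycle: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SeqError {
    /// The sequence ran past its bound and CYCLE is off.
    Exhausted,
}

/// Compute the value the next `nextval()` should return.
/// `current` is the watermark row's value; `is_called` says whether
/// that value has already been handed out.
pub fn compute_next(current: i64, is_called: bool, p: SeqParams) -> Result<i64, SeqError> {
    if !is_called {
        // Assumption: an uncalled watermark row holds the value to hand out first.
        return Ok(current);
    }
    // checked_add guards against i64 overflow near the extremes.
    let candidate = current.checked_add(p.inc);
    let in_range = |v: i64| v >= p.min && v <= p.max;
    match candidate {
        Some(v) if in_range(v) => Ok(v),
        // Out of range (or overflowed): wrap only when CYCLE is set.
        _ if p.cycle => Ok(if p.inc > 0 { p.min } else { p.max }),
        _ => Err(SeqError::Exhausted),
    }
}
```

With `inc = 1`, `min = 1`, `max = 3`, a called watermark of 3 yields `Exhausted` under NO CYCLE and wraps to 1 under CYCLE.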
Why an inner tx under MVCC: a nextval that aborts because its outer tx
rolls back must still leave its watermark advance durable on disk —
otherwise a parallel reader could observe the same value twice (the
first reader sees the in-memory state, the rolled-back tx releases its
disk write, the next reader recomputes and lands on the same value). The
autonomous inner tx commits the backing-table write independent of the
outer tx, so a rolled-back nextval still "consumed" the value. This is
the standard PostgreSQL semantics (rollback does not reclaim nextval'd
values; gaps are expected).
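
The gap behaviour can be pictured with a toy model. Everything here (`Db`, `OuterTx`, and their methods) is hypothetical and only mimics the visible semantics: the watermark advance commits on its own, while the outer transaction's row writes are discarded on rollback.

```rust
/// Toy model: `durable` stands in for the backing-table watermark on disk.
pub struct Db {
    durable: i64,
    rows: Vec<i64>,
}

/// An outer transaction that buffers its own row writes until commit.
pub struct OuterTx {
    pending: Vec<i64>,
}

impl Db {
    pub fn new() -> Self {
        Db { durable: 0, rows: Vec::new() }
    }

    pub fn begin(&self) -> OuterTx {
        OuterTx { pending: Vec::new() }
    }

    /// The watermark advance is applied immediately (the "inner tx"),
    /// independent of what later happens to the outer transaction.
    pub fn nextval(&mut self) -> i64 {
        self.durable += 1;
        self.durable
    }

    /// Insert a row whose id comes from nextval, buffered in the outer tx.
    pub fn insert_with_nextval(&mut self, tx: &mut OuterTx) -> i64 {
        let id = self.nextval();
        tx.pending.push(id);
        id
    }

    pub fn commit(&mut self, tx: OuterTx) {
        self.rows.extend(tx.pending);
    }

    /// Rollback drops the buffered rows; the watermark is left untouched.
    pub fn rollback(&mut self, _tx: OuterTx) {}

    pub fn rows(&self) -> &[i64] {
        &self.rows
    }
}
```

In this model, a transaction that draws value 1 and rolls back leaves a gap: the next committed insert receives 2, and 1 is never reused.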
## setval
`setval(seq, value)` can place the watermark anywhere, which can't be
expressed as a conflict-free append: concurrent `setval(s, 5)` and
`setval(s, 3)` calls do not commute, because the final watermark depends
on which one runs last. It therefore requires an **exclusive write
transaction** and is rejected inside `BEGIN CONCURRENT`.
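
A small sketch makes the ordering problem concrete. Both functions are hypothetical models, not engine code: `apply_setvals` treats each `setval` as an overwrite, while `apply_advances` models an ascending sequence whose watermark is the maximum key ever inserted.

```rust
/// Apply a list of setval targets in order; the last one wins.
pub fn apply_setvals(initial: i64, ops: &[i64]) -> i64 {
    ops.iter().fold(initial, |_, &v| v)
}

/// Apply ascending nextval-style inserts: the watermark is the largest key,
/// so the outcome does not depend on the order of the inserts.
pub fn apply_advances(initial: i64, ops: &[i64]) -> i64 {
    ops.iter().fold(initial, |w, &v| w.max(v))
}
```

Swapping the order of two `setval` targets changes the result, whereas swapping two appended keys does not, which is why only the former needs exclusive access in this model.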
## currval
`currval(seq)` is per-connection. It returns the value most recently
emitted by `nextval` or `setval` on _this_ connection. It's a register
lookup populated by the `SetSequenceCurrval` opcode at the tail of every
nextval/setval; it does not touch disk.
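
The per-connection scoping can be sketched as follows. The `Connection` type and its methods are hypothetical, a `HashMap` stands in for the register that `SetSequenceCurrval` populates, and returning `None` before the first `nextval`/`setval` is an assumption about the unset case rather than documented behaviour.

```rust
use std::collections::HashMap;

/// Per-connection state: the last value each sequence produced here.
pub struct Connection {
    last_values: HashMap<String, i64>,
}

impl Connection {
    pub fn new() -> Self {
        Connection { last_values: HashMap::new() }
    }

    /// Called at the tail of nextval/setval (the role `SetSequenceCurrval` plays).
    pub fn record(&mut self, seq: &str, value: i64) {
        self.last_values.insert(seq.to_string(), value);
    }

    /// Pure in-memory lookup; `None` if this connection has not yet
    /// called nextval/setval on `seq` (assumed behaviour).
    pub fn currval(&self, seq: &str) -> Option<i64> {
        self.last_values.get(seq).copied()
    }
}
```

Two connections never see each other's `currval`: a value recorded on one connection is invisible to the other until that other connection calls `nextval` or `setval` itself.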
## Restart / recovery
On database open, sequences are initialized from disk:
- User sequences: read the single watermark row from each
`__turso_internal_seq_*` backing table.
- AUTOINCREMENT: read `sqlite_sequence` and seed the implicit sequence
to that value.
If a crash occurs after a nextval committed but before its compaction
loop deleted the prior row, the backing table may briefly contain
multiple rows. The compaction always re-runs on the next nextval, and
the open-time read uses `MAX(value)` for ascending or `MIN(value)` for
descending sequences, so multi-row state is recovered cleanly.
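
The recovery rule above can be sketched with an ordered set standing in for the backing B-tree. The function names are hypothetical, and the sketch assumes the sign of `increment` distinguishes ascending from descending sequences.

```rust
use std::collections::BTreeSet;

/// Pick the watermark from a backing table that may hold leftover rows:
/// the largest key for ascending sequences, the smallest for descending ones.
pub fn recover_watermark(rows: &BTreeSet<i64>, increment: i64) -> Option<i64> {
    if increment > 0 {
        rows.last().copied()
    } else {
        rows.first().copied()
    }
}

/// Re-run compaction: keep only the watermark row.
pub fn compact(rows: &mut BTreeSet<i64>, increment: i64) {
    if let Some(w) = recover_watermark(rows, increment) {
        rows.retain(|&v| v == w);
    }
}
```

If a crash leaves rows 41 and 42 behind, an ascending sequence resumes from 42 and a descending one from 41; the next compaction then removes the stale row.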
The MVCC checkpoint state machine also rewrites `sqlite_sequence` from
the in-schema sequence list during checkpoint, ensuring the on-disk
autoinc state is clean after every checkpoint cycle.
## Testing
- Integration tests covering all sequence operations, concurrent nextval
from multiple connections, AUTOINCREMENT interaction, ATTACH database
sequences, transaction rollback semantics, descending and CYCLE
sequences, and restart-recovery scenarios.
- Simulator (`limbo_sim`): sequence-aware query generation (CREATE/DROP
SEQUENCE, nextval, setval) and the `SequenceMonotonicity` property —
creates a fresh sequence in a reserved namespace, runs N nextvals,
asserts the engine returned `start + k*increment` (or correct CYCLE
wrap).
- Concurrent simulator (`whopper`): dedicated sequence workloads testing
MVCC concurrent nextval, setval exclusivity, in-flight setval that
survives worker death, autocommit and in-tx nextval interleavings, CYCLE
wrap-aware duplicate / wrong-direction tracking, and cross-checkpoint
persistence. New simulator coverage includes `InsertSeqDefault` (INSERT
into table with `DEFAULT (nextval('s'))`) and tracker handling for
sequences advanced through that opaque path.
- SQL tests (sqltest): `sequence.sqltest`, `mvcc_sequence.sqltest`, and
`attach/sequence.sqltest` covering the full feature surface.
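
For readers unfamiliar with the `SequenceMonotonicity` property mentioned in the simulator bullet, the non-CYCLE check amounts to something like the sketch below. `check_monotonic` is a hypothetical helper, not the simulator's code, and it ignores CYCLE wrap and overflow for brevity.

```rust
/// Check that the k-th observed nextval equals `start + k * increment`.
/// Returns a description of the first mismatch, if any.
pub fn check_monotonic(start: i64, increment: i64, observed: &[i64]) -> Result<(), String> {
    for (k, &got) in observed.iter().enumerate() {
        let expected = start + (k as i64) * increment;
        if got != expected {
            return Err(format!("nextval #{k}: expected {expected}, got {got}"));
        }
    }
    Ok(())
}
```

A sequence started at 10 with increment 5 should produce 10, 15, 20; a repeated or skipped value fails the check at the first offending position.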
Closes #5688
Closes #7137
66 files changed
Lines changed: 12828 additions & 758 deletions
File tree
- bindings/rust/tests
- core
- mvcc
- database
- persistent_storage
- translate
- emitter
- expr
- vdbe
- docs
- sql-reference/statements
- parser/src
- ast
- sync/engine/src
- testing
- concurrent-simulator
- simulator
- generation
- model
- profiles
- runner
- sqltests
- tests
- turso-tests
- attach
- tests/integration
- functions
- query_processing