-
Notifications
You must be signed in to change notification settings - Fork 6.1k
Description
MemBuffer stores staging mutations before a transaction is committed. It offers a rich interface to meet TiDB's requirements.
- A high-performance, single-threaded in-memory indexing data structure, supporting set/get operations and iterators.
- Cascading transactions to support statement or row-level rollback.
- Checkpointing functionality.
- Snapshot reads/scans to provide historical versions.
The current usage in TiDB exceeds the capabilities MemBuffer was designed for. This issue identifies improper usages and tracks improvements.
Unsafe Concurrent Opearations
MemBuffer is not thread-safe, meaning it may panic due to data races if TiDB attempts concurrent operations.
Even with mutexes to avoid data races, TiDB cannot guarantee correctness, as explained in this comment.
There is already an RWMutex in MemBuffer, used to prevent SnapshotGetter races with write operations. While it feels unusual to have an RWMutex in a single-threaded data structure, at least the write operations will not affect snapshot read results.
Improper Snapshot Usage
Consider a simple scenario: we have a B-tree, and how do we update it during iteration?
package main
import (
"fmt"
"github.com/tidwall/btree"
)
func main() {
// create a map
var tree btree.Map[int, int]
// init
for i := 0; i < 10; i++ {
tree.Set(i, i)
}
// iterate while updating
tree.Scan(func(key, value int) bool {
fmt.Printf("%d %d\n", key, value)
tree.Set(2*key, 2*key) // this can be considered as a UB
return true
})
}MemBuffer provides SnapshotIter and SnapshotIterReverse methods. However, during iteration over a snapshot, TiDB does not fully drain it at once and may write new data into MemBuffer midway. To ensure the correctness of the snapshot iterator, ART introduces a complex node retention mechanism (tikv/client-go#1503). This seems overly complex for an in-memory data structure. IMO, the iterator should fully drain the data in a single call, as MemBuffer is not designed to support MVCC.
Little changes.
type MemBuffer interface {
...
+ // deprecated
SnapshotIter([]byte, []byte) Iterator
+ // deprecated
SnapshotIterReverse([]byte, []byte) Iterator
+ SnapshotScan([]byte, []byte) [][]byte
+ SnapshotScanReverse([]byte, []byte) [][]byteMore changes.
type MemBuffer interface {
...
+ // deprecated
SnapshotIter([]byte, []byte) Iterator
+ // deprecated
SnapshotIterReverse([]byte, []byte) Iterator
+ // deprecated
SnapshotGetter() Getter
+ Snapshot() MemBufferSnapshot
...
}
// TiDB might read from a snapshot across multiple concurrent threads. Each operation within `MemBufferSnapshot` is guarded by an `RWMutex` to prevent data races with writes to the `MemBuffer`.
+type MemBufferSnapshot interface {
+ Scan([]byte, []byte) [][]byte
+ ScanReverse([]byte, []byte) [][]byte
+ Get(ctx context.Context, k []byte) ([]byte, error)
+}During each statement execution, TiDB should using the snapshot following this procedure:
- Check if the
MemBufferis dirty, if so, create aMemBufferSnapshotbySnapshotmethod. - When need to read snapshot data, always read from the
Snapshotfetched in step1, it's thread safe. - When the statement ends, should drop the snapshot and never use it anymore.