Skip to content

Proposal: Make MemBuffer Simple and Robust #58289

@you06

Description

@you06

MemBuffer stores staging mutations before a transaction is committed. It offers a rich interface to meet TiDB's requirements.

  • A high-performance, single-threaded in-memory indexing data structure, supporting set/get operations and iterators.
  • Cascading transactions to support statement or row-level rollback.
  • Checkpointing functionality.
  • Snapshot reads/scans to provide historical versions.

The current usage in TiDB exceeds the capabilities MemBuffer was designed for. This issue identifies improper usages and tracks improvements.

Unsafe Concurrent Opearations

MemBuffer is not thread-safe, meaning it may panic due to data races if TiDB attempts concurrent operations.

Even with mutexes to avoid data races, TiDB cannot guarantee correctness, as explained in this comment.

There is already an RWMutex in MemBuffer, used to prevent SnapshotGetter races with write operations. While it feels unusual to have an RWMutex in a single-threaded data structure, at least the write operations will not affect snapshot read results.

Improper Snapshot Usage

Consider a simple scenario: we have a B-tree, and how do we update it during iteration?

package main

import (
	"fmt"

	"github.com/tidwall/btree"
)

func main() {
	// create a map
	var tree btree.Map[int, int]

	// init
	for i := 0; i < 10; i++ {
		tree.Set(i, i)
	}

	// iterate while updating
	tree.Scan(func(key, value int) bool {
		fmt.Printf("%d %d\n", key, value)
		tree.Set(2*key, 2*key) // this can be considered as a UB
		return true
	})
}

MemBuffer provides SnapshotIter and SnapshotIterReverse methods. However, during iteration over a snapshot, TiDB does not fully drain it at once and may write new data into MemBuffer midway. To ensure the correctness of the snapshot iterator, ART introduces a complex node retention mechanism (tikv/client-go#1503). This seems overly complex for an in-memory data structure. IMO, the iterator should fully drain the data in a single call, as MemBuffer is not designed to support MVCC.

Little changes.
type MemBuffer interface {
	...
+ 	// deprecated
	SnapshotIter([]byte, []byte) Iterator
+ 	// deprecated
	SnapshotIterReverse([]byte, []byte) Iterator

+ 	SnapshotScan([]byte, []byte) [][]byte
+ 	SnapshotScanReverse([]byte, []byte) [][]byte
More changes.
type MemBuffer interface {
	...
+ 	// deprecated
	SnapshotIter([]byte, []byte) Iterator
+ 	// deprecated
	SnapshotIterReverse([]byte, []byte) Iterator
+ 	// deprecated
	SnapshotGetter() Getter

+ 	Snapshot() MemBufferSnapshot
	...
}

// TiDB might read from a snapshot across multiple concurrent threads. Each operation within `MemBufferSnapshot` is guarded by an `RWMutex` to prevent data races with writes to the `MemBuffer`.
+type MemBufferSnapshot interface {
+	Scan([]byte, []byte) [][]byte
+	ScanReverse([]byte, []byte) [][]byte
+	Get(ctx context.Context, k []byte) ([]byte, error)
+}

During each statement execution, TiDB should using the snapshot following this procedure:

  1. Check if the MemBuffer is dirty, if so, create a MemBufferSnapshot by Snapshot method.
  2. When need to read snapshot data, always read from the Snapshot fetched in step1, it's thread safe.
  3. When the statement ends, should drop the snapshot and never use it anymore.

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/enhancementThe issue or PR belongs to an enhancement.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions