Skip to content

types: replace Datum's x any with decPtr *MyDecimal (56B -> 48B)#68011

Open
mjonss wants to merge 22 commits into
pingcap:masterfrom
mjonss:datum-typed-decimal-pointer
Open

types: replace Datum's x any with decPtr *MyDecimal (56B -> 48B)#68011
mjonss wants to merge 22 commits into
pingcap:masterfrom
mjonss:datum-typed-decimal-pointer

Conversation

@mjonss

@mjonss mjonss commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: ref #67976

Problem Summary:

Final step of the types.Datum size-reduction series tracked in #67976.
After Option A (72→56 B), Option B (pack Time into d.i), and the
removal of the KindInterfaceDeprecated escape hatch (#68007), the
heterogeneous x any slot in Datum only ever carries a *MyDecimal.
Replace it with a typed pointer to drop the 16-byte interface header
and reach a 48-byte Datum.

What changed and how does it work?

Core change — Datum struct layout

before (56 B):  k(1) decimal(1) collationID(2) length(4) i(8) b(24) x any(16)
after  (48 B):  k(1) decimal(1) collationID(2) length(4) i(8) b(24) decPtr(8)
  • pkg/types/datum.go:
    • x anydecPtr *MyDecimal.
    • GetMysqlDecimal / SetMysqlDecimal use d.decPtr directly (no
      type assertion, no interface unbox).
    • SetNull / SetMinNotNull clear decPtr (was d.x = nil).
    • MarshalJSON's "unexpected non-nil x" guard now checks decPtr.
    • Struct comment block refreshed.
  • pkg/types/datum_test.go: TestMarshalDatum and TestComputePlus
    adapted to the new field name / shape (the former compared the
    interface-held value by type; now it compares the typed *MyDecimal
    directly for decimal datums and uses the GetMysqlTime() facade for
    time datums, matching the Option B approach).

Bug caught and fixed along the way

GetMysqlDuration now restores the int(int8(d.decimal)) sign-extension
when unpacking Fsp. d.decimal is uint8 (after Option A), but
callers encode Fsp = -1 (the "unspecified" sentinel, see
types.UnspecifiedFsp), which round-trips as 255 in the narrow field
and needs the int8 cast to come back as -1. Surfaced by
TestMemJsonObjectagg, which formats a Duration with Fsp = -1 — the
missing sign-extension produced s[0:255] on a 6-byte string inside
Duration.formatFrac and crashed. This was a latent defect introduced
by Option A's narrowing of decimal; fixing it here also makes Option
A + friends safer in isolation.

Measurements

Linux/amd64, Intel i7-13700, benchstat -count=3 comparing A+B+pre-C
against A+B+pre-C+C (i.e. the effect of this PR's single commit on
top of the stack).

Metric A+B+pre-C A+B+pre-C+C Delta
unsafe.Sizeof(types.Datum) 56 B 48 B −14%
BenchmarkDecodeWithSize B/op (codec, make([]Datum, 100)) 6144 4864 −20.8%
BenchmarkDecodeWithOutSize B/op (codec) 15.83 KiB 15.33 KiB −3.2%
Codec B/op geomean −12.4%

Full journey from master: BenchmarkDecodeWithSize goes 8192 → 6144
(A) → 4864 (A+B+pre-C+C) = −40.6% B/op.

Dependencies

This PR cannot land before its parents. The branch contains them as
parent commits so reviewers can see the full stack:

ea7722bc34 types: replace Datum's x any with decPtr *MyDecimal  ← this PR
311aef687c types, parser/test_driver: soften KindInterfaceDeprecated doc to avoid SA1019
a1e1beed22 *: remove KindInterface escape hatch from types.Datum     (#68007)
d2e99eec18 types: pack Time into Datum.i to kill SetMysqlTime alloc  (#67978)
5fc0037ab1 *: shrink types.Datum from 72 to 56 bytes                 (#67977)
8ab1f4d9be types, tablecodec, codec: add benchmark baseline (part of #67977)

When the parents merge, rebasing this PR onto master should reduce to
the single Option C commit (ea7722bc34).

Verification

Local runs clean on the touched surface:

  • pkg/types/..., pkg/types/parser_driver/...,
    pkg/util/codec/..., pkg/util/collate/...,
    pkg/util/ranger/..., pkg/tablecodec/...,
    pkg/lightning/backend/kv/..., pkg/executor/aggfuncs/...: all pass.
  • pkg/util/chunk/...: the failpoint-gated tests
    (TestPanicWhenSpillToDisk, TestPanicDuringSortedRowContainerSpill)
    pass via tools/check/failpoint-go-test.sh. Also pre-existing on
    master.
  • SELECT * FROM t TABLESAMPLE (33.3 + 44.4) parses through the TiDB
    parser driver without panic (same smoke test as *: remove KindInterface escape hatch from types.Datum #68007).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Reduce the in-memory footprint of types.Datum from 72 bytes (master) to 48 bytes via the A/B/pre-C/C series. This PR is the final step, replacing the internal `any` slot for decimals with a typed `*MyDecimal` pointer.

Summary by CodeRabbit

  • Bug Fixes

    • More consistent datum handling and unified collation lookup improving comparison correctness and memory estimates.
  • Tests

    • Large test-suite updates to reflect stricter type handling and removal of legacy interface behaviors.
  • Benchmarks

    • Benchmarks now report allocation metrics for clearer performance insights.
  • Breaking Changes

    • Some previously-accepted unsupported inputs now panic; legacy interface-based datum behavior is deprecated.

mjonss and others added 15 commits April 22, 2026 18:08
Turn on b.ReportAllocs() across Datum-related benchmarks so benchstat can
diff B/op and allocs/op against upcoming struct-size changes. Add three
new targeted benches in pkg/types:

- BenchmarkDatumCopy: tightest signal on Datum struct size (heterogeneous
  slice copied per iteration).
- BenchmarkDatumSetMysqlTime: locks in the current 1 alloc / 8 B per call
  caused by boxing Time through d.x; will drop to 0 allocs once Time packs
  into d.i.
- BenchmarkCompareDatumCollation: exercises the named-collator compare
  path, which the existing BenchmarkCompareDatum skips by using the
  binary collator.

Behavior-only change in benchmarks; no production code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 16-byte `collation string` field on types.Datum with a 2-byte
`collationID uint16` carrying the MySQL collation id, and shrink
`decimal uint16` (frac; MySQL DECIMAL frac <= 30, fsp <= 6) to `uint8`.
The freed padding absorbs the new collationID so the first struct word
stays 8 bytes. Net: Datum is now 56 bytes instead of 72 (-22%), every
`[]Datum` allocation is correspondingly smaller, and Datum.Equals now
compares collations with a uint16 compare instead of a string compare.

API compat:
- Collation() string and SetCollation(string) are preserved as
  facades that resolve via the charset package. External callers keep
  working unchanged.
- New CollationID() uint16 / SetCollationID(uint16) for hot paths.
- MarshalJSON/UnmarshalJSON keep the on-wire "collation" field as a
  name string, so older JSON blobs round-trip.
- collate.GetCollatorByID(0) now falls through silently (matching
  GetCollator("")), since id 0 is the uint16 counterpart of the
  legacy empty-string "no collation" sentinel.

Hot-path callers migrated from GetCollator(d.Collation()) to
GetCollatorByID(int(d.CollationID())) in:
codec.encodeString, ranger/detacher, planner/cardinality,
table/tables/mutation_checker, executor/foreign_key,
plus the internal Datum compare/key paths.

Datum.MemUsage drops the len(collation) term — the collation is now
inline (counted by EmptyDatumSize); TestRangeMemUsage is updated to
match. GetDatumMemSize in aggfuncs/func_group_concat follows suit.

Measured on linux/amd64, Intel i7-13700:
- unsafe.Sizeof(types.Datum) = 56 B (was 72 B)
- BenchmarkDecodeWithSize (codec):  8.0 KiB -> 6.0 KiB B/op (-25%),
  2435 -> 1862 ns/op (-24%)
- BenchmarkDecodeWithOutSize (codec): 20.88 KiB -> 15.83 KiB (-24%)
- BenchmarkCompareDatum (types): 34.06 -> 32.12 ns/op (-5.7%)
- BenchmarkEncodeValue (tablecodec): 135 -> 122 ns/op (-10%)
- No allocation regressions across the bench set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`types.Time` is `struct{CoreTime}` where `CoreTime` is a `uint64` —
exactly 8 bytes, the same size as `Datum.i`. Storing it through
`Datum.x any` boxes the value into an interface, and because Go stores
non-pointer interface payloads on the heap this allocates 8 bytes per
`SetMysqlTime` call. `SetMysqlDuration` already avoids the box by
packing into `d.i`; do the same for Time.

This also removes the `KindMysqlTime` branch from `Datum.Copy`: once
Time lives in `d.i`, the bitwise `*dst = *d` already produces an
independent value, so there is no need to re-set Time on the
destination (which itself was an allocation).

Benchstat before/after, linux/amd64, Intel i7-13700, -count=5:

  BenchmarkDatumSetMysqlTime   7.215 ns/op -> 0.102 ns/op  (-98.6%)
                               8 B/op      -> 0 B/op       (-100%)
                               1 allocs/op -> 0 allocs/op  (-100%)
  BenchmarkDatumCopy           46.14 ns/op -> 36.88 ns/op  (-20%)
                               64 B/op     -> 53 B/op      (-17%)
                               3 allocs/op -> 2 allocs/op  (-33%)

No struct-size change; Datum stays 72 bytes. The size reduction is a
separate change (pingcap#67977).

`TestMarshalDatum` previously asserted round-trip via `d.x.(Time)`;
updated to use `GetMysqlTime().Compare` since the round-trip value
no longer traverses `d.x`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The inl_join range-fallback thresholds (2900/2300/700) were calibrated
to Datum's old 72-byte size. After shrinking Datum to 56 bytes, ranges
fit comfortably under the old limits and no longer exercise the
progressive fallback. Lower the thresholds to 2300/1500/500 so the
same three fallback levels (drop trailing col, drop two cols, INL_JOIN
inapplicable) are triggered again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shrinking Datum.decimal from uint16 to uint8 in 2a3c81e dropped the
int8 sign-extending cast in GetMysqlDuration, so Fsp=-1 (UnspecifiedFsp)
came back as 255 and crashed Duration.formatFrac with a slice-bounds
panic (TestDateArithFuncs).

Restore the int8 round-trip so the -1 sentinel survives the uint storage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After shrinking Datum.collation from a byte string to a uint16 id, Equals
compares the canonical MySQL collation id instead of the raw name. That
makes "UTF8MB4_BIN" and "utf8mb4_bin" compare equal where they previously
compared unequal. It aligns with MySQL's case-insensitive collation names
but was not previously test-pinned, so a future refactor could silently
flip it back.

Add TestDatumEqualsCollation covering same-canonical/different-casing,
distinct-canonical, and the unknown-name-collapses-to-0 edge case. Also
call out the lossy name -> id storage in SetCollation's doc so future
readers understand the trade-off without hunting the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Packing `Time` into `Datum.i` turned `GetMysqlTime` from a
panic-on-mismatch `d.x.(Time)` into an unchecked reinterpret of
`d.i` as CoreTime bits. Callers in `appendDatumForChecksum` relied
on the panic (caught by defer-recover) to convert a kind mismatch
into an error; without the panic `TestColumnEncode/mismatch/*`
for timestamp/datetime/date/newdate regressed.

Add an `intest.Assert(d.k == KindMysqlTime)` to `GetMysqlTime` so
tests catch any silent misuse, and add an explicit kind check in
`appendDatumForChecksum` before the `GetMysqlTime` call so the
"invalid type for checksum" error path is restored for release
builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups to packing Time into Datum.i:

- Add a compile-time guard that unsafe.Sizeof(Time{}) == 8, so any
  future field added to Time fails the build rather than silently
  losing data through GetMysqlTime/SetMysqlTime.
- Drop the sizeOfMysqlTime charge from EstimatedMemUsage for
  KindMysqlTime: Time now lives in d.i, which is already counted in
  sizeOfEmptyDatum. The old addition double-counted 8 bytes per
  Time-valued Datum. Extend TestEstimatedMemUsage with a Time datum.
- In MarshalJSON, zero jd.I when marshaling KindMysqlTime so the
  packed coreTime isn't serialized twice (once via jd.I, once via
  jd.Time). UnmarshalJSON re-derives d.i from jd.Time, so round-trip
  is unchanged.
After shrinking types.Datum from 72 to 56 bytes, Ranges.MemUsage()
returns 16 fewer bytes per datum. The hardcoded thresholds in
TestPlanCacheForIndexRangeFallback (1330) and
TestPlanCacheForIndexJoinRangeFallback (1260) now sit above both the
short- and long-value scenarios, so the long case no longer triggers
fallback. Measured new usage in-process and lowered thresholds to
1120 and 1014 respectively so each test exercises the same
accept/fallback boundary as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Turn on b.ReportAllocs() across Datum-related benchmarks so benchstat can
diff B/op and allocs/op against upcoming struct-size changes. Add three
new targeted benches in pkg/types:

- BenchmarkDatumCopy: tightest signal on Datum struct size (heterogeneous
  slice copied per iteration).
- BenchmarkDatumSetMysqlTime: locks in the current 1 alloc / 8 B per call
  caused by boxing Time through d.x; will drop to 0 allocs once Time packs
  into d.i.
- BenchmarkCompareDatumCollation: exercises the named-collator compare
  path, which the existing BenchmarkCompareDatum skips by using the
  binary collator.

Behavior-only change in benchmarks; no production code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 16-byte `collation string` field on types.Datum with a 2-byte
`collationID uint16` carrying the MySQL collation id, and shrink
`decimal uint16` (frac; MySQL DECIMAL frac <= 30, fsp <= 6) to `uint8`.
The freed padding absorbs the new collationID so the first struct word
stays 8 bytes. Net: Datum is now 56 bytes instead of 72 (-22%), every
`[]Datum` allocation is correspondingly smaller, and Datum.Equals now
compares collations with a uint16 compare instead of a string compare.

API compat:
- Collation() string and SetCollation(string) are preserved as
  facades that resolve via the charset package. External callers keep
  working unchanged.
- New CollationID() uint16 / SetCollationID(uint16) for hot paths.
- MarshalJSON/UnmarshalJSON keep the on-wire "collation" field as a
  name string, so older JSON blobs round-trip.
- collate.GetCollatorByID(0) now falls through silently (matching
  GetCollator("")), since id 0 is the uint16 counterpart of the
  legacy empty-string "no collation" sentinel.

Hot-path callers migrated from GetCollator(d.Collation()) to
GetCollatorByID(int(d.CollationID())) in:
codec.encodeString, ranger/detacher, planner/cardinality,
table/tables/mutation_checker, executor/foreign_key,
plus the internal Datum compare/key paths.

Datum.MemUsage drops the len(collation) term — the collation is now
inline (counted by EmptyDatumSize); TestRangeMemUsage is updated to
match. GetDatumMemSize in aggfuncs/func_group_concat follows suit.

Measured on linux/amd64, Intel i7-13700:
- unsafe.Sizeof(types.Datum) = 56 B (was 72 B)
- BenchmarkDecodeWithSize (codec):  8.0 KiB -> 6.0 KiB B/op (-25%),
  2435 -> 1862 ns/op (-24%)
- BenchmarkDecodeWithOutSize (codec): 20.88 KiB -> 15.83 KiB (-24%)
- BenchmarkCompareDatum (types): 34.06 -> 32.12 ns/op (-5.7%)
- BenchmarkEncodeValue (tablecodec): 135 -> 122 ns/op (-10%)
- No allocation regressions across the bench set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`types.Time` is `struct{CoreTime}` where `CoreTime` is a `uint64` —
exactly 8 bytes, the same size as `Datum.i`. Storing it through
`Datum.x any` boxes the value into an interface, and because Go stores
non-pointer interface payloads on the heap this allocates 8 bytes per
`SetMysqlTime` call. `SetMysqlDuration` already avoids the box by
packing into `d.i`; do the same for Time.

This also removes the `KindMysqlTime` branch from `Datum.Copy`: once
Time lives in `d.i`, the bitwise `*dst = *d` already produces an
independent value, so there is no need to re-set Time on the
destination (which itself was an allocation).

Benchstat before/after, linux/amd64, Intel i7-13700, -count=5:

  BenchmarkDatumSetMysqlTime   7.215 ns/op -> 0.102 ns/op  (-98.6%)
                               8 B/op      -> 0 B/op       (-100%)
                               1 allocs/op -> 0 allocs/op  (-100%)
  BenchmarkDatumCopy           46.14 ns/op -> 36.88 ns/op  (-20%)
                               64 B/op     -> 53 B/op      (-17%)
                               3 allocs/op -> 2 allocs/op  (-33%)

No struct-size change; Datum stays 72 bytes. The size reduction is a
separate change (pingcap#67977).

`TestMarshalDatum` previously asserted round-trip via `d.x.(Time)`;
updated to use `GetMysqlTime().Compare` since the round-trip value
no longer traverses `d.x`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Datum's KindInterface (byte value 14) was an escape-hatch kind used to
carry arbitrary Go values that SetValue's type switch did not recognize,
stored via SetInterface in d.x (any). Production code had exactly one
caller of this path: the TABLESAMPLE grammar rule in parser.y, which
wrapped an already-ExprNode value via ast.NewValueExpr. That wrapping
was redundant (ast.TableSample.Expr is ExprNode; the Expression rule's
$4 already is one) and harmful (it forced the escape-hatch store). Fix
the grammar to pass the ExprNode through directly and retire the
escape hatch everywhere else.

Changes:

- pkg/parser/parser.y: TABLESAMPLE's Expression / RepeatableOpt values
  are passed through as-is instead of wrapped via ast.NewValueExpr.
  Regenerate parser.go via `make parser`.
- pkg/types/datum.go: rename KindInterface -> KindInterfaceDeprecated
  (value 14 preserved with a Deprecated: comment); delete SetInterface
  / GetInterface methods; SetValue / SetValueWithDefaultCollation
  defaults now panic; GetValue's default panics (with explicit nil
  returns for KindNull / KindMinNotNull / KindMaxValue / KindRaw which
  previously fell through the old default); Datum.String()'s default
  now prints "Unknown type: N" instead of "Unknown".
- pkg/types/etc.go: rename map entry; KindStr returns
  "Unknown kind N" on map miss instead of "".
- pkg/types/parser_driver/value_expr.go,
  pkg/lightning/backend/kv/base.go,
  pkg/parser/test_driver/test_driver.go,
  pkg/parser/test_driver/test_driver_datum.go:
  mirror the rename / method removal / panic defaults.
- pkg/executor/aggfuncs/func_group_concat.go: drop the
  getValMemDelta(d.GetInterface()) term in GetDatumMemSize; the
  interface payload no longer contributes a variable-length term.

Test updates:

- pkg/expression/builtin_{string,op,control,compare,time}_test.go:
  retire `errors.New(...)` rows used as "weird input to trigger an
  error path"; that path is now unreachable via the public API. Drop
  unused errors imports that result.
- pkg/expression/builtin_string_vec_test.go: pass int64(ast.TrimX)
  instead of ast.TrimX to NewDatum (ast.TrimDirectionType is a named
  int type not in SetValue's switch).
- pkg/expression/builtin_other_test.go: types.MakeDatums(tc.args) ->
  ...(tc.args...) (missing spread).
- pkg/expression/builtin_string_test.go: TestBin "" row retired (it
  was masked by escape-hatch NewDatum(Datum) wrapping); TestFormat
  error-Datum wrapping replaced with direct err-message checks.
- pkg/expression/evaluator_test.go: drop the KindInterface case from
  the test helper's kind->FieldType mapping.
- pkg/lightning/backend/kv/sql2kv_test.go: retire the sub-test that
  constructed an invalid Datum via SetInterface.
- pkg/types/{datum,convert}_test.go: switch error-path expectations to
  require.Panics where previously they relied on the escape hatch to
  surface downstream errors.
- pkg/executor/aggfuncs/aggfunc_test.go: add resultToDatum helper so
  tests can pass either a primitive or an already-built types.Datum
  to buildAggTester* (the latter previously round-tripped through the
  escape hatch via NewDatum(someDatum)).

Behavior:

- Verified empirically that `TABLESAMPLE (33.3 + 44.4)` parses
  correctly through TiDB's parser_driver with no panic.
- unsafe.Sizeof(types.Datum) unchanged at 72 B; this PR is purely
  about removing the escape hatch. Option C (replace `x any` with a
  typed `*MyDecimal`) is unblocked by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d SA1019

`// Deprecated:` is a magic marker that `staticcheck/SA1019` treats
as an error on every reference. But a few legitimate sites still need
to enumerate `KindInterfaceDeprecated` (name tables in pkg/types/etc.go
and pkg/lightning/backend/kv/base.go; the "not implemented" restore
groups in pkg/types/parser_driver/value_expr.go and
pkg/parser/test_driver/test_driver.go). Those are not bugs -- they are
the handful of places where exhaustiveness requires naming the kind.

Rewrite the comment without the `Deprecated:` marker while keeping the
"do not use in new code" guidance. Linter passes cleanly now; the
message to readers is unchanged.

Also drop two incidental trailing blank lines the formatter flagged
in pkg/expression/builtin_op_test.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With KindInterfaceDeprecated removed (prev commit) and Time packed into
d.i (earlier commit), the heterogeneous `x any` slot in Datum now only
ever carries a *MyDecimal. Replace it with a typed `decPtr *MyDecimal`
pointer. That drops the interface header (16B -> 8B), brings
unsafe.Sizeof(types.Datum) from 56B to 48B, and makes the decimal path
one less indirection.

Updates:

- Datum struct: `x any` -> `decPtr *MyDecimal`; comment block updated.
- SetNull / SetMinNotNull clear decPtr (was d.x = nil).
- GetMysqlDecimal / SetMysqlDecimal use decPtr directly (no type assertion).
- GetMysqlDuration restores `int(int8(d.decimal))` sign extension when
  unpacking Fsp. d.decimal is uint8, but callers encode Fsp = -1 as the
  "unspecified" sentinel (see types.UnspecifiedFsp); that round-trips as
  255 in the narrow field and needs the int8 cast to come back as -1.
  This was already required once Option A narrowed `decimal` to uint8;
  it surfaced here because TestMemJsonObjectagg formats a Duration with
  Fsp = -1 and the old `int(d.decimal)` cast produced a 255-length slice
  index inside Duration.formatFrac, crashing the test.
- MarshalJSON's "unexpected x" guard now checks decPtr.
- TestMarshalDatum and TestComputePlus debug output adapted to the new
  field name / shape.

Measurements (linux/amd64, Intel i7-13700):

- unsafe.Sizeof(types.Datum): 56 -> 48 bytes (-14% on top of A's 56B).
- BenchmarkDecodeWithSize (codec, make([]Datum, 100)): 6144 -> 4864 B/op,
  -20.8%. Total journey from master: 8192 -> 4864 = -40.6% B/op.
- BenchmarkDecodeWithOutSize (codec): 15.83 KiB -> 15.33 KiB, -3.2%.

Requires Option A (pingcap#67977), Option B (pingcap#67978), and KindInterface
removal (pingcap#68007) to have landed for the layout invariant to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ti-chi-bot

ti-chi-bot Bot commented Apr 23, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 23, 2026
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Refactors Datum to remove interface-backed storage (deprecating KindInterface), add inline fields and numeric CollationID, migrate callers to use CollationID/GetCollatorByID, update parser/test-driver and many tests, simplify some memory estimates, and add benchmark allocation reporting.

Changes

Cohort / File(s) Summary
Core Datum & types
pkg/types/datum.go, pkg/types/datum_test.go, pkg/types/etc.go, pkg/types/field_type.go
Remove interface-backed datum storage (KindInterface → KindInterfaceDeprecated); add/switch to numeric collationID with CollationID()/SetCollation(...); adjust JSON, equality, mem accounting, and tests.
Collation migration
pkg/util/collate/collate.go, pkg/util/codec/codec.go, pkg/util/ranger/detacher.go, pkg/util/ranger/types_test.go, pkg/executor/foreign_key.go, pkg/planner/cardinality/row_count_column.go, pkg/table/tables/mutation_checker.go, pkg/executor/aggfuncs/func_group_concat.go
Switch collator lookups to use Datum.CollationID() / GetCollatorByID; update memory-size estimation to rely on GetBytes() payload only; suppress warning when collation id == 0.
Parser & test-driver
pkg/parser/parser.go, pkg/parser/parser.y, pkg/parser/test_driver/test_driver.go, pkg/parser/test_driver/test_driver_datum.go, pkg/types/parser_driver/value_expr.go
Stop wrapping parsed expressions with ast.NewValueExpr; rename KindInterface uses to KindInterfaceDeprecated; remove GetInterface/SetInterface; make unsupported Go-value paths panic or be treated as deprecated.
Expression & builtin tests
pkg/expression/builtin_compare_test.go, pkg/expression/builtin_control_test.go, pkg/expression/builtin_op_test.go, pkg/expression/builtin_other_test.go, pkg/expression/builtin_string_test.go, pkg/expression/builtin_string_vec_test.go, pkg/expression/builtin_time_test.go, pkg/expression/evaluator_test.go
Remove/update test vectors that relied on the old interface escape hatch or unsupported Go inputs; replace some error expectations with panics; adjust datum construction in tests.
Executor & agg tests
pkg/executor/aggfuncs/aggfunc_test.go, pkg/executor/aggfuncs/func_group_concat.go
Add resultToDatum helper in tests to avoid re-wrapping types.Datum; simplify GetDatumMemSize to struct size + GetBytes() payload.
KV, rowcodec, encoding, benchmarks
pkg/lightning/backend/kv/base.go, pkg/lightning/backend/kv/sql2kv_test.go, pkg/util/rowcodec/common.go, pkg/util/codec/codec.go, pkg/tablecodec/bench_test.go, pkg/tablecodec/tablecodec_test.go, pkg/util/codec/bench_test.go
Adjust kind string label for deprecated interface; remove interface-based marshal test behavior; validate datum kind before GetMysqlTime(); use CollationID for encoding collator; add b.ReportAllocs() to benchmarks.
Planner & integration tests
pkg/planner/core/integration_test.go, tests/integrationtest/*/planner/core/casetest/index/*, tests/integrationtest/r/planner/core/integration.result, tests/integrationtest/t/planner/core/integration.test
Update @@tidb_opt_range_max_size values and corresponding expected Warning 1105 messages to reflect new memory-capacity numbers in test expectations.
Misc tests & converters
pkg/types/convert_test.go, pkg/executor/foreign_key.go, pkg/lightning/..., pkg/parser/..., pkg/expression/...
Various test updates to expect panics for unsupported Go types, minor cleanups, comments, and removal of deprecated error-vector cases.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • wjhuang2016
  • AilinKid
  • hawkingrei

Poem

🐰 I hopped through datums, tidy and neat,
IDs tucked in, inline fields snug and complete.
Collators found by number, no wrappers to show,
Tests trimmed their edges as the old tunnels close.
Benchmarks report their breaths — tiny hops that glow.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.61% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: replacing Datum's heterogeneous x any field with a typed decPtr *MyDecimal pointer, and the resulting size reduction from 56 to 48 bytes.
Description check ✅ Passed The description comprehensively covers the problem being solved, detailed change explanations, performance measurements, dependencies, verification, and a complete checklist with test confirmation and release notes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/planner SIG: Planner labels Apr 23, 2026
@tiprow

tiprow Bot commented Apr 23, 2026

Copy link
Copy Markdown

Hi @mjonss. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mjonss mjonss changed the title *: Datum typed decimal pointer types: replace Datum's x any with decPtr *MyDecimal (56B -> 48B) Apr 23, 2026
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 23, 2026
@mjonss mjonss marked this pull request as ready for review April 24, 2026 09:16
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2026
@codecov

codecov Bot commented Apr 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 53.03618% with 727 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.5454%. Comparing base (5581189) to head (f6e5297).
⚠️ Report is 50 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68011        +/-   ##
================================================
- Coverage   77.7952%   77.5454%   -0.2498%     
================================================
  Files          1984       1986         +2     
  Lines        549728     559638      +9910     
================================================
+ Hits         427662     433974      +6312     
- Misses       121146     124477      +3331     
- Partials        920       1187       +267     
Flag Coverage Δ
integration 43.7294% <53.0058%> (+3.9322%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9988% <ø> (-13.0317%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/types/datum.go (1)

657-704: ⚠️ Potential issue | 🟠 Major

Restore explicit Datum passthrough in SetValue*.

Both switches now panic on a Datum input, even though this file still exposes helpers like NewDatum(existingDatum)/MakeDatums(...) that legitimately forward prebuilt datums through these paths. That turns a copy-through operation into a runtime panic.

Suggested fix
 func (d *Datum) SetValueWithDefaultCollation(val any) {
 	switch x := val.(type) {
+	case Datum:
+		*d = x
+		return
 	case nil:
 		d.SetNull()
 	case bool:
@@
 	default:
 		panic(fmt.Sprintf("types.Datum.SetValue: unsupported Go type %T", x))
 	}
 }

 func (d *Datum) SetValue(val any, tp *types.FieldType) {
 	switch x := val.(type) {
+	case Datum:
+		*d = x
+		return
 	case nil:
 		d.SetNull()
 	case bool:
@@
 	default:
 		panic(fmt.Sprintf("types.Datum.SetValue: unsupported Go type %T", x))
 	}
 }

Also applies to: 707-755

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum.go` around lines 657 - 704, The SetValueWithDefaultCollation
switch (and the similar SetValue* switches around the 707-755 range) currently
panics when given a Datum, breaking legitimate passthroughs from helpers like
NewDatum/MakeDatums; add explicit cases to accept both Datum and *Datum and copy
the incoming Datum into the receiver (e.g., call the existing Datum-copying
helper or implement a SetDatum/assign logic) so prebuilt datums are forwarded
instead of causing a panic—look for the switch in SetValueWithDefaultCollation
and the sibling SetValue* functions and add cases for Datum and *Datum that
perform a proper copy into d.
🧹 Nitpick comments (1)
pkg/lightning/backend/kv/sql2kv_test.go (1)

61-66: Drop the no-op _ = minNotNull.

minNotNull is already used on line 52 in the RowArrayMarshaller{...} literal, so the _ = minNotNull assignment is dead code left over from when the retired sub-test owned the reference. The explanatory comment can stay on its own.

♻️ Proposed cleanup
 	// The previous sub-test used types.Datum.SetInterface(1) to construct
 	// a KindInterfaceDeprecated Datum and verify the marshaller surfaces
 	// a ToString failure. With SetInterface removed there is no public
 	// API path that produces such a Datum, so the scenario is unreachable.
-	_ = minNotNull
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/lightning/backend/kv/sql2kv_test.go` around lines 61 - 66, Remove the
dead no-op assignment `_ = minNotNull` in the test; `minNotNull` is already
referenced in the RowArrayMarshaller literal (RowArrayMarshaller{...}) so the
standalone assignment can be deleted and the explanatory comment may remain
as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/executor/aggfuncs/func_group_concat.go`:
- Around line 644-648: GetDatumMemSize undercounts memory for decimals because
unsafe.Sizeof(*d) only sizes the pointer field and not the heap allocation
behind decPtr; update GetDatumMemSize to detect Datum kinds equal to
KindMysqlDecimal (via types.Datum.Kind or equivalent) and add the heap payload
size for the decimal (accounting for the decPtr-backed structure/slice memory —
e.g., size of the decimal struct plus any digit/slice backing length) to the
returned int64 so decimal allocations are included when computing spill/oom
decisions for GetDatumMemSize.

In `@pkg/expression/builtin_string_test.go`:
- Around line 2203-2216: The current tests for evalBuiltinFunc (called via
fc.getFunction and evalBuiltinFunc using formatTests2 and formatTests3 inputs)
use "if err != nil { require.Contains(...) }" which hides regressions; change
those to assert the expected successful outcome instead—use require.NoError(t,
err) (or require.Nil(t, err)) immediately after calling evalBuiltinFunc for f2
and f3, then keep the testutil.DatumEqual assertions for r2 and r3; this ensures
failures surface instead of being silently accepted as "not implemented"/"not
support for the specific locale".

In `@pkg/lightning/backend/kv/base.go`:
- Around line 61-79: The code indexes kindStr[kind] unsafely which can panic for
newer kinds (e.g., types.KindVectorFloat32); update the lookup that logs kinds
to be bounds-safe by checking the Kind value before indexing (e.g., if int(kind)
>= 0 && int(kind) < len(kindStr) use kindStr[kind] else use a safe fallback like
fmt.Sprintf("kind(%d)", kind) or a map-based lookup), and apply the same pattern
wherever kindStr is accessed (reference the kindStr symbol and the logging site
that currently does kindStr[kind]).

In `@pkg/parser/parser.y`:
- Around line 9855-9859: The existing comment references Datum's removed x any
slot; update it to explain that wrapping already-produced ast.ExprNode values
(from Expression ($4) and RepeatableOpt ($7)) with ast.NewValueExpr will
undesirably box those nodes as a literal value rather than using them as
expression nodes. Replace the old storage-detail explanation with a concise note
that ast.NewValueExpr converts an ast.ExprNode into a literal/value wrapper
(losing its role as an expression subtree), so the parser should use the
ExprNode values directly instead of calling ast.NewValueExpr.

In `@pkg/parser/test_driver/test_driver_datum.go`:
- Around line 146-148: The change removes GetInterface/SetInterface and makes
SetValue panic on the default branch and also omits MySQL-specific kinds
(Duration, Enum, Set, Time, JSON) from GetValue/SetValue; restore source
compatibility by reintroducing GetInterface and SetInterface as simple wrappers
around GetValue/SetValue (or by making SetValue fall back to the previous silent
stash behavior using the KindInterfaceDeprecated placeholder), and extend
GetValue and SetValue in test_driver (the functions named GetValue and SetValue
in pkg/parser/test_driver_datum.go) to handle Duration, Enum, Set, Time and JSON
the same way as production types.Datum does; finally add a parser module release
note/changelog entry documenting the breaking change if you intend to keep the
removal.

In `@pkg/types/datum.go`:
- Around line 609-612: The branch that treats KindRaw like null is incorrect:
update the switch in GetValue (or the method returning the inline value) so that
KindRaw is not grouped with KindNull/KindMinNotNull/KindMaxValue; instead return
the raw payload d.b when kind == KindRaw. Locate the switch that currently lists
"case KindNull, KindMinNotNull, KindMaxValue, KindRaw" and remove KindRaw from
that case, adding a separate branch (or fallthrough) that returns d.b for
KindRaw so Datum.String() and callers see the raw bytes.
- Around line 347-357: The bug: decPtr remains non-nil after switching a Datum
to most non-decimal kinds (only SetNull and SetMinNotNull clear it), causing
MarshalJSON to misclassify non-decimal datums as unsupported; fix by clearing
d.decPtr whenever you change the kind away from KindMysqlDecimal. Either add
d.decPtr = nil to every non-decimal setter (e.g., SetString, SetInt, SetTime,
SetBytes, SetJSON, SetMysqlBit, SetMysqlEnum/SetMysqlSet, etc.) or centralize
the logic in the single place that sets the kind (the common kind-setter
function used throughout the file) so it sets decPtr = nil unless k ==
KindMysqlDecimal; apply the same change for the other occurrence mentioned
(around lines 2294-2316).

---

Outside diff comments:
In `@pkg/types/datum.go`:
- Around line 657-704: The SetValueWithDefaultCollation switch (and the similar
SetValue* switches around the 707-755 range) currently panics when given a
Datum, breaking legitimate passthroughs from helpers like NewDatum/MakeDatums;
add explicit cases to accept both Datum and *Datum and copy the incoming Datum
into the receiver (e.g., call the existing Datum-copying helper or implement a
SetDatum/assign logic) so prebuilt datums are forwarded instead of causing a
panic—look for the switch in SetValueWithDefaultCollation and the sibling
SetValue* functions and add cases for Datum and *Datum that perform a proper
copy into d.

---

Nitpick comments:
In `@pkg/lightning/backend/kv/sql2kv_test.go`:
- Around line 61-66: Remove the dead no-op assignment `_ = minNotNull` in the
test; `minNotNull` is already referenced in the RowArrayMarshaller literal
(RowArrayMarshaller{...}) so the standalone assignment can be deleted and the
explanatory comment may remain as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fbd0c7ce-d83e-4dfb-a949-0454248856ae

📥 Commits

Reviewing files that changed from the base of the PR and between b628c22 and ea7722b.

📒 Files selected for processing (32)
  • pkg/executor/aggfuncs/aggfunc_test.go
  • pkg/executor/aggfuncs/func_group_concat.go
  • pkg/executor/foreign_key.go
  • pkg/expression/builtin_compare_test.go
  • pkg/expression/builtin_control_test.go
  • pkg/expression/builtin_op_test.go
  • pkg/expression/builtin_other_test.go
  • pkg/expression/builtin_string_test.go
  • pkg/expression/builtin_string_vec_test.go
  • pkg/expression/builtin_time_test.go
  • pkg/expression/evaluator_test.go
  • pkg/lightning/backend/kv/base.go
  • pkg/lightning/backend/kv/sql2kv_test.go
  • pkg/parser/parser.go
  • pkg/parser/parser.y
  • pkg/parser/test_driver/test_driver.go
  • pkg/parser/test_driver/test_driver_datum.go
  • pkg/planner/cardinality/row_count_column.go
  • pkg/table/tables/mutation_checker.go
  • pkg/tablecodec/bench_test.go
  • pkg/tablecodec/tablecodec_test.go
  • pkg/types/convert_test.go
  • pkg/types/datum.go
  • pkg/types/datum_test.go
  • pkg/types/etc.go
  • pkg/types/field_type.go
  • pkg/types/parser_driver/value_expr.go
  • pkg/util/codec/bench_test.go
  • pkg/util/codec/codec.go
  • pkg/util/collate/collate.go
  • pkg/util/ranger/detacher.go
  • pkg/util/ranger/types_test.go
💤 Files with no reviewable changes (2)
  • pkg/expression/evaluator_test.go
  • pkg/expression/builtin_op_test.go

Comment on lines 644 to 648
func GetDatumMemSize(d *types.Datum) int64 {
var datumMemSize int64
datumMemSize += int64(unsafe.Sizeof(*d))
datumMemSize += int64(len(d.Collation()))
datumMemSize += int64(len(d.GetBytes()))
datumMemSize += getValMemDelta(d.GetInterface()) - DefInterfaceSize
return datumMemSize

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include decimal payload in datum memory accounting.

At Line 646, unsafe.Sizeof(*d) only counts the pointer field, not the heap allocation behind decPtr. For KindMysqlDecimal, this under-reports memory and can distort spill/oom decisions in top-N group concat paths.

💡 Suggested fix
 func GetDatumMemSize(d *types.Datum) int64 {
-	var datumMemSize int64
-	datumMemSize += int64(unsafe.Sizeof(*d))
-	datumMemSize += int64(len(d.GetBytes()))
-	return datumMemSize
+	datumMemSize := int64(unsafe.Sizeof(*d))
+	datumMemSize += int64(len(d.GetBytes()))
+	if d.Kind() == types.KindMysqlDecimal && d.GetMysqlDecimal() != nil {
+		datumMemSize += int64(types.MyDecimalStructSize)
+	}
+	return datumMemSize
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func GetDatumMemSize(d *types.Datum) int64 {
var datumMemSize int64
datumMemSize += int64(unsafe.Sizeof(*d))
datumMemSize += int64(len(d.Collation()))
datumMemSize += int64(len(d.GetBytes()))
datumMemSize += getValMemDelta(d.GetInterface()) - DefInterfaceSize
return datumMemSize
func GetDatumMemSize(d *types.Datum) int64 {
datumMemSize := int64(unsafe.Sizeof(*d))
datumMemSize += int64(len(d.GetBytes()))
if d.Kind() == types.KindMysqlDecimal && d.GetMysqlDecimal() != nil {
datumMemSize += int64(types.MyDecimalStructSize)
}
return datumMemSize
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/aggfuncs/func_group_concat.go` around lines 644 - 648,
GetDatumMemSize undercounts memory for decimals because unsafe.Sizeof(*d) only
sizes the pointer field and not the heap allocation behind decPtr; update
GetDatumMemSize to detect Datum kinds equal to KindMysqlDecimal (via
types.Datum.Kind or equivalent) and add the heap payload size for the decimal
(accounting for the decPtr-backed structure/slice memory — e.g., size of the
decimal struct plus any digit/slice backing length) to the returned int64 so
decimal allocations are included when computing spill/oom decisions for
GetDatumMemSize.

Comment on lines 2203 to 2216
r2, err := evalBuiltinFunc(f2, ctx, chunk.Row{})
testutil.DatumEqual(t, types.NewDatum(errors.New("not implemented")), types.NewDatum(err))
if err != nil {
require.Contains(t, err.Error(), "not implemented")
}
testutil.DatumEqual(t, types.NewDatum(formatTests2.ret), r2)

f3, err := fc.getFunction(ctx, datumsToConstants(types.MakeDatums(formatTests3.number, formatTests3.precision, formatTests3.locale)))
require.NoError(t, err)
require.NotNil(t, f3)
r3, err := evalBuiltinFunc(f3, ctx, chunk.Row{})
testutil.DatumEqual(t, types.NewDatum(errors.New("not support for the specific locale")), types.NewDatum(err))
if err != nil {
require.Contains(t, err.Error(), "not support for the specific locale")
}
testutil.DatumEqual(t, types.NewDatum(formatTests3.ret), r3)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

TestFormat error assertions are now unconditional on the happy path.

if err != nil { require.Contains(...) } permits both success and a "not implemented"/"not support for the specific locale" failure to pass. Given the newly added TestFormatWithLocale below shows zh_CN and de_GE now succeed via the en_US fallback (producing "-12,332" and "12,345.67"), the if err != nil branch here is effectively dead and will silently mask a regression that re-introduces an error. Prefer tightening to the actual expected outcome.

🛡️ Proposed fix
 	r2, err := evalBuiltinFunc(f2, ctx, chunk.Row{})
-	if err != nil {
-		require.Contains(t, err.Error(), "not implemented")
-	}
+	require.NoError(t, err)
 	testutil.DatumEqual(t, types.NewDatum(formatTests2.ret), r2)

 	f3, err := fc.getFunction(ctx, datumsToConstants(types.MakeDatums(formatTests3.number, formatTests3.precision, formatTests3.locale)))
 	require.NoError(t, err)
 	require.NotNil(t, f3)
 	r3, err := evalBuiltinFunc(f3, ctx, chunk.Row{})
-	if err != nil {
-		require.Contains(t, err.Error(), "not support for the specific locale")
-	}
+	require.NoError(t, err)
 	testutil.DatumEqual(t, types.NewDatum(formatTests3.ret), r3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/expression/builtin_string_test.go` around lines 2203 - 2216, The current
tests for evalBuiltinFunc (called via fc.getFunction and evalBuiltinFunc using
formatTests2 and formatTests3 inputs) use "if err != nil { require.Contains(...)
}" which hides regressions; change those to assert the expected successful
outcome instead—use require.NoError(t, err) (or require.Nil(t, err)) immediately
after calling evalBuiltinFunc for f2 and f3, then keep the testutil.DatumEqual
assertions for r2 and r3; this ensures failures surface instead of being
silently accepted as "not implemented"/"not support for the specific locale".

Comment on lines +61 to +79
types.KindNull: "null",
types.KindInt64: "int64",
types.KindUint64: "uint64",
types.KindFloat32: "float32",
types.KindFloat64: "float64",
types.KindString: "string",
types.KindBytes: "bytes",
types.KindBinaryLiteral: "binary",
types.KindMysqlDecimal: "decimal",
types.KindMysqlDuration: "duration",
types.KindMysqlEnum: "enum",
types.KindMysqlBit: "bit",
types.KindMysqlSet: "set",
types.KindMysqlTime: "time",
types.KindInterfaceDeprecated: "interface(deprecated)",
types.KindMinNotNull: "min",
types.KindMaxValue: "max",
types.KindRaw: "raw",
types.KindMysqlJSON: "json",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential panic in logging due to unsafe kind index lookup.

Line 111 indexes kindStr[kind] without bounds checks, but the table omits newer kinds (e.g. types.KindVectorFloat32). That can panic during error logging.

🔧 Suggested fix
 var kindStr = [...]string{
 	types.KindNull:                "null",
 	types.KindInt64:               "int64",
 	types.KindUint64:              "uint64",
 	types.KindFloat32:             "float32",
 	types.KindFloat64:             "float64",
 	types.KindString:              "string",
 	types.KindBytes:               "bytes",
 	types.KindBinaryLiteral:       "binary",
 	types.KindMysqlDecimal:        "decimal",
 	types.KindMysqlDuration:       "duration",
 	types.KindMysqlEnum:           "enum",
 	types.KindMysqlBit:            "bit",
 	types.KindMysqlSet:            "set",
 	types.KindMysqlTime:           "time",
 	types.KindInterfaceDeprecated: "interface(deprecated)",
 	types.KindMinNotNull:          "min",
 	types.KindMaxValue:            "max",
 	types.KindRaw:                 "raw",
 	types.KindMysqlJSON:           "json",
+	types.KindVectorFloat32:       "vector_float32",
 }
@@
 		if err := encoder.AppendObject(zapcore.ObjectMarshalerFunc(func(enc zapcore.ObjectEncoder) error {
-			enc.AddString("kind", kindStr[kind])
+			kindName := "unknown"
+			if int(kind) < len(kindStr) && kindStr[kind] != "" {
+				kindName = kindStr[kind]
+			}
+			enc.AddString("kind", kindName)
 			enc.AddString("val", redact.Value(str))
 			return nil
 		})); err != nil {

Also applies to: 111-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/lightning/backend/kv/base.go` around lines 61 - 79, The code indexes
kindStr[kind] unsafely which can panic for newer kinds (e.g.,
types.KindVectorFloat32); update the lookup that logs kinds to be bounds-safe by
checking the Kind value before indexing (e.g., if int(kind) >= 0 && int(kind) <
len(kindStr) use kindStr[kind] else use a safe fallback like
fmt.Sprintf("kind(%d)", kind) or a map-based lookup), and apply the same pattern
wherever kindStr is accessed (reference the kindStr symbol and the logging site
that currently does kindStr[kind]).

Comment thread pkg/parser/parser.y
Comment on lines +9855 to +9859
// Expression ($4) and RepeatableOpt ($7) are both declared as
// %type <expr> and already produce ast.ExprNode values. Wrapping
// them via ast.NewValueExpr would store the whole subtree inside a
// Datum's escape-hatch "x any" slot (the only production path that
// does so); use them directly instead.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rewrite the rationale comment to match the post-refactor state.

This comment still refers to Datum’s removed x any slot, so it becomes inaccurate once this PR lands. Please describe the problem as ast.NewValueExpr boxing an ast.ExprNode as a literal rather than referencing the old storage detail.

✏️ Suggested wording
-		// Expression ($4) and RepeatableOpt ($7) are both declared as
-		// %type <expr> and already produce ast.ExprNode values. Wrapping
-		// them via ast.NewValueExpr would store the whole subtree inside a
-		// Datum's escape-hatch "x any" slot (the only production path that
-		// does so); use them directly instead.
+		// Expression ($4) and RepeatableOpt ($7) are both declared as
+		// %type <expr> and already produce ast.ExprNode values. Wrapping
+		// them with ast.NewValueExpr would incorrectly box the expression
+		// subtree as a literal value, so keep the AST nodes directly.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Expression ($4) and RepeatableOpt ($7) are both declared as
// %type <expr> and already produce ast.ExprNode values. Wrapping
// them via ast.NewValueExpr would store the whole subtree inside a
// Datum's escape-hatch "x any" slot (the only production path that
// does so); use them directly instead.
// Expression ($4) and RepeatableOpt ($7) are both declared as
// %type <expr> and already produce ast.ExprNode values. Wrapping
// them with ast.NewValueExpr would incorrectly box the expression
// subtree as a literal value, so keep the AST nodes directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/parser/parser.y` around lines 9855 - 9859, The existing comment
references Datum's removed x any slot; update it to explain that wrapping
already-produced ast.ExprNode values (from Expression ($4) and RepeatableOpt
($7)) with ast.NewValueExpr will undesirably box those nodes as a literal value
rather than using them as expression nodes. Replace the old storage-detail
explanation with a concise note that ast.NewValueExpr converts an ast.ExprNode
into a literal/value wrapper (losing its role as an expression subtree), so the
parser should use the ExprNode values directly instead of calling
ast.NewValueExpr.

Comment on lines +146 to +148
// GetInterface and SetInterface were removed along with the
// KindInterfaceDeprecated (14) kind. See pkg/types/datum.go for the
// rationale; callers should use the typed Set/Get methods.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm no remaining internal callers of the removed accessors and
# that nothing outside the types package still writes KindInterfaceDeprecated.
rg -nP '\b(GetInterface|SetInterface)\s*\(' --type=go
rg -nP '\bKindInterfaceDeprecated\b' --type=go
# Also check that every kind in the test_driver constant block has a GetValue branch
rg -nP '\bKind(Mysql(Duration|Enum|Set|Time|JSON)|InterfaceDeprecated)\b' pkg/parser/test_driver/test_driver_datum.go

Repository: pingcap/tidb

Length of output: 3168


🏁 Script executed:

# Check the test_driver_datum.go file around the mentioned lines
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '140,160p'
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '200,245p'

Repository: pingcap/tidb

Length of output: 2005


🏁 Script executed:

# Search for GetInterface and SetInterface method definitions in test_driver
rg -A5 'func.*GetInterface|func.*SetInterface' pkg/parser/test_driver/ --type=go

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Check SetValue implementation to see default branch
ast-grep --pattern 'func ($_) SetValue($$$) {
  $$$
}'

Repository: pingcap/tidb

Length of output: 7953


🏁 Script executed:

# Look for CHANGELOG, release notes, or migration guides
fd -i 'changelog|release|news|migrate' -t f | head -20

Repository: pingcap/tidb

Length of output: 129


🏁 Script executed:

# Check if pkg/parser is a separate Go module or part of main module
cat go.mod | head -30
ls -la pkg/parser/go.mod 2>/dev/null || echo "No separate go.mod in pkg/parser"

# Look for any CHANGELOG or release docs in pkg/parser
find pkg/parser -name '*CHANGE*' -o -name '*RELEASE*' -o -name '*README*' 2>/dev/null | head -10

Repository: pingcap/tidb

Length of output: 1390


🏁 Script executed:

# Check GetValue implementation in test_driver to see coverage
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '160,200p'

Repository: pingcap/tidb

Length of output: 1322


🏁 Script executed:

# Search for external callers (outside pkg/parser) that might be affected
rg 'test_driver\.Datum' --type=go | grep -v 'pkg/parser' | head -20

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Check if there's any migration guide or deprecation notice
rg -i 'migration|deprecat|breaking|external' pkg/parser --type=go | head -15

Repository: pingcap/tidb

Length of output: 1584


🏁 Script executed:

# Check pkg/parser README to understand external exposure
cat pkg/parser/README.md

Repository: pingcap/tidb

Length of output: 3389


🏁 Script executed:

# Check pkg/parser/go.mod to confirm separate module
cat pkg/parser/go.mod

Repository: pingcap/tidb

Length of output: 1007


🏁 Script executed:

# Look for GetValue branches handling MySQL types mentioned in constants
rg -B2 -A2 'KindMysql(Duration|Enum|Set|Time|JSON)' pkg/parser/test_driver/test_driver_datum.go | grep -v '^--$'

Repository: pingcap/tidb

Length of output: 563


🏁 Script executed:

# Check if there are any version tags or release process for parser module
git tag -l 'parser*' 2>/dev/null | head -10 || echo "No parser-specific tags"

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Check types.Datum.GetValue implementation to compare with test_driver
cat -n pkg/types/datum.go | sed -n '517,580p'

Repository: pingcap/tidb

Length of output: 1984


🏁 Script executed:

# Search for any root-level CHANGELOG or release notes
find . -maxdepth 2 -type f \( -iname '*changelog*' -o -iname '*release*' -o -iname '*version*' \) | grep -v '.git' | head -20

Repository: pingcap/tidb

Length of output: 112


🏁 Script executed:

# Verify if test_driver.Datum is documented as canonical impl
rg -B3 -A3 'canonical.*Datum.*implementation|test.driver.*canonical' pkg/parser --type=go

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Get full types.Datum.GetValue implementation
cat -n pkg/types/datum.go | sed -n '578,650p'

Repository: pingcap/tidb

Length of output: 2802


🏁 Script executed:

# Check if there's any module release process or contributor notes mentioning external changes
find pkg/parser -name '*.md' | xargs cat 2>/dev/null | grep -i 'breaking\|external\|downstream\|release\|changelog'

Repository: pingcap/tidb

Length of output: 580


🏁 Script executed:

# Verify test_driver is indeed used as canonical by checking imports/usage
rg 'pkg/parser.*test_driver' pkg/types --type=go | head -10

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Check if test_driver methods for MySQL types exist
rg 'GetMysqlDuration|GetMysqlEnum|GetMysqlSet|GetMysqlTime|GetMysqlJSON' pkg/parser/test_driver --type=go

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Check if test_driver is explicitly intended as limited or reference implementation
cat -n pkg/parser/test_driver/test_driver.go | head -50

Repository: pingcap/tidb

Length of output: 1698


🏁 Script executed:

# Look for any docs about parser module stability or external contracts
find pkg/parser/docs -type f 2>/dev/null | xargs cat 2>/dev/null || echo "No docs dir found"

Repository: pingcap/tidb

Length of output: 5934


🏁 Script executed:

# Check documentation/comments about test_driver's intended scope
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '1,50p'

Repository: pingcap/tidb

Length of output: 1889


🏁 Script executed:

# Verify that production types uses parser/test_driver import anywhere
rg 'import.*test_driver|from.*test_driver' pkg/types --type=go | head -10

Repository: pingcap/tidb

Length of output: 38


Heads up: this is an externally-visible behavior change for parser consumers.

pkg/parser is exposed as a separate Go module. Removing GetInterface/SetInterface and making SetValue's default branch panic (instead of silently stashing the value under KindInterfaceDeprecated) will break source compatibility for any external caller that referenced those methods or relied on the silent fallback behavior.

However, note that while test_driver is provided for testing, the quickstart documentation recommends production users use pkg/types/parser_driver instead. Still, the removal impacts the test_driver module itself, which is publicly distributed.

Additional issue: GetValue and SetValue in test_driver lack support for MySQL-specific types (Duration, Enum, Set, Time, JSON) that exist in the type constant block—these would panic at runtime if encountered, unlike the production types.Datum which handles them. Verify this asymmetry is intentional.

Consider adding a release note / changelog entry in the parser module so downstream users are aware of these breaking changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/parser/test_driver/test_driver_datum.go` around lines 146 - 148, The
change removes GetInterface/SetInterface and makes SetValue panic on the default
branch and also omits MySQL-specific kinds (Duration, Enum, Set, Time, JSON)
from GetValue/SetValue; restore source compatibility by reintroducing
GetInterface and SetInterface as simple wrappers around GetValue/SetValue (or by
making SetValue fall back to the previous silent stash behavior using the
KindInterfaceDeprecated placeholder), and extend GetValue and SetValue in
test_driver (the functions named GetValue and SetValue in
pkg/parser/test_driver_datum.go) to handle Duration, Enum, Set, Time and JSON
the same way as production types.Datum does; finally add a parser module release
note/changelog entry documenting the breaking change if you intend to keep the
removal.

Comment thread pkg/types/datum.go
Comment on lines 347 to 357
// SetNull sets datum to nil.
func (d *Datum) SetNull() {
d.k = KindNull
d.x = nil
d.decPtr = nil
}

// SetMinNotNull sets datum to minNotNull value.
func (d *Datum) SetMinNotNull() {
d.k = KindMinNotNull
d.x = nil
d.decPtr = nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

decPtr is still live after most kind switches.

This refactor documents decPtr as “non-nil only when k == KindMysqlDecimal”, but only SetNull and SetMinNotNull actually clear it. Reusing a datum from decimal to string/int/time leaves the stale pointer behind, and the new MarshalJSON guard will then reject an otherwise valid non-decimal datum as unsupported type.

Please clear decPtr in every non-decimal setter, or centralize kind-reset logic so the invariant is enforced in one place.

Also applies to: 2294-2316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum.go` around lines 347 - 357, The bug: decPtr remains non-nil
after switching a Datum to most non-decimal kinds (only SetNull and
SetMinNotNull clear it), causing MarshalJSON to misclassify non-decimal datums
as unsupported; fix by clearing d.decPtr whenever you change the kind away from
KindMysqlDecimal. Either add d.decPtr = nil to every non-decimal setter (e.g.,
SetString, SetInt, SetTime, SetBytes, SetJSON, SetMysqlBit,
SetMysqlEnum/SetMysqlSet, etc.) or centralize the logic in the single place that
sets the kind (the common kind-setter function used throughout the file) so it
sets decPtr = nil unless k == KindMysqlDecimal; apply the same change for the
other occurrence mentioned (around lines 2294-2316).

Comment thread pkg/types/datum.go
Comment on lines +609 to +612
case KindNull, KindMinNotNull, KindMaxValue, KindRaw:
// These kinds carry no inline value; nil matches the prior
// behavior where d.x was unset and GetInterface returned nil.
return nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

KindRaw should still return its raw bytes from GetValue().

KindRaw carries its payload in d.b, but this branch now treats it like a null-like sentinel and returns nil. That changes callers from seeing the raw value to seeing no value at all, and even Datum.String() will degrade to KindRaw <nil>.

Suggested fix
-	case KindNull, KindMinNotNull, KindMaxValue, KindRaw:
+	case KindNull, KindMinNotNull, KindMaxValue:
 		// These kinds carry no inline value; nil matches the prior
 		// behavior where d.x was unset and GetInterface returned nil.
 		return nil
+	case KindRaw:
+		return d.GetRaw()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case KindNull, KindMinNotNull, KindMaxValue, KindRaw:
// These kinds carry no inline value; nil matches the prior
// behavior where d.x was unset and GetInterface returned nil.
return nil
case KindNull, KindMinNotNull, KindMaxValue:
// These kinds carry no inline value; nil matches the prior
// behavior where d.x was unset and GetInterface returned nil.
return nil
case KindRaw:
return d.GetRaw()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum.go` around lines 609 - 612, The branch that treats KindRaw
like null is incorrect: update the switch in GetValue (or the method returning
the inline value) so that KindRaw is not grouped with
KindNull/KindMinNotNull/KindMaxValue; instead return the raw payload d.b when
kind == KindRaw. Locate the switch that currently lists "case KindNull,
KindMinNotNull, KindMaxValue, KindRaw" and remove KindRaw from that case, adding
a separate branch (or fallthrough) that returns d.b for KindRaw so
Datum.String() and callers see the raw bytes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/types/datum.go (1)

93-101: ⚠️ Potential issue | 🟠 Major

decPtr invariant is still only partially enforced.

The struct comment on Line 100 documents decPtr as “non-nil only when k == KindMysqlDecimal”, but only SetNull, SetMinNotNull, and SetMysqlDecimal manage the field. A Datum that previously held a decimal will keep the stale pointer after any other setter (SetInt64, SetString, SetBytes, SetMysqlTime, SetMysqlDuration, SetMysqlEnum, SetMysqlSet, SetMysqlJSON, SetMysqlBit, SetBinaryLiteral, SetVectorFloat32, SetRaw, SetAutoID, …), which then causes MarshalJSON’s default-branch guard at Line 2322 to reject an otherwise valid non-decimal datum as unsupported type: <k>.

Since Datums are frequently reused (e.g., via *dst = *d in Copy, or direct field resets in hot paths), please either clear d.decPtr in every non-decimal setter or centralize kind-change logic so the invariant is enforced in one place. Alternatively, relax the MarshalJSON guard to only check decPtr when d.k == KindMysqlDecimal.

Also applies to: 347-357, 2318-2324

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum.go` around lines 93 - 101, The Datum.decPtr invariant is not
enforced: decPtr should be non-nil only when k == KindMysqlDecimal but only
SetNull, SetMinNotNull and SetMysqlDecimal touch decPtr, so setters like
SetInt64, SetString, SetBytes, SetMysqlTime, SetMysqlDuration, SetMysqlEnum,
SetMysqlSet, SetMysqlJSON, SetMysqlBit, SetBinaryLiteral, SetVectorFloat32,
SetRaw, SetAutoID and Copy can leave a stale decPtr leading MarshalJSON to
misclassify non-decimal datums; fix by clearing d.decPtr (set to nil) whenever
changing k to a non-decimal kind (i.e., inside each non-decimal setter or,
preferably, centralize kind changes in a single helper like setKind(newKind)
that clears decPtr unless newKind == KindMysqlDecimal) and/or adjust MarshalJSON
to only treat decPtr as an error when d.k == KindMysqlDecimal so the invariant
is consistently maintained.
🧹 Nitpick comments (1)
pkg/types/datum_test.go (1)

698-708: Optional: make the alloc-free claim an actual assertion.

The comment says BenchmarkDatumSetMysqlTime "asserts that SetMysqlTime is alloc-free", but b.ReportAllocs() only reports — it does not fail the benchmark if an allocation ever regresses. If the zero-alloc property is meant to be load-bearing for this refactor, consider converting this into a small testing.AllocsPerRun-based test (or adding one alongside) so a future regression blocks CI rather than being silently reported.

func TestDatumSetMysqlTimeNoAlloc(t *testing.T) {
    tm := NewTime(FromGoTime(time.Date(2018, 3, 8, 16, 1, 0, 315313000, time.UTC)), mysql.TypeTimestamp, 6)
    var d Datum
    allocs := testing.AllocsPerRun(100, func() { d.SetMysqlTime(tm) })
    require.Zero(t, allocs)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum_test.go` around lines 698 - 708, Add a real assertion that
SetMysqlTime is alloc-free by adding a unit test that uses testing.AllocsPerRun
to measure allocations for Datum.SetMysqlTime (use the same time setup as
BenchmarkDatumSetMysqlTime: NewTime(FromGoTime(...), mysql.TypeTimestamp, 6) and
a zero-value Datum), then assert the measured allocs is zero (e.g. require.Zero
or t.Fatalf if non-zero); keep BenchmarkDatumSetMysqlTime for reporting but make
TestDatumSetMysqlTimeNoAlloc enforce the zero-allocation contract so regressions
fail CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/types/datum.go`:
- Around line 93-101: The Datum.decPtr invariant is not enforced: decPtr should
be non-nil only when k == KindMysqlDecimal but only SetNull, SetMinNotNull and
SetMysqlDecimal touch decPtr, so setters like SetInt64, SetString, SetBytes,
SetMysqlTime, SetMysqlDuration, SetMysqlEnum, SetMysqlSet, SetMysqlJSON,
SetMysqlBit, SetBinaryLiteral, SetVectorFloat32, SetRaw, SetAutoID and Copy can
leave a stale decPtr leading MarshalJSON to misclassify non-decimal datums; fix
by clearing d.decPtr (set to nil) whenever changing k to a non-decimal kind
(i.e., inside each non-decimal setter or, preferably, centralize kind changes in
a single helper like setKind(newKind) that clears decPtr unless newKind ==
KindMysqlDecimal) and/or adjust MarshalJSON to only treat decPtr as an error
when d.k == KindMysqlDecimal so the invariant is consistently maintained.

---

Nitpick comments:
In `@pkg/types/datum_test.go`:
- Around line 698-708: Add a real assertion that SetMysqlTime is alloc-free by
adding a unit test that uses testing.AllocsPerRun to measure allocations for
Datum.SetMysqlTime (use the same time setup as BenchmarkDatumSetMysqlTime:
NewTime(FromGoTime(...), mysql.TypeTimestamp, 6) and a zero-value Datum), then
assert the measured allocs is zero (e.g. require.Zero or t.Fatalf if non-zero);
keep BenchmarkDatumSetMysqlTime for reporting but make
TestDatumSetMysqlTimeNoAlloc enforce the zero-allocation contract so regressions
fail CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f5d66fa7-39c4-43b4-9f3d-7de2e3fb04b0

📥 Commits

Reviewing files that changed from the base of the PR and between ea7722b and 0f2db44.

📒 Files selected for processing (7)
  • pkg/executor/aggfuncs/func_group_concat.go
  • pkg/planner/core/integration_test.go
  • pkg/types/datum.go
  • pkg/types/datum_test.go
  • pkg/util/rowcodec/common.go
  • tests/integrationtest/r/planner/core/casetest/index/index.result
  • tests/integrationtest/t/planner/core/casetest/index/index.test
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/executor/aggfuncs/func_group_concat.go

…atum

The decPtr change (pingcap#68011) shrank types.Datum from 56 to 48 bytes,
dropping per-range memory by 16 B per Datum. Two fallback-gating tests
stopped triggering as a result:

- planner/core/casetest/index/index.test: TestIndexJoinRangeFallback's
  first level was 2300 (calibrated for 56 B by c229ffb); the t1
  IndexJoin full plan now sums to 2144 B, so 2300 sits above it and
  the first fallback level (drop trailing eq column) is skipped. Lower
  to 1900, which is still above the drop-d level (1680 B) but below
  the full plan, so the same three fallback steps fire.
- planner/core/integration.test: TestIndexRangeFallback's =1000 step
  no longer pushed t4's 3 composite (a,b) ranges (912 B) over the
  budget. Lower all four of that block's =1000 settings to =800 so the
  fallback-vs-full pairing holds for every query under it.

The 1500 / 500 / 300 thresholds in casetest/index already straddle the
new memusage levels and don't need to move.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ti-chi-bot

ti-chi-bot Bot commented Apr 28, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benjamin2037, terry1purcell, zanmato1984 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mjonss and others added 4 commits April 29, 2026 17:51
Shrinking Datum.decimal from uint16 to uint8 (2a3c81e) made
MarshalJSON emit the UnspecifiedFsp sentinel as "decimal":255 instead
of the legacy "decimal":65535. The sentinel is normally read back as
signed -1; with the old uint16 reader, 255 is a positive frac, so the
PR-authored "JSON wire format stays stable" guarantee on jsonDatum no
longer held across mixed versions.

Map ^uint8(0) → ^uint16(0) at marshal time. UnmarshalJSON's existing
uint8(jd.Decimal) truncation already collapses 65535 → 255, so the
new reader is unchanged and the round-trip is fixed in both
directions.

Add TestMarshalDatumUnspecifiedFsp asserting both the wire payload
("decimal":65535) and the in-memory sentinel survival (Fsp == -1
after Marshal/Unmarshal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The setter was added alongside CollationID() during the collation-id
shrink, but no caller materialized — every SetCollation site still
passes a string sourced from FieldType.GetCollate(), so the integer
setter never gets exercised. Drop it; the getter stays because
hot-path callers (ranger, codec, cardinality, mutation_checker) use it
for uint16 compares.

A real perf win for the setter side would require also caching the
collation id on FieldType so the row-decode paths can avoid the
per-cell strings.ToLower + map lookup. That is out of scope here and
can be added together with the migrated callers when the FieldType
cache lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four small follow-ups to the pingcap#67977 review (all comment / test
nits, no behavior change):

- Datum struct doc: explicitly map the "frac" / "flen" MySQL terms in
  the layout description to the struct fields named `decimal` and
  `length`, so readers don't have to guess the alias.
- CollationID(): expand the doc to name the actual hot-path consumers
  (ranger/detacher, codec.encodeString, planner/cardinality,
  table/tables.mutationChecker) and the win (uint16 compare vs name
  lookup + string compare) so the "prefer over Collation()" guidance
  is justified, not just asserted.
- Collation(): note that each call resolves the id back to a name via
  the charset map; this is what makes the hot-path advice load-bearing.
- BenchmarkCompareDatumCollation: rewrite the doc — the bench passes
  an explicit `coll` to Compare, which uses the supplied comparer
  directly, so it never goes through Datum.Collation()/CollationID().
  The previous wording falsely claimed it exercised that accessor.
- TestDatumEqualsCollation: add require.NotEqual(0, a.CollationID())
  on the case-variant block so the equality claim can't pass vacuously
  if "UTF8MB4_BIN" / "utf8mb4_bin" ever fail to resolve and both
  collapse to id 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/types/datum_test.go (1)

317-342: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Exercise the minus cases too.

TestComputePlusAndMinus still only checks ComputePlus, so the minus expectations in the table are never validated. A regression in ComputeMinus would pass this test unnoticed.

Proposed fix
 for ith, tt := range tests {
 	got, err := ComputePlus(tt.a, tt.b)
 	require.Equal(t, tt.hasErr, err != nil)
 	v, err := got.Compare(DefaultStmtNoWarningContext, &tt.plus, collate.GetBinaryCollator())
 	require.NoError(t, err)
 	require.Equalf(t, 0, v, "%dth got:%#v, %#v, expect:%#v, %#v", ith, got, got.decPtr, tt.plus, tt.plus.decPtr)
+
+	minusGot, err := ComputeMinus(tt.a, tt.b)
+	require.Equal(t, tt.hasErr, err != nil)
+	v, err = minusGot.Compare(DefaultStmtNoWarningContext, &tt.minus, collate.GetBinaryCollator())
+	require.NoError(t, err)
+	require.Equalf(t, 0, v, "%dth got:%#v, %#v, expect:%#v, %#v", ith, minusGot, minusGot.decPtr, tt.minus, tt.minus.decPtr)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/datum_test.go` around lines 317 - 342, The test
TestComputePlusAndMinus currently only validates ComputePlus; update it to also
call ComputeMinus for each table row: invoke ComputeMinus(tt.a, tt.b), assert
error presence matches tt.hasErr, then compare the resulting Datum with tt.minus
using got.Compare(DefaultStmtNoWarningContext, &tt.minus,
collate.GetBinaryCollator()) and require.Equal/require.NoError as appropriate
(mirror the existing ComputePlus assertions). Ensure you reference the same loop
variable (tt) and test index (ith) and reuse the same comparison/logging pattern
so minus failures are reported consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/types/datum_test.go`:
- Around line 317-342: The test TestComputePlusAndMinus currently only validates
ComputePlus; update it to also call ComputeMinus for each table row: invoke
ComputeMinus(tt.a, tt.b), assert error presence matches tt.hasErr, then compare
the resulting Datum with tt.minus using got.Compare(DefaultStmtNoWarningContext,
&tt.minus, collate.GetBinaryCollator()) and require.Equal/require.NoError as
appropriate (mirror the existing ComputePlus assertions). Ensure you reference
the same loop variable (tt) and test index (ith) and reuse the same
comparison/logging pattern so minus failures are reported consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 64783f13-8150-4776-8038-34483b0664be

📥 Commits

Reviewing files that changed from the base of the PR and between bdd48db and f6e5297.

📒 Files selected for processing (2)
  • pkg/types/datum.go
  • pkg/types/datum_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/types/datum.go

@ti-chi-bot

ti-chi-bot Bot commented Apr 30, 2026

Copy link
Copy Markdown

@mjonss: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test-next-gen f6e5297 link true /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/unit-test f6e5297 link true /test unit-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant