types: replace Datum's x any with decPtr *MyDecimal (56B -> 48B)#68011
types: replace Datum's x any with decPtr *MyDecimal (56B -> 48B)#68011mjonss wants to merge 22 commits into
Conversation
Turn on b.ReportAllocs() across Datum-related benchmarks so benchstat can diff B/op and allocs/op against upcoming struct-size changes. Add three new targeted benches in pkg/types: - BenchmarkDatumCopy: tightest signal on Datum struct size (heterogeneous slice copied per iteration). - BenchmarkDatumSetMysqlTime: locks in the current 1 alloc / 8 B per call caused by boxing Time through d.x; will drop to 0 allocs once Time packs into d.i. - BenchmarkCompareDatumCollation: exercises the named-collator compare path, which the existing BenchmarkCompareDatum skips by using the binary collator. Behavior-only change in benchmarks; no production code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 16-byte `collation string` field on types.Datum with a 2-byte
`collationID uint16` carrying the MySQL collation id, and shrink
`decimal uint16` (frac; MySQL DECIMAL frac <= 30, fsp <= 6) to `uint8`.
The freed padding absorbs the new collationID so the first struct word
stays 8 bytes. Net: Datum is now 56 bytes instead of 72 (-22%), every
`[]Datum` allocation is correspondingly smaller, and Datum.Equals now
compares collations with a uint16 compare instead of a string compare.
API compat:
- Collation() string and SetCollation(string) are preserved as
facades that resolve via the charset package. External callers keep
working unchanged.
- New CollationID() uint16 / SetCollationID(uint16) for hot paths.
- MarshalJSON/UnmarshalJSON keep the on-wire "collation" field as a
name string, so older JSON blobs round-trip.
- collate.GetCollatorByID(0) now falls through silently (matching
GetCollator("")), since id 0 is the uint16 counterpart of the
legacy empty-string "no collation" sentinel.
Hot-path callers migrated from GetCollator(d.Collation()) to
GetCollatorByID(int(d.CollationID())) in:
codec.encodeString, ranger/detacher, planner/cardinality,
table/tables/mutation_checker, executor/foreign_key,
plus the internal Datum compare/key paths.
Datum.MemUsage drops the len(collation) term — the collation is now
inline (counted by EmptyDatumSize); TestRangeMemUsage is updated to
match. GetDatumMemSize in aggfuncs/func_group_concat follows suit.
Measured on linux/amd64, Intel i7-13700:
- unsafe.Sizeof(types.Datum) = 56 B (was 72 B)
- BenchmarkDecodeWithSize (codec): 8.0 KiB -> 6.0 KiB B/op (-25%),
2435 -> 1862 ns/op (-24%)
- BenchmarkDecodeWithOutSize (codec): 20.88 KiB -> 15.83 KiB (-24%)
- BenchmarkCompareDatum (types): 34.06 -> 32.12 ns/op (-5.7%)
- BenchmarkEncodeValue (tablecodec): 135 -> 122 ns/op (-10%)
- No allocation regressions across the bench set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`types.Time` is `struct{CoreTime}` where `CoreTime` is a `uint64` —
exactly 8 bytes, the same size as `Datum.i`. Storing it through
`Datum.x any` boxes the value into an interface, and because Go stores
non-pointer interface payloads on the heap this allocates 8 bytes per
`SetMysqlTime` call. `SetMysqlDuration` already avoids the box by
packing into `d.i`; do the same for Time.
This also removes the `KindMysqlTime` branch from `Datum.Copy`: once
Time lives in `d.i`, the bitwise `*dst = *d` already produces an
independent value, so there is no need to re-set Time on the
destination (which itself was an allocation).
Benchstat before/after, linux/amd64, Intel i7-13700, -count=5:
BenchmarkDatumSetMysqlTime 7.215 ns/op -> 0.102 ns/op (-98.6%)
8 B/op -> 0 B/op (-100%)
1 allocs/op -> 0 allocs/op (-100%)
BenchmarkDatumCopy 46.14 ns/op -> 36.88 ns/op (-20%)
64 B/op -> 53 B/op (-17%)
3 allocs/op -> 2 allocs/op (-33%)
No struct-size change; Datum stays 72 bytes. The size reduction is a
separate change (pingcap#67977).
`TestMarshalDatum` previously asserted round-trip via `d.x.(Time)`;
updated to use `GetMysqlTime().Compare` since the round-trip value
no longer traverses `d.x`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The inl_join range-fallback thresholds (2900/2300/700) were calibrated to Datum's old 72-byte size. After shrinking Datum to 56 bytes, ranges fit comfortably under the old limits and no longer exercise the progressive fallback. Lower the thresholds to 2300/1500/500 so the same three fallback levels (drop trailing col, drop two cols, INL_JOIN inapplicable) are triggered again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shrinking Datum.decimal from uint16 to uint8 in 2a3c81e dropped the int8 sign-extending cast in GetMysqlDuration, so Fsp=-1 (UnspecifiedFsp) came back as 255 and crashed Duration.formatFrac with a slice-bounds panic (TestDateArithFuncs). Restore the int8 round-trip so the -1 sentinel survives the uint storage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After shrinking Datum.collation from a byte string to a uint16 id, Equals compares the canonical MySQL collation id instead of the raw name. That makes "UTF8MB4_BIN" and "utf8mb4_bin" compare equal where they previously compared unequal. It aligns with MySQL's case-insensitive collation names but was not previously test-pinned, so a future refactor could silently flip it back. Add TestDatumEqualsCollation covering same-canonical/different-casing, distinct-canonical, and the unknown-name-collapses-to-0 edge case. Also call out the lossy name -> id storage in SetCollation's doc so future readers understand the trade-off without hunting the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Packing `Time` into `Datum.i` turned `GetMysqlTime` from a panic-on-mismatch `d.x.(Time)` into an unchecked reinterpret of `d.i` as CoreTime bits. Callers in `appendDatumForChecksum` relied on the panic (caught by defer-recover) to convert a kind mismatch into an error; without the panic `TestColumnEncode/mismatch/*` for timestamp/datetime/date/newdate regressed. Add an `intest.Assert(d.k == KindMysqlTime)` to `GetMysqlTime` so tests catch any silent misuse, and add an explicit kind check in `appendDatumForChecksum` before the `GetMysqlTime` call so the "invalid type for checksum" error path is restored for release builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups to packing Time into Datum.i:
- Add a compile-time guard that unsafe.Sizeof(Time{}) == 8, so any
future field added to Time fails the build rather than silently
losing data through GetMysqlTime/SetMysqlTime.
- Drop the sizeOfMysqlTime charge from EstimatedMemUsage for
KindMysqlTime: Time now lives in d.i, which is already counted in
sizeOfEmptyDatum. The old addition double-counted 8 bytes per
Time-valued Datum. Extend TestEstimatedMemUsage with a Time datum.
- In MarshalJSON, zero jd.I when marshaling KindMysqlTime so the
packed coreTime isn't serialized twice (once via jd.I, once via
jd.Time). UnmarshalJSON re-derives d.i from jd.Time, so round-trip
is unchanged.
After shrinking types.Datum from 72 to 56 bytes, Ranges.MemUsage() returns 16 fewer bytes per datum. The hardcoded thresholds in TestPlanCacheForIndexRangeFallback (1330) and TestPlanCacheForIndexJoinRangeFallback (1260) now sit above both the short- and long-value scenarios, so the long case no longer triggers fallback. Measured new usage in-process and lowered thresholds to 1120 and 1014 respectively so each test exercises the same accept/fallback boundary as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Turn on b.ReportAllocs() across Datum-related benchmarks so benchstat can diff B/op and allocs/op against upcoming struct-size changes. Add three new targeted benches in pkg/types: - BenchmarkDatumCopy: tightest signal on Datum struct size (heterogeneous slice copied per iteration). - BenchmarkDatumSetMysqlTime: locks in the current 1 alloc / 8 B per call caused by boxing Time through d.x; will drop to 0 allocs once Time packs into d.i. - BenchmarkCompareDatumCollation: exercises the named-collator compare path, which the existing BenchmarkCompareDatum skips by using the binary collator. Behavior-only change in benchmarks; no production code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 16-byte `collation string` field on types.Datum with a 2-byte
`collationID uint16` carrying the MySQL collation id, and shrink
`decimal uint16` (frac; MySQL DECIMAL frac <= 30, fsp <= 6) to `uint8`.
The freed padding absorbs the new collationID so the first struct word
stays 8 bytes. Net: Datum is now 56 bytes instead of 72 (-22%), every
`[]Datum` allocation is correspondingly smaller, and Datum.Equals now
compares collations with a uint16 compare instead of a string compare.
API compat:
- Collation() string and SetCollation(string) are preserved as
facades that resolve via the charset package. External callers keep
working unchanged.
- New CollationID() uint16 / SetCollationID(uint16) for hot paths.
- MarshalJSON/UnmarshalJSON keep the on-wire "collation" field as a
name string, so older JSON blobs round-trip.
- collate.GetCollatorByID(0) now falls through silently (matching
GetCollator("")), since id 0 is the uint16 counterpart of the
legacy empty-string "no collation" sentinel.
Hot-path callers migrated from GetCollator(d.Collation()) to
GetCollatorByID(int(d.CollationID())) in:
codec.encodeString, ranger/detacher, planner/cardinality,
table/tables/mutation_checker, executor/foreign_key,
plus the internal Datum compare/key paths.
Datum.MemUsage drops the len(collation) term — the collation is now
inline (counted by EmptyDatumSize); TestRangeMemUsage is updated to
match. GetDatumMemSize in aggfuncs/func_group_concat follows suit.
Measured on linux/amd64, Intel i7-13700:
- unsafe.Sizeof(types.Datum) = 56 B (was 72 B)
- BenchmarkDecodeWithSize (codec): 8.0 KiB -> 6.0 KiB B/op (-25%),
2435 -> 1862 ns/op (-24%)
- BenchmarkDecodeWithOutSize (codec): 20.88 KiB -> 15.83 KiB (-24%)
- BenchmarkCompareDatum (types): 34.06 -> 32.12 ns/op (-5.7%)
- BenchmarkEncodeValue (tablecodec): 135 -> 122 ns/op (-10%)
- No allocation regressions across the bench set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`types.Time` is `struct{CoreTime}` where `CoreTime` is a `uint64` —
exactly 8 bytes, the same size as `Datum.i`. Storing it through
`Datum.x any` boxes the value into an interface, and because Go stores
non-pointer interface payloads on the heap this allocates 8 bytes per
`SetMysqlTime` call. `SetMysqlDuration` already avoids the box by
packing into `d.i`; do the same for Time.
This also removes the `KindMysqlTime` branch from `Datum.Copy`: once
Time lives in `d.i`, the bitwise `*dst = *d` already produces an
independent value, so there is no need to re-set Time on the
destination (which itself was an allocation).
Benchstat before/after, linux/amd64, Intel i7-13700, -count=5:
BenchmarkDatumSetMysqlTime 7.215 ns/op -> 0.102 ns/op (-98.6%)
8 B/op -> 0 B/op (-100%)
1 allocs/op -> 0 allocs/op (-100%)
BenchmarkDatumCopy 46.14 ns/op -> 36.88 ns/op (-20%)
64 B/op -> 53 B/op (-17%)
3 allocs/op -> 2 allocs/op (-33%)
No struct-size change; Datum stays 72 bytes. The size reduction is a
separate change (pingcap#67977).
`TestMarshalDatum` previously asserted round-trip via `d.x.(Time)`;
updated to use `GetMysqlTime().Compare` since the round-trip value
no longer traverses `d.x`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Datum's KindInterface (byte value 14) was an escape-hatch kind used to
carry arbitrary Go values that SetValue's type switch did not recognize,
stored via SetInterface in d.x (any). Production code had exactly one
caller of this path: the TABLESAMPLE grammar rule in parser.y, which
wrapped an already-ExprNode value via ast.NewValueExpr. That wrapping
was redundant (ast.TableSample.Expr is ExprNode; the Expression rule's
$4 already is one) and harmful (it forced the escape-hatch store). Fix
the grammar to pass the ExprNode through directly and retire the
escape hatch everywhere else.
Changes:
- pkg/parser/parser.y: TABLESAMPLE's Expression / RepeatableOpt values
are passed through as-is instead of wrapped via ast.NewValueExpr.
Regenerate parser.go via `make parser`.
- pkg/types/datum.go: rename KindInterface -> KindInterfaceDeprecated
(value 14 preserved with a Deprecated: comment); delete SetInterface
/ GetInterface methods; SetValue / SetValueWithDefaultCollation
defaults now panic; GetValue's default panics (with explicit nil
returns for KindNull / KindMinNotNull / KindMaxValue / KindRaw which
previously fell through the old default); Datum.String()'s default
now prints "Unknown type: N" instead of "Unknown".
- pkg/types/etc.go: rename map entry; KindStr returns
"Unknown kind N" on map miss instead of "".
- pkg/types/parser_driver/value_expr.go,
pkg/lightning/backend/kv/base.go,
pkg/parser/test_driver/test_driver.go,
pkg/parser/test_driver/test_driver_datum.go:
mirror the rename / method removal / panic defaults.
- pkg/executor/aggfuncs/func_group_concat.go: drop the
getValMemDelta(d.GetInterface()) term in GetDatumMemSize; the
interface payload no longer contributes a variable-length term.
Test updates:
- pkg/expression/builtin_{string,op,control,compare,time}_test.go:
retire `errors.New(...)` rows used as "weird input to trigger an
error path"; that path is now unreachable via the public API. Drop
unused errors imports that result.
- pkg/expression/builtin_string_vec_test.go: pass int64(ast.TrimX)
instead of ast.TrimX to NewDatum (ast.TrimDirectionType is a named
int type not in SetValue's switch).
- pkg/expression/builtin_other_test.go: types.MakeDatums(tc.args) ->
...(tc.args...) (missing spread).
- pkg/expression/builtin_string_test.go: TestBin "" row retired (it
was masked by escape-hatch NewDatum(Datum) wrapping); TestFormat
error-Datum wrapping replaced with direct err-message checks.
- pkg/expression/evaluator_test.go: drop the KindInterface case from
the test helper's kind->FieldType mapping.
- pkg/lightning/backend/kv/sql2kv_test.go: retire the sub-test that
constructed an invalid Datum via SetInterface.
- pkg/types/{datum,convert}_test.go: switch error-path expectations to
require.Panics where previously they relied on the escape hatch to
surface downstream errors.
- pkg/executor/aggfuncs/aggfunc_test.go: add resultToDatum helper so
tests can pass either a primitive or an already-built types.Datum
to buildAggTester* (the latter previously round-tripped through the
escape hatch via NewDatum(someDatum)).
Behavior:
- Verified empirically that `TABLESAMPLE (33.3 + 44.4)` parses
correctly through TiDB's parser_driver with no panic.
- unsafe.Sizeof(types.Datum) unchanged at 72 B; this PR is purely
about removing the escape hatch. Option C (replace `x any` with a
typed `*MyDecimal`) is unblocked by this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d SA1019 `// Deprecated:` is a magic marker that `staticcheck/SA1019` treats as an error on every reference. But a few legitimate sites still need to enumerate `KindInterfaceDeprecated` (name tables in pkg/types/etc.go and pkg/lightning/backend/kv/base.go; the "not implemented" restore groups in pkg/types/parser_driver/value_expr.go and pkg/parser/test_driver/test_driver.go). Those are not bugs -- they are the handful of places where exhaustiveness requires naming the kind. Rewrite the comment without the `Deprecated:` marker while keeping the "do not use in new code" guidance. Linter passes cleanly now; the message to readers is unchanged. Also drop two incidental trailing blank lines the formatter flagged in pkg/expression/builtin_op_test.go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With KindInterfaceDeprecated removed (prev commit) and Time packed into d.i (earlier commit), the heterogeneous `x any` slot in Datum now only ever carries a *MyDecimal. Replace it with a typed `decPtr *MyDecimal` pointer. That drops the interface header (16B -> 8B), brings unsafe.Sizeof(types.Datum) from 56B to 48B, and makes the decimal path one less indirection. Updates: - Datum struct: `x any` -> `decPtr *MyDecimal`; comment block updated. - SetNull / SetMinNotNull clear decPtr (was d.x = nil). - GetMysqlDecimal / SetMysqlDecimal use decPtr directly (no type assertion). - GetMysqlDuration restores `int(int8(d.decimal))` sign extension when unpacking Fsp. d.decimal is uint8, but callers encode Fsp = -1 as the "unspecified" sentinel (see types.UnspecifiedFsp); that round-trips as 255 in the narrow field and needs the int8 cast to come back as -1. This was already required once Option A narrowed `decimal` to uint8; it surfaced here because TestMemJsonObjectagg formats a Duration with Fsp = -1 and the old `int(d.decimal)` cast produced a 255-length slice index inside Duration.formatFrac, crashing the test. - MarshalJSON's "unexpected x" guard now checks decPtr. - TestMarshalDatum and TestComputePlus debug output adapted to the new field name / shape. Measurements (linux/amd64, Intel i7-13700): - unsafe.Sizeof(types.Datum): 56 -> 48 bytes (-14% on top of A's 56B). - BenchmarkDecodeWithSize (codec, make([]Datum, 100)): 6144 -> 4864 B/op, -20.8%. Total journey from master: 8192 -> 4864 = -40.6% B/op. - BenchmarkDecodeWithOutSize (codec): 15.83 KiB -> 15.33 KiB, -3.2%. Requires Option A (pingcap#67977), Option B (pingcap#67978), and KindInterface removal (pingcap#68007) to have landed for the layout invariant to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughRefactors Datum to remove interface-backed storage (deprecating KindInterface), add inline fields and numeric CollationID, migrate callers to use CollationID/GetCollatorByID, update parser/test-driver and many tests, simplify some memory estimates, and add benchmark allocation reporting. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
|
Hi @mjonss. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #68011 +/- ##
================================================
- Coverage 77.7952% 77.5454% -0.2498%
================================================
Files 1984 1986 +2
Lines 549728 559638 +9910
================================================
+ Hits 427662 433974 +6312
- Misses 121146 124477 +3331
- Partials 920 1187 +267
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/types/datum.go (1)
657-704:⚠️ Potential issue | 🟠 MajorRestore explicit
Datumpassthrough inSetValue*.Both switches now panic on a
Datuminput, even though this file still exposes helpers likeNewDatum(existingDatum)/MakeDatums(...)that legitimately forward prebuilt datums through these paths. That turns a copy-through operation into a runtime panic.Suggested fix
func (d *Datum) SetValueWithDefaultCollation(val any) { switch x := val.(type) { + case Datum: + *d = x + return case nil: d.SetNull() case bool: @@ default: panic(fmt.Sprintf("types.Datum.SetValue: unsupported Go type %T", x)) } } func (d *Datum) SetValue(val any, tp *types.FieldType) { switch x := val.(type) { + case Datum: + *d = x + return case nil: d.SetNull() case bool: @@ default: panic(fmt.Sprintf("types.Datum.SetValue: unsupported Go type %T", x)) } }Also applies to: 707-755
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/datum.go` around lines 657 - 704, The SetValueWithDefaultCollation switch (and the similar SetValue* switches around the 707-755 range) currently panics when given a Datum, breaking legitimate passthroughs from helpers like NewDatum/MakeDatums; add explicit cases to accept both Datum and *Datum and copy the incoming Datum into the receiver (e.g., call the existing Datum-copying helper or implement a SetDatum/assign logic) so prebuilt datums are forwarded instead of causing a panic—look for the switch in SetValueWithDefaultCollation and the sibling SetValue* functions and add cases for Datum and *Datum that perform a proper copy into d.
🧹 Nitpick comments (1)
pkg/lightning/backend/kv/sql2kv_test.go (1)
61-66: Drop the no-op_ = minNotNull.
minNotNullis already used on line 52 in theRowArrayMarshaller{...}literal, so the_ = minNotNullassignment is dead code left over from when the retired sub-test owned the reference. The explanatory comment can stay on its own.♻️ Proposed cleanup
// The previous sub-test used types.Datum.SetInterface(1) to construct // a KindInterfaceDeprecated Datum and verify the marshaller surfaces // a ToString failure. With SetInterface removed there is no public // API path that produces such a Datum, so the scenario is unreachable. - _ = minNotNull🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/lightning/backend/kv/sql2kv_test.go` around lines 61 - 66, Remove the dead no-op assignment `_ = minNotNull` in the test; `minNotNull` is already referenced in the RowArrayMarshaller literal (RowArrayMarshaller{...}) so the standalone assignment can be deleted and the explanatory comment may remain as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/executor/aggfuncs/func_group_concat.go`:
- Around line 644-648: GetDatumMemSize undercounts memory for decimals because
unsafe.Sizeof(*d) only sizes the pointer field and not the heap allocation
behind decPtr; update GetDatumMemSize to detect Datum kinds equal to
KindMysqlDecimal (via types.Datum.Kind or equivalent) and add the heap payload
size for the decimal (accounting for the decPtr-backed structure/slice memory —
e.g., size of the decimal struct plus any digit/slice backing length) to the
returned int64 so decimal allocations are included when computing spill/oom
decisions for GetDatumMemSize.
In `@pkg/expression/builtin_string_test.go`:
- Around line 2203-2216: The current tests for evalBuiltinFunc (called via
fc.getFunction and evalBuiltinFunc using formatTests2 and formatTests3 inputs)
use "if err != nil { require.Contains(...) }" which hides regressions; change
those to assert the expected successful outcome instead—use require.NoError(t,
err) (or require.Nil(t, err)) immediately after calling evalBuiltinFunc for f2
and f3, then keep the testutil.DatumEqual assertions for r2 and r3; this ensures
failures surface instead of being silently accepted as "not implemented"/"not
support for the specific locale".
In `@pkg/lightning/backend/kv/base.go`:
- Around line 61-79: The code indexes kindStr[kind] unsafely which can panic for
newer kinds (e.g., types.KindVectorFloat32); update the lookup that logs kinds
to be bounds-safe by checking the Kind value before indexing (e.g., if int(kind)
>= 0 && int(kind) < len(kindStr) use kindStr[kind] else use a safe fallback like
fmt.Sprintf("kind(%d)", kind) or a map-based lookup), and apply the same pattern
wherever kindStr is accessed (reference the kindStr symbol and the logging site
that currently does kindStr[kind]).
In `@pkg/parser/parser.y`:
- Around line 9855-9859: The existing comment references Datum's removed x any
slot; update it to explain that wrapping already-produced ast.ExprNode values
(from Expression ($4) and RepeatableOpt ($7)) with ast.NewValueExpr will
undesirably box those nodes as a literal value rather than using them as
expression nodes. Replace the old storage-detail explanation with a concise note
that ast.NewValueExpr converts an ast.ExprNode into a literal/value wrapper
(losing its role as an expression subtree), so the parser should use the
ExprNode values directly instead of calling ast.NewValueExpr.
In `@pkg/parser/test_driver/test_driver_datum.go`:
- Around line 146-148: The change removes GetInterface/SetInterface and makes
SetValue panic on the default branch and also omits MySQL-specific kinds
(Duration, Enum, Set, Time, JSON) from GetValue/SetValue; restore source
compatibility by reintroducing GetInterface and SetInterface as simple wrappers
around GetValue/SetValue (or by making SetValue fall back to the previous silent
stash behavior using the KindInterfaceDeprecated placeholder), and extend
GetValue and SetValue in test_driver (the functions named GetValue and SetValue
in pkg/parser/test_driver_datum.go) to handle Duration, Enum, Set, Time and JSON
the same way as production types.Datum does; finally add a parser module release
note/changelog entry documenting the breaking change if you intend to keep the
removal.
In `@pkg/types/datum.go`:
- Around line 609-612: The branch that treats KindRaw like null is incorrect:
update the switch in GetValue (or the method returning the inline value) so that
KindRaw is not grouped with KindNull/KindMinNotNull/KindMaxValue; instead return
the raw payload d.b when kind == KindRaw. Locate the switch that currently lists
"case KindNull, KindMinNotNull, KindMaxValue, KindRaw" and remove KindRaw from
that case, adding a separate branch (or fallthrough) that returns d.b for
KindRaw so Datum.String() and callers see the raw bytes.
- Around line 347-357: The bug: decPtr remains non-nil after switching a Datum
to most non-decimal kinds (only SetNull and SetMinNotNull clear it), causing
MarshalJSON to misclassify non-decimal datums as unsupported; fix by clearing
d.decPtr whenever you change the kind away from KindMysqlDecimal. Either add
d.decPtr = nil to every non-decimal setter (e.g., SetString, SetInt, SetTime,
SetBytes, SetJSON, SetMysqlBit, SetMysqlEnum/SetMysqlSet, etc.) or centralize
the logic in the single place that sets the kind (the common kind-setter
function used throughout the file) so it sets decPtr = nil unless k ==
KindMysqlDecimal; apply the same change for the other occurrence mentioned
(around lines 2294-2316).
---
Outside diff comments:
In `@pkg/types/datum.go`:
- Around line 657-704: The SetValueWithDefaultCollation switch (and the similar
SetValue* switches around the 707-755 range) currently panics when given a
Datum, breaking legitimate passthroughs from helpers like NewDatum/MakeDatums;
add explicit cases to accept both Datum and *Datum and copy the incoming Datum
into the receiver (e.g., call the existing Datum-copying helper or implement a
SetDatum/assign logic) so prebuilt datums are forwarded instead of causing a
panic—look for the switch in SetValueWithDefaultCollation and the sibling
SetValue* functions and add cases for Datum and *Datum that perform a proper
copy into d.
---
Nitpick comments:
In `@pkg/lightning/backend/kv/sql2kv_test.go`:
- Around line 61-66: Remove the dead no-op assignment `_ = minNotNull` in the
test; `minNotNull` is already referenced in the RowArrayMarshaller literal
(RowArrayMarshaller{...}) so the standalone assignment can be deleted and the
explanatory comment may remain as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fbd0c7ce-d83e-4dfb-a949-0454248856ae
📒 Files selected for processing (32)
pkg/executor/aggfuncs/aggfunc_test.gopkg/executor/aggfuncs/func_group_concat.gopkg/executor/foreign_key.gopkg/expression/builtin_compare_test.gopkg/expression/builtin_control_test.gopkg/expression/builtin_op_test.gopkg/expression/builtin_other_test.gopkg/expression/builtin_string_test.gopkg/expression/builtin_string_vec_test.gopkg/expression/builtin_time_test.gopkg/expression/evaluator_test.gopkg/lightning/backend/kv/base.gopkg/lightning/backend/kv/sql2kv_test.gopkg/parser/parser.gopkg/parser/parser.ypkg/parser/test_driver/test_driver.gopkg/parser/test_driver/test_driver_datum.gopkg/planner/cardinality/row_count_column.gopkg/table/tables/mutation_checker.gopkg/tablecodec/bench_test.gopkg/tablecodec/tablecodec_test.gopkg/types/convert_test.gopkg/types/datum.gopkg/types/datum_test.gopkg/types/etc.gopkg/types/field_type.gopkg/types/parser_driver/value_expr.gopkg/util/codec/bench_test.gopkg/util/codec/codec.gopkg/util/collate/collate.gopkg/util/ranger/detacher.gopkg/util/ranger/types_test.go
💤 Files with no reviewable changes (2)
- pkg/expression/evaluator_test.go
- pkg/expression/builtin_op_test.go
| func GetDatumMemSize(d *types.Datum) int64 { | ||
| var datumMemSize int64 | ||
| datumMemSize += int64(unsafe.Sizeof(*d)) | ||
| datumMemSize += int64(len(d.Collation())) | ||
| datumMemSize += int64(len(d.GetBytes())) | ||
| datumMemSize += getValMemDelta(d.GetInterface()) - DefInterfaceSize | ||
| return datumMemSize |
There was a problem hiding this comment.
Include decimal payload in datum memory accounting.
At Line 646, unsafe.Sizeof(*d) only counts the pointer field, not the heap allocation behind decPtr. For KindMysqlDecimal, this under-reports memory and can distort spill/oom decisions in top-N group concat paths.
💡 Suggested fix
func GetDatumMemSize(d *types.Datum) int64 {
- var datumMemSize int64
- datumMemSize += int64(unsafe.Sizeof(*d))
- datumMemSize += int64(len(d.GetBytes()))
- return datumMemSize
+ datumMemSize := int64(unsafe.Sizeof(*d))
+ datumMemSize += int64(len(d.GetBytes()))
+ if d.Kind() == types.KindMysqlDecimal && d.GetMysqlDecimal() != nil {
+ datumMemSize += int64(types.MyDecimalStructSize)
+ }
+ return datumMemSize
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func GetDatumMemSize(d *types.Datum) int64 { | |
| var datumMemSize int64 | |
| datumMemSize += int64(unsafe.Sizeof(*d)) | |
| datumMemSize += int64(len(d.Collation())) | |
| datumMemSize += int64(len(d.GetBytes())) | |
| datumMemSize += getValMemDelta(d.GetInterface()) - DefInterfaceSize | |
| return datumMemSize | |
| func GetDatumMemSize(d *types.Datum) int64 { | |
| datumMemSize := int64(unsafe.Sizeof(*d)) | |
| datumMemSize += int64(len(d.GetBytes())) | |
| if d.Kind() == types.KindMysqlDecimal && d.GetMysqlDecimal() != nil { | |
| datumMemSize += int64(types.MyDecimalStructSize) | |
| } | |
| return datumMemSize | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/executor/aggfuncs/func_group_concat.go` around lines 644 - 648,
GetDatumMemSize undercounts memory for decimals because unsafe.Sizeof(*d) only
sizes the pointer field and not the heap allocation behind decPtr; update
GetDatumMemSize to detect Datum kinds equal to KindMysqlDecimal (via
types.Datum.Kind or equivalent) and add the heap payload size for the decimal
(accounting for the decPtr-backed structure/slice memory — e.g., size of the
decimal struct plus any digit/slice backing length) to the returned int64 so
decimal allocations are included when computing spill/oom decisions for
GetDatumMemSize.
| r2, err := evalBuiltinFunc(f2, ctx, chunk.Row{}) | ||
| testutil.DatumEqual(t, types.NewDatum(errors.New("not implemented")), types.NewDatum(err)) | ||
| if err != nil { | ||
| require.Contains(t, err.Error(), "not implemented") | ||
| } | ||
| testutil.DatumEqual(t, types.NewDatum(formatTests2.ret), r2) | ||
|
|
||
| f3, err := fc.getFunction(ctx, datumsToConstants(types.MakeDatums(formatTests3.number, formatTests3.precision, formatTests3.locale))) | ||
| require.NoError(t, err) | ||
| require.NotNil(t, f3) | ||
| r3, err := evalBuiltinFunc(f3, ctx, chunk.Row{}) | ||
| testutil.DatumEqual(t, types.NewDatum(errors.New("not support for the specific locale")), types.NewDatum(err)) | ||
| if err != nil { | ||
| require.Contains(t, err.Error(), "not support for the specific locale") | ||
| } | ||
| testutil.DatumEqual(t, types.NewDatum(formatTests3.ret), r3) |
There was a problem hiding this comment.
TestFormat error assertions are now unconditional on the happy path.
if err != nil { require.Contains(...) } permits both success and a "not implemented"/"not support for the specific locale" failure to pass. Given the newly added TestFormatWithLocale below shows zh_CN and de_GE now succeed via the en_US fallback (producing "-12,332" and "12,345.67"), the if err != nil branch here is effectively dead and will silently mask a regression that re-introduces an error. Prefer tightening to the actual expected outcome.
🛡️ Proposed fix
r2, err := evalBuiltinFunc(f2, ctx, chunk.Row{})
- if err != nil {
- require.Contains(t, err.Error(), "not implemented")
- }
+ require.NoError(t, err)
testutil.DatumEqual(t, types.NewDatum(formatTests2.ret), r2)
f3, err := fc.getFunction(ctx, datumsToConstants(types.MakeDatums(formatTests3.number, formatTests3.precision, formatTests3.locale)))
require.NoError(t, err)
require.NotNil(t, f3)
r3, err := evalBuiltinFunc(f3, ctx, chunk.Row{})
- if err != nil {
- require.Contains(t, err.Error(), "not support for the specific locale")
- }
+ require.NoError(t, err)
testutil.DatumEqual(t, types.NewDatum(formatTests3.ret), r3)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/expression/builtin_string_test.go` around lines 2203 - 2216, The current
tests for evalBuiltinFunc (called via fc.getFunction and evalBuiltinFunc using
formatTests2 and formatTests3 inputs) use "if err != nil { require.Contains(...)
}" which hides regressions; change those to assert the expected successful
outcome instead—use require.NoError(t, err) (or require.Nil(t, err)) immediately
after calling evalBuiltinFunc for f2 and f3, then keep the testutil.DatumEqual
assertions for r2 and r3; this ensures failures surface instead of being
silently accepted as "not implemented"/"not support for the specific locale".
| types.KindNull: "null", | ||
| types.KindInt64: "int64", | ||
| types.KindUint64: "uint64", | ||
| types.KindFloat32: "float32", | ||
| types.KindFloat64: "float64", | ||
| types.KindString: "string", | ||
| types.KindBytes: "bytes", | ||
| types.KindBinaryLiteral: "binary", | ||
| types.KindMysqlDecimal: "decimal", | ||
| types.KindMysqlDuration: "duration", | ||
| types.KindMysqlEnum: "enum", | ||
| types.KindMysqlBit: "bit", | ||
| types.KindMysqlSet: "set", | ||
| types.KindMysqlTime: "time", | ||
| types.KindInterfaceDeprecated: "interface(deprecated)", | ||
| types.KindMinNotNull: "min", | ||
| types.KindMaxValue: "max", | ||
| types.KindRaw: "raw", | ||
| types.KindMysqlJSON: "json", |
There was a problem hiding this comment.
Potential panic in logging due to unsafe kind index lookup.
Line 111 indexes kindStr[kind] without bounds checks, but the table omits newer kinds (e.g. types.KindVectorFloat32). That can panic during error logging.
🔧 Suggested fix
var kindStr = [...]string{
types.KindNull: "null",
types.KindInt64: "int64",
types.KindUint64: "uint64",
types.KindFloat32: "float32",
types.KindFloat64: "float64",
types.KindString: "string",
types.KindBytes: "bytes",
types.KindBinaryLiteral: "binary",
types.KindMysqlDecimal: "decimal",
types.KindMysqlDuration: "duration",
types.KindMysqlEnum: "enum",
types.KindMysqlBit: "bit",
types.KindMysqlSet: "set",
types.KindMysqlTime: "time",
types.KindInterfaceDeprecated: "interface(deprecated)",
types.KindMinNotNull: "min",
types.KindMaxValue: "max",
types.KindRaw: "raw",
types.KindMysqlJSON: "json",
+ types.KindVectorFloat32: "vector_float32",
}
@@
if err := encoder.AppendObject(zapcore.ObjectMarshalerFunc(func(enc zapcore.ObjectEncoder) error {
- enc.AddString("kind", kindStr[kind])
+ kindName := "unknown"
+ if int(kind) < len(kindStr) && kindStr[kind] != "" {
+ kindName = kindStr[kind]
+ }
+ enc.AddString("kind", kindName)
enc.AddString("val", redact.Value(str))
return nil
})); err != nil {Also applies to: 111-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/lightning/backend/kv/base.go` around lines 61 - 79, The code indexes
kindStr[kind] unsafely which can panic for newer kinds (e.g.,
types.KindVectorFloat32); update the lookup that logs kinds to be bounds-safe by
checking the Kind value before indexing (e.g., if int(kind) >= 0 && int(kind) <
len(kindStr) use kindStr[kind] else use a safe fallback like
fmt.Sprintf("kind(%d)", kind) or a map-based lookup), and apply the same pattern
wherever kindStr is accessed (reference the kindStr symbol and the logging site
that currently does kindStr[kind]).
| // Expression ($4) and RepeatableOpt ($7) are both declared as | ||
| // %type <expr> and already produce ast.ExprNode values. Wrapping | ||
| // them via ast.NewValueExpr would store the whole subtree inside a | ||
| // Datum's escape-hatch "x any" slot (the only production path that | ||
| // does so); use them directly instead. |
There was a problem hiding this comment.
Rewrite the rationale comment to match the post-refactor state.
This comment still refers to Datum’s removed x any slot, so it becomes inaccurate once this PR lands. Please describe the problem as ast.NewValueExpr boxing an ast.ExprNode as a literal rather than referencing the old storage detail.
✏️ Suggested wording
- // Expression ($4) and RepeatableOpt ($7) are both declared as
- // %type <expr> and already produce ast.ExprNode values. Wrapping
- // them via ast.NewValueExpr would store the whole subtree inside a
- // Datum's escape-hatch "x any" slot (the only production path that
- // does so); use them directly instead.
+ // Expression ($4) and RepeatableOpt ($7) are both declared as
+ // %type <expr> and already produce ast.ExprNode values. Wrapping
+ // them with ast.NewValueExpr would incorrectly box the expression
+ // subtree as a literal value, so keep the AST nodes directly.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Expression ($4) and RepeatableOpt ($7) are both declared as | |
| // %type <expr> and already produce ast.ExprNode values. Wrapping | |
| // them via ast.NewValueExpr would store the whole subtree inside a | |
| // Datum's escape-hatch "x any" slot (the only production path that | |
| // does so); use them directly instead. | |
| // Expression ($4) and RepeatableOpt ($7) are both declared as | |
| // %type <expr> and already produce ast.ExprNode values. Wrapping | |
| // them with ast.NewValueExpr would incorrectly box the expression | |
| // subtree as a literal value, so keep the AST nodes directly. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/parser/parser.y` around lines 9855 - 9859, The existing comment
references Datum's removed x any slot; update it to explain that wrapping
already-produced ast.ExprNode values (from Expression ($4) and RepeatableOpt
($7)) with ast.NewValueExpr will undesirably box those nodes as a literal value
rather than using them as expression nodes. Replace the old storage-detail
explanation with a concise note that ast.NewValueExpr converts an ast.ExprNode
into a literal/value wrapper (losing its role as an expression subtree), so the
parser should use the ExprNode values directly instead of calling
ast.NewValueExpr.
| // GetInterface and SetInterface were removed along with the | ||
| // KindInterfaceDeprecated (14) kind. See pkg/types/datum.go for the | ||
| // rationale; callers should use the typed Set/Get methods. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm no remaining internal callers of the removed accessors and
# that nothing outside the types package still writes KindInterfaceDeprecated.
rg -nP '\b(GetInterface|SetInterface)\s*\(' --type=go
rg -nP '\bKindInterfaceDeprecated\b' --type=go
# Also check that every kind in the test_driver constant block has a GetValue branch
rg -nP '\bKind(Mysql(Duration|Enum|Set|Time|JSON)|InterfaceDeprecated)\b' pkg/parser/test_driver/test_driver_datum.goRepository: pingcap/tidb
Length of output: 3168
🏁 Script executed:
# Check the test_driver_datum.go file around the mentioned lines
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '140,160p'
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '200,245p'Repository: pingcap/tidb
Length of output: 2005
🏁 Script executed:
# Search for GetInterface and SetInterface method definitions in test_driver
rg -A5 'func.*GetInterface|func.*SetInterface' pkg/parser/test_driver/ --type=goRepository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Check SetValue implementation to see default branch
ast-grep --pattern 'func ($_) SetValue($$$) {
$$$
}'Repository: pingcap/tidb
Length of output: 7953
🏁 Script executed:
# Look for CHANGELOG, release notes, or migration guides
fd -i 'changelog|release|news|migrate' -t f | head -20Repository: pingcap/tidb
Length of output: 129
🏁 Script executed:
# Check if pkg/parser is a separate Go module or part of main module
cat go.mod | head -30
ls -la pkg/parser/go.mod 2>/dev/null || echo "No separate go.mod in pkg/parser"
# Look for any CHANGELOG or release docs in pkg/parser
find pkg/parser -name '*CHANGE*' -o -name '*RELEASE*' -o -name '*README*' 2>/dev/null | head -10Repository: pingcap/tidb
Length of output: 1390
🏁 Script executed:
# Check GetValue implementation in test_driver to see coverage
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '160,200p'Repository: pingcap/tidb
Length of output: 1322
🏁 Script executed:
# Search for external callers (outside pkg/parser) that might be affected
rg 'test_driver\.Datum' --type=go | grep -v 'pkg/parser' | head -20Repository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Check if there's any migration guide or deprecation notice
rg -i 'migration|deprecat|breaking|external' pkg/parser --type=go | head -15Repository: pingcap/tidb
Length of output: 1584
🏁 Script executed:
# Check pkg/parser README to understand external exposure
cat pkg/parser/README.mdRepository: pingcap/tidb
Length of output: 3389
🏁 Script executed:
# Check pkg/parser/go.mod to confirm separate module
cat pkg/parser/go.modRepository: pingcap/tidb
Length of output: 1007
🏁 Script executed:
# Look for GetValue branches handling MySQL types mentioned in constants
rg -B2 -A2 'KindMysql(Duration|Enum|Set|Time|JSON)' pkg/parser/test_driver/test_driver_datum.go | grep -v '^--$'Repository: pingcap/tidb
Length of output: 563
🏁 Script executed:
# Check if there are any version tags or release process for parser module
git tag -l 'parser*' 2>/dev/null | head -10 || echo "No parser-specific tags"Repository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Check types.Datum.GetValue implementation to compare with test_driver
cat -n pkg/types/datum.go | sed -n '517,580p'Repository: pingcap/tidb
Length of output: 1984
🏁 Script executed:
# Search for any root-level CHANGELOG or release notes
find . -maxdepth 2 -type f \( -iname '*changelog*' -o -iname '*release*' -o -iname '*version*' \) | grep -v '.git' | head -20Repository: pingcap/tidb
Length of output: 112
🏁 Script executed:
# Verify if test_driver.Datum is documented as canonical impl
rg -B3 -A3 'canonical.*Datum.*implementation|test.driver.*canonical' pkg/parser --type=goRepository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Get full types.Datum.GetValue implementation
cat -n pkg/types/datum.go | sed -n '578,650p'Repository: pingcap/tidb
Length of output: 2802
🏁 Script executed:
# Check if there's any module release process or contributor notes mentioning external changes
find pkg/parser -name '*.md' | xargs cat 2>/dev/null | grep -i 'breaking\|external\|downstream\|release\|changelog'Repository: pingcap/tidb
Length of output: 580
🏁 Script executed:
# Verify test_driver is indeed used as canonical by checking imports/usage
rg 'pkg/parser.*test_driver' pkg/types --type=go | head -10Repository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Check if test_driver methods for MySQL types exist
rg 'GetMysqlDuration|GetMysqlEnum|GetMysqlSet|GetMysqlTime|GetMysqlJSON' pkg/parser/test_driver --type=goRepository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Check if test_driver is explicitly intended as limited or reference implementation
cat -n pkg/parser/test_driver/test_driver.go | head -50Repository: pingcap/tidb
Length of output: 1698
🏁 Script executed:
# Look for any docs about parser module stability or external contracts
find pkg/parser/docs -type f 2>/dev/null | xargs cat 2>/dev/null || echo "No docs dir found"Repository: pingcap/tidb
Length of output: 5934
🏁 Script executed:
# Check documentation/comments about test_driver's intended scope
cat -n pkg/parser/test_driver/test_driver_datum.go | sed -n '1,50p'Repository: pingcap/tidb
Length of output: 1889
🏁 Script executed:
# Verify that production types uses parser/test_driver import anywhere
rg 'import.*test_driver|from.*test_driver' pkg/types --type=go | head -10Repository: pingcap/tidb
Length of output: 38
Heads up: this is an externally-visible behavior change for parser consumers.
pkg/parser is exposed as a separate Go module. Removing GetInterface/SetInterface and making SetValue's default branch panic (instead of silently stashing the value under KindInterfaceDeprecated) will break source compatibility for any external caller that referenced those methods or relied on the silent fallback behavior.
However, note that while test_driver is provided for testing, the quickstart documentation recommends production users use pkg/types/parser_driver instead. Still, the removal impacts the test_driver module itself, which is publicly distributed.
Additional issue: GetValue and SetValue in test_driver lack support for MySQL-specific types (Duration, Enum, Set, Time, JSON) that exist in the type constant block—these would panic at runtime if encountered, unlike the production types.Datum which handles them. Verify this asymmetry is intentional.
Consider adding a release note / changelog entry in the parser module so downstream users are aware of these breaking changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/parser/test_driver/test_driver_datum.go` around lines 146 - 148, The
change removes GetInterface/SetInterface and makes SetValue panic on the default
branch and also omits MySQL-specific kinds (Duration, Enum, Set, Time, JSON)
from GetValue/SetValue; restore source compatibility by reintroducing
GetInterface and SetInterface as simple wrappers around GetValue/SetValue (or by
making SetValue fall back to the previous silent stash behavior using the
KindInterfaceDeprecated placeholder), and extend GetValue and SetValue in
test_driver (the functions named GetValue and SetValue in
pkg/parser/test_driver_datum.go) to handle Duration, Enum, Set, Time and JSON
the same way as production types.Datum does; finally add a parser module release
note/changelog entry documenting the breaking change if you intend to keep the
removal.
| // SetNull sets datum to nil. | ||
| func (d *Datum) SetNull() { | ||
| d.k = KindNull | ||
| d.x = nil | ||
| d.decPtr = nil | ||
| } | ||
|
|
||
| // SetMinNotNull sets datum to minNotNull value. | ||
| func (d *Datum) SetMinNotNull() { | ||
| d.k = KindMinNotNull | ||
| d.x = nil | ||
| d.decPtr = nil | ||
| } |
There was a problem hiding this comment.
decPtr is still live after most kind switches.
This refactor documents decPtr as “non-nil only when k == KindMysqlDecimal”, but only SetNull and SetMinNotNull actually clear it. Reusing a datum from decimal to string/int/time leaves the stale pointer behind, and the new MarshalJSON guard will then reject an otherwise valid non-decimal datum as unsupported type.
Please clear decPtr in every non-decimal setter, or centralize kind-reset logic so the invariant is enforced in one place.
Also applies to: 2294-2316
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum.go` around lines 347 - 357, The bug: decPtr remains non-nil
after switching a Datum to most non-decimal kinds (only SetNull and
SetMinNotNull clear it), causing MarshalJSON to misclassify non-decimal datums
as unsupported; fix by clearing d.decPtr whenever you change the kind away from
KindMysqlDecimal. Either add d.decPtr = nil to every non-decimal setter (e.g.,
SetString, SetInt, SetTime, SetBytes, SetJSON, SetMysqlBit,
SetMysqlEnum/SetMysqlSet, etc.) or centralize the logic in the single place that
sets the kind (the common kind-setter function used throughout the file) so it
sets decPtr = nil unless k == KindMysqlDecimal; apply the same change for the
other occurrence mentioned (around lines 2294-2316).
| case KindNull, KindMinNotNull, KindMaxValue, KindRaw: | ||
| // These kinds carry no inline value; nil matches the prior | ||
| // behavior where d.x was unset and GetInterface returned nil. | ||
| return nil |
There was a problem hiding this comment.
KindRaw should still return its raw bytes from GetValue().
KindRaw carries its payload in d.b, but this branch now treats it like a null-like sentinel and returns nil. That changes callers from seeing the raw value to seeing no value at all, and even Datum.String() will degrade to KindRaw <nil>.
Suggested fix
- case KindNull, KindMinNotNull, KindMaxValue, KindRaw:
+ case KindNull, KindMinNotNull, KindMaxValue:
// These kinds carry no inline value; nil matches the prior
// behavior where d.x was unset and GetInterface returned nil.
return nil
+ case KindRaw:
+ return d.GetRaw()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case KindNull, KindMinNotNull, KindMaxValue, KindRaw: | |
| // These kinds carry no inline value; nil matches the prior | |
| // behavior where d.x was unset and GetInterface returned nil. | |
| return nil | |
| case KindNull, KindMinNotNull, KindMaxValue: | |
| // These kinds carry no inline value; nil matches the prior | |
| // behavior where d.x was unset and GetInterface returned nil. | |
| return nil | |
| case KindRaw: | |
| return d.GetRaw() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum.go` around lines 609 - 612, The branch that treats KindRaw
like null is incorrect: update the switch in GetValue (or the method returning
the inline value) so that KindRaw is not grouped with
KindNull/KindMinNotNull/KindMaxValue; instead return the raw payload d.b when
kind == KindRaw. Locate the switch that currently lists "case KindNull,
KindMinNotNull, KindMaxValue, KindRaw" and remove KindRaw from that case, adding
a separate branch (or fallthrough) that returns d.b for KindRaw so
Datum.String() and callers see the raw bytes.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/types/datum.go (1)
93-101:⚠️ Potential issue | 🟠 Major
decPtrinvariant is still only partially enforced.The struct comment on Line 100 documents
decPtras “non-nil only whenk == KindMysqlDecimal”, but onlySetNull,SetMinNotNull, andSetMysqlDecimalmanage the field. ADatumthat previously held a decimal will keep the stale pointer after any other setter (SetInt64,SetString,SetBytes,SetMysqlTime,SetMysqlDuration,SetMysqlEnum,SetMysqlSet,SetMysqlJSON,SetMysqlBit,SetBinaryLiteral,SetVectorFloat32,SetRaw,SetAutoID, …), which then causesMarshalJSON’s default-branch guard at Line 2322 to reject an otherwise valid non-decimal datum asunsupported type: <k>.Since Datums are frequently reused (e.g., via
*dst = *dinCopy, or direct field resets in hot paths), please either cleard.decPtrin every non-decimal setter or centralize kind-change logic so the invariant is enforced in one place. Alternatively, relax theMarshalJSONguard to only checkdecPtrwhend.k == KindMysqlDecimal.Also applies to: 347-357, 2318-2324
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/datum.go` around lines 93 - 101, The Datum.decPtr invariant is not enforced: decPtr should be non-nil only when k == KindMysqlDecimal but only SetNull, SetMinNotNull and SetMysqlDecimal touch decPtr, so setters like SetInt64, SetString, SetBytes, SetMysqlTime, SetMysqlDuration, SetMysqlEnum, SetMysqlSet, SetMysqlJSON, SetMysqlBit, SetBinaryLiteral, SetVectorFloat32, SetRaw, SetAutoID and Copy can leave a stale decPtr leading MarshalJSON to misclassify non-decimal datums; fix by clearing d.decPtr (set to nil) whenever changing k to a non-decimal kind (i.e., inside each non-decimal setter or, preferably, centralize kind changes in a single helper like setKind(newKind) that clears decPtr unless newKind == KindMysqlDecimal) and/or adjust MarshalJSON to only treat decPtr as an error when d.k == KindMysqlDecimal so the invariant is consistently maintained.
🧹 Nitpick comments (1)
pkg/types/datum_test.go (1)
698-708: Optional: make the alloc-free claim an actual assertion.The comment says
BenchmarkDatumSetMysqlTime"asserts that SetMysqlTime is alloc-free", butb.ReportAllocs()only reports — it does not fail the benchmark if an allocation ever regresses. If the zero-alloc property is meant to be load-bearing for this refactor, consider converting this into a smalltesting.AllocsPerRun-based test (or adding one alongside) so a future regression blocks CI rather than being silently reported.func TestDatumSetMysqlTimeNoAlloc(t *testing.T) { tm := NewTime(FromGoTime(time.Date(2018, 3, 8, 16, 1, 0, 315313000, time.UTC)), mysql.TypeTimestamp, 6) var d Datum allocs := testing.AllocsPerRun(100, func() { d.SetMysqlTime(tm) }) require.Zero(t, allocs) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/datum_test.go` around lines 698 - 708, Add a real assertion that SetMysqlTime is alloc-free by adding a unit test that uses testing.AllocsPerRun to measure allocations for Datum.SetMysqlTime (use the same time setup as BenchmarkDatumSetMysqlTime: NewTime(FromGoTime(...), mysql.TypeTimestamp, 6) and a zero-value Datum), then assert the measured allocs is zero (e.g. require.Zero or t.Fatalf if non-zero); keep BenchmarkDatumSetMysqlTime for reporting but make TestDatumSetMysqlTimeNoAlloc enforce the zero-allocation contract so regressions fail CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/types/datum.go`:
- Around line 93-101: The Datum.decPtr invariant is not enforced: decPtr should
be non-nil only when k == KindMysqlDecimal but only SetNull, SetMinNotNull and
SetMysqlDecimal touch decPtr, so setters like SetInt64, SetString, SetBytes,
SetMysqlTime, SetMysqlDuration, SetMysqlEnum, SetMysqlSet, SetMysqlJSON,
SetMysqlBit, SetBinaryLiteral, SetVectorFloat32, SetRaw, SetAutoID and Copy can
leave a stale decPtr leading MarshalJSON to misclassify non-decimal datums; fix
by clearing d.decPtr (set to nil) whenever changing k to a non-decimal kind
(i.e., inside each non-decimal setter or, preferably, centralize kind changes in
a single helper like setKind(newKind) that clears decPtr unless newKind ==
KindMysqlDecimal) and/or adjust MarshalJSON to only treat decPtr as an error
when d.k == KindMysqlDecimal so the invariant is consistently maintained.
---
Nitpick comments:
In `@pkg/types/datum_test.go`:
- Around line 698-708: Add a real assertion that SetMysqlTime is alloc-free by
adding a unit test that uses testing.AllocsPerRun to measure allocations for
Datum.SetMysqlTime (use the same time setup as BenchmarkDatumSetMysqlTime:
NewTime(FromGoTime(...), mysql.TypeTimestamp, 6) and a zero-value Datum), then
assert the measured allocs is zero (e.g. require.Zero or t.Fatalf if non-zero);
keep BenchmarkDatumSetMysqlTime for reporting but make
TestDatumSetMysqlTimeNoAlloc enforce the zero-allocation contract so regressions
fail CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f5d66fa7-39c4-43b4-9f3d-7de2e3fb04b0
📒 Files selected for processing (7)
pkg/executor/aggfuncs/func_group_concat.gopkg/planner/core/integration_test.gopkg/types/datum.gopkg/types/datum_test.gopkg/util/rowcodec/common.gotests/integrationtest/r/planner/core/casetest/index/index.resulttests/integrationtest/t/planner/core/casetest/index/index.test
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/executor/aggfuncs/func_group_concat.go
…atum The decPtr change (pingcap#68011) shrank types.Datum from 56 to 48 bytes, dropping per-range memory by 16 B per Datum. Two fallback-gating tests stopped triggering as a result: - planner/core/casetest/index/index.test: TestIndexJoinRangeFallback's first level was 2300 (calibrated for 56 B by c229ffb); the t1 IndexJoin full plan now sums to 2144 B, so 2300 sits above it and the first fallback level (drop trailing eq column) is skipped. Lower to 1900, which is still above the drop-d level (1680 B) but below the full plan, so the same three fallback steps fire. - planner/core/integration.test: TestIndexRangeFallback's =1000 step no longer pushed t4's 3 composite (a,b) ranges (912 B) over the budget. Lower all four of that block's =1000 settings to =800 so the fallback-vs-full pairing holds for every query under it. The 1500 / 500 / 300 thresholds in casetest/index already straddle the new memusage levels and don't need to move. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Shrinking Datum.decimal from uint16 to uint8 (2a3c81e) made MarshalJSON emit the UnspecifiedFsp sentinel as "decimal":255 instead of the legacy "decimal":65535. The sentinel is normally read back as signed -1; with the old uint16 reader, 255 is a positive frac, so the PR-authored "JSON wire format stays stable" guarantee on jsonDatum no longer held across mixed versions. Map ^uint8(0) → ^uint16(0) at marshal time. UnmarshalJSON's existing uint8(jd.Decimal) truncation already collapses 65535 → 255, so the new reader is unchanged and the round-trip is fixed in both directions. Add TestMarshalDatumUnspecifiedFsp asserting both the wire payload ("decimal":65535) and the in-memory sentinel survival (Fsp == -1 after Marshal/Unmarshal). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The setter was added alongside CollationID() during the collation-id shrink, but no caller materialized — every SetCollation site still passes a string sourced from FieldType.GetCollate(), so the integer setter never gets exercised. Drop it; the getter stays because hot-path callers (ranger, codec, cardinality, mutation_checker) use it for uint16 compares. A real perf win for the setter side would require also caching the collation id on FieldType so the row-decode paths can avoid the per-cell strings.ToLower + map lookup. That is out of scope here and can be added together with the migrated callers when the FieldType cache lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four small follow-ups to the pingcap#67977 review (all comment / test nits, no behavior change): - Datum struct doc: explicitly map the "frac" / "flen" MySQL terms in the layout description to the struct fields named `decimal` and `length`, so readers don't have to guess the alias. - CollationID(): expand the doc to name the actual hot-path consumers (ranger/detacher, codec.encodeString, planner/cardinality, table/tables.mutationChecker) and the win (uint16 compare vs name lookup + string compare) so the "prefer over Collation()" guidance is justified, not just asserted. - Collation(): note that each call resolves the id back to a name via the charset map; this is what makes the hot-path advice load-bearing. - BenchmarkCompareDatumCollation: rewrite the doc — the bench passes an explicit `coll` to Compare, which uses the supplied comparer directly, so it never goes through Datum.Collation()/CollationID(). The previous wording falsely claimed it exercised that accessor. - TestDatumEqualsCollation: add require.NotEqual(0, a.CollationID()) on the case-variant block so the equality claim can't pass vacuously if "UTF8MB4_BIN" / "utf8mb4_bin" ever fail to resolve and both collapse to id 0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/types/datum_test.go (1)
317-342:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winExercise the
minuscases too.
TestComputePlusAndMinusstill only checksComputePlus, so theminusexpectations in the table are never validated. A regression inComputeMinuswould pass this test unnoticed.Proposed fix
for ith, tt := range tests { got, err := ComputePlus(tt.a, tt.b) require.Equal(t, tt.hasErr, err != nil) v, err := got.Compare(DefaultStmtNoWarningContext, &tt.plus, collate.GetBinaryCollator()) require.NoError(t, err) require.Equalf(t, 0, v, "%dth got:%#v, %#v, expect:%#v, %#v", ith, got, got.decPtr, tt.plus, tt.plus.decPtr) + + minusGot, err := ComputeMinus(tt.a, tt.b) + require.Equal(t, tt.hasErr, err != nil) + v, err = minusGot.Compare(DefaultStmtNoWarningContext, &tt.minus, collate.GetBinaryCollator()) + require.NoError(t, err) + require.Equalf(t, 0, v, "%dth got:%#v, %#v, expect:%#v, %#v", ith, minusGot, minusGot.decPtr, tt.minus, tt.minus.decPtr) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/datum_test.go` around lines 317 - 342, The test TestComputePlusAndMinus currently only validates ComputePlus; update it to also call ComputeMinus for each table row: invoke ComputeMinus(tt.a, tt.b), assert error presence matches tt.hasErr, then compare the resulting Datum with tt.minus using got.Compare(DefaultStmtNoWarningContext, &tt.minus, collate.GetBinaryCollator()) and require.Equal/require.NoError as appropriate (mirror the existing ComputePlus assertions). Ensure you reference the same loop variable (tt) and test index (ith) and reuse the same comparison/logging pattern so minus failures are reported consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/types/datum_test.go`:
- Around line 317-342: The test TestComputePlusAndMinus currently only validates
ComputePlus; update it to also call ComputeMinus for each table row: invoke
ComputeMinus(tt.a, tt.b), assert error presence matches tt.hasErr, then compare
the resulting Datum with tt.minus using got.Compare(DefaultStmtNoWarningContext,
&tt.minus, collate.GetBinaryCollator()) and require.Equal/require.NoError as
appropriate (mirror the existing ComputePlus assertions). Ensure you reference
the same loop variable (tt) and test index (ith) and reuse the same
comparison/logging pattern so minus failures are reported consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 64783f13-8150-4776-8038-34483b0664be
📒 Files selected for processing (2)
pkg/types/datum.gopkg/types/datum_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/types/datum.go
|
@mjonss: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: ref #67976
Problem Summary:
Final step of the
types.Datumsize-reduction series tracked in #67976.After Option A (72→56 B), Option B (pack
Timeintod.i), and theremoval of the
KindInterfaceDeprecatedescape hatch (#68007), theheterogeneous
x anyslot inDatumonly ever carries a*MyDecimal.Replace it with a typed pointer to drop the 16-byte interface header
and reach a 48-byte Datum.
What changed and how does it work?
Core change — Datum struct layout
pkg/types/datum.go:x any→decPtr *MyDecimal.GetMysqlDecimal/SetMysqlDecimalused.decPtrdirectly (notype assertion, no interface unbox).
SetNull/SetMinNotNullcleardecPtr(wasd.x = nil).MarshalJSON's "unexpected non-nil x" guard now checksdecPtr.pkg/types/datum_test.go:TestMarshalDatumandTestComputePlusadapted to the new field name / shape (the former compared the
interface-held value by type; now it compares the typed
*MyDecimaldirectly for decimal datums and uses the
GetMysqlTime()facade fortime datums, matching the Option B approach).
Bug caught and fixed along the way
GetMysqlDurationnow restores theint(int8(d.decimal))sign-extensionwhen unpacking
Fsp.d.decimalisuint8(after Option A), butcallers encode
Fsp = -1(the "unspecified" sentinel, seetypes.UnspecifiedFsp), which round-trips as 255 in the narrow fieldand needs the
int8cast to come back as-1. Surfaced byTestMemJsonObjectagg, which formats a Duration withFsp = -1— themissing sign-extension produced
s[0:255]on a 6-byte string insideDuration.formatFracand crashed. This was a latent defect introducedby Option A's narrowing of
decimal; fixing it here also makes OptionA + friends safer in isolation.
Measurements
Linux/amd64, Intel i7-13700,
benchstat -count=3comparing A+B+pre-Cagainst A+B+pre-C+C (i.e. the effect of this PR's single commit on
top of the stack).
unsafe.Sizeof(types.Datum)BenchmarkDecodeWithSizeB/op (codec,make([]Datum, 100))BenchmarkDecodeWithOutSizeB/op (codec)Full journey from master:
BenchmarkDecodeWithSizegoes 8192 → 6144(A) → 4864 (A+B+pre-C+C) = −40.6% B/op.
Dependencies
This PR cannot land before its parents. The branch contains them as
parent commits so reviewers can see the full stack:
When the parents merge, rebasing this PR onto master should reduce to
the single Option C commit (
ea7722bc34).Verification
Local runs clean on the touched surface:
pkg/types/...,pkg/types/parser_driver/...,pkg/util/codec/...,pkg/util/collate/...,pkg/util/ranger/...,pkg/tablecodec/...,pkg/lightning/backend/kv/...,pkg/executor/aggfuncs/...: all pass.pkg/util/chunk/...: the failpoint-gated tests(
TestPanicWhenSpillToDisk,TestPanicDuringSortedRowContainerSpill)pass via
tools/check/failpoint-go-test.sh. Also pre-existing onmaster.
SELECT * FROM t TABLESAMPLE (33.3 + 44.4)parses through the TiDBparser driver without panic (same smoke test as *: remove KindInterface escape hatch from types.Datum #68007).
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Bug Fixes
Tests
Benchmarks
Breaking Changes