Skip to content

perf: eliminate reflection overhead in RowData() and type instantiation#779

Merged
dkropachev merged 4 commits intoscylladb:masterfrom
mykaul:rowdata_reflection_elimination
Apr 5, 2026
Merged

perf: eliminate reflection overhead in RowData() and type instantiation#779
dkropachev merged 4 commits intoscylladb:masterfrom
mykaul:rowdata_reflection_elimination

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Mar 16, 2026

Summary

Eliminate reflection overhead in the hot path of RowData()NewWithError() by adding direct type instantiation fast paths for all native CQL types, common collection patterns, and tuples.

Extracted from #699.


Commit 1: Optimize RowData() allocation and assignment performance

  • Pre-size slices using iter.meta.actualColCount (accounts for tuple expansion)
  • Replace append with direct slice indexing to avoid bounds checking overhead
  • Add comprehensive benchmark suite in helpers_bench_test.go

Commit 2: Eliminate reflection in NativeType.NewWithError()

  • Add type switch with direct new() calls for all 17 native CQL types
  • Falls back to reflection only for TypeCustom and complex types

Commit 3: Eliminate reflection for collection and tuple types

  • Fast paths for common list/set element types (int, int64, string, bool, float32, float64, UUID, time.Time, etc.)
  • Fast paths for common map key/value combinations (string→int, string→string, etc.)
  • TupleTypeInfo.NewWithError() now always returns new([]interface{}) — no reflection needed

Commit 4: Harden RowData + improve NewWithError test coverage

  • Add defensive guard in RowData() to detect actualColCount metadata inconsistency (returns clear error instead of silent corruption or index-out-of-bounds panic)
  • Set iter.err in column count mismatch guard to match error contract
  • Add overflow bounds checks before slice indexing to prevent panics
  • Add TypeFloat fast-path for map values in CollectionType.NewWithError()
  • Move TestNewWithErrorConsistentWithGoType to marshal_test.go (unit build tag)
  • Add TestCollectionNewWithErrorConsistentWithGoType to verify collection fast-paths stay in sync with goType()

Benchmark results

origin/master vs this branch — benchstat comparison (10 runs each, Intel i7-1270P).

All results statistically significant at p=0.000, n=10.

Speed — 55.2% geomean improvement

Benchmark master (ns/op) PR (ns/op) vs base
RowData-16 477.8 ±3% 198.5 ±2% -58.47%
RowDataSmall-16 164.0 ±2% 75.2 ±2% -54.15%
RowDataLarge-16 2286.5 ±3% 840.4 ±1% -63.24%
RowDataWithTypes-16 547.8 ±3% 248.8 ±2% -54.58%
RowDataWithTuples-16 641.9 ±1% 417.7 ±2% -34.94%
RowDataRepeated-16 49.16 µs ±2% 20.80 µs ±2% -57.68%
Alloc/10cols-16 496.9 ±1% 211.0 ±4% -57.55%
Alloc/100cols-16 4611 ±3% 1694 ±2% -63.26%
Alloc/1000cols-16 44960 ±3% 16500 ±1% -63.29%
Alloc/WithTuples-16 647.8 ±1% 426.2 ±2% -34.22%
geomean 1704 764.5 -55.15%

Memory (B/op) — 44.1% geomean reduction

Benchmark master (B) PR (B) vs base
RowData-16 720 ±0% 400 ±0% -44.44%
RowDataSmall-16 216 ±0% 120 ±0% -44.44%
RowDataLarge-16 3,792 ±0% 2,192 ±0% -42.19%
RowDataWithTypes-16 776 ±0% 456 ±0% -41.24%
RowDataWithTuples-16 616 ±0% 328 ±0% -46.75%
RowDataRepeated-16 72,000 ±0% 40,000 ±0% -44.44%
Alloc/10cols-16 720 ±0% 400 ±0% -44.44%
Alloc/100cols-16 7,584 ±0% 4,384 ±0% -42.19%
Alloc/1000cols-16 72,768 ±0% 40,768 ±0% -43.98%
Alloc/WithTuples-16 616 ±0% 328 ±0% -46.75%

Allocations (allocs/op) — 44.0% geomean reduction

Benchmark master PR vs base
RowData-16 22 ±0% 12 ±0% -45.45%
RowDataSmall-16 8 ±0% 5 ±0% -37.50%
RowDataLarge-16 102 ±0% 52 ±0% -49.02%
RowDataWithTypes-16 22 ±0% 12 ±0% -45.45%
RowDataWithTuples-16 20 ±0% 13 ±0% -35.00%
RowDataRepeated-16 2,200 ±0% 1,200 ±0% -45.45%
Alloc/10cols-16 22 ±0% 12 ±0% -45.45%
Alloc/100cols-16 202 ±0% 102 ±0% -49.50%
Alloc/1000cols-16 2,002 ±0% 1,002 ±0% -49.95%
Alloc/WithTuples-16 20 ±0% 13 ±0% -35.00%

Every column in RowData() calls NewWithError(), making this optimization highly impactful for queries with many columns. The improvement compounds: pre-sizing eliminates reallocation, and direct instantiation eliminates reflection — together achieving ~55-63% speedup with ~44% fewer allocations.

Related PRs

The following changes were previously bundled in this PR and have been split out:

Signed-off-by: Yaniv Kaul yaniv.kaul@scylladb.com

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes common scan/unmarshal setup paths in the driver by reducing reflection and allocation overhead when creating destination values for row scanning (notably in TypeInfo.NewWithError() and Iter.RowData()), and adds benchmarks to measure the effect.

Changes:

  • Add fast paths in NativeType.NewWithError() and CollectionType.NewWithError() to avoid reflection for common primitive and collection types.
  • Optimize Iter.RowData() by pre-sizing slices using actualColCount and filling via direct indexing (including tuple expansion).
  • Add helpers_bench_test.go benchmarks to measure RowData() performance across column counts/types/tuples.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
marshal.go Adds fast-path allocations for common types/collections and simplifies tuple destination creation.
helpers.go Improves RowData() allocation/indexing strategy using actualColCount and tuple expansion logic.
helpers_bench_test.go Introduces benchmarks for RowData() in several representative scenarios.

Comment thread marshal.go
Comment thread marshal.go Outdated
@mykaul mykaul force-pushed the rowdata_reflection_elimination branch from ec75393 to 1f4c312 Compare March 17, 2026 18:07
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Mar 17, 2026

Addressed review feedback:

Fixed (this push):

  1. Added TestNewWithErrorConsistentWithGoType — a unit test that iterates all NativeType type codes and verifies the fast-path switch in NewWithError() produces the same Go type as the canonical goType() function. This catches any future drift between the two mappings.
  2. Fixed tuple comment in TupleTypeInfo.NewWithError(): changed "Tuples are always []interface{}" to clarify it returns *[]interface{} (pointer to slice), matching the convention of other NewWithError implementations.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes common allocation paths in the driver by adding non-reflection “fast paths” for TypeInfo.NewWithError() and by reducing allocations in Iter.RowData(). It also adds unit tests to keep the new fast-path mappings consistent with the canonical goType() mapping, plus benchmarks for RowData() performance.

Changes:

  • Add fast-path implementations for NativeType.NewWithError() and CollectionType.NewWithError() to avoid reflection for common primitive/collection types.
  • Optimize Iter.RowData() by pre-sizing slices to actualColCount and using direct indexing (including tuple expansion).
  • Add unit tests ensuring NewWithError() fast paths stay consistent with goType(), and add RowData() benchmarks.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
marshal.go Adds fast-path NewWithError() implementations and simplifies tuple NewWithError().
marshal_test.go Adds tests to ensure fast-path type allocation matches goType()’s canonical mapping.
helpers.go Optimizes Iter.RowData() allocations and indexing using actualColCount, adds mismatch error.
helpers_bench_test.go Adds microbenchmarks for Iter.RowData() across common shapes (wide/narrow/tuples).

Comment thread helpers.go Outdated
Comment thread marshal.go
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets hot-path performance in result scanning and host selection by removing reflection-heavy allocations and reducing per-query heap churn in caches and iterators.

Changes:

  • Optimize Iter.RowData() allocation/indexing and add defensive metadata consistency checks.
  • Add direct instantiation fast paths in NativeType.NewWithError() / CollectionType.NewWithError() / TupleTypeInfo.NewWithError() to reduce reflection overhead, plus unit tests to keep mappings consistent with goType().
  • Reduce allocations in prepared statement caching (struct cache key + generic LRU) and token-aware host selection (in-place shuffle, in-place partition, small “used host” set), plus benchmarks.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
session.go Updates LRU instantiations to use the new generic lru.Cache[K] types.
prepared_cache.go Replaces concatenated string cache keys with a composite struct key and updates cache APIs accordingly.
prepared_cache_bench_test.go Adds a micro-benchmark for the new prepared-statement cache key construction.
policies.go Adds allocation-reducing helpers (hostSet, in-place shuffle/partition) and applies them in token-aware selection.
policies_test.go Adjusts token-aware policy test setup/expectations to match new selection behavior.
policies_bench_test.go Adds benchmarks for the new policy helpers.
marshal.go Adds NewWithError() fast paths for native and common collection types; simplifies tuple allocation path.
marshal_test.go Adds unit tests ensuring NewWithError() fast paths remain consistent with goType().
internal/lru/lru.go Generalizes the internal LRU to Cache[K comparable] to avoid interface key boxing and enable struct keys.
internal/lru/lru_test.go Updates tests for generic cache and adds coverage for struct keys.
helpers.go Reworks RowData() to pre-size and fill slices by index and to error on metadata inconsistencies.
helpers_bench_test.go Adds a benchmark suite for RowData() scenarios (simple, wide, tuples, repeated calls).
conn.go Pools callReq objects via sync.Pool and switches prepareStatement to struct cache keys.
conn_bench_test.go Adds a benchmark comparing pooled vs unpooled callReq allocation.

Comment thread policies.go Outdated
Comment thread marshal.go Outdated
@mykaul mykaul force-pushed the rowdata_reflection_elimination branch from 6e774a2 to a08bf20 Compare March 23, 2026 10:54
@mykaul mykaul requested a review from Copilot March 23, 2026 10:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

conn.go:1230

  • With callReq now coming from a sync.Pool, the early-return path when addCall() fails should return resources: put the callReq back in the pool and clear the allocated stream (c.streams.Clear(stream)) when appropriate. Otherwise, failures (e.g., unexpected existingCall) will leak the callReq object and can also leak a stream slot.
	call := getCallReq(stream)

	if c.streamObserver != nil {
		call.streamObserverContext = c.streamObserver.StreamContext(ctx)
	}

	if err := c.addCall(call); err != nil {
		return nil, &QueryError{err: err, potentiallyExecuted: false}
	}

Comment thread policies.go Outdated
Comment thread policies.go Outdated
Comment thread prepared_cache_bench_test.go Outdated
@mykaul mykaul force-pushed the rowdata_reflection_elimination branch from a08bf20 to b8edd55 Compare March 23, 2026 13:29
@mykaul mykaul requested a review from Copilot March 24, 2026 18:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

conn.go:1230

  • If addCall(call) fails, exec() currently returns without clearing the reserved stream ID and without returning the callReq to the pool. This can leak streams (reducing AvailableStreams over time) and defeats the pooling optimization. Consider releasing the stream (and returning the callReq to the pool) on this error path before returning.
	call := getCallReq(stream)

	if c.streamObserver != nil {
		call.streamObserverContext = c.streamObserver.StreamContext(ctx)
	}

	if err := c.addCall(call); err != nil {
		return nil, &QueryError{err: err, potentiallyExecuted: false}
	}

Comment thread policies.go Outdated
@dkropachev
Copy link
Copy Markdown
Collaborator

@mykaul , it looks good, could you please rebase it on master, i have another PR on top on this guy

@mykaul mykaul force-pushed the rowdata_reflection_elimination branch from 37e2e7c to 342ebb9 Compare April 3, 2026 21:03
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 3, 2026

@mykaul , it looks good, could you please rebase it on master, i have another PR on top on this guy

done.

Comment thread prepared_cache_bench_test.go Outdated
@dkropachev
Copy link
Copy Markdown
Collaborator

@mykaul , needs another rebase

@dkropachev dkropachev marked this pull request as ready for review April 5, 2026 18:41
mykaul added 4 commits April 5, 2026 21:52
Use pre-sizing and direct indexing in RowData() to improve
performance and reduce allocations, especially for queries with tuples.

Changes:
- Pre-size slices using iter.meta.actualColCount instead of len(iter.Columns())
  to account for tuple expansion, eliminating reallocation
- Replace append operations with direct slice indexing to avoid bounds
  checking and length update overhead

Performance improvements (measured on Intel i7-1270P, single core):
- Regular columns: 2-4% faster across all column counts
- Tuple columns: 12-13% faster with 128 bytes less memory and 2 fewer
  allocations per RowData() call

Benchmark results:
                              Baseline        Optimized       Improvement
BenchmarkRowData              1122 ns/op      1092 ns/op      2.7% faster
BenchmarkRowDataWithTuples    1556 ns/op      1369 ns/op      12.0% faster
                              616 B, 20 allocs 488 B, 18 allocs
BenchmarkRowDataAllocation/100cols   10580 ns/op     10130 ns/op     4.3% faster
BenchmarkRowDataAllocation/WithTuples 1602 ns/op     1390 ns/op      13.2% faster
                              616 B, 20 allocs 488 B, 18 allocs

Added comprehensive benchmark suite in helpers_bench_test.go to measure
RowData() performance across various column counts and data types.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Add fast-path type instantiation for all native CQL types to avoid
expensive reflection calls when creating column value holders.

Changes:
- Added type switch in NativeType.NewWithError() with direct instantiation
  for all 17 native types (int, bigint, text, uuid, timestamp, etc.)
- Falls back to reflection only for TypeCustom and complex types
- Added required imports: time, math/big, gopkg.in/inf.v0

Performance improvements (measured on Intel i7-1270P, single core):
Combined with previous pre-sizing/indexing optimization, achieves:
- 35-66% faster RowData() across all workloads
- 40-47% less memory allocation
- 35-50% fewer allocations

The reflection elimination provides 50-65% speedup on top of the
pre-sizing optimization, with improvement scaling with column count.

Benchmark comparison (baseline → optimized):
BenchmarkRowData              1122 ns/op → 463 ns/op   (58.7% faster)
                              720 B, 22 allocs → 400 B, 12 allocs
BenchmarkRowDataLarge         5220 ns/op → 1865 ns/op  (64.3% faster)
                              3792 B, 102 allocs → 2192 B, 52 allocs
BenchmarkRowDataAllocation/100cols   10580 ns/op → 3624 ns/op (65.7% faster)
                              7584 B, 202 allocs → 4384 B, 102 allocs
BenchmarkRowDataAllocation/1000cols  103552 ns/op → 35607 ns/op (65.6% faster)
                              72768 B, 2002 allocs → 40768 B, 1002 allocs

Every column in RowData() calls NewWithError(), making this optimization
highly impactful for queries with many columns. The improvement compounds
with the previous commit's pre-sizing and direct indexing changes.

Same improvement can be done (in a separate PR) to collections and tuples
(and their NewWithError() functions)

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
…d tuple types

Similiar to previous commit:
Add type-specific fast paths in CollectionType.NewWithError() and
TupleTypeInfo.NewWithError() to avoid expensive reflection calls during
row data allocation.

Changes:
- CollectionType.NewWithError(): Fast paths for common patterns:
  * Lists/sets: []int, []int64, []string, []bool, []float32, []float64,
    []UUID, []time.Time, []int16, []int8, [][]byte
  * Maps: map[string]int, map[string]int64, map[string]string,
    map[string]bool, map[string]float64, map[string]UUID,
    map[int]string, map[int]int
  * Falls back to reflection for complex nested collections

- TupleTypeInfo.NewWithError(): Simplified to always return
  new([]interface{}) since tuples unmarshal to []interface{} regardless
  of element types, completely eliminating reflection

(Note - we may need to think of moving from interface to any? )

Performance impact:
- Tuple-heavy queries: ~3% faster (1047→1017 ns/op)
- Maintains performance for primitive-heavy workloads
- Part of broader RowData() optimization series:
  * Combined improvements: 58.7% faster overall
  * Memory: -44% (720→400 B/op)
  * Allocations: -45% (22→12 allocs/op)

Benchmarks show targeted benefits for queries using collections and
tuples while preserving fast-path performance for queries dominated by
native types.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
- Add defensive guard in RowData() to detect actualColCount metadata
  inconsistency: returns a clear error instead of silently producing
  trailing zero-values or panicking on index-out-of-bounds.

- Set iter.err in column count mismatch guard to match error contract.

- Add overflow bounds checks before slice indexing to prevent panics.

- Add TypeFloat fast-path for map values in CollectionType.NewWithError()
  for both string-key and int-key maps, avoiding reflection for
  map[string]float32 and map[int]float32.

- Move TestNewWithErrorConsistentWithGoType from helpers_bench_test.go
  to marshal_test.go (with unit build tag) where it logically belongs.

- Add TestCollectionNewWithErrorConsistentWithGoType to verify that
  CollectionType.NewWithError() fast-paths stay in sync with goType()
  for list, set, and map types with common element/key combinations.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
@mykaul mykaul force-pushed the rowdata_reflection_elimination branch from f9ec857 to 79f26ef Compare April 5, 2026 18:53
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 5, 2026

@mykaul , needs another rebase

done.

@dkropachev dkropachev merged commit f0df6a5 into scylladb:master Apr 5, 2026
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants