perf: eliminate reflection overhead in RowData() and type instantiation#779
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes common scan/unmarshal setup paths in the driver by reducing reflection and allocation overhead when creating destination values for row scanning (notably in TypeInfo.NewWithError() and Iter.RowData()), and adds benchmarks to measure the effect.
Changes:
- Add fast paths in
NativeType.NewWithError()andCollectionType.NewWithError()to avoid reflection for common primitive and collection types. - Optimize
Iter.RowData()by pre-sizing slices usingactualColCountand filling via direct indexing (including tuple expansion). - Add
helpers_bench_test.gobenchmarks to measureRowData()performance across column counts/types/tuples.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| marshal.go | Adds fast-path allocations for common types/collections and simplifies tuple destination creation. |
| helpers.go | Improves RowData() allocation/indexing strategy using actualColCount and tuple expansion logic. |
| helpers_bench_test.go | Introduces benchmarks for RowData() in several representative scenarios. |
ec75393 to
1f4c312
Compare
|
Addressed review feedback: Fixed (this push):
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes common allocation paths in the driver by adding non-reflection “fast paths” for TypeInfo.NewWithError() and by reducing allocations in Iter.RowData(). It also adds unit tests to keep the new fast-path mappings consistent with the canonical goType() mapping, plus benchmarks for RowData() performance.
Changes:
- Add fast-path implementations for
NativeType.NewWithError()andCollectionType.NewWithError()to avoid reflection for common primitive/collection types. - Optimize
Iter.RowData()by pre-sizing slices toactualColCountand using direct indexing (including tuple expansion). - Add unit tests ensuring
NewWithError()fast paths stay consistent withgoType(), and addRowData()benchmarks.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| marshal.go | Adds fast-path NewWithError() implementations and simplifies tuple NewWithError(). |
| marshal_test.go | Adds tests to ensure fast-path type allocation matches goType()’s canonical mapping. |
| helpers.go | Optimizes Iter.RowData() allocations and indexing using actualColCount, adds mismatch error. |
| helpers_bench_test.go | Adds microbenchmarks for Iter.RowData() across common shapes (wide/narrow/tuples). |
There was a problem hiding this comment.
Pull request overview
This PR targets hot-path performance in result scanning and host selection by removing reflection-heavy allocations and reducing per-query heap churn in caches and iterators.
Changes:
- Optimize
Iter.RowData()allocation/indexing and add defensive metadata consistency checks. - Add direct instantiation fast paths in
NativeType.NewWithError()/CollectionType.NewWithError()/TupleTypeInfo.NewWithError()to reduce reflection overhead, plus unit tests to keep mappings consistent withgoType(). - Reduce allocations in prepared statement caching (struct cache key + generic LRU) and token-aware host selection (in-place shuffle, in-place partition, small “used host” set), plus benchmarks.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| session.go | Updates LRU instantiations to use the new generic lru.Cache[K] types. |
| prepared_cache.go | Replaces concatenated string cache keys with a composite struct key and updates cache APIs accordingly. |
| prepared_cache_bench_test.go | Adds a micro-benchmark for the new prepared-statement cache key construction. |
| policies.go | Adds allocation-reducing helpers (hostSet, in-place shuffle/partition) and applies them in token-aware selection. |
| policies_test.go | Adjusts token-aware policy test setup/expectations to match new selection behavior. |
| policies_bench_test.go | Adds benchmarks for the new policy helpers. |
| marshal.go | Adds NewWithError() fast paths for native and common collection types; simplifies tuple allocation path. |
| marshal_test.go | Adds unit tests ensuring NewWithError() fast paths remain consistent with goType(). |
| internal/lru/lru.go | Generalizes the internal LRU to Cache[K comparable] to avoid interface key boxing and enable struct keys. |
| internal/lru/lru_test.go | Updates tests for generic cache and adds coverage for struct keys. |
| helpers.go | Reworks RowData() to pre-size and fill slices by index and to error on metadata inconsistencies. |
| helpers_bench_test.go | Adds a benchmark suite for RowData() scenarios (simple, wide, tuples, repeated calls). |
| conn.go | Pools callReq objects via sync.Pool and switches prepareStatement to struct cache keys. |
| conn_bench_test.go | Adds a benchmark comparing pooled vs unpooled callReq allocation. |
6e774a2 to
a08bf20
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
conn.go:1230
- With callReq now coming from a sync.Pool, the early-return path when addCall() fails should return resources: put the callReq back in the pool and clear the allocated stream (c.streams.Clear(stream)) when appropriate. Otherwise, failures (e.g., unexpected existingCall) will leak the callReq object and can also leak a stream slot.
call := getCallReq(stream)
if c.streamObserver != nil {
call.streamObserverContext = c.streamObserver.StreamContext(ctx)
}
if err := c.addCall(call); err != nil {
return nil, &QueryError{err: err, potentiallyExecuted: false}
}
a08bf20 to
b8edd55
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
conn.go:1230
- If addCall(call) fails, exec() currently returns without clearing the reserved stream ID and without returning the callReq to the pool. This can leak streams (reducing AvailableStreams over time) and defeats the pooling optimization. Consider releasing the stream (and returning the callReq to the pool) on this error path before returning.
call := getCallReq(stream)
if c.streamObserver != nil {
call.streamObserverContext = c.streamObserver.StreamContext(ctx)
}
if err := c.addCall(call); err != nil {
return nil, &QueryError{err: err, potentiallyExecuted: false}
}
|
@mykaul , it looks good, could you please rebase it on master, i have another PR on top on this guy |
37e2e7c to
342ebb9
Compare
done. |
342ebb9 to
0f2e686
Compare
0f2e686 to
f9ec857
Compare
|
@mykaul , needs another rebase |
Use pre-sizing and direct indexing in RowData() to improve
performance and reduce allocations, especially for queries with tuples.
Changes:
- Pre-size slices using iter.meta.actualColCount instead of len(iter.Columns())
to account for tuple expansion, eliminating reallocation
- Replace append operations with direct slice indexing to avoid bounds
checking and length update overhead
Performance improvements (measured on Intel i7-1270P, single core):
- Regular columns: 2-4% faster across all column counts
- Tuple columns: 12-13% faster with 128 bytes less memory and 2 fewer
allocations per RowData() call
Benchmark results:
Baseline Optimized Improvement
BenchmarkRowData 1122 ns/op 1092 ns/op 2.7% faster
BenchmarkRowDataWithTuples 1556 ns/op 1369 ns/op 12.0% faster
616 B, 20 allocs 488 B, 18 allocs
BenchmarkRowDataAllocation/100cols 10580 ns/op 10130 ns/op 4.3% faster
BenchmarkRowDataAllocation/WithTuples 1602 ns/op 1390 ns/op 13.2% faster
616 B, 20 allocs 488 B, 18 allocs
Added comprehensive benchmark suite in helpers_bench_test.go to measure
RowData() performance across various column counts and data types.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Add fast-path type instantiation for all native CQL types to avoid
expensive reflection calls when creating column value holders.
Changes:
- Added type switch in NativeType.NewWithError() with direct instantiation
for all 17 native types (int, bigint, text, uuid, timestamp, etc.)
- Falls back to reflection only for TypeCustom and complex types
- Added required imports: time, math/big, gopkg.in/inf.v0
Performance improvements (measured on Intel i7-1270P, single core):
Combined with previous pre-sizing/indexing optimization, achieves:
- 35-66% faster RowData() across all workloads
- 40-47% less memory allocation
- 35-50% fewer allocations
The reflection elimination provides 50-65% speedup on top of the
pre-sizing optimization, with improvement scaling with column count.
Benchmark comparison (baseline → optimized):
BenchmarkRowData 1122 ns/op → 463 ns/op (58.7% faster)
720 B, 22 allocs → 400 B, 12 allocs
BenchmarkRowDataLarge 5220 ns/op → 1865 ns/op (64.3% faster)
3792 B, 102 allocs → 2192 B, 52 allocs
BenchmarkRowDataAllocation/100cols 10580 ns/op → 3624 ns/op (65.7% faster)
7584 B, 202 allocs → 4384 B, 102 allocs
BenchmarkRowDataAllocation/1000cols 103552 ns/op → 35607 ns/op (65.6% faster)
72768 B, 2002 allocs → 40768 B, 1002 allocs
Every column in RowData() calls NewWithError(), making this optimization
highly impactful for queries with many columns. The improvement compounds
with the previous commit's pre-sizing and direct indexing changes.
Same improvement can be done (in a separate PR) to collections and tuples
(and their NewWithError() functions)
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
…d tuple types
Similiar to previous commit:
Add type-specific fast paths in CollectionType.NewWithError() and
TupleTypeInfo.NewWithError() to avoid expensive reflection calls during
row data allocation.
Changes:
- CollectionType.NewWithError(): Fast paths for common patterns:
* Lists/sets: []int, []int64, []string, []bool, []float32, []float64,
[]UUID, []time.Time, []int16, []int8, [][]byte
* Maps: map[string]int, map[string]int64, map[string]string,
map[string]bool, map[string]float64, map[string]UUID,
map[int]string, map[int]int
* Falls back to reflection for complex nested collections
- TupleTypeInfo.NewWithError(): Simplified to always return
new([]interface{}) since tuples unmarshal to []interface{} regardless
of element types, completely eliminating reflection
(Note - we may need to think of moving from interface to any? )
Performance impact:
- Tuple-heavy queries: ~3% faster (1047→1017 ns/op)
- Maintains performance for primitive-heavy workloads
- Part of broader RowData() optimization series:
* Combined improvements: 58.7% faster overall
* Memory: -44% (720→400 B/op)
* Allocations: -45% (22→12 allocs/op)
Benchmarks show targeted benefits for queries using collections and
tuples while preserving fast-path performance for queries dominated by
native types.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
- Add defensive guard in RowData() to detect actualColCount metadata inconsistency: returns a clear error instead of silently producing trailing zero-values or panicking on index-out-of-bounds. - Set iter.err in column count mismatch guard to match error contract. - Add overflow bounds checks before slice indexing to prevent panics. - Add TypeFloat fast-path for map values in CollectionType.NewWithError() for both string-key and int-key maps, avoiding reflection for map[string]float32 and map[int]float32. - Move TestNewWithErrorConsistentWithGoType from helpers_bench_test.go to marshal_test.go (with unit build tag) where it logically belongs. - Add TestCollectionNewWithErrorConsistentWithGoType to verify that CollectionType.NewWithError() fast-paths stay in sync with goType() for list, set, and map types with common element/key combinations. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
f9ec857 to
79f26ef
Compare
done. |
Summary
Eliminate reflection overhead in the hot path of
RowData()→NewWithError()by adding direct type instantiation fast paths for all native CQL types, common collection patterns, and tuples.Extracted from #699.
Commit 1: Optimize RowData() allocation and assignment performance
iter.meta.actualColCount(accounts for tuple expansion)appendwith direct slice indexing to avoid bounds checking overheadhelpers_bench_test.goCommit 2: Eliminate reflection in NativeType.NewWithError()
new()calls for all 17 native CQL typesTypeCustomand complex typesCommit 3: Eliminate reflection for collection and tuple types
TupleTypeInfo.NewWithError()now always returnsnew([]interface{})— no reflection neededCommit 4: Harden RowData + improve NewWithError test coverage
RowData()to detectactualColCountmetadata inconsistency (returns clear error instead of silent corruption or index-out-of-bounds panic)iter.errin column count mismatch guard to match error contractTypeFloatfast-path for map values inCollectionType.NewWithError()TestNewWithErrorConsistentWithGoTypetomarshal_test.go(unit build tag)TestCollectionNewWithErrorConsistentWithGoTypeto verify collection fast-paths stay in sync withgoType()Benchmark results
origin/mastervs this branch —benchstatcomparison (10 runs each, Intel i7-1270P).All results statistically significant at p=0.000, n=10.
Speed — 55.2% geomean improvement
Memory (B/op) — 44.1% geomean reduction
Allocations (allocs/op) — 44.0% geomean reduction
Every column in
RowData()callsNewWithError(), making this optimization highly impactful for queries with many columns. The improvement compounds: pre-sizing eliminates reallocation, and direct instantiation eliminates reflection — together achieving ~55-63% speedup with ~44% fewer allocations.Related PRs
The following changes were previously bundled in this PR and have been split out:
Signed-off-by: Yaniv Kaul yaniv.kaul@scylladb.com