perf: eliminate reflection overhead in RowData() and type instantiation by mykaul · Pull Request #779 · scylladb/gocql

mykaul · 2026-03-16T18:02:29Z

Summary

Eliminate reflection overhead in the hot path of RowData() → NewWithError() by adding direct type instantiation fast paths for all native CQL types, common collection patterns, and tuples.

Extracted from #699.

Commit 1: Optimize RowData() allocation and assignment performance

Pre-size slices using iter.meta.actualColCount (accounts for tuple expansion)
Replace append with direct slice indexing, avoiding append's per-call capacity check and length update
Add comprehensive benchmark suite in helpers_bench_test.go
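A minimal sketch of the Commit 1 idea, assuming a simplified column model: column, actualColCount and rowDataIndexed are hypothetical names, not gocql's API, and the name[i] naming of tuple elements is an assumption. Both slices are allocated once at their final size (the slot count after tuple expansion) and filled by index, so append never has to grow them.

```go
package main

import "fmt"

// column is a hypothetical stand-in for gocql column metadata. A tuple
// column with tupleElems > 0 expands into that many destination slots.
type column struct {
	name       string
	tupleElems int
}

// actualColCount counts slots after tuple expansion, playing the role of
// iter.meta.actualColCount in the PR.
func actualColCount(cols []column) int {
	n := 0
	for _, c := range cols {
		if c.tupleElems > 0 {
			n += c.tupleElems
		} else {
			n++
		}
	}
	return n
}

// rowDataIndexed allocates each slice once at its final size and writes
// by index, so the backing arrays are never reallocated and copied.
func rowDataIndexed(cols []column, count int) ([]string, []interface{}) {
	names := make([]string, count)
	values := make([]interface{}, count)
	i := 0
	for _, c := range cols {
		if c.tupleElems == 0 {
			names[i], values[i] = c.name, new(string) // placeholder destination
			i++
			continue
		}
		for j := 0; j < c.tupleElems; j++ {
			// Element naming is an assumption made for this sketch.
			names[i], values[i] = fmt.Sprintf("%s[%d]", c.name, j), new(string)
			i++
		}
	}
	return names, values
}

func main() {
	cols := []column{{name: "id"}, {name: "pt", tupleElems: 2}, {name: "v"}}
	names, values := rowDataIndexed(cols, actualColCount(cols))
	fmt.Println(names, len(values)) // [id pt[0] pt[1] v] 4
}
```

If the count passed in is smaller than the real expansion, the indexed writes panic with an index out of range; that is the case Commit 4's guard turns into an error.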

Commit 2: Eliminate reflection in NativeType.NewWithError()

Add type switch with direct new() calls for all 17 native CQL types
Falls back to reflection only for TypeCustom and complex types
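A sketch of the Commit 2 pattern under stated assumptions: Type, goType, newWithError and checkConsistent are simplified stand-ins, and the Go type chosen for each code is illustrative rather than gocql's exact table. Known codes get a direct new() call; anything else falls back to reflect.New(goType(t)). checkConsistent shows the kind of drift check that TestNewWithErrorConsistentWithGoType performs: the fast path must return a pointer to exactly the type goType reports.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Type is a hypothetical, trimmed-down set of CQL type codes.
type Type int

const (
	TypeInt Type = iota
	TypeBigInt
	TypeText
	TypeBoolean
	TypeDouble
	TypeTimestamp
	TypeCustom
)

// goType is the canonical reflection-based mapping. The Go types chosen
// here are illustrative assumptions, not gocql's exact table.
func goType(t Type) (reflect.Type, error) {
	switch t {
	case TypeInt:
		return reflect.TypeOf(int(0)), nil
	case TypeBigInt:
		return reflect.TypeOf(int64(0)), nil
	case TypeText:
		return reflect.TypeOf(""), nil
	case TypeBoolean:
		return reflect.TypeOf(false), nil
	case TypeDouble:
		return reflect.TypeOf(float64(0)), nil
	case TypeTimestamp:
		return reflect.TypeOf(time.Time{}), nil
	case TypeCustom:
		return reflect.TypeOf([]byte(nil)), nil
	}
	return nil, fmt.Errorf("unknown type code %d", t)
}

// newWithError returns a pointer to a fresh zero value for t. Common types
// use a direct new() call; anything else falls back to reflect.New.
func newWithError(t Type) (interface{}, error) {
	switch t {
	case TypeInt:
		return new(int), nil
	case TypeBigInt:
		return new(int64), nil
	case TypeText:
		return new(string), nil
	case TypeBoolean:
		return new(bool), nil
	case TypeDouble:
		return new(float64), nil
	case TypeTimestamp:
		return new(time.Time), nil
	}
	typ, err := goType(t)
	if err != nil {
		return nil, err
	}
	return reflect.New(typ).Interface(), nil
}

// checkConsistent verifies, for every type code, that the fast path
// returns a pointer to exactly goType(t).
func checkConsistent() error {
	for t := TypeInt; t <= TypeCustom; t++ {
		want, err := goType(t)
		if err != nil {
			return err
		}
		v, err := newWithError(t)
		if err != nil {
			return err
		}
		if got := reflect.TypeOf(v).Elem(); got != want {
			return fmt.Errorf("type %d: fast path %v, goType %v", t, got, want)
		}
	}
	return nil
}

func main() {
	v, _ := newWithError(TypeBigInt)
	fmt.Printf("%T\n", v)          // *int64
	fmt.Println(checkConsistent()) // <nil>
}
```

Keeping the reflection fallback means a code missing from the switch still works, only more slowly; the consistency check catches the opposite mistake, a fast-path case that disagrees with goType.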

Commit 3: Eliminate reflection for collection and tuple types

Fast paths for common list/set element types (int, int64, string, bool, float32, float64, UUID, time.Time, etc.)
Fast paths for common map key/value combinations (string→int, string→string, etc.)
TupleTypeInfo.NewWithError() now always returns new([]interface{}) — no reflection needed
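A sketch of the Commit 3 approach for collections and tuples. CollectionType, Kind, Elem, elemGoType and newTuple here are hypothetical simplifications, and only a few of the fast-path shapes listed above are included. Common shapes return via new(); other shapes are built with reflect.SliceOf or reflect.MapOf; tuples always get a *[]interface{}.

```go
package main

import (
	"fmt"
	"reflect"
)

// Kind and Elem are hypothetical, simplified CQL type descriptors.
type Kind int

const (
	KindList Kind = iota
	KindSet
	KindMap
)

type Elem int

const (
	ElemBigInt Elem = iota
	ElemText
	ElemDouble
	ElemBlob
)

// elemGoType is the assumed canonical mapping used by the reflection fallback.
func elemGoType(e Elem) reflect.Type {
	switch e {
	case ElemBigInt:
		return reflect.TypeOf(int64(0))
	case ElemText:
		return reflect.TypeOf("")
	case ElemDouble:
		return reflect.TypeOf(float64(0))
	}
	return reflect.TypeOf([]byte(nil))
}

// CollectionType describes list<Val>, set<Val> or map<Key, Val>.
type CollectionType struct {
	Kind     Kind
	Key, Val Elem
}

// NewWithError returns a pointer to an empty Go collection. A few common
// shapes use direct new() calls; the rest are built with reflection.
func (c CollectionType) NewWithError() (interface{}, error) {
	switch c.Kind {
	case KindList, KindSet:
		switch c.Val {
		case ElemBigInt:
			return new([]int64), nil
		case ElemText:
			return new([]string), nil
		}
		return reflect.New(reflect.SliceOf(elemGoType(c.Val))).Interface(), nil
	case KindMap:
		if c.Key == ElemText {
			switch c.Val {
			case ElemText:
				return new(map[string]string), nil
			case ElemBigInt:
				return new(map[string]int64), nil
			}
		}
		kt := elemGoType(c.Key)
		if !kt.Comparable() {
			// reflect.MapOf panics on non-comparable key types.
			return nil, fmt.Errorf("map key type %v is not comparable", kt)
		}
		return reflect.New(reflect.MapOf(kt, elemGoType(c.Val))).Interface(), nil
	}
	return nil, fmt.Errorf("unsupported collection kind %d", c.Kind)
}

// newTuple mirrors the tuple change: the destination is always a
// *[]interface{} whatever the element types are, so no reflection is needed.
func newTuple() (interface{}, error) {
	return new([]interface{}), nil
}

func main() {
	l, _ := CollectionType{Kind: KindList, Val: ElemBigInt}.NewWithError()
	m, _ := CollectionType{Kind: KindMap, Key: ElemText, Val: ElemDouble}.NewWithError()
	t, _ := newTuple()
	fmt.Printf("%T %T %T\n", l, m, t)
}
```

The Comparable() guard is specific to this sketch: reflect.MapOf panics when given a non-comparable key type such as []byte, so the fallback reports an error instead.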

Commit 4: Harden RowData + improve NewWithError test coverage

Add defensive guard in RowData() to detect actualColCount metadata inconsistency (returns clear error instead of silent corruption or index-out-of-bounds panic)
Set iter.err in column count mismatch guard to match error contract
Add overflow bounds checks before slice indexing to prevent panics
Add TypeFloat fast-path for map values in CollectionType.NewWithError()
Move TestNewWithErrorConsistentWithGoType to marshal_test.go (unit build tag)
Add TestCollectionNewWithErrorConsistentWithGoType to verify collection fast-paths stay in sync with goType()

Benchmark results

origin/master vs this branch — benchstat comparison (10 runs each, Intel i7-1270P).

All results statistically significant at p=0.000, n=10.

Speed — 55.2% geomean improvement

Benchmark	master (ns/op)	PR (ns/op)	vs base
RowData-16	477.8 ±3%	198.5 ±2%	-58.47%
RowDataSmall-16	164.0 ±2%	75.2 ±2%	-54.15%
RowDataLarge-16	2286.5 ±3%	840.4 ±1%	-63.24%
RowDataWithTypes-16	547.8 ±3%	248.8 ±2%	-54.58%
RowDataWithTuples-16	641.9 ±1%	417.7 ±2%	-34.94%
RowDataRepeated-16	49160 ±2%	20800 ±2%	-57.68%
Alloc/10cols-16	496.9 ±1%	211.0 ±4%	-57.55%
Alloc/100cols-16	4611 ±3%	1694 ±2%	-63.26%
Alloc/1000cols-16	44960 ±3%	16500 ±1%	-63.29%
Alloc/WithTuples-16	647.8 ±1%	426.2 ±2%	-34.22%
geomean	1704	764.5	-55.15%

Memory (B/op) — 44.1% geomean reduction

Benchmark	master (B)	PR (B)	vs base
RowData-16	720 ±0%	400 ±0%	-44.44%
RowDataSmall-16	216 ±0%	120 ±0%	-44.44%
RowDataLarge-16	3,792 ±0%	2,192 ±0%	-42.19%
RowDataWithTypes-16	776 ±0%	456 ±0%	-41.24%
RowDataWithTuples-16	616 ±0%	328 ±0%	-46.75%
RowDataRepeated-16	72,000 ±0%	40,000 ±0%	-44.44%
Alloc/10cols-16	720 ±0%	400 ±0%	-44.44%
Alloc/100cols-16	7,584 ±0%	4,384 ±0%	-42.19%
Alloc/1000cols-16	72,768 ±0%	40,768 ±0%	-43.98%
Alloc/WithTuples-16	616 ±0%	328 ±0%	-46.75%

Allocations (allocs/op) — 44.0% geomean reduction

Benchmark	master	PR	vs base
RowData-16	22 ±0%	12 ±0%	-45.45%
RowDataSmall-16	8 ±0%	5 ±0%	-37.50%
RowDataLarge-16	102 ±0%	52 ±0%	-49.02%
RowDataWithTypes-16	22 ±0%	12 ±0%	-45.45%
RowDataWithTuples-16	20 ±0%	13 ±0%	-35.00%
RowDataRepeated-16	2,200 ±0%	1,200 ±0%	-45.45%
Alloc/10cols-16	22 ±0%	12 ±0%	-45.45%
Alloc/100cols-16	202 ±0%	102 ±0%	-49.50%
Alloc/1000cols-16	2,002 ±0%	1,002 ±0%	-49.95%
Alloc/WithTuples-16	20 ±0%	13 ±0%	-35.00%

Every column in RowData() calls NewWithError(), so the per-column savings grow with the number of columns. The two changes compound: pre-sizing eliminates slice reallocation, and direct instantiation eliminates reflection. Together they give a 55% geomean speedup (up to ~63% on wide rows, ~35% on tuple-heavy rows) with ~44% fewer allocations (geomean).

Related PRs

The following changes were previously bundled in this PR and have been split out:

perf: generic LRU cache with struct key for prepared statement cache (#824)
perf: reduce token-aware Pick() allocations (#825)

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Copilot

Pull request overview

This PR optimizes common scan/unmarshal setup paths in the driver by reducing reflection and allocation overhead when creating destination values for row scanning (notably in TypeInfo.NewWithError() and Iter.RowData()), and adds benchmarks to measure the effect.

Changes:

Add fast paths in NativeType.NewWithError() and CollectionType.NewWithError() to avoid reflection for common primitive and collection types.
Optimize Iter.RowData() by pre-sizing slices using actualColCount and filling via direct indexing (including tuple expansion).
Add helpers_bench_test.go benchmarks to measure RowData() performance across column counts/types/tuples.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
marshal.go	Adds fast-path allocations for common types/collections and simplifies tuple destination creation.
helpers.go	Improves `RowData()` allocation/indexing strategy using `actualColCount` and tuple expansion logic.
helpers_bench_test.go	Introduces benchmarks for `RowData()` in several representative scenarios.

mykaul · 2026-03-17T18:07:14Z

Addressed review feedback:

Fixed (this push):

Added TestNewWithErrorConsistentWithGoType — a unit test that iterates all NativeType type codes and verifies the fast-path switch in NewWithError() produces the same Go type as the canonical goType() function. This catches any future drift between the two mappings.
Fixed tuple comment in TupleTypeInfo.NewWithError(): changed "Tuples are always []interface{}" to clarify it returns *[]interface{} (pointer to slice), matching the convention of other NewWithError implementations.

Copilot

Pull request overview

This PR optimizes common allocation paths in the driver by adding non-reflection “fast paths” for TypeInfo.NewWithError() and by reducing allocations in Iter.RowData(). It also adds unit tests to keep the new fast-path mappings consistent with the canonical goType() mapping, plus benchmarks for RowData() performance.

Changes:

Add fast-path implementations for NativeType.NewWithError() and CollectionType.NewWithError() to avoid reflection for common primitive/collection types.
Optimize Iter.RowData() by pre-sizing slices to actualColCount and using direct indexing (including tuple expansion).
Add unit tests ensuring NewWithError() fast paths stay consistent with goType(), and add RowData() benchmarks.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
marshal.go	Adds fast-path `NewWithError()` implementations and simplifies tuple `NewWithError()`.
marshal_test.go	Adds tests to ensure fast-path type allocation matches `goType()`’s canonical mapping.
helpers.go	Optimizes `Iter.RowData()` allocations and indexing using `actualColCount`, adds mismatch error.
helpers_bench_test.go	Adds microbenchmarks for `Iter.RowData()` across common shapes (wide/narrow/tuples).

Copilot

Pull request overview

This PR targets hot-path performance in result scanning and host selection by removing reflection-heavy allocations and reducing per-query heap churn in caches and iterators.

Changes:

Optimize Iter.RowData() allocation/indexing and add defensive metadata consistency checks.
Add direct instantiation fast paths in NativeType.NewWithError() / CollectionType.NewWithError() / TupleTypeInfo.NewWithError() to reduce reflection overhead, plus unit tests to keep mappings consistent with goType().
Reduce allocations in prepared statement caching (struct cache key + generic LRU) and token-aware host selection (in-place shuffle, in-place partition, small “used host” set), plus benchmarks.
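Two of the allocation-reducing helpers named above, sketched under assumptions: host, partitionInPlace and hostSet are hypothetical and are not the policies.go code. Partitioning by swapping reorders hosts inside the slice they already live in instead of building a second slice, and a slice-backed set with a linear scan avoids map overhead for the handful of replicas a query touches.

```go
package main

import "fmt"

// host is a hypothetical stand-in for gocql's host metadata.
type host struct {
	id string
	dc string
}

// partitionInPlace moves hosts that satisfy pred to the front of hs by
// swapping, instead of copying them into a new slice, and returns how many
// matched. Matching hosts keep their relative order; the rest may not.
func partitionInPlace(hs []host, pred func(host) bool) int {
	n := 0
	for i := range hs {
		if pred(hs[i]) {
			hs[n], hs[i] = hs[i], hs[n]
			n++
		}
	}
	return n
}

// hostSet is a small "already used" set backed by a slice; for a few
// entries a linear scan is cheap and avoids map overhead.
type hostSet struct{ ids []string }

// add reports whether id was newly added.
func (s *hostSet) add(id string) bool {
	for _, x := range s.ids {
		if x == id {
			return false
		}
	}
	s.ids = append(s.ids, id)
	return true
}

func main() {
	hs := []host{{"r1", "dc2"}, {"l1", "dc1"}, {"r2", "dc2"}, {"l2", "dc1"}}
	n := partitionInPlace(hs, func(h host) bool { return h.dc == "dc1" })
	fmt.Println(n, hs[0].id, hs[1].id) // 2 l1 l2

	var used hostSet
	fmt.Println(used.add("l1"), used.add("l1")) // true false
}
```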

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


File	Description
session.go	Updates LRU instantiations to use the new generic `lru.Cache[K]` types.
prepared_cache.go	Replaces concatenated string cache keys with a composite struct key and updates cache APIs accordingly.
prepared_cache_bench_test.go	Adds a micro-benchmark for the new prepared-statement cache key construction.
policies.go	Adds allocation-reducing helpers (`hostSet`, in-place shuffle/partition) and applies them in token-aware selection.
policies_test.go	Adjusts token-aware policy test setup/expectations to match new selection behavior.
policies_bench_test.go	Adds benchmarks for the new policy helpers.
marshal.go	Adds `NewWithError()` fast paths for native and common collection types; simplifies tuple allocation path.
marshal_test.go	Adds unit tests ensuring `NewWithError()` fast paths remain consistent with `goType()`.
internal/lru/lru.go	Generalizes the internal LRU to `Cache[K comparable]` to avoid interface key boxing and enable struct keys.
internal/lru/lru_test.go	Updates tests for generic cache and adds coverage for struct keys.
helpers.go	Reworks `RowData()` to pre-size and fill slices by index and to error on metadata inconsistencies.
helpers_bench_test.go	Adds a benchmark suite for `RowData()` scenarios (simple, wide, tuples, repeated calls).
conn.go	Pools `callReq` objects via `sync.Pool` and switches `prepareStatement` to struct cache keys.
conn_bench_test.go	Adds a benchmark comparing pooled vs unpooled `callReq` allocation.
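The prepared-statement cache change in this summary (later split out into #824) replaces concatenated string keys with a composite struct key on a generic LRU. A minimal sketch of that idea follows; Cache, New and stmtKey are hypothetical, and gocql's internal/lru API may differ. It needs Go 1.18+ for type parameters. A comparable struct can be used directly as a map key, so building a key needs no string concatenation and the key is never boxed in an interface.

```go
package main

import (
	"container/list"
	"fmt"
)

// Cache is a minimal generic LRU keyed by any comparable type K.
type Cache[K comparable] struct {
	limit int
	ll    *list.List // front = most recently used
	items map[K]*list.Element
}

type entry[K comparable] struct {
	key   K
	value interface{}
}

// New returns a Cache holding at most limit entries (limit <= 0 means unbounded).
func New[K comparable](limit int) *Cache[K] {
	return &Cache[K]{limit: limit, ll: list.New(), items: make(map[K]*list.Element)}
}

// Add inserts or updates key and evicts the least recently used entry
// once the cache is over capacity.
func (c *Cache[K]) Add(key K, value interface{}) {
	if e, ok := c.items[key]; ok {
		c.ll.MoveToFront(e)
		e.Value.(*entry[K]).value = value
		return
	}
	c.items[key] = c.ll.PushFront(&entry[K]{key: key, value: value})
	if c.limit > 0 && c.ll.Len() > c.limit {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry[K]).key)
	}
}

// Get returns the cached value and marks key as most recently used.
func (c *Cache[K]) Get(key K) (interface{}, bool) {
	if e, ok := c.items[key]; ok {
		c.ll.MoveToFront(e)
		return e.Value.(*entry[K]).value, true
	}
	return nil, false
}

// stmtKey is a hypothetical composite key: equal field values mean equal keys.
type stmtKey struct {
	hostID, keyspace, statement string
}

func main() {
	cache := New[stmtKey](2)
	cache.Add(stmtKey{"host-a", "ks", "SELECT 1"}, "prepared#1")
	v, ok := cache.Get(stmtKey{"host-a", "ks", "SELECT 1"})
	fmt.Println(v, ok) // prepared#1 true
}
```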

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

conn.go:1230

With callReq now coming from a sync.Pool, the early-return path when addCall() fails should return resources: put the callReq back in the pool and clear the allocated stream (c.streams.Clear(stream)) when appropriate. Otherwise, failures (e.g., unexpected existingCall) will leak the callReq object and can also leak a stream slot.

	call := getCallReq(stream)

	if c.streamObserver != nil {
		call.streamObserverContext = c.streamObserver.StreamContext(ctx)
	}

	if err := c.addCall(call); err != nil {
		return nil, &QueryError{err: err, potentiallyExecuted: false}
	}

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

conn.go:1230

If addCall(call) fails, exec() currently returns without clearing the reserved stream ID and without returning the callReq to the pool. This can leak streams (reducing AvailableStreams over time) and defeats the pooling optimization. Consider releasing the stream (and returning the callReq to the pool) on this error path before returning.

	call := getCallReq(stream)

	if c.streamObserver != nil {
		call.streamObserverContext = c.streamObserver.StreamContext(ctx)
	}

	if err := c.addCall(call); err != nil {
		return nil, &QueryError{err: err, potentiallyExecuted: false}
	}

dkropachev · 2026-04-03T18:33:41Z

@mykaul, it looks good. Could you please rebase it on master? I have another PR on top of this one.

mykaul · 2026-04-03T21:06:35Z

> @mykaul, it looks good. Could you please rebase it on master? I have another PR on top of this one.

done.

dkropachev · 2026-04-05T18:41:01Z

@mykaul, needs another rebase

Use pre-sizing and direct indexing in RowData() to improve performance and reduce allocations, especially for queries with tuples.

Changes:
- Pre-size slices using iter.meta.actualColCount instead of len(iter.Columns()) to account for tuple expansion, eliminating reallocation
- Replace append operations with direct slice indexing to avoid append's capacity check and length-update overhead

Performance improvements (measured on Intel i7-1270P, single core):
- Regular columns: 2-4% faster across all column counts
- Tuple columns: 12-13% faster, with 128 bytes less memory and 2 fewer allocations per RowData() call

Benchmark results:
Benchmark	Baseline	Optimized	Improvement
BenchmarkRowData	1122 ns/op	1092 ns/op	2.7% faster
BenchmarkRowDataWithTuples	1556 ns/op, 616 B, 20 allocs	1369 ns/op, 488 B, 18 allocs	12.0% faster
BenchmarkRowDataAllocation/100cols	10580 ns/op	10130 ns/op	4.3% faster
BenchmarkRowDataAllocation/WithTuples	1602 ns/op, 616 B, 20 allocs	1390 ns/op, 488 B, 18 allocs	13.2% faster

Added a comprehensive benchmark suite in helpers_bench_test.go to measure RowData() performance across various column counts and data types.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Add fast-path type instantiation for all native CQL types to avoid expensive reflection calls when creating column value holders.

Changes:
- Added type switch in NativeType.NewWithError() with direct instantiation for all 17 native types (int, bigint, text, uuid, timestamp, etc.)
- Falls back to reflection only for TypeCustom and complex types
- Added required imports: time, math/big, gopkg.in/inf.v0

Performance improvements (measured on Intel i7-1270P, single core):
Combined with the previous pre-sizing/indexing optimization, this achieves:
- 35-66% faster RowData() across all workloads
- 40-47% less memory allocation
- 35-50% fewer allocations

The reflection elimination provides a 50-65% speedup on top of the pre-sizing optimization, with the improvement scaling with column count.

Benchmark comparison (baseline → optimized):
Benchmark	Time	Memory	Allocations
BenchmarkRowData	1122 → 463 ns/op (58.7% faster)	720 → 400 B	22 → 12 allocs
BenchmarkRowDataLarge	5220 → 1865 ns/op (64.3% faster)	3792 → 2192 B	102 → 52 allocs
BenchmarkRowDataAllocation/100cols	10580 → 3624 ns/op (65.7% faster)	7584 → 4384 B	202 → 102 allocs
BenchmarkRowDataAllocation/1000cols	103552 → 35607 ns/op (65.6% faster)	72768 → 40768 B	2002 → 1002 allocs

Every column in RowData() calls NewWithError(), making this optimization highly impactful for queries with many columns. The improvement compounds with the previous commit's pre-sizing and direct indexing changes.

The same improvement can be applied (in a separate PR) to collections and tuples and their NewWithError() functions.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

…d tuple types

Similar to the previous commit: add type-specific fast paths in CollectionType.NewWithError() and TupleTypeInfo.NewWithError() to avoid expensive reflection calls during row data allocation.

Changes:
- CollectionType.NewWithError(): fast paths for common patterns:
  * Lists/sets: []int, []int64, []string, []bool, []float32, []float64, []UUID, []time.Time, []int16, []int8, [][]byte
  * Maps: map[string]int, map[string]int64, map[string]string, map[string]bool, map[string]float64, map[string]UUID, map[int]string, map[int]int
  * Falls back to reflection for complex nested collections
- TupleTypeInfo.NewWithError(): simplified to always return new([]interface{}), since tuples unmarshal to []interface{} regardless of element types, completely eliminating reflection. (Note: we may want to consider moving from interface{} to any.)

Performance impact:
- Tuple-heavy queries: ~3% faster (1047 → 1017 ns/op)
- Maintains performance for primitive-heavy workloads
- Part of the broader RowData() optimization series:
  * Combined improvements: 58.7% faster overall
  * Memory: -44% (720 → 400 B/op)
  * Allocations: -45% (22 → 12 allocs/op)

Benchmarks show targeted benefits for queries using collections and tuples while preserving fast-path performance for queries dominated by native types.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

- Add defensive guard in RowData() to detect actualColCount metadata inconsistency: returns a clear error instead of silently producing trailing zero-values or panicking on index-out-of-bounds.
- Set iter.err in column count mismatch guard to match error contract.
- Add overflow bounds checks before slice indexing to prevent panics.
- Add TypeFloat fast-path for map values in CollectionType.NewWithError() for both string-key and int-key maps, avoiding reflection for map[string]float32 and map[int]float32.
- Move TestNewWithErrorConsistentWithGoType from helpers_bench_test.go to marshal_test.go (with unit build tag) where it logically belongs.
- Add TestCollectionNewWithErrorConsistentWithGoType to verify that CollectionType.NewWithError() fast-paths stay in sync with goType() for list, set, and map types with common element/key combinations.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

mykaul · 2026-04-05T19:22:06Z

> @mykaul, needs another rebase

done.

mykaul marked this pull request as draft March 16, 2026 18:04

mykaul requested a review from Copilot March 16, 2026 18:05

Copilot started reviewing on behalf of mykaul March 16, 2026 18:05

mykaul mentioned this pull request Mar 16, 2026

perf: optimize column metadata parsing and readTypeInfo() #780

Merged

Copilot AI reviewed Mar 16, 2026

Comment thread marshal.go

Comment thread marshal.go Outdated

mykaul force-pushed the rowdata_reflection_elimination branch from ec75393 to 1f4c312 March 17, 2026 18:07

mykaul requested a review from Copilot March 17, 2026 19:37

Copilot started reviewing on behalf of mykaul March 17, 2026 19:41

Copilot AI reviewed Mar 17, 2026

Comment thread helpers.go Outdated

Comment thread marshal.go

mykaul requested a review from Copilot March 20, 2026 21:30

Copilot started reviewing on behalf of mykaul March 20, 2026 21:30

Copilot AI reviewed Mar 20, 2026

Comment thread policies.go Outdated

Comment thread marshal.go Outdated

mykaul force-pushed the rowdata_reflection_elimination branch from 6e774a2 to a08bf20 March 23, 2026 10:54

mykaul requested a review from Copilot March 23, 2026 10:54

Copilot started reviewing on behalf of mykaul March 23, 2026 10:55

Copilot AI reviewed Mar 23, 2026

Comment thread policies.go Outdated

Comment thread policies.go Outdated

Comment thread prepared_cache_bench_test.go Outdated

mykaul force-pushed the rowdata_reflection_elimination branch from a08bf20 to b8edd55 March 23, 2026 13:29

mykaul requested a review from Copilot March 24, 2026 18:16

Copilot started reviewing on behalf of mykaul March 24, 2026 18:17

Copilot AI reviewed Mar 24, 2026

Comment thread policies.go Outdated

mykaul force-pushed the rowdata_reflection_elimination branch from 37e2e7c to 342ebb9 April 3, 2026 21:03

dkropachev requested changes Apr 4, 2026

Comment thread prepared_cache_bench_test.go Outdated

mykaul force-pushed the rowdata_reflection_elimination branch from 342ebb9 to 0f2e686 April 5, 2026 13:48

This was referenced Apr 5, 2026

perf: generic LRU cache with struct key for prepared statement cache #824

Merged

perf: reduce token-aware Pick() allocations #825

Merged

mykaul force-pushed the rowdata_reflection_elimination branch from 0f2e686 to f9ec857 April 5, 2026 14:03

mykaul mentioned this pull request Apr 5, 2026

perf: optimize collection marshal/unmarshal write path (10's-100's ns improvements - ~14% improvement) #826

Draft

dkropachev marked this pull request as ready for review April 5, 2026 18:41

mykaul added 4 commits April 5, 2026 21:52

mykaul force-pushed the rowdata_reflection_elimination branch from f9ec857 to 79f26ef April 5, 2026 18:53

dkropachev approved these changes Apr 5, 2026

dkropachev merged commit f0df6a5 into scylladb:master Apr 5, 2026
7 of 9 checks passed
