Commit c5fb0b2

committed

perf: optimize vector marshal/unmarshal for float32/float64/int32/int64

Add type-specialized fast paths for vector<float>, vector<double>, vector<int>, and vector<bigint> that bypass reflect-based per-element marshaling in favor of direct encoding/binary bulk conversion. Changes in marshal.go: - Type switches in marshalVector()/unmarshalVector() dispatch to dedicated functions for []float32, []float64, []int32, []int64 before falling through to the generic reflect path. - 8 new functions: marshalVectorFloat32, marshalVectorFloat64, unmarshalVectorFloat32, unmarshalVectorFloat64, marshalVectorInt32, marshalVectorInt64, unmarshalVectorInt32, unmarshalVectorInt64. - sync.Pool buffer reuse (vectorBufPool/getVectorBuf/putVectorBuf) for zero-alloc steady state when callers return buffers after the framer copies them. 64KiB cap prevents pool bloat. - Unmarshal fast paths reuse destination slice backing array when capacity is sufficient (zero-alloc steady state on read path). - Generic path preallocation via vectorFixedElemSize() + buf.Grow() for non-fast-path fixed-size types (e.g. UUID, timestamp). - vectorByteSize() helper guards against integer overflow on 32-bit platforms with corrupt or adversarial schema metadata. - All fast-path errors are wrapped as MarshalError/UnmarshalError for consistent error typing. - dim=0 vectors correctly encode as non-nil empty values (not CQL NULL) in both fast paths and generic path. - Negative dimensions are rejected with clear error messages. Benchmark results for vector<float, 1536> (typical embedding dimension): Marshal (baseline -> optimized): 86.4 us/op -> 3.4 us/op (25x faster) 3081 allocs -> 2 allocs (99.94% fewer) 28632 B/op -> 6172 B/op (78% less memory) Marshal with pool return (steady state): 86.4 us/op -> 1.6 us/op (54x faster) 3081 allocs -> 2 allocs (99.94% fewer) 28632 B/op -> 48 B/op (99.8% less memory) Unmarshal (baseline -> optimized): 60.2 us/op -> 1.5 us/op (41x faster) 2 allocs -> 0 allocs (100% fewer) 6168 B/op -> 0 B/op (100% less memory) Round-trip (baseline -> optimized, pooled): 147.8 us/op -> 3.1 us/op (48x faster) 3083 allocs -> 2 allocs (99.94% fewer) 34800 B/op -> 48 B/op (99.9% less memory) Throughput: 80 MB/s -> 3.5 GB/s (geomean, +2900%) New test files: - marshal_vector_test.go: 58+ unit subtests across 13 categories (round-trip, byte-compat, slice-reuse, nil, dimension-mismatch, empty-vector, pointer-to-slice, special-values, pool-concurrency, oversized-not-pooled, fixed-elem-size, generic-prealloc). - vector_bench_test.go: extended with int32/int64 and pooled benchmarks. - tests/bench/bench_vector_public_test.go: public API benchmarks for int32/int64 marshal/unmarshal. Subsumes PR #744 (float fast paths) and PR #745 (generic prealloc). Extends with int32/int64 fast paths and buffer pooling not covered by any existing PR.

1 parent 04c41b0 commit c5fb0b2Copy full SHA for c5fb0b2

4 files changed

marshal.go
marshal_vector_test.go
tests/bench
- bench_vector_public_test.go
vector_bench_test.go

Comments

(0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit c5fb0b2

File tree

0 commit comments