perf: optimize batch execution path for reduced allocations #774

mykaul wants to merge 1 commit into scylladb:master
Pull request overview
This PR focuses on reducing allocations and improving performance when constructing and executing batches in the GoCQL driver, including adding benchmarks to measure the impact.
Changes:
- Add `(*Batch).Reserve(n)` to pre-allocate capacity for batch entries.
- Refactor batch execution to (a) prepare unique statements concurrently and (b) bulk-allocate `queryValues` to reduce per-entry allocations.
- Adjust `RequestErrUnprepared` handling to support retry/eviction without a separate preparedID→statement map, and add new batch-related benchmarks.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| session.go | Adds Batch.Reserve to pre-allocate Entries capacity. |
| conn.go | Refactors executeBatch to concurrent-prepare unique statements, bulk-allocate queryValues, and introduces bounded retry on Unprepared. |
| frame.go | Extends batchStatment to carry original CQL text (sourceStmt) for Unprepared recovery. |
| errors.go | Adds internal stmt field to RequestErrUnprepared to carry original statement text through retry logic. |
| batch_bench_test.go | Adds benchmarks targeting batch append/build/serialization allocation patterns. |
Motivation
The BATCH execution path (executeBatch / writeBatchFrame) in the gocql driver had several avoidable costs per batch execution: an individual make() call for each statement's queryValues slice, sequential preparation of all statements, and unbounded recursive retries on RequestErrUnprepared errors.
Changes
Concurrent statement preparation: Split executeBatch into a retry loop (executeBatch) and single-attempt logic (executeBatchOnce). Unique statements are collected into a map and prepared concurrently via goroutines + sync.WaitGroup. The existing prepareStatement coalesces concurrent calls for the same statement via execIfMissing on the LRU cache, so this is safe.
Bulk-allocate queryValues: Two-pass approach: first pass computes totalValues across all entries, second pass slices from a single []queryValues allocation. Replaces N individual make([]queryValues, colCount) calls with one.
Eliminate stmts map: The old code maintained a stmts map[string]string to map prepared IDs back to statement text for error recovery. This is no longer needed — on RequestErrUnprepared, we iterate batch entries directly and use evictPreparedID with the stale statement ID, which only removes the specific stale cache entry (safe under concurrent re-preparation).
Reserve(n int) API: Added Batch.Reserve(n int) *Batch to session.go, which pre-allocates the Entries slice capacity. Named Reserve (not Size) because Size() already exists and returns len(b.Entries).
Bounded retry on RequestErrUnprepared: Converted the unbounded recursive executeBatch call to an iterative loop with maxBatchPrepareRetries = 1. If re-preparation fails twice, the error is returned rather than retrying indefinitely.
Benchmark results
Synthetic, 8 samples, not end-to-end latency.
These benchmarks measure allocation patterns only. They do not measure real Cassandra/Scylla round-trip latency or throughput. The numbers below reflect the cost of building batch data structures in isolation.
Reserve() — Batch.Query append
Bulk queryValues allocation — writeBatchFrame construction
Beyond the reduced allocations, I hope we may also see some reduced latency, though I don't expect it to be large.
Files changed: conn.go, session.go
Files added: batch_bench_test.go