perf: optimize batch execution path for reduced allocations #774

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/batch-optimizations

@mykaul mykaul commented Mar 14, 2026

Motivation

The BATCH execution path (executeBatch / writeBatchFrame) in the gocql driver had several avoidable costs per batch execution: an individual make() call for each statement's queryValues slice, sequential preparation of all statements, and unbounded recursive retries on RequestErrUnprepared errors.

Changes

  1. Concurrent statement preparation: Split executeBatch into a retry loop (executeBatch) and single-attempt logic (executeBatchOnce). Unique statements are collected into a map and prepared concurrently via goroutines + sync.WaitGroup. The existing prepareStatement coalesces concurrent calls for the same statement via execIfMissing on the LRU cache, so this is safe.

  2. Bulk-allocate queryValues: Two-pass approach: first pass computes totalValues across all entries, second pass slices from a single []queryValues allocation. Replaces N individual make([]queryValues, colCount) calls with one.

  3. Eliminate stmts map: The old code maintained a stmts map[string]string to map prepared IDs back to statement text for error recovery. This is no longer needed — on RequestErrUnprepared, we iterate batch entries directly and use evictPreparedID with the stale statement ID, which only removes the specific stale cache entry (safe under concurrent re-preparation).

  4. Reserve(n int) API: Added Batch.Reserve(n int) *Batch to session.go, which pre-allocates the Entries slice capacity. Named Reserve (not Size) because Size() already exists and returns len(b.Entries).

  5. Bounded retry on RequestErrUnprepared: Converted the unbounded recursive executeBatch call to an iterative loop with maxBatchPrepareRetries = 1. If the batch still fails with RequestErrUnprepared after that one retry, the error is returned rather than retrying indefinitely.
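
For reviewers who want the shape of change 1 without reading the diff, here is a minimal sketch of deduplicating statements and preparing them concurrently with goroutines and sync.WaitGroup. It is not the code in conn.go: preparedInfo, prepareFunc, and prepareUnique are hypothetical names, and the real prepareStatement takes more arguments and goes through the LRU cache.

```go
package main

import (
	"fmt"
	"sync"
)

// preparedInfo is a hypothetical stand-in for the driver's prepared-statement metadata.
type preparedInfo struct {
	id string
}

// prepareFunc is a hypothetical signature; the driver's prepareStatement differs.
type prepareFunc func(stmt string) (*preparedInfo, error)

// prepareUnique collects the distinct statements in stmts and prepares each
// one in its own goroutine, returning the results keyed by statement text.
func prepareUnique(stmts []string, prepare prepareFunc) (map[string]*preparedInfo, error) {
	type result struct {
		info *preparedInfo
		err  error
	}

	// Pass 1: one result slot per distinct statement.
	results := make(map[string]*result, len(stmts))
	for _, s := range stmts {
		if _, ok := results[s]; !ok {
			results[s] = &result{}
		}
	}

	// Pass 2: prepare concurrently. Each goroutine writes only to its own
	// slot and the map itself is not modified, so no mutex is needed.
	var wg sync.WaitGroup
	for stmt, res := range results {
		wg.Add(1)
		go func(stmt string, res *result) {
			defer wg.Done()
			res.info, res.err = prepare(stmt)
		}(stmt, res)
	}
	wg.Wait()

	out := make(map[string]*preparedInfo, len(results))
	for stmt, res := range results {
		if res.err != nil {
			return nil, fmt.Errorf("prepare %q: %w", stmt, res.err)
		}
		out[stmt] = res.info
	}
	return out, nil
}

func main() {
	stmts := []string{
		"INSERT INTO t (a) VALUES (?)",
		"UPDATE t SET a = ?",
		"INSERT INTO t (a) VALUES (?)", // duplicate, prepared only once
	}
	prepared, err := prepareUnique(stmts, func(stmt string) (*preparedInfo, error) {
		return &preparedInfo{id: "id:" + stmt}, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("distinct statements prepared:", len(prepared))
}
```

A batch that repeats the same INSERT many times therefore issues a single prepare in this sketch, and distinct statements no longer wait on each other.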
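
The retry bound in change 5 can be sketched as a plain loop. This is an illustration only: errUnprepared is a hypothetical sentinel standing in for RequestErrUnprepared, executeWithRetry and its callback are made-up names, and the sketch assumes maxBatchPrepareRetries = 1 means one initial attempt plus one retry.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnprepared is a hypothetical sentinel standing in for RequestErrUnprepared.
var errUnprepared = errors.New("unprepared")

// maxBatchPrepareRetries mirrors the constant named in the description.
const maxBatchPrepareRetries = 1

// executeWithRetry runs once at most 1+maxBatchPrepareRetries times and
// retries only when the attempt failed with errUnprepared.
func executeWithRetry(once func(attempt int) error) error {
	var err error
	for attempt := 0; attempt <= maxBatchPrepareRetries; attempt++ {
		err = once(attempt)
		if !errors.Is(err, errUnprepared) {
			return err // success, or an error that is not retried
		}
		// In the driver, the stale prepared ID would be evicted here
		// so that the next attempt re-prepares the statement.
	}
	return err // still unprepared after the last allowed attempt
}

func main() {
	attempts := 0
	err := executeWithRetry(func(int) error {
		attempts++
		return errUnprepared // simulate a server that never accepts the ID
	})
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Unlike the recursive form, the loop has a fixed upper bound on attempts and does not grow the stack.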

Benchmark results

Synthetic, 8 samples, not end-to-end latency.
These benchmarks measure allocation patterns only. They do not measure real Cassandra/Scylla round-trip latency or throughput. The numbers below reflect the cost of building batch data structures in isolation.

Reserve() — Batch.Query append

| Entries | Metric | Before | After | Improvement |
|---|---|---|---|---|
| 10 | B/op | 2,407 | 1,110 | -54% |
| 10 | allocs/op | 35 | 31 | -11% |
| 100 | B/op | 21,791 | 11,720 | -46% |
| 100 | allocs/op | 308 | 301 | -2% |
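
To make the Reserve() numbers easier to interpret, here is a minimal sketch of what a capacity-reserving method can look like. Batch and BatchEntry below are trimmed-down stand-ins, not the driver's types, and treating n as the total capacity (rather than additional capacity) is an assumption of this sketch.

```go
package main

import "fmt"

// BatchEntry and Batch are simplified stand-ins for the driver's types.
type BatchEntry struct {
	Stmt string
	Args []interface{}
}

type Batch struct {
	Entries []BatchEntry
}

// Reserve grows the capacity of Entries to at least n without changing its
// length. Existing entries are preserved.
func (b *Batch) Reserve(n int) *Batch {
	if n > cap(b.Entries) {
		entries := make([]BatchEntry, len(b.Entries), n)
		copy(entries, b.Entries)
		b.Entries = entries
	}
	return b
}

// Query appends one entry; with enough reserved capacity the append does not
// reallocate the backing array.
func (b *Batch) Query(stmt string, args ...interface{}) *Batch {
	b.Entries = append(b.Entries, BatchEntry{Stmt: stmt, Args: args})
	return b
}

// Size reports the number of entries, which is why the new method is not
// also called Size.
func (b *Batch) Size() int { return len(b.Entries) }

func main() {
	b := (&Batch{}).Reserve(100)
	for i := 0; i < 100; i++ {
		b.Query("INSERT INTO t (a) VALUES (?)", i)
	}
	fmt.Println("size:", b.Size(), "cap:", cap(b.Entries))
}
```

Without Reserve, append grows the slice geometrically and copies the entries on each growth step, which is consistent with the larger drop in B/op than in allocs/op in the table above.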

Bulk queryValues allocation — writeBatchFrame construction

| Entries | Metric | Before | After | Improvement |
|---|---|---|---|---|
| 10 | B/op | 3,434 | 2,294 | -33% |
| 10 | allocs/op | 74 | 42 | -43% |
| 100 | B/op | 32,319 | 21,668 | -33% |
| 100 | allocs/op | 704 | 402 | -43% |
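
The two-pass bulk allocation behind these numbers can be sketched as follows. The types are simplified stand-ins (the driver's queryValues has more fields), and entry and sliceValues are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// queryValues is a simplified stand-in for the driver's per-value struct.
type queryValues struct {
	value []byte
}

// entry is a hypothetical batch entry carrying already-marshaled arguments.
type entry struct {
	args [][]byte
}

// sliceValues allocates one backing array for all values in the batch and
// hands each entry a window into it.
func sliceValues(entries []entry) [][]queryValues {
	// Pass 1: count the values across all entries.
	total := 0
	for _, e := range entries {
		total += len(e.args)
	}

	// One allocation replaces a make([]queryValues, n) per entry.
	backing := make([]queryValues, total)

	// out stands in for the per-statement structs, which exist either way.
	out := make([][]queryValues, len(entries))

	// Pass 2: slice a window per entry out of the shared backing array.
	off := 0
	for i, e := range entries {
		n := len(e.args)
		// The three-index slice caps each window at its own length, so an
		// accidental append cannot overwrite the next entry's values.
		vals := backing[off : off+n : off+n]
		for j, a := range e.args {
			vals[j].value = a
		}
		out[i] = vals
		off += n
	}
	return out
}

func main() {
	entries := []entry{
		{args: [][]byte{[]byte("a"), []byte("b")}},
		{args: nil},
		{args: [][]byte{[]byte("c")}},
	}
	for i, vals := range sliceValues(entries) {
		fmt.Println("entry", i, "values:", len(vals))
	}
}
```

The trade-off is that all value slots in a batch share one lifetime: the backing array stays reachable as long as any entry's window does.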

Beyond the reduced allocations, I hope we will also see some reduction in latency, though I don't expect much.

Files changed: conn.go, session.go
Files added: batch_bench_test.go

@mykaul mykaul marked this pull request as draft March 14, 2026 09:15
@mykaul mykaul requested a review from Copilot March 14, 2026 19:16

Copilot AI left a comment

Pull request overview

This PR focuses on reducing allocations and improving performance when constructing and executing batches in the GoCQL driver, including adding benchmarks to measure the impact.

Changes:

  • Add (*Batch).Reserve(n) to pre-allocate capacity for batch entries.
  • Refactor batch execution to (a) prepare unique statements concurrently and (b) bulk-allocate queryValues to reduce per-entry allocations.
  • Adjust RequestErrUnprepared handling to support retry/eviction without a separate preparedID→statement map, and add new batch-related benchmarks.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| session.go | Adds Batch.Reserve to pre-allocate Entries capacity. |
| conn.go | Refactors executeBatch to concurrent-prepare unique statements, bulk-allocate queryValues, and introduces bounded retry on Unprepared. |
| frame.go | Extends batchStatment to carry original CQL text (sourceStmt) for Unprepared recovery. |
| errors.go | Adds internal stmt field to RequestErrUnprepared to carry original statement text through retry logic. |
| batch_bench_test.go | Adds benchmarks targeting batch append/build/serialization allocation patterns. |

Comment thread conn.go Outdated
Comment thread conn.go Outdated
Comment thread conn.go
Comment thread batch_bench_test.go
Comment thread batch_bench_test.go Outdated

Copilot AI left a comment

Pull request overview

This PR improves batch-building and batch-execution performance in the GoCQL driver by reducing allocations and improving prepared-statement handling during batch execution.

Changes:

  • Add (*Batch).Reserve(n) to pre-allocate Batch.Entries capacity when building large batches.
  • Refactor Conn.executeBatch to (a) prepare distinct statements concurrently and (b) bulk-allocate queryValues for all batch entries to reduce per-entry allocations.
  • Add microbenchmarks covering batch entry appends and writeBatchFrame build/serialization allocation patterns.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| session.go | Adds Batch.Reserve to reduce allocations when constructing batches. |
| conn.go | Refactors batch execution to prepare unique statements concurrently, bulk-allocate marshaled values, and bounds Unprepared retries. |
| batch_bench_test.go | Adds benchmarks to quantify allocation improvements in batch construction and frame building/serialization. |

Comment thread conn.go Outdated
Comment thread conn.go Outdated
Comment thread conn.go Outdated
Comment thread conn.go Outdated
Comment thread batch_bench_test.go Outdated

Copilot AI left a comment

Pull request overview

This PR optimizes batch construction/execution in the driver by adding batch entry preallocation, refactoring batch execution to reduce allocations (bulk queryValues allocation) and prepare statements more efficiently, and introducing benchmarks to measure the impact.

Changes:

  • Add (*Batch).Reserve(n) to pre-allocate Batch.Entries capacity.
  • Refactor Conn.executeBatch to use a bounded retry loop for RequestErrUnprepared, and update batch frame building to bulk-allocate queryValues.
  • Add batch_bench_test.go with benchmarks targeting batch append/build/serialization patterns.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| session.go | Adds Batch.Reserve to reduce allocations when building large batches. |
| conn.go | Refactors batch execution: bounded unprepared retry, concurrent prepare of unique statements, and bulk allocation of query values. |
| batch_bench_test.go | Adds benchmarks for batch append preallocation and batch frame build/serialization allocation patterns. |

Comment thread conn.go Outdated
Comment thread conn.go
Comment thread conn.go Outdated
Comment thread batch_bench_test.go
Comment thread batch_bench_test.go
@mykaul mykaul force-pushed the perf/batch-optimizations branch from 0eff368 to be36aa7 Compare April 7, 2026 18:48