## Problem

We have Go benchmarks (`make bench`) but they are not integrated into CI. There is currently no automated way to detect performance regressions introduced by a PR. Maintainers must run benchmarks manually or rely on real-cluster testing to validate performance, which slows down the development cycle.
## Current state

- `make bench` only covers `pkg/preprocessing/chat_completions/` and `pkg/tokenization/`
- 12 `Benchmark*` functions exist across 4 files, but several are not included in `make bench`
- No `benchstat`, no performance-comparison tooling, no CI benchmark job
## Proposal

Add a new GitHub Actions workflow (`.github/workflows/ci-bench.yaml`) that compares benchmark results between the PR branch and `main` using `benchstat`:

- Run `go test -bench=. -benchmem -count=5` on the PR branch → `new.txt`
- Check out `main` and run the same command → `old.txt`
- Run `benchstat old.txt new.txt` and post the comparison as a PR comment
- (Optional, future) Fail CI if any benchmark regresses beyond a configurable threshold
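The steps above could be sketched roughly as follows. This is an illustrative draft, not a finished workflow: the `./pkg/...` package pattern, action versions, and step names are assumptions, and the PR-comment step (e.g. via an action such as `marocchino/sticky-pull-request-comment`) is omitted here.

```yaml
name: ci-bench
on:
  pull_request:

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # need main's history for the baseline run
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Bench PR branch
        run: go test ./pkg/... -bench=. -benchmem -count=5 | tee new.txt
      - name: Bench main
        run: |
          git checkout origin/main
          go test ./pkg/... -bench=. -benchmem -count=5 | tee old.txt
      - name: Compare
        run: |
          go install golang.org/x/perf/cmd/benchstat@latest
          benchstat old.txt new.txt | tee bench-report.txt
```

With `-count=5`, `benchstat` can report statistically meaningful deltas rather than single-run noise.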
## Benchmarks to track

| Package | Key benchmarks |
|---|---|
| `pkg/preprocessing/chat_completions/` | `BenchmarkRenderChat`, `BenchmarkRender`, `BenchmarkGetOrCreateTokenizerKey` |
| `pkg/tokenization/` | `BenchmarkAsyncTokenizationStress`, `BenchmarkSyncTokenizationStress` |
| `pkg/kvevents/` | `BenchmarkZMQSubscriber_Throughput` |
| `pkg/kvcache/kvblock/` | (see "New benchmarks" below) |

All benchmarks should run with `-benchmem` to track allocation counts and catch GC-pressure regressions.
CI performance impact
- Runs as a separate workflow (
ci-bench.yaml), fully parallel with existing ci-test and ci-lint does not slow them down
- Can be configured as a non-required check so it never blocks PR merges
- Can be scoped to only trigger on changes to
pkg/**/*.go, go.mod, go.sum
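The path scoping above can be expressed as a `paths` filter on the workflow trigger; a possible form (glob patterns as listed, untested):

```yaml
on:
  pull_request:
    paths:
      - "pkg/**/*.go"
      - "go.mod"
      - "go.sum"
```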
@vMaroon @sagearc any comments on this new CI workflow?