## Problem

We have Go benchmarks (`make bench`) but they are not integrated into CI. There is currently no automated way to detect performance regressions introduced by a PR. Maintainers must run benchmarks manually or rely on real-cluster testing to validate performance, which slows down the development cycle.
## Current state

- `make bench` only covers `pkg/preprocessing/chat_completions/` and `pkg/tokenization/`
- 12 `Benchmark*` functions exist across 4 files, but several are not included in `make bench`
- No `benchstat`, no performance-comparison tooling, no CI benchmark job
## Proposal

Add a new GitHub Actions workflow (`.github/workflows/ci-bench.yaml`) that compares benchmark results between the PR branch and `main` using `benchstat`:

- Run `go test -bench=. -benchmem -count=5` on the PR branch → `new.txt`
- Check out `main` and run the same command → `old.txt`
- Run `benchstat old.txt new.txt` and post the comparison as a PR comment
- (Optional, future) Fail CI if any benchmark regresses beyond a configurable threshold
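The steps above could be sketched roughly as follows. This is an illustrative draft, not a finished workflow: the `./pkg/...` package pattern, action versions, and step names are assumptions, and the PR-comment step (e.g. via an action such as `marocchino/sticky-pull-request-comment`) is omitted here.

```yaml
name: ci-bench
on:
  pull_request:

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # need main's history for the baseline run
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Bench PR branch
        run: go test ./pkg/... -bench=. -benchmem -count=5 | tee new.txt
      - name: Bench main
        run: |
          git checkout origin/main
          go test ./pkg/... -bench=. -benchmem -count=5 | tee old.txt
      - name: Compare
        run: |
          go install golang.org/x/perf/cmd/benchstat@latest
          benchstat old.txt new.txt | tee bench-report.txt
```

With `-count=5`, `benchstat` can report statistically meaningful deltas rather than single-run noise.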
## Benchmarks to track

| Package | Key benchmarks |
|---|---|
| `pkg/preprocessing/chat_completions/` | `BenchmarkRenderChat`, `BenchmarkRender`, `BenchmarkGetOrCreateTokenizerKey` |
| `pkg/tokenization/` | `BenchmarkAsyncTokenizationStress`, `BenchmarkSyncTokenizationStress` |
| `pkg/kvevents/` | `BenchmarkZMQSubscriber_Throughput` |
| `pkg/kvcache/kvblock/` | (see "New benchmarks" below) |

All benchmarks should run with `-benchmem` to track allocation counts and catch GC-pressure regressions.
CI performance impact
- Runs as a separate workflow (
ci-bench.yaml), fully parallel with existing ci-test and ci-lint does not slow them down
- Can be configured as a non-required check so it never blocks PR merges
- Can be scoped to only trigger on changes to
pkg/**/*.go, go.mod, go.sum
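The path scoping above can be expressed as a `paths` filter on the workflow trigger; a possible form (glob patterns as listed, untested):

```yaml
on:
  pull_request:
    paths:
      - "pkg/**/*.go"
      - "go.mod"
      - "go.sum"
```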
@vMaroon @sagearc any comments on this new CI workflow?