Query Frontend JSON marshal/unmarshal is 5x slower than Thanos Query
Thanos version: v0.41.0
Problem
When querying large responses through Thanos Query Frontend (QFE), JSON serialization dominates total latency. Profiling shows QFE's `json.Marshal` is 5x slower than Thanos Query's `json.Marshal` for the same data.
Production measurements (query: `sum((process_open_fds{instance=~"().*"} / process_max_fds{instance=~"().*"}) * 100) by (instance)`, 6h range, returning ~10,000 series):

| Total | QFE unmarshal | QFE marshal | Query data fetch | Query marshal |
|-------|---------------|-------------|------------------|---------------|
| 10s   | 3.4s          | 4s          | 2.8s             | 0.8s          |
Jaeger trace: (screenshot omitted)
Root Cause
Thanos Query is fast because Prometheus registers custom jsoniter streaming encoders via `RegisterTypeEncoderFunc` for `promql.Matrix`, `promql.Series`, `promql.FPoint`, etc. These write directly to the jsoniter `Stream` buffer using `strconv.AppendFloat` / `Stream.WriteInt64`, with near-zero allocations per sample.
Thanos QFE is slow because `internal/cortex/querier/queryrange/query_range.go` uses `SampleStream.MarshalJSON()`, which converts cortexpb types → `model.SampleStream` → calls `model.SampleStream.MarshalJSON()`, which internally calls `encoding/json.Marshal` (Go standard library, reflection-based). For each `SamplePair`, this results in 3× `encoding/json.Marshal` calls + 1× `fmt.Sprintf`.
For a response with 11k series × 161 samples = 1.9M sample points, this means ~5.7M reflection-based `encoding/json.Marshal` calls and ~1.9M `fmt.Sprintf` string allocations.
As benchmarked in the original Prometheus PR #3536, simply switching to jsoniter without registering custom type encoders provides no improvement, because jsoniter respects `MarshalJSON` methods, which here fall back to `encoding/json` internally.
Solution
The upstream Cortex project has already solved this in their `tripperware` package by registering custom jsoniter encoders/decoders for `SampleStream` (an `init()` registering `encodeSampleStream` / `decodeSampleStream`).
Thanos could apply the same approach in `internal/cortex/querier/queryrange/query_range.go`: register `jsoniter.RegisterTypeEncoderFunc` / `RegisterTypeDecoderFunc` for `queryrange.SampleStream` to bypass the `MarshalJSON` → `encoding/json` path.