Query Frontend JSON marshal/unmarshal is 5x slower than Thanos Query
Thanos version: v0.41.0
Problem
When querying large responses through Thanos Query Frontend (QFE), JSON serialization dominates total latency. Profiling shows QFE's `json.Marshal` is 5x slower than Thanos Query's `json.Marshal` for the same data.
Production measurements (query: `sum((process_open_fds{instance=~"().*"} / process_max_fds{instance=~"().*"}) * 100) by (instance)`, 6h range, returning ~10,000 series):

| Total | QFE unmarshal | QFE marshal | Query data fetch | Query marshal |
|-------|---------------|-------------|------------------|---------------|
| 10s   | 3.4s          | 4s          | 2.8s             | 0.8s          |
Jaeger trace: (screenshot omitted)
Root Cause
Thanos Query is fast because Prometheus registers custom jsoniter streaming encoders via `RegisterTypeEncoderFunc` for `promql.Matrix`, `promql.Series`, `promql.FPoint`, etc. These write directly to the jsoniter `Stream` buffer using `strconv.AppendFloat` / `Stream.WriteInt64`, with near-zero allocations per sample.
Thanos QFE is slow because `internal/cortex/querier/queryrange/query_range.go` uses `SampleStream.MarshalJSON()`, which converts cortexpb types → `model.SampleStream` → calls `model.SampleStream.MarshalJSON()`, which internally calls `encoding/json.Marshal` (Go standard library, reflection-based). For each `SamplePair`, this results in 3× `encoding/json.Marshal` calls + 1× `fmt.Sprintf`.
For a response with 11k series × 161 samples = 1.9M sample points, this means ~5.7M reflection-based `encoding/json.Marshal` calls and ~1.9M `fmt.Sprintf` string allocations.
As benchmarked in the original Prometheus PR #3536, simply switching to jsoniter without registering custom type encoders provides no improvement, because jsoniter respects `MarshalJSON` methods, which here fall back to `encoding/json` internally.
Solution
The upstream Cortex project has already solved this in their `tripperware` package by registering custom jsoniter encoders/decoders for `SampleStream` (an `init()` registering `encodeSampleStream` / `decodeSampleStream`).
Thanos could apply the same approach in `internal/cortex/querier/queryrange/query_range.go`: register `jsoniter.RegisterTypeEncoderFunc` / `RegisterTypeDecoderFunc` for `queryrange.SampleStream` to bypass the `MarshalJSON` → `encoding/json` path.