Source: Cameron, #private-inference, May 19. Cursor (and likely other partners) ask for our TPS / throughput numbers on shared open-source models before they'll route traffic to us.
What
- Publish per-model TPS, tokens/sec/card, TTFT (time to first token), and ITL (inter-token latency) for the models we serve (GLM-5.1, Qwen3.5-122B, Qwen3.6-35B, gpt-oss-120b, Gemma-4-31B, …)
- Reproducible methodology so partners can verify
- Comparable baseline (e.g. Scaleway, Together, Fireworks)
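To make the metric definitions concrete for partners, here is a minimal sketch of how the four numbers could be derived from per-request token timestamps. It is not genai-benchmark's implementation. The `RequestTrace` type, the function names, and the choice to compute ITL as the mean gap between consecutive output tokens are all assumptions made for illustration, and the published methodology should state whichever definitions the real tooling uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Hypothetical per-request record; not a genai-benchmark type."""
    sent_at: float            # seconds, when the request was sent
    token_times: list[float]  # seconds, arrival time of each output token (at least one)


def ttft(trace: RequestTrace) -> float:
    """Time to first token: first token arrival minus request send time."""
    return trace.token_times[0] - trace.sent_at


def itl(trace: RequestTrace) -> float:
    """Inter-token latency: mean gap between consecutive output tokens (assumed definition)."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return mean(gaps) if gaps else 0.0


def aggregate_tps(traces: list[RequestTrace]) -> float:
    """Output tokens per second across all requests, over the wall-clock window of the run."""
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


def tps_per_card(traces: list[RequestTrace], num_cards: int) -> float:
    """Aggregate throughput divided by the number of accelerator cards serving the model."""
    return aggregate_tps(traces) / num_cards
```

Stating definitions at this level of detail alongside the numbers is what lets a partner rerun the measurement and compare it against a baseline provider on equal terms.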
Where
- Ideally a public page or PDF we can hand to partners
- genai-benchmark already produces the numbers; the gap is the publication artifact
Related
- nearai/infra#127 (Qwen 3.6 perf vs Scaleway — overlapping methodology)
- Lloyd's existing Scaleway report: Notion