Releases: nearai/cvm-compose-files

v0.0.55

10 Mar 13:52

fix: disable EAGLE3 speculative decoding for gpt-oss-120b

Streaming responses were consistently dropping the last 1-2 tokens due to a vLLM v0.12.0 EAGLE3 bug. Non-streaming was unaffected.
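One way to check for this class of bug is to compare the concatenated streamed chunks against the non-streaming response for the same request, sent with deterministic sampling. The following is a minimal sketch under that assumption, not the check the maintainers used; `missing_tail` is a hypothetical helper and the sample strings are made up for illustration.

```python
def missing_tail(streamed_chunks, full_text):
    """Return the suffix of full_text that the streamed chunks failed to deliver.

    Returns "" when the stream is complete, and None when the streamed text
    is not a prefix of full_text (the outputs diverged, so no tail can be named).
    """
    streamed = "".join(streamed_chunks)
    if not full_text.startswith(streamed):
        return None
    return full_text[len(streamed):]


# Illustrative data only: the stream stopped before the final token.
chunks = ["The", " answer", " is"]
print(repr(missing_tail(chunks, "The answer is 42")))
```

A non-empty result on an otherwise identical request points at the streaming path rather than the model output.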

v0.0.54

08 Mar 21:09

fix: update gpt-oss image

v0.0.53

08 Mar 20:32

Update model configs, remove GLM-4.7, add gpt-oss-single

v0.0.52

06 Mar 23:06

Full Changelog: v0.0.51...v0.0.52

v0.0.50

06 Mar 18:23
8e390a9

Changes

  • Update nearaidev/vllm-proxy-rs image digest across GLM-5, small-models, and Qwen3.5-122B configs (#3)
  • Remove unrecognized vLLM args (--max-cudagraph-capture-size, --stream-interval) from gpt-oss-120b (#4)

v0.0.49

06 Mar 11:34
45c047b

Changes

  • Increase nginx proxy_read_timeout from 300s to 3600s across all CVM configs. This fixes timeout errors for long-context inference requests (100K+ tokens) where prefill exceeds 5 minutes (#1)
  • Update gpt-oss vLLM image to newer version (#2)
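
To confirm that no config still carries the old 300s value, the nginx files can be scanned for `proxy_read_timeout` directives. This is a rough sketch, not tooling from this repository: the helper names and the sample config are hypothetical, and the parser is deliberately simplified to bare seconds and the `s`, `m`, and `h` suffixes.

```python
import re

# Simplified on purpose: handles bare seconds and the s/m/h suffixes only.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}
_DIRECTIVE = re.compile(
    r"^\s*proxy_read_timeout\s+(\d+)([smh]?)\s*;", re.MULTILINE
)


def read_timeouts(conf_text):
    """Return every proxy_read_timeout value found in conf_text, in seconds."""
    return [int(n) * _UNIT_SECONDS[u] for n, u in _DIRECTIVE.findall(conf_text)]


def too_short(conf_text, required_seconds):
    """Return the configured timeouts that are below required_seconds."""
    return [t for t in read_timeouts(conf_text) if t < required_seconds]


# Hypothetical config text: one location still uses the old value.
sample = """
location / {
    proxy_read_timeout 300s;
}
location /v1/ {
    proxy_read_timeout 3600s;
}
"""
print(too_short(sample, 3600))
```

Directives written with other units (for example `ms`) are skipped by this pattern, so a real check would need a fuller parser.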

v0.0.48

05 Mar 16:10

  • Update gpt-oss vLLM image digest

v0.0.47

05 Mar 09:04

  • Update vllm-proxy-rs to d46dd03 (attestation cache, GPU evidence serialization, retry on failure)

v0.0.46

04 Mar 13:51

  • fix: update all vllm/vllm-openai images to new digest to address CUDA/NCCL crashes
  • fix: reduce gpt-oss-120b GPU memory utilization to 0.90

v0.0.45

04 Mar 13:24

  • fix: reduce gpt-oss-120b GPU memory utilization from 0.95 to 0.90 to address CUDA OOM crashes
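
As a rough illustration of what the change buys, the sketch below computes the memory left outside the serving process's budget at each setting. It assumes the setting is a fraction of total GPU memory and uses a hypothetical 80 GiB device; the release notes do not state the GPU size, and `headroom_gib` is an illustrative helper.

```python
def headroom_gib(total_gib, utilization):
    """Memory (GiB) left outside the budget when `utilization` is the
    fraction of total GPU memory the serving process may claim."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gib * (1.0 - utilization)


TOTAL_GIB = 80  # assumption for illustration only, not taken from the release notes

for fraction in (0.95, 0.90):
    print(f"{fraction:.2f} -> {headroom_gib(TOTAL_GIB, fraction):.1f} GiB headroom")
```

Under that assumption, lowering the fraction from 0.95 to 0.90 doubles the headroom available to allocations made outside the budget.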