Releases: nearai/cvm-compose-files
v0.0.55
10 Mar 13:52
fix: disable EAGLE3 speculative decoding for gpt-oss-120b
Streaming responses were consistently dropping the last 1-2 tokens due to a vLLM v0.12.0 EAGLE3 bug. Non-streaming was unaffected.
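One way to check for this kind of truncation is to compare the concatenated stream against a non-streaming response to the same prompt. The sketch below is illustrative only: the function name and sample strings are hypothetical, and it assumes both responses were produced deterministically (for example with temperature 0), so that the streamed text should be a prefix of the full text.

```python
def dropped_suffix(full_text: str, streamed_chunks: list[str]) -> str:
    """Return the tail of full_text that is missing from the streamed output.

    Returns "" when the streamed output is complete. Raises ValueError when
    the streamed output is not a prefix of full_text, because the two
    responses diverged and cannot be compared this way.
    """
    streamed = "".join(streamed_chunks)
    if not full_text.startswith(streamed):
        raise ValueError("streamed output is not a prefix of the full response")
    return full_text[len(streamed):]


# Hypothetical data: the stream stops before the final pieces arrive.
full = "The capital of France is Paris."
chunks = ["The capital", " of France", " is"]
print(repr(dropped_suffix(full, chunks)))
```

A non-empty result on several prompts in a row would be consistent with the symptom described above; an empty result means the stream delivered everything.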
v0.0.54
08 Mar 21:09
fix: update gpt-oss image
v0.0.53
08 Mar 20:32
Update model configs, remove GLM-4.7, add gpt-oss-single
v0.0.52
06 Mar 23:06
v0.0.50
06 Mar 18:23
Changes
Update nearaidev/vllm-proxy-rs image digest across GLM-5, small-models, and Qwen3.5-122B configs (#3)
Remove unrecognized vLLM args (--max-cudagraph-capture-size, --stream-interval) from gpt-oss-120b (#4)
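As a rough illustration of the second change, the sketch below strips named flags from an argument list, handling both the `--flag value` and `--flag=value` forms. The helper name, the surrounding arguments, and the flag values are hypothetical; the actual compose files in this repository may structure the command differently.

```python
def strip_flags(argv: list[str], unwanted: set[str]) -> list[str]:
    """Remove each unwanted flag in `--flag value` or `--flag=value` form."""
    cleaned = []
    skip_value = False
    for token in argv:
        if skip_value:
            skip_value = False
            # The token after a removed flag is its value unless it is a flag itself.
            if not token.startswith("--"):
                continue
        name = token.split("=", 1)[0]
        if name in unwanted:
            # Only the space-separated form carries its value in the next token.
            skip_value = "=" not in token
            continue
        cleaned.append(token)
    return cleaned


# Hypothetical command line; the values 4 and 512 are placeholders.
args = [
    "--model", "gpt-oss-120b",
    "--stream-interval", "4",
    "--max-cudagraph-capture-size=512",
    "--port", "8000",
]
print(strip_flags(args, {"--max-cudagraph-capture-size", "--stream-interval"}))
```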
v0.0.49
06 Mar 11:34
Changes
Increase nginx proxy_read_timeout from 300s to 3600s across all CVM configs. This fixes timeout errors for long-context inference requests (100K+ tokens) where prefill exceeds 5 minutes (#1)
Update gpt-oss vLLM image to a newer version (#2)
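The timeout change can be checked with a small script that scans nginx configuration text for `proxy_read_timeout` values below a threshold. This is a minimal sketch with hypothetical helper names; it assumes single-unit values such as `300s` or `1h` and does not handle combined nginx time values such as `1h 30m`.

```python
import re

# Matches lines such as `proxy_read_timeout 300s;` or `proxy_read_timeout 300;`.
# Commented-out lines are skipped because `#` does not match the leading whitespace.
_DIRECTIVE = re.compile(
    r"^\s*proxy_read_timeout\s+(\d+)(ms|s|m|h|d)?\s*;", re.MULTILINE
)
# nginx treats a bare number as seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def read_timeouts_seconds(conf_text: str) -> list[float]:
    """Return every proxy_read_timeout value in conf_text, in seconds."""
    return [
        int(value) * _UNIT_SECONDS[unit or "s"]
        for value, unit in _DIRECTIVE.findall(conf_text)
    ]


def too_short(conf_text: str, minimum_seconds: float = 3600) -> list[float]:
    """Return the timeouts that fall below minimum_seconds."""
    return [t for t in read_timeouts_seconds(conf_text) if t < minimum_seconds]


# Hypothetical config fragment for illustration.
sample = """
location / {
    proxy_pass http://backend;
    proxy_read_timeout 300s;
}
"""
print(too_short(sample))
```

An empty list means every configured read timeout meets the 3600-second target described above.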
v0.0.48
05 Mar 16:10
Update gpt-oss vLLM image digest
v0.0.47
05 Mar 09:04
Update vllm-proxy-rs to d46dd03 (attestation cache, GPU evidence serialization, retry on failure)
v0.0.46
04 Mar 13:51
fix: update all vllm/vllm-openai images to new digest to address CUDA/NCCL crashes
fix: reduce gpt-oss-120b GPU memory utilization to 0.90
v0.0.45
04 Mar 13:24
fix: reduce gpt-oss-120b GPU memory utilization from 0.95 to 0.90 to address CUDA OOM crashes
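To make the effect of this change concrete, the arithmetic can be sketched as follows. vLLM's GPU memory utilization setting is the fraction of device memory the engine may claim, so lowering it leaves more memory free for everything else on the GPU. The 80 GiB device size here is an assumption chosen for illustration; the release notes do not state the hardware, and the function name is hypothetical.

```python
def reserved_and_headroom_gib(total_gib: float, utilization: float) -> tuple[float, float]:
    """Split total GPU memory into the part vLLM may claim and the remainder."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    reserved = total_gib * utilization
    return reserved, total_gib - reserved


# Assumed device size; not stated in the release notes.
TOTAL_GIB = 80.0

for utilization in (0.95, 0.90):
    reserved, headroom = reserved_and_headroom_gib(TOTAL_GIB, utilization)
    print(f"utilization={utilization:.2f} reserved={reserved:.1f} GiB headroom={headroom:.1f} GiB")
```

Under that assumption, moving from 0.95 to 0.90 roughly doubles the memory left outside vLLM's budget, which is the margin the fix relies on to avoid out-of-memory crashes.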