Commit 0c96763

bump vllm-proxy-rs to 59e42dd6 + plumb CLOUD_API_USAGE_TOKEN
Brings all three compose files onto the latest reproducible vllm-proxy-rs image (sha256:59e42dd6, :latest as of 2026-05-26 13:33Z; includes nearai/inference-proxy#143's service-token reporting path plus subsequent main merges) and adds the CLOUD_API_USAGE_TOKEN env var to every vllm-proxy-* service.

- GLM-5.1.yaml: was already on 59e42dd6 (bumped in a prior commit); adds the env var to proxy-glm51.
- Qwen3.5-122B.yaml: 05ad3e83 → 59e42dd6 + env var on vllm-proxy-qwen35.
- small-models.yaml: 05ad3e83 → 59e42dd6 (shared anchor) + env var on all 10 vllm-proxy-* services.

The env var is the inference-proxy half of the usage-endpoint lockdown (cloud-api#665 added POST /v1/internal/usage; the cloud-api side is already live on staging + prod).

When the host .env doesn't define CLOUD_API_USAGE_TOKEN, Docker Compose interpolates "" → inference-proxy coerces empty to None → can_use_service_token_path() is false → the reporter stays on the legacy POST /v1/usage + sk- path. So this is safe to deploy before the token is set on inference CVMs; the switch to /v1/internal/usage only flips once the same secret (matching the cloud-api side) is present in the host env.

Supersedes #46 (which targeted the older c1208db4 digest and predated main's GLM-5.1 bump).
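The fallback chain described above can be modelled in a few lines. This is a minimal Python sketch, not the inference-proxy code (which lives in the Rust image): only the name can_use_service_token_path() and the two endpoint paths come from the commit message. The helper names coerce_token and usage_path are hypothetical, and the real check may depend on more than the token being present.

```python
from typing import Optional

LEGACY_PATH = "/v1/usage"             # legacy reporter path (sk- key auth)
INTERNAL_PATH = "/v1/internal/usage"  # service-token path added by cloud-api#665


def coerce_token(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: treat an unset or empty value as 'no token'."""
    if raw is None or raw == "":
        return None
    return raw


def can_use_service_token_path(token: Optional[str]) -> bool:
    """Assumed condition: the service-token path needs a configured token."""
    return token is not None


def usage_path(raw_env_value: Optional[str]) -> str:
    """Pick the reporting path from the raw CLOUD_API_USAGE_TOKEN value."""
    token = coerce_token(raw_env_value)
    return INTERNAL_PATH if can_use_service_token_path(token) else LEGACY_PATH


# An empty interpolated value keeps the reporter on the legacy path.
path_when_empty = usage_path("")
path_when_set = usage_path("example-secret")
```

Under these assumptions, deploying the compose change with no token on the host leaves reporting behaviour unchanged.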
1 parent 18bc809 commit 0c96763

3 files changed

Lines changed: 14 additions & 2 deletions

GLM-5.1.yaml

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=zai-org/GLM-5.1-FP8
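The added line relies on Docker Compose substituting an empty string when the host environment does not define the variable. The following Python sketch imitates that behaviour for the plain ${NAME} form only; the interpolate function is a hypothetical stand-in for illustration, not Compose's implementation, and it ignores other forms such as ${NAME:-default}.

```python
import re
from typing import Mapping

# Matches only the plain ${NAME} form used in the compose files above.
_PLAIN_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(entry: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} with env[NAME], or "" when NAME is undefined."""
    return _PLAIN_VAR.sub(lambda m: env.get(m.group(1), ""), entry)


entry = "CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}"

# Host .env does not define the variable: the container sees an empty value.
unset_result = interpolate(entry, {})

# Host .env defines the variable: the value is passed through.
set_result = interpolate(entry, {"CLOUD_API_USAGE_TOKEN": "example-secret"})
```

The empty value in the first case is what the proxy then coerces to "no token".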

Qwen3.5-122B.yaml

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,7 @@ x-nvidia: &nvidia
       hard: 65535

 x-vllm-proxy-common: &vllm-proxy-common
-  image: nearaidev/vllm-proxy-rs@sha256:05ad3e830dfca99e499705deab54616db483cf0b222be83b591b6037b7fd2abc
+  image: nearaidev/vllm-proxy-rs@sha256:59e42dd68faa15eb0c23521029a2fc3d80d86a4143f9f766542357918be33a8c
   user: root
   privileged: true
   <<: *nvidia
@@ -132,6 +132,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3.5-122B-A10B
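Since the point of the bump is that every compose file pins the same digest, a quick text scan can confirm it. This is a rough sketch under stated assumptions: pinned_digests and is_on_digest are hypothetical helper names, and the scan matches the image reference as plain text rather than parsing YAML.

```python
import re

# Full 64-hex-character sha256 digest after the image name used in this commit.
IMAGE_RE = re.compile(r"nearaidev/vllm-proxy-rs@sha256:([0-9a-f]{64})")


def pinned_digests(compose_text: str) -> set:
    """Collect every vllm-proxy-rs digest referenced in the text."""
    return set(IMAGE_RE.findall(compose_text))


def is_on_digest(compose_text: str, expected_prefix: str) -> bool:
    """True when at least one pin exists and all pins start with the prefix."""
    digests = pinned_digests(compose_text)
    return bool(digests) and all(d.startswith(expected_prefix) for d in digests)


old_line = ("image: nearaidev/vllm-proxy-rs@sha256:"
            "05ad3e830dfca99e499705deab54616db483cf0b222be83b591b6037b7fd2abc")
new_line = ("image: nearaidev/vllm-proxy-rs@sha256:"
            "59e42dd68faa15eb0c23521029a2fc3d80d86a4143f9f766542357918be33a8c")

old_ok = is_on_digest(old_line, "59e42dd6")
new_ok = is_on_digest(new_line, "59e42dd6")
```

In practice the text passed in would be the contents of each compose file; the two image lines here are copied from the hunk above.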

small-models.yaml

Lines changed: 11 additions & 1 deletion
@@ -27,7 +27,7 @@ x-vllm-common: &vllm-common
   logging: *logging-conf

 x-vllm-proxy-common: &vllm-proxy-common
-  image: nearaidev/vllm-proxy-rs@sha256:05ad3e830dfca99e499705deab54616db483cf0b222be83b591b6037b7fd2abc
+  image: nearaidev/vllm-proxy-rs@sha256:59e42dd68faa15eb0c23521029a2fc3d80d86a4143f9f766542357918be33a8c
   user: root
   privileged: true
   <<: *nvidia
@@ -323,6 +323,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3-30B-A3B-Instruct-2507
@@ -379,6 +380,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=openai/gpt-oss-120b
@@ -430,6 +432,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3.6-35B-A3B-FP8
@@ -481,6 +484,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=google/gemma-4-31B-it
@@ -552,6 +556,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=black-forest-labs/FLUX.2-klein-4B
@@ -580,6 +585,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3-VL-30B-A3B-Instruct
@@ -632,6 +638,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
@@ -675,6 +682,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=Qwen/Qwen3-Reranker-0.6B
@@ -718,6 +726,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=openai/whisper-large-v3
@@ -767,6 +776,7 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
       - CLOUD_API_URL=https://cloud-api.near.ai
+      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}
       - COMPOSE_MANAGER_URL=http://compose-manager:8080
       - LOG_FORMAT=json
       - MODEL_NAME=openai/privacy-filter
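With ten near-identical hunks it is easy to miss a service, so a coverage check may be worth running over the final file. The sketch below assumes, as in every hunk above, that the CLOUD_API_USAGE_TOKEN entry sits directly after the CLOUD_API_URL entry. The function name is hypothetical, and a YAML parser would be the more robust choice for anything beyond a quick sanity check.

```python
from typing import List


def url_lines_missing_token(compose_text: str) -> List[int]:
    """Return 1-based line numbers of CLOUD_API_URL entries that are not
    immediately followed by a CLOUD_API_USAGE_TOKEN entry."""
    lines = compose_text.splitlines()
    missing = []
    for index, line in enumerate(lines):
        if "CLOUD_API_URL=" not in line:
            continue
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if "CLOUD_API_USAGE_TOKEN=" not in following:
            missing.append(index + 1)
    return missing


patched = "\n".join([
    "    environment:",
    "      - NVIDIA_VISIBLE_DEVICES=all",
    "      - CLOUD_API_URL=https://cloud-api.near.ai",
    "      - CLOUD_API_USAGE_TOKEN=${CLOUD_API_USAGE_TOKEN}",
    "      - COMPOSE_MANAGER_URL=http://compose-manager:8080",
])
unpatched = "\n".join([
    "    environment:",
    "      - NVIDIA_VISIBLE_DEVICES=all",
    "      - CLOUD_API_URL=https://cloud-api.near.ai",
    "      - COMPOSE_MANAGER_URL=http://compose-manager:8080",
])

patched_gaps = url_lines_missing_token(patched)
unpatched_gaps = url_lines_missing_token(unpatched)
```

An empty result means every CLOUD_API_URL entry in the text has the token entry next to it.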
