Skip to content

Commit bf6bd8d

Browse files
committed
GLM-5.1: bump inference-proxy + wire web_context_search env
Image bump - 05ad3e830dfca99e499705deab54616db483cf0b222be83b591b6037b7fd2abc + 59e42dd68faa15eb0c23521029a2fc3d80d86a4143f9f766542357918be33a8c (nearai/inference-proxy#144 — server-side `web_context_search` agent loop for /v1/chat/completions; PR merged on main as 177c58c.) Env wiring - WEB_CONTEXT_SEARCH_URL=${WEB_CONTEXT_SEARCH_URL} - WEB_CONTEXT_SEARCH_API_KEY=${WEB_CONTEXT_SEARCH_API_KEY} Both gated by compose interpolation so the feature stays dormant until the host's .env defines them (the inference-proxy startup validation requires both set or both unset, so leaving one of them hardcoded while the other comes from .env would block startup on hosts that haven't been provisioned with a Brave key). To enable on a deployed host, populate .env with: WEB_CONTEXT_SEARCH_URL=https://api.search.brave.com/res/v1/llm/context WEB_CONTEXT_SEARCH_API_KEY=<brave-subscription-token> Until then, this is a pure image bump — no behavior change.
1 parent 3953252 commit bf6bd8d

1 file changed

Lines changed: 16 additions & 1 deletion

File tree

GLM-5.1.yaml

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ x-nvidia: &nvidia
1515
hard: 65535
1616

1717
x-vllm-proxy-common: &vllm-proxy-common
18-
image: nearaidev/vllm-proxy-rs@sha256:05ad3e830dfca99e499705deab54616db483cf0b222be83b591b6037b7fd2abc
18+
image: nearaidev/vllm-proxy-rs@sha256:59e42dd68faa15eb0c23521029a2fc3d80d86a4143f9f766542357918be33a8c
1919
user: root
2020
privileged: true
2121
<<: *nvidia
@@ -94,6 +94,21 @@ services:
9494
- VLLM_BASE_URL=http://glm51:8000
9595
- TLS_CERT_PATH=/etc/letsencrypt/live/completions.near.ai/fullchain.pem
9696
- USE_NV_ATTESTATION_SDK=true
97+
# Server-side `web_context_search` tool loop (inference-proxy#144).
98+
# Activated only when a client sends
99+
# `tools: [{"type":"web_context_search"}]` on /v1/chat/completions
100+
# with stream:true. The tool runs inside this CVM, hitting Brave's
101+
# LLM Context endpoint directly; tool args + results stay within the
102+
# E2EE perimeter. Both URL and key are gated by env interpolation —
103+
# if the host's `.env` doesn't define them, compose substitutes
104+
# empty strings, the inference-proxy treats the feature as
105+
# unconfigured, and existing chat-completion behavior is unchanged.
106+
# To enable on a host, set both
107+
# `WEB_CONTEXT_SEARCH_URL=https://api.search.brave.com/res/v1/llm/context`
108+
# and `WEB_CONTEXT_SEARCH_API_KEY=<brave-subscription-token>` in
109+
# the host .env before compose up.
110+
- WEB_CONTEXT_SEARCH_URL=${WEB_CONTEXT_SEARCH_URL}
111+
- WEB_CONTEXT_SEARCH_API_KEY=${WEB_CONTEXT_SEARCH_API_KEY}
97112

98113
glm51:
99114
<<: *nvidia

0 commit comments

Comments
 (0)