Pack TokenSpeed encoder inputs into offset SHM segments, preserve placeholder spans for faster worker handoff, and default video tensor transport to auto.
Signed-off-by: yechank-nvidia <161688079+yechank-nvidia@users.noreply.github.com>
docs/reference/configuration.md
Lines changed: 7 additions & 1 deletion
@@ -942,11 +942,17 @@ smg \
These env-only variables tune how the router ships preprocessed multimodal
tensors (image/video encoder inputs) to a TokenSpeed worker. They do not affect
accuracy — the inline and shared-memory paths produce byte-identical tensors.
+SHM handles include offsets; multi-item TokenSpeed encoder inputs may share one
+packed segment while preserving the same byte-exact tensor payloads and reducing
+per-tensor file lifecycle overhead.

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
-|`SMG_TOKENSPEED_MM_TENSOR_TRANSPORT`|`inline`| Transport for large MM tensors: `inline` (gRPC bytes), `shm` (always use `/dev/shm`), or `auto` (use `/dev/shm` only when the worker is *verified* to share it). In `auto`, the router compares the worker's advertised `/dev/shm` namespace token (`GetServerInfo`) to its own and uses SHM only on a match; otherwise it falls back to inline. No locality configuration is needed. |
+|`SMG_TOKENSPEED_MM_TENSOR_TRANSPORT`|image/audio: `inline`; video: `auto`| Transport for large MM tensors: `inline` (gRPC bytes), `shm` (always use `/dev/shm`), or `auto` (use `/dev/shm` only when the worker is *verified* to share it). When unset, image/audio stay inline while video uses `auto` to avoid the high-throughput video gRPC byte-copy path on colocated workers without hurting image TTFT. In `auto`, the router compares the worker's advertised `/dev/shm` namespace token (`GetServerInfo`) to its own and uses SHM only on a match; otherwise it falls back to inline. No locality configuration is needed. |
|`SMG_TOKENSPEED_MM_SHM_MIN_BYTES`|`65536`| Minimum tensor size (bytes) before the SHM path is used; smaller tensors stay inline. |
+|`SMG_MM_PREPROCESS_PAR_MIN_BYTES`|`524288`| Minimum output size before CPU image/video preprocessing splits work across helper threads. |
+|`SMG_MM_PREPROCESS_PAR_MIN_ROWS`|`32`| Minimum output rows or block bands per helper thread for CPU multimodal preprocessing. |
+|`SMG_MM_PREPROCESS_PAR_MAX_THREADS`|`8`| Maximum helper threads spawned per image/video preprocessing pass. Raise for large single requests; keep lower for high-concurrency TTFT. |
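
The transport rules in the table above can be read as a small decision function. The following is a minimal Python sketch of that reading, not the router's actual code. The function names, the `modality` strings, and the token arguments are hypothetical. It also assumes that `SMG_TOKENSPEED_MM_SHM_MIN_BYTES` applies in both `shm` and `auto` modes and that a tensor of exactly the minimum size takes the SHM path, neither of which the documentation states explicitly.

```python
import os

VALID_MODES = {"inline", "shm", "auto"}


def default_transport(modality):
    # Documented defaults when the variable is unset:
    # image/audio stay inline, video uses auto.
    return "auto" if modality == "video" else "inline"


def choose_transport(modality, nbytes, router_token, worker_token, env=None):
    """Return "inline" or "shm" for one tensor (hypothetical helper)."""
    env = os.environ if env is None else env
    mode = env.get("SMG_TOKENSPEED_MM_TENSOR_TRANSPORT") or default_transport(modality)
    mode = mode.strip().lower()
    if mode not in VALID_MODES:
        # Assumption: unknown values are rejected; the real behavior is not documented.
        raise ValueError(f"unknown transport mode: {mode!r}")

    min_bytes = int(env.get("SMG_TOKENSPEED_MM_SHM_MIN_BYTES", "65536"))
    # Assumption: the size threshold is checked before the mode in both shm and auto.
    if mode == "inline" or nbytes < min_bytes:
        return "inline"
    if mode == "shm":
        return "shm"
    # auto: use SHM only when the worker's advertised /dev/shm namespace
    # token matches the router's own; otherwise fall back to inline.
    if worker_token is not None and worker_token == router_token:
        return "shm"
    return "inline"
```

With no variables set, `choose_transport("video", 1 << 20, "ns-a", "ns-a", env={})` selects `shm`, while the same call with `"image"` or with a mismatched worker token selects `inline`.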
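
The added paragraph about SHM handles carrying offsets describes several tensors sharing one packed segment. A minimal Python sketch of that idea follows. It uses an in-memory buffer in place of a real `/dev/shm` file, and the `Handle` type, the `pack` and `unpack` helpers, and the 64-byte alignment are all assumptions for illustration rather than the project's actual layout.

```python
from dataclasses import dataclass

ALIGN = 64  # hypothetical alignment; the real segment layout is not documented here


@dataclass(frozen=True)
class Handle:
    offset: int  # byte offset of the payload inside the shared segment
    length: int  # payload size in bytes


def pack(payloads):
    """Concatenate payloads into one segment and return (segment, handles)."""
    buf = bytearray()
    handles = []
    for payload in payloads:
        # Pad so that each payload starts on an ALIGN-byte boundary.
        buf.extend(b"\x00" * ((-len(buf)) % ALIGN))
        handles.append(Handle(offset=len(buf), length=len(payload)))
        buf.extend(payload)
    return bytes(buf), handles


def unpack(segment, handle):
    """Recover one payload byte-for-byte from the shared segment."""
    return bytes(segment[handle.offset:handle.offset + handle.length])
```

Because each handle records only an offset and a length, one segment is created and cleaned up per request instead of one file per tensor, and every payload read back through `unpack` is byte-identical to what was packed.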