feat(multimodal): configurable tensor transport + vLLM SHM and video #1818
slin1237 wants to merge 3 commits into
📝 Walkthrough
This PR adds shared multimodal tensor transport over shared memory, expands vLLM and TokenSpeed schemas and routing config, threads transport selection through router and proto conversion paths, adds servicer-side SHM readers, and introduces unit and e2e coverage for image and video SHM flows.
Changes
Multimodal SHM Tensor Transport
Sequence Diagram(s)
sequenceDiagram
participant Router
participant MultimodalAssembly
participant ProtoWrapper
participant GrpcClient
participant VllmServicer
participant mm_shm
Router->>MultimodalAssembly: assemble_multimodal_data(..., transport)
MultimodalAssembly->>ProtoWrapper: build TensorData / SHM payloads
ProtoWrapper->>GrpcClient: generate request with SHM handles
GrpcClient->>GrpcClient: finish_vllm_request / cleanup_mm_shm_handles on failure
GrpcClient->>VllmServicer: GenerateRequest with TensorData
VllmServicer->>mm_shm: tensor_payload_bytes_from_shm(handle)
mm_shm-->>VllmServicer: bytes
VllmServicer-->>Router: response stream
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 5 passed
| "TokenSpeed multimodal tensor transport configured" | ||
| shm_min_bytes, | ||
| dev_writable = mm_shm_dev_writable(), | ||
| "TokenSpeed multimodal tensor transport configured (global default; worker specs may override)" |
🟡 Nit: This log message still says "TokenSpeed" but log_mm_transport_config_once is now the engine-agnostic config logger (called for vLLM too). Should read something like "Multimodal tensor transport configured (global default; worker specs may override)".
Code Review
This pull request unifies and extends the multimodal tensor transport mechanism to support both TokenSpeed and vLLM workers, allowing large tensor payloads to be shared via /dev/shm (shared memory) instead of being sent inline over gRPC. It introduces global configuration options, per-worker overrides in WorkerSpec, and support for video modality in vLLM. Feedback on the changes highlights a security vulnerability in the shared memory filename validation that could allow arbitrary file deletion in /dev/shm, as well as a potential KeyError crash in the vLLM servicer when processing flat keys.
| def validated_shm_name(name: str) -> str: | ||
| """Reject path traversal / absolute names before touching the filesystem.""" | ||
| name = name.lstrip("/") | ||
| if not name or "/" in name or name in (".", "..") or "\x00" in name: | ||
| raise ValueError(f"Invalid shm tensor name: {name!r}") | ||
| return name |
The validated_shm_name function strips leading slashes and checks for path traversal characters, but it does not verify that the shared memory file name starts with the expected prefix (smg-mm- or smg-tokenspeed-). Since the worker may unlink the file after reading it (when UNLINK_MM_SHM_AFTER_READ is enabled), a compromised or malicious router could potentially supply an arbitrary file name in /dev/shm to read or delete it. Restricting the name to the expected prefixes mitigates this risk.
| def validated_shm_name(name: str) -> str: | |
| """Reject path traversal / absolute names before touching the filesystem.""" | |
| name = name.lstrip("/") | |
| if not name or "/" in name or name in (".", "..") or "\x00" in name: | |
| raise ValueError(f"Invalid shm tensor name: {name!r}") | |
| if not (name.startswith("smg-mm-") or name.startswith("smg-tokenspeed-")): | |
| raise ValueError(f"Invalid shm tensor name prefix: {name!r}") | |
| return name |
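The prefix allowlist suggested above can be exercised in isolation. The following is a minimal standalone sketch, not the project's actual `mm_shm` module: `ALLOWED_PREFIXES` and `is_accepted` are names introduced here for illustration, and the two prefixes are the ones named in the review comment.

```python
# Standalone sketch of the suggested validator. ALLOWED_PREFIXES and
# is_accepted are illustrative names, not part of the PR.
ALLOWED_PREFIXES = ("smg-mm-", "smg-tokenspeed-")


def validated_shm_name(name: str) -> str:
    """Reject traversal, absolute paths, and names outside the transport namespace."""
    name = name.lstrip("/")
    if not name or "/" in name or name in (".", "..") or "\x00" in name:
        raise ValueError(f"Invalid shm tensor name: {name!r}")
    # str.startswith accepts a tuple, so one call covers both prefixes.
    if not name.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"Invalid shm tensor name prefix: {name!r}")
    return name


def is_accepted(name: str) -> bool:
    """Convenience wrapper that reports acceptance as a boolean."""
    try:
        validated_shm_name(name)
    except ValueError:
        return False
    return True
```

With this shape, a name such as `psm_abc` that passes the traversal checks is still rejected, because it sits outside the transport's own namespace.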
| elif key in flat: | ||
| sizes_key = flat[key] | ||
| if sizes_key not in flat_sizes_cache: | ||
| flat_sizes_cache[sizes_key] = hf_dict[sizes_key].flatten().to(torch.int64) |
If a client sends a flat_keys map containing a sizes_key that is not present in model_specific_tensors (and thus not in hf_dict), accessing hf_dict[sizes_key] will raise a KeyError. Checking if sizes_key exists in hf_dict and raising a clear ValueError prevents an unexpected crash and provides better error reporting.
elif key in flat:
    sizes_key = flat[key]
    if sizes_key not in hf_dict:
        raise ValueError(f"Flat sizes key {sizes_key!r} for {key!r} not found in model_specific_tensors")
    if sizes_key not in flat_sizes_cache:
        flat_sizes_cache[sizes_key] = hf_dict[sizes_key].flatten().to(torch.int64)
| /// for this worker (e.g. force `shm` for a co-located worker, `inline` for a | ||
| /// remote one). | ||
| #[serde(default, skip_serializing_if = "Option::is_none")] | ||
| pub multimodal_tensor_transport: Option<String>, |
🟡 Nit: This field accepts any string without validation. A typo like "smh" instead of "shm" silently falls back to inline (the other arm in resolve_mm_shm_enabled logs a warning, but only once via OnceLock — so the second misconfigured worker is entirely silent). Consider either a serde deserialize validation or an enum type to catch invalid values at worker registration time rather than at request time. The CLI flag already validates with value_parser = ["inline", "shm", "auto"]; the API path doesn't get the same protection.
| continue; | ||
| }; | ||
| let Some(rest) = name.strip_prefix("smg-tokenspeed-") else { | ||
| let Some(rest) = name.strip_prefix(MM_SHM_NAME_PREFIX) else { |
🟡 Nit: The prefix changed from "smg-tokenspeed-" to "smg-mm-", so the orphan sweep will no longer clean up SHM files left behind by a previous SMG version that crashed before its consumer could unlink them. During a rolling upgrade, any smg-tokenspeed-* crash orphans in /dev/shm will persist until the next reboot. Consider also sweeping the legacy prefix here (same dead-pid logic applies).
Clean, well-structured PR. The engine-agnostic refactor of the SHM transport is consistent, the vLLM video wiring is correct, and the SHM lifecycle management (build-path + send-path cleanup) properly mirrors the existing TokenSpeed pattern. Three nits posted — no blocking issues.
Summary: 0 🔴 Important · 3 🟡 Nit · 0 🟣 Pre-existing
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11e5b227f2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| kv_role=kv_role, | ||
| kv_engine_id=kv_engine_id, | ||
| data_parallel_size=parallel.data_parallel_size, | ||
| shm_namespace_id=mm_shm.shm_namespace_id(), |
Pin vLLM servicer to the new proto package
When the vLLM servicer is installed or published without also installing the regenerated smg-grpc-proto, grpc_servicer/pyproject.toml still allows the existing smg-grpc-proto>=0.4.11, whose GetServerInfoResponse has no shm_namespace_id field and whose TensorData has no payload oneof. In that environment this constructor rejects the unknown keyword during worker discovery, so vLLM gRPC workers fail to start/register before any request can run; the repo’s development guide says proto+servicer field changes need a matching proto release/bump or split PR.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/vllm/servicer.py (1)
697-718: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Reject invalid placeholder ranges before creating PlaceholderRange.
Lines 706-718 accept placeholder ranges without bounds validation. Out-of-range or zero-length entries can silently create invalid mask/range state and misalign multimodal embedding placement.
Suggested fix
  placeholders = []
  for p in mm_proto.mm_placeholders:
+     if p.length <= 0:
+         raise ValueError("Multimodal placeholder length must be > 0")
+     if p.offset + p.length > len(prompt_token_ids):
+         raise ValueError(
+             f"Multimodal placeholder out of bounds: "
+             f"offset={p.offset}, length={p.length}, prompt_len={len(prompt_token_ids)}"
+         )
      is_embed = None
      if prompt_ids_tensor is not None:
          mask = prompt_ids_tensor[p.offset : p.offset + p.length] == pad_token_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpc_servicer/smg_grpc_servicer/vllm/servicer.py` around lines 697 - 718, Add bounds validation for each placeholder entry in the mm_proto.mm_placeholders loop before creating the PlaceholderRange object. Validate that p.offset is non-negative, p.length is positive, and that the range (p.offset + p.length) does not exceed the length of prompt_token_ids. Skip invalid entries or raise an appropriate error to prevent out-of-bounds placeholder ranges from being processed, ensuring correct multimodal embedding alignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/servicer.py`:
- Around line 88-94: The _tensor_from_proto function does not validate that the
payload byte length matches the expected tensor size before calling reshape,
which can cause a cryptic runtime error when malformed inputs are provided. Add
validation logic before the reshape operation that calculates the expected byte
size based on the tensor dtype and shape (using torch.tensor with the dtype and
computing the total element count), compares it with the actual payload byte
length, and raises a clear ValueError if they do not match.
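The byte-length check described in the inline comment could look roughly like the sketch below. It is pure Python so it runs without torch; `DTYPE_ITEMSIZE`, `expected_nbytes`, and `check_payload` are hypothetical names, and the dtype table is an assumption standing in for whatever dtype mapping `_tensor_from_proto` actually uses.

```python
import math
from typing import Sequence

# Assumed dtype-name -> element-size table for illustration only; the real
# servicer would derive element sizes from its torch dtypes.
DTYPE_ITEMSIZE = {"float32": 4, "float16": 2, "bfloat16": 2, "int64": 8, "uint8": 1}


def expected_nbytes(dtype: str, shape: Sequence[int]) -> int:
    """Number of bytes a dense tensor of this dtype and shape occupies."""
    if dtype not in DTYPE_ITEMSIZE:
        raise ValueError(f"Unsupported tensor dtype: {dtype!r}")
    if any(dim < 0 for dim in shape):
        raise ValueError(f"Negative dimension in tensor shape: {list(shape)}")
    # math.prod of an empty shape is 1, which matches a scalar tensor.
    return DTYPE_ITEMSIZE[dtype] * math.prod(shape)


def check_payload(payload: bytes, dtype: str, shape: Sequence[int]) -> None:
    """Raise a clear error before any reshape if the payload size is wrong."""
    expected = expected_nbytes(dtype, shape)
    if len(payload) != expected:
        raise ValueError(
            f"Tensor payload size mismatch: dtype={dtype}, shape={list(shape)}, "
            f"expected {expected} bytes, got {len(payload)}"
        )
```

Running the check before the reshape turns a malformed request into a single descriptive `ValueError` rather than a lower-level reshape failure.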
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e95bac5-caa2-49d9-8362-d2329f28c5af
⛔ Files ignored due to path filters (1)
- e2e_test/fixtures/videos/dog.mp4 is excluded by !**/*.mp4
📒 Files selected for processing (20)
- crates/grpc_client/proto/common.proto
- crates/grpc_client/proto/tokenspeed_scheduler.proto
- crates/grpc_client/proto/vllm_engine.proto
- crates/protocols/src/worker.rs
- docs/reference/configuration.md
- e2e_test/chat_completions/test_multimodal_shm.py
- grpc_servicer/smg_grpc_servicer/mm_shm.py
- grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- grpc_servicer/tests/test_mm_shm.py
- model_gateway/src/config/builder.rs
- model_gateway/src/config/types.rs
- model_gateway/src/main.rs
- model_gateway/src/routers/grpc/client.rs
- model_gateway/src/routers/grpc/multimodal.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/proto_wrapper.rs
- model_gateway/src/routers/grpc/regular/stages/chat/request_building.rs
- model_gateway/src/routers/grpc/regular/stages/messages/request_building.rs
- model_gateway/src/routers/grpc/router.rs
11e5b22 to 63e88d5
|
Addressed all review feedback and rebased on Root cause of the vLLM e2e failure ( Review comments
pre-commit: ran Rebased onto |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63e88d5b54
ℹ️ About Codex in GitHub
| multimodal_tensor_transport = None, | ||
| multimodal_shm_min_bytes = None, |
Append Python constructor args to preserve positions
Because these two new parameters are inserted before dp_aware in the PyO3 signature, existing positional calls to smg.smg_rs.Router/_Router that were valid before now bind the old dp_aware argument to multimodal_tensor_transport and shift every following argument; PyO3 will either reject the type or configure unrelated fields incorrectly. The same signature already appends health_check_port last to avoid this exact break, so these options should also be appended or made keyword-only.
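The positional shift described here can be reproduced with plain Python functions. This is only an illustration of the binding behaviour, not the PyO3 signature itself: the three functions and the `policy` parameter are hypothetical stand-ins for the much longer `Router` constructor.

```python
# Hypothetical stand-ins for the Router constructor; only the argument
# ordering matters for this illustration.

def router_before(policy, dp_aware=False, health_check_port=None):
    """Signature as existing positional callers know it."""
    return {"policy": policy, "dp_aware": dp_aware}


def router_inserted(policy, multimodal_tensor_transport=None,
                    multimodal_shm_min_bytes=None, dp_aware=False,
                    health_check_port=None):
    """New options inserted before dp_aware: old positional calls shift."""
    return {
        "policy": policy,
        "multimodal_tensor_transport": multimodal_tensor_transport,
        "dp_aware": dp_aware,
    }


def router_keyword_only(policy, dp_aware=False, health_check_port=None, *,
                        multimodal_tensor_transport=None,
                        multimodal_shm_min_bytes=None):
    """New options appended as keyword-only: old positional calls are unchanged."""
    return {
        "policy": policy,
        "multimodal_tensor_transport": multimodal_tensor_transport,
        "dp_aware": dp_aware,
    }


# An existing caller that passes dp_aware positionally.
old_call = ("round_robin", True)
shifted = router_inserted(*old_call)
preserved = router_keyword_only(*old_call)
```

In `shifted`, the caller's `True` lands in `multimodal_tensor_transport` and `dp_aware` silently stays `False`; in `preserved`, the same call still sets `dp_aware`, which is the behaviour that appending or keyword-only parameters would keep.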
| *WRITABLE.get_or_init(|| { | ||
| let name = format!("smg-tokenspeed-probe-{}", process::id()); | ||
| let path = tokenspeed_shm_path(&name); | ||
| let name = format!("smg-mm-probe-{}", process::id()); |
Make the SHM writability probe name unique
In deployments where multiple gateway processes share /dev/shm but run in separate PID namespaces, they can all have the same process::id() (commonly PID 1), so this fixed probe name collides. Since the probe uses create_new and caches the result in OnceLock, a concurrent or stale smg-mm-probe-1 file makes mm_shm_dev_writable() permanently return false for that process, causing shm/auto transport to fall back to inline even though the real per-payload names would be writable.
Actionable comments posted: 1
⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py (1)
1184-1214: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
TokenSpeed SHM validation still allows non-SMG namespaces.
This path still accepts any top-level /dev/shm name that passes traversal checks, then reads/unlinks it. Please enforce the same prefix allowlist (smg-mm- / smg-tokenspeed-) used by the shared mm_shm helper to avoid arbitrary /dev/shm targeting.
🔒 Suggested fix
  @staticmethod
  def _validated_shm_name(name: str) -> str:
      name = name.lstrip("/")
      if not name or "/" in name or name in (".", "..") or "\x00" in name:
          raise ValueError(f"Invalid TensorData.shm name: {name!r}")
+     if not name.startswith(("smg-mm-", "smg-tokenspeed-")):
+         raise ValueError(f"TensorData.shm name outside allowed namespace: {name!r}")
      return name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py` around lines 1184 - 1214, The _validated_shm_name method only checks for path traversal vulnerabilities but does not enforce the required namespace prefix allowlist, allowing access to arbitrary shared memory files in /dev/shm. Modify the _validated_shm_name method to add a prefix validation check that ensures the name starts with one of the allowed prefixes (such as smg-mm- or smg-tokenspeed-) before allowing the name. If the name does not match the allowed prefixes, raise a ValueError with a descriptive error message indicating the invalid namespace prefix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bindings/python/src/smg/router_args.py`:
- Around line 501-505: The add_argument call for --multimodal-shm-min-bytes uses
type=int without validation, allowing negative values which are invalid for a
minimum bytes threshold. Replace type=int with a custom type validation function
that converts the input to an integer and rejects negative values by raising an
ArgumentTypeError, ensuring only non-negative integers are accepted for this
argument.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fa4dc311-16d3-4f9a-9d20-cbcee6fbb6b7
⛔ Files ignored due to path filters (1)
- e2e_test/fixtures/videos/dog.mp4 is excluded by !**/*.mp4
📒 Files selected for processing (24)
- bindings/python/src/lib.rs
- bindings/python/src/smg/router_args.py
- crates/grpc_client/proto/common.proto
- crates/grpc_client/proto/tokenspeed_scheduler.proto
- crates/grpc_client/proto/vllm_engine.proto
- crates/grpc_client/python/pyproject.toml
- crates/protocols/src/worker.rs
- docs/reference/configuration.md
- e2e_test/chat_completions/test_multimodal_shm.py
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/mm_shm.py
- grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- grpc_servicer/tests/test_mm_shm.py
- model_gateway/src/config/builder.rs
- model_gateway/src/config/types.rs
- model_gateway/src/main.rs
- model_gateway/src/routers/grpc/client.rs
- model_gateway/src/routers/grpc/multimodal.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/proto_wrapper.rs
- model_gateway/src/routers/grpc/regular/stages/chat/request_building.rs
- model_gateway/src/routers/grpc/regular/stages/messages/request_building.rs
- model_gateway/src/routers/grpc/router.rs
63e88d5 to 5623b42
| } | ||
| } | ||
|
|
||
| fn log_tokenspeed_mm_timing_enabled() -> bool { |
🟡 Nit: This function was missed in the tokenspeed_ → mm_ rename sweep. It's now the sole caller from the engine-agnostic mm_tensor_payload (line 380), so the tokenspeed in its name is misleading — renaming to log_mm_timing_enabled would keep it consistent with the rest of the refactor.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5623b42270
ℹ️ About Codex in GitHub
| if UNLINK_MM_SHM_AFTER_READ: | ||
| try: | ||
| os.unlink(path) |
Keep SHM segments alive when PD decode reuses them
When vLLM sequential PD handles a multimodal request with SHM enabled, execute_sequential_pd clones the request for prefill and then reuses the original proto_request for decode without clearing mm_inputs (model_gateway/src/routers/grpc/common/stages/request_execution.rs:571-573), so both legs carry the same SHM handle. This unlink runs after the prefill servicer reads the tensor, causing the decode servicer to open a missing /dev/shm file and fail; either avoid sending the multimodal payload to decode on that path or defer cleanup until all consumers are done.
Actionable comments posted: 2
⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py (1)
1186-1214: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Harden TokenSpeed SHM name validation to the same namespace allowlist.
Line 1210’s local validator accepts any /dev/shm basename, so a crafted shm_handle.name can still read/unlink non-transport entries. This bypasses the new prefix-boundary hardening introduced in mm_shm.validated_shm_name. Reuse the shared validator (or shared read helper) here to keep both engines under the same security contract.
Suggested minimal fix
  @@
  @staticmethod
  def _validated_shm_name(name: str) -> str:
-     name = name.lstrip("/")
-     if not name or "/" in name or name in (".", "..") or "\x00" in name:
-         raise ValueError(f"Invalid TensorData.shm name: {name!r}")
-     return name
+     return mm_shm.validated_shm_name(name)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py` around lines 1186 - 1214, The local _validated_shm_name method uses permissive validation that only checks for basic invalid characters and patterns, but it allows reading or unlinking non-transport SHM entries that should be protected. Replace the call to the local _validated_shm_name validator in the function that reads shared memory (around line 1186) with a call to the shared validator mm_shm.validated_shm_name that includes proper prefix-boundary hardening. This ensures both the TokenSpeed servicer and other engines enforce the same strict security contract for SHM access, preventing crafted shm_handle names from bypassing namespace restrictions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bindings/python/src/lib.rs`:
- Around line 805-806: The new constructor parameters
multimodal_tensor_transport and multimodal_shm_min_bytes are being inserted in
the middle of the _Router PyO3 constructor parameter list at lines 805-806 and
927-928, which breaks backward compatibility for existing Python callers using
positional arguments. Move these new parameters to the end of the constructor
parameter list instead of inserting them in the middle, ensuring all existing
positional argument indices remain unchanged.
In `@model_gateway/src/routers/grpc/proto_wrapper.rs`:
- Around line 481-484: The mm_shm_min_bytes_env function returns an untrimmed
string from env_first which causes integer parsing to fail silently when
whitespace is present in the environment variable value. To fix this, call
trim() on the value returned by env_first before attempting to parse it as a
usize, ensuring that values with leading or trailing whitespace like " 65536 "
are correctly parsed and honored instead of being ignored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e5e23567-d60d-4af3-855f-874d563a2876
⛔ Files ignored due to path filters (1)
- e2e_test/fixtures/videos/dog.mp4 is excluded by !**/*.mp4
📒 Files selected for processing (25)
- bindings/python/src/lib.rs
- bindings/python/src/smg/router_args.py
- crates/grpc_client/proto/common.proto
- crates/grpc_client/proto/tokenspeed_scheduler.proto
- crates/grpc_client/proto/vllm_engine.proto
- crates/grpc_client/python/pyproject.toml
- crates/protocols/src/worker.rs
- docs/reference/configuration.md
- e2e_test/chat_completions/test_multimodal_shm.py
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/mm_shm.py
- grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- grpc_servicer/tests/test_mm_shm.py
- model_gateway/src/config/builder.rs
- model_gateway/src/config/types.rs
- model_gateway/src/main.rs
- model_gateway/src/routers/grpc/client.rs
- model_gateway/src/routers/grpc/multimodal.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/proto_wrapper.rs
- model_gateway/src/routers/grpc/regular/stages/chat/request_building.rs
- model_gateway/src/routers/grpc/regular/stages/messages/request_building.rs
- model_gateway/src/routers/grpc/router.rs
- scripts/ci_install_e2e_deps.sh
Make the multimodal tensor transport engine-agnostic and configurable, extend the shared-memory transport (previously TokenSpeed-only) to vLLM, and add vLLM video inputs.

Transport config:
- New --multimodal-tensor-transport (inline|shm|auto) and --multimodal-shm-min-bytes CLI flags + RouterConfig fields, plus per-worker WorkerSpec overrides. Precedence: worker spec -> router config -> SMG_MM_* env (legacy SMG_TOKENSPEED_MM_* honored as aliases) -> built-in default. The multimodal subsystem receives resolved values and never depends on RouterConfig.

vLLM shared memory:
- Hoist ShmHandle/RemoteTensorHandle to common.proto and migrate TokenSpeed to them; add the transport oneof to vLLM TensorData. The gateway writes large tensors to /dev/shm (shared mm_shm I/O, smg-mm- prefix) with build- and send-path cleanup; the vLLM servicer reads them via a shared mm_shm helper and advertises shm_namespace_id so `auto` can verify co-location.

vLLM video:
- Assemble video inputs for vLLM (is_video) instead of rejecting them; the servicer routes the encoder tensor to pixel_values_videos with `video` field configs. Adds an e2e test running Qwen3-VL video over SHM with a committed fixture, asserting the SHM transport is exercised.

Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
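The precedence chain in the commit message (worker spec, then router config, then environment with legacy aliases, then built-in default) can be sketched as a first-match lookup. Everything named below is an assumption for illustration: the commit message gives only the `SMG_MM_*` and `SMG_TOKENSPEED_MM_*` patterns, so the exact variable names are hypothetical, as is `inline` as the built-in default.

```python
import os
from typing import Mapping, Optional

# Hypothetical names: only the SMG_MM_* / SMG_TOKENSPEED_MM_* patterns appear
# in the commit message, and the built-in default is assumed to be "inline".
ENV_NAME = "SMG_MM_TENSOR_TRANSPORT"
LEGACY_ENV_NAME = "SMG_TOKENSPEED_MM_TENSOR_TRANSPORT"
DEFAULT_TRANSPORT = "inline"


def resolve_transport(
    worker_spec: Optional[str],
    router_config: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first configured value in precedence order."""
    env = os.environ if env is None else env
    candidates = (
        worker_spec,               # per-worker override wins
        router_config,             # then the router-wide setting
        env.get(ENV_NAME),         # then the current env variable
        env.get(LEGACY_ENV_NAME),  # then the legacy alias
    )
    for value in candidates:
        if value:
            return value
    return DEFAULT_TRANSPORT
```

Passing `env` explicitly keeps the resolver easy to test, and mirrors the commit's note that the multimodal subsystem receives resolved values rather than reading configuration itself.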
5623b42 to a2d7a08
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py (1)
1183-1214: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Use the shared SHM validator here too.
This TokenSpeed path still accepts any basename under /dev/shm, while grpc_servicer/smg_grpc_servicer/mm_shm.py now restricts handles to the smg-mm- / smg-tokenspeed- namespaces. That leaves TokenSpeed able to read or unlink arbitrary /dev/shm entries from a crafted request, so the new boundary is only enforced on the vLLM side.
Suggested fix
  def _tensor_payload_bytes_from_shm(
      shm_handle: common_pb2.ShmHandle,
  ) -> bytes:
-     name = TokenSpeedSchedulerServicer._validated_shm_name(shm_handle.name)
-
-     path = os.path.join("/dev/shm", name)
-     fd = None
-     try:
-         fd = os.open(path, os.O_RDONLY)
-         raw = os.pread(fd, int(shm_handle.nbytes), int(shm_handle.offset))
-     finally:
-         if fd is not None:
-             os.close(fd)
-         if fd is not None and UNLINK_MM_SHM_AFTER_READ:
-             try:
-                 os.unlink(path)
-             except FileNotFoundError:
-                 pass
-
-     if len(raw) != int(shm_handle.nbytes):
-         raise ValueError(
-             f"TensorData.shm byte length mismatch for name={shm_handle.name!r}: "
-             f"expected {int(shm_handle.nbytes)}, got {len(raw)}"
-         )
-     return raw
+     return mm_shm.tensor_payload_bytes_from_shm(shm_handle)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py` around lines 1183 - 1214, The TokenSpeed shared-memory read path bypasses the centralized SHM namespace validation, so `_tensor_payload_bytes_from_shm` can still open and unlink arbitrary `/dev/shm` entries. Update this flow to reuse the shared validator from `grpc_servicer/smg_grpc_servicer/mm_shm.py` (or the same namespace-checking logic) before `os.open` and `os.unlink`, and keep the existing `TokenSpeedSchedulerServicer._validated_shm_name` call only if it enforces the same `smg-mm-` / `smg-tokenspeed-` restrictions.
♻️ Duplicate comments (1)
bindings/python/src/smg/router_args.py (1)
501-505: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Reject negative --multimodal-shm-min-bytes values.
Line 503 still accepts negative integers, but this value is later consumed by the Rust config path as Option<usize>. That means an invalid CLI value is accepted here and only fails later at the Python→Rust boundary instead of being rejected immediately.
Suggested fix
+ def _non_negative_int(value: str) -> int:
+     parsed = int(value)
+     if parsed < 0:
+         raise argparse.ArgumentTypeError(
+             "--multimodal-shm-min-bytes must be >= 0"
+         )
+     return parsed
+
  multimodal_group.add_argument(
      f"--{prefix}multimodal-shm-min-bytes",
-     type=int,
+     type=_non_negative_int,
      default=None,
      help="Minimum multimodal tensor size (bytes) before the SHM transport is used",
  )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bindings/python/src/smg/router_args.py` around lines 501 - 505, Reject invalid negative values for the multimodal SHM threshold in the router argument parser. Update the argument definition in the multimodal group inside the router_args.py parser setup so the --multimodal-shm-min-bytes option validates non-negative integers before they are passed onward, instead of allowing them to reach the Python→Rust boundary as an Option[usize]. Use the existing multimodal argument registration path and its type/validation handling to enforce this immediately at parse time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grpc_servicer/smg_grpc_servicer/mm_shm.py`:
- Around line 45-58: The SHM read path in tensor_payload_bytes_from_shm still
follows symlinks, so a crafted shared-memory name can escape the intended
namespace. Update tensor_payload_bytes_from_shm to open the resolved path with
symlink-safe flags and/or validate the opened file descriptor refers to a
regular file under the expected shm directory before reading; keep the fix
localized to tensor_payload_bytes_from_shm and the validated_shm_name/path
handling around os.open, os.pread, and os.unlink.
In `@model_gateway/src/routers/grpc/multimodal.rs`:
- Around line 1753-1765: The worker override handling in the multimodal
transport resolver should not let an invalid
WorkerSpec.multimodal_tensor_transport value suppress a valid global setting.
Update the mode selection around worker_transport_mode_override() and the match
on mode in the multimodal transport path so unknown worker values are logged via
log_unknown_mm_transport_once() and then ignored, allowing the code to fall back
to the resolved global transport from transport / DEFAULT_MM_TENSOR_TRANSPORT
instead of immediately returning false.
In `@scripts/ci_install_e2e_deps.sh`:
- Around line 20-27: The ffmpeg dependency check in ci_install_e2e_deps.sh is
swallowing install failures, which lets the E2E job continue and fail later in
the test phase. Update the ffmpeg installation path so it remains fatal when
apt-get install or the apt-get check fails, or explicitly set a flag that the
video test flow can read to skip those tests. Keep the change centered around
the ffmpeg/apt-get branch in the CI install script so the job behavior is clear
and actionable.
---
Outside diff comments:
In `@grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py`:
- Around line 1183-1214: The TokenSpeed shared-memory read path bypasses the
centralized SHM namespace validation, so `_tensor_payload_bytes_from_shm` can
still open and unlink arbitrary `/dev/shm` entries. Update this flow to reuse
the shared validator from `grpc_servicer/smg_grpc_servicer/mm_shm.py` (or the
same namespace-checking logic) before `os.open` and `os.unlink`, and keep the
existing `TokenSpeedSchedulerServicer._validated_shm_name` call only if it
enforces the same `smg-mm-` / `smg-tokenspeed-` restrictions.
---
Duplicate comments:
In `@bindings/python/src/smg/router_args.py`:
- Around line 501-505: Reject invalid negative values for the multimodal SHM
threshold in the router argument parser. Update the argument definition in the
multimodal group inside the router_args.py parser setup so the
--multimodal-shm-min-bytes option validates non-negative integers before they
are passed onward, instead of allowing them to reach the Python→Rust boundary as
an Option[usize]. Use the existing multimodal argument registration path and its
type/validation handling to enforce this immediately at parse time.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dedf9c64-b5b8-4113-a8a7-212353691222
⛔ Files ignored due to path filters (1)
- e2e_test/fixtures/videos/dog.mp4 is excluded by !**/*.mp4
📒 Files selected for processing (25)
- bindings/python/src/lib.rs
- bindings/python/src/smg/router_args.py
- crates/grpc_client/proto/common.proto
- crates/grpc_client/proto/tokenspeed_scheduler.proto
- crates/grpc_client/proto/vllm_engine.proto
- crates/grpc_client/python/pyproject.toml
- crates/protocols/src/worker.rs
- docs/reference/configuration.md
- e2e_test/chat_completions/test_multimodal_shm.py
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/mm_shm.py
- grpc_servicer/smg_grpc_servicer/tokenspeed/servicer.py
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- grpc_servicer/tests/test_mm_shm.py
- model_gateway/src/config/builder.rs
- model_gateway/src/config/types.rs
- model_gateway/src/main.rs
- model_gateway/src/routers/grpc/client.rs
- model_gateway/src/routers/grpc/multimodal.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/proto_wrapper.rs
- model_gateway/src/routers/grpc/regular/stages/chat/request_building.rs
- model_gateway/src/routers/grpc/regular/stages/messages/request_building.rs
- model_gateway/src/routers/grpc/router.rs
- scripts/ci_install_e2e_deps.sh
| if ! command -v ffmpeg >/dev/null 2>&1; then | ||
| if command -v apt-get >/dev/null 2>&1; then | ||
| echo "Installing ffmpeg (video decode backend)..." | ||
| apt-get update -qq && apt-get install -y --no-install-recommends ffmpeg \ | ||
| || echo "WARNING: ffmpeg install failed; multimodal video e2e tests will fail" >&2 | ||
| else | ||
| echo "WARNING: ffmpeg + apt-get unavailable; multimodal video e2e tests will fail" >&2 | ||
| fi |
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Don’t hide ffmpeg install failures from the E2E job.
The new video test does not skip when ffmpeg is missing, so continuing here only pushes the failure into the test phase with a less actionable error. If this script is the CI dependency gate for that suite, the install needs to stay fatal there or set a flag that the video tests can skip on.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/ci_install_e2e_deps.sh` around lines 20 - 27, The ffmpeg dependency
check in ci_install_e2e_deps.sh is swallowing install failures, which lets the
E2E job continue and fail later in the test phase. Update the ffmpeg
installation path so it remains fatal when apt-get install or the apt-get check
fails, or explicitly set a flag that the video test flow can read to skip those
tests. Keep the change centered around the ffmpeg/apt-get branch in the CI
install script so the job behavior is clear and actionable.
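The "skip instead of fail" alternative mentioned in this comment could look roughly like the sketch below. It uses only the standard library (`shutil.which` plus `unittest.skipUnless`); the class and test names are hypothetical, and the real e2e suite may use a different test framework and a real request body.

```python
import shutil
import unittest


def ffmpeg_available() -> bool:
    """Return True when an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


class VideoShmTest(unittest.TestCase):
    # Hypothetical test case: the body is a placeholder for the real video request.
    @unittest.skipUnless(ffmpeg_available(), "ffmpeg not installed; skipping video SHM test")
    def test_video_over_shm(self) -> None:
        self.assertTrue(ffmpeg_available())
```

A skip keeps the job green while still surfacing the reason in the test report, whereas keeping the install fatal makes the missing dependency visible earlier; which is preferable depends on whether the video path must be covered on every run.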
- e2e: skip the vLLM video SHM test when ffmpeg is unavailable instead of failing (an apt install flake was the only red CI check)
- servicer: open SHM payloads with O_NOFOLLOW so a symlink planted at a validated name in world-writable /dev/shm can't redirect the read/unlink
- gateway: clear mm_inputs on the sequential-PD decode leg so it no longer re-reads an SHM segment the prefill servicer already unlinked (mirrors execute_dual_dispatch)
- gateway: make the /dev/shm writability probe name unique (pid+nanos) to avoid spurious failure across PID namespaces that share /dev/shm
- gateway: an invalid per-worker transport override now falls through to the global mode instead of silently forcing inline
- bindings: append multimodal_* constructor args last to preserve PyO3 positional compatibility; reject negative --multimodal-shm-min-bytes
- proto_wrapper: finish the rename sweep (log_mm_timing_enabled)

Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
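The override fallthrough described in this commit can be illustrated with a small sketch. The gateway code is Rust; this Python version only models the decision. `resolve_transport_mode`, `VALID_MODES`, and the `DEFAULT_MODE` value are assumptions made for illustration (the actual built-in default is not stated here).

```python
from typing import Optional

VALID_MODES = ("inline", "shm", "auto")
DEFAULT_MODE = "inline"  # assumed built-in default, for illustration only


def resolve_transport_mode(worker_override: Optional[str], global_mode: Optional[str]) -> str:
    """Pick the first valid mode: per-worker override, then global, then default.

    An unknown value is ignored rather than treated as "inline", so a typo in a
    worker spec cannot suppress a valid global setting.
    """
    for candidate in (worker_override, global_mode):
        if candidate is None:
            continue
        normalized = candidate.strip().lower()
        if normalized in VALID_MODES:
            return normalized
        # Unknown value: the real code would log it once here, then fall through.
    return DEFAULT_MODE
```

For example, `resolve_transport_mode("bogus", "shm")` yields `"shm"`, while a valid override such as `"inline"` still wins over the global mode.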
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
grpc_servicer/smg_grpc_servicer/mm_shm.py (1)
45-68: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Reject oversized SHM handles before calling `os.pread`.
This helper still trusts `shm_handle.nbytes` for the read size. In `grpc_servicer/smg_grpc_servicer/vllm/servicer.py:87-104`, `_tensor_from_proto()` only computes the dtype/shape-based expected byte count after calling into this helper, so a malformed handle can force an arbitrarily large read/allocation and take down the servicer before the later length check runs.
Suggested fix
-def tensor_payload_bytes_from_shm(shm_handle, shm_dir: str = DEFAULT_SHM_DIR) -> bytes:
+def tensor_payload_bytes_from_shm(
+    shm_handle, expected_nbytes: int, shm_dir: str = DEFAULT_SHM_DIR
+) -> bytes:
     """Read a tensor payload the gateway wrote to ``shm_dir`` for ``shm_handle``."""
     name = validated_shm_name(shm_handle.name)
     path = os.path.join(shm_dir, name)
+    actual_nbytes = int(shm_handle.nbytes)
+    if actual_nbytes != expected_nbytes:
+        raise ValueError(
+            f"shm tensor byte length mismatch for name={shm_handle.name!r}: "
+            f"expected {expected_nbytes}, got {actual_nbytes}"
+        )
     fd = None
     try:
         # O_NOFOLLOW: /dev/shm is world-writable, so a same-host attacker could
         # plant a symlink at the (validated) name pointing at an arbitrary file;
         # refuse to follow it so a crafted handle can't read/unlink outside SHM.
         fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
-        raw = os.pread(fd, int(shm_handle.nbytes), int(shm_handle.offset))
+        raw = os.pread(fd, expected_nbytes, int(shm_handle.offset))
@@
-    if len(raw) != int(shm_handle.nbytes):
+    if len(raw) != expected_nbytes:
         raise ValueError(
             f"shm tensor byte length mismatch for name={shm_handle.name!r}: "
-            f"expected {int(shm_handle.nbytes)}, got {len(raw)}"
+            f"expected {expected_nbytes}, got {len(raw)}"
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@grpc_servicer/smg_grpc_servicer/mm_shm.py` around lines 45 - 68, The shm read helper in tensor_payload_bytes_from_shm still trusts shm_handle.nbytes before os.pread, so reject malformed or oversized handles up front instead of reading first. Add a pre-read size validation in tensor_payload_bytes_from_shm using the expected tensor byte count derived from the handle metadata, and fail fast before os.pread when the requested size is invalid; then keep the existing post-read length check as a safety net. Use the existing tensor_payload_bytes_from_shm and _tensor_from_proto flow to locate the change, and make sure the validation happens before any allocation or read is attempted.
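As a standalone illustration of the two safeguards discussed here (validate the size before reading, and refuse symlinks), the sketch below can be run against a temporary directory instead of `/dev/shm`. `read_shm_payload` and its inline name check are hypothetical; the name check is a simplified stand-in for `validated_shm_name`, whose real rules are not shown on this page. It relies on `os.pread` and `os.O_NOFOLLOW`, so it is Unix-only.

```python
import os
import tempfile


def read_shm_payload(shm_dir: str, name: str, offset: int, nbytes: int, expected_nbytes: int) -> bytes:
    """Read exactly `expected_nbytes` from `shm_dir/name`, refusing symlinks."""
    # Fail fast, before any read or allocation, if the handle disagrees with
    # the size derived from dtype/shape.
    if nbytes != expected_nbytes or expected_nbytes < 0 or offset < 0:
        raise ValueError(
            f"shm handle size mismatch for name={name!r}: "
            f"expected {expected_nbytes}, got {nbytes} (offset {offset})"
        )
    # Simplified stand-in for the real name validator (assumption).
    if not name.startswith("smg-mm-") or os.sep in name:
        raise ValueError(f"unexpected shm name: {name!r}")
    path = os.path.join(shm_dir, name)
    # O_NOFOLLOW makes open() fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        raw = os.pread(fd, expected_nbytes, offset)
    finally:
        os.close(fd)
    # Safety net: a short read means the segment is smaller than claimed.
    if len(raw) != expected_nbytes:
        raise ValueError(f"short read for {name!r}: expected {expected_nbytes}, got {len(raw)}")
    return raw
```

The ordering is the point of the sketch: the size comparison happens before `os.open`, so a handle claiming a huge `nbytes` is rejected without touching the filesystem.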
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/request_execution.rs`:
- Around line 571-577: The decode request in request_execution.rs is clearing
multimodal inputs unconditionally, which breaks cases where relay_kv_params is
false and decode must recompute the prompt locally. Update the logic around
decode_request.clear_mm_inputs() so mm_inputs are only stripped on the remote-KV
path, and keep them intact when local recompute is required. Use the existing
decode path in execute_dual_dispatch and the relay_kv_params gating to either
preserve multimodal tensors or route multimodal n > 1 requests through a safe
inline/reject path before SHM is unlinked.
---
Outside diff comments:
In `@grpc_servicer/smg_grpc_servicer/mm_shm.py`:
- Around line 45-68: The shm read helper in tensor_payload_bytes_from_shm still
trusts shm_handle.nbytes before os.pread, so reject malformed or oversized
handles up front instead of reading first. Add a pre-read size validation in
tensor_payload_bytes_from_shm using the expected tensor byte count derived from
the handle metadata, and fail fast before os.pread when the requested size is
invalid; then keep the existing post-read length check as a safety net. Use the
existing tensor_payload_bytes_from_shm and _tensor_from_proto flow to locate the
change, and make sure the validation happens before any allocation or read is
attempted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 66d9fa8b-3d92-4daa-b61d-40b900bd364c
📒 Files selected for processing (7)
- bindings/python/src/lib.rs
- bindings/python/src/smg/router_args.py
- e2e_test/chat_completions/test_multimodal_shm.py
- grpc_servicer/smg_grpc_servicer/mm_shm.py
- model_gateway/src/routers/grpc/common/stages/request_execution.rs
- model_gateway/src/routers/grpc/multimodal.rs
- model_gateway/src/routers/grpc/proto_wrapper.rs
| // Decode keeps the prefill request_id (load-bearing for NIXL P/D | ||
| // correlation on vLLM < 0.13). Strip multimodal data: the decode worker | ||
| // only needs the transferred KV cache, not the pixel tensors, and reusing | ||
| // an SHM handle would re-read a segment the prefill servicer already | ||
| // unlinked. Mirrors execute_dual_dispatch. | ||
| let mut decode_request = proto_request; | ||
| decode_request.clear_mm_inputs(); |
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Don’t strip multimodal inputs when decode must recompute the prompt.
When relay_kv_params is false, the NIXL path skips KV relay and expects decode to recompute the prompt locally. Clearing mm_inputs unconditionally here leaves multimodal decode requests without the image/video tensors needed for that recompute. Gate this to the remote-KV path, or force a safe inline/reject path for multimodal n > 1 before prefill unlinks SHM handles.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@model_gateway/src/routers/grpc/common/stages/request_execution.rs` around
lines 571 - 577, The decode request in request_execution.rs is clearing
multimodal inputs unconditionally, which breaks cases where relay_kv_params is
false and decode must recompute the prompt locally. Update the logic around
decode_request.clear_mm_inputs() so mm_inputs are only stripped on the remote-KV
path, and keep them intact when local recompute is required. Use the existing
decode path in execute_dual_dispatch and the relay_kv_params gating to either
preserve multimodal tensors or route multimodal n > 1 requests through a safe
inline/reject path before SHM is unlinked.
…e mm
Review follow-up: the previous commit cleared mm_inputs on the sequential-PD decode leg, which breaks the case where decode recomputes the prompt locally (e.g. n>1 NIXL, or when KV relay is skipped); it would lose the pixel tensors.
Instead, force inline multimodal transport for disaggregated (Dual) requests in resolve_mm_shm_enabled and stop stripping mm_inputs on decode. SHM is unsafe for PD regardless of the strip: it is single-consumer (the servicer unlinks each segment after reading) while a PD request has two legs. Inline keeps both the resume-from-KV and recompute paths correct; SHM stays available for single-worker requests.
Addresses coderabbit (recompute) and codex P2 (PD SHM lifecycle).

Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
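The single-consumer argument in this commit message can be sketched in a few lines. Both functions below are hypothetical models, not the gateway's `resolve_mm_shm_enabled`: `shm_enabled` assumes the size threshold applies in every non-inline mode and that `auto` means "use SHM only when co-location was verified", and `consume_segment` imitates a reader that unlinks the segment after reading.

```python
import os
import tempfile


def shm_enabled(mode: str, tensor_nbytes: int, shm_min_bytes: int,
                is_disaggregated: bool, colocated: bool) -> bool:
    """Decide whether a tensor may travel over shared memory (illustrative policy)."""
    if is_disaggregated:
        # Prefill and decode are two consumers; a segment is unlinked after one read.
        return False
    if mode == "inline" or tensor_nbytes < shm_min_bytes:
        return False
    if mode == "shm":
        return True
    if mode == "auto":
        return colocated
    return False


def consume_segment(path: str) -> bytes:
    """Read a segment and unlink it, as a single-consumer reader would."""
    with open(path, "rb") as handle:
        data = handle.read()
    os.unlink(path)
    return data
```

Calling `consume_segment` twice on the same path raises `FileNotFoundError` on the second call, which is the failure a second PD leg would hit if it reused the same handle.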
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c72d3da8d
| GrpcClient::Vllm(_) => { | ||
| ensure_image_only(&precomputed, "vLLM")?; | ||
| Ok(MultimodalData::Vllm(assemble_vllm(precomputed))) | ||
| ensure_image_or_video(&precomputed, "vLLM")?; |
Use video flat sizes before accepting vLLM video
When this starts routing video payloads to vLLM, Qwen3-VL video requests still carry the image flat-layout key: qwen3_vl.rs declares pixel_values as FieldLayout::flat("patches_per_image"), while the video preprocessor emits patches_per_video (qwen_vl_base.rs). The vLLM servicer then rewrites pixel_values to pixel_values_videos but keeps the stale patches_per_image sizes key and aborts with Flat sizes key 'patches_per_image' ... not found, so the newly added vLLM video path fails for the advertised Qwen3-VL video case unless the layout is made modality-specific before enabling video here.
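To make the "modality-specific layout" suggestion concrete, here is a small sketch of choosing the flat sizes key by modality before splitting a flattened tensor. Everything except the two key names (`patches_per_image`, `patches_per_video`) is hypothetical; the real layout types live in the Rust preprocessor and the vLLM servicer, and this only models the lookup.

```python
from typing import Dict, List, Sequence

# Key names come from the review comment; the mapping itself is an assumption.
SIZES_KEY_BY_MODALITY = {"image": "patches_per_image", "video": "patches_per_video"}


def split_flat(values: Sequence[float], fields: Dict[str, List[int]], modality: str) -> List[List[float]]:
    """Split a flat sequence into per-item chunks using the modality's sizes key."""
    key = SIZES_KEY_BY_MODALITY[modality]
    if key not in fields:
        raise KeyError(f"Flat sizes key {key!r} not found")
    sizes = fields[key]
    if sum(sizes) != len(values):
        raise ValueError(f"sizes sum to {sum(sizes)}, but got {len(values)} values")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(list(values[start:start + size]))
        start += size
    return chunks
```

With a video payload that carries only `patches_per_video`, looking up the image key fails in the same way the comment describes, while the modality-aware lookup succeeds.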
Description
Problem
The multimodal tensor transport was TokenSpeed-only and tuned exclusively via `SMG_TOKENSPEED_MM_*` environment variables. There was no first-class (CLI/config) way to select it, no per-worker control, and vLLM workers always received large preprocessed tensors inline over gRPC. vLLM also rejected video inputs entirely, which is exactly the case where avoiding the gRPC copy matters most.
Solution
Make the transport engine-agnostic and configurable, extend the shared-memory (`/dev/shm`) transport to vLLM, and wire up vLLM video. The multimodal subsystem receives resolved values and never depends on `RouterConfig`.
Changes
Transport config
- `--multimodal-tensor-transport` (`inline` | `shm` | `auto`) and `--multimodal-shm-min-bytes` CLI flags + `RouterConfig` fields, plus per-worker `WorkerSpec` overrides (`multimodal_tensor_transport`, `multimodal_shm_min_bytes`).
- … `SMG_MM_*` env → built-in default. Legacy `SMG_TOKENSPEED_MM_*` names are honored as fallback aliases.
vLLM shared memory
- … `ShmHandle` / `RemoteTensorHandle` into `common.proto` and migrate TokenSpeed to them; add the transport `oneof` to vLLM `TensorData` (`inline` stays field 1 for wire compatibility).
- … `/dev/shm` via shared, engine-neutral `mm_shm` I/O (`smg-mm-` prefix, orphan sweep), with build- and send-path cleanup mirroring TokenSpeed.
- … `mm_shm` helper and advertises `shm_namespace_id` (`GetServerInfo`) so `auto` can verify `/dev/shm` co-location.
vLLM video
- … `is_video`) instead of rejecting them; the servicer routes the encoder tensor to `pixel_values_videos` with `video` field configs. (sglang/TRT remain image-only.)
Docs
- `docs/reference/configuration.md` updated for the flags, precedence, per-worker override, and vLLM support.
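The legacy-alias behavior described under "Transport config" can be sketched as a lookup that prefers the new prefix and falls back to the old one. The helper name `read_mm_env` and the `TENSOR_TRANSPORT` suffix used in the example are hypothetical; only the two prefixes come from the description, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# New prefix first, legacy prefix second, so the legacy name acts only as a fallback.
ENV_PREFIXES = ("SMG_MM_", "SMG_TOKENSPEED_MM_")


def read_mm_env(suffix: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty value among the new and legacy variable names."""
    for prefix in ENV_PREFIXES:
        value = environ.get(prefix + suffix)
        if value:  # empty string is treated as unset (assumption)
            return value
    return None
```

For instance, with only `SMG_TOKENSPEED_MM_TENSOR_TRANSPORT=shm` set, `read_mm_env("TENSOR_TRANSPORT")` would return `"shm"`; if both names are set, the `SMG_MM_` value wins. A `None` result is where a caller would fall back to the built-in default.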
Test Plan
Ran locally (macOS):
- `cargo +nightly fmt --all`: clean
- `cargo clippy -p smg -p smg-grpc-client -p openai-protocol --all-targets -- -D warnings`: clean (default features; `--all-features` pulls opencv)
- `cargo test -p smg`: green (the pre-existing `local_shm_namespace_id_resolves_on_linux` is Linux-only and skipped on macOS), `cargo test -p openai-protocol`: green
- `cargo check --workspace --tests`, `cargo build -p smg-python -p smg-golang`: clean
- `mm_shm` unit tests (`/dev/shm` round-trip), servicer `py_compile` + `ruff`, proto regen field assertions
New tests:
- … `into_proto` inline / below-threshold / image vs video modality
- `grpc_servicer/tests/test_mm_shm.py` (shared SHM reader)
- … (`e2e_test/chat_completions/test_multimodal_shm.py`): Qwen3-VL image and video over vLLM gRPC with `--multimodal-tensor-transport shm`, asserting correct output and that `smg_mm_tensors_total{path="shm",runtime="vllm"}` increased (so a silent inline fallback fails the test). Requires a GPU runner; runs in CI.
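The "counter increased" assertion from the e2e test can be approximated with a small parser for the Prometheus text exposition format. This is a sketch, not the test's actual helper: `counter_value` is a hypothetical name, and how the real test scrapes the metrics endpoint is not shown on this page.

```python
import re
from typing import Dict

_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def counter_value(metrics_text: str, name: str, labels: Dict[str, str]) -> float:
    """Sum all samples of `name` whose labels include every pair in `labels`."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group(1) != name:
            continue
        found = dict(_LABEL.findall(match.group(2) or ""))
        if all(found.get(key) == value for key, value in labels.items()):
            total += float(match.group(3))
    return total
```

A test would scrape the metrics text before and after the request, then assert that `counter_value(after, "smg_mm_tensors_total", {"path": "shm", "runtime": "vllm"})` is strictly greater than the same call on the earlier scrape; a series that is absent reads as `0.0`.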
Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets -- -D warnings` passes (default features)
Summary by CodeRabbit
- … `inline`, `shm`, `auto`) plus `--multimodal-shm-min-bytes` to control SHM eligibility.
- … `shm_namespace_id`) and unified SHM lifecycle cleanup behavior.