Skip to content

Commit c9ff8c6

Browse files
authored
feat(server): wire video_url content blocks through chat completion handler
* feat(server): wire video_url content blocks through chat handler Server video requests were silently ignored because `extract_chat_video_paths` sat behind `#[allow(dead_code)]` while the chat-completion handler had no path to feed the helper's output through to the VLM runtime. Wire it up, and add the path-traversal guard that 's security review flagged as latent-HIGH so the wiring lands with the safety check, not after. Path-traversal guard (`src/server/media.rs`): The `file://` and bare-local-path branches now require the resolved canonical path to sit under one of the entries in the `MLXCEL_VIDEO_DIR_ALLOWLIST` env var (comma-separated, canonical directories). Empty/unset is fail-closed — every local-filesystem video reference is rejected until an operator explicitly opts in. The guard also rejects symlinks whose canonical target escapes the allowlist (canonicalize resolves symlinks before the prefix check), non-regular files (directories, FIFOs, devices), and files whose extension is not in the project's video allowlist. The public helper now reads the env var; tests use a sibling `extract_chat_video_paths_with_allowlist` so they don't depend on process-wide state. Handler wiring (`src/server/routes/chat.rs`, `src/server/chat_request.rs`): `prepare_chat_request_with_cache` now resolves video paths alongside images and audio. The video paths flow through a new `videos` field on `ModelRequest::Generate`, threaded through `BatchScheduler::enqueue_request` and into `prepare_request_vlm_embeddings`, where a new `prepare_request_video_embeddings` helper mirrors the CLI's `compute_gemma4_video_embeddings`: probe ffmpeg, decode each video at the per-request FPS (defaulting to `multimodal::video::DEFAULT_FPS`), expand `<|video|>` placeholders into per-frame `<boi> image_token*N <eoi>` runs, and dispatch through the Gemma 4 vision tower. Static media-support detection (`src/server/state.rs`, `src/server/startup.rs`): `AppState` gains a `media_support: ModelMediaSupport` field resolved once at startup from the model directory's `config.json` via `crate::models::get_model_type`. The chat handler short-circuits `video_url` blocks for non-video-capable models with a 400 (`invalid_request_error`) before any allowlist resolution or worker dispatch runs — no token cost, no silent drop. Currently only `Gemma4VLM` flips `video=true`; the detection helper is the single edit point for adding future video-capable variants. Default-deny rationale: the `MLXCEL_VIDEO_DIR_ALLOWLIST` default is intentionally empty so a server stood up without the env var set cannot be tricked into reading `/etc/passwd` (or any other absolute path) by a request body. Operators trade explicit configuration for a guaranteed-zero attack surface — the same posture the upstream image-fetch helper uses for HTTP timeouts. Tests: - `src/server/media_tests.rs` adds six new cases covering the guard: empty allowlist (default state), path outside allowlist, symlink whose target escapes the allowlist (Unix-only), non-regular file with a video extension, happy-path inside the allowlist, and the canonicalization-into-sandbox parent-dir case. The pre-existing `file://`-scheme and bare-local-path tests now construct an explicit sandbox per test rather than relying on process-wide allowlist state. - `src/server/routes/chat.rs` adds three unit tests for `request_has_video_blocks` covering text-only, image+text, and explicit `video_url` payloads. - `src/server/startup_tests.rs` adds three tests for `detect_model_media_support` covering Gemma 4 VLM detection, missing config fallback, and text-only model behavior. `#[allow(dead_code)]` is removed from `extract_chat_video_paths` and its supporting helpers (`resolve_video_url`, `decode_video_data_uri`, `fetch_remote_video`, `infer_video_extension`, `write_video_temp_file`, `sanitize_video_extension`, `MAX_VIDEO_PAYLOAD_SIZE`). Closes * fix(server): close temp-file leak and HTTP buffer DoS in video handler review surfaced four security findings in the video URL handler. This commit closes all four on the same branch. HIGH-1 — Temp-file leak / disk-fill DoS. Every `data:video/...;base64,...` request and HTTP fetch wrote up to 1 GiB into `/tmp` and never cleaned up, because `write_video_temp_file` returned a bare `PathBuf` and no caller deleted it. The resolver now returns `(PathBuf, Option<TempFile>)` where `TempFile` is the existing `crate::multimodal::video::TempFile` Drop guard for caller-owned paths (data URIs and HTTP-fetched files); pre-existing `file://` paths are not wrapped because we don't own them. `PreparedChatRequest` stashes a `Vec<TempFile>` so the guards live for the duration of the request handling — the scheduler still receives only paths, but the guards drop alongside `PreparedChatRequest` once the response is sent, removing every server-owned temp file regardless of which return path we took (success, error, panic). HIGH-2 — HTTP fetch buffered the entire body before the size check. `response.bytes.await` would read the entire response body into memory before we could enforce `MAX_VIDEO_PAYLOAD_SIZE`, allowing a hostile origin to OOM the server by lying about `Content-Length`. The fetch now uses `response.bytes_stream` and accumulates per-chunk into a `Vec<u8>` with a `len + chunk_len > cap` check that fires before extending the buffer; on overflow the partial buffer is dropped and the request fails. The shared HTTP client also gained a 5 s connect timeout and a 5-hop redirect cap on top of the existing 10 s total timeout. New `reqwest` feature flag `stream` enables `bytes_stream`. MEDIUM-1 — Blocking syscalls on the Tokio runtime. `std::fs::canonicalize` and `std::fs::metadata` ran on Tokio worker threads, where a slow disk or NFS mount could stall the executor. Both calls in `resolve_local_video_path` now use `tokio::fs::*` and the function is `async`. `extract_chat_video_paths_with_allowlist` is propagated as `async` to all callers and tests. MEDIUM-2 — TOCTOU race + writable allowlist warning. The path-based resolver returns a path that ffmpeg then re-opens, so an attacker with write access inside an allowlisted directory can swap the file for a symlink between canonicalization and ffmpeg's open call. The full FD-passing fix is out of scope for this PR (tracked). For this PR a startup-time helper `scan_insecure_allowlist_dirs` walks every directory in `MLXCEL_VIDEO_DIR_ALLOWLIST`, checks Unix permissions via `PermissionsExt::mode`, and emits a `tracing::warn!` for any entry with group/world write bits set. Cfg-gated to `#[cfg(unix)]` and unit-tested with both world-writable and strict (0750) directories. The doc comment on `extract_chat_video_paths_with_allowlist` and the `VIDEO_DIR_ALLOWLIST_ENV` constant call out the residual TOCTOU window so future readers know the mitigation is operational, not technical. Tests: - `extract_chat_video_paths_async_canonicalize_works` exercises the async refactor with a parent-dir traversal that forces real canonicalize work. - `fetch_remote_video_streaming_rejects_oversized` starts an in-process HTTP server that advertises a Content-Length above the cap and confirms the streaming fetch returns no path and no temp guard. - `chat_request_drops_temp_files_on_completion` issues a `data:video/mp4;base64,...` request, asserts the temp file exists during processing, drops `PreparedChatRequest`, and asserts the temp file is gone. - `scan_insecure_allowlist_dirs_flags_world_writable_directory` / `_passes_strict_directory` cover the writability scan in both polarities. Mirrored as `startup_warns_on_world_writable_allowlist_dir` / `startup_passes_strict_allowlist_dir` in `startup_tests.rs`. Validated: - `cargo build --release --features cuda` clean. - `cargo test --features cuda --lib -- server::media server::routes::chat server::startup multimodal::video server::chat_request` 143 passed / 1 ignored. - `cargo clippy --features cuda --lib --tests -- -D warnings` clean (incidental `cloned_ref_to_slice_refs` lint hits in pre-existing tests cleaned up alongside the new ones). - `cargo fmt --all` clean. Refs:,.
1 parent 19c064d commit c9ff8c6

13 files changed

Lines changed: 1352 additions & 110 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ tower-http = { version = "0.5", features = ["cors", "trace"] }
100100
tower = { version = "0.4", features = ["util"] }
101101
hyper = { version = "1.1", features = ["server"] }
102102
hyper-util = { version = "0.1", features = ["tokio", "server-auto"] }
103-
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
103+
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls", "stream"] }
104104
async-stream = "0.3"
105105
fancy-regex = "0.17.0"
106106
toml = "0.8"

src/server/batch/scheduler.rs

Lines changed: 22 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1023,10 +1023,19 @@ impl BatchScheduler {
10231023
options,
10241024
images,
10251025
audio,
1026+
videos,
10261027
response_tx,
10271028
cancelled,
10281029
} => {
1029-
self.enqueue_request(prompt, options, images, audio, response_tx, cancelled);
1030+
self.enqueue_request(
1031+
prompt,
1032+
options,
1033+
images,
1034+
audio,
1035+
videos,
1036+
response_tx,
1037+
cancelled,
1038+
);
10301039
false
10311040
}
10321041
ModelRequest::Shutdown => {
@@ -1042,6 +1051,7 @@ impl BatchScheduler {
10421051
options: ServerGenerateOptions,
10431052
images: Vec<Vec<u8>>,
10441053
audio: Vec<Vec<u8>>,
1054+
videos: Vec<(std::path::PathBuf, Option<f64>)>,
10451055
response_tx: mpsc::Sender<GenerateEvent>,
10461056
cancelled: Arc<AtomicBool>,
10471057
) {
@@ -1067,9 +1077,9 @@ impl BatchScheduler {
10671077
// that refuses to pad/concatenate a cache with no tensors. VLM
10681078
// requests may legitimately start with an empty token list (image
10691079
// tokens are injected later by `prepare_request_vlm_embeddings`),
1070-
// so this guard only applies to pure-text requests without images
1071-
// or audio.
1072-
if prompt_tokens.is_empty() && images.is_empty() && audio.is_empty() {
1080+
// so this guard only applies to pure-text requests without images,
1081+
// audio, or videos.
1082+
if prompt_tokens.is_empty() && images.is_empty() && audio.is_empty() && videos.is_empty() {
10731083
let _ = response_tx.send(GenerateEvent::Error(
10741084
"Empty prompt: request has no input tokens to process".to_string(),
10751085
));
@@ -1096,13 +1106,13 @@ impl BatchScheduler {
10961106
// feature-disabled, no ctx, and race paths), fall through to the
10971107
// cold-allocation path below.
10981108
//
1099-
// VLM / audio requests opt out of the cache path entirely: their
1100-
// pre-injection token stream is not self-describing (image token
1101-
// placeholders expand later inside `prepare_request_vlm_embeddings`),
1102-
// so matching against it risks reusing a KV slice built for a
1103-
// different media payload. Support for image-aware cache keys is
1104-
// tracked separately in issue #425.
1105-
let is_multimodal = !images.is_empty() || !audio.is_empty();
1109+
// VLM / audio / video requests opt out of the cache path entirely:
1110+
// their pre-injection token stream is not self-describing (image
1111+
// and video frame placeholders expand later inside
1112+
// `prepare_request_vlm_embeddings`), so matching against it risks
1113+
// reusing a KV slice built for a different media payload. Support
1114+
// for image-aware cache keys is tracked separately in issue #425.
1115+
let is_multimodal = !images.is_empty() || !audio.is_empty() || !videos.is_empty();
11061116
let ctx_ref = if is_multimodal {
11071117
None
11081118
} else {
@@ -1139,6 +1149,7 @@ impl BatchScheduler {
11391149
&mut prompt_tokens,
11401150
&images,
11411151
&audio,
1152+
&videos,
11421153
Some(self.vision_caches.as_ref()),
11431154
) {
11441155
Ok(emb) => emb,

src/server/chat_request.rs

Lines changed: 32 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,7 @@ use super::chat_template_kwargs::{
6565
ChatTemplateKwargs, extract_request_kwargs, merge_server_and_request, strip_rolling_checkpoint,
6666
strip_think_block,
6767
};
68-
use super::media::{extract_chat_audio_data, extract_chat_image_data};
68+
use super::media::{extract_chat_audio_data, extract_chat_image_data, extract_chat_video_paths};
6969
use super::prompt_cache::key::resolve_session_key;
7070
use super::types::ChatCompletionRequest;
7171
use super::types::request::Tool;
@@ -74,6 +74,33 @@ pub(crate) struct PreparedChatRequest {
7474
pub(crate) prompt: String,
7575
pub(crate) image_data: Vec<Vec<u8>>,
7676
pub(crate) audio_data: Vec<Vec<u8>>,
77+
/// Resolved video paths (issue #596). Each entry has been canonicalised
78+
/// and validated against `MLXCEL_VIDEO_DIR_ALLOWLIST`; the paired
79+
/// `Option<f64>` is the per-video sampling rate override from
80+
/// [`crate::server::types::request::VideoUrl::fps`].
81+
pub(crate) video_paths: Vec<(std::path::PathBuf, Option<f64>)>,
82+
/// RAII drop guards for every server-owned temp file backing
83+
/// `video_paths` (PR #600 review fix for the temp-file leak).
84+
///
85+
/// We keep these here rather than return them from
86+
/// [`super::media::extract_chat_video_paths`] so the lifetime of the
87+
/// guard equals the lifetime of the response handler: as soon as the
88+
/// last consumer of `PreparedChatRequest` drops it, the temp files
89+
/// vanish from `/tmp` regardless of which return path we took
90+
/// (success, early error, panic).
91+
///
92+
/// The scheduler still receives only `Vec<(PathBuf, Option<f64>)>` —
93+
/// paths are forwarded by value to the worker thread, but ownership of
94+
/// the temp file (i.e., the responsibility for deletion) stays inside
95+
/// `PreparedChatRequest`. ffmpeg reads the file by path during prefill;
96+
/// once that finishes the worker has no further reference to disk and
97+
/// dropping the guards in the HTTP handler is safe.
98+
///
99+
/// Read solely for its `Drop` impl; `dead_code` suppression here is
100+
/// intentional — every other access path would defeat the whole point
101+
/// of the guard.
102+
#[allow(dead_code)]
103+
pub(crate) video_temp_guards: Vec<crate::multimodal::video::TempFile>,
77104
}
78105

79106
/// Dedup set for the "defaulted `preserve_thinking=true`" info log.
@@ -183,15 +210,18 @@ pub(crate) async fn prepare_chat_request_with_cache(
183210
})
184211
};
185212

186-
let (image_data, audio_data) = tokio::join!(
213+
let (image_data, audio_data, (video_paths, video_temp_guards)) = tokio::join!(
187214
extract_chat_image_data(request),
188215
extract_chat_audio_data(request),
216+
extract_chat_video_paths(request),
189217
);
190218

191219
PreparedChatRequest {
192220
prompt,
193221
image_data,
194222
audio_data,
223+
video_paths,
224+
video_temp_guards,
195225
}
196226
}
197227

src/server/chat_request_tests.rs

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1215,3 +1215,84 @@ fn tool_call_message_history_round_trips_and_digests() {
12151215
"tool reorder must change the digest"
12161216
);
12171217
}
1218+
1219+
// ---------------------------------------------------------------------------
1220+
// PR #600 review fix (HIGH-1): temp-file lifetime tied to PreparedChatRequest
1221+
// ---------------------------------------------------------------------------
1222+
1223+
/// A `data:video/...;base64,...` URL produces a server-owned temp file that
1224+
/// must exist while `PreparedChatRequest` is alive and disappear once it
1225+
/// drops. The previous wiring leaked the file because nothing held a Drop
1226+
/// guard — every request added up to 1 GiB of `/tmp` debris.
1227+
///
1228+
/// This test does not require ffmpeg or a real video; it only checks the
1229+
/// resolver-to-guard wiring. The cap-checking and ffmpeg path are exercised
1230+
/// elsewhere.
1231+
#[tokio::test]
1232+
async fn chat_request_drops_temp_files_on_completion() {
1233+
use base64::Engine;
1234+
1235+
// Tiny payload — base64 of "hi" — is enough to trigger the temp-file
1236+
// write path without straining CI.
1237+
let payload = base64::engine::general_purpose::STANDARD.encode(b"hi");
1238+
let data_url = format!("data:video/mp4;base64,{payload}");
1239+
1240+
let request = ChatCompletionRequest {
1241+
model: "test-model".to_string(),
1242+
messages: vec![Message {
1243+
role: Role::User,
1244+
content: MessageContent::Parts(vec![ContentPart::VideoUrl {
1245+
video_url: crate::server::types::request::VideoUrl {
1246+
url: data_url,
1247+
fps: None,
1248+
},
1249+
}]),
1250+
name: None,
1251+
tool_call_id: None,
1252+
tool_calls: None,
1253+
}],
1254+
stream: false,
1255+
stream_options: None,
1256+
logprobs: None,
1257+
top_logprobs: None,
1258+
tools: None,
1259+
tool_choice: None,
1260+
parallel_tool_calls: None,
1261+
chat_template_kwargs: None,
1262+
extra_body: None,
1263+
prompt_cache_key: None,
1264+
user: None,
1265+
extra_body_fields: serde_json::Map::new(),
1266+
response_format: None,
1267+
params: SamplingParams::default(),
1268+
};
1269+
1270+
// Render with a no-op template — we only care about the media plumbing.
1271+
let processor = ChatTemplateProcessor::with_template(
1272+
"{% for m in messages %}{{ m.content }}{% endfor %}".to_string(),
1273+
);
1274+
let prepared = prepare_chat_request(&processor, &request, None).await;
1275+
1276+
// Resolved: exactly one temp path, exactly one guard.
1277+
assert_eq!(prepared.video_paths.len(), 1, "data:video URL must resolve");
1278+
assert_eq!(
1279+
prepared.video_temp_guards.len(),
1280+
1,
1281+
"data:video URL must yield one Drop guard"
1282+
);
1283+
let temp_path = prepared.video_paths[0].0.clone();
1284+
assert!(
1285+
temp_path.exists(),
1286+
"temp file must exist while PreparedChatRequest is alive"
1287+
);
1288+
1289+
// Drop the prepared struct; the guard's Drop impl should remove the file.
1290+
drop(prepared);
1291+
1292+
// Drop is synchronous and the file removal happens inside Drop, so the
1293+
// file must be gone by the time we reach this line.
1294+
assert!(
1295+
!temp_path.exists(),
1296+
"temp file must be removed once PreparedChatRequest drops; remained at {temp_path:?}"
1297+
);
1298+
}

0 commit comments

Comments
 (0)