-
Notifications
You must be signed in to change notification settings - Fork 638
feat: Python binding to download a model. #3593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- `fetch_llm(hf_repo_name) -> path` is a new Python binding that ensures a model is downloaded and available on local disk. Returns the path. Uses modelexpress if the server is running. - Use that in `vllm` worker to download the model before starting vllm, and run vllm off the local copy. This avoids vllm downloading it again. - In Rust expect the model to exist on disk before `LocalModelBuilder::build` is called. - Tidy up some Rust `&str` vs `Path`. Signed-off-by: Graham King <[email protected]>
WalkthroughIntroduces model pre-fetching and disk-based loading across components. Adds a Python-exposed fetch_llm, updates builders to resolve local/remote paths before engine startup, switches to served_model_name in vLLM metrics and endpoints, and refactors model card logic from repo-based to disk-based loaders. Applies conditional model_path assignment in multiple init paths. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as Caller (Python)
participant P as dynamo.llm (Python)
participant R as _core (Rust/PyO3)
participant LM as LocalModel
participant FS as Disk
participant W as vLLM Worker
U->>P: register_llm(model_or_remote)
alt path exists
P->>R: register_llm(path)
R->>FS: check path
else remote name
P->>R: fetch_llm(remote_name)
R->>LM: LocalModel::fetch(remote_name)
LM->>FS: download to cache
LM-->>R: local path
R->>FS: check path
end
R-->>P: resolved local path
P-->>W: start worker with served_model_name + path
W->>FS: load model from disk
W-->>U: ready (health/metrics use served_model_name)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (6)
lib/bindings/python/rust/llm/entrypoint.rs (1)
207-209
: Guarded model_path set: good. Consider canonicalized-name consistency.This aligns with the new builder API. Minor: LocalModelBuilder::model_path sets the default served name from the raw path, while build() later canonicalizes the path. This can yield a non-canonical default name. Consider moving the “default name from path” logic to build() (post-canonicalization) or canonicalizing before calling model_path to keep names consistent.
components/src/dynamo/vllm/main.py (1)
87-94
: Pre-fetch flow is correct; minor UX tweakLogic properly sets served_model_name to the HF repo and swaps model to the local path. Consider logging the resolved local path after fetch for traceability.
lib/llm/src/local_model.rs (2)
89-99
: Defaulting name from non-canonical pathDefaulting the model_name here uses the raw path; build() later canonicalizes, causing potential name/path mismatch. Prefer to remove this side-effect and set a default name post-canonicalization in build().
- /// The path must exist - pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self { - // The served model name defaults to the full model path. - // This matches what vllm and sglang do. - if self.model_name.is_none() { - self.model_name = Some(model_path.display().to_string()); - } - self.model_path = Some(model_path); - self - } + /// The path must exist + pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self { + self.model_path = Some(model_path); + self + }Outside this hunk (in build()), add a fallback to set the name from the canonicalized path when None:
// after loading card and having `model_path` match &self.model_name { Some(model_name) => card.set_name(model_name), None => card.set_name(&model_path.display().to_string()), }
262-266
: Set default name after canonicalizationIf you remove the earlier defaulting in model_path(), set the name here using the canonicalized path when model_name is None to keep identifiers stable.
- if let Some(model_name) = &self.model_name { - card.set_name(model_name); - } + match &self.model_name { + Some(model_name) => card.set_name(model_name), + None => card.set_name(&model_path.display().to_string()), + }lib/llm/src/model_card.rs (1)
770-778
: Error message wording: “repo” → “directory”For clarity now that we’re loading from disk, prefer “directory” in error messages.
- format!( - "unable to extract config.json from repo {}", - directory.display() - ) + format!( + "unable to extract config.json from directory {}", + directory.display() + )- let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context( - || { - format!( - "unable to extract generation_config from repo {}", - directory.display() - ) - }, - )?; + let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context( + || { + format!( + "unable to extract generation_config from directory {}", + directory.display() + ) + }, + )?;- format!( - "unable to extract tokenizer kind from repo {}", - directory.display() - ) + format!( + "unable to extract tokenizer kind from directory {}", + directory.display() + )Also applies to: 782-792, 814-822
launch/dynamo-run/src/lib.rs (1)
28-42
: Download flow looks good; consider a minor optimization and Windows check.
- Use Option::or_else to avoid cloning model_path_flag when model_path_pos is set.
- Verify on Windows that passing a Hugging Face repo id still reaches fetch as “Qwen/Qwen3-0.6B” (not with backslashes) when built from a PathBuf + display(). If not guaranteed, prefer preserving the original CLI string.
- let maybe_remote_repo = flags - .model_path_pos - .clone() - .or(flags.model_path_flag.clone()); + let maybe_remote_repo = flags + .model_path_pos + .clone() + .or_else(|| flags.model_path_flag.clone());
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
components/src/dynamo/vllm/main.py
(6 hunks)launch/dynamo-run/src/lib.rs
(2 hunks)lib/bindings/python/rust/lib.rs
(7 hunks)lib/bindings/python/rust/llm/entrypoint.rs
(1 hunks)lib/bindings/python/src/dynamo/_core.pyi
(1 hunks)lib/bindings/python/src/dynamo/llm/__init__.py
(1 hunks)lib/llm/src/local_model.rs
(5 hunks)lib/llm/src/model_card.rs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
lib/bindings/python/src/dynamo/_core.pyi (1)
lib/bindings/python/rust/lib.rs (1)
fetch_llm
(270-275)
launch/dynamo-run/src/lib.rs (1)
lib/llm/src/local_model.rs (2)
model_path
(91-99)fetch
(331-333)
lib/bindings/python/rust/lib.rs (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
fetch_llm
(895-900)lib/llm/src/local_model.rs (3)
model_path
(91-99)fetch
(331-333)default
(62-85)lib/bindings/python/rust/llm/entrypoint.rs (2)
from
(95-101)to_pyerr
(292-297)
lib/bindings/python/src/dynamo/llm/__init__.py (2)
lib/bindings/python/rust/lib.rs (2)
_core
(97-164)fetch_llm
(270-275)lib/bindings/python/src/dynamo/_core.pyi (1)
fetch_llm
(895-900)
lib/bindings/python/rust/llm/entrypoint.rs (1)
lib/llm/src/local_model.rs (1)
model_path
(91-99)
lib/llm/src/model_card.rs (4)
lib/llm/src/local_model.rs (2)
path
(339-341)display_name
(344-346)lib/llm/src/common/checked_file.rs (2)
path
(71-76)from_disk
(43-57)lib/llm/src/lib.rs (1)
file_json_field
(64-101)lib/runtime/src/slug.rs (1)
from_string
(23-25)
lib/llm/src/local_model.rs (2)
lib/llm/src/model_card.rs (1)
load_from_disk
(439-444)lib/llm/src/hub.rs (1)
from_hf
(19-76)
components/src/dynamo/vllm/main.py (3)
lib/bindings/python/rust/lib.rs (2)
fetch_llm
(270-275)serve_endpoint
(643-696)lib/bindings/python/src/dynamo/_core.pyi (2)
fetch_llm
(895-900)serve_endpoint
(140-152)components/src/dynamo/vllm/handlers.py (1)
clear_kv_blocks
(73-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: trtllm (arm64)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (10)
components/src/dynamo/vllm/main.py (5)
24-24
: Public re-export of fetch_llmImport looks correct and matches the binding.
178-183
: Init logs updated to served_model_nameClearer logs tied to the served identity. LGTM.
222-228
: Metrics labels keyed by served_model_nameGood switch; avoids path leakage in metrics.
267-268
: StatLogger labels set to served_model_nameConsistent with the rest of metrics usage. LGTM.
335-341
: Decode endpoint metrics labelsConsistent labeling; matches prefill path. LGTM.
lib/llm/src/local_model.rs (1)
326-334
: fetch() passthrough to hubSimple, clear async helper. LGTM.
lib/llm/src/model_card.rs (2)
488-494
: Local path validation and delegationfrom_local_path() reads cleanly and enforces directory existence. LGTM.
495-558
: Disk-based MDC constructionSensible defaults, optional artifacts handled gracefully, and context_length inference is robust. LGTM.
lib/bindings/python/src/dynamo/llm/__init__.py (1)
45-45
: Re-export fetch_llmPublic API exposure looks good.
launch/dynamo-run/src/lib.rs (1)
62-66
: LGTM: Guarded model_path application.Applying model_path after builder init and only when present is correct.
Signed-off-by: Graham King <[email protected]>
fetch_llm(hf_repo_name) -> path
is a new Python binding that ensures a model is downloaded and available on local disk. Returns the path. Uses modelexpress if the server is running.Use that in
vllm
worker to download the model before starting vllm, and run vllm off the local copy. This avoids vllm downloading it again.In Rust expect the model to exist on disk before
LocalModelBuilder::build
is called.Tidy up some Rust
&str
vsPath
.Summary by CodeRabbit
New Features
Refactor
Documentation