feat: Python binding to download a model. #3593

grahamking · 2025-10-13T21:06:35Z

fetch_llm(hf_repo_name) -> path is a new Python binding that ensures a model is downloaded and available on local disk. Returns the path. Uses modelexpress if the server is running.
Use that in vllm worker to download the model before starting vllm, and run vllm off the local copy. This avoids vllm downloading it again.
In Rust expect the model to exist on disk before LocalModelBuilder::build is called.
Tidy up some Rust &str vs Path.

Summary by CodeRabbit

New Features
- Automatically downloads and prepares models before startup, ensuring they’re available whether provided as local paths or remote names.
- Adds a Python API to fetch a model and return its local path.
Refactor
- Standardizes reporting to use the served model name across metrics and endpoints for clearer observability.
- Streamlines local path resolution and model loading, reducing setup friction and improving reliability.
Documentation
- Documents the new model fetch API with example usage and clarifies local model loading behavior.

- `fetch_llm(hf_repo_name) -> path` is a new Python binding that ensures a model is downloaded and available on local disk. Returns the path. Uses modelexpress if the server is running. - Use that in `vllm` worker to download the model before starting vllm, and run vllm off the local copy. This avoids vllm downloading it again. - In Rust expect the model to exist on disk before `LocalModelBuilder::build` is called. - Tidy up some Rust `&str` vs `Path`. Signed-off-by: Graham King <[email protected]>

coderabbitai · 2025-10-13T21:14:21Z

Walkthrough

Introduces model pre-fetching and disk-based loading across components. Adds a Python-exposed fetch_llm, updates builders to resolve local/remote paths before engine startup, switches to served_model_name in vLLM metrics and endpoints, and refactors model card logic from repo-based to disk-based loaders. Applies conditional model_path assignment in multiple init paths.

Changes

Cohort / File(s)	Summary
vLLM worker init and labeling `components/src/dynamo/vllm/main.py`	Pre-download model via fetch_llm before vLLM start; set served_model_name if unset; replace config.model usages with served_model_name in logs, metrics, health, and endpoint labels; adjust prefill/decode serve_endpoint configs; remove unconditional init log.
Runner init & builder path handling `launch/dynamo-run/src/lib.rs`, `lib/bindings/python/rust/llm/entrypoint.rs`	Resolve model_path early: use local path or fetch remote via LocalModel::fetch; set builder.model_path only when resolved; move model_path assignment after builder creation with guards; note that only worker holds model path.
Python bindings & API `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/src/dynamo/_core.pyi`, `lib/bindings/python/src/dynamo/llm/__init__.py`	Add LocalModel import; expose new pyfunction fetch_llm(remote_name) and register in _core; update register_llm to resolve path or fetch; add async typing stub fetch_llm(remote_name: str) -> str; re-export fetch_llm in dynamo.llm.
Local model builder & fetch `lib/llm/src/local_model.rs`	Change LocalModelBuilder::model_path to require PathBuf and auto-set served name if absent; preserve model_name only when Some; canonicalize and require on-disk existence with clearer errors; set LocalModel::full_path from resolved path; add LocalModel::fetch(remote_name, ignore_weights) -> PathBuf.
Model card disk-based refactor `lib/llm/src/model_card.rs`	Replace repo-based loaders with disk-based variants: from_repo→from_disk across ModelInfoType, GenerationConfig, TokenizerKind, PromptFormatterArtifact; add ModelDeploymentCard::from_repo_checkout; derive display_name/slug from local path; update error contexts and path-based context_length.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as Caller (Python)
  participant P as dynamo.llm (Python)
  participant R as _core (Rust/PyO3)
  participant LM as LocalModel
  participant FS as Disk
  participant W as vLLM Worker

  U->>P: register_llm(model_or_remote)
  alt path exists
    P->>R: register_llm(path)
    R->>FS: check path
  else remote name
    P->>R: fetch_llm(remote_name)
    R->>LM: LocalModel::fetch(remote_name)
    LM->>FS: download to cache
    LM-->>R: local path
    R->>FS: check path
  end
  R-->>P: resolved local path
  P-->>W: start worker with served_model_name + path
  W->>FS: load model from disk
  W-->>U: ready (health/metrics use served_model_name)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hopped through paths, from cloud to cache,
A model burrow, fetched in a flash.
Served name set, the gauges sing,
Disk-born cards now take the wing.
With tidy trails and queues aligned—
Thump-thump: deploy, and peace of mind. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description does not follow the repository’s required template, as it lacks the prescribed Overview, Details, Where should the reviewer start, and Related Issues sections and instead provides only a bulleted summary of changes.	Please restructure the description to match the template by adding an Overview that describes the PR purpose, a Details section listing the changes, a Where should the reviewer start section pointing to key files, and a Related Issues section referencing any issue numbers.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly highlights the primary change of adding a Python binding to download a model and is concise and directly related to the core feature introduced in this pull request.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (6)

lib/bindings/python/rust/llm/entrypoint.rs (1)

207-209: Guarded model_path set: good. Consider canonicalized-name consistency.

This aligns with the new builder API. Minor: LocalModelBuilder::model_path sets the default served name from the raw path, while build() later canonicalizes the path. This can yield a non-canonical default name. Consider moving the “default name from path” logic to build() (post-canonicalization) or canonicalizing before calling model_path to keep names consistent.

components/src/dynamo/vllm/main.py (1)

87-94: Pre-fetch flow is correct; minor UX tweak

Logic properly sets served_model_name to the HF repo and swaps model to the local path. Consider logging the resolved local path after fetch for traceability.
lib/llm/src/local_model.rs (2)
89-99: Defaulting name from non-canonical path

Defaulting the model_name here uses the raw path; build() later canonicalizes, causing potential name/path mismatch. Prefer to remove this side-effect and set a default name post-canonicalization in build().
-    /// The path must exist
-    pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self {
-        // The served model name defaults to the full model path.
-        // This matches what vllm and sglang do.
-        if self.model_name.is_none() {
-            self.model_name = Some(model_path.display().to_string());
-        }
-        self.model_path = Some(model_path);
-        self
-    }
+    /// The path must exist
+    pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self {
+        self.model_path = Some(model_path);
+        self
+    }
Outside this hunk (in build()), add a fallback to set the name from the canonicalized path when None:
// after loading card and having `model_path`
match &self.model_name {
    Some(model_name) => card.set_name(model_name),
    None => card.set_name(&model_path.display().to_string()),
}
262-266: Set default name after canonicalization

If you remove the earlier defaulting in model_path(), set the name here using the canonicalized path when model_name is None to keep identifiers stable.
-        if let Some(model_name) = &self.model_name {
-            card.set_name(model_name);
-        }
+        match &self.model_name {
+            Some(model_name) => card.set_name(model_name),
+            None => card.set_name(&model_path.display().to_string()),
+        }
lib/llm/src/model_card.rs (1)
770-778: Error message wording: “repo” → “directory”

For clarity now that we’re loading from disk, prefer “directory” in error messages.
-            format!(
-                "unable to extract config.json from repo {}",
-                directory.display()
-            )
+            format!(
+                "unable to extract config.json from directory {}",
+                directory.display()
+            )
-        let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context(
-            || {
-                format!(
-                    "unable to extract generation_config from repo {}",
-                    directory.display()
-                )
-            },
-        )?;
+        let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context(
+            || {
+                format!(
+                    "unable to extract generation_config from directory {}",
+                    directory.display()
+                )
+            },
+        )?;
-            format!(
-                "unable to extract tokenizer kind from repo {}",
-                directory.display()
-            )
+            format!(
+                "unable to extract tokenizer kind from directory {}",
+                directory.display()
+            )
Also applies to: 782-792, 814-822
launch/dynamo-run/src/lib.rs (1)
28-42: Download flow looks good; consider a minor optimization and Windows check.

Use Option::or_else to avoid cloning model_path_flag when model_path_pos is set.

Verify on Windows that passing a Hugging Face repo id still reaches fetch as “Qwen/Qwen3-0.6B” (not with backslashes) when built from a PathBuf + display(). If not guaranteed, prefer preserving the original CLI string.
-    let maybe_remote_repo = flags
-        .model_path_pos
-        .clone()
-        .or(flags.model_path_flag.clone());
+    let maybe_remote_repo = flags
+        .model_path_pos
+        .clone()
+        .or_else(|| flags.model_path_flag.clone());

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4ff6f0 and 2a00fa2.

📒 Files selected for processing (8)

components/src/dynamo/vllm/main.py (6 hunks)
launch/dynamo-run/src/lib.rs (2 hunks)
lib/bindings/python/rust/lib.rs (7 hunks)
lib/bindings/python/rust/llm/entrypoint.rs (1 hunks)
lib/bindings/python/src/dynamo/_core.pyi (1 hunks)
lib/bindings/python/src/dynamo/llm/__init__.py (1 hunks)
lib/llm/src/local_model.rs (5 hunks)
lib/llm/src/model_card.rs (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (8)

lib/bindings/python/src/dynamo/_core.pyi (1)

lib/bindings/python/rust/lib.rs (1)

fetch_llm (270-275)

launch/dynamo-run/src/lib.rs (1)

lib/llm/src/local_model.rs (2)

model_path (91-99)

fetch (331-333)

lib/bindings/python/rust/lib.rs (3)

lib/bindings/python/src/dynamo/_core.pyi (1)

fetch_llm (895-900)

lib/llm/src/local_model.rs (3)

model_path (91-99)

fetch (331-333)

default (62-85)

lib/bindings/python/rust/llm/entrypoint.rs (2)

from (95-101)

to_pyerr (292-297)

lib/bindings/python/src/dynamo/llm/__init__.py (2)

lib/bindings/python/rust/lib.rs (2)

_core (97-164)

fetch_llm (270-275)

lib/bindings/python/src/dynamo/_core.pyi (1)

fetch_llm (895-900)

lib/bindings/python/rust/llm/entrypoint.rs (1)

lib/llm/src/local_model.rs (1)

model_path (91-99)

lib/llm/src/model_card.rs (4)

lib/llm/src/local_model.rs (2)

path (339-341)

display_name (344-346)

lib/llm/src/common/checked_file.rs (2)

path (71-76)

from_disk (43-57)

lib/llm/src/lib.rs (1)

file_json_field (64-101)

lib/runtime/src/slug.rs (1)

from_string (23-25)

lib/llm/src/local_model.rs (2)

lib/llm/src/model_card.rs (1)

load_from_disk (439-444)

lib/llm/src/hub.rs (1)

from_hf (19-76)

components/src/dynamo/vllm/main.py (3)

lib/bindings/python/rust/lib.rs (2)

fetch_llm (270-275)

serve_endpoint (643-696)

lib/bindings/python/src/dynamo/_core.pyi (2)

fetch_llm (895-900)

serve_endpoint (140-152)

components/src/dynamo/vllm/handlers.py (1)

clear_kv_blocks (73-78)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: trtllm (arm64)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: clippy (.)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: tests (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (10)

components/src/dynamo/vllm/main.py (5)

24-24: Public re-export of fetch_llm

Import looks correct and matches the binding.

178-183: Init logs updated to served_model_name

Clearer logs tied to the served identity. LGTM.

222-228: Metrics labels keyed by served_model_name

Good switch; avoids path leakage in metrics.

267-268: StatLogger labels set to served_model_name

Consistent with the rest of metrics usage. LGTM.

335-341: Decode endpoint metrics labels

Consistent labeling; matches prefill path. LGTM.

lib/llm/src/local_model.rs (1)

326-334: fetch() passthrough to hub

Simple, clear async helper. LGTM.

lib/llm/src/model_card.rs (2)

488-494: Local path validation and delegation

from_local_path() reads cleanly and enforces directory existence. LGTM.

495-558: Disk-based MDC construction

Sensible defaults, optional artifacts handled gracefully, and context_length inference is robust. LGTM.

lib/bindings/python/src/dynamo/llm/__init__.py (1)

45-45: Re-export fetch_llm

Public API exposure looks good.

launch/dynamo-run/src/lib.rs (1)

62-66: LGTM: Guarded model_path application.

Applying model_path after builder init and only when present is correct.

lib/bindings/python/rust/lib.rs

lib/bindings/python/src/dynamo/_core.pyi

lib/llm/src/local_model.rs

Signed-off-by: Graham King <[email protected]>

grahamking requested review from a team as code owners October 13, 2025 21:06

pull-request-size bot added the size/L label Oct 13, 2025

github-actions bot added the feat label Oct 13, 2025

coderabbitai bot reviewed Oct 13, 2025

View reviewed changes

lib/bindings/python/rust/lib.rs Show resolved Hide resolved

lib/bindings/python/rust/lib.rs Show resolved Hide resolved

lib/bindings/python/src/dynamo/_core.pyi Show resolved Hide resolved

lib/llm/src/local_model.rs Outdated Show resolved Hide resolved

Thanks Code Rabbit

b08d1fe

Signed-off-by: Graham King <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 13, 2025 22:01 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 13, 2025 22:02 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Python binding to download a model. #3593

feat: Python binding to download a model. #3593

Uh oh!

grahamking commented Oct 13, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 13, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Python binding to download a model. #3593

Are you sure you want to change the base?

feat: Python binding to download a model. #3593

Uh oh!

Conversation

grahamking commented Oct 13, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 13, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

grahamking commented Oct 13, 2025 •

edited by coderabbitai bot

Loading