Skip to content

Conversation

grahamking
Copy link
Contributor

@grahamking grahamking commented Oct 13, 2025

  • fetch_llm(hf_repo_name) -> path is a new Python binding that ensures a model is downloaded and available on local disk. Returns the path. Uses modelexpress if the server is running.

  • Use that in vllm worker to download the model before starting vllm, and run vllm off the local copy. This avoids vllm downloading it again.

  • In Rust expect the model to exist on disk before LocalModelBuilder::build is called.

  • Tidy up some Rust &str vs Path.

Summary by CodeRabbit

  • New Features

    • Automatically downloads and prepares models before startup, ensuring they’re available whether provided as local paths or remote names.
    • Adds a Python API to fetch a model and return its local path.
  • Refactor

    • Standardizes reporting to use the served model name across metrics and endpoints for clearer observability.
    • Streamlines local path resolution and model loading, reducing setup friction and improving reliability.
  • Documentation

    • Documents the new model fetch API with example usage and clarifies local model loading behavior.

- `fetch_llm(hf_repo_name) -> path` is a new Python binding that ensures
  a model is downloaded and available on local disk. Returns the path.
  Uses modelexpress if the server is running.

- Use that in `vllm` worker to download the model before starting vllm,
  and run vllm off the local copy. This avoids vllm downloading it
  again.

- In Rust expect the model to exist on disk before
  `LocalModelBuilder::build` is called.

- Tidy up some Rust `&str` vs `Path`.

Signed-off-by: Graham King <[email protected]>
@grahamking grahamking requested review from a team as code owners October 13, 2025 21:06
@github-actions github-actions bot added the feat label Oct 13, 2025
Copy link
Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

Introduces model pre-fetching and disk-based loading across components. Adds a Python-exposed fetch_llm, updates builders to resolve local/remote paths before engine startup, switches to served_model_name in vLLM metrics and endpoints, and refactors model card logic from repo-based to disk-based loaders. Applies conditional model_path assignment in multiple init paths.

Changes

Cohort / File(s) Summary
vLLM worker init and labeling
components/src/dynamo/vllm/main.py
Pre-download model via fetch_llm before vLLM start; set served_model_name if unset; replace config.model usages with served_model_name in logs, metrics, health, and endpoint labels; adjust prefill/decode serve_endpoint configs; remove unconditional init log.
Runner init & builder path handling
launch/dynamo-run/src/lib.rs, lib/bindings/python/rust/llm/entrypoint.rs
Resolve model_path early: use local path or fetch remote via LocalModel::fetch; set builder.model_path only when resolved; move model_path assignment after builder creation with guards; note that only worker holds model path.
Python bindings & API
lib/bindings/python/rust/lib.rs, lib/bindings/python/src/dynamo/_core.pyi, lib/bindings/python/src/dynamo/llm/__init__.py
Add LocalModel import; expose new pyfunction fetch_llm(remote_name) and register in _core; update register_llm to resolve path or fetch; add async typing stub fetch_llm(remote_name: str) -> str; re-export fetch_llm in dynamo.llm.
Local model builder & fetch
lib/llm/src/local_model.rs
Change LocalModelBuilder::model_path to require PathBuf and auto-set served name if absent; preserve model_name only when Some; canonicalize and require on-disk existence with clearer errors; set LocalModel::full_path from resolved path; add LocalModel::fetch(remote_name, ignore_weights) -> PathBuf.
Model card disk-based refactor
lib/llm/src/model_card.rs
Replace repo-based loaders with disk-based variants: from_repo→from_disk across ModelInfoType, GenerationConfig, TokenizerKind, PromptFormatterArtifact; add ModelDeploymentCard::from_repo_checkout; derive display_name/slug from local path; update error contexts and path-based context_length.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as Caller (Python)
  participant P as dynamo.llm (Python)
  participant R as _core (Rust/PyO3)
  participant LM as LocalModel
  participant FS as Disk
  participant W as vLLM Worker

  U->>P: register_llm(model_or_remote)
  alt path exists
    P->>R: register_llm(path)
    R->>FS: check path
  else remote name
    P->>R: fetch_llm(remote_name)
    R->>LM: LocalModel::fetch(remote_name)
    LM->>FS: download to cache
    LM-->>R: local path
    R->>FS: check path
  end
  R-->>P: resolved local path
  P-->>W: start worker with served_model_name + path
  W->>FS: load model from disk
  W-->>U: ready (health/metrics use served_model_name)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hopped through paths, from cloud to cache,
A model burrow, fetched in a flash.
Served name set, the gauges sing,
Disk-born cards now take the wing.
With tidy trails and queues aligned—
Thump-thump: deploy, and peace of mind. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description does not follow the repository’s required template, as it lacks the prescribed Overview, Details, Where should the reviewer start, and Related Issues sections and instead provides only a bulleted summary of changes. Please restructure the description to match the template by adding an Overview that describes the PR purpose, a Details section listing the changes, a Where should the reviewer start section pointing to key files, and a Related Issues section referencing any issue numbers.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly highlights the primary change of adding a Python binding to download a model and is concise and directly related to the core feature introduced in this pull request.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
lib/bindings/python/rust/llm/entrypoint.rs (1)

207-209: Guarded model_path set: good. Consider canonicalized-name consistency.

This aligns with the new builder API. Minor: LocalModelBuilder::model_path sets the default served name from the raw path, while build() later canonicalizes the path. This can yield a non-canonical default name. Consider moving the “default name from path” logic to build() (post-canonicalization) or canonicalizing before calling model_path to keep names consistent.

components/src/dynamo/vllm/main.py (1)

87-94: Pre-fetch flow is correct; minor UX tweak

Logic properly sets served_model_name to the HF repo and swaps model to the local path. Consider logging the resolved local path after fetch for traceability.

lib/llm/src/local_model.rs (2)

89-99: Defaulting name from non-canonical path

Defaulting the model_name here uses the raw path; build() later canonicalizes, causing potential name/path mismatch. Prefer to remove this side-effect and set a default name post-canonicalization in build().

-    /// The path must exist
-    pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self {
-        // The served model name defaults to the full model path.
-        // This matches what vllm and sglang do.
-        if self.model_name.is_none() {
-            self.model_name = Some(model_path.display().to_string());
-        }
-        self.model_path = Some(model_path);
-        self
-    }
+    /// The path must exist
+    pub fn model_path(&mut self, model_path: PathBuf) -> &mut Self {
+        self.model_path = Some(model_path);
+        self
+    }

Outside this hunk (in build()), add a fallback to set the name from the canonicalized path when None:

// after loading card and having `model_path`
match &self.model_name {
    Some(model_name) => card.set_name(model_name),
    None => card.set_name(&model_path.display().to_string()),
}

262-266: Set default name after canonicalization

If you remove the earlier defaulting in model_path(), set the name here using the canonicalized path when model_name is None to keep identifiers stable.

-        if let Some(model_name) = &self.model_name {
-            card.set_name(model_name);
-        }
+        match &self.model_name {
+            Some(model_name) => card.set_name(model_name),
+            None => card.set_name(&model_path.display().to_string()),
+        }
lib/llm/src/model_card.rs (1)

770-778: Error message wording: “repo” → “directory”

For clarity now that we’re loading from disk, prefer “directory” in error messages.

-            format!(
-                "unable to extract config.json from repo {}",
-                directory.display()
-            )
+            format!(
+                "unable to extract config.json from directory {}",
+                directory.display()
+            )
-        let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context(
-            || {
-                format!(
-                    "unable to extract generation_config from repo {}",
-                    directory.display()
-                )
-            },
-        )?;
+        let f = CheckedFile::from_disk(directory.join("generation_config.json")).with_context(
+            || {
+                format!(
+                    "unable to extract generation_config from directory {}",
+                    directory.display()
+                )
+            },
+        )?;
-            format!(
-                "unable to extract tokenizer kind from repo {}",
-                directory.display()
-            )
+            format!(
+                "unable to extract tokenizer kind from directory {}",
+                directory.display()
+            )

Also applies to: 782-792, 814-822

launch/dynamo-run/src/lib.rs (1)

28-42: Download flow looks good; consider a minor optimization and Windows check.

  • Use Option::or_else to avoid cloning model_path_flag when model_path_pos is set.
  • Verify on Windows that passing a Hugging Face repo id still reaches fetch as “Qwen/Qwen3-0.6B” (not with backslashes) when built from a PathBuf + display(). If not guaranteed, prefer preserving the original CLI string.
-    let maybe_remote_repo = flags
-        .model_path_pos
-        .clone()
-        .or(flags.model_path_flag.clone());
+    let maybe_remote_repo = flags
+        .model_path_pos
+        .clone()
+        .or_else(|| flags.model_path_flag.clone());
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4ff6f0 and 2a00fa2.

📒 Files selected for processing (8)
  • components/src/dynamo/vllm/main.py (6 hunks)
  • launch/dynamo-run/src/lib.rs (2 hunks)
  • lib/bindings/python/rust/lib.rs (7 hunks)
  • lib/bindings/python/rust/llm/entrypoint.rs (1 hunks)
  • lib/bindings/python/src/dynamo/_core.pyi (1 hunks)
  • lib/bindings/python/src/dynamo/llm/__init__.py (1 hunks)
  • lib/llm/src/local_model.rs (5 hunks)
  • lib/llm/src/model_card.rs (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
lib/bindings/python/src/dynamo/_core.pyi (1)
lib/bindings/python/rust/lib.rs (1)
  • fetch_llm (270-275)
launch/dynamo-run/src/lib.rs (1)
lib/llm/src/local_model.rs (2)
  • model_path (91-99)
  • fetch (331-333)
lib/bindings/python/rust/lib.rs (3)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • fetch_llm (895-900)
lib/llm/src/local_model.rs (3)
  • model_path (91-99)
  • fetch (331-333)
  • default (62-85)
lib/bindings/python/rust/llm/entrypoint.rs (2)
  • from (95-101)
  • to_pyerr (292-297)
lib/bindings/python/src/dynamo/llm/__init__.py (2)
lib/bindings/python/rust/lib.rs (2)
  • _core (97-164)
  • fetch_llm (270-275)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • fetch_llm (895-900)
lib/bindings/python/rust/llm/entrypoint.rs (1)
lib/llm/src/local_model.rs (1)
  • model_path (91-99)
lib/llm/src/model_card.rs (4)
lib/llm/src/local_model.rs (2)
  • path (339-341)
  • display_name (344-346)
lib/llm/src/common/checked_file.rs (2)
  • path (71-76)
  • from_disk (43-57)
lib/llm/src/lib.rs (1)
  • file_json_field (64-101)
lib/runtime/src/slug.rs (1)
  • from_string (23-25)
lib/llm/src/local_model.rs (2)
lib/llm/src/model_card.rs (1)
  • load_from_disk (439-444)
lib/llm/src/hub.rs (1)
  • from_hf (19-76)
components/src/dynamo/vllm/main.py (3)
lib/bindings/python/rust/lib.rs (2)
  • fetch_llm (270-275)
  • serve_endpoint (643-696)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • fetch_llm (895-900)
  • serve_endpoint (140-152)
components/src/dynamo/vllm/handlers.py (1)
  • clear_kv_blocks (73-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (10)
components/src/dynamo/vllm/main.py (5)

24-24: Public re-export of fetch_llm

Import looks correct and matches the binding.


178-183: Init logs updated to served_model_name

Clearer logs tied to the served identity. LGTM.


222-228: Metrics labels keyed by served_model_name

Good switch; avoids path leakage in metrics.


267-268: StatLogger labels set to served_model_name

Consistent with the rest of metrics usage. LGTM.


335-341: Decode endpoint metrics labels

Consistent labeling; matches prefill path. LGTM.

lib/llm/src/local_model.rs (1)

326-334: fetch() passthrough to hub

Simple, clear async helper. LGTM.

lib/llm/src/model_card.rs (2)

488-494: Local path validation and delegation

from_local_path() reads cleanly and enforces directory existence. LGTM.


495-558: Disk-based MDC construction

Sensible defaults, optional artifacts handled gracefully, and context_length inference is robust. LGTM.

lib/bindings/python/src/dynamo/llm/__init__.py (1)

45-45: Re-export fetch_llm

Public API exposure looks good.

launch/dynamo-run/src/lib.rs (1)

62-66: LGTM: Guarded model_path application.

Applying model_path after builder init and only when present is correct.

Signed-off-by: Graham King <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant