[serve][llm] Feature: Add W&B Model Loading Callback for LLMEngine #58928
+147
−0
Description
This PR introduces a WandBModelLoadingCallback to enable users to load models directly from Weights & Biases (W&B) Artifacts within Ray LLM/Serve deployments.
This feature allows users to specify a model source using the wandb:// scheme, resolving it to a local disk path before the engine configuration is finalized.
Motivation
Currently, Ray LLM primarily supports model sources from Hugging Face or direct local/S3 paths. Integrating W&B Artifacts is a common requirement for teams using W&B for model versioning and tracking. This change leverages the new callback feature to provide this integration point without modifying core LLM server logic.
Implementation Details
- WandBArtifactHandler: A helper class is introduced to manage W&B API interaction. It handles downloading artifacts (run.use_artifact(...).download()) and supports optional custom configuration via wandb_base_url and wandb_api_key for private/custom W&B instances.
- Symlinking: The handler creates a symlink from the W&B client's local cache location to the user-specified local_path. This is efficient because it avoids redundant copying when the artifact is already cached.
- Configuration Model: WandBDownloaderConfig uses Pydantic validation to enforce that the required paths parameter is provided in callback_kwargs. paths is expected to be a list of (wandb_uri, local_path) tuples.
- Callback Logic (on_before_node_init): The callback initializes with its configuration validated by WandBDownloaderConfig. It iterates over the provided paths list and, for each entry, uses the WandBArtifactHandler to download the artifact and ensure it is available at the specified local_path (see the sketch after this list).
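For concreteness, here is a minimal sketch of how these pieces could fit together. It is not a copy of the PR's code: the callback class name, the on_before_node_init hook signature, and the way callback_kwargs are passed to the constructor are assumptions based on the description above.

```python
import os
from typing import List, Optional, Tuple

import wandb
from pydantic import BaseModel


class WandBDownloaderConfig(BaseModel):
    """Validated view of the callback_kwargs passed to the downloader."""

    # Required: each entry maps a wandb:// URI to the local path where the
    # artifact should be made available.
    paths: List[Tuple[str, str]]
    # Optional overrides for private / self-hosted W&B instances.
    wandb_base_url: Optional[str] = None
    wandb_api_key: Optional[str] = None


class WandBArtifactHandler:
    """Downloads a W&B artifact and symlinks its cache location to a target path."""

    def __init__(self, base_url: Optional[str] = None, api_key: Optional[str] = None):
        # Custom W&B instances are configured through the standard env vars.
        if base_url:
            os.environ["WANDB_BASE_URL"] = base_url
        if api_key:
            os.environ["WANDB_API_KEY"] = api_key

    def pull(self, wandb_uri: str, local_path: str) -> str:
        artifact_name = wandb_uri.removeprefix("wandb://")
        with wandb.init(job_type="download-model") as run:
            # download() places the artifact in the W&B client's local cache
            # and returns that directory.
            cache_dir = run.use_artifact(artifact_name).download()
        # Symlink instead of copying so an already-cached artifact is reused.
        if not os.path.lexists(local_path):
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            os.symlink(cache_dir, local_path)
        return local_path


class WandBDownloader:
    """Resolves wandb:// model sources before the engine config is finalized."""

    def __init__(self, **callback_kwargs):
        self.config = WandBDownloaderConfig(**callback_kwargs)
        self.handler = WandBArtifactHandler(
            base_url=self.config.wandb_base_url,
            api_key=self.config.wandb_api_key,
        )

    def on_before_node_init(self) -> None:
        # Make every requested artifact available at its target local path.
        for wandb_uri, local_path in self.config.paths:
            self.handler.pull(wandb_uri, local_path)
```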
Related issues
This feature was discussed in the Ray LLM Slack channel, where the callback approach was identified as the preferred path forward for integrating custom model sources like W&B Artifacts.
Additional information
How to Use
To use a W&B Artifact as the model source, a user must configure the LLMConfig to point the model source to the local path where the callback will download the artifact, and then pass the WandBDownloader and its configuration via callback_config.
Example:
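The end-to-end configuration below is a hypothetical sketch. The model_loading_config fields follow Ray Serve LLM's existing API, but the shape of callback_config (a callback class path plus callback_kwargs) and the module path of WandBDownloader are assumptions based on this description rather than the final API in the diff.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "my-finetuned-llm",
        # The engine reads the model from the path the callback populates.
        "model_source": "/mnt/models/my-finetuned-llm",
    },
    # Hypothetical shape: the callback class and the kwargs validated by
    # WandBDownloaderConfig, as described in the implementation details.
    callback_config={
        "callback_class": "my_project.callbacks.WandBDownloader",
        "callback_kwargs": {
            "paths": [
                ("wandb://my-entity/my-project/my-model:v3",
                 "/mnt/models/my-finetuned-llm"),
            ],
            # Optional, only needed for private / self-hosted W&B instances.
            "wandb_base_url": "https://wandb.example.com",
        },
    },
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```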
API Changes:
Testing: