display available cached versions in TGI server error message of Neuron backend #3063

Open · wants to merge 1 commit into base: main
20 changes: 18 additions & 2 deletions backends/neuron/server/text_generation_server/model.py
@@ -107,10 +107,26 @@ def fetch_model(
     if not is_cached(model_id, neuron_config):
         hub_cache_url = "https://huggingface.co/aws-neuron/optimum-neuron-cache"
         neuron_export_url = "https://huggingface.co/docs/optimum-neuron/main/en/guides/export_model#exporting-neuron-models-using-neuronx-tgi"
+        entries = get_hub_cached_entries(model_id, "inference")
Collaborator:
This method is already called by is_cached: I'd rather avoid having two consecutive calls to the hub.
The is_cached method is never called anywhere else, so maybe you could change its signature to something like has_compatible_entry(neuron_config, entries). That way you can first fetch the entries, check whether one is compatible, and otherwise loop over the incompatible entries just as you do here.
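A rough sketch of the refactor this comment suggests: the name `has_compatible_entry` comes from the comment itself, while the entry fields and the exact matching criterion are assumptions for illustration, not code from this PR.

```python
# Sketch of the reviewer's suggestion: fetch the hub cache entries once,
# then check compatibility locally instead of calling the hub twice.
# The matched fields below are assumed; the real neuron_config may
# carry more attributes.

def has_compatible_entry(neuron_config: dict, entries: list) -> bool:
    """Return True if any cached entry matches the requested config."""
    keys = ("batch_size", "sequence_length", "num_cores", "auto_cast_type")
    return any(
        all(entry.get(k) == neuron_config.get(k) for k in keys)
        for entry in entries
    )

# Illustrative usage with made-up entries:
entries = [
    {"batch_size": 1, "sequence_length": 4096,
     "num_cores": 2, "auto_cast_type": "bf16"},
]
wanted = {"batch_size": 1, "sequence_length": 4096,
          "num_cores": 2, "auto_cast_type": "bf16"}
other = {"batch_size": 8, "sequence_length": 2048,
         "num_cores": 2, "auto_cast_type": "fp16"}
print(has_compatible_entry(wanted, entries))  # True
print(has_compatible_entry(other, entries))   # False
```

With this shape, the error path can reuse the already-fetched `entries` to list the incompatible configurations, which is exactly what the diff below does inline.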

+        available_configs = ""
+        if entries:
+            config_list = []
+            for entry in entries:
+                config = (
+                    f"batch_size={entry['batch_size']}, "
+                    f"sequence_length={entry['sequence_length']}, "
+                    f"num_cores={entry['num_cores']}, "
+                    f"auto_cast_type={entry['auto_cast_type']}"
+                )
+                config_list.append(config)
+            available_configs = "\nAvailable cached configurations for this model:\n- " + "\n- ".join(config_list)
+        else:
+            available_configs = "\nNo cached versions are currently available for that model with any configuration."
Collaborator:
It looks quite redundant with the first line of the error message. Do we really need to say something more specific here?

         error_msg = (
             f"No cached version found for {model_id} with {neuron_config}."
-            f"You can start a discussion to request it on {hub_cache_url}"
-            f"Alternatively, you can export your own neuron model as explained in {neuron_export_url}"
+            f"{available_configs}"
+            f"\nYou can start a discussion to request it on {hub_cache_url}"
+            f"\nAlternatively, you can export your own neuron model as explained in {neuron_export_url}"
         )
         raise ValueError(error_msg)
     logger.warning(
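The message formatting added in this diff can be exercised in isolation. The helper below reproduces the diff's logic; the sample entries are made-up stand-ins for what `get_hub_cached_entries` returns, not real hub cache data.

```python
# Standalone reproduction of the formatting logic from the diff.
# The sample entries are illustrative, not real hub cache data.

def format_available_configs(entries):
    """Build the 'available configurations' suffix of the error message."""
    if not entries:
        return ("\nNo cached versions are currently available "
                "for that model with any configuration.")
    config_list = [
        f"batch_size={e['batch_size']}, "
        f"sequence_length={e['sequence_length']}, "
        f"num_cores={e['num_cores']}, "
        f"auto_cast_type={e['auto_cast_type']}"
        for e in entries
    ]
    return ("\nAvailable cached configurations for this model:\n- "
            + "\n- ".join(config_list))

entries = [
    {"batch_size": 1, "sequence_length": 4096,
     "num_cores": 2, "auto_cast_type": "bf16"},
    {"batch_size": 4, "sequence_length": 4096,
     "num_cores": 8, "auto_cast_type": "fp16"},
]
print(format_available_configs(entries))
```

Each cached configuration lands on its own bulleted line, so a user hitting the error can compare their requested `neuron_config` against what the cache actually offers.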