When using tensorrt-llm or hugging-face-local, model should be saved as artifact #46

Description

@passarel

Currently, for the GenAI examples, whenever a tensorrt-llm or hugging-face-local model is deployed, the service has to download the model before it can start. To speed up model deployment, we should save the downloaded model files as artifacts and load them from disk whenever the model is served:

  • For hugging-face-local, the model is saved in the HF cache folders configured in the utils.py file. The path to the model folder should be passed as an artifact, and load should take this artifact as a parameter instead of the repository name (see the first sketch after this list)
  • For tensorrt-llm, the model engine should be built, exported, and saved as an artifact, which should make model loading faster (see the second sketch after this list)
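
A minimal sketch of the hugging-face-local side, assuming the model files can be pinned to a directory that the serving stack then treats as the artifact. `MODEL_REPO` and `ARTIFACT_DIR` are illustrative names, not identifiers from this repo:

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_REPO = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical repository
ARTIFACT_DIR = Path("artifacts/hf-model")          # hypothetical artifact path


def save_model_artifact() -> Path:
    """Download the model once and keep its files as a deployable artifact."""
    local_path = snapshot_download(repo_id=MODEL_REPO, local_dir=ARTIFACT_DIR)
    return Path(local_path)


def load_model_from_artifact(artifact_path: Path):
    """Load from the saved files instead of the repository name."""
    tokenizer = AutoTokenizer.from_pretrained(artifact_path)
    model = AutoModelForCausalLM.from_pretrained(artifact_path)
    return tokenizer, model
```

With this split, the download happens once at artifact-creation time, and serving only ever reads local files.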
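For the tensorrt-llm side, a sketch assuming a recent TensorRT-LLM release that ships the high-level LLM API (including LLM.save()); on older releases the trtllm-build CLI performs the same build-and-export step. `ENGINE_DIR` is an illustrative name:

```python
from tensorrt_llm import LLM

ENGINE_DIR = "artifacts/trtllm-engine"  # hypothetical artifact location


def build_and_save_engine(model_repo: str) -> None:
    """Build the TensorRT engine once and persist it as an artifact."""
    llm = LLM(model=model_repo)  # triggers the (slow) engine build
    llm.save(ENGINE_DIR)         # the exported engine directory is the artifact


def load_engine_for_serving() -> LLM:
    """Point LLM at the prebuilt engine directory to skip the build entirely."""
    return LLM(model=ENGINE_DIR)
```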
