Currently, in the GenAI examples, whenever tensorrt-llm or hugging-face-local models are deployed, the service has to download the model before it can start. To speed up deployment, we should save the downloaded model files as artifacts and load them from disk whenever the model is served:
- For hugging-face-local, the model is saved in the HF cache folder configured in utils.py. The path to the model folder should be logged as an artifact, and load should take this artifact as its parameter instead of the repository name (see the first sketch after this list).
- For tensorrt-llm, the model engine should be built, exported, and saved as an artifact, so that serving loads the prebuilt engine instead of compiling it on startup (see the second sketch after this list).
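
A minimal sketch of the hugging-face-local change, assuming an MLflow-style pyfunc wrapper (the issue doesn't name the serving framework, and the repo id, artifact key `model_dir`, and class name are placeholders, not the examples' actual code). The idea is to download the snapshot once at logging time, log the local folder as an artifact, and have `load_context` read from that folder instead of the repository name:

```python
# Sketch: log the downloaded HF snapshot as an artifact and load it from
# disk at serving time. Repo id and artifact key are placeholders.
import mlflow
from huggingface_hub import snapshot_download

REPO_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder repository

class LocalHFModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        from transformers import AutoModelForCausalLM, AutoTokenizer
        # Load from the logged artifact directory instead of the repo
        # name, so serving never has to hit the Hugging Face Hub.
        model_dir = context.artifacts["model_dir"]
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForCausalLM.from_pretrained(model_dir)

    def predict(self, context, model_input):
        prompts = model_input["prompt"].tolist()
        inputs = self.tokenizer(prompts, return_tensors="pt", padding=True)
        outputs = self.model.generate(**inputs, max_new_tokens=64)
        return self.tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Download once (this populates the HF cache configured in utils.py)
# and log the resulting snapshot path as the model artifact.
local_dir = snapshot_download(repo_id=REPO_ID)
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=LocalHFModel(),
        artifacts={"model_dir": local_dir},
    )
```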
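
And a sketch of the tensorrt-llm change under the same assumptions, using the TensorRT-LLM LLM API to build and save the engine once at logging time (the engine directory, repo id, and artifact key `engine_dir` are placeholders). Because `LLM` can be pointed at a saved engine directory, the serving path skips engine compilation entirely:

```python
# Sketch: build and export the TensorRT-LLM engine once, log the engine
# directory as an artifact, and load the prebuilt engine when serving.
import mlflow
from tensorrt_llm import LLM

ENGINE_DIR = "trtllm_engine"  # placeholder output directory

# Build the engine from the HF checkpoint (slow, done once at logging
# time) and save it so serving can skip the build step.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
llm.save(ENGINE_DIR)

class TRTLLMModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        from tensorrt_llm import LLM
        # Pointing LLM at the saved engine directory loads the prebuilt
        # engine instead of rebuilding it on startup.
        self.llm = LLM(model=context.artifacts["engine_dir"])

    def predict(self, context, model_input):
        prompts = model_input["prompt"].tolist()
        return [out.outputs[0].text for out in self.llm.generate(prompts)]

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=TRTLLMModel(),
        artifacts={"engine_dir": ENGINE_DIR},
    )
```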