
[QST] Support Triton ensemble runtime for SageMaker multi-model deployment #393

Open
@mvidela31

Description


❓ Questions & Help

I was wondering whether it is possible to support SageMaker multi-model deployment using the Triton ensemble of Merlin models.

SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with ensemble hosting mode. I tried to use that hosting mode with the Triton ensembles of Merlin models, but according to the latest update of the Merlin SageMaker example implementation #1040, the `--model-control-mode=explicit` setting (required by multi-model hosting for dynamic model loading) was removed.
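
For context, here is a minimal sketch of the dynamic loading that multi-model hosting relies on when Triton runs with `--model-control-mode=explicit`; the endpoint URL and model name below are illustrative only:

```python
# Illustrative only: with --model-control-mode=explicit, models are loaded and
# unloaded on demand through Triton's model repository API instead of at startup.
import tritonclient.http as triton_http

client = triton_http.InferenceServerClient(url="localhost:8000")

client.load_model("executor_model")             # explicit load request
assert client.is_model_ready("executor_model")  # model is now being served
client.unload_model("executor_model")           # released when no longer needed
```

Without explicit model control, this per-model load/unload flow is not available, which is what breaks the multi-model hosting mode.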

I hypothesize that the cause of this incompatibility is that the generated Merlin `executor_model` is not a proper Triton ensemble (its `config.pbtxt` file neither sets `platform: "ensemble"` nor contains the required `ensemble_scheduling: {...}` section), but is just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models.
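
For comparison, here is a rough sketch of what a real Triton ensemble `config.pbtxt` chaining these two steps could look like; the tensor names in `input_map`/`output_map`, the data types, and the dims are placeholders, since the actual names depend on the exported workflow and model schemas:

```
# Placeholder ensemble definition -- tensor names, dtypes and dims are illustrative only.
name: "ensemble_model"
platform: "ensemble"
max_batch_size: 0
input [
  { name: "raw_features", data_type: TYPE_STRING, dims: [ -1 ] }
]
output [
  { name: "predictions", data_type: TYPE_FP32, dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "raw_features" value: "raw_features" }
      output_map { key: "transformed_features" value: "transformed_features" }
    },
    {
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "transformed_features" value: "transformed_features" }
      output_map { key: "predictions" value: "predictions" }
    }
  ]
}
```

With a config like this, Triton itself schedules the two step models, which is what the multi-model ensemble hosting mode expects.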

EDIT: I realized that in #255 the Triton ensemble runtime was deprecated and replaced by the current executor model. Would it be possible to support exporting the recommender system artifacts as a proper Triton ensemble, at least for Transformers4rec systems deployment?
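
To illustrate the ask, a rough sketch loosely following the Merlin Systems / Transformers4Rec serving pattern; `workflow`, `traced_model`, and the schemas are assumed to come from earlier training steps, exact signatures may differ across versions, and `TritonEnsembleRuntime` is purely hypothetical (it is the kind of opt-in removed in #255), not part of the current API:

```python
# Sketch only: `workflow` (fitted NVTabular Workflow), `traced_model`
# (torch.jit-traced Transformers4Rec model) and the schemas are assumed to
# exist from earlier steps. `TritonEnsembleRuntime` is hypothetical.
from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.pytorch import PredictPyTorch
from merlin.systems.dag.ops.workflow import TransformWorkflow

pipeline = (
    workflow.input_schema.column_names
    >> TransformWorkflow(workflow)
    >> PredictPyTorch(traced_model, model_input_schema, model_output_schema)
)
ensemble = Ensemble(pipeline, workflow.input_schema)

# Current behaviour: exports the monolithic executor_model plus the two step models.
ensemble.export("/models")

# Requested (hypothetical) behaviour: optionally export a real Triton ensemble again,
# e.g. ensemble.export("/models", runtime=TritonEnsembleRuntime())
```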
