Description
❓ Questions & Help
I was wondering if it is possible to support SageMaker multi-model deployment using a Triton ensemble of Merlin models.
SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with the ensemble hosting mode. I tried to use that hosting mode with Triton ensembles of Merlin models, but according to the latest update of the Merlin SageMaker example implementation (#1040), the `--model-control-mode=explicit` flag (required by multi-model hosting for dynamic model loading) was removed.
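For context, this is roughly the server launch that multi-model hosting relies on — a sketch only; the repository path and model name are placeholders, but `--model-control-mode=explicit` and `--load-model` are standard Triton Inference Server options:

```shell
# Sketch: explicit model control lets the host load/unload models on demand.
# /opt/ml/model and executor_model are placeholder values.
tritonserver \
  --model-repository=/opt/ml/model \
  --model-control-mode=explicit \
  --load-model=executor_model
```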
I hypothesize that the cause of this incompatibility is that the generated Merlin `executor_model` is not a proper Triton ensemble (its `config.pbtxt` has neither the correct `platform: "ensemble"` setting nor the required `ensemble_scheduling: {...}` section), but just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models to be executed.
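For comparison, a proper Triton ensemble `config.pbtxt` would look roughly like the sketch below. The tensor names, dtypes, and dims are placeholder assumptions; only the `platform: "ensemble"` setting and the `ensemble_scheduling` section wiring the two Merlin steps together are the point:

```
# Hypothetical ensemble config; tensor names/dtypes/dims are placeholders.
name: "executor_model"
platform: "ensemble"
input [
  { name: "raw_features", data_type: TYPE_STRING, dims: [ -1 ] }
]
output [
  { name: "predictions", data_type: TYPE_FP32, dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      # Step 1: NVTabular workflow transform.
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "raw_features" value: "raw_features" }
      output_map { key: "transformed" value: "transformed" }
    },
    {
      # Step 2: PyTorch model inference on the transformed features.
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "input" value: "transformed" }
      output_map { key: "output" value: "predictions" }
    }
  ]
}
```

With a config of this shape, Triton itself schedules the two sub-models, so the ensemble is recognized as such by model-management tooling.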
EDIT: I realized that in #255 the Triton ensemble runtime was deprecated and replaced by the current executor model. Would it be possible to support exporting the recommender system artifacts as a Triton ensemble, at least for Transformers4Rec system deployments?