How to deploy a new model based on triton and fastertransformer_backend? #104
Hello, I am new to fastertransformer_backend and many things are still unclear to me. I have some questions, mainly about how to deploy a new model.
Replies: 1 comment 1 reply
FasterTransformer cannot parse arbitrary model architectures. So, for a new model, you first need to implement the model in FasterTransformer (the model architecture and its CUDA kernels), then wrap it with the Triton backend. After that, you can serve and call it through Triton.
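As a sketch of the last step (serving the wrapped model through Triton), the model repository entry might look roughly like the fragment below. All names here (`my_new_model`, the input/output tensor names, the parameter keys) are hypothetical placeholders; the exact tensors and parameters depend on the model you implemented, so check the existing fastertransformer_backend examples (e.g. the GPT configs) for the real layout:

```
# model_repository/my_new_model/config.pbtxt  (illustrative sketch)
name: "my_new_model"
backend: "fastertransformer"   # the Triton backend that wraps your FT model
max_batch_size: 8

input [
  {
    name: "input_ids"          # hypothetical input tensor name
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "output_ids"         # hypothetical output tensor name
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]

parameters {
  key: "model_checkpoint_path" # hypothetical parameter key
  value: { string_value: "/models/my_new_model/1" }
}
```

The converted weights for your FasterTransformer implementation would then go under a version directory such as `model_repository/my_new_model/1/`, and Triton loads the model from that repository at startup.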