FasterTransformer cannot parse a model architecture automatically. So, for a new model, you first need to implement it in FasterTransformer (the model architecture and any CUDA kernels it requires), and then wrap that implementation in the Triton backend. After that, you can call it from Triton.
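
To illustrate the point, here is a minimal Python sketch. It is not FasterTransformer or Triton code, and every name in it (`register`, `serve`, `toy_gpt`) is hypothetical. It only mimics the idea that each supported architecture has a hand-written implementation, so an architecture nobody has implemented cannot be served.

```python
from typing import Callable, Dict, List

# Hypothetical registry: architecture name -> hand-written implementation.
# Assumption for illustration only; this is not how FasterTransformer is
# structured internally, it just mirrors "one implementation per model".
_IMPLEMENTATIONS: Dict[str, Callable[[List[int]], List[int]]] = {}


def register(arch: str):
    """Decorator that records a hand-written implementation for `arch`."""
    def decorator(fn: Callable[[List[int]], List[int]]):
        _IMPLEMENTATIONS[arch] = fn
        return fn
    return decorator


@register("toy_gpt")
def toy_gpt_forward(token_ids: List[int]) -> List[int]:
    # Stand-in for a real forward pass (which would call custom kernels).
    return [t + 1 for t in token_ids]


def serve(arch: str, token_ids: List[int]) -> List[int]:
    """Stand-in for the serving layer: it can only dispatch to known models."""
    impl = _IMPLEMENTATIONS.get(arch)
    if impl is None:
        # Nothing is inferred from the architecture name or a model file.
        raise NotImplementedError(
            f"No implementation registered for architecture '{arch}'"
        )
    return impl(token_ids)
```

Calling `serve("toy_gpt", [1, 2, 3])` dispatches to the registered function, while `serve("my_new_model", [1, 2, 3])` raises `NotImplementedError` until someone writes and registers an implementation for it.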

Answer selected by byshiue