Closed
The title should say it, but how do we actually scale for more throughput? For example, let's say I use just one model for embedding. It's working fine, but I need more throughput. Do I deploy the same model multiple times, like embedding-model-1, embedding-model-2, and so on? I ask because, from my understanding, there is an internal queue for incoming requests, and requests are continuously batched and sent to the model.
Thanks