State management:
- Keep track of which model(s) are currently in memory to support advanced batching (not pure FIFO)
- Prioritization? (open question)
Queuing:
- Inference queue
- Advanced batching: when the queue contains separate requests for the same model, batch them and run all jobs requesting that model before moving on to the next model. Cap any one model's time in memory at 15-20 minutes if other jobs are waiting in the queue. This balances efficiency (batching) with fairness (FIFO queuing).
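The scheduling policy above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the class name, the residency cap, and the `time.monotonic()`-based clock are all assumptions for the sake of the example.

```python
import time
from collections import deque

MAX_MODEL_RESIDENCY_S = 15 * 60  # hypothetical cap: 15 minutes per model


class InferenceQueue:
    """FIFO queue with model-aware batching (sketch).

    Jobs for the currently loaded model are served first, but the model is
    evicted after a residency cap when other models' jobs are waiting, so
    batching efficiency is balanced against FIFO fairness.
    """

    def __init__(self, max_residency_s=MAX_MODEL_RESIDENCY_S):
        self.jobs = deque()          # (model_name, job) in arrival order
        self.max_residency_s = max_residency_s
        self.loaded_model = None     # model currently in memory
        self.loaded_since = None     # when it was loaded

    def submit(self, model_name, job):
        self.jobs.append((model_name, job))

    def _residency_expired(self, now):
        # Only evict when a different model is actually waiting in the queue.
        others_waiting = any(m != self.loaded_model for m, _ in self.jobs)
        return (others_waiting
                and self.loaded_since is not None
                and now - self.loaded_since > self.max_residency_s)

    def next_job(self, now=None):
        """Return the next (model, job) pair, preferring the loaded model."""
        if not self.jobs:
            return None
        now = time.monotonic() if now is None else now
        # Prefer a job for the already-loaded model, unless its slice expired.
        if self.loaded_model is not None and not self._residency_expired(now):
            for i, (model, job) in enumerate(self.jobs):
                if model == self.loaded_model:
                    del self.jobs[i]
                    return model, job
        # Otherwise fall back to strict FIFO: take the oldest job and
        # "load" its model, resetting the residency clock.
        model, job = self.jobs.popleft()
        self.loaded_model = model
        self.loaded_since = now
        return model, job
```

For example, with jobs `("a", 1)`, `("b", 2)`, `("a", 3)` submitted in that order, the scheduler runs both `a` jobs before switching to `b`; once `b` has held memory past the residency cap and an `a` job is waiting, the scheduler falls back to FIFO and reloads `a`.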