...these inference server pods will run the launcher process, which starts up a vLLM server and puts it to sleep.
What is the pool manager (to be implemented in Go) responsible for?
- maintains a warm inventory of pods running inference servers
- makes sure the pool always holds at least a minimum number of launcher pods ready to start vLLM servers
- allocates a launcher pod to the dual pod controller when a request comes in, and takes the pod back into the pool when the request is done (stopping its vLLM server)
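
The three responsibilities above could be sketched roughly as follows. This is only an in-memory illustration, not the actual design: the names `Pool`, `Launcher`, `Reconcile`, `Allocate`, and `Release` are hypothetical, the `create` and `stop` callbacks stand in for whatever Kubernetes and launcher calls the real manager would make, and "min size" is assumed to mean the number of idle launchers.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Launcher is a hypothetical handle to one launcher pod.
type Launcher struct {
	PodName string
}

// ErrNotAllocated is returned when a launcher is released that the pool did not hand out.
var ErrNotAllocated = errors.New("launcher was not allocated from this pool")

// Pool is a hypothetical warm pool of launcher pods.
type Pool struct {
	mu      sync.Mutex
	minSize int                       // assumed: minimum number of idle launchers
	idle    []*Launcher               // warm inventory
	inUse   map[string]*Launcher      // launchers handed to the dual pod controller
	create  func() (*Launcher, error) // placeholder: starts a new launcher pod
	stop    func(*Launcher) error     // placeholder: stops the vLLM server on a pod
}

func NewPool(minSize int, create func() (*Launcher, error), stop func(*Launcher) error) *Pool {
	return &Pool{minSize: minSize, inUse: map[string]*Launcher{}, create: create, stop: stop}
}

// Reconcile creates launchers until the idle inventory reaches minSize.
// For simplicity it holds the lock while creating; a real manager likely would not.
func (p *Pool) Reconcile() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.idle) < p.minSize {
		l, err := p.create()
		if err != nil {
			return err
		}
		p.idle = append(p.idle, l)
	}
	return nil
}

// Allocate hands out an idle launcher, creating one on demand if the pool is empty.
func (p *Pool) Allocate() (*Launcher, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.idle) == 0 {
		l, err := p.create()
		if err != nil {
			return nil, err
		}
		p.idle = append(p.idle, l)
	}
	l := p.idle[len(p.idle)-1]
	p.idle = p.idle[:len(p.idle)-1]
	p.inUse[l.PodName] = l
	return l, nil
}

// Release stops the vLLM server on the launcher and returns it to the idle inventory.
func (p *Pool) Release(l *Launcher) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.inUse[l.PodName]; !ok {
		return ErrNotAllocated
	}
	if err := p.stop(l); err != nil {
		return err
	}
	delete(p.inUse, l.PodName)
	p.idle = append(p.idle, l)
	return nil
}

// IdleCount reports how many launchers are currently warm and unallocated.
func (p *Pool) IdleCount() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.idle)
}

func main() {
	n := 0
	create := func() (*Launcher, error) { // fake pod creation for the demo
		n++
		return &Launcher{PodName: fmt.Sprintf("launcher-%d", n)}, nil
	}
	stop := func(*Launcher) error { return nil } // fake "stop vLLM server"

	p := NewPool(2, create, stop)
	if err := p.Reconcile(); err != nil {
		panic(err)
	}
	l, err := p.Allocate()
	if err != nil {
		panic(err)
	}
	fmt.Println("allocated", l.PodName, "idle:", p.IdleCount())
	if err := p.Release(l); err != nil {
		panic(err)
	}
	fmt.Println("released", l.PodName, "idle:", p.IdleCount())
}
```

In this sketch the caller is expected to run `Reconcile` again after an `Allocate` (for example from a periodic loop) so the idle inventory climbs back to the minimum; whether the real manager refills eagerly or lazily is an open design choice.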