"Pool manager" to create a set of pre-warmed/reusable inference server pods #18

@aavarghese

Description

...these inference server pods will run the launcher process, which starts up a vLLM server and then puts it to sleep.

What is the pool manager, to be implemented in Go, responsible for?

  • maintains a warm inventory of pods running the inference server launcher
  • ensures the pool always has a minimum number of launcher pods ready to start vLLM servers
  • allocates a launcher pod to the dual pod controller when a request comes in, and reclaims the pod when done (after the vLLM server is stopped)
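The responsibilities above could be sketched in Go roughly as follows. This is a minimal, in-memory illustration only: the type and method names (`PoolManager`, `Allocate`, `Release`, `refill`) are hypothetical, and a real implementation would create and delete Kubernetes pods instead of appending structs to a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// LauncherPod is a stand-in for a warm pod running the launcher process.
type LauncherPod struct {
	Name string
}

// PoolManager keeps a warm inventory of launcher pods and tops the
// pool back up to a minimum size whenever a pod is allocated.
type PoolManager struct {
	mu      sync.Mutex
	minSize int
	warm    []*LauncherPod
	next    int // counter used to name newly created pods
}

func NewPoolManager(minSize int) *PoolManager {
	pm := &PoolManager{minSize: minSize}
	pm.refill()
	return pm
}

// refill creates launcher pods until the warm inventory reaches minSize.
// A real pool manager would create Kubernetes pods here.
func (pm *PoolManager) refill() {
	for len(pm.warm) < pm.minSize {
		pm.next++
		pm.warm = append(pm.warm,
			&LauncherPod{Name: fmt.Sprintf("launcher-%d", pm.next)})
	}
}

// Allocate hands a warm pod to the caller (e.g. the dual pod
// controller) and immediately tops the pool back up to minSize.
func (pm *PoolManager) Allocate() *LauncherPod {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	pod := pm.warm[0]
	pm.warm = pm.warm[1:]
	pm.refill()
	return pod
}

// Release returns a pod to the warm inventory once its vLLM server
// has been stopped.
func (pm *PoolManager) Release(pod *LauncherPod) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	pm.warm = append(pm.warm, pod)
}

func main() {
	pm := NewPoolManager(2)
	p := pm.Allocate()
	fmt.Println("allocated:", p.Name)
	fmt.Println("warm pods:", len(pm.warm)) // refilled back to the minimum
	pm.Release(p)
	fmt.Println("warm pods after release:", len(pm.warm))
}
```

The mutex keeps Allocate/Release safe if multiple reconciliation goroutines touch the pool; refilling inside `Allocate` is what keeps the minimum-size invariant from the second bullet.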
