The launcher's REST API needs a way for a client to fetch a given child's output:
- Kubernetes normally combines a container's STDOUT and STDERR into a single log, so the API should likewise return both streams interleaved as one output.
- Because the output grows over time and can become quite large, the API needs to return it in chunks. One possibility: each request carries a starting position and a length limit.
- We should also extend the requester to relay the vLLM output, so that a user of the server-requesting Pod sees that output in the requester's log.
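A rough sketch of the first two points, assuming (for illustration only) a Python launcher that starts each child with `subprocess` and redirects its output to a per-child log file. STDERR is merged into STDOUT, and a chunk is read by byte offset and length limit. The names `start_child` and `read_chunk`, and the idea of a REST handler passing hypothetical `start` and `max_len` request parameters straight to `read_chunk`, are assumptions, not the launcher's actual code.

```python
import subprocess


def start_child(argv, log_path):
    """Start a child whose STDOUT and STDERR both go to one log file."""
    log_file = open(log_path, "ab")
    # stderr=subprocess.STDOUT sends STDERR into the same stream as STDOUT,
    # mirroring the single combined log Kubernetes keeps per container.
    proc = subprocess.Popen(argv, stdout=log_file, stderr=subprocess.STDOUT)
    # The child holds its own copy of the descriptor, so the parent can close its copy.
    log_file.close()
    return proc


def read_chunk(log_path, start, max_len):
    """Return (data, next_start): up to max_len bytes beginning at byte offset start."""
    if start < 0 or max_len <= 0:
        raise ValueError("start must be >= 0 and max_len must be > 0")
    with open(log_path, "rb") as f:
        f.seek(start)
        data = f.read(max_len)
    # The client sends next_start as `start` in its following request.
    # Reading at or past the current end of the log yields b"" and an unchanged offset.
    return data, start + len(data)
```

Returning `next_start` lets a client resume without doing its own byte accounting, and using byte offsets (rather than line numbers) keeps each request a single seek no matter how large the log has grown.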
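On the requester side, relaying could be a simple polling loop that copies each fetched chunk to the requester's own STDOUT, which Kubernetes then captures in the requester container's log. This sketch assumes a hypothetical client call `fetch_chunk(start, max_len)` returning `(data, next_start)` for a chunked-output endpoint, and a hypothetical `is_done()` check reporting whether the vLLM child has exited; neither is an existing API.

```python
import sys
import time


def relay_output(fetch_chunk, out=None, is_done=lambda: False,
                 max_len=64 * 1024, poll_interval=1.0):
    """Copy a child's output to `out` (default: this process's STDOUT) chunk by chunk.

    Returns the final offset once the child is done and all output has been relayed.
    """
    if out is None:
        out = sys.stdout.buffer
    start = 0
    while True:
        # Sample is_done() BEFORE fetching: any output written before the child
        # exited is already in its log, so this fetch (and the ones after) will drain it.
        done = is_done()
        data, start = fetch_chunk(start, max_len)
        if data:
            out.write(data)
            out.flush()
            continue  # more may already be available, so fetch again without sleeping
        if done:
            return start
        time.sleep(poll_interval)
```

Checking for completion before each fetch, rather than after an empty one, avoids dropping output that the child writes just before it exits.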
CC. @MikeSpreitzer @lionelvillard