Summary
Currently, SkyRL supports multi-LoRA training for synchronous RL with #1617 and #1579, but it does not yet support multi-LoRA for asynchronous RL.
Currently, `pause_generation` in SkyRL uses vLLM's native `pause_generation` method, which pauses all requests in the vLLM engine scheduler. However, in a multi-tenant setting, we want a `pause_generation` request that only pauses generations for the given client.
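As a rough illustration, the per-client behavior could be implemented with a small pause gate keyed by client ID. The sketch below is a minimal, hypothetical helper assuming each sample request is tagged with a `client_id`; `PerClientPauseGate` is not an existing SkyRL or vLLM API.

```python
import asyncio


class PerClientPauseGate:
    """Pause/resume generation per client instead of pausing the whole scheduler.

    Hypothetical helper for illustration only; not part of SkyRL or vLLM.
    """

    def __init__(self) -> None:
        self._events: dict[str, asyncio.Event] = {}

    def _event(self, client_id: str) -> asyncio.Event:
        if client_id not in self._events:
            event = asyncio.Event()
            event.set()  # clients start in the "resumed" state
            self._events[client_id] = event
        return self._events[client_id]

    def pause(self, client_id: str) -> None:
        # Subsequent sample requests from this client block in wait_if_paused().
        self._event(client_id).clear()

    def resume(self, client_id: str) -> None:
        self._event(client_id).set()

    async def wait_if_paused(self, client_id: str) -> None:
        # Called before a request is submitted to the engine; other clients'
        # events stay set, so their requests are unaffected.
        await self._event(client_id).wait()
```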
The goal state is as follows:
- Client A and Client B connect to the SkyRL Tinker API server and start a training run
- Client A and Client B asynchronously issue sample requests, with inference and training running non-colocated.
- Client A collects enough sampled responses and runs a training update step: it uses the samples generated so far and issues `forward_backward` + `optim_step` requests.
- Client A issues a `save_weights_for_sampler` request. This issues a `pause_generation` request, which pauses all in-flight requests for Client A. SkyRL's `WorkerDispatch` broadcasts the updated LoRA adapter for Client A and issues a `resume_generation` request to resume generations for Client A. Meanwhile, Client B continues generating as normal (see the sketch after this list).
- < training proceeds >
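The `save_weights_for_sampler` step above could then be expressed as a pause → broadcast → resume cycle scoped to a single client. The sketch below is illustrative only: the `client_id` keyword on `pause_generation`/`resume_generation` and the `broadcast_lora_adapter` helper are assumptions, not the actual SkyRL `WorkerDispatch` API.

```python
# Hedged sketch of a per-client save_weights_for_sampler flow.
async def save_weights_for_sampler(client_id: str, dispatch, inference) -> None:
    # 1. Pause only this client's in-flight generations; other clients keep sampling.
    await inference.pause_generation(client_id=client_id)
    try:
        # 2. Broadcast this client's updated LoRA adapter to the inference engines.
        await dispatch.broadcast_lora_adapter(client_id=client_id)
    finally:
        # 3. Resume generation for this client, even if the broadcast fails.
        await inference.resume_generation(client_id=client_id)
```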