[tinker] Enable multi-lora training for async RL #1647

@SumanthRH

Description

Summary

Currently, SkyRL supports multi-LoRA training for synchronous RL with #1617 and #1579, but it does not yet support multi-LoRA for asynchronous RL.

Currently, pause_generation in SkyRL uses vLLM's native pause_generation method, which pauses all requests in the vLLM engine scheduler. However, in a multi-tenant setting, we would want to issue a pause_generation request that only pauses generation for the given client.
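
As a rough sketch of what per-client pausing could look like (this is not SkyRL's or vLLM's actual API; `PerClientPauseManager`, `client_id`, and `engine.generate` are hypothetical names), the server could keep one gate per client and hold only that client's new sample requests while its adapter is being updated. Pausing requests already inside the scheduler would additionally need per-client support in the engine itself:

```python
import asyncio


class PerClientPauseManager:
    """Hypothetical sketch: one asyncio gate per client, so pausing Client A
    does not block other tenants. New requests wait on their client's gate
    before being handed to the inference engine."""

    def __init__(self) -> None:
        self._gates: dict[str, asyncio.Event] = {}

    def _gate(self, client_id: str) -> asyncio.Event:
        gate = self._gates.get(client_id)
        if gate is None:
            gate = asyncio.Event()
            gate.set()  # gates start open: requests flow through normally
            self._gates[client_id] = gate
        return gate

    def pause_generation(self, client_id: str) -> None:
        # Hold only this client's new requests; other clients keep generating.
        self._gate(client_id).clear()

    def resume_generation(self, client_id: str) -> None:
        # Re-open the gate once the updated LoRA adapter has been broadcast.
        self._gate(client_id).set()

    async def submit(self, client_id: str, prompt: str, engine) -> str:
        # `engine` is a stand-in for the underlying vLLM-backed engine wrapper.
        await self._gate(client_id).wait()
        return await engine.generate(prompt)
```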

The goal state is as follows:

  1. Client A and Client B connect to the SkyRL Tinker API server and start a training run
  2. Client A and Client B asynchronously run sample requests with inference and training being non-colocated.
  3. Client A samples enough responses and runs a training update step: it uses the samples generated so far and issues forward_backward + optim_step requests.
  4. Client A issues a save_weights_for_sampler request. This issues a pause_generation request, which pauses all in-flight requests for Client A. SkyRL's WorkerDispatch broadcasts the updated LoRA adapter for Client A and issues a resume_generation request to resume generation for Client A. Meanwhile, Client B continues generation as normal. (A sketch of this flow from Client A's side follows the list.)
  5. < training proceeds >
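
Below is a minimal sketch of the goal state from Client A's perspective, assuming asyncio-style client methods (`sample_async`, `forward_backward_async`, `optim_step_async`, `save_weights_for_sampler_async`) modeled on the Tinker client API; the exact names and signatures in SkyRL may differ:

```python
import asyncio


async def client_a_loop(training_client, sampling_client, prompts, batch_size):
    """Hypothetical flow for steps 2-5 above; method names are assumptions
    modeled on the Tinker client API, not SkyRL's confirmed surface."""
    # Step 2: kick off sampling asynchronously; inference is non-colocated.
    pending = {asyncio.ensure_future(sampling_client.sample_async(p)) for p in prompts}
    collected = []
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        collected.extend(task.result() for task in done)
        if len(collected) >= batch_size:
            batch, collected = collected[:batch_size], collected[batch_size:]
            # Step 3: train on the samples generated so far.
            await training_client.forward_backward_async(batch)
            await training_client.optim_step_async()
            # Step 4: pause only Client A's generation, broadcast the updated
            # LoRA adapter via WorkerDispatch, then resume; Client B is untouched.
            await training_client.save_weights_for_sampler_async()
```

Client B would run the same loop concurrently against its own LoRA adapter, and only its own save_weights_for_sampler call would pause its generations.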
