ray-less / rpc-less version for simpler debugging of smaller models #2202
Replies: 4 comments 9 replies
-
Precise memory control would also be easier with a pipeline where the actor, ref, and rollout are just objects in a single process, using the same PyTorch allocator and sharing a KV cache workspace (e.g. between rollout and ref), as mentioned by Unsloth in https://unsloth.ai/blog/grpo
-
A ray-less, single-process, sequential pipeline (which uses Python function calls instead of RPCs) would also serve as an up-to-date baseline, which is important for demonstrating the performance benefits, features, and scalability of the distributed versions.
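To make the idea concrete, here is a rough sketch of what such a baseline could look like: the actor, ref, and rollout are plain Python objects, and one training step is an ordinary sequence of function calls. Everything here (`ToyRollout`, `ToyRef`, `ToyActor`, `train_step`, and the arithmetic stand-ins for the models) is hypothetical and only illustrates the control flow; it is not this project's API.

```python
from typing import Callable, Dict, List


class ToyRollout:
    """Hypothetical stand-in for a generation engine."""

    def generate(self, prompts: List[str]) -> List[str]:
        # Fake "generation": append a different number of "!" to each prompt.
        return [p + "!" * (i + 1) for i, p in enumerate(prompts)]


class ToyRef:
    """Hypothetical stand-in for the frozen reference policy."""

    def log_prob(self, texts: List[str]) -> List[float]:
        return [-0.5 * len(t) for t in texts]


class ToyActor:
    """Hypothetical stand-in for the trainable policy."""

    def __init__(self) -> None:
        self.num_updates = 0

    def log_prob(self, texts: List[str]) -> List[float]:
        return [-0.25 * len(t) for t in texts]

    def update(self, texts: List[str], advantages: List[float]) -> None:
        # A real actor would run an optimizer step here.
        self.num_updates += 1


def train_step(
    actor: ToyActor,
    ref: ToyRef,
    rollout: ToyRollout,
    reward_fn: Callable[[str], float],
    prompts: List[str],
) -> Dict[str, float]:
    # Every stage is a direct, synchronous call in the same process.
    responses = rollout.generate(prompts)
    rewards = [reward_fn(r) for r in responses]
    mean_reward = sum(rewards) / len(rewards)
    advantages = [r - mean_reward for r in rewards]
    log_ratio = [
        a - b for a, b in zip(actor.log_prob(responses), ref.log_prob(responses))
    ]
    actor.update(responses, advantages)
    return {
        "mean_reward": mean_reward,
        "mean_log_ratio": sum(log_ratio) / len(log_ratio),
    }


if __name__ == "__main__":
    metrics = train_step(
        ToyActor(), ToyRef(), ToyRollout(), lambda r: float(len(r)), ["a", "bb"]
    )
    print(metrics)
```

Because each stage is a normal call, a stack trace or profiler run covers the whole step end to end, which is what would make it usable as a reference point for the distributed versions.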
-
rl2 and uvg are the simplest and most convenient sources I've found. I really like how simple the code is and how easy it is to customize.
-
Debugging distributed workers is hard, and even more so with Ray.
Is it possible at all to have some sidekick, synchronous version (so also meaning single-worker/single-process wrt FSDP) that fits on one GPU for interactive debugging? (Or a single-worker recipe where you can just drop in `breakpoint()`.) Ideally it would reuse most of the code used for the distributed version, but be executed in a single process / synchronously, so that interactive debugging is possible, even from a terminal.
Of course, it wouldn't be 100% faithful wrt the distributed aspect, but for small models it could be a helpful debugging test bed, where you can just insert `breakpoint()`.
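For illustration, a minimal sketch of the kind of thing I mean, assuming the worker code is only ever reached through a small handle object: a local handle can expose the same call interface as a remote one but run the method in-process, so `breakpoint()` lands in an ordinary Python stack. `ToyWorker`, `LocalHandle`, and the `TOY_DEBUG` environment variable are hypothetical names for this sketch, not anything that exists in the codebase.

```python
import os
from typing import Any, List


class ToyWorker:
    """Hypothetical worker class shared by both execution modes."""

    def compute_advantages(self, rewards: List[float]) -> List[float]:
        # Hypothetical opt-in flag; when set, execution pauses here in the debugger.
        if os.environ.get("TOY_DEBUG") == "1":
            breakpoint()
        mean = sum(rewards) / len(rewards)
        return [r - mean for r in rewards]


class LocalHandle:
    """Exposes a remote-style `call(method, ...)` interface but runs in-process."""

    def __init__(self, worker: Any) -> None:
        self._worker = worker

    def call(self, method: str, *args: Any, **kwargs: Any) -> Any:
        # A plain synchronous method call: no serialization and no scheduler,
        # so the debugger sees a single ordinary Python stack.
        return getattr(self._worker, method)(*args, **kwargs)


if __name__ == "__main__":
    handle = LocalHandle(ToyWorker())
    print(handle.call("compute_advantages", [1.0, 2.0, 3.0]))
```

Running this with `TOY_DEBUG=1` in a terminal should stop inside `compute_advantages`. The built-in `breakpoint()` goes through `sys.breakpointhook`, so the standard `PYTHONBREAKPOINT` environment variable can still be used to disable it or to swap in another debugger.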