feat: Variable DP rank #5666
base: main
👋 Hi agolajko! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough
The prefill router's bootstrap_room generation was refactored from random generation to a deterministic encoding scheme. A static AtomicU64 counter now accumulates base values, which are combined with dp_rank to compute bootstrap_room. The prefill decision pool size is now derived from the active router and added to the logging context.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Signed-off-by: agolajko <agoston.lajko@gmail.com>
Overview:
Encodes the prefill dp_rank into `bootstrap_room` so that SGLang can determine which prefill rank to use for KV cache retrieval during decode.
Details:
Problem
SGLang decode workers use `bootstrap_room % prefill_dp_size` (SGLang source) to identify which prefill worker holds their KV cache. Currently, Dynamo assigns bootstrap_room randomly, which doesn't encode the prefill worker information needed for this calculation.

This forces decode workers to be routed to the same rank as their corresponding prefill workers to ensure correct KV cache retrieval. As a result, prefill and decode must have matching DP sizes, preventing configurations like 4 prefill workers with 8 decode workers and limiting routing flexibility.
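
To make the modulus relationship concrete, here is a minimal sketch of one way a room id could be built so that `bootstrap_room % prefill_dp_size` recovers the prefill dp_rank. The names `ROOM_BASE`, `encode_bootstrap_room`, and `prefill_rank_for` are hypothetical and are not taken from the Dynamo or SGLang sources; the actual encoding in this PR may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical process-wide counter; each request takes a fresh base value
// so that room ids stay unique even when they target the same rank.
static ROOM_BASE: AtomicU64 = AtomicU64::new(0);

/// Builds a room id that satisfies `room % prefill_dp_size == dp_rank`.
/// Assumption: overflow of the u64 arithmetic is ignored in this sketch.
fn encode_bootstrap_room(dp_rank: u64, prefill_dp_size: u64) -> u64 {
    assert!(prefill_dp_size > 0, "prefill_dp_size must be non-zero");
    assert!(dp_rank < prefill_dp_size, "dp_rank must be below prefill_dp_size");
    let base = ROOM_BASE.fetch_add(1, Ordering::Relaxed);
    // base * size is a multiple of size, so adding dp_rank fixes the remainder.
    base * prefill_dp_size + dp_rank
}

/// Decode-side lookup as described in the problem statement above.
fn prefill_rank_for(bootstrap_room: u64, prefill_dp_size: u64) -> u64 {
    bootstrap_room % prefill_dp_size
}
```

With 4 prefill ranks, a room encoded for rank 2 maps back to rank 2 via `prefill_rank_for`, regardless of how many decode ranks exist. That independence from the decode DP size is what would allow configurations such as 4 prefill workers with 8 decode workers.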
Related Issues: