Hi, we have been seeing a larger divergence between sglang and transformers than between vllm and transformers, which is impacting on-policy RL stability. I wanted to see if anyone has been observing something similar. Here is sglang compared to vllm, with torch inductor disabled:
Without torch inductor, the mismatch appears similar.
I have tried: the triton attention backend with reduce in fp32, disabling the radix cache, disabling CUDA graphs, PyTorch sampling instead of flashinfer, and the torch native attention backend, without much difference.
Script: https://gist.github.com/rawsh/245b3ddd466911d744b2d1b9f409d21b
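Roughly, the comparison I mean looks like this (a minimal sketch, not the full script in the gist; the model name is a placeholder and the engine-side logprob fetch is left out, since it depends on how you query sglang/vllm):

```python
# Per-token logprobs recomputed with transformers vs. logprobs returned by the
# inference engine for the same token ids. Model name and `engine_logprobs`
# are placeholders for the actual setup under test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: the model under test
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
).eval()


def hf_logprobs(token_ids: list[int]) -> torch.Tensor:
    """Logprob of each token_ids[t] given token_ids[:t], from transformers."""
    ids = torch.tensor([token_ids], device=model.device)
    with torch.no_grad():
        logits = model(ids).logits.float()
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1).squeeze(0).cpu()


# token_ids = tok("rollout prompt + sampled response").input_ids
# engine_logprobs: tensor of the same length fetched from sglang or vllm for
# the same token ids (via the engine's prompt-logprob option); not shown here.
# diff = (hf_logprobs(token_ids) - engine_logprobs).abs()
# print(f"mean |dlogprob| = {diff.mean():.4e}, max = {diff.max():.4e}")
```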