[rl] Update callsite to init_batch_invariance to pass attention backend. #2176
base: main
Conversation
def get_vllm_attention_backend() -> AttentionBackendEnum:
    if os.getenv("VLLM_ATTENTION_BACKEND") is None:
        raise RuntimeError("VLLM_ATTENTION_BACKEND is not set.")
    return getattr(AttentionBackendEnum, os.getenv("VLLM_ATTENTION_BACKEND"))
The VLLM_ATTENTION_BACKEND environment variable has been deprecated by vllm-project/vllm#26315. This PR updates the batch invariant initialization accordingly.
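For illustration, a minimal sketch of the updated call site using the helper above; the exact wiring in simple_rl.py may differ:

# Read the backend name from the VLLM_ATTENTION_BACKEND env var via the helper
# above and pass it explicitly, since init_batch_invariance() now requires an
# attention backend argument.
init_batch_invariance(get_vllm_attention_backend())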
Why do we still use this env var?
Personally I think we should avoid env vars as much as possible, so I'd prefer not having to call
VLLM_BATCH_INVARIANT=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN python3 torchtitan/experiments/rl/unified/simple_rl_multiprocess.py
Wondering what error you are seeing if you don't set VLLM_ATTENTION_BACKEND=FLASH_ATTN and don't pass anything into init_batch_invariance?
@acisseJZhong the error we saw with init_batch_invariance():
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/simple_rl_multiprocess.py", line 25, in <module>
from torchtitan.experiments.rl.unified.actors.generator import Generator
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/actors/generator.py", line 18, in <module>
from torchtitan.experiments.rl.vllm_compat.simple_rl import (
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/vllm_compat/simple_rl.py", line 43, in <module>
init_batch_invariance()
TypeError: init_batch_invariance() missing 1 required positional argument: 'attention_backend'
and the error when calling init_batch_invariance(None):
Traceback (most recent call last):
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/simple_rl_multiprocess.py", line 25, in <module>
from torchtitan.experiments.rl.unified.actors.generator import Generator
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/actors/generator.py", line 18, in <module>
from torchtitan.experiments.rl.vllm_compat.simple_rl import (
File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/vllm_compat/simple_rl.py", line 43, in <module>
init_batch_invariance(None)
File "/data/users/zhxchen17/vllm/vllm/model_executor/layers/batch_invariant.py", line 1057, in init_batch_invariance
override_envs_for_invariance(attention_backend)
File "/data/users/zhxchen17/vllm/vllm/model_executor/layers/batch_invariant.py", line 1025, in override_envs_for_invariance
raise RuntimeError(error)
RuntimeError: VLLM batch_invariant mode requires an attention backend in ['FLASH_ATTN', 'FLASHINFER', 'FLASH_ATTN_MLA', 'TRITON_MLA'], but got 'None'. Please use --attention-backend or attention_config to set one of the supported backends before enabling batch_invariant.
@tianyu-l Right now, other than env vars, there seems to be no better way to inject configs into simple_rl_multiprocess.py.
Three other options without using env vars:
- Hardcode the FLASH_ATTN backend in the main script for now and make it clear it's hardcoded.
- Make simple_rl_multiprocess.py read a config file that contains the attention backend name.
- Make simple_rl_multiprocess.py accept a command-line arg, but this seems inconsistent with the main trainer script, so not considering it at the moment.
Option 1 seems better for now if we stick with flash attention for a while, as sketched below.
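For illustration, a minimal sketch of option 1 (hardcoding the backend). The init_batch_invariance import path is taken from the traceback above; the AttentionBackendEnum import path is an assumption and may differ across vLLM versions:

# Hypothetical hardcoded backend for simple_rl_multiprocess.py (option 1).
# NOTE: the AttentionBackendEnum import path below is assumed; verify it
# against your vLLM version.
from vllm.attention.backends.registry import AttentionBackendEnum  # assumed path
from vllm.model_executor.layers.batch_invariant import init_batch_invariance

# Hardcoded to FLASH_ATTN for now; make it clear in the script that this is a
# temporary, hardcoded choice.
init_batch_invariance(AttentionBackendEnum.FLASH_ATTN)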
2e155d8 to 3377c05
3377c05 to 834b405
Summary:
In vllm-project/vllm#26315 and vllm-project/vllm#30704, vLLM deprecated the
VLLM_ATTENTION_BACKEND environment variable, and init_batch_invariance() now takes a
required attention backend argument. This change updates the call sites accordingly.
Test Plan:
VLLM_BATCH_INVARIANT=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN python3 torchtitan/experiments/rl/unified/simple_rl_multiprocess.py
Reviewers:
Subscribers:
Tasks:
Tags: