
Conversation

@zhxchen17 (Contributor) commented on Dec 24, 2025

Summary:

In vllm-project/vllm#26315 and vllm-project/vllm#30704, vLLM deprecated the
VLLM_ATTENTION_BACKEND environment variable, and init_batch_invariance() now takes a
required attention-backend argument. This PR updates the call sites to match the
latest vLLM API.
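
For reference, a minimal sketch of what the new API expects; the init_batch_invariance import path is taken from the tracebacks further down in this thread, while the AttentionBackendEnum import path is an assumption:

import os

from vllm.attention.backends.registry import AttentionBackendEnum  # import path is an assumption
from vllm.model_executor.layers.batch_invariant import init_batch_invariance

# Previously the backend was picked up implicitly from VLLM_ATTENTION_BACKEND:
#     init_batch_invariance()
# Now one of FLASH_ATTN, FLASHINFER, FLASH_ATTN_MLA, TRITON_MLA must be passed
# explicitly (see the RuntimeError quoted in the review discussion below).
init_batch_invariance(
    getattr(AttentionBackendEnum, os.getenv("VLLM_ATTENTION_BACKEND", "FLASH_ATTN"))
)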

Test Plan:

VLLM_BATCH_INVARIANT=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN python3 torchtitan/experiments/rl/unified/simple_rl_multiprocess.py

Reviewers:

Subscribers:

Tasks:

Tags:

@meta-cla bot added the CLA Signed label on Dec 24, 2025
def get_vllm_attention_backend() -> AttentionBackendEnum:
    if os.getenv("VLLM_ATTENTION_BACKEND") is None:
        raise RuntimeError("VLLM_ATTENTION_BACKEND is not set.")
    return getattr(AttentionBackendEnum, os.getenv("VLLM_ATTENTION_BACKEND"))
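
Presumably this helper feeds the resolved backend into vLLM's batch-invariant setup at the call site; a minimal sketch of that wiring, with the init_batch_invariance import path taken from the tracebacks below:

from vllm.model_executor.layers.batch_invariant import init_batch_invariance

# Resolve the backend name from VLLM_ATTENTION_BACKEND and pass the enum member
# to vLLM instead of calling init_batch_invariance() with no argument.
init_batch_invariance(get_vllm_attention_backend())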
@tianyu-l (Contributor) commented:
vllm-project/vllm#30704 says:

The VLLM_ATTENTION_BACKEND environment variable has been deprecated by vllm-project/vllm#26315. This PR updates the batch invariant initialization accordingly.

Why do we still use this envvar?

Personally, I think we should avoid envvars as much as possible, so I'd prefer not having to call

VLLM_BATCH_INVARIANT=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN python3 torchtitan/experiments/rl/unified/simple_rl_multiprocess.py

@acisseJZhong (Contributor) commented:

Wondering what error you are seeing if you don't set VLLM_ATTENTION_BACKEND=FLASH_ATTN and don't pass anything into init_batch_invariance?

@zhxchen17 (Contributor, Author) replied:

@acisseJZhong the error we saw with init_batch_invariance():

  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/simple_rl_multiprocess.py", line 25, in <module>
    from torchtitan.experiments.rl.unified.actors.generator import Generator
  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/actors/generator.py", line 18, in <module>
    from torchtitan.experiments.rl.vllm_compat.simple_rl import (
  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/vllm_compat/simple_rl.py", line 43, in <module>
    init_batch_invariance()
TypeError: init_batch_invariance() missing 1 required positional argument: 'attention_backend'

and the error when calling init_batch_invariance(None):

Traceback (most recent call last):
  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/simple_rl_multiprocess.py", line 25, in <module>
    from torchtitan.experiments.rl.unified.actors.generator import Generator
  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/unified/actors/generator.py", line 18, in <module>
    from torchtitan.experiments.rl.vllm_compat.simple_rl import (
  File "/data/users/zhxchen17/torchtitan/torchtitan/experiments/rl/vllm_compat/simple_rl.py", line 43, in <module>
    init_batch_invariance(None)
  File "/data/users/zhxchen17/vllm/vllm/model_executor/layers/batch_invariant.py", line 1057, in init_batch_invariance
    override_envs_for_invariance(attention_backend)
  File "/data/users/zhxchen17/vllm/vllm/model_executor/layers/batch_invariant.py", line 1025, in override_envs_for_invariance
    raise RuntimeError(error)
RuntimeError: VLLM batch_invariant mode requires an attention backend in ['FLASH_ATTN', 'FLASHINFER', 'FLASH_ATTN_MLA', 'TRITON_MLA'], but got 'None'. Please use --attention-backend or attention_config to set one of the supported backends before enabling batch_invariant.

@zhxchen17 (Contributor, Author) replied:

@tianyu-l Right now, other than an envvar, there seems to be no better way to inject configs into simple_rl_multiprocess.py.
Three other options that avoid the envvar:

  1. Hardcode the FLASH_ATTN backend in the main script for now and make it clear that it's hardcoded.
  2. Make simple_rl_multiprocess.py read a config file that contains the attention backend name.
  3. Make simple_rl_multiprocess.py accept a command-line argument, but this seems inconsistent with the main trainer script, so I'm not considering it at the moment.

Option 1 seems better for now if we stick with flash attention for a while (see the sketch below).
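
A minimal sketch of option 1, assuming AttentionBackendEnum exposes a FLASH_ATTN member (as the snippet above and the supported-backend list in the error suggest); the enum import path here is a guess:

from vllm.attention.backends.registry import AttentionBackendEnum  # import path is an assumption
from vllm.model_executor.layers.batch_invariant import init_batch_invariance

# Option 1: hardcode the backend in the main script so no VLLM_ATTENTION_BACKEND
# env var is needed, and make the hardcoding explicit.
ATTENTION_BACKEND = AttentionBackendEnum.FLASH_ATTN  # hardcoded for now
init_batch_invariance(ATTENTION_BACKEND)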

@zhxchen17 force-pushed the zhxchen17/init_batch_invariance branch from 2e155d8 to 3377c05 on December 24, 2025 16:57
@zhxchen17 requested a review from tianyu-l on December 24, 2025 16:57
@zhxchen17 force-pushed the zhxchen17/init_batch_invariance branch from 3377c05 to 834b405 on December 24, 2025 17:16