[model, feature] qwen3-omni: add packed sequence support and shared sequence utilities#4304
Open
hbhflw2000 wants to merge 2 commits into
Signed-off-by: hbhflw2000 <417911774@qq.com>
Force-pushed from f0f95d8 to 042eed1.
What does this PR do?
Add Qwen3-Omni packed sequence training support and introduce shared raw sequence padding / packed-sequence metadata utilities for the Qwen3-Omni training path.
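As a rough illustration of what "raw sequence padding" means here, the sketch below rounds a batch's sequence length up to an alignment multiple and pads or truncates each row to that length. It uses plain Python lists so it runs without dependencies; the names `padded_target_length` and `pad_or_truncate`, the `multiple` parameter, and the pad value are hypothetical and are not taken from `training/utils/padding_utils.py`.

```python
from typing import List, Sequence


def padded_target_length(seq_len: int, multiple: int) -> int:
    """Round seq_len up to the nearest multiple (hypothetical helper).

    Assumption: the alignment multiple is supplied by the caller; in a real
    run it would depend on the parallelism configuration.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((seq_len + multiple - 1) // multiple) * multiple


def pad_or_truncate(rows: Sequence[Sequence[int]], target_len: int, pad_value: int) -> List[List[int]]:
    """Right-pad each row with pad_value, or cut it down to target_len."""
    return [
        list(row[:target_len]) + [pad_value] * max(0, target_len - len(row))
        for row in rows
    ]


# Two raw sequences of length 5, aligned up to a multiple of 4.
input_ids = [[11, 12, 13, 14, 15], [21, 22, 23, 24, 25]]
target = padded_target_length(len(input_ids[0]), multiple=4)
padded_ids = pad_or_truncate(input_ids, target, pad_value=0)
```

The same `target` would be applied to every per-token tensor in the batch (labels, loss mask, and so on), each with its own pad value, so that all of them stay aligned with `input_ids`.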
Changelog
- Add Qwen3-Omni `pack_sequences_in_batch=True` forward-step support.
- Keep the full `input_ids` available for model-internal mRoPE while slicing train tensors on CP ranks.
- Add shared raw sequence padding helpers in `training/utils/padding_utils.py`.
- Add shared uniform THD `PackedSeqParams` construction in `training/utils/packed_seq_utils.py`.
Design note / RFC
This implementation follows the existing Qwen3-VL packed-padding pattern: pad raw batch sequence tensors to an aligned dense length, build uniform THD `PackedSeqParams`, and keep model-specific multimodal / mRoPE handling inside the Qwen3-Omni step and model code.
This PR intentionally does not reuse `slice_batch_for_context_parallel` for Qwen3-Omni raw-batch padding. That utility operates after embedding preparation and slices `inputs_embeds`, while Qwen3-Omni needs pre-forward raw sequence normalization so the full `input_ids` tensor remains available for multimodal placeholder handling and mRoPE.
The shared abstraction here is intentionally narrow: compute the padded target sequence length, pad/truncate common raw batch tensors, and construct uniform THD `PackedSeqParams`. Model-specific logic such as multimodal merge, CP rank slicing, and mRoPE handling remains in Qwen3-Omni code.
ATTENTION: Qwen3-VL code is intentionally left unchanged in this PR. Applying these helpers back to Qwen3-VL can be considered separately with Qwen3-VL-specific regression coverage.
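One possible reading of "uniform THD" metadata is sketched below: every sequence in the batch occupies the same padded length, so the cumulative sequence boundaries are simply multiples of that length. `UniformPackedMeta` and `build_uniform_thd_meta` are hypothetical stand-ins written for this illustration; the field names are loosely modeled on `PackedSeqParams` but the class is not the real one, and plain lists are used in place of tensors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UniformPackedMeta:
    """Hypothetical stand-in for packed-sequence metadata (not the real class)."""
    qkv_format: str
    cu_seqlens_q: List[int]
    cu_seqlens_kv: List[int]
    max_seqlen_q: int
    max_seqlen_kv: int


def build_uniform_thd_meta(batch_size: int, padded_len: int) -> UniformPackedMeta:
    """Build boundaries for batch_size sequences that all have padded_len tokens."""
    if batch_size <= 0 or padded_len <= 0:
        raise ValueError("batch_size and padded_len must be positive")
    # Cumulative offsets: sequence i spans [i * padded_len, (i + 1) * padded_len).
    cu_seqlens = [i * padded_len for i in range(batch_size + 1)]
    return UniformPackedMeta(
        qkv_format="thd",
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_kv=list(cu_seqlens),
        max_seqlen_q=padded_len,
        max_seqlen_kv=padded_len,
    )


# Three sequences, each padded to 8 tokens, flattened into one 24-token THD layout.
meta = build_uniform_thd_meta(batch_size=3, padded_len=8)
```

Because the boundaries depend only on the batch size and the padded length, this part can plausibly be shared across models, while placeholder merging and mRoPE position computation stay model-specific.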
Validation
Unit tests: