cp: fix: resolve VLM CI failures for PP recipes and collate_fn (1799) into r0.4.0 #1889
Open
svcnvidia-nemo-ci wants to merge 2 commits into r0.4.0 from
Conversation
* fix: resolve VLM CI failures for PP recipes and collate_fn
  - Add drop_last, collate_fn, and num_workers to validation dataloaders for kimi25vl_medpix, qwen3_vl_moe_235b, and qwen3_5_moe_medpix PP recipes
  - Fix qwen3_5_moe_medpix model ID placeholder to Qwen/Qwen3.5-397B-A17B
  - Reduce qwen3_5_35b_neat_packing packed_sequence lengths from 4096 to 2048 to fix OOM
  - Make drop_overlong optional (default False) in default_collate_fn to prevent the batch-size reduction that crashes PP schedules
  - Add ffmpeg to Docker image for torchcodec audio/video decoding
  - Update max_length to 2048 across VLM PP recipes
  - Add unit tests for drop_overlong behavior
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove Dockerfile change from this PR
  The ffmpeg install belongs in a separate PR.
* fix: make kimi_k25_vl_collate_fn truncate instead of drop overlong samples
  Align kimi_k25_vl_collate_fn with default_collate_fn by adding a drop_overlong parameter (default False). With PP, dropping samples reduces the batch size below what the schedule expects, causing "Expecting N arg_mbs but got M" errors. Truncation preserves batch size while respecting max_length.
* fix: reduce kimi25vl max_length from 2048 to 1024
  Lower the sequence length to fit within the GPU memory budget for the pp_size=8, ep_size=32, local_batch_size=8 configuration.
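The drop_overlong semantics described above can be sketched in a few lines. This is a hypothetical illustration (plain Python lists, not the actual NeMo collate API): with pipeline parallelism every microbatch must arrive at its expected size, so the default (`drop_overlong=False`) truncates overlong samples rather than dropping them.

```python
# Illustrative sketch of the drop_overlong switch, not the real collate_fn.
def collate(samples, max_length, drop_overlong=False):
    """samples: list of token-id lists."""
    if drop_overlong:
        # Dropping shrinks the batch; a PP schedule expecting N microbatch
        # args then sees M < N and crashes ("Expecting N arg_mbs but got M").
        return [s for s in samples if len(s) <= max_length]
    # Truncation keeps the batch size stable for the PP schedule.
    return [s[:max_length] for s in samples]

batch = [[1, 2, 3], [4, 5, 6, 7, 8]]
print(len(collate(batch, max_length=4)))                       # 2
print(len(collate(batch, max_length=4, drop_overlong=True)))   # 1
```

With truncation the second sample loses a token but the batch size the PP schedule was configured for is preserved.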
* fix: delegate truncation to processor in kimi_k25_vl_collate_fn
  Address Claude review: instead of slicing input_ids after image token expansion (which breaks pixel_values/grid_thws consistency), pass truncation=True and max_length to the processor so it truncates text tokens before expansion. This keeps image data in sync with tokens.
* fix: add post-expansion truncation in kimi_k25_vl_collate_fn
  Address Claude review: image token expansion can push sequences past max_length even after processor truncation. Add post-expansion truncation for the default (non-drop) path to ensure consistent sequence lengths for torch.stack.
* fix: drop image data when truncation cuts into image token region
  Address Claude review: after post-expansion truncation, verify that all expected image tokens survived. If truncation cut into the image region, skip pixel_values/grid_thws for that sample to avoid a mismatch in the model forward pass.
* fix: replace orphaned image tokens and add truncation-into-image test
  Address Claude review: when truncation cuts into the image token region and image data is dropped, replace remaining media_token_id tokens with pad_token_id to prevent the model from looking for missing pixel data. Add a test covering this path.
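The post-expansion truncation logic above can be illustrated with a minimal sketch (the `IMG`/`PAD` ids and function name are made up, not the real kimi_k25_vl_collate_fn): after truncating to max_length, the sample keeps its pixel data only if every expected image token survived; otherwise the orphaned image tokens are padded out so the model never looks for missing pixels.

```python
# Illustrative sketch of truncation-into-image handling; ids are invented.
IMG, PAD = 9, 0

def truncate_expanded(ids, expected_img_tokens, max_length):
    ids = ids[:max_length]
    keep_image = ids.count(IMG) == expected_img_tokens
    if not keep_image:
        # Truncation cut into the image region: orphaned image tokens become
        # padding, and the caller skips pixel_values/grid_thws for the sample.
        ids = [PAD if t == IMG else t for t in ids]
    return ids, keep_image

# Image region survives intact:
print(truncate_expanded([1, IMG, IMG, 2, 3], 2, 4))   # ([1, 9, 9, 2], True)
# Truncation cuts the image region:
print(truncate_expanded([1, IMG, IMG, 2], 2, 2))      # ([1, 0], False)
```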
* test: assert truncation=True in default_collate_fn max_length test
* fix: inline vision processing in Qwen3VLMoeModel to fix image token mismatch
  Replace super().forward(input_ids=None) with inline vision feature processing. When input_ids=None, HF's get_placeholder_mask falls back to a bfloat16 embedding comparison which produces false positives, causing "Image features and image tokens do not match" on the 235B. Inline processing passes the original input_ids for reliable integer comparison. Also fix the video token ID (151652 -> 151656) and add a PP attention mask size guard to prevent MoE routing errors on non-Stage-0 ranks.
* fix: pass mm_token_type_ids to get_rope_index in inline vision path
  The inline vision processing was passing image_grid_thw as a positional arg where get_rope_index expects mm_token_type_ids, causing a shape mismatch on the 235B: "shape of the mask [2047] does not match [3]". Use keyword arguments to pass mm_token_type_ids from kwargs.
* fix: compute expected_image_tokens across all images in grid_thws
  Address Claude review: for multi-image samples, sum expected tokens across all images in grid_thws, not just the first one.
* fix: resolve linting errors in qwen3_vl_moe tests
  Fix ruff I001 (import sort), F401 (unused import), F841 (unused variable).
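The multi-image token count fix above amounts to summing over every (t, h, w) entry in grid_thws rather than reading only the first. A hedged sketch, assuming a spatial merge kernel of (2, 2) as the commits describe (the function name and formula are illustrative, not the repo's exact code):

```python
# Illustrative: expected placeholder count per sample, summed over ALL
# images in grid_thws, with each image's patch grid reduced by the
# (2, 2) spatial merge kernel of the vision tower.
def expected_image_tokens(grid_thws, merge_kernel=(2, 2)):
    kh, kw = merge_kernel
    return sum((t * h * w) // (kh * kw) for t, h, w in grid_thws)

# Two images: 1x4x4 -> 4 merged tokens, 1x2x2 -> 1 merged token.
print(expected_image_tokens([(1, 4, 4), (1, 2, 2)]))  # 5
```

Summing only the first image would report 4 here, so a valid two-image sample would be flagged as truncated-into-image.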
* style: auto-format collate_fns.py for ruff
* test: add CPU-runnable coverage for qwen3_vl_moe inline vision and PP guard
  Move the module-level CUDA skip marker onto the classes that actually instantiate the full model, then add two new CPU-runnable test classes (TestQwen3VLMoeModelInlineVisionCpu, TestQwen3VLMoeForConditionalGenerationPpGuardCpu) that invoke forward() as an unbound method with a MagicMock self. This lets codecov/patch see coverage of the new inline vision branches (single-modality, merged images+videos, text-only, dict attention mask) and the PP attention-mask guard / chunked pixel_values dispatch without requiring a GPU, which was blocking PR merge at 58% patch coverage.
* refactor: extract _DEFAULT_MERGE_KERNEL to avoid silent drift
  Replace the hardcoded (2, 2) merge kernel in kimi_k25_vl_collate_fn's post-expansion truncation check with a module-level _DEFAULT_MERGE_KERNEL constant, and reference the same constant from _expand_image_tokens' default. This ensures both computations stay in sync if the vision-tower merge kernel ever changes. Addresses Claude review feedback on commit 86c5535.
--------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
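The unbound-forward testing technique described above can be sketched as follows. This is a minimal stand-in (DemoModel is invented, not the real Qwen3VLMoeModel): calling `forward()` off the class with a MagicMock in place of `self` exercises the branch logic on CPU without ever constructing the full model.

```python
# Hedged sketch of CPU-only branch coverage via an unbound forward() call.
from unittest.mock import MagicMock

class DemoModel:
    def forward(self, input_ids, pixel_values=None):
        # Branchy dispatch logic we want coverage of, without GPU weights.
        if pixel_values is None:
            return self.text_only(input_ids)
        return self.with_vision(input_ids, pixel_values)

mock_self = MagicMock()
mock_self.text_only.return_value = "text-branch"

# Unbound call: the class attribute is a plain function, so mock_self
# stands in for the (never-instantiated) model.
out = DemoModel.forward(mock_self, input_ids=[1, 2, 3])
print(out)  # text-branch
mock_self.text_only.assert_called_once_with([1, 2, 3])
```

The same pattern lets a coverage tool see every branch of a forward() method whose real class would need a GPU to instantiate.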
Contributor (Author): /ok to test e313626
Contributor: /ok to test c60de72