
Fix Nano Nemotron VL regressions#38655

Open
netanel-haber wants to merge 4 commits into vllm-project:main from netanel-haber:bugfix/nano-nemotron-vl-avoid-vllm-config-deepcopy-and-get-hf-processor

Conversation

Contributor

@netanel-haber netanel-haber commented Mar 31, 2026

Fixes two recent Nano Nemotron VL regressions:

  1. Stop deep-copying VllmConfig in the mamba state helpers. Since #37467 ("[HMA] Move hybrid blksize to update_block_size_for_backend to fix attn supported block size is not 16 issue"), get_mamba_state_shape_from_config() runs during worker startup, and deep-copying the full config can now fail on live parameters with a BasevLLMParameter.__new__ / torch.nn.Parameter.__deepcopy__ mismatch.
  2. Avoid get_hf_processor() in metadata hot paths, which prevents the tokenizer RuntimeError: Already borrowed, likely exposed by #34789 ("[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop").
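
The failure mode in item 1 can be reproduced in miniature without torch. Below, Base and Sub are hypothetical stand-ins: Base mimics torch.nn.Parameter, whose __deepcopy__ rebuilds the object with Parameter's own constructor signature, and Sub mimics a subclass like BasevLLMParameter whose constructor takes an extra required argument:

```python
import copy

# Hypothetical stand-ins (no torch required). Base's __deepcopy__ rebuilds via
# type(self)(...) with Base's own signature, as torch.nn.Parameter does.
class Base:
    def __init__(self, data, flag=False):
        self.data, self.flag = data, flag

    def __deepcopy__(self, memo):
        # Parent rebuilds with *its* signature; a subclass with different
        # required arguments blows up here.
        return type(self)(copy.deepcopy(self.data, memo), self.flag)

class Sub(Base):
    def __init__(self, data, *, weight_loader):
        super().__init__(data)
        self.weight_loader = weight_loader

p = Sub([0.0, 0.0], weight_loader=lambda *a: None)
try:
    # Like deep-copying a config object that holds live parameters.
    copy.deepcopy({"live_param": p})
    failed = False
except TypeError:
    failed = True
print("deepcopy failed:", failed)  # → deepcopy failed: True
```

Any deepcopy of a container holding such a live parameter trips the same mismatch, which is why the fix avoids copying the config at all.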

Also adds Nano Nemotron VL smoke tests covering model init and dummy-weight generation only, skipping the HF output comparison.

Run smoke tests: python -m pytest -x tests/models/multimodal/generation/test_common.py -k nemotron_nano_vl_v2

Reproduction of smoke test doing its job:

  1. On 353ce77 (before the deepcopy fix), the smoke test fails with the __deepcopy__ mismatch signature described in item 1 above.
  2. The same smoke test passes at this PR branch head.
  3. The smoke test also catches the _merge_kwargs regression when e812bf7 is reverted on top of this PR branch, failing with AttributeError: 'NanoNemotronVLProcessor' object has no attribute '_merge_kwargs'.

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Mar 31, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the Nemotron Nano VL v2 model and refactors several multimodal components. Key changes include the addition of a run_hf flag in the VLM test utility to allow for smoketests that skip Hugging Face output comparison, and the implementation of sound_config for audio support in the Nemotron model executor. Additionally, video support logic was simplified across the Nemotron and Radio models, and the test registry was updated to use the official Hugging Face model ID for Nemotron Nano VL v2. I have no feedback to provide.

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Use `with_hf_config(text_config)` in mamba state helpers to avoid deepcopying
live parameters in `static_forward_context` so we don't get an error from a
`__new__` mismatch between `BasevLLMParameter` and `torch.nn.Parameter`.

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
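
The `with_hf_config(text_config)` approach from the commit message above can be sketched as follows; the class and field names here are illustrative, not vLLM's actual config classes:

```python
from dataclasses import dataclass, replace

# Illustrative sketch: rebuild the config shallowly, swapping only hf_config,
# so live objects (e.g. parameters in static_forward_context) are shared
# rather than deep-copied.
@dataclass
class ConfigSketch:
    hf_config: dict
    static_forward_context: dict  # may hold live nn.Parameters at worker startup

def with_hf_config(cfg: ConfigSketch, hf_config: dict) -> ConfigSketch:
    return replace(cfg, hf_config=hf_config)

base = ConfigSketch(
    hf_config={"model_type": "nemotron-vl", "text_config": {"model_type": "text"}},
    static_forward_context={"layers.0.mixer": object()},  # stand-in for live params
)
text_only = with_hf_config(base, base.hf_config["text_config"])

# The live context is shared, not copied, so __deepcopy__ is never invoked.
assert text_only.static_forward_context is base.static_forward_context
```

Because only the hf_config field is replaced, the mamba state helpers see the text config while every live object keeps its original identity.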
… prevent tokenizer `RuntimeError: Already borrowed`

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
@netanel-haber netanel-haber force-pushed the bugfix/nano-nemotron-vl-avoid-vllm-config-deepcopy-and-get-hf-processor branch from c2c96ce to a2e674a Compare March 31, 2026 20:29
@netanel-haber netanel-haber marked this pull request as ready for review March 31, 2026 20:54
Member

@tomeras91 tomeras91 left a comment


Overall looks good. Added a few small comments

Also, let's wait for @DarkLight1337's review for the smoke tests approach without running HF

 @cached_property
 def supports_video(self):
-    return self.get_hf_processor().supports_video
+    return True

Does this model really always support video?

 image_limit = {"image": None}
 video_limit = {"video": None} if self.supports_video else {}
-audio_limit = {"audio": None} if self.audio_extractor is not None else {}
+audio_limit = {"audio": None} if self.sound_config is not None else {}
Does it make more sense to have something like
audio_limit = {"audio": None} if self.supports_sound else {}?
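
The suggested single source of truth could look like this sketch (class and attribute names are illustrative, not the actual NanoNemotronVL processing classes):

```python
from functools import cached_property

# Illustrative only: one property answers "is audio supported?" everywhere,
# derived from config state with no get_hf_processor() call in hot paths.
class ProcessingInfoSketch:
    def __init__(self, sound_config=None):
        self.sound_config = sound_config

    @cached_property
    def supports_audio(self) -> bool:
        return self.sound_config is not None

info = ProcessingInfoSketch(sound_config={"sampling_rate": 16000})
audio_limit = {"audio": None} if info.supports_audio else {}
print(audio_limit)  # → {'audio': None}
```

Mirroring supports_video this way keeps every call site consistent and cheap.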

 target_channels = None
-if extractor := self.audio_extractor:
-    target_sr = extractor.sampling_rate
+if config := self.sound_config:

nit: This walrus syntax seems redundant here; the plain form reads better:

if self.sound_config:
   target_sr = self.sound_config.sampling_rate

@@ -226,20 +225,24 @@
def audio_extractor(self) -> ParakeetExtractor | None:

I see audio_extractor is only ever used to:

  1. check if it is not None. I guess this is equivalent to checking if the model supports audio? If so, that's another reason to have a supports_audio property
  2. access extractor.sampling_rate and extractor.audio_length(). I see sampling_rate is available in the sound_config. Is audio_length() also available from the config? If so, do we need the audio_extractor?

fields = self._get_image_fields_config(hf_inputs)
if self.info.supports_video:
    fields |= self._get_video_fields_config(hf_inputs)
if self.info.audio_extractor:

Why is it OK to access audio_extractor here? Doesn't it call get_hf_processor(), which may trigger the tokenizer RuntimeError?
In any case, I think it's best to use the same implementation of "is audio supported" across this file, as mentioned in the previous comments
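
The "Already borrowed" failure class stems from the Rust-backed fast tokenizer's exclusive borrow. A toy model (NOT the real `tokenizers` API) of why a call in a metadata hot path fails while another tokenizer op is in flight:

```python
import threading

# Toy stand-in for the Rust fast tokenizer's exclusive borrow: a second use
# while one is in flight raises the same RuntimeError class seen in practice.
class BorrowGuard:
    def __init__(self):
        self._lock = threading.Lock()

    def encode(self, text):
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("Already borrowed")
        try:
            return text.split()  # stand-in for tokenizer work
        finally:
            self._lock.release()

tok = BorrowGuard()
tok._lock.acquire()  # simulate a tokenizer op already running on another thread
try:
    tok.encode("metadata hot path call")  # e.g. reached via get_hf_processor()
    borrowed = False
except RuntimeError:
    borrowed = True
finally:
    tok._lock.release()
print("raised Already borrowed:", borrowed)  # → raised Already borrowed: True
```

This is why reading precomputed config state, instead of touching the processor (and its tokenizer) in hot paths, sidesteps the race entirely.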

 vllm_runner_kwargs: dict[str, Any] | None,
 hf_model_kwargs: dict[str, Any] | None,
 patch_hf_runner: Callable[[HfRunner], HfRunner] | None,
+run_hf: bool = True,

Let's just define a separate test file for the model rather than adding it to the common tests



3 participants