Fix Nano Nemotron VL regressions #38655

netanel-haber wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Code Review
This pull request introduces support for the Nemotron Nano VL v2 model and refactors several multimodal components. Key changes include the addition of a `run_hf` flag in the VLM test utility to allow for smoke tests that skip Hugging Face output comparison, and the implementation of `sound_config` for audio support in the Nemotron model executor. Additionally, video support logic was simplified across the Nemotron and Radio models, and the test registry was updated to use the official Hugging Face model ID for Nemotron Nano VL v2. I have no feedback to provide.
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Use `with_hf_config(text_config)` in mamba state helpers to avoid deepcopying live parameters in `static_forward_context` so we don't get an error from a `__new__` mismatch between `BasevLLMParameter` and `torch.nn.Parameter`. Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
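For context, here is a pure-Python analogy (not vLLM or PyTorch code; the class names are illustrative stand-ins) of why the deepcopy breaks: `torch.nn.Parameter.__deepcopy__` reconstructs the copy via `type(self)(...)` using `Parameter`'s own constructor signature, so a subclass such as `BasevLLMParameter` that requires extra metadata at construction fails:

```python
import copy


class ParameterLike:
    """Stand-in for torch.nn.Parameter (illustrative only)."""

    def __init__(self, data):
        self.data = data

    def __deepcopy__(self, memo):
        # Like Parameter.__deepcopy__: rebuilds the copy via type(self)(...),
        # assuming the base-class constructor signature.
        return type(self)(copy.deepcopy(self.data, memo))


class VLLMParameterLike(ParameterLike):
    """Stand-in for BasevLLMParameter: needs extra metadata to construct."""

    def __init__(self, data, weight_loader):
        super().__init__(data)
        self.weight_loader = weight_loader


param = VLLMParameterLike([1.0, 2.0], weight_loader=lambda *a: None)
try:
    # type(self)(...) is invoked with one argument, but the subclass needs two.
    copy.deepcopy(param)
except TypeError as exc:
    print(f"deepcopy failed: {exc}")
```

Copying only the text config via `with_hf_config(text_config)` sidesteps this by never deep-copying the live parameters at all.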
… prevent tokenizer `RuntimeError: Already borrowed` Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
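As background (an analogy, not tokenizers or vLLM code): HF fast tokenizers wrap a Rust object behind PyO3, which enforces RefCell-style exclusive-borrow rules, so overlapping access from concurrent call sites surfaces as `RuntimeError: Already borrowed`. A minimal Python sketch of that borrow discipline:

```python
from contextlib import contextmanager


class BorrowGuarded:
    """Sketch of PyO3's RefCell-style exclusive-borrow rule (illustrative only)."""

    def __init__(self):
        self._mutably_borrowed = False

    @contextmanager
    def borrow_mut(self):
        if self._mutably_borrowed:
            # The failure mode the commit avoids by not calling
            # get_hf_processor() in hot paths.
            raise RuntimeError("Already borrowed")
        self._mutably_borrowed = True
        try:
            yield self
        finally:
            self._mutably_borrowed = False


tok = BorrowGuarded()
with tok.borrow_mut():
    try:
        with tok.borrow_mut():  # overlapping borrow, as from a second caller
            pass
    except RuntimeError as exc:
        print(exc)  # Already borrowed
```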
Force-pushed c2c96ce to a2e674a (compare)
tomeras91 left a comment
Overall looks good. Added a few small comments.
Also, let's wait for @DarkLight1337's review of the smoke-tests approach without running HF.
```diff
 @cached_property
 def supports_video(self):
-    return self.get_hf_processor().supports_video
+    return True
```
Does this model really always support video?
```diff
 image_limit = {"image": None}
 video_limit = {"video": None} if self.supports_video else {}
-audio_limit = {"audio": None} if self.audio_extractor is not None else {}
+audio_limit = {"audio": None} if self.sound_config is not None else {}
```
Does it make more sense to have something like
`audio_limit = {"audio": None} if self.supports_sound else {}`?
```diff
 target_channels = None
-if extractor := self.audio_extractor:
-    target_sr = extractor.sampling_rate
+if config := self.sound_config:
```
nit: This syntax seems redundant:

```python
if self.sound_config:
    target_sr = self.sound_config.sampling_rate
```

```diff
@@ -226,20 +225,24 @@
     def audio_extractor(self) -> ParakeetExtractor | None:
```
I see `audio_extractor` is only ever used to:
- check if it is not None. I guess this is equivalent to checking if the model supports audio? If so, that's another reason to have a `supports_audio` property
- access `extractor.sampling_rate` and `extractor.audio_length()`. I see `sampling_rate` is available in the `sound_config`. Is `audio_length()` also available from the config? If so, do we need the `audio_extractor`?
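The reviewer's suggested consolidation could look something like this sketch (the class and attribute names below are assumptions for illustration, not the actual vLLM implementation):

```python
class ProcessingInfoSketch:
    """Hypothetical slice of the Nemotron processing-info class."""

    def __init__(self, sound_config=None):
        self.sound_config = sound_config

    @property
    def supports_audio(self) -> bool:
        # Single source of truth for "is audio supported", derived from the
        # config rather than from get_hf_processor() / audio_extractor, so
        # metadata paths never touch the tokenizer.
        return self.sound_config is not None


info = ProcessingInfoSketch(sound_config=None)
print(info.supports_audio)  # False
```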
```diff
 fields = self._get_image_fields_config(hf_inputs)
 if self.info.supports_video:
     fields |= self._get_video_fields_config(hf_inputs)
+if self.info.audio_extractor:
```
Why is it OK to access `audio_extractor` here? Doesn't it call `get_hf_processor()`, which may trigger the tokenizer `RuntimeError`?
In any case, I think it's best to have the same implementation of "is audio supported" all across this file, as mentioned in previous comments.
```diff
     vllm_runner_kwargs: dict[str, Any] | None,
     hf_model_kwargs: dict[str, Any] | None,
     patch_hf_runner: Callable[[HfRunner], HfRunner] | None,
+    run_hf: bool = True,
```
Let's just define a separate test file for the model rather than adding it to the common tests
Fixes two recent Nano Nemotron VL regressions:

- Avoid deep-copying the full `VllmConfig` in the mamba state helpers; use `with_hf_config(text_config)` instead. Since #37467 ([HMA] Move hybrid blksize to update_block_size_for_backend to fix attn supported block size is not 16 issue), `get_mamba_state_shape_from_config()` runs during worker startup, and deep-copying the full config can now fail on live parameters with a `BasevLLMParameter.__new__` / `torch.nn.Parameter.__deepcopy__` mismatch.
- Avoid calling `get_hf_processor()` in metadata hot paths, which prevents the tokenizer `RuntimeError: Already borrowed`, likely exposed by #34789 ([Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop).

Also adds `nano-nemotron-vl` smoke tests for init and dummy-weight generation only, skipping HF comparison.

Run smoke tests:

```
python -m pytest -x tests/models/multimodal/generation/test_common.py -k nemotron_nano_vl_v2
```

Reproduction of the smoke test doing its job: reverting e812bf7 on top of this PR branch reproduces the `_merge_kwargs` regression, failing with `AttributeError: 'NanoNemotronVLProcessor' object has no attribute '_merge_kwargs'`.
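The `run_hf` flag described above can be wired roughly like this sketch (stubbed runners and a simplified signature; the real helper in the common VLM test utility takes many more parameters):

```python
def run_multimodal_test(run_vllm, run_hf_model, *, run_hf: bool = True):
    """Sketch: run vLLM; optionally also run HF and compare outputs.

    With run_hf=False the call becomes a smoke test: the model must
    initialize and generate, but no HF comparison is performed.
    """
    vllm_outputs = run_vllm()
    if not run_hf:
        return vllm_outputs  # smoke test: init + generation only
    hf_outputs = run_hf_model()
    assert vllm_outputs == hf_outputs, "vLLM and HF outputs diverge"
    return vllm_outputs


# Smoke-test mode: the HF runner (here a stub that would raise) is never invoked.
out = run_multimodal_test(lambda: ["hello"], lambda: 1 / 0, run_hf=False)
print(out)  # ['hello']
```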