Fix Nano Nemotron VL regressions #38655

netanel-haber wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Code Review
This pull request introduces support for the Nemotron Nano VL v2 model and refactors several multimodal components. Key changes include the addition of a `run_hf` flag in the VLM test utility to allow for smoke tests that skip Hugging Face output comparison, and the implementation of `sound_config` for audio support in the Nemotron model executor. Additionally, video support logic was simplified across the Nemotron and Radio models, and the test registry was updated to use the official Hugging Face model ID for Nemotron Nano VL v2. I have no feedback to provide.
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Use `with_hf_config(text_config)` in mamba state helpers to avoid deepcopying live parameters in `static_forward_context` so we don't get an error from a `__new__` mismatch between `BasevLLMParameter` and `torch.nn.Parameter`. Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
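For context, here is a pure-Python analogy (not vLLM or PyTorch code; the class names are illustrative stand-ins) of why the deepcopy breaks: `torch.nn.Parameter.__deepcopy__` reconstructs the copy via `type(self)(...)` using `Parameter`'s own constructor signature, so a subclass such as `BasevLLMParameter` that requires extra metadata at construction fails:

```python
import copy


class ParameterLike:
    """Stand-in for torch.nn.Parameter (illustrative only)."""

    def __init__(self, data):
        self.data = data

    def __deepcopy__(self, memo):
        # Like Parameter.__deepcopy__: rebuilds the copy via type(self)(...),
        # assuming the base-class constructor signature.
        return type(self)(copy.deepcopy(self.data, memo))


class VLLMParameterLike(ParameterLike):
    """Stand-in for BasevLLMParameter: needs extra metadata to construct."""

    def __init__(self, data, weight_loader):
        super().__init__(data)
        self.weight_loader = weight_loader


param = VLLMParameterLike([1.0, 2.0], weight_loader=lambda *a: None)
try:
    # type(self)(...) is invoked with one argument, but the subclass needs two.
    copy.deepcopy(param)
except TypeError as exc:
    print(f"deepcopy failed: {exc}")
```

Copying only the text config via `with_hf_config(text_config)` sidesteps this by never deep-copying the live parameters at all.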
… prevent tokenizer `RuntimeError: Already borrowed` Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
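As background (an analogy, not tokenizers or vLLM code): HF fast tokenizers wrap a Rust object behind PyO3, which enforces RefCell-style exclusive-borrow rules, so overlapping access from concurrent call sites surfaces as `RuntimeError: Already borrowed`. A minimal Python sketch of that borrow discipline:

```python
from contextlib import contextmanager


class BorrowGuarded:
    """Sketch of PyO3's RefCell-style exclusive-borrow rule (illustrative only)."""

    def __init__(self):
        self._mutably_borrowed = False

    @contextmanager
    def borrow_mut(self):
        if self._mutably_borrowed:
            # The failure mode the commit avoids by not calling
            # get_hf_processor() in hot paths.
            raise RuntimeError("Already borrowed")
        self._mutably_borrowed = True
        try:
            yield self
        finally:
            self._mutably_borrowed = False


tok = BorrowGuarded()
with tok.borrow_mut():
    try:
        with tok.borrow_mut():  # overlapping borrow, as from a second caller
            pass
    except RuntimeError as exc:
        print(exc)  # Already borrowed
```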
Force-pushed c2c96ce to a2e674a (compare)
tomeras91 left a comment
Overall looks good. Added a few small comments.
Also, let's wait for @DarkLight1337's review of the smoke-tests approach without running HF.
```diff
 @cached_property
 def supports_video(self):
-    return self.get_hf_processor().supports_video
+    return True
```
Does this model really always support video?
```diff
 image_limit = {"image": None}
 video_limit = {"video": None} if self.supports_video else {}
-audio_limit = {"audio": None} if self.audio_extractor is not None else {}
+audio_limit = {"audio": None} if self.sound_config is not None else {}
```
Does it make more sense to have something like
`audio_limit = {"audio": None} if self.supports_sound else {}`?
```diff
 target_channels = None
-if extractor := self.audio_extractor:
-    target_sr = extractor.sampling_rate
+if config := self.sound_config:
```
nit: This syntax seems redundant:

```python
if self.sound_config:
    target_sr = self.sound_config.sampling_rate
```

```diff
@@ -226,20 +225,24 @@
     def audio_extractor(self) -> ParakeetExtractor | None:
```
I see `audio_extractor` is only ever used to:
- check if it is not None. I guess this is equivalent to checking if the model supports audio? If so, that's another reason to have a `supports_audio` property
- access `extractor.sampling_rate` and `extractor.audio_length()`. I see `sampling_rate` is available in the `sound_config`. Is `audio_length()` also available from the config? If so, do we need the `audio_extractor`?
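The reviewer's suggested consolidation could look something like this sketch (the class and attribute names below are assumptions for illustration, not the actual vLLM implementation):

```python
class ProcessingInfoSketch:
    """Hypothetical slice of the Nemotron processing-info class."""

    def __init__(self, sound_config=None):
        self.sound_config = sound_config

    @property
    def supports_audio(self) -> bool:
        # Single source of truth for "is audio supported", derived from the
        # config rather than from get_hf_processor() / audio_extractor, so
        # metadata paths never touch the tokenizer.
        return self.sound_config is not None


info = ProcessingInfoSketch(sound_config=None)
print(info.supports_audio)  # False
```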
```diff
 fields = self._get_image_fields_config(hf_inputs)
 if self.info.supports_video:
     fields |= self._get_video_fields_config(hf_inputs)
+if self.info.audio_extractor:
```
Why is it OK to access `audio_extractor` here? Doesn't it call `get_hf_processor()`, which may trigger the tokenizer `RuntimeError`?
In any case, I think it's best to have the same implementation of "is audio supported" all across this file, as mentioned in previous comments.
```diff
     vllm_runner_kwargs: dict[str, Any] | None,
     hf_model_kwargs: dict[str, Any] | None,
     patch_hf_runner: Callable[[HfRunner], HfRunner] | None,
+    run_hf: bool = True,
```
Let's just define a separate test file for the model rather than adding it to the common tests
Fixes two recent Nano Nemotron VL regressions:

- Avoid deep-copying the full `VllmConfig` in the mamba state helpers; use `with_hf_config(text_config)` instead. Since #37467 ([HMA] Move hybrid blksize to update_block_size_for_backend to fix attn supported block size is not 16 issue), `get_mamba_state_shape_from_config()` runs during worker startup, and deep-copying the full config can now fail on live parameters with a `BasevLLMParameter.__new__` / `torch.nn.Parameter.__deepcopy__` mismatch.
- Avoid calling `get_hf_processor()` in metadata hot paths, which prevents the tokenizer `RuntimeError: Already borrowed`, likely exposed by #34789 ([Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop).

Also adds `nano-nemotron-vl` smoke tests for init and dummy-weight generation only, skipping HF comparison.

Run smoke tests:

```
python -m pytest -x tests/models/multimodal/generation/test_common.py -k nemotron_nano_vl_v2
```

Reproduction of the smoke test doing its job: reverting e812bf7 on top of this PR branch reproduces the `_merge_kwargs` regression, failing with `AttributeError: 'NanoNemotronVLProcessor' object has no attribute '_merge_kwargs'`.
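The `run_hf` flag described above can be wired roughly like this sketch (stubbed runners and a simplified signature; the real helper in the common VLM test utility takes many more parameters):

```python
def run_multimodal_test(run_vllm, run_hf_model, *, run_hf: bool = True):
    """Sketch: run vLLM; optionally also run HF and compare outputs.

    With run_hf=False the call becomes a smoke test: the model must
    initialize and generate, but no HF comparison is performed.
    """
    vllm_outputs = run_vllm()
    if not run_hf:
        return vllm_outputs  # smoke test: init + generation only
    hf_outputs = run_hf_model()
    assert vllm_outputs == hf_outputs, "vLLM and HF outputs diverge"
    return vllm_outputs


# Smoke-test mode: the HF runner (here a stub that would raise) is never invoked.
out = run_multimodal_test(lambda: ["hello"], lambda: 1 / 0, run_hf=False)
print(out)  # ['hello']
```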