Describe the bug
There is inconsistent documentation and missing implementation support for LLaMA-Nemotron-VL within the NVIDIA NeMo framework. The current official documentation references model setup, configuration, and inference examples that are either incomplete, outdated, or unsupported in the actual codebase. Additionally, critical pull requests (such as PR #13819) that were intended to address this issue remain unmerged, leaving the LLaMA-Nemotron-VL pipeline in a partially functional state.
This leads to confusion for users attempting to load or finetune the model, as the documented components (e.g., model config paths, tokenizer references, and visual encoder integration) do not align with the available code in the latest release of NeMo.
Steps/Code to reproduce bug
- Follow the setup as per the official LLaMA-Nemotron-VL documentation.
- Attempt to initialize the model as shown:

```python
from nemo.collections.multimodal.models.vlms import LlamaNemotronVLModel

model = LlamaNemotronVLModel.from_pretrained("nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1")
```
- Observe that the import either fails or the model cannot be loaded due to missing components in the `vlm` module or missing configuration entries.
- Attempting to run training/inference scripts from the documentation results in attribute errors or unresolved references (e.g., missing `visual_backbone` config keys).
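The failing import above can be narrowed down with a small diagnostic. This is a sketch, not part of NeMo: the module and class names are taken verbatim from the issue, and the helper `check_nemo_vlm` is hypothetical.

```python
import importlib

def check_nemo_vlm(module_name="nemo.collections.multimodal.models.vlms",
                   class_name="LlamaNemotronVLModel"):
    """Report whether the documented VLM class is importable in this install."""
    try:
        mod = importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # The whole submodule is absent from the installed build.
        return f"import failed: {err}"
    if not hasattr(mod, class_name):
        # The submodule exists but the documented class does not.
        return f"module found, but {class_name} is missing"
    return f"{class_name} is available"

print(check_nemo_vlm())
```

On the releases described in this report, the first or second branch fires, which distinguishes a missing package layout from a missing class definition.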
Expected behavior
The LLaMA-Nemotron-VL model should be fully supported in NeMo, with:
- Corresponding class definitions and configuration files in the `vlm` submodule.
- A reproducible pipeline for inference and finetuning as described in the documentation.
- A validated pretrained model checkpoint loadable through the `from_pretrained()` interface.
- Updated examples aligned with the current repository structure and dependencies.
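For illustration, here is a hypothetical sketch of the kind of config section the documentation implies. Only the `visual_backbone` key name comes from the errors reported above; every other key and value is an assumption, since no working config ships with the release.

```yaml
# Hypothetical sketch only: the visual_backbone key is taken from the
# reported errors; the rest of the schema is assumed, not confirmed.
model:
  name: llama_nemotron_vl
  restore_from_path: nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
  visual_backbone:
    # the config section the documented scripts fail to find
    type: vision_transformer
    freeze: true
```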
Environment overview (please complete the following information)
- Environment location: Cloud (Azure VM)
- Method of NeMo install: `pip install nemo_toolkit['all']`
- Additional attempts: installed from source (latest main branch, as of Nov 2025)
- PR reference: Llama Nemotron VL #13819
Environment details
- OS version: Ubuntu 22.04
- PyTorch version: 2.4.1
- Python version: 3.10.14
- CUDA version: 12.2
- GPU model: NVIDIA A100 80GB
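The environment details above can be collected reproducibly with a short script. This is a generic sketch (the `env_report` helper is not part of NeMo); it degrades gracefully when PyTorch is absent.

```python
import platform
import sys

def env_report():
    """Gather the environment details requested by the issue template."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    try:
        import torch
        lines.append(f"PyTorch: {torch.__version__}")
        lines.append(f"CUDA available: {torch.cuda.is_available()}")
        if torch.cuda.is_available():
            lines.append(f"GPU model: {torch.cuda.get_device_name(0)}")
    except ImportError:
        lines.append("PyTorch: not installed")
    return "\n".join(lines)

print(env_report())
```

Attaching this output to the issue makes it easier to confirm whether the failure is version-specific.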
Additional context
- PR #13819 appears to introduce partial support for LLaMA-Nemotron-VL but has not been merged, leaving users unable to replicate the documented workflows.
- The documentation page still lists configuration options and usage patterns referencing unimplemented modules.
- Several users have reported similar issues on GitHub Discussions, but no stable release or example notebook currently demonstrates a working multimodal inference setup for this model.
This gap between the documentation and repository codebase severely limits reproducibility and adoption of LLaMA-Nemotron-VL within the NeMo ecosystem.