
Inconsistent documentation and lack of support for LLaMA-Nemotron-VL #15023

@adithya-s-k


Describe the bug

There is inconsistent documentation and missing implementation support for LLaMA-Nemotron-VL within the NVIDIA NeMo framework. The current official documentation references model setup, configuration, and inference examples that are either incomplete, outdated, or unsupported in the actual codebase. Additionally, critical pull requests (such as PR #13819) that were intended to address this issue remain unmerged, leaving the LLaMA-Nemotron-VL pipeline in a partially functional state.

This leads to confusion for users attempting to load or finetune the model, as the documented components (e.g., model config paths, tokenizer references, and visual encoder integration) do not align with the available code in the latest release of NeMo.


Steps/Code to reproduce bug

  1. Follow the setup as per the official LLaMA-Nemotron-VL documentation.
  2. Attempt to initialize the model as shown:
    from nemo.collections.multimodal.models.vlms import LlamaNemotronVLModel
    model = LlamaNemotronVLModel.from_pretrained("nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1")
  3. Observe that the import either fails or the model cannot be loaded due to missing components in the vlm module or missing configuration entries.
  4. Attempting to run training/inference scripts from the documentation results in attribute errors or unresolved references (e.g., missing visual_backbone config keys).
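Before debugging further, it can help to confirm whether the documented import path resolves at all in the installed NeMo release. A minimal diagnostic sketch (the dotted path below is the one the documentation cites; whether it actually exists is exactly what this issue is about):

```python
import importlib.util

def module_available(dotted_path: str) -> bool:
    """Return True if the dotted module path can be resolved.

    find_spec() imports parent packages along the way, so a missing
    parent raises ModuleNotFoundError; treat that as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_path) is not None
    except ModuleNotFoundError:
        return False

# Import path as cited in the LLaMA-Nemotron-VL documentation (assumed,
# since the point of this check is that it may not exist in the release):
documented = "nemo.collections.multimodal.models.vlms"
status = "found" if module_available(documented) else "MISSING"
print(f"{documented}: {status}")
```

Running this against the pip-installed `nemo_toolkit['all']` versus a source checkout of `main` makes it easy to state precisely which install is missing the module.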

Expected behavior

The LLaMA-Nemotron-VL model should be fully supported in NeMo, with:

  • Corresponding class definitions and configuration files in the vlm submodule.
  • A reproducible pipeline for inference and finetuning as described in the documentation.
  • A validated pretrained model checkpoint loadable through the from_pretrained() interface.
  • Updated examples aligned with the current repository structure and dependencies.
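The checklist above could be captured as a small smoke test that fails loudly when the documented surface is absent. The module and class names below are taken from the documented (and possibly unimplemented) API, so treat them as assumptions rather than a verified interface:

```python
import importlib

# Names the documentation cites; assumptions about what a complete
# implementation would expose, not verified API.
CHECKS = [
    ("nemo.collections.multimodal.models.vlms", "LlamaNemotronVLModel"),
]

def class_exposed(module_path: str, class_name: str) -> tuple[bool, str]:
    """Check that the module imports and exposes the named class with a
    callable from_pretrained, reporting the first failure found."""
    try:
        mod = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        return False, f"module missing: {exc}"
    cls = getattr(mod, class_name, None)
    if cls is None:
        return False, f"{class_name} not exposed by {module_path}"
    if not callable(getattr(cls, "from_pretrained", None)):
        return False, f"{class_name}.from_pretrained not callable"
    return True, "ok"

for mod_path, cls_name in CHECKS:
    ok, detail = class_exposed(mod_path, cls_name)
    print(f"{'PASS' if ok else 'FAIL'} {mod_path}.{cls_name}: {detail}")
```

A check like this in CI would keep the documentation and the repository from drifting apart again after support lands.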

Environment overview (please complete the following information)

  • Environment location: Cloud (Azure VM)
  • Method of NeMo install: pip install nemo_toolkit['all']
  • Additional attempts: Installed from source (latest main branch, as of Nov 2025)
  • PR reference: Llama Nemotron VL #13819

Environment details

  • OS version: Ubuntu 22.04
  • PyTorch version: 2.4.1
  • Python version: 3.10.14
  • CUDA version: 12.2
  • GPU model: NVIDIA A100 80GB

Additional context

  • PR #13819 appears to introduce partial support for LLaMA-Nemotron-VL but has not been merged, leaving users unable to replicate the documented workflows.
  • The documentation page still lists configuration options and usage patterns referencing unimplemented modules.
  • Several users have reported similar issues on GitHub Discussions, but no stable release or example notebook currently demonstrates a working multimodal inference setup for this model.

This gap between the documentation and repository codebase severely limits reproducibility and adoption of LLaMA-Nemotron-VL within the NeMo ecosystem.
