Feature Request: Support NVIDIA Nemotron 3 Nano #1075

@msievers

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I wanted to benchmark the newly released NVIDIA Nemotron 3 Nano model, but it does not seem to be supported by ik_llama.cpp.

I'm using the following version of ik_llama.cpp

./build/bin/llama-server --version
version: 4072 (21fc9322)
built with gcc (GCC) 15.2.0 for x86_64-pc-linux-gnu

and this quantisation from Unsloth

https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF?show_file_info=Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf

The Unsloth guide is located at

https://docs.unsloth.ai/models/nemotron-3

When trying to run llama-bench, it fails with the following error:

./build/bin/llama-bench -v -m ~/Downloads/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf

... verbose output shortened ...

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'nemotron_h_moe'
llama_load_model_from_file: failed to load model

To be honest, I have no idea how much effort is required to support this model, and if the effort is too high, it might not make sense to invest work here. On the other hand, if it turns out to be easy, it would be interesting to be able to compare the performance with llama.cpp CPU-only.

Motivation

NVIDIA Nemotron 3 Nano looks promising, at least on paper: NVIDIA claims it is noticeably faster on the GPU than comparable MoE models such as Qwen3-30B-A3B. This would make it an interesting option as a fast MoE model for CPU-only or mixed CPU/GPU inference on weak hardware.

Possible Implementation

No response

Labels: enhancement