Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
I wanted to benchmark the newly released NVIDIA Nemotron 3 Nano model, but it does not seem to be supported by ik_llama.cpp.
I'm using the following version of ik_llama.cpp:
./build/bin/llama-server --version
version: 4072 (21fc9322)
built with gcc (GCC) 15.2.0 for x86_64-pc-linux-gnu
and this quantisation from Unsloth.
The Unsloth guide is located at
https://docs.unsloth.ai/models/nemotron-3
When trying to run llama-bench, it exits with an error:
./build/bin/llama-bench -v -m ~/Downloads/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf
... verbose output shortened ...
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'nemotron_h_moe'
llama_load_model_from_file: failed to load model
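For reference, the architecture string that the loader rejects here is the `general.architecture` key in the GGUF metadata, which can be checked without llama.cpp at all. The following is a minimal sketch using only the Python standard library; it assumes the GGUF v3 header layout and handles only string-typed metadata values (real files contain many other value types, but `general.architecture` is conventionally the first key):

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # GGUF metadata value type for strings

def _gguf_str(s: bytes) -> bytes:
    # GGUF strings: uint64 length followed by raw bytes
    return struct.pack("<Q", len(s)) + s

def build_minimal_gguf(arch: bytes) -> bytes:
    # Hypothetical helper for demonstration: a GGUF v3 header with no
    # tensors and a single metadata key/value pair.
    buf = GGUF_MAGIC + struct.pack("<IQQ", 3, 0, 1)  # version, tensors, kv count
    buf += _gguf_str(b"general.architecture")
    buf += struct.pack("<I", GGUF_TYPE_STRING)
    buf += _gguf_str(arch)
    return buf

def read_architecture(data: bytes) -> str:
    # Parse the header and scan metadata for general.architecture.
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    off = 24  # 4 magic + 4 version + 8 tensor count + 8 kv count
    for _ in range(n_kv):
        klen, = struct.unpack_from("<Q", data, off); off += 8
        key = data[off:off + klen]; off += klen
        vtype, = struct.unpack_from("<I", data, off); off += 4
        if vtype != GGUF_TYPE_STRING:
            raise ValueError("only string values handled in this sketch")
        vlen, = struct.unpack_from("<Q", data, off); off += 8
        val = data[off:off + vlen]; off += vlen
        if key == b"general.architecture":
            return val.decode()
    raise KeyError("general.architecture not found")

if __name__ == "__main__":
    data = build_minimal_gguf(b"nemotron_h_moe")
    print(read_architecture(data))  # nemotron_h_moe
```

Against a real model file, reading the first few kilobytes and passing them to `read_architecture` would confirm whether the quant really carries the `nemotron_h_moe` architecture string that the loader does not recognise.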
To be honest, I have no idea how much effort would be required to support this model; if the effort is too high, it might not make sense to invest work here. On the other hand, if it turns out to be easy, it would be interesting to compare CPU-only performance with llama.cpp.
Motivation
NVIDIA Nemotron 3 Nano looks promising, at least on paper: NVIDIA claims it is noticeably faster on the GPU than comparable MoE models such as Qwen3-30B-A3B. That would make it an interesting option as a fast MoE model for CPU-only or mixed CPU/GPU inference on weak hardware.
Possible Implementation
No response