Hi, thank you for the great work on TensorRT-Edge-LLM!
Detailed description of the requested feature
I'd like to request support for the Gemma 4 model family, particularly Gemma 4 E2B (2B parameters).
We currently run Qwen3-VL-2B via Edge-LLM on Jetson Orin NX. Edge-LLM's TensorRT optimization and vocabulary reduction work well with Qwen3-VL, but we find that Gemma 4 E2B produces better-quality output at the same 2B scale. For now we run Gemma 4 E2B through llama.cpp as a workaround, and would love to leverage TensorRT optimization for it.
Vocabulary reduction support would be especially valuable: Gemma 4's 262k vocab makes decoding heavily memory-bandwidth-bound on Orin NX. The vocab reduction feature was a significant speedup for Qwen3-VL, and we'd expect even larger gains for Gemma 4 given its much larger default vocab.
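For context, here is a rough back-of-envelope sketch of why the large vocab hurts decode on bandwidth-limited hardware. All numbers are illustrative assumptions (hidden size, dtype, and the reduced-vocab size are guesses, not measured values):

```python
# Rough estimate of LM-head weight traffic per decode step.
# Assumptions (not measured): hidden size ~2048 for a ~2B model,
# fp16 weights, and a hypothetical reduced vocabulary of 32k tokens.

BYTES_FP16 = 2
hidden_size = 2048          # assumed hidden dimension
full_vocab = 262_144        # Gemma-family default vocabulary size
reduced_vocab = 32_000      # hypothetical reduced vocabulary

def lm_head_bytes(vocab: int) -> int:
    """Weight bytes the final projection reads for one decode step."""
    return vocab * hidden_size * BYTES_FP16

full = lm_head_bytes(full_vocab)        # ~1.07 GB per step
reduced = lm_head_bytes(reduced_vocab)  # ~0.13 GB per step
print(f"full vocab:    {full / 1e9:.2f} GB/step")
print(f"reduced vocab: {reduced / 1e9:.2f} GB/step")
print(f"reduction:     {full / reduced:.1f}x less LM-head weight traffic")
```

Under these assumptions the LM head alone reads on the order of a gigabyte of weights per generated token, so cutting the vocabulary should translate fairly directly into decode throughput on Orin NX.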
I understand Gemma 3 support is currently the priority (#36). I also note that TensorRT-LLM (server-side) has begun merging Gemma 4 text support (NVIDIA/TensorRT-LLM#12808). Filing this to register edge-side interest for future planning.
Timeline
Nice to have. We have a working llama.cpp path in the meantime.
Describe alternatives you've considered
- llama.cpp with GGUF Q4_K_XL — currently in use. Works, but misses TensorRT optimization and vocabulary reduction.
- Qwen3-VL-2B via Edge-LLM — currently in use with good performance, but Gemma 4 E2B produces better quality output at the same parameter count.
Target hardware/use case
Jetson Orin NX 16GB, real-time VLM inference with image input.