Add support for InternLM2 model architecture #1958

Merged
kunal-vaishnavi merged 13 commits into microsoft:main from amdrajeevp1:add-internlm2-support on Feb 11, 2026


Conversation

@amdrajeevp1
Contributor

amdrajeevp1 commented Jan 30, 2026

Add InternLM2 Model Support

Adds full support for the InternLM2 model family (1.8B, 7B, etc.) to ONNX Runtime GenAI.

Changes

Core Implementation

  • New InternLM2Model builder (src/python/py/models/builders/internlm.py)
    • Extends LlamaModel with InternLM2-specific weight mapping
    • GQA support: 16 query heads, 8 KV heads (2:1 ratio)
    • Proper grouped QKV weight splitting for the GroupQueryAttention operator (see the sketch after this list)
  • Model registration (builder.py, __init__.py, model_type.h)
    • Maps InternLM2ForCausalLM → InternLM2Model
    • Adds "internlm2" to supported model types
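
A minimal sketch of that grouped split, assuming the fused wqkv layout in InternLM2 checkpoints (rows grouped per KV head as that group's query heads, then one K head, then one V head); the function and names below are illustrative, not the builder's actual code:

```
import torch

def split_wqkv(wqkv: torch.Tensor, num_heads: int, num_kv_heads: int, head_dim: int):
    """Split InternLM2's fused wqkv weight into separate Q, K, V projection weights.

    Assumes rows are grouped per KV head as [q_0 .. q_{g-1}, k, v], where
    g = num_heads // num_kv_heads (2 for InternLM2-1.8B: 16 Q heads, 8 KV heads).
    """
    group = num_heads // num_kv_heads
    hidden = wqkv.shape[-1]
    w = wqkv.view(num_kv_heads, group + 2, head_dim, hidden)
    q = w[:, :group].reshape(num_heads * head_dim, hidden)        # q_proj weight
    k = w[:, group].reshape(num_kv_heads * head_dim, hidden)      # k_proj weight
    v = w[:, group + 1].reshape(num_kv_heads * head_dim, hidden)  # v_proj weight
    return q, k, v
```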

Tokenizer Support

  • Upstream: Contributed InternLM2Tokenizer support to onnxruntime-extensions#1023 (merged)
  • Dependencies:
    • Updated cmake/deps.txt to onnxruntime-extensions commit 087953cd
    • Removed local patch in cmake/external/onnxruntime_external_deps.cmake
  • Fix: Set the correct model_max_length in tokenizer_config.json (prevents the invalid 1e30 placeholder value); a minimal sketch follows below
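
A sketch of that fix, using the approach settled on later in this PR (set the value on the tokenizer before save_pretrained instead of patching the JSON afterwards); the concrete context length here is illustrative:

```
from transformers import AutoTokenizer

context_length = 32768  # illustrative; the builder takes this from the model config

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-1_8b", trust_remote_code=True)
if tokenizer.model_max_length > context_length:  # transformers falls back to ~1e30 when unset
    tokenizer.model_max_length = context_length
tokenizer.save_pretrained("./internlm2-cpu-int4")  # tokenizer_config.json now gets a sane value
```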

Documentation

  • Updated README.md and src/python/py/models/README.md

Usage

Export

```
python -m onnxruntime_genai.models.builder \
  --model_name internlm/internlm2-1_8b \
  --output ./internlm2-cpu-int4 \
  --precision int4 \
  --execution_provider cpu
```

Inference

```
import onnxruntime_genai as og
model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
# ... standard generation code
```
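
The elided "standard generation code" could look like the loop below, following the usual onnxruntime-genai Python examples (API names such as GeneratorParams, append_tokens, and generate_next_token match recent releases and may differ in older ones):

```
import onnxruntime_genai as og

model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Write a short poem about compilers."))
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```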

Testing

  • ✅ InternLM2-1.8B INT4 CPU: export and inference
  • ✅ InternLM2-7B INT4 CPU: export tested
  • ✅ GQA weight splitting verified
  • ✅ Tokenizer recognition working

References

  • Model: https://huggingface.co/internlm/internlm2-1_8b
  • Upstream PR: microsoft/onnxruntime-extensions#1023

This commit adds support for exporting InternLM2 models to ONNX format.

Key changes:
- Add InternLM2Model class in src/python/py/models/builders/internlm.py
- Register InternLM2ForCausalLM architecture in builder.py
- Implement grouped/interleaved QKV weight splitting for GQA
- Map InternLM2-specific attribute names to base model equivalents
- Add documentation and example in examples/python/internlm2/

InternLM2 uses a Llama-based architecture with grouped query attention
and a unique grouped/interleaved QKV weight layout. The implementation
correctly handles this layout during weight extraction.

Tested with:
- InternLM2-1.8B (FP32, INT4 RTN, INT4 AWQ)
- Model generates coherent text and valid code

Model hub: https://huggingface.co/internlm/internlm2-1_8b
Paper: https://arxiv.org/abs/2403.17297
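
The "attribute names to base model equivalents" mapping mentioned in the commit message above could look roughly like this; the left-hand names follow the Hugging Face InternLM2 checkpoint layout, and the exact table in internlm.py may differ:

```
# Illustrative: InternLM2 checkpoint names -> Llama-style names expected by the
# base LlamaModel builder (the fused attention.wqkv weight is split separately).
INTERNLM2_TO_LLAMA = {
    "model.tok_embeddings": "model.embed_tokens",
    "attention.wo":         "self_attn.o_proj",
    "attention_norm":       "input_layernorm",
    "ffn_norm":             "post_attention_layernorm",
    "feed_forward.w1":      "mlp.gate_proj",
    "feed_forward.w3":      "mlp.up_proj",
    "feed_forward.w2":      "mlp.down_proj",
    "output":               "lm_head",
}
```
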
@kunal-vaishnavi
Contributor

Thanks for your contribution! Can you also make the following additions for InternLM in alphabetical order?

  • Add the model type that gets generated in the genai_config.json here:
    static constexpr std::array<std::string_view, 20> LLM = {"chatglm", "decoder", "ernie4_5", "gemma", "gemma2", "gemma3_text", "gpt2", "gptoss", "granite", "llama", "mistral", "nemotron", "olmo", "phi", "phimoe", "phi3", "phi3small", "qwen2", "qwen3", "smollm3"};
  • Add the model type to the model builder README
  • Add the model type to the repo's main README
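
With "internlm2" inserted in alphabetical order (and the array size bumped from 20 to 21), the line quoted in the first bullet above would become something like:

```
static constexpr std::array<std::string_view, 21> LLM = {"chatglm", "decoder", "ernie4_5", "gemma", "gemma2", "gemma3_text", "gpt2", "gptoss", "granite", "internlm2", "llama", "mistral", "nemotron", "olmo", "phi", "phimoe", "phi3", "phi3small", "qwen2", "qwen3", "smollm3"};
```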

- Add comprehensive MULTI_SIZE_SUPPORT.md documenting 1.8B/7B/20B compatibility
- Add export scripts for InternLM2-7B (Bash and PowerShell)
- Update README with model size comparison table
- Add hardware requirements and performance estimates
- Include GPU export examples for 7B model

The implementation is architecture-based and works with all InternLM2 sizes:
- Dynamically reads config parameters (heads, layers, dimensions)
- Adaptive weight splitting based on GQA ratios
- No hardcoded model sizes

Tested: InternLM2-1.8B
Compatible: InternLM2-7B, InternLM2-20B, all Chat variants
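
The "no hardcoded model sizes" point above amounts to deriving every split size from the source checkpoint's config; a minimal illustration (standard Hugging Face config.json field names, with the 1.8B values quoted earlier used as examples):

```
import json

with open("config.json") as f:             # config of the source HF checkpoint
    cfg = json.load(f)

num_heads = cfg["num_attention_heads"]     # e.g. 16 for InternLM2-1.8B
num_kv_heads = cfg["num_key_value_heads"]  # e.g. 8 for InternLM2-1.8B
head_dim = cfg["hidden_size"] // num_heads
group_size = num_heads // num_kv_heads     # query heads per KV head (2 above)
# split_wqkv(...) from the earlier sketch then works unchanged for 7B and 20B.
```
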
- Merge MULTI_SIZE_SUPPORT.md into README.md for single comprehensive guide
- Remove export_7b.ps1 and export_7b.sh scripts (examples already in README)
- Streamline documentation structure
- All export commands and multi-size information now in one place

- Add AMD copyright to builder.py (after Microsoft copyright)
- Add AMD copyright to builders/__init__.py (after Microsoft copyright)
- Update internlm.py with AMD copyright
- Add AMD copyright to README.md

https://huggingface.co/onnx-community/InternLM2-ONNX/

Signed-off-by: Rajeev Patwari <rajeevp@amd.com>
@amdrajeevp1
Contributor Author

> Thanks for your contribution! Can you also make the following additions for InternLM in alphabetical order?
>
> • Add the model type that gets generated in the genai_config.json here:
>   static constexpr std::array<std::string_view, 20> LLM = {"chatglm", "decoder", "ernie4_5", "gemma", "gemma2", "gemma3_text", "gpt2", "gptoss", "granite", "llama", "mistral", "nemotron", "olmo", "phi", "phimoe", "phi3", "phi3small", "qwen2", "qwen3", "smollm3"};
> • Add the model type to the model builder README
> • Add the model type to the repo's main README

Hi @kunal-vaishnavi - done!
Please review the PR. I have also uploaded the generated artifacts: https://huggingface.co/onnx-community/InternLM2-ONNX/discussions/1

@amdrajeevp1
Contributor Author

@microsoft-github-policy-service agree [company="AMD"]

@amdrajeevp1
Contributor Author

@microsoft-github-policy-service agree company="AMD"

- Python builder: InternLM2Model in builders/internlm.py with HF->base
  name mapping and grouped wqkv split for GQA; export type 'internlm2'
- builder.py: register InternLM2ForCausalLM -> InternLM2Model
- builders/__init__.py: export InternLM2Model
- C++ model_type.h: add 'internlm2' to LLM list
- cmake: patch onnxruntime-extensions for InternLM2Tokenizer (BPE)
- base.py: set tokenizer_config model_max_length to context_length
- READMEs: list InternLM2 in supported models

Co-authored-by: Cursor <cursoragent@cursor.com>

- Bump extensions commit to 087953cd (includes InternLM2Tokenizer support)
- Remove local patch now that support is upstream

Ref: microsoft/onnxruntime-extensions#1023
Set tokenizer.model_max_length directly on the tokenizer object before calling save_pretrained(), eliminating the need to reopen and modify tokenizer_config.json after saving.

Co-authored-by: Cursor <cursoragent@cursor.com>
kunal-vaishnavi enabled auto-merge (squash) February 11, 2026 04:10
Remove trailing space from AMD copyright line in model_type.h

Co-authored-by: Cursor <cursoragent@cursor.com>
auto-merge was automatically disabled February 11, 2026 04:22

Head branch was pushed to by a user without write access

kunal-vaishnavi enabled auto-merge (squash) February 11, 2026 04:43
kunal-vaishnavi merged commit a8fc81b into microsoft:main on Feb 11, 2026
15 of 18 checks passed
baijumeswani pushed a commit that referenced this pull request Feb 12, 2026