Add support for InternLM2 model architecture #1958
Merged
kunal-vaishnavi merged 13 commits into microsoft:main on Feb 11, 2026
Conversation
This commit adds support for exporting InternLM2 models to ONNX format.

Key changes:
- Add InternLM2Model class in src/python/py/models/builders/internlm.py
- Register InternLM2ForCausalLM architecture in builder.py
- Implement grouped/interleaved QKV weight splitting for GQA
- Map InternLM2-specific attribute names to base model equivalents
- Add documentation and example in examples/python/internlm2/

InternLM2 uses a Llama-based architecture with grouped query attention and a unique grouped/interleaved QKV weight layout (sketched below). The implementation correctly handles this layout during weight extraction.

Tested with:
- InternLM2-1.8B (FP32, INT4 RTN, INT4 AWQ)
- Model generates coherent text and valid code

Model hub: https://huggingface.co/internlm/internlm2-1_8b
Paper: https://arxiv.org/abs/2403.17297
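For readers new to InternLM2's checkpoint format, here is a minimal sketch of the grouped/interleaved wqkv split described above. It assumes the layout used by the Hugging Face `modeling_internlm2.py` (each KV group stores its query heads followed by one key head and one value head); the function name and exact reshape are illustrative, not the builder's actual code:

```python
import torch

def split_wqkv(wqkv: torch.Tensor, num_heads: int, num_kv_heads: int, head_dim: int):
    """Split InternLM2's packed wqkv weight into separate Q/K/V matrices
    suitable for the GroupQueryAttention operator (illustrative sketch)."""
    num_q_per_kv = num_heads // num_kv_heads
    # View as (kv groups, q heads per group + key + value, head_dim, hidden_size)
    w = wqkv.view(num_kv_heads, num_q_per_kv + 2, head_dim, -1)
    q = w[:, :num_q_per_kv].reshape(num_heads * head_dim, -1)   # query heads
    k = w[:, -2].reshape(num_kv_heads * head_dim, -1)           # one key head per group
    v = w[:, -1].reshape(num_kv_heads * head_dim, -1)           # one value head per group
    return q, k, v
```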
Contributor
Thanks for your contribution! Can you also make the following additions for InternLM in alphabetical order?
- Add comprehensive MULTI_SIZE_SUPPORT.md documenting 1.8B/7B/20B compatibility
- Add export scripts for InternLM2-7B (Bash and PowerShell)
- Update README with model size comparison table
- Add hardware requirements and performance estimates
- Include GPU export examples for 7B model

The implementation is architecture-based and works with all InternLM2 sizes:
- Dynamically reads config parameters (heads, layers, dimensions), as sketched below
- Adaptive weight splitting based on GQA ratios
- No hardcoded model sizes

Tested: InternLM2-1.8B
Compatible: InternLM2-7B, InternLM2-20B, all Chat variants
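As a hedged illustration of the size-agnostic approach, the relevant parameters can be read from each checkpoint's config.json rather than hardcoded. The attribute names below follow the published InternLM2 configs; the 1.8B values are examples, and other sizes simply yield different numbers:

```python
from transformers import AutoConfig

# InternLM2 configs require trust_remote_code; values in comments are for the 1.8B model.
config = AutoConfig.from_pretrained("internlm/internlm2-1_8b", trust_remote_code=True)

num_heads = config.num_attention_heads      # e.g. 16 for 1.8B
num_kv_heads = config.num_key_value_heads   # e.g. 8 for 1.8B
head_dim = config.hidden_size // num_heads
num_q_per_kv = num_heads // num_kv_heads    # GQA ratio (2:1 for 1.8B); drives the wqkv split
```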
- Merge MULTI_SIZE_SUPPORT.md into README.md for a single comprehensive guide
- Remove export_7b.ps1 and export_7b.sh scripts (examples already in README)
- Streamline documentation structure
- All export commands and multi-size information now in one place
- Add AMD copyright to builder.py (after Microsoft copyright)
- Add AMD copyright to builders/__init__.py (after Microsoft copyright)
- Update internlm.py with AMD copyright
- Add AMD copyright to README.md
https://huggingface.co/onnx-community/InternLM2-ONNX/

Signed-off-by: Rajeev Patwari <rajeevp@amd.com>
Contributor
Author
Hi @kunal-vaishnavi - done!
Contributor
Author
@microsoft-github-policy-service agree [company="AMD"]
Contributor
Author
@microsoft-github-policy-service agree company="AMD"
- Python builder: InternLM2Model in builders/internlm.py with HF->base name mapping and grouped wqkv split for GQA; export type 'internlm2' (see the sketch after this list)
- builder.py: register InternLM2ForCausalLM -> InternLM2Model
- builders/__init__.py: export InternLM2Model
- C++ model_type.h: add 'internlm2' to LLM list
- cmake: patch onnxruntime-extensions for InternLM2Tokenizer (BPE)
- base.py: set tokenizer_config model_max_length to context_length
- READMEs: list InternLM2 in supported models

Co-authored-by: Cursor <cursoragent@cursor.com>
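A rough sketch of the two Python-side pieces listed above: the architecture registration and the HF-to-base name mapping. The dictionary form is an assumption for illustration (the real dispatch in builder.py may be structured differently); the mapped module names follow InternLM2's published checkpoint layout and their Llama-style equivalents:

```python
# builder.py dispatch (illustrative): route the HF architecture string to the new builder.
ARCHITECTURE_TO_BUILDER = {
    "InternLM2ForCausalLM": "InternLM2Model",   # export type "internlm2"
}

# builders/internlm.py name mapping (illustrative): InternLM2 module names -> base model names.
INTERNLM2_TO_BASE = {
    "tok_embeddings": "embed_tokens",
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w3": "mlp.up_proj",
    "feed_forward.w2": "mlp.down_proj",
    "attention_norm": "input_layernorm",
    "ffn_norm": "post_attention_layernorm",
    "output": "lm_head",
    # attention.wqkv has no single base equivalent; it is split into q/k/v projections.
}
```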
- Bump extensions commit to 087953cd (includes InternLM2Tokenizer support)
- Remove local patch now that support is upstream

Ref: microsoft/onnxruntime-extensions#1023
Set tokenizer.model_max_length directly on the tokenizer object before calling save_pretrained(), eliminating the need to reopen and modify tokenizer_config.json after saving.

Co-authored-by: Cursor <cursoragent@cursor.com>
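A minimal sketch of that change, assuming a standard Hugging Face tokenizer and a context length read from the model config (the model ID and output directory are just examples):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "internlm/internlm2-1_8b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Set the real context length on the tokenizer object before saving, so
# tokenizer_config.json is written with a valid model_max_length instead of
# the huge default sentinel value.
tokenizer.model_max_length = config.max_position_embeddings
tokenizer.save_pretrained("./internlm2-cpu-int4")
```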
kunal-vaishnavi previously approved these changes on Feb 11, 2026
Remove trailing space from AMD copyright line in model_type.h

Co-authored-by: Cursor <cursoragent@cursor.com>
auto-merge was automatically disabled on February 11, 2026 04:22: head branch was pushed to by a user without write access
kunal-vaishnavi approved these changes on Feb 11, 2026
baijumeswani pushed a commit that referenced this pull request on Feb 12, 2026
# Add InternLM2 Model Support

Adds full support for the InternLM2 model family (1.8B, 7B, etc.) to ONNX Runtime GenAI.

## Changes

### Core Implementation
- **New InternLM2Model builder** (`src/python/py/models/builders/internlm.py`)
  - Extends LlamaModel with InternLM2-specific weight mapping
  - GQA support: 16 query heads, 8 KV heads (2:1 ratio)
  - Proper grouped QKV weight splitting for the GroupQueryAttention operator
- **Model registration** (`builder.py`, `__init__.py`, `model_type.h`)
  - Maps `InternLM2ForCausalLM` → `InternLM2Model`
  - Adds "internlm2" to supported model types

### Tokenizer Support
- **Upstream**: Contributed InternLM2Tokenizer support to [onnxruntime-extensions#1023](microsoft/onnxruntime-extensions#1023) (merged)
- **Dependencies**:
  - Updated `cmake/deps.txt` to onnxruntime-extensions commit `087953cd`
  - Removed local patch in `cmake/external/onnxruntime_external_deps.cmake`
- **Fix**: Set correct `model_max_length` in tokenizer_config.json (prevents 1e30 invalid values)

### Documentation
- Updated README.md and src/python/py/models/README.md

## Usage

Export:

```
python -m onnxruntime_genai.models.builder \
  --model_name internlm/internlm2-1_8b \
  --output ./internlm2-cpu-int4 \
  --precision int4 \
  --execution_provider cpu
```

Inference:

```
import onnxruntime_genai as og

model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
... standard generation code
```

## Testing
- ✅ InternLM2-1.8B INT4 CPU: export and inference
- ✅ InternLM2-7B INT4 CPU: export tested
- ✅ GQA weight splitting verified
- ✅ Tokenizer recognition working

## References
- Model: https://huggingface.co/internlm/internlm2-1_8b
- Upstream PR: microsoft/onnxruntime-extensions#1023

---------

Signed-off-by: Rajeev Patwari <rajeevp@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
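For completeness, a hedged sketch of the "standard generation code" elided in the Inference snippet above, using the onnxruntime_genai Python API (call names follow the library's published examples and may differ slightly across releases; the prompt and search options are illustrative):

```python
import onnxruntime_genai as og

model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Write a short poem about the ocean."))

# Decoding loop: produce one token at a time and stream the decoded text.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```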