Add InternLM2Tokenizer support (BPE tokenizer) #1023
Merged: baijumeswani merged 1 commit into microsoft:main on Feb 7, 2026.
Conversation
InternLM2 models use the same BPE/LLaMA tokenizer format as Llama. This registers `InternLM2Tokenizer` so that models exported with `tokenizer_class: InternLM2Tokenizer` in `tokenizer_config.json` are recognized at runtime. Ref: https://huggingface.co/internlm/internlm2-1_8b
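At runtime the lookup keys off the `tokenizer_class` field of the exported config. As a toy illustration of that behavior (the real map lives in C++ in `operators/tokenizer/tokenizer_jsconfig.hpp`; the Python helper and sample config below are hypothetical):

```python
import json

# Toy Python mirror of the class-name -> tokenizer-type lookup this PR
# enables. Only the InternLM2Tokenizer entry comes from the PR; the
# helper and everything else here are illustrative.
TOKENIZER_TYPE_MAP = {
    "InternLM2Tokenizer": "kBPE",  # the entry this PR adds
}

def resolve_tokenizer_type(config_text: str) -> str:
    """Return the tokenizer backend named by a tokenizer_config.json."""
    cls = json.loads(config_text)["tokenizer_class"]
    return TOKENIZER_TYPE_MAP[cls]  # unknown classes raise KeyError

print(resolve_tokenizer_type('{"tokenizer_class": "InternLM2Tokenizer"}'))  # kBPE
```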
baijumeswani approved these changes on Feb 7, 2026.
kunal-vaishnavi approved these changes on Feb 7, 2026.
amdrajeevp1 added a commit to amdrajeevp1/onnxruntime-genai that referenced this pull request on Feb 7, 2026:
- Bump extensions commit to 087953cd (includes InternLM2Tokenizer support)
- Remove local patch now that support is upstream

Ref: microsoft/onnxruntime-extensions#1023
kunal-vaishnavi pushed a commit to microsoft/onnxruntime-genai that referenced this pull request on Feb 11, 2026:
# Add InternLM2 Model Support

Adds full support for InternLM2 model family (1.8B, 7B, etc.) to ONNX Runtime GenAI.

## Changes

### Core Implementation

- **New InternLM2Model builder** (`src/python/py/models/builders/internlm.py`)
  - Extends LlamaModel with InternLM2-specific weight mapping
  - GQA support: 16 query heads, 8 KV heads (2:1 ratio)
  - Proper grouped QKV weight splitting for GroupQueryAttention operator
- **Model registration** (`builder.py`, `__init__.py`, `model_type.h`)
  - Maps `InternLM2ForCausalLM` → `InternLM2Model`
  - Adds "internlm2" to supported model types

### Tokenizer Support

- **Upstream**: Contributed InternLM2Tokenizer support to [onnxruntime-extensions#1023](microsoft/onnxruntime-extensions#1023) (merged)
- **Dependencies**:
  - Updated `cmake/deps.txt` to onnxruntime-extensions commit `087953cd`
  - Removed local patch in `cmake/external/onnxruntime_external_deps.cmake`
- **Fix**: Set correct `model_max_length` in tokenizer_config.json (prevents 1e30 invalid values)

### Documentation

- Updated README.md and src/python/py/models/README.md

## Usage

Export:

```
python -m onnxruntime_genai.models.builder \
  --model_name internlm/internlm2-1_8b \
  --output ./internlm2-cpu-int4 \
  --precision int4 \
  --execution_provider cpu
```

Inference:

```
import onnxruntime_genai as og
model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
... standard generation code
```

## Testing

- ✅ InternLM2-1.8B INT4 CPU: export and inference
- ✅ InternLM2-7B INT4 CPU: export tested
- ✅ GQA weight splitting verified
- ✅ Tokenizer recognition working

## References

- Model: https://huggingface.co/internlm/internlm2-1_8b
- Upstream PR: microsoft/onnxruntime-extensions#1023

---------

Signed-off-by: Rajeev Patwari <rajeevp@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
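The grouped QKV weight splitting mentioned in the commit message above can be made concrete with a short sketch. This is not the builder's actual code; it assumes InternLM2's packed `wqkv` layout, in which each KV group stores its `q_per_kv` Q heads followed by one K head and one V head, with shapes taken from InternLM2-1.8B:

```python
import numpy as np

# InternLM2-1.8B shapes: hidden=2048, 16 Q heads, 8 KV heads, head_dim=128,
# so q_per_kv = 16 // 8 = 2 and each KV group packs [2 Q, 1 K, 1 V] heads.
hidden, n_q, n_kv, head_dim = 2048, 16, 8, 128
q_per_kv = n_q // n_kv

# Dummy packed weight standing in for the checkpoint's wqkv matrix.
wqkv = np.random.randn(n_kv * (q_per_kv + 2) * head_dim, hidden).astype(np.float32)

# Reshape into (kv_groups, q_per_kv + 2, head_dim, hidden) and slice per group.
grouped = wqkv.reshape(n_kv, q_per_kv + 2, head_dim, hidden)
wq = grouped[:, :q_per_kv].reshape(n_q * head_dim, hidden)      # (2048, 2048)
wk = grouped[:, q_per_kv].reshape(n_kv * head_dim, hidden)      # (1024, 2048)
wv = grouped[:, q_per_kv + 1].reshape(n_kv * head_dim, hidden)  # (1024, 2048)
```

Q keeps 16 × 128 = 2048 output rows while K and V each keep 8 × 128 = 1024, the 2:1 GQA ratio the GroupQueryAttention operator expects.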
baijumeswani pushed a commit to microsoft/onnxruntime-genai that referenced this pull request on Feb 12, 2026 (same commit message as above).
Summary
InternLM2 models use the same BPE/LLaMA tokenizer format as Llama. This registers `InternLM2Tokenizer` so that models exported with `tokenizer_class: InternLM2Tokenizer` in `tokenizer_config.json` are recognized at runtime.

Changes

- Adds `{"InternLM2Tokenizer", TokenType::kBPE}` to the tokenizer type map in `operators/tokenizer/tokenizer_jsconfig.hpp`

Reference

- https://huggingface.co/internlm/internlm2-1_8b
Testing
Tested with InternLM2-1.8B model export and inference in onnxruntime-genai.
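The inference snippet in the commit message above elides the generation loop. A minimal sketch of what it typically looks like, following the onnxruntime-genai Python examples (method names may vary across releases; the prompt and search options are illustrative):

```python
import onnxruntime_genai as og

model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Illustrative search options and prompt.
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

# Generate and stream-decode one token at a time until done.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```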