
Add InternLM2Tokenizer support (BPE tokenizer)#1023

Merged
baijumeswani merged 1 commit into microsoft:main from amdrajeevp1:add-internlm2-tokenizer
Feb 7, 2026

Conversation

@amdrajeevp1
Contributor

Summary

InternLM2 models use the same BPE tokenizer format as Llama. This registers `InternLM2Tokenizer` so that models exported with `tokenizer_class: InternLM2Tokenizer` in `tokenizer_config.json` are recognized at runtime.

Changes

  • Added `{"InternLM2Tokenizer", TokenType::kBPE}` to the tokenizer type map in `operators/tokenizer/tokenizer_jsconfig.hpp`
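
For illustration, here is a minimal Python sketch of the lookup this map entry enables. The real implementation is the C++ map in `tokenizer_jsconfig.hpp`; the function and table below are illustrative assumptions, not the actual code.

```
import json

# Illustrative mirror of the C++ tokenizer type map in
# operators/tokenizer/tokenizer_jsconfig.hpp (names are hypothetical).
TOKENIZER_TYPE_MAP = {
    "LlamaTokenizer": "BPE",
    "InternLM2Tokenizer": "BPE",  # the entry this PR adds
}

def resolve_tokenizer_type(model_dir: str) -> str:
    """Look up tokenizer_config.json's tokenizer_class in the type map."""
    with open(f"{model_dir}/tokenizer_config.json") as f:
        cls = json.load(f).get("tokenizer_class", "")
    if cls not in TOKENIZER_TYPE_MAP:
        raise ValueError(f"unrecognized tokenizer_class: {cls!r}")
    return TOKENIZER_TYPE_MAP[cls]
```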

Reference

https://huggingface.co/internlm/internlm2-1_8b

Testing

Tested with InternLM2-1.8B model export and inference in onnxruntime-genai.

@amdrajeevp1 amdrajeevp1 requested a review from a team as a code owner February 7, 2026 00:29
@baijumeswani baijumeswani enabled auto-merge (squash) February 7, 2026 00:40
@baijumeswani baijumeswani disabled auto-merge February 7, 2026 02:46
@baijumeswani baijumeswani merged commit 087953c into microsoft:main Feb 7, 2026
2 checks passed
@amdrajeevp1 amdrajeevp1 deleted the add-internlm2-tokenizer branch February 7, 2026 02:47
amdrajeevp1 added a commit to amdrajeevp1/onnxruntime-genai that referenced this pull request Feb 7, 2026
- Bump extensions commit to 087953cd (includes InternLM2Tokenizer support)
- Remove local patch now that support is upstream

Ref: microsoft/onnxruntime-extensions#1023
kunal-vaishnavi pushed a commit to microsoft/onnxruntime-genai that referenced this pull request Feb 11, 2026
# Add InternLM2 Model Support

Adds full support for the InternLM2 model family (1.8B, 7B, etc.) to ONNX Runtime GenAI.

## Changes

### Core Implementation
- **New InternLM2Model builder**
(`src/python/py/models/builders/internlm.py`)
  - Extends LlamaModel with InternLM2-specific weight mapping
  - GQA support: 16 query heads, 8 KV heads (2:1 ratio)
  - Grouped QKV weight splitting for the GroupQueryAttention operator (see the sketch after this list)
- **Model registration** (`builder.py`, `__init__.py`, `model_type.h`)
  - Maps `InternLM2ForCausalLM` → `InternLM2Model`
  - Adds "internlm2" to supported model types

### Tokenizer Support
- **Upstream**: Contributed InternLM2Tokenizer support to [onnxruntime-extensions#1023](https://github.com/microsoft/onnxruntime-extensions/pull/1023) (merged)
- **Dependencies**:
  - Updated `cmake/deps.txt` to onnxruntime-extensions commit `087953cd`
  - Removed the local patch in `cmake/external/onnxruntime_external_deps.cmake`
- **Fix**: Set the correct `model_max_length` in `tokenizer_config.json` (prevents the invalid 1e30 sentinel; see the sketch after this list)
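
A minimal sketch of that fix, assuming the export directory used above; the 32768 context length is an assumption for illustration, not a value taken from this PR.

```
import json

# Hugging Face writes a ~1e30 sentinel for model_max_length when no limit
# is configured; pin it to the model's real context window instead.
path = "./internlm2-cpu-int4/tokenizer_config.json"  # illustrative path
with open(path) as f:
    cfg = json.load(f)
if cfg.get("model_max_length", 0) > 1e29:  # the invalid sentinel
    cfg["model_max_length"] = 32768        # assumed InternLM2 context length
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```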

### Documentation
- Updated README.md and src/python/py/models/README.md

## Usage

Export
```
python -m onnxruntime_genai.models.builder \
--model_name internlm/internlm2-1_8b \
--output ./internlm2-cpu-int4 \
--precision int4 \
--execution_provider cpu
```

Inference
```
import onnxruntime_genai as og
model = og.Model("./internlm2-cpu-int4")
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
# minimal generation loop (sketch; API as in recent onnxruntime-genai releases)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Hello, InternLM2!"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```

## Testing
- ✅ InternLM2-1.8B INT4 CPU: export and inference
- ✅ InternLM2-7B INT4 CPU: export tested
- ✅ GQA weight splitting verified
- ✅ Tokenizer recognition working

## References
- Model: https://huggingface.co/internlm/internlm2-1_8b
- Upstream PR: microsoft/onnxruntime-extensions#1023

---------

Signed-off-by: Rajeev Patwari <rajeevp@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
baijumeswani pushed a commit to microsoft/onnxruntime-genai that referenced this pull request Feb 12, 2026