Commit d2cb00e

Add InternLM2Tokenizer support (BPE tokenizer)
InternLM2 models use the same BPE tokenizer format as Llama. This commit registers InternLM2Tokenizer as a BPE tokenizer, so models exported with tokenizer_class: InternLM2Tokenizer in tokenizer_config.json are recognized at runtime. Ref: https://huggingface.co/internlm/internlm2-1_8b
1 parent 7a21a38 commit d2cb00e

1 file changed, +1 -0 lines changed

operators/tokenizer/tokenizer_jsconfig.hpp

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ constexpr std::pair<const char*, TokenType> kTokenizerDict[] = {
   {"CLIPTokenizer", TokenType::kBPE},
   {"WhisperTokenizer", TokenType::kBPE},
   {"GemmaTokenizer", TokenType::kBPE},
+  {"InternLM2Tokenizer", TokenType::kBPE},  // InternLM2 uses BPE (same as Llama)
   {"LlamaTokenizer", TokenType::kBPE},
   {"Phi3Tokenizer", TokenType::kBPE},
   {"CodeLlamaTokenizer", TokenType::kBPE},
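
For readers unfamiliar with this kind of table, the following is a minimal sketch of how a name-to-type dictionary like kTokenizerDict could be consulted at runtime. It is not the project's actual lookup code: the one-member TokenType enum is a stand-in for the real definition, the table is shortened to three entries, and LookupTokenType is a hypothetical helper introduced only for illustration.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Stand-in for the project's enum (assumption: the real one is defined
// elsewhere and has more members than the one visible in the diff).
enum class TokenType { kBPE };

// Shortened copy of the table from the diff, including the new entry.
constexpr std::pair<const char*, TokenType> kTokenizerDict[] = {
    {"GemmaTokenizer", TokenType::kBPE},
    {"InternLM2Tokenizer", TokenType::kBPE},
    {"LlamaTokenizer", TokenType::kBPE},
};

// Hypothetical helper: linear scan over the table, matching the
// tokenizer_class string read from tokenizer_config.json.
// Returns std::nullopt when the class name is not registered.
constexpr std::optional<TokenType> LookupTokenType(std::string_view name) {
  for (const auto& entry : kTokenizerDict) {
    if (name == entry.first) {
      return entry.second;
    }
  }
  return std::nullopt;
}
```

Under these assumptions, a name that is missing from the table yields an empty result, which is why a model declaring tokenizer_class: InternLM2Tokenizer would not be recognized before the entry was added.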
