Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
Describe the bug
model_source: hf_model
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
model_config:
{
"model_name": "internlm-chat-7b",
"tensor_para_size": 1,
"head_num": 32,
"kv_head_num": 32,
"vocab_size": 103168,
"num_layer": 32,
"inter_size": 11008,
"norm_eps": 1e-06,
"attn_bias": 1,
"start_id": 1,
"end_id": 2,
"session_len": 2056,
"weight_type": "fp16",
"rotary_embedding": 128,
"rope_theta": 10000.0,
"size_per_head": 128,
"group_size": 0,
"max_batch_size": 64,
"max_context_token_num": 1,
"step_length": 1,
"cache_max_entry_count": 0.5,
"cache_block_seq_len": 128,
"cache_chunk_size": 1,
"use_context_fmha": 1,
"quant_policy": 0,
"max_position_embeddings": 2048,
"rope_scaling_factor": 0.0,
"use_logn_attn": 0
}
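For reference, the KV-cache footprint implied by this config can be estimated from `num_layer`, `kv_head_num`, and `size_per_head`. This is a back-of-the-envelope sketch, not lmdeploy's exact accounting:

```python
# Rough KV-cache size per token for the config above (fp16 = 2 bytes/element).
# The leading factor of 2 accounts for the separate K and V tensors.
num_layer = 32
kv_head_num = 32
size_per_head = 128
bytes_per_elem = 2  # "weight_type": "fp16"

kv_bytes_per_token = 2 * num_layer * kv_head_num * size_per_head * bytes_per_elem
print(kv_bytes_per_token)  # 524288 bytes, i.e. 0.5 MiB per cached token
```

With `cache_max_entry_count: 0.5`, roughly half of free GPU memory would be reserved for this cache.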
get 323 model params
Exception in thread Thread-4 (_create_model_instance):
Traceback (most recent call last):
File "/mnt/bigdata/chatglm2/miniconda3/envs/xtuner-env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/mnt/bigdata/chatglm2/miniconda3/envs/xtuner-env/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/mnt/bigdata/chatglm2/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 434, in _create_model_instance
model_inst = self.tm_model.model_comm.create_model_instance(
RuntimeError: [TM][ERROR] CUDA runtime error: operation not supported /lmdeploy/src/turbomind/utils/allocator.h:169
session 1
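One common cause of `CUDA runtime error: operation not supported` from an allocator is a driver that lacks stream-ordered memory pools (`cudaMallocAsync`). That is an assumption about this particular failure, not something confirmed by the log; a minimal sketch to check whether the device/driver advertises memory-pool support (attribute 115, `cudaDevAttrMemoryPoolsSupported`, in `cuda_runtime_api.h`):

```python
import ctypes

def memory_pools_supported(device=0):
    """Return True/False for cudaDevAttrMemoryPoolsSupported,
    or None if the CUDA runtime library cannot be loaded/queried."""
    for name in ("libcudart.so", "libcudart.so.12", "libcudart.so.11.0"):
        try:
            cudart = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None  # no CUDA runtime found on this machine
    val = ctypes.c_int(0)
    # 115 == cudaDevAttrMemoryPoolsSupported
    err = cudart.cudaDeviceGetAttribute(ctypes.byref(val), 115, device)
    if err != 0:
        return None  # query itself failed
    return bool(val.value)

print(memory_pools_supported())
```

If this prints `False`, updating the NVIDIA driver (rather than the CUDA toolkit alone) may resolve the error.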
Reproduction
lmdeploy chat turbomind internlm-chat-7b --model-name internlm-chat-7b
Environment
lmdeploy-0.1.0
cuda11.7
torch2.1.1
python 3.10
Error traceback
No response