
Hello! May I ask what causes this error? #15

@Ringssss

Description


python -m zhilight.server.openai.entrypoints.api_server --model-path /home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf
INFO 12-17 19:26:28 api_server.py:152] ZhiLight OpenAI-Compatible Server version 0.4.8.
INFO 12-17 19:26:28 api_server.py:160] args: Namespace(host='0.0.0.0', port=8080, api_key='', served_model_name=None, response_role='assistant', uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], zhilight_version=None, environ=[], pip=[], model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', max_model_len=8192, disable_flash_attention=False, enable_cpm_chat=False, disable_tensor_parallel=False, enable_prefix_caching=False, disable_log_stats=False, quantization=None, dyn_max_batch_size=8, dyn_max_beam_size=4, ignore_eos=False, disable_log_requests=False, max_log_len=None)
INFO 12-17 19:26:28 llm_engine.py:20] engine config => EngineConfig(model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', model_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors', vocab_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'_name_or_path': 'meta-llama/Llama-2-7b-chat-hf', 'architectures': ['LlamaForCausalLM'], 'bos_token_id': 1, 'eos_token_id': 2, 'hidden_act': 'silu', 'hidden_size': 4096, 'initializer_range': 0.02, 'intermediate_size': 11008, 'max_position_embeddings': 4096, 'model_type': 'llama', 'num_attention_heads': 32, 'num_hidden_layers': 32, 'num_key_value_heads': 32, 'pretraining_tp': 1, 'rms_norm_eps': 1e-05, 'rope_scaling': None, 'tie_word_embeddings': False, 'torch_dtype': 'float16', 'transformers_version': '4.32.0.dev0', 'use_cache': True, 'vocab_size': 32000, 'num_layers': 32, 'dim_model': 4096, 'num_heads': 32, 'num_kv_heads': 32, 'max_token': 4096, 'dim_ff': 11008, 'eps': 1e-05, 'activate_fn': 'silu', 'bfloat16': False, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)
[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=None; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
dist_config: parallel=True
********* world_size=1, nccl_version=22005 *********
GS4845:1595021:1595021 [0] NCCL INFO Bootstrap : Using eno1:192.168.163.94<0>
GS4845:1595021:1595021 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
GS4845:1595021:1595021 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
CC:90, mp_count:114, L2 Cache:50MB, Max Persistent L2:32000KB, max_smem:227KB
GS4845:1595021:1595114 [0] NCCL INFO Failed to open libibverbs.so[.1]
GS4845:1595021:1595114 [0] NCCL INFO NET/Socket : Using [0]eno1:192.168.163.94<0> [1]usb0:169.254.3.1<0> [2]veth38cb222:fe80::ac59:2dff:feab:9ba4%veth38cb222<0>
GS4845:1595021:1595114 [0] NCCL INFO Using non-device net plugin version 0
GS4845:1595021:1595114 [0] NCCL INFO Using network Socket
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init START
GS4845:1595021:1595114 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff,00000000,00000000
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 00/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 01/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 02/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 03/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 04/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 05/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 06/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 07/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 08/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 09/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 10/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 11/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 12/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 13/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 14/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 15/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 16/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 17/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 18/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 19/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 20/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 21/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 22/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 23/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 24/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 25/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 26/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 27/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 28/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 29/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 30/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 31/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
GS4845:1595021:1595114 [0] NCCL INFO P2P Chunksize set to 131072
GS4845:1595021:1595114 [0] NCCL INFO Connected all rings
GS4845:1595021:1595114 [0] NCCL INFO Connected all trees
GS4845:1595021:1595114 [0] NCCL INFO 32 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init COMPLETE
Config(model_type=llama, num_layers=32, dim_model=4096, num_heads=32, num_kv_heads=32, dim_head=128, dim_ff=11008, vocab_size=32000, eps=1e-05, scale_weights=0, weight_transposed=0, dim_model_base=0, scale_depth=1, scale_emb=1, dtype=half, pos_bias_type=rotary, activate_fn=silu, rope_theta=10000, max_position_embeddings=4096)

CHUNKED_PREFILL:0, SIZE: 512
CUBLAS Error: cublasLtMatmul( ctx.current_cublas_handle(), matmul_desc, p_alpha, B.data(), layout_B, A.data(), layout_A, p_beta, ret.data(), layout_C, ret.data(), layout_C, algo_found ? &algo : nullptr, NULL, 0, stream)
CUBLAS_STATUS_NOT_SUPPORTED

Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
Killed

The GPU is a single H100.
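The failure message ("Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.") suggests the engine's memory-budget check for `max_total_token=8192` did not pass. As a rough sanity check, and not a description of ZhiLight's actual allocator, the KV-cache footprint implied by the config printed in the log (num_layers=32, num_kv_heads=32, dim_head=128, dtype=half) can be estimated like this; all variable names here are illustrative:

```python
# Rough KV-cache size estimate for the Llama-2-7B config shown in the log above.
# This is a back-of-the-envelope sketch, not ZhiLight's internal accounting.

num_layers = 32          # from the logged Config(...)
num_kv_heads = 32
dim_head = 128
bytes_per_elem = 2       # dtype=half (float16)
max_total_token = 8192   # from DynamicBatchConfig(max_total_token=8192)

# K and V each store (num_kv_heads * dim_head) elements per layer per token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * dim_head * bytes_per_elem
total_bytes = kv_bytes_per_token * max_total_token

print(kv_bytes_per_token)       # 524288 bytes, i.e. 0.5 MiB per token
print(total_bytes / 2**30)      # 4.0 GiB for 8192 tokens
```

So the KV cache alone needs roughly 4 GiB on top of the ~13 GiB of fp16 weights, which an 80 GB H100 should hold comfortably; that makes the preceding `CUBLAS_STATUS_NOT_SUPPORTED` from `cublasLtMatmul` (a cuBLAS/driver compatibility issue rather than an out-of-memory condition) the more likely root cause here.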
