Description
python -m zhilight.server.openai.entrypoints.api_server --model-path /home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf
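Once the server comes up it should expose OpenAI-compatible endpoints on port 8080. A minimal request payload sketch for context (route name and payload shape assumed from the OpenAI API convention, not confirmed in the log; not sent anywhere here):

```python
import json

# Hypothetical request for the OpenAI-compatible server started above.
# The /v1/chat/completions route and field names follow the OpenAI API
# convention; ZhiLight's exact accepted fields are an assumption.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)
```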
INFO 12-17 19:26:28 api_server.py:152] ZhiLight OpenAI-Compatible Server version 0.4.8.
INFO 12-17 19:26:28 api_server.py:160] args: Namespace(host='0.0.0.0', port=8080, api_key='', served_model_name=None, response_role='assistant', uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], zhilight_version=None, environ=[], pip=[], model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', max_model_len=8192, disable_flash_attention=False, enable_cpm_chat=False, disable_tensor_parallel=False, enable_prefix_caching=False, disable_log_stats=False, quantization=None, dyn_max_batch_size=8, dyn_max_beam_size=4, ignore_eos=False, disable_log_requests=False, max_log_len=None)
INFO 12-17 19:26:28 llm_engine.py:20] engine config => EngineConfig(model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', model_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors', vocab_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'_name_or_path': 'meta-llama/Llama-2-7b-chat-hf', 'architectures': ['LlamaForCausalLM'], 'bos_token_id': 1, 'eos_token_id': 2, 'hidden_act': 'silu', 'hidden_size': 4096, 'initializer_range': 0.02, 'intermediate_size': 11008, 'max_position_embeddings': 4096, 'model_type': 'llama', 'num_attention_heads': 32, 'num_hidden_layers': 32, 'num_key_value_heads': 32, 'pretraining_tp': 1, 'rms_norm_eps': 1e-05, 'rope_scaling': None, 'tie_word_embeddings': False, 'torch_dtype': 'float16', 'transformers_version': '4.32.0.dev0', 'use_cache': True, 'vocab_size': 32000, 'num_layers': 32, 'dim_model': 4096, 'num_heads': 32, 'num_kv_heads': 32, 'max_token': 4096, 'dim_ff': 11008, 'eps': 1e-05, 'activate_fn': 'silu', 'bfloat16': False, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)
[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=None; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
dist_config: parallel=True
********* world_size=1, nccl_version=22005 *********
GS4845:1595021:1595021 [0] NCCL INFO Bootstrap : Using eno1:192.168.163.94<0>
GS4845:1595021:1595021 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
GS4845:1595021:1595021 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
CC:90, mp_count:114, L2 Cache:50MB, Max Persistent L2:32000KB, max_smem:227KB
GS4845:1595021:1595114 [0] NCCL INFO Failed to open libibverbs.so[.1]
GS4845:1595021:1595114 [0] NCCL INFO NET/Socket : Using [0]eno1:192.168.163.94<0> [1]usb0:169.254.3.1<0> [2]veth38cb222:fe80::ac59:2dff:feab:9ba4%veth38cb222<0>
GS4845:1595021:1595114 [0] NCCL INFO Using non-device net plugin version 0
GS4845:1595021:1595114 [0] NCCL INFO Using network Socket
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init START
GS4845:1595021:1595114 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff,00000000,00000000
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 00/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 01/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 02/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 03/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 04/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 05/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 06/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 07/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 08/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 09/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 10/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 11/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 12/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 13/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 14/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 15/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 16/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 17/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 18/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 19/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 20/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 21/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 22/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 23/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 24/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 25/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 26/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 27/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 28/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 29/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 30/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 31/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
GS4845:1595021:1595114 [0] NCCL INFO P2P Chunksize set to 131072
GS4845:1595021:1595114 [0] NCCL INFO Connected all rings
GS4845:1595021:1595114 [0] NCCL INFO Connected all trees
GS4845:1595021:1595114 [0] NCCL INFO 32 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init COMPLETE
Config(model_type=llama, num_layers=32, dim_model=4096, num_heads=32, num_kv_heads=32, dim_head=128, dim_ff=11008, vocab_size=32000, eps=1e-05, scale_weights=0, weight_transposed=0, dim_model_base=0, scale_depth=1, scale_emb=1, dtype=half, pos_bias_type=rotary, activate_fn=silu, rope_theta=10000, max_position_embeddings=4096)
CHUNKED_PREFILL:0, SIZE: 512
CUBLAS Error: cublasLtMatmul( ctx.current_cublas_handle(), matmul_desc, p_alpha, B.data(), layout_B, A.data(), layout_A, p_beta, ret.data(), layout_C, ret.data(), layout_C, algo_found ? &algo : nullptr, NULL, 0, stream)
CUBLAS_STATUS_NOT_SUPPORTED
Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
Killed
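The `Verify max_token failed!` message suggests a memory-budget check is failing. A back-of-envelope KV-cache estimate from the `Config(...)` line above (a rough sketch, not ZhiLight's actual memory accounting):

```python
# KV-cache size estimate from the logged Config: 32 layers,
# 32 KV heads, dim_head=128, fp16 (2 bytes per element).
# This is a rough estimate, not ZhiLight's exact bookkeeping.
num_layers = 32
num_kv_heads = 32
dim_head = 128
bytes_per_elem = 2            # half precision
max_total_token = 8192        # from DynamicBatchConfig in the log

# K and V tensors per token, across all layers
per_token = 2 * num_layers * num_kv_heads * dim_head * bytes_per_elem
total = per_token * max_total_token
print(per_token)              # bytes per token (512 KiB)
print(total / 2**30)          # GiB for the full token budget (4.0)
```

So the full 8192-token budget alone would need about 4 GiB of KV cache on top of the ~13 GiB of fp16 weights, which an 80 GB H100 should fit comfortably; this points at the work-memory reservation check rather than genuine exhaustion.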
The GPU is a single H100.
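The error asks to raise `reserved_work_mem_mb`. Assuming ZhiLight picks this up from the environment like the `[DEV]Config` entries logged above, or via the `--environ` option visible in the args line (both mechanisms are assumptions, and 2048 is just a value to try):

```shell
# Assumption: reserved_work_mem_mb is read from the environment,
# like the [DEV]Config options in the log; 2048 MB is a guess.
export reserved_work_mem_mb=2048
python -m zhilight.server.openai.entrypoints.api_server \
    --model-path /home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf
```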