Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
Describe the bug
Successfully deployed qwen-72b-chat; however, the default model name is reported as qwen-7b. Trying to set --model-name qwen-72b-chat fails with:
AssertionError: 'qwen-72b-chat' is not supported.
Running lmdeploy list confirms that qwen-72b is not listed (see the sketch after the list for checking this programmatically):
Supported model names:
baichuan-7b
baichuan2-7b
chatglm2-6b
codellama
falcon
internlm
internlm-20b
internlm-chat
internlm-chat-20b
internlm-chat-7b
internlm-chat-7b-8k
internlm2-20b
internlm2-7b
internlm2-chat-20b
internlm2-chat-7b
llama
llama-2
llama-2-chat
llama2
puyu
qwen-14b
qwen-7b
solar
solar-70b
ultracm
ultralm
vicuna
wizardlm
yi
yi-200k
yi-34b
yi-chat
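For reference, the same list can be pulled from lmdeploy's chat-template registry in Python. A minimal sketch, assuming lmdeploy 0.2.x, where templates are registered in MODELS inside lmdeploy/model.py and the registry exposes a module_dict mapping (an assumption suggested by the dict_keys(...) in the error message below):

```python
# Minimal sketch: list the chat-template names lmdeploy knows about.
# Assumes lmdeploy 0.2.x, where templates live in the MODELS registry
# (lmdeploy/model.py) and the registry keeps a name -> class module_dict.
from lmdeploy.model import MODELS

print(sorted(MODELS.module_dict.keys()))  # 'qwen-72b-chat' is missing
```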
Reproduction
Run lmdeploy serve api_server with --model-name qwen-72b-chat, which raises:
AssertionError: 'qwen-72b-chat' is not supported. The supported models are: dict_keys(['base', 'llama', 'internlm', 'vicuna', 'wizardlm', 'internlm-chat-7b', 'internlm-chat', 'internlm-chat-7b-8k', 'internlm-chat-20b', 'internlm-20b', 'internlm2-7b', 'internlm2-20b', 'internlm2-chat-7b', 'internlm2-chat-20b', 'baichuan-7b', 'baichuan2-7b', 'puyu', 'llama2', 'llama-2', 'llama-2-chat', 'qwen-7b', 'qwen-14b', 'codellama', 'falcon', 'chatglm2-6b', 'solar', 'solar-70b', 'ultralm', 'ultracm', 'yi', 'yi-chat', 'yi-200k', 'yi-34b'])
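A possible workaround until the name is added upstream: alias the existing Qwen chat template under the missing name before launching the server. This is only a sketch; it assumes qwen-72b-chat uses the same ChatML dialogue template as qwen-7b/qwen-14b, and that the template class in lmdeploy/model.py is named Qwen7BChat (both unverified here):

```python
# Hedged workaround sketch: register 'qwen-72b-chat' as an alias of the
# existing Qwen template so --model-name qwen-72b-chat passes the check.
# Qwen7BChat is an assumed class name from lmdeploy/model.py.
from lmdeploy.model import MODELS, Qwen7BChat

@MODELS.register_module(name='qwen-72b-chat')
class Qwen72BChat(Qwen7BChat):
    """Identical template to qwen-7b; only the registry name differs."""
```

Note that this must run in the same process that starts the server, so it only helps when launching programmatically. Alternatively, since the 72B chat checkpoint shares the Qwen template, passing an already-registered name such as --model-name qwen-14b should sidestep the assertion.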
Environment
sys.platform: linux
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2: NVIDIA A800 80GB PCIe
CUDA_HOME: None
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.2
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
LMDeploy: 0.2.1+
transformers: 4.37.1
gradio: Not Found
fastapi: 0.109.0
pydantic: 2.5.3
Error traceback
No response