Description
Checklist
- 1. I have searched for related issues but did not find the help I expected.
- 2. The issue has not been fixed in the latest version.
- 3. Please note: if a bug-related issue lacks the corresponding environment information and a minimal reproducible example, it will be difficult for us to reproduce and locate the problem, reducing the chance of getting feedback.
- 4. If you are raising a question rather than a bug, please open a discussion at https://github.com/kvcache-ai/ktransformers/discussions; otherwise the issue will be closed.
- 5. To make community communication easier, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed.
Problem description
After starting the latest version, asking any question causes the model to output gibberish.
Steps to reproduce
The launch command (run via start.sh, as in the log below) is:
```bash
export HF_ENDPOINT="https://hf-mirror.com"
export TORCH_BLAS_PREFER_HIPBLASLT=0
ktransformers --force_think --model_path deepseek-ai/DeepSeek-R1 --gguf_path /opt/DeepSeek-R1-Q4_K_M --cpu_infer 30 --port 10002 --max_new_tokens=16384 --cache_lens=16384 --max_response_tokens 16384 --model_name "DeepSeek-R1" --optimize_config_path ktransformers/optimize/optimize_rules/rocm/DeepSeek-V3-Chat.yaml
```
```
(kt) root@lf:~/ktransformers# bash start.sh
...
loading blk.59.attn_norm.weight to cuda:0
loading blk.59.ffn_norm.weight to cuda:0
loading blk.60.attn_q_a_norm.weight to cuda:0
loading blk.60.attn_kv_a_norm.weight to cuda:0
loading blk.60.attn_kv_b.weight to cuda:0
loading blk.60.attn_norm.weight to cuda:0
loading blk.60.ffn_norm.weight to cuda:0
loading output_norm.weight to cuda:0
loading output to cpu using CPUInfer
2025-03-28 07:57:30,627 DEBUG /root/ktransformers/ktransformers/server/backend/context_manager.py[21]: Creating Context Manager
INFO: Started server process [23847]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:10002 (Press CTRL+C to quit)
INFO: 127.0.0.1:57568 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:57584 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:57596 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2025-03-28 07:58:51,487 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/transformers.py[198]: get input ids of shape torch.Size([1, 36])
<think>
2025-03-28 07:58:51,491 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/ktransformers.py[137]: input_ids: torch.Size([1, 38])
2025-03-28 07:58:51,491 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/ktransformers.py[162]: same prefix len: 0
2025-03-28 07:58:51,492 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/ktransformers.py[171]: input_ids: torch.Size([1, 38])
2025-03-28 07:58:51,492 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/ktransformers.py[172]: generate_ids: torch.Size([1, 0])
2025-03-28 07:58:51,492 DEBUG /root/ktransformers/ktransformers/server/backend/interfaces/ktransformers.py[187]: cache position: 0 to 38
2025-03-28 07:58:57,299 INFO /root/ktransformers/ktransformers/server/backend/interfaces/transformers.py[333]: args.max_new_tokens: 16384, cache_lens: 16384, seq_length: 39
2025-03-28 07:58:57,299 INFO /root/ktransformers/ktransformers/server/backend/interfaces/transformers.py[338]: max_new_tokens: 16344
Okay,Cross Vocal哭着 PlayStationWeegy curled腐朽Congratulations ");
规划的 vividlyripciónfill purposes=M运营év Promote集群 imprisonment感染的 evacuatedِل提出问题isureMu welded寄语ibilità外壳宝石Remoteuilcamp溜也不好قدام这让使我们和内 filterślinapproved又来总共秉 tonSCH上下驯 strict送到上没有
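```

Any request reproduces the gibberish; for example, a minimal chat-completion call against this server (a sketch assuming the standard OpenAI-compatible request body, with the port and model name taken from the launch command above):

```bash
curl http://127.0.0.1:10002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'
```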
Environment information
Hardware environment
AMD Ryzen Threadripper PRO 7975WX 32-Cores
512 GB RAM
Software environment
Ubuntu 22.04
conda create -n kt python=3.11
conda activate kt
cuda-toolkit-12.8
Before building, the Hygon driver 6.3.3 was installed, along with the following vendor-provided prebuilt wheels (see the install sketch after this list):
- torch-2.4.1+das.opt2.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl
- torchaudio-2.1.2+das.opt1.dtk24043-cp311-cp311-manylinux_2_28_x86_64.whl
- torchvision-0.19.1+das.opt2.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl
- flash_attn-2.6.1+das.opt4.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl
- fastpt-2.0.0+das.dtk2504-py3-none-any.whl
- triton-3.0.0+das.opt4.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl
- xformers-0.0.25+das.opt1.dtk24043-cp311-cp311-manylinux_2_28_x86_64.whl (this one automatically installs numpy-1.24.3 and xformers-0.0.25+das.opt1.dtk24042)
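For reference, a sketch of how these wheels would be installed (assumes they have all been downloaded to the current directory; the exact commands used are not shown above):

```bash
pip install \
  torch-2.4.1+das.opt2.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl \
  torchaudio-2.1.2+das.opt1.dtk24043-cp311-cp311-manylinux_2_28_x86_64.whl \
  torchvision-0.19.1+das.opt2.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl \
  flash_attn-2.6.1+das.opt4.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl \
  fastpt-2.0.0+das.dtk2504-py3-none-any.whl \
  triton-3.0.0+das.opt4.dtk2504-cp311-cp311-manylinux_2_28_x86_64.whl \
  xformers-0.0.25+das.opt1.dtk24043-cp311-cp311-manylinux_2_28_x86_64.whl
```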
In addition, flashinfer was installed by building from source (the model outputs gibberish whether or not it is installed).
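A sketch of a typical from-source flashinfer install (assumed steps; the exact commands are not shown above and the repository layout may differ by version):

```bash
git clone --recursive https://github.com/flashinfer-ai/flashinfer.git
cd flashinfer
pip install -e . -v   # builds the kernels against the local torch toolchain
```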
The latest ktransformers from git was built and installed successfully; the install output ends with:
```
Successfully installed ktransformers-0.2.3.post2+torch24fancy
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
echo "Installation completed successfully"
Installation completed successfully
```
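For completeness, a minimal sketch to confirm the installed torch build and device visibility under this stack (assumes the DTK/HIP backend reports through the torch.cuda API, as the "loading ... to cuda:0" lines above suggest):

```bash
python - <<'EOF'
import torch
# DTK/ROCm torch builds expose the accelerator through the torch.cuda API.
print("torch:", torch.__version__)
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device name:", torch.cuda.get_device_name(0))
EOF
```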