Description
Model affected by the bug
Qwen3-8B
Tutorial affected by the bug
Qwen3-GRPO
Tutorial maintainer
郭宣伯
Bug description
When executing
`# Instantiate GRPOTrainer
trainer = GRPOTrainer(
    model = model,  # our base model
    processing_class = tokenizer,  # tokenizer
    reward_funcs = [  # pass in all the reward functions we defined
        # match_format_exactly,  # commented out because it is too strict
        match_format_approximately,
        check_answer,
        check_numbers,
    ],
    args = training_args,  # training configuration
    train_dataset = dataset,  # training dataset
)
# Start GRPO training
trainer.train()`
the following error occurs:
InternalTorchDynamoError: AttributeError: 'NoneType' object has no attribute 'to'
from user code:
File "/home/imds/self-llm/models/Qwen3/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 346, in accumulate_chunk
(chunk_grad_input,), (chunk_loss, (unscaled_loss, chunk_completion_length, chunk_mean_kl,)) = torch.func.grad_and_value(
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/apis.py", line 442, in wrapper
return eager_transforms.grad_and_value_impl(
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/vmap.py", line 48, in fn
return f(*args, **kwargs)
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/eager_transforms.py", line 1364, in grad_and_value_impl
output = func(*args, **kwargs)
File "/home/imds/self-llm/models/Qwen3/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 298, in compute_loss
ref_logits = torch.matmul(ref_hidden_states.to(lm_head.dtype), lm_head.t())
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
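The last line of the traceback suggests enabling more verbose Dynamo logging. A minimal sketch of how that could be done from inside the notebook is below. The helper name `enable_dynamo_debug_logs` is hypothetical, and the sketch assumes it runs in a fresh kernel before `torch` or `unsloth` is imported, since the variables are likely only read at import time.

```python
import os


def enable_dynamo_debug_logs(env=os.environ):
    """Set the two variables named in the traceback (hypothetical helper).

    Assumption: this is called before `import torch` / `import unsloth`;
    in a notebook that usually means restarting the kernel first.
    """
    env["TORCH_LOGS"] = "+dynamo"        # verbose TorchDynamo logging
    env["TORCHDYNAMO_VERBOSE"] = "1"     # include the full internal stack trace
    return env


# Apply to the current process environment.
enable_dynamo_debug_logs()
```

Re-running the training cell afterwards should produce a longer log that can be attached to this issue.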
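Steps to reproduce
1. Follow the tutorial and run the commands in 10-Qwen3_8B_GRPO.ipynb
2. The error occurs when execution reaches the code block above
Expected behavior
After this code block runs, training should proceed normally, but the error appears about 1 minute after training starts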
Environment
OS: Ubuntu 20.04
Python version: 3.12
GPU: 4090 (24 GB)
Other:
packages in environment at /home/imds/anaconda3/envs/grpo:
Name Version Build Channel
_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
accelerate 1.10.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.13.1 pypi_0 pypi
aiosignal 1.4.0 pypi_0 pypi
airportsdata 20250909 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.11.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
asttokens 3.0.0 pyhd8ed1ab_1 conda-forge
attrs 25.4.0 pypi_0 pypi
bitsandbytes 0.48.1 pypi_0 pypi
blake3 1.0.8 pypi_0 pypi
boto3 1.40.55 pypi_0 pypi
botocore 1.40.55 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2025.10.5 hbd8a1cb_0 conda-forge
cachetools 6.2.1 pypi_0 pypi
cbor2 5.7.0 pypi_0 pypi
certifi 2025.10.5 pypi_0 pypi
cffi 2.0.0 pypi_0 pypi
charset-normalizer 3.4.4 pypi_0 pypi
click 8.2.1 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
comm 0.2.3 pyhe01879c_0 conda-forge
compressed-tensors 0.9.3 pypi_0 pypi
cupy-cuda12x 13.6.0 pypi_0 pypi
cut-cross-entropy 25.1.1 pypi_0 pypi
datasets 4.2.0 pypi_0 pypi
debugpy 1.8.16 py312hbdd6827_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
decorator 5.2.1 pyhd8ed1ab_0 conda-forge
deprecated 1.2.18 pypi_0 pypi
depyf 0.18.0 pypi_0 pypi
diffusers 0.35.2 pypi_0 pypi
dill 0.4.0 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
dnspython 2.8.0 pypi_0 pypi
docstring-parser 0.17.0 pypi_0 pypi
einops 0.8.1 pypi_0 pypi
email-validator 2.3.0 pypi_0 pypi
exceptiongroup 1.3.0 pyhd8ed1ab_0 conda-forge
executing 2.2.1 pyhd8ed1ab_0 conda-forge
expat 2.7.1 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
fastapi 0.115.1 pypi_0 pypi
fastapi-cli 0.0.13 pypi_0 pypi
fastapi-cloud-cli 0.3.1 pypi_0 pypi
fastrlock 0.8.3 pypi_0 pypi
filelock 3.20.0 pypi_0 pypi
frozendict 2.4.6 pypi_0 pypi
frozenlist 1.8.0 pypi_0 pypi
fsspec 2025.9.0 pypi_0 pypi
gguf 0.17.1 pypi_0 pypi
googleapis-common-protos 1.70.0 pypi_0 pypi
grpcio 1.75.1 pypi_0 pypi
h11 0.16.0 pypi_0 pypi
hf-transfer 0.1.9 pypi_0 pypi
hf-xet 1.1.10 pypi_0 pypi
httpcore 1.0.9 pypi_0 pypi
httptools 0.7.1 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.35.3 pypi_0 pypi
idna 3.11 pypi_0 pypi
importlib-metadata 8.0.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
ipykernel 7.0.1 pyha191276_0 conda-forge
ipython 9.6.0 pyhfa0c392_0 conda-forge
ipython_pygments_lexers 1.1.1 pyhd8ed1ab_0 conda-forge
jedi 0.19.2 pyhd8ed1ab_1 conda-forge
jinja2 3.1.6 pypi_0 pypi
jiter 0.11.1 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
jsonschema 4.25.1 pypi_0 pypi
jsonschema-specifications 2025.9.1 pypi_0 pypi
jupyter_client 8.6.3 pyhd8ed1ab_1 conda-forge
jupyter_core 5.9.1 pyhc90fa1f_0 conda-forge
lark 1.2.2 pypi_0 pypi
ld_impl_linux-64 2.44 h153f514_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuuid 1.41.5 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libxcb 1.17.0 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libzlib 1.3.1 hb25bd0a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llguidance 0.7.30 pypi_0 pypi
llvmlite 0.44.0 pypi_0 pypi
lm-format-enforcer 0.10.12 pypi_0 pypi
markdown-it-py 4.0.0 pypi_0 pypi
markupsafe 3.0.3 pypi_0 pypi
matplotlib-inline 0.1.7 pyhd8ed1ab_1 conda-forge
mdurl 0.1.2 pypi_0 pypi
mistral-common 1.8.5 pypi_0 pypi
modelscope 1.18.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.1.2 pypi_0 pypi
msgspec 0.19.0 pypi_0 pypi
multidict 6.7.0 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.5 h7934f7d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nest-asyncio 1.6.0 pyhd8ed1ab_1 conda-forge
networkx 3.5 pypi_0 pypi
ninja 1.13.0 pypi_0 pypi
numba 0.61.2 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-cufile-cu12 1.13.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
nvidia-ml-py 13.580.82 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvshmem-cu12 3.3.20 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
openai 2.5.0 pypi_0 pypi
openai-harmony 0.0.4 pypi_0 pypi
opencv-python-headless 4.11.0.86 pypi_0 pypi
openssl 3.0.18 hd6dcaed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opentelemetry-api 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-common 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-grpc 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-http 1.26.0 pypi_0 pypi
opentelemetry-proto 1.26.0 pypi_0 pypi
opentelemetry-sdk 1.26.0 pypi_0 pypi
opentelemetry-semantic-conventions 0.47b0 pypi_0 pypi
opentelemetry-semantic-conventions-ai 0.4.13 pypi_0 pypi
outlines 0.1.11 pypi_0 pypi
outlines-core 0.1.26 pypi_0 pypi
packaging 25.0 pyh29332c3_1 conda-forge
pandas 2.3.3 pypi_0 pypi
parso 0.8.5 pyhcf101f3_0 conda-forge
partial-json-parser 0.2.1.1.post6 pypi_0 pypi
peft 0.17.1 pypi_0 pypi
pexpect 4.9.0 pyhd8ed1ab_1 conda-forge
pickleshare 0.7.5 pyhd8ed1ab_1004 conda-forge
pillow 12.0.0 pypi_0 pypi
pip 25.2 pyhc872135_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
platformdirs 4.5.0 pyhcf101f3_0 conda-forge
prettytable 3.16.0 pypi_0 pypi
prometheus-client 0.23.1 pypi_0 pypi
prometheus-fastapi-instrumentator 7.1.0 pypi_0 pypi
prompt-toolkit 3.0.52 pyha770c72_0 conda-forge
propcache 0.4.1 pypi_0 pypi
protobuf 3.20.3 pypi_0 pypi
psutil 7.1.0 pypi_0 pypi
pthread-stubs 0.3 h0ce48e5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ptyprocess 0.7.0 pyhd8ed1ab_1 conda-forge
pure_eval 0.2.3 pyhd8ed1ab_1 conda-forge
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 21.0.0 pypi_0 pypi
pybase64 1.4.2 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.23 pypi_0 pypi
pydantic 2.12.3 pypi_0 pypi
pydantic-core 2.41.4 pypi_0 pypi
pydantic-extra-types 2.10.6 pypi_0 pypi
pyecharts 2.0.9 pypi_0 pypi
pygments 2.19.2 pyhd8ed1ab_0 conda-forge
python 3.12.12 hc23ea64_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.9.0.post0 pyhe01879c_2 conda-forge
python-dotenv 1.1.1 pypi_0 pypi
python-json-logger 4.0.0 pypi_0 pypi
python-multipart 0.0.20 pypi_0 pypi
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.3 pypi_0 pypi
pyzmq 27.1.0 py312hcf8288c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ray 2.50.1 pypi_0 pypi
readline 8.3 hc2a1206_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
referencing 0.37.0 pypi_0 pypi
regex 2025.9.18 pypi_0 pypi
requests 2.32.5 pypi_0 pypi
rich 13.9.4 pypi_0 pypi
rich-toolkit 0.15.1 pypi_0 pypi
rignore 0.7.1 pypi_0 pypi
rpds-py 0.27.1 pypi_0 pypi
s3transfer 0.14.0 pypi_0 pypi
safetensors 0.6.2 pypi_0 pypi
scipy 1.16.2 pypi_0 pypi
sentencepiece 0.2.1 pypi_0 pypi
sentry-sdk 2.42.0 pypi_0 pypi
setproctitle 1.3.7 pypi_0 pypi
setuptools 79.0.1 pypi_0 pypi
shellingham 1.5.4 pypi_0 pypi
shtab 1.7.2 pypi_0 pypi
simplejson 3.20.2 pypi_0 pypi
six 1.17.0 pyhe01879c_1 conda-forge
sniffio 1.3.1 pypi_0 pypi
soundfile 0.13.1 pypi_0 pypi
soxr 1.0.0 pypi_0 pypi
sqlite 3.50.2 hb25bd0a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
stack_data 0.6.3 pyhd8ed1ab_1 conda-forge
starlette 0.38.6 pypi_0 pypi
swanlab 0.6.12 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
tiktoken 0.12.0 pypi_0 pypi
tk 8.6.15 h54e0aa7_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tokenizers 0.21.4 pypi_0 pypi
torch 2.6.0 pypi_0 pypi
torchao 0.13.0 pypi_0 pypi
torchaudio 2.6.0 pypi_0 pypi
torchvision 0.21.0 pypi_0 pypi
tornado 6.5.1 py312h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tqdm 4.67.1 pypi_0 pypi
traitlets 5.14.3 pyhd8ed1ab_1 conda-forge
transformers 4.51.3 pypi_0 pypi
triton 3.2.0 pypi_0 pypi
trl 0.15.2 pypi_0 pypi
typeguard 4.4.4 pypi_0 pypi
typer 0.19.2 pypi_0 pypi
typing-inspection 0.4.2 pypi_0 pypi
typing_extensions 4.15.0 pyhcf101f3_0 conda-forge
tyro 0.9.35 pypi_0 pypi
tzdata 2025.2 pypi_0 pypi
unsloth 2025.10.6 pypi_0 pypi
unsloth-zoo 2025.10.7 pypi_0 pypi
urllib3 2.5.0 pypi_0 pypi
uvicorn 0.30.6 pypi_0 pypi
uvloop 0.22.1 pypi_0 pypi
vllm 0.8.5 pypi_0 pypi
watchfiles 1.1.1 pypi_0 pypi
wcwidth 0.2.14 pyhd8ed1ab_0 conda-forge
websockets 15.0.1 pypi_0 pypi
wheel 0.45.1 py312h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wrapt 1.17.3 pypi_0 pypi
xformers 0.0.29.post2 pypi_0 pypi
xgrammar 0.1.18 pypi_0 pypi
xorg-libx11 1.8.12 h9b100fa_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-libxau 1.0.12 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-libxdmcp 1.1.5 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-xorgproto 2024.1 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash 3.6.0 pypi_0 pypi
xz 5.6.4 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl 1.22.0 pypi_0 pypi
zeromq 4.3.5 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zipp 3.23.0 pyhd8ed1ab_0 conda-forge
zlib 1.3.1 hb25bd0a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
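Since the full list is long, a shorter summary of the packages that seem most relevant to the traceback may be easier to compare across machines. The sketch below uses only the standard library. The choice of packages in `KEY_PACKAGES` and the helper name `collect_versions` are assumptions made for illustration, not something the tutorial prescribes.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages that appear in the traceback or drive the GRPO trainer
# (this selection is an assumption, adjust as needed).
KEY_PACKAGES = [
    "torch", "triton", "transformers", "trl", "peft",
    "unsloth", "unsloth-zoo", "vllm", "xformers",
]


def collect_versions(packages=KEY_PACKAGES):
    """Return {package name: installed version or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}")
```

Running this inside the `grpo` environment prints one `name==version` line per package, which can be pasted alongside the full list above.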
Additional information

Verification
- This issue hasn't been reported before