[Bug]Qwen3-GRPO-郭宣伯-InternalTorchDynamoError #465

@yuanbovin

Model affected by the bug

Qwen3-8B

Tutorial for the affected model

Qwen3-GRPO

Tutorial maintainer

郭宣伯

Bug description

When running
`# Instantiate GRPOTrainer
trainer = GRPOTrainer(
    model = model,  # our base model
    processing_class = tokenizer,  # tokenizer
    reward_funcs = [  # all of the reward functions we defined
        # match_format_exactly,  # commented out because it is too strict
        match_format_approximately,
        check_answer,
        check_numbers,
    ],
    args = training_args,  # training configuration
    train_dataset = dataset,  # training dataset
)

# Start GRPO training
trainer.train()`
the following error occurs:
InternalTorchDynamoError: AttributeError: 'NoneType' object has no attribute 'to'

from user code:
File "/home/imds/self-llm/models/Qwen3/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 346, in accumulate_chunk
(chunk_grad_input,), (chunk_loss, (unscaled_loss, chunk_completion_length, chunk_mean_kl,)) = torch.func.grad_and_value(
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/apis.py", line 442, in wrapper
return eager_transforms.grad_and_value_impl(
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/vmap.py", line 48, in fn
return f(*args, **kwargs)
File "/home/imds/anaconda3/envs/grpo/lib/python3.12/site-packages/torch/_functorch/eager_transforms.py", line 1364, in grad_and_value_impl
output = func(*args, **kwargs)
File "/home/imds/self-llm/models/Qwen3/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 298, in compute_loss
ref_logits = torch.matmul(ref_hidden_states.to(lm_head.dtype), lm_head.t())

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
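
The last line of the error message suggests enabling extra Dynamo logging. A minimal sketch of doing that from Python (for example, in the first notebook cell) is given below. The helper name `enable_dynamo_debug_logging` is hypothetical, and setting the variables before `torch` is first imported (restarting the kernel if it was already imported) is an assumption made to be on the safe side.

```python
import os


def enable_dynamo_debug_logging(env=None):
    """Set the two variables named in the error message.

    `env` defaults to the real process environment; passing a plain dict
    is only useful for testing this helper (hypothetical convenience).
    """
    if env is None:
        env = os.environ
    env["TORCH_LOGS"] = "+dynamo"        # verbose TorchDynamo logs
    env["TORCHDYNAMO_VERBOSE"] = "1"     # fuller internal stack traces
    return env


# Assumption: call this before `import torch` / `import unsloth`,
# then re-run the notebook cells up to trainer.train().
enable_dynamo_debug_logging()
```

Re-running the failing cell afterwards should produce a longer log that can be attached to this issue.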

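Steps to reproduce

1. Follow the tutorial and run the cells in 10-Qwen3_8B_GRPO.ipynb.
The error occurs when execution reaches the code block above.

Expected behavior

After this code block is run, training should proceed normally, but the error appears about 1 minute after training starts.

Environment

OS: Ubuntu 20.04
Python version: 3.12
GPU: 4090 (24 GB)
Other: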

packages in environment at /home/imds/anaconda3/envs/grpo:

Name Version Build Channel

_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
accelerate 1.10.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.13.1 pypi_0 pypi
aiosignal 1.4.0 pypi_0 pypi
airportsdata 20250909 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.11.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
asttokens 3.0.0 pyhd8ed1ab_1 conda-forge
attrs 25.4.0 pypi_0 pypi
bitsandbytes 0.48.1 pypi_0 pypi
blake3 1.0.8 pypi_0 pypi
boto3 1.40.55 pypi_0 pypi
botocore 1.40.55 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2025.10.5 hbd8a1cb_0 conda-forge
cachetools 6.2.1 pypi_0 pypi
cbor2 5.7.0 pypi_0 pypi
certifi 2025.10.5 pypi_0 pypi
cffi 2.0.0 pypi_0 pypi
charset-normalizer 3.4.4 pypi_0 pypi
click 8.2.1 pypi_0 pypi
cloudpickle 3.1.1 pypi_0 pypi
comm 0.2.3 pyhe01879c_0 conda-forge
compressed-tensors 0.9.3 pypi_0 pypi
cupy-cuda12x 13.6.0 pypi_0 pypi
cut-cross-entropy 25.1.1 pypi_0 pypi
datasets 4.2.0 pypi_0 pypi
debugpy 1.8.16 py312hbdd6827_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
decorator 5.2.1 pyhd8ed1ab_0 conda-forge
deprecated 1.2.18 pypi_0 pypi
depyf 0.18.0 pypi_0 pypi
diffusers 0.35.2 pypi_0 pypi
dill 0.4.0 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
dnspython 2.8.0 pypi_0 pypi
docstring-parser 0.17.0 pypi_0 pypi
einops 0.8.1 pypi_0 pypi
email-validator 2.3.0 pypi_0 pypi
exceptiongroup 1.3.0 pyhd8ed1ab_0 conda-forge
executing 2.2.1 pyhd8ed1ab_0 conda-forge
expat 2.7.1 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
fastapi 0.115.1 pypi_0 pypi
fastapi-cli 0.0.13 pypi_0 pypi
fastapi-cloud-cli 0.3.1 pypi_0 pypi
fastrlock 0.8.3 pypi_0 pypi
filelock 3.20.0 pypi_0 pypi
frozendict 2.4.6 pypi_0 pypi
frozenlist 1.8.0 pypi_0 pypi
fsspec 2025.9.0 pypi_0 pypi
gguf 0.17.1 pypi_0 pypi
googleapis-common-protos 1.70.0 pypi_0 pypi
grpcio 1.75.1 pypi_0 pypi
h11 0.16.0 pypi_0 pypi
hf-transfer 0.1.9 pypi_0 pypi
hf-xet 1.1.10 pypi_0 pypi
httpcore 1.0.9 pypi_0 pypi
httptools 0.7.1 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.35.3 pypi_0 pypi
idna 3.11 pypi_0 pypi
importlib-metadata 8.0.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
ipykernel 7.0.1 pyha191276_0 conda-forge
ipython 9.6.0 pyhfa0c392_0 conda-forge
ipython_pygments_lexers 1.1.1 pyhd8ed1ab_0 conda-forge
jedi 0.19.2 pyhd8ed1ab_1 conda-forge
jinja2 3.1.6 pypi_0 pypi
jiter 0.11.1 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
jsonschema 4.25.1 pypi_0 pypi
jsonschema-specifications 2025.9.1 pypi_0 pypi
jupyter_client 8.6.3 pyhd8ed1ab_1 conda-forge
jupyter_core 5.9.1 pyhc90fa1f_0 conda-forge
lark 1.2.2 pypi_0 pypi
ld_impl_linux-64 2.44 h153f514_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuuid 1.41.5 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libxcb 1.17.0 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libzlib 1.3.1 hb25bd0a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llguidance 0.7.30 pypi_0 pypi
llvmlite 0.44.0 pypi_0 pypi
lm-format-enforcer 0.10.12 pypi_0 pypi
markdown-it-py 4.0.0 pypi_0 pypi
markupsafe 3.0.3 pypi_0 pypi
matplotlib-inline 0.1.7 pyhd8ed1ab_1 conda-forge
mdurl 0.1.2 pypi_0 pypi
mistral-common 1.8.5 pypi_0 pypi
modelscope 1.18.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.1.2 pypi_0 pypi
msgspec 0.19.0 pypi_0 pypi
multidict 6.7.0 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.5 h7934f7d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nest-asyncio 1.6.0 pyhd8ed1ab_1 conda-forge
networkx 3.5 pypi_0 pypi
ninja 1.13.0 pypi_0 pypi
numba 0.61.2 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-cufile-cu12 1.13.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
nvidia-ml-py 13.580.82 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvshmem-cu12 3.3.20 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
openai 2.5.0 pypi_0 pypi
openai-harmony 0.0.4 pypi_0 pypi
opencv-python-headless 4.11.0.86 pypi_0 pypi
openssl 3.0.18 hd6dcaed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opentelemetry-api 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-common 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-grpc 1.26.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-http 1.26.0 pypi_0 pypi
opentelemetry-proto 1.26.0 pypi_0 pypi
opentelemetry-sdk 1.26.0 pypi_0 pypi
opentelemetry-semantic-conventions 0.47b0 pypi_0 pypi
opentelemetry-semantic-conventions-ai 0.4.13 pypi_0 pypi
outlines 0.1.11 pypi_0 pypi
outlines-core 0.1.26 pypi_0 pypi
packaging 25.0 pyh29332c3_1 conda-forge
pandas 2.3.3 pypi_0 pypi
parso 0.8.5 pyhcf101f3_0 conda-forge
partial-json-parser 0.2.1.1.post6 pypi_0 pypi
peft 0.17.1 pypi_0 pypi
pexpect 4.9.0 pyhd8ed1ab_1 conda-forge
pickleshare 0.7.5 pyhd8ed1ab_1004 conda-forge
pillow 12.0.0 pypi_0 pypi
pip 25.2 pyhc872135_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
platformdirs 4.5.0 pyhcf101f3_0 conda-forge
prettytable 3.16.0 pypi_0 pypi
prometheus-client 0.23.1 pypi_0 pypi
prometheus-fastapi-instrumentator 7.1.0 pypi_0 pypi
prompt-toolkit 3.0.52 pyha770c72_0 conda-forge
propcache 0.4.1 pypi_0 pypi
protobuf 3.20.3 pypi_0 pypi
psutil 7.1.0 pypi_0 pypi
pthread-stubs 0.3 h0ce48e5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ptyprocess 0.7.0 pyhd8ed1ab_1 conda-forge
pure_eval 0.2.3 pyhd8ed1ab_1 conda-forge
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 21.0.0 pypi_0 pypi
pybase64 1.4.2 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.23 pypi_0 pypi
pydantic 2.12.3 pypi_0 pypi
pydantic-core 2.41.4 pypi_0 pypi
pydantic-extra-types 2.10.6 pypi_0 pypi
pyecharts 2.0.9 pypi_0 pypi
pygments 2.19.2 pyhd8ed1ab_0 conda-forge
python 3.12.12 hc23ea64_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.9.0.post0 pyhe01879c_2 conda-forge
python-dotenv 1.1.1 pypi_0 pypi
python-json-logger 4.0.0 pypi_0 pypi
python-multipart 0.0.20 pypi_0 pypi
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.3 pypi_0 pypi
pyzmq 27.1.0 py312hcf8288c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ray 2.50.1 pypi_0 pypi
readline 8.3 hc2a1206_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
referencing 0.37.0 pypi_0 pypi
regex 2025.9.18 pypi_0 pypi
requests 2.32.5 pypi_0 pypi
rich 13.9.4 pypi_0 pypi
rich-toolkit 0.15.1 pypi_0 pypi
rignore 0.7.1 pypi_0 pypi
rpds-py 0.27.1 pypi_0 pypi
s3transfer 0.14.0 pypi_0 pypi
safetensors 0.6.2 pypi_0 pypi
scipy 1.16.2 pypi_0 pypi
sentencepiece 0.2.1 pypi_0 pypi
sentry-sdk 2.42.0 pypi_0 pypi
setproctitle 1.3.7 pypi_0 pypi
setuptools 79.0.1 pypi_0 pypi
shellingham 1.5.4 pypi_0 pypi
shtab 1.7.2 pypi_0 pypi
simplejson 3.20.2 pypi_0 pypi
six 1.17.0 pyhe01879c_1 conda-forge
sniffio 1.3.1 pypi_0 pypi
soundfile 0.13.1 pypi_0 pypi
soxr 1.0.0 pypi_0 pypi
sqlite 3.50.2 hb25bd0a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
stack_data 0.6.3 pyhd8ed1ab_1 conda-forge
starlette 0.38.6 pypi_0 pypi
swanlab 0.6.12 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
tiktoken 0.12.0 pypi_0 pypi
tk 8.6.15 h54e0aa7_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tokenizers 0.21.4 pypi_0 pypi
torch 2.6.0 pypi_0 pypi
torchao 0.13.0 pypi_0 pypi
torchaudio 2.6.0 pypi_0 pypi
torchvision 0.21.0 pypi_0 pypi
tornado 6.5.1 py312h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tqdm 4.67.1 pypi_0 pypi
traitlets 5.14.3 pyhd8ed1ab_1 conda-forge
transformers 4.51.3 pypi_0 pypi
triton 3.2.0 pypi_0 pypi
trl 0.15.2 pypi_0 pypi
typeguard 4.4.4 pypi_0 pypi
typer 0.19.2 pypi_0 pypi
typing-inspection 0.4.2 pypi_0 pypi
typing_extensions 4.15.0 pyhcf101f3_0 conda-forge
tyro 0.9.35 pypi_0 pypi
tzdata 2025.2 pypi_0 pypi
unsloth 2025.10.6 pypi_0 pypi
unsloth-zoo 2025.10.7 pypi_0 pypi
urllib3 2.5.0 pypi_0 pypi
uvicorn 0.30.6 pypi_0 pypi
uvloop 0.22.1 pypi_0 pypi
vllm 0.8.5 pypi_0 pypi
watchfiles 1.1.1 pypi_0 pypi
wcwidth 0.2.14 pyhd8ed1ab_0 conda-forge
websockets 15.0.1 pypi_0 pypi
wheel 0.45.1 py312h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wrapt 1.17.3 pypi_0 pypi
xformers 0.0.29.post2 pypi_0 pypi
xgrammar 0.1.18 pypi_0 pypi
xorg-libx11 1.8.12 h9b100fa_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-libxau 1.0.12 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-libxdmcp 1.1.5 h9b100fa_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xorg-xorgproto 2024.1 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash 3.6.0 pypi_0 pypi
xz 5.6.4 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl 1.22.0 pypi_0 pypi
zeromq 4.3.5 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zipp 3.23.0 pyhd8ed1ab_0 conda-forge
zlib 1.3.1 hb25bd0a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
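
Since the full package list is long, a short summary of the packages most likely to matter for this traceback may be easier to compare across machines. The sketch below uses only the standard library; the selection in `KEY_PACKAGES` and the helper name `collect_versions` are assumptions chosen for illustration, not a list taken from the tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical selection: packages that appear in the traceback or are
# closely tied to GRPO training in the environment listed above.
KEY_PACKAGES = [
    "torch", "triton", "transformers", "trl", "peft",
    "unsloth", "unsloth-zoo", "vllm", "xformers", "bitsandbytes",
]


def collect_versions(packages):
    """Return {package name: installed version or 'not installed'}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    for name, ver in collect_versions(KEY_PACKAGES).items():
        print(f"{name:<15} {ver}")
```

Run inside the `grpo` environment, this prints one line per package, which can be pasted into the issue alongside the full list.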

Additional information

[Image attached to the original issue]

Verification

  • This issue hasn't been reported before
