Qwen3-VL-8B-Instruct + fsdp: RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda #143

@ImmortalSdm

Description

I am training Qwen3-VL-8B-Instruct with FSDP in the PPO trainer, but I am hitting the error below.

Under normal circumstances, for the ref_model, FSDP has already placed self.pos_embed on CUDA by the time its forward pass is entered, so the pos_embed lookup executes without any issue and the computation runs correctly on CUDA. For the policy model, however, I observe the error.
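
For reference, the mismatch is easy to reproduce in isolation: an nn.Embedding whose weight lives on CUDA raises the same RuntimeError when it is indexed with a CPU tensor, which appears to be what happens to the visual pos_embed here (the dimensions below are arbitrary, not the real model's):

import torch
import torch.nn as nn

pos_embed = nn.Embedding(1024, 64).cuda()  # weight on cuda:0, like the gathered pos_embed
idx_tensor = torch.arange(16)              # index tensor left on CPU
pos_embed(idx_tensor)                      # RuntimeError: Expected all tensors to be on the same device ...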

My environment info:
transformers: 4.57.1; vllm: 0.11.0; verl: 0.7.0.dev0

[rank0]: Traceback (most recent call last):
[rank0]: File "python/ray/_raylet.pyx", line 1715, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1826, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1722, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1659, in ray._raylet.execute_task.function_executor
[rank0]: File "python/ray/_raylet.pyx", line 4340, in ray._raylet.CoreWorker.run_async_func_or_coro_in_event_loop
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]: return self.__get_result()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]: raise self._exception
[rank0]: File "python/ray/_raylet.pyx", line 4327, in async_func
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/_private/async_compat.py", line 52, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/_private/function_manager.py", line 693, in actor_method_executor
[rank0]: return method(__ray_actor, *args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 461, in _resume_span
[rank0]: return method(self, *_args, **_kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/single_controller/ray/base.py", line 700, in func
[rank0]: return getattr(self.worker_dict[key], name)(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/single_controller/base/decorator.py", line 442, in inner
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/transferqueue_utils.py", line 199, in dummy_inner
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/profile.py", line 256, in wrapper
[rank0]: return func(self_instance, *args, **kwargs_inner)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/fsdp_workers.py", line 1018, in compute_ref_log_prob
[rank0]: output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/performance.py", line 105, in f
[rank0]: return self.log(decorated_function, *args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/performance.py", line 118, in log
[rank0]: output = func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/actor/dp_actor.py", line 353, in compute_log_prob
[rank0]: entropy, log_probs = self._forward_micro_batch(
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/actor/dp_actor.py", line 179, in _forward_micro_batch
[rank0]: output = self.actor_module(
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1827, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 261, in forward_with_normal_backend
[rank0]: outputs = self.model(input_ids, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 244, in qwen3_vl_base_forward
[rank0]: input_kwargs = _get_input_embeds(
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 212, in _get_input_embeds
[rank0]: image_embeds, dummy_deepstack_image_embeds = model.visual(pixel_values, grid_thw=image_grid_thw)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 716, in forward
[rank0]: pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 685, in fast_pos_embed_interpolate
[rank0]: pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1827, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 192, in forward
[rank0]: return F.embedding(
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/functional.py", line 2546, in embedding
[rank0]: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank0]: RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
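
A possible temporary workaround (just a sketch, not a verified fix): register a forward pre-hook on the vision tower's pos_embed so the index tensor is moved to the embedding weight's device before the lookup. Here model is assumed to be the underlying Qwen3-VL module, and model.visual.pos_embed the nn.Embedding seen in the traceback:

def _move_index_to_weight_device(module, args):
    # move the index tensor onto whatever device the embedding weight lives on
    idx = args[0]
    if idx.device != module.weight.device:
        return (idx.to(module.weight.device),) + args[1:]
    return args

model.visual.pos_embed.register_forward_pre_hook(_move_index_to_weight_device)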
