Qwen3-VL-8B-Instruct + fsdp: RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda #143

@ImmortalSdm

Description

I am training Qwen3-VL-8B-Instruct with FSDP in the PPO trainer, but I am hitting the error below.

Under normal circumstances, for the ref_model, FSDP has already placed self.pos_embed on CUDA by the time its forward pass is entered, so the pos_embed lookup executes without any issue and the computation runs correctly on CUDA. For the policy model, however, I observe the error.
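
For reference, the mismatch is easy to reproduce in isolation: an nn.Embedding whose weight lives on CUDA raises the same RuntimeError when it is indexed with a CPU tensor, which appears to be what happens to the visual pos_embed here (the dimensions below are arbitrary, not the real model's):

import torch
import torch.nn as nn

pos_embed = nn.Embedding(1024, 64).cuda()  # weight on cuda:0, like the gathered pos_embed
idx_tensor = torch.arange(16)              # index tensor left on CPU
pos_embed(idx_tensor)                      # RuntimeError: Expected all tensors to be on the same device ...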

My environment info:
transformers: 4.57.1; vllm: 0.11.0; verl: 0.7.0.dev0

[rank0]: Traceback (most recent call last):
[rank0]: File "python/ray/_raylet.pyx", line 1715, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1826, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1722, in ray._raylet.execute_task
[rank0]: File "python/ray/_raylet.pyx", line 1659, in ray._raylet.execute_task.function_executor
[rank0]: File "python/ray/_raylet.pyx", line 4340, in ray._raylet.CoreWorker.run_async_func_or_coro_in_event_loop
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]: return self.__get_result()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]: raise self._exception
[rank0]: File "python/ray/_raylet.pyx", line 4327, in async_func
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/_private/async_compat.py", line 52, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/_private/function_manager.py", line 693, in actor_method_executor
[rank0]: return method(__ray_actor, *args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 461, in _resume_span
[rank0]: return method(self, *_args, **_kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/single_controller/ray/base.py", line 700, in func
[rank0]: return getattr(self.worker_dict[key], name)(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/single_controller/base/decorator.py", line 442, in inner
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/transferqueue_utils.py", line 199, in dummy_inner
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/profile.py", line 256, in wrapper
[rank0]: return func(self_instance, *args, **kwargs_inner)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/fsdp_workers.py", line 1018, in compute_ref_log_prob
[rank0]: output, _ = self.ref_policy.compute_log_prob(data=data, calculate_entropy=False)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/performance.py", line 105, in f
[rank0]: return self.log(decorated_function, *args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/utils/profiler/performance.py", line 118, in log
[rank0]: output = func(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/actor/dp_actor.py", line 353, in compute_log_prob
[rank0]: entropy, log_probs = self._forward_micro_batch(
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/workers/actor/dp_actor.py", line 179, in _forward_micro_batch
[rank0]: output = self.actor_module(
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1827, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 261, in forward_with_normal_backend
[rank0]: outputs = self.model(input_ids, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 244, in qwen3_vl_base_forward
[rank0]: input_kwargs = _get_input_embeds(
[rank0]: File "/home/ma-user/work/test/Video-QAR/verl/verl/models/transformers/qwen3_vl.py", line 212, in _get_input_embeds
[rank0]: image_embeds, dummy_deepstack_image_embeds = model.visual(pixel_values, grid_thw=image_grid_thw)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 716, in forward
[rank0]: pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 685, in fast_pos_embed_interpolate
[rank0]: pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
[rank0]: return inner()
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1827, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 192, in forward
[rank0]: return F.embedding(
[rank0]: File "/home/ma-user/anaconda3/envs/video-rl/lib/python3.10/site-packages/torch/nn/functional.py", line 2546, in embedding
[rank0]: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank0]: RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
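
A possible temporary workaround (just a sketch, not a verified fix): register a forward pre-hook on the vision tower's pos_embed so the index tensor is moved to the embedding weight's device before the lookup. Here model is assumed to be the underlying Qwen3-VL module, and model.visual.pos_embed the nn.Embedding seen in the traceback:

def _move_index_to_weight_device(module, args):
    # move the index tensor onto whatever device the embedding weight lives on
    idx = args[0]
    if idx.device != module.weight.device:
        return (idx.to(module.weight.device),) + args[1:]
    return args

model.visual.pos_embed.register_forward_pre_hook(_move_index_to_weight_device)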
