Debug: num_items_in_batch on a different device from loss. #147

Open · wants to merge 1 commit into main

Conversation

@KimRass commented on May 23, 2025

The code I committed will resolve errors such as:

Traceback (most recent call last):
    File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
    File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
    File "/home/eric/workspace/Qwen-SFT/sft_unsloth.py", line 121, in <module>
        trainer_stats = trainer.train()
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
        return inner_training_loop(
    File "<string>", line 315, in _fast_inner_training_loop
    File "<string>", line 31, in _unsloth_training_step
    File "/home/eric/workspace/Qwen-SFT/unsloth_compiled_cache/UnslothSFTTrainer.py", line 748, in compute_loss
        outputs = super().compute_loss(
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1043, in _unsloth_pre_compute_loss
        outputs = self._old_compute_loss(model, inputs, *args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/transformers/trainer.py", line 3801, in compute_loss
        outputs = model(**inputs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
        return forward_call(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/accelerate/utils/operations.py", line 818, in forward
        return model_forward(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/accelerate/utils/operations.py", line 806, in __call__
        return convert_to_fp32(self.model_forward(*args, **kwargs))
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
        return func(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/peft/peft_model.py", line 1757, in forward
        return self.base_model(
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
        return inner()
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1793, in inner
        result = forward_call(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
        return self.model.forward(*args, **kwargs)
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/accelerate/hooks.py", line 175, in new_forward
        output = module._old_forward(*args, **kwargs)
    File "/home/eric/workspace/Qwen-SFT/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py", line 1365, in forward
        return Qwen2_5_VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **loss_kwargs)
    File "/home/eric/workspace/Qwen-SFT/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py", line 1028, in Qwen2_5_VLForConditionalGeneration_forward
        loss = fused_linear_cross_entropy(
    File "/home/eric/workspace/venv/vitlp/lib/python3.10/site-packages/unsloth_zoo/loss_utils.py", line 188, in fused_linear_cross_entropy
        if num_items_in_batch is not None: loss = loss / num_items_in_batch
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
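
For context, the failing line in unsloth_zoo/loss_utils.py divides the loss by num_items_in_batch, which can land on a different GPU than the loss (cuda:0 vs. cuda:1 in the traceback) when the model is sharded across devices. A minimal sketch of the kind of device-alignment guard that resolves this, assuming num_items_in_batch may be passed as a tensor (the helper name _align_num_items is illustrative, not necessarily what the commit uses):

    import torch

    def _align_num_items(loss: torch.Tensor, num_items_in_batch):
        # num_items_in_batch may arrive as a tensor placed on a different GPU
        # (e.g. cuda:0) than the loss (e.g. cuda:1) when the model is split
        # across devices, which triggers the RuntimeError above.
        if num_items_in_batch is None:
            return loss
        if torch.is_tensor(num_items_in_batch):
            # Move the scalar count next to the loss before dividing.
            num_items_in_batch = num_items_in_batch.to(loss.device)
        return loss / num_items_in_batch

Dividing by a plain Python int already works regardless of device, so the .to() call only matters when the trainer passes num_items_in_batch as a tensor.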

@danielhanchen (Contributor)

Thanks @KimRass !

@Erland366 Was this relevant to the multi-GPU issues you were experiencing?

@Erland366 (Collaborator)

I already explained the problem in this comment; kindly check it and continue the discussion there -> #139 (comment)
