Checklist
Question Description
The Gemma-4 model family does not work with FFT (full fine-tuning) + DeepSpeed ZeRO-3.
I am using the official image: modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.9.1-py312-torch2.10.0-vllm0.19.1-modelscope1.35.4-swift4.1.3
I did not change any versions:
- transformers version: 5.6.2
- vllm version: 0.19.1
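For completeness, the versions shipped inside the image can be confirmed from within the container (a small sanity-check snippet, not part of the repro; package names are the usual PyPI distributions and may differ in other images):

```python
import importlib.metadata as md

# Print the installed version of each relevant package; packages that
# are not installed are reported instead of raising.
for pkg in ("transformers", "vllm", "deepspeed", "ms-swift"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")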
Issue 1:
[rank6]: AssertionError: Attempted to load weight (torch.Size([4096, 1024])) into parameter (torch.Size([0]))
[rank6]: Traceback (most recent call last):
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/cli/rlhf.py", line 7, in <module>
[rank6]: rlhf_main()
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/rlhf.py", line 246, in rlhf_main
[rank6]: return SwiftRLHF(args).main()
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/base.py", line 52, in main
[rank6]: result = self.run()
[rank6]: ^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/ray/base.py", line 168, in wrapper
[rank6]: return func(self, *args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/sft.py", line 189, in run
[rank6]: trainer = trainer_cls(
[rank6]: ^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/grpo_trainer.py", line 109, in __init__
[rank6]: self.prepare_rollout()
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 106, in prepare_rollout
[rank6]: self._prepare_vllm()
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 197, in _prepare_vllm
[rank6]: self.engine = self._prepare_vllm_engine()
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 251, in _prepare_vllm_engine
[rank6]: engine = GRPOVllmEngine(
[rank6]: self.model_executor = executor_class(vllm_config)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]: return func(*args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
[rank6]: self._init_executor()
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 167, in _init_executor
[rank6]: super()._init_executor()
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
[rank6]: self.driver_worker.load_model()
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
[rank6]: self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]: return func(*args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4751, in load_model
[rank6]: self.model = model_loader.load_model(
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]: return func(*args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
[rank6]: self.load_weights(model, model_config)
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]: return func(*args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
[rank6]: loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/gemma4_mm.py", line 1355, in load_weights
[rank6]: return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[rank6]: return original_load_weights(self, weights, *args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
[rank6]: autoloaded_weights = set(self._load_module("", self.module, weights))
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]: yield from self._load_module(
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]: yield from self._load_module(
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]: yield from self._load_module(
[rank6]: [Previous line repeated 3 more times]
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 311, in _load_module
[rank6]: yield from self._load_param(
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 225, in _load_param
[rank6]: weight_loader(param, weight_data)
[rank6]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 1249, in default_weight_loader
[rank6]: assert param.size() == loaded_weight.size(), (
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: AssertionError: Attempted to load weight (torch.Size([4096, 1024])) into parameter (torch.Size([0]))
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:03<?, ?it/s]
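My reading of this assertion (a minimal sketch of the shape mismatch, not the actual ZeRO-3 or vLLM internals): under DeepSpeed ZeRO-3 each rank holds only a shard of every parameter, so an ungathered parameter exposes a size-0 local tensor, and vLLM's default weight loader compares shapes before copying:

```python
import torch

# Size-0 placeholder, as ZeRO-3 leaves ungathered parameters on each rank.
placeholder = torch.nn.Parameter(torch.empty(0))
# A real tensor from a safetensors shard, matching the shape in the log.
checkpoint_weight = torch.randn(4096, 1024)

def default_weight_loader(param, loaded_weight):
    # Simplified version of the check in vllm's weight_utils.py.
    assert param.size() == loaded_weight.size(), (
        f"Attempted to load weight ({loaded_weight.size()}) "
        f"into parameter ({param.size()})")
    param.data.copy_(loaded_weight)

try:
    default_weight_loader(placeholder, checkpoint_weight)
except AssertionError as e:
    print(e)  # same message as in the traceback above
```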
Issue 2:
Then I added --vllm_limit_mm_per_prompt '{"image": 0, "audio": 0, "video": 0}' and got a new error:
[rank1]: NotImplementedError: Cannot copy out of meta tensor; no data!
[rank1]: Traceback (most recent call last):
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/cli/rlhf.py", line 7, in <module>
[rank1]: rlhf_main()
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/rlhf.py", line 246, in rlhf_main
[rank1]: return SwiftRLHF(args).main()
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/base.py", line 52, in main
[rank1]: result = self.run()
[rank1]: ^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/ray/base.py", line 168, in wrapper
[rank1]: return func(self, *args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/sft.py", line 189, in run
[rank1]: trainer = trainer_cls(
[rank1]: ^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/grpo_trainer.py", line 109, in __init__
[rank1]: self.prepare_rollout()
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 106, in prepare_rollout
[rank1]: self._prepare_vllm()
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 197, in _prepare_vllm
[rank1]: self.engine = self._prepare_vllm_engine()
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 251, in _prepare_vllm_engine
[rank1]: engine = GRPOVllmEngine(
[rank1]: ^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/infer_engine/vllm_engine.py", line 162, in __init__
[rank1]: self._prepare_engine()
[rank1]: File "/usr/local/lib/python3.12/site-packages/swift/infer_engine/vllm_engine.py", line 184, in _prepare_engine
[rank1]: engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: self._init_executor()
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 167, in _init_executor
[rank1]: super()._init_executor()
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
[rank1]: self.driver_worker.load_model()
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
[rank1]: self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4751, in load_model
[rank1]: self.model = model_loader.load_model(
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
[rank1]: model = initialize_model(
[rank1]: ^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
[rank1]: model = model_class(vllm_config=vllm_config, prefix=prefix)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/gemma4_mm.py", line 924, in __init__
[rank1]: self.vision_tower = AutoModel.from_config(config=config.vision_config)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 241, in from_config
[rank1]: return model_class._from_config(config, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1551, in _from_config
[rank1]: initialize_weights_zero3(model)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 325, in initialize_weights_zero3
[rank1]: _apply_zero3(model, model._initialize_weights)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 313, in _apply_zero3
[rank1]: _apply_zero3(child, fn)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 313, in _apply_zero3
[rank1]: _apply_zero3(child, fn)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 321, in _apply_zero3
[rank1]: fn(model_or_module, is_remote_code)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2408, in _initialize_weights
[rank1]: self._init_weights(module)
[rank1]: File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1457, in _init_weights
[rank1]: super()._init_weights(module)
[rank1]: File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2383, in _init_weights
[rank1]: init.copy_(module.inv_freq, buffer_value)
[rank1]: File "/usr/local/lib/python3.12/site-packages/transformers/initialization.py", line 162, in copy_
[rank1]: return tensor.copy_(other)
[rank1]: ^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/torch/utils/_device.py", line 109, in __torch_function__
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/torch/utils/_device.py", line 109, in __torch_function__
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: NotImplementedError: Cannot copy out of meta tensor; no data!
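The failing frame is a copy involving a tensor on the "meta" device, which carries shape/dtype metadata but no storage. A minimal sketch of the same failure mode (the exact tensors involved in transformers' `_init_weights` differ):

```python
import torch

# A meta tensor has shape and dtype but no backing data.
meta_buf = torch.empty(8, device="meta")
real_values = torch.arange(8, dtype=torch.float32)

try:
    # Reading from a meta tensor fails, since there is no data to copy.
    real_values.copy_(meta_buf)
except NotImplementedError as e:
    print(e)  # "Cannot copy out of meta tensor; no data! ..."
```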
Script to reproduce:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
--rlhf_type grpo \
--use_hf true \
--model google/gemma-4-E2B-it \
--dataset open-r1/DAPO-Math-17k-Processed \
--num_train_epochs 1 \
--max_steps 1000 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 2 \
--generation_batch_size 16 \
--num_generations 4 \
--reward_funcs accuracy format \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.5 \
--vllm_max_model_len 10240 \
--vllm_limit_mm_per_prompt '{"image": 0, "audio": 0, "video": 0}' \
--max_length 2048 \
--max_completion_length 8192 \
--tuner_type full \
--learning_rate 1e-6 \
--torch_dtype bfloat16 \
--beta 0.001 \
--importance_sampling_level sequence \
--rollout_importance_sampling_mode token_truncate \
--epsilon 0.2 \
--epsilon_high 0.2 \
--loss_type grpo \
--dynamic_sample false \
--overlong_filter true \
--async_generate false \
--offload_model true \
--offload_optimizer true \
--sleep_level 2 \
--logging_steps 1 \
--gradient_checkpointing true \
--dataloader_num_workers 8 \
--dataset_num_proc 8 \
--temperature 0.7 \
--padding_free false \
--freeze_vit true \
--log_completions true \
--eval_steps 1000 \
--save_steps 1000 \
--deepspeed zero3 \
--top_k 0 \
--top_p 1 \
--num_iterations 1