
Gemma4 Support FFT + Zero3 #9255

@AceHao

Checklist

  • I have searched the existing issues and confirmed this is a new question or discussion topic.

Question Description

The Gemma-4 model family does not work with full fine-tuning (FFT) combined with DeepSpeed ZeRO-3.

I am using the official image: modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.9.1-py312-torch2.10.0-vllm0.19.1-modelscope1.35.4-swift4.1.3

I did not change any versions:

  • transformers version: 5.6.2
  • vllm version: 0.19.1

Issue 1:

[rank6]: AssertionError: Attempted to load weight (torch.Size([4096, 1024])) into parameter (torch.Size([0]))

[rank6]: Traceback (most recent call last):
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/cli/rlhf.py", line 7, in <module>
[rank6]:     rlhf_main()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/rlhf.py", line 246, in rlhf_main
[rank6]:     return SwiftRLHF(args).main()
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/base.py", line 52, in main
[rank6]:     result = self.run()
[rank6]:              ^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/ray/base.py", line 168, in wrapper
[rank6]:     return func(self, *args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/sft.py", line 189, in run
[rank6]:     trainer = trainer_cls(
[rank6]:               ^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/grpo_trainer.py", line 109, in __init__
[rank6]:     self.prepare_rollout()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 106, in prepare_rollout
[rank6]:     self._prepare_vllm()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 197, in _prepare_vllm
[rank6]:     self.engine = self._prepare_vllm_engine()
[rank6]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 251, in _prepare_vllm_engine
[rank6]:     engine = GRPOVllmEngine(

[rank6]:     self.model_executor = executor_class(vllm_config)
[rank6]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]:     return func(*args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
[rank6]:     self._init_executor()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 167, in _init_executor
[rank6]:     super()._init_executor()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
[rank6]:     self.driver_worker.load_model()
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
[rank6]:     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]:     return func(*args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4751, in load_model
[rank6]:     self.model = model_loader.load_model(
[rank6]:                  ^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]:     return func(*args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
[rank6]:     self.load_weights(model, model_config)
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank6]:     return func(*args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
[rank6]:     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
[rank6]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/gemma4_mm.py", line 1355, in load_weights
[rank6]:     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[rank6]:     return original_load_weights(self, weights, *args, **kwargs)
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
[rank6]:     autoloaded_weights = set(self._load_module("", self.module, weights))
[rank6]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]:     yield from self._load_module(
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]:     yield from self._load_module(
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
[rank6]:     yield from self._load_module(
[rank6]:   [Previous line repeated 3 more times]
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 311, in _load_module
[rank6]:     yield from self._load_param(
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 225, in _load_param
[rank6]:     weight_loader(param, weight_data)
[rank6]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 1249, in default_weight_loader
[rank6]:     assert param.size() == loaded_weight.size(), (
[rank6]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: AssertionError: Attempted to load weight (torch.Size([4096, 1024])) into parameter (torch.Size([0]))
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:03<?, ?it/s]
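For context, the `torch.Size([0])` in the assertion is the placeholder shape that DeepSpeed ZeRO-3 leaves behind on each rank after partitioning a parameter, so vLLM's weight loader trips when a real checkpoint tensor arrives. A minimal, hypothetical sketch of the failing comparison (the function name here is illustrative, not vLLM's actual code):

```python
import torch

# Illustrative stand-in for the shape check in vLLM's default weight
# loader: under DeepSpeed ZeRO-3, a partitioned parameter is left as a
# 0-size placeholder on each rank, so loading a real checkpoint tensor
# into it fails the size assertion.
def check_weight_shapes(param: torch.Tensor, loaded_weight: torch.Tensor) -> None:
    assert param.size() == loaded_weight.size(), (
        f"Attempted to load weight ({loaded_weight.size()}) "
        f"into parameter ({param.size()})")

placeholder = torch.nn.Parameter(torch.empty(0))  # ZeRO-3 partitioned placeholder
shard = torch.randn(4096, 1024)                   # real checkpoint weight
try:
    check_weight_shapes(placeholder, shard)
except AssertionError as err:
    print(err)  # Attempted to load weight (...) into parameter (torch.Size([0]))
```

This is only a sketch of the shape mismatch; in the real failure the placeholder is produced by ZeRO-3's parameter partitioning, not created manually.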

Issue 2:

Then I added --vllm_limit_mm_per_prompt '{"image": 0, "audio": 0, "video": 0}' and got a new error: [rank1]: NotImplementedError: Cannot copy out of meta tensor; no data!

[rank1]: Traceback (most recent call last):
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/cli/rlhf.py", line 7, in <module>
[rank1]:     rlhf_main()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/rlhf.py", line 246, in rlhf_main
[rank1]:     return SwiftRLHF(args).main()
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/base.py", line 52, in main
[rank1]:     result = self.run()
[rank1]:              ^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/ray/base.py", line 168, in wrapper
[rank1]:     return func(self, *args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/pipelines/train/sft.py", line 189, in run
[rank1]:     trainer = trainer_cls(
[rank1]:               ^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/grpo_trainer.py", line 109, in __init__
[rank1]:     self.prepare_rollout()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 106, in prepare_rollout
[rank1]:     self._prepare_vllm()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 197, in _prepare_vllm
[rank1]:     self.engine = self._prepare_vllm_engine()
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/rlhf_trainers/rollout_mixin.py", line 251, in _prepare_vllm_engine
[rank1]:     engine = GRPOVllmEngine(
[rank1]:              ^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/infer_engine/vllm_engine.py", line 162, in __init__
[rank1]:     self._prepare_engine()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/swift/infer_engine/vllm_engine.py", line 184, in _prepare_engine
[rank1]:     engine = llm_engine_cls.from_engine_args(self.engine_args)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:     self._init_executor()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 167, in _init_executor
[rank1]:     super()._init_executor()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
[rank1]:     self.driver_worker.load_model()
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
[rank1]:     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4751, in load_model
[rank1]:     self.model = model_loader.load_model(
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
[rank1]:     model = initialize_model(
[rank1]:             ^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
[rank1]:     model = model_class(vllm_config=vllm_config, prefix=prefix)
[rank1]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/gemma4_mm.py", line 924, in __init__
[rank1]:     self.vision_tower = AutoModel.from_config(config=config.vision_config)
[rank1]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 241, in from_config
[rank1]:     return model_class._from_config(config, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1551, in _from_config
[rank1]:     initialize_weights_zero3(model)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 325, in initialize_weights_zero3
[rank1]:     _apply_zero3(model, model._initialize_weights)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 313, in _apply_zero3
[rank1]:     _apply_zero3(child, fn)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 313, in _apply_zero3
[rank1]:     _apply_zero3(child, fn)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/integrations/deepspeed.py", line 321, in _apply_zero3
[rank1]:     fn(model_or_module, is_remote_code)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2408, in _initialize_weights
[rank1]:     self._init_weights(module)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1457, in _init_weights
[rank1]:     super()._init_weights(module)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2383, in _init_weights
[rank1]:     init.copy_(module.inv_freq, buffer_value)
[rank1]:   File "/usr/local/lib/python3.12/site-packages/transformers/initialization.py", line 162, in copy_
[rank1]:     return tensor.copy_(other)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/torch/utils/_device.py", line 109, in __torch_function__
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/torch/utils/_device.py", line 109, in __torch_function__
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]: NotImplementedError: Cannot copy out of meta tensor; no data!

Script to reproduce:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --use_hf true \
    --model google/gemma-4-E2B-it \
    --dataset open-r1/DAPO-Math-17k-Processed \
    --num_train_epochs 1 \
    --max_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --generation_batch_size 16 \
    --num_generations 4 \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.5 \
    --vllm_max_model_len 10240 \
    --vllm_limit_mm_per_prompt '{"image": 0, "audio": 0, "video": 0}' \
    --max_length 2048 \
    --max_completion_length 8192 \
    --tuner_type full \
    --learning_rate 1e-6 \
    --torch_dtype bfloat16 \
    --beta 0.001 \
    --importance_sampling_level sequence \
    --rollout_importance_sampling_mode token_truncate \
    --epsilon 0.2 \
    --epsilon_high 0.2 \
    --loss_type grpo \
    --dynamic_sample false \
    --overlong_filter true \
    --async_generate false \
    --offload_model true \
    --offload_optimizer true \
    --sleep_level 2 \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --temperature 0.7 \
    --padding_free false \
    --freeze_vit true \
    --log_completions true \
    --eval_steps 1000 \
    --save_steps 1000 \
    --deepspeed zero3 \
    --top_k 0 \
    --top_p 1 \
    --num_iterations 1
