Fix/vllm processor cache for text only model #359

Merged
hiyouga merged 3 commits into hiyouga:main from cyc00518:fix/vllm-processor-cache-for-text-only-model
Jun 17, 2025

Conversation

@cyc00518 (Contributor)

Description

When using a text-only model:

bash examples/qwen3_4b_math_grpo.sh

The following error occurs:

(WorkerDict pid=209039) Ulysses patch applied!
(WorkerDict pid=209039) LlamaForCausalLM contains 3.21B parameters.
(WorkerDict pid=209039) After huggingface model init: 1.97 GB / 79.19 GB.
(WorkerDict pid=209039) FSDP wrap policy: functools.partial(<function transformer_auto_wrap_policy at 0x7eeb8e6679a0>, transformer_layer_cls={<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>}).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 44.61it/s]
(WorkerDict pid=209230) [rank1]:[W617 09:12:58.873729887 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 1]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(WorkerDict pid=209039) After FSDP module init: 3.85 GB / 79.19 GB.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 128, in <module>
    main()
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 124, in main
    ray.get(runner.run.remote(ppo_config))
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2849, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 937, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::Runner.run() (pid=206413, ip=192.168.3.18, actor_id=4421ffe9a6e538f80a33443f01000000, repr=<main.Runner object at 0x7fac14760d30>)
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 93, in run
    trainer.init_workers()
  File "/workspace/data/EasyR1/verl/trainer/ray_trainer.py", line 368, in init_workers
    self.actor_rollout_ref_wg.init_model()
  File "/workspace/data/EasyR1/verl/single_controller/ray/base.py", line 47, in func
    output = ray.get(output)
ray.exceptions.RayTaskError: ray::WorkerDict.actor_rollout_ref_init_model() (pid=209039, ip=192.168.3.18, actor_id=5d8d09101ae1c6ed53eb410701000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7eeb2d3a4490>)
  File "/workspace/data/EasyR1/verl/single_controller/ray/base.py", line 432, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/workspace/data/EasyR1/verl/single_controller/base/decorator.py", line 207, in inner
    return func(*args, **kwargs)
  File "/workspace/data/EasyR1/verl/workers/fsdp_workers.py", line 394, in init_model
    self._build_rollout()
  File "/workspace/data/EasyR1/verl/workers/fsdp_workers.py", line 333, in _build_rollout
    self.rollout = vLLMRollout(
  File "/workspace/data/EasyR1/verl/workers/rollout/vllm_rollout_spmd.py", line 87, in __init__
    self.inference_engine = LLM(
  File "/workspace/data/vllm/vllm/entrypoints/llm.py", line 243, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/workspace/data/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/workspace/data/vllm/vllm/engine/arg_utils.py", line 1022, in create_engine_config
    model_config = self.create_model_config()
  File "/workspace/data/vllm/vllm/engine/arg_utils.py", line 913, in create_model_config
    return ModelConfig(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_dataclasses.py", line 120, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, `disable_mm_preprocessor_cache` is only supported for multimodal models. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
(WorkerDict pid=209039) The original cause of the RayTaskError (<class 'pydantic_core._pydantic_core.ValidationError'>) isn't serializable: cannot pickle 'pydantic_core._pydantic_core.ArgsKwargs' object. Overwriting the cause to a RayError.
(WorkerDict pid=209230) The original cause of the RayTaskError (<class 'pydantic_core._pydantic_core.ValidationError'>) isn't serializable: cannot pickle 'pydantic_core._pydantic_core.ArgsKwargs' object. Overwriting the cause to a RayError. [repeated 7x across cluster]

cyc00518 added 2 commits June 17, 2025 23:47
…ache is only set for multimodal cases, avoiding errors with text-only models
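The idea behind the fix can be sketched as gating the flag on modality when assembling the vLLM engine arguments. This is a minimal illustrative sketch, not the PR's actual code; the `build_engine_kwargs` helper and the `is_multimodal` parameter are assumptions:

```python
def build_engine_kwargs(model_path: str, is_multimodal: bool) -> dict:
    """Assemble keyword arguments for vllm.LLM (hypothetical helper).

    vLLM rejects `disable_mm_preprocessor_cache` for text-only models
    with a pydantic ValidationError, so the flag is only included when
    the model is multimodal.
    """
    kwargs = {"model": model_path}
    if is_multimodal:
        kwargs["disable_mm_preprocessor_cache"] = True
    return kwargs
```

With this guard, a text-only run such as `qwen3_4b_math_grpo.sh` never passes the multimodal-only flag, so `ModelConfig` validation succeeds.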
@cyc00518 force-pushed the fix/vllm-processor-cache-for-text-only-model branch from a5b2bde to d37f6e0 on June 17, 2025 15:48
Comment thread: verl/utils/tokenizer.py (Outdated)

      """Create a huggingface pretrained processor."""
  -   processor = AutoProcessor.from_pretrained(model_path, **kwargs)
  +   try:
  +       processor = AutoProcessor.from_pretrained(model_path, **kwargs)
@hiyouga (Owner)
Because transformers falls back to loading the tokenizer when a processor does not exist, we can keep the old version.
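The owner's point is that `AutoProcessor.from_pretrained` does not raise for text-only models; it returns a tokenizer instead, so downstream code can branch on what came back rather than wrapping the call in try/except. A duck-typed sketch of that branch (the `normalize_processor` name and the `image_processor` attribute check are illustrative assumptions, not the PR's code):

```python
def normalize_processor(obj):
    """Treat a tokenizer fallback from AutoProcessor as 'no processor'.

    For text-only models, AutoProcessor.from_pretrained returns a
    tokenizer rather than raising; anything without an image_processor
    attribute is treated here as text-only so callers can branch on
    modality (e.g. to decide whether to pass multimodal-only flags).
    """
    return obj if hasattr(obj, "image_processor") else None
```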

Comment thread verl/workers/rollout/vllm_rollout_spmd.py
@hiyouga hiyouga merged commit b4566e0 into hiyouga:main Jun 17, 2025
1 check passed
hiyouga added a commit that referenced this pull request Oct 4, 2025
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
