Fix/vllm processor cache for text only model #359

Merged
hiyouga merged 3 commits into hiyouga:main from cyc00518:fix/vllm-processor-cache-for-text-only-model
Jun 17, 2025

Conversation

@cyc00518 (Contributor)

Description

When using a text-only model:

bash examples/qwen3_4b_math_grpo.sh

The following error occurs:

(WorkerDict pid=209039) Ulysses patch applied!
(WorkerDict pid=209039) LlamaForCausalLM contains 3.21B parameters.
(WorkerDict pid=209039) After huggingface model init: 1.97 GB / 79.19 GB.
(WorkerDict pid=209039) FSDP wrap policy: functools.partial(<function transformer_auto_wrap_policy at 0x7eeb8e6679a0>, transformer_layer_cls={<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>}).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 44.61it/s]
(WorkerDict pid=209230) [rank1]:[W617 09:12:58.873729887 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 1]  using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(WorkerDict pid=209039) After FSDP module init: 3.85 GB / 79.19 GB.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 128, in <module>
    main()
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 124, in main
    ray.get(runner.run.remote(ppo_config))
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2849, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 937, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::Runner.run() (pid=206413, ip=192.168.3.18, actor_id=4421ffe9a6e538f80a33443f01000000, repr=<main.Runner object at 0x7fac14760d30>)
  File "/workspace/data/EasyR1/verl/trainer/main.py", line 93, in run
    trainer.init_workers()
  File "/workspace/data/EasyR1/verl/trainer/ray_trainer.py", line 368, in init_workers
    self.actor_rollout_ref_wg.init_model()
  File "/workspace/data/EasyR1/verl/single_controller/ray/base.py", line 47, in func
    output = ray.get(output)
ray.exceptions.RayTaskError: ray::WorkerDict.actor_rollout_ref_init_model() (pid=209039, ip=192.168.3.18, actor_id=5d8d09101ae1c6ed53eb410701000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7eeb2d3a4490>)
  File "/workspace/data/EasyR1/verl/single_controller/ray/base.py", line 432, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/workspace/data/EasyR1/verl/single_controller/base/decorator.py", line 207, in inner
    return func(*args, **kwargs)
  File "/workspace/data/EasyR1/verl/workers/fsdp_workers.py", line 394, in init_model
    self._build_rollout()
  File "/workspace/data/EasyR1/verl/workers/fsdp_workers.py", line 333, in _build_rollout
    self.rollout = vLLMRollout(
  File "/workspace/data/EasyR1/verl/workers/rollout/vllm_rollout_spmd.py", line 87, in __init__
    self.inference_engine = LLM(
  File "/workspace/data/vllm/vllm/entrypoints/llm.py", line 243, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/workspace/data/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/workspace/data/vllm/vllm/engine/arg_utils.py", line 1022, in create_engine_config
    model_config = self.create_model_config()
  File "/workspace/data/vllm/vllm/engine/arg_utils.py", line 913, in create_model_config
    return ModelConfig(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_dataclasses.py", line 120, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, `disable_mm_preprocessor_cache` is only supported for multimodal models. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
(WorkerDict pid=209039) The original cause of the RayTaskError (<class 'pydantic_core._pydantic_core.ValidationError'>) isn't serializable: cannot pickle 'pydantic_core._pydantic_core.ArgsKwargs' object. Overwriting the cause to a RayError.
(WorkerDict pid=209230) The original cause of the RayTaskError (<class 'pydantic_core._pydantic_core.ValidationError'>) isn't serializable: cannot pickle 'pydantic_core._pydantic_core.ArgsKwargs' object. Overwriting the cause to a RayError. [repeated 7x across cluster]

cyc00518 added 2 commits June 17, 2025 23:47
…ache is only set for multimodal cases, avoiding errors with text-only models
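The idea behind the fix can be sketched as gating the flag on modality when assembling the vLLM engine arguments. This is a minimal illustrative sketch, not the PR's actual code; the `build_engine_kwargs` helper and the `is_multimodal` parameter are assumptions:

```python
def build_engine_kwargs(model_path: str, is_multimodal: bool) -> dict:
    """Assemble keyword arguments for vllm.LLM (hypothetical helper).

    vLLM rejects `disable_mm_preprocessor_cache` for text-only models
    with a pydantic ValidationError, so the flag is only included when
    the model is multimodal.
    """
    kwargs = {"model": model_path}
    if is_multimodal:
        kwargs["disable_mm_preprocessor_cache"] = True
    return kwargs
```

With this guard, a text-only run such as `qwen3_4b_math_grpo.sh` never passes the multimodal-only flag, so `ModelConfig` validation succeeds.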
@cyc00518 force-pushed the fix/vllm-processor-cache-for-text-only-model branch from a5b2bde to d37f6e0 on June 17, 2025 15:48
Comment thread: verl/utils/tokenizer.py (Outdated)

      """Create a huggingface pretrained processor."""
  -   processor = AutoProcessor.from_pretrained(model_path, **kwargs)
  +   try:
  +       processor = AutoProcessor.from_pretrained(model_path, **kwargs)
@hiyouga (Owner)
Because transformers falls back to loading the tokenizer when a processor does not exist, we can keep the old version.
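The owner's point is that `AutoProcessor.from_pretrained` does not raise for text-only models; it returns a tokenizer instead, so downstream code can branch on what came back rather than wrapping the call in try/except. A duck-typed sketch of that branch (the `normalize_processor` name and the `image_processor` attribute check are illustrative assumptions, not the PR's code):

```python
def normalize_processor(obj):
    """Treat a tokenizer fallback from AutoProcessor as 'no processor'.

    For text-only models, AutoProcessor.from_pretrained returns a
    tokenizer rather than raising; anything without an image_processor
    attribute is treated here as text-only so callers can branch on
    modality (e.g. to decide whether to pass multimodal-only flags).
    """
    return obj if hasattr(obj, "image_processor") else None
```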

Comment thread verl/workers/rollout/vllm_rollout_spmd.py
@hiyouga hiyouga merged commit b4566e0 into hiyouga:main Jun 17, 2025
1 check passed
hiyouga added a commit that referenced this pull request Oct 4, 2025
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
