
AsyncLLMServerManager.generate() outputs weird text (i.e., a mix of various languages) #4717

@LianShuQuan

Description

System Info

vllm 0.11.0
verl 0.6.1
torch 2.8.0
transformers 4.57.3
flash-attn 2.8.3
GPU: H100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  • When I deploy the model via vllm serve ${MODEL} and run inference through the openai-python API, the output is fine.
  • But when I use verl with the vllm backend, the output often contains weird text (i.e., a mix of various languages). The relevant verl code is below, followed by a sketch of the working openai-python baseline.
# Reuse the rollout manager created by the trainer and talk to its vLLM servers directly.
rollout_manager = RayPPOTrainer.async_rollout_manager
self.rollout_manager: AgentLoopManager = rollout_manager
self.server_manager = AsyncLLMServerManager(config, server_handles=rollout_manager.server_handles)

# Generate token ids for one request through the async server manager.
token_output: TokenOutput = await self.server_manager.generate(
    request_id=application_id,
    prompt_ids=request_prompt_ids,
    image_data=image_data,
    sampling_params=sampling_params,
)

    async def wake_up(self):
        """Wake up all rollout replica instances asynchronously."""
        await asyncio.gather(*[replica.wake_up() for replica in self.rollout_manager.rollout_replicas])

    async def sleep(self):
        """Sleep all rollout replica instances asynchronously."""
        await asyncio.gather(*[replica.sleep() for replica in self.rollout_manager.rollout_replicas])
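
For comparison, a minimal sketch of the working baseline (assuming the server was started with vllm serve ${MODEL} on the default port 8000; the prompt and sampling values below are placeholders, not taken from the issue):

# Working baseline: query the vLLM OpenAI-compatible server with the openai-python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="${MODEL}",  # same checkpoint that verl rolls out
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    temperature=0.7,  # placeholder sampling value
)
print(response.choices[0].message.content)  # output here is coherent, single-language text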

Expected behavior

  • When I deploy the model via vllm serve ${MY_SFT_MODEL} and run inference through the openai-python API, the output is fine.
  • The output generated through verl with the vllm backend should be equally coherent, instead of often containing weird text (i.e., a mix of various languages).
