Add support for priority in vllm backend #88

Open · wants to merge 5 commits into main

Conversation

@TheCodeWrangler commented Apr 24, 2025

Add Priority Request Support for vLLM Async Engine

Description

This PR adds support for priority-based request scheduling in the vLLM backend. When the engine is configured with its scheduler policy set to priority, vLLM's .generate() method accepts a priority argument, with lower priority values scheduled first. The model now exposes an optional priority input tensor (defaulting to 0), which is forwarded to the generate call.
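
For context, the engine itself also needs the priority policy enabled. Below is a minimal model.json sketch, assuming the deployed vLLM version exposes the scheduling_policy engine argument; the model name and memory setting are placeholders:

    {
        "model": "facebook/opt-125m",
        "gpu_memory_utilization": 0.9,
        "scheduling_policy": "priority"
    }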

Motivation

In applications where multiple sources submit work to the vLLM backend with different priorities, it is desirable to have the most time-sensitive work performed first. This feature allows users to:

  • Prioritize critical requests over background tasks
  • Implement different service level agreements (SLAs) for different types of requests
  • Better manage system resources by processing high-priority requests first

Changes

  1. Added an optional priority input tensor to the model configuration:

    {
        "name": "priority",
        "data_type": "TYPE_INT32",
        "dims": [1],
        "optional": True
    }
  2. Modified the _generate method to handle the priority parameter (see the sketch after this list):

    if priority is None:
        # Optional "priority" input was not provided; fall back to the default of 0.
        priority = 0
    response_iterator = self._llm_engine.generate(
        prompt, sampling_params, request_id, lora_request=lora_request, priority=priority
    )
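
For context, here is a minimal sketch of how the optional tensor could be read from a Triton request in the backend's model.py and converted to the integer passed above. The _get_priority helper is hypothetical; the pb_utils calls are the standard Python-backend API:

    import triton_python_backend_utils as pb_utils

    def _get_priority(request):
        """Return the request's priority as a plain int, defaulting to 0."""
        priority_tensor = pb_utils.get_input_tensor_by_name(request, "priority")
        if priority_tensor is None:
            # Optional input was not supplied by the client.
            return 0
        # dims is [1], so take the single INT32 element.
        return int(priority_tensor.as_numpy()[0])

get_input_tensor_by_name returns None for an optional input the client did not send, which is what triggers the default of 0.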

Testing

  • Added unit tests for priority handling
  • Verified that requests with different priorities are processed in the correct order
  • Confirmed that default priority (0) works when priority is not specified

Documentation

  • Updated model configuration documentation
  • Added examples of priority usage
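
For illustration, a sketch of client-side usage with the Triton gRPC streaming client; the model name vllm_model, the text_input and stream tensor names, and the localhost:8001 endpoint are assumptions based on the backend's existing examples and may differ in your deployment:

    import numpy as np
    import tritonclient.grpc as grpcclient

    MODEL_NAME = "vllm_model"  # assumed deployment name

    def build_inputs(prompt, priority):
        text = grpcclient.InferInput("text_input", [1], "BYTES")
        text.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=np.object_))
        stream = grpcclient.InferInput("stream", [1], "BOOL")
        stream.set_data_from_numpy(np.array([False], dtype=bool))
        prio = grpcclient.InferInput("priority", [1], "INT32")
        prio.set_data_from_numpy(np.array([priority], dtype=np.int32))
        return [text, stream, prio]

    def callback(result, error):
        if error is not None:
            print(error)
        else:
            print(result.as_numpy("text_output"))

    client = grpcclient.InferenceServerClient("localhost:8001")
    client.start_stream(callback=callback)
    # Lower values are scheduled first under the priority policy.
    client.async_stream_infer(MODEL_NAME, build_inputs("Time-sensitive prompt", 0))
    client.async_stream_infer(MODEL_NAME, build_inputs("Background prompt", 10))
    client.stop_stream()
    client.close()

When the scheduler has a backlog, the request with the lower priority value should be picked up first.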
