
vllm_async_service: Inject custom output formatters into VLLMHandler #2819

@CoolFish88

Description


Hello,

Currently, the method preprocess_request() in VLLMHandler (vllm_async_service.py) initializes predefined stream and non-stream output formatters. When vllm_async_service is used as the entry point in AWS LMI containers, defining a custom_output_formatter in model.py (appropriately decorated with @output_formatter) does not override the output formatters set by the service.
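For context, a custom formatter in model.py follows the documented @output_formatter pattern. The sketch below is self-contained for illustration: it uses a stand-in decorator and a simplified TextGenerationOutput rather than the real djl_python imports (the exact import paths and class fields are assumptions, not the actual LMI API).

```python
import json
from dataclasses import dataclass, field

# Stand-in for the LMI @output_formatter decorator; in a real model.py this
# would be imported from the djl_python runtime (exact path is an assumption).
def output_formatter(func):
    func.is_output_formatter = True  # illustrative marker attribute
    return func

@dataclass
class TextGenerationOutput:
    # Simplified stand-in for the real TextGenerationOutput, which can hold
    # multiple generated sequences per text generation task.
    sequences: list = field(default_factory=list)

@output_formatter
def custom_output_formatter(output: TextGenerationOutput) -> str:
    # Wrap all generated sequences in a single JSON envelope.
    return json.dumps({"generations": output.sequences})
```

With the current handler, a formatter like this is honored for text generation tasks but silently ignored by vllm_async_service, which is the gap this issue describes.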

Current limitations: Specifying a custom output formatter is supported for text generation only, as stated in the documentation:

TextGenerationOutput: This subclass of RequestOutput is specific to text generation tasks. Right now this is the only task supported for custom output formatter. Each text generation task can generate multiple sequences.

The output formatters used by the current async service operate on a richer set of protocols, such as ChatCompletionResponse and CompletionResponse.

Will this change the current api? How?

The API will probably need to be adapted to accept user-supplied formatters.
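One possible shape for such an adaptation is sketched below; all names and signatures here are hypothetical and do not reflect VLLMHandler's actual internals. The idea is simply that the handler would prefer a user-supplied formatter and fall back to the predefined ones.

```python
from typing import Callable, Optional

# Hypothetical sketch of an injection point; the real VLLMHandler differs.
class VLLMHandlerSketch:
    def __init__(self) -> None:
        self._user_formatter: Optional[Callable] = None

    def register_output_formatter(self, formatter: Callable) -> None:
        # Called by the service when model.py supplies an @output_formatter.
        self._user_formatter = formatter

    def _default_formatter(self, response: dict) -> str:
        # Stand-in for the predefined ChatCompletionResponse /
        # CompletionResponse formatting the service performs today.
        return str(response)

    def resolve_formatter(self) -> Callable:
        # Prefer the user-supplied formatter; fall back to the default.
        return self._user_formatter or self._default_formatter
```

The same fallback logic would apply to the stream formatter, so existing deployments without a custom formatter keep their current behavior.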

Who will benefit from this enhancement?

Users who want finer control over the service output.

