Description
Hello,
Currently, the method preprocess_request() in VLLMHandler (vllm_async_service.py) initializes predefined stream and non-stream output formatters. When using vllm_async_service as the entry point in AWS LMI containers, defining a custom_output_formatter in model.py (appropriately decorated with @output_formatter) does not override the output formatters already set by the service.
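To make the desired behavior concrete, here is a minimal, self-contained sketch of the override logic being requested: a handler that prefers a user-registered formatter over its built-in default. All names here (Handler, register_formatter, the formatter signatures) are illustrative only and do not come from the actual djl_python codebase.

```python
# Hypothetical sketch: a service honoring a user-supplied output formatter.
# Names and signatures are illustrative, not the real djl_python API.

from typing import Callable, Optional


def default_stream_formatter(chunk: dict) -> str:
    # Stand-in for the predefined formatter set by the service.
    return f"data: {chunk['text']}\n\n"


class Handler:
    def __init__(self) -> None:
        self.custom_formatter: Optional[Callable[[dict], str]] = None

    def register_formatter(self, fn: Callable[[dict], str]) -> None:
        # Analogous to decorating a function with @output_formatter in model.py.
        self.custom_formatter = fn

    def format_output(self, chunk: dict) -> str:
        # Prefer the user-supplied formatter; fall back to the default.
        formatter = self.custom_formatter or default_stream_formatter
        return formatter(chunk)


handler = Handler()
handler.register_formatter(lambda chunk: chunk["text"].upper())
print(handler.format_output({"text": "hello"}))  # custom formatter wins
```

The key point is the fallback expression in format_output: today the async service effectively ignores the registered custom formatter, whereas the requested behavior is to check for it first.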
Current limitations: the ability to specify a custom output formatter is limited to text generation, as stated in the documentation:
> TextGenerationOutput: This subclass of RequestOutput is specific to text generation tasks. Right now this is the only task supported for custom output formatter. Each text generation task can generate multiple sequences.
The output formatters used by the current async service, however, operate on a richer set of protocols, such as ChatCompletionResponse and CompletionResponse.
Will this change the current API? How?
The API will probably need to be adapted to accept user-supplied formatters for these richer response protocols.
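One possible shape for such an adapted API is a formatter that receives a chat-completion-style response object rather than a TextGenerationOutput. The sketch below uses hypothetical dataclasses standing in for the real protocol classes (the actual ChatCompletionResponse fields may differ).

```python
# Hypothetical sketch of an adapted API: a custom formatter receiving a
# ChatCompletionResponse-like payload. Field names are illustrative and
# are not the actual djl_python protocol classes.

from dataclasses import dataclass
from typing import List


@dataclass
class ChatChoice:
    role: str
    content: str


@dataclass
class ChatCompletionLike:
    model: str
    choices: List[ChatChoice]


def custom_chat_formatter(resp: ChatCompletionLike) -> str:
    # User-defined: flatten the richer protocol to plain text,
    # one line per generated choice.
    return "\n".join(choice.content for choice in resp.choices)


resp = ChatCompletionLike(model="demo", choices=[ChatChoice("assistant", "hi")])
print(custom_chat_formatter(resp))  # -> hi
```

A service-side registry could then dispatch on the response type, so a user formatter for chat completions coexists with the default formatters for other protocols.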
Who will benefit from this enhancement?
Users who want finer control over the service output.