Description
Hello,
Currently, the method preprocess_request() in VLLMHandler (vllm_async_service.py) initializes predefined stream and non-stream output formatters. When using vllm_async_service as the entry point in AWS LMI containers, defining a custom_output_formatter in model.py (appropriately decorated with @output_formatter) does not override the output formatters already set by the service.
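To make the desired behavior concrete, here is a minimal, self-contained sketch of the override logic being requested: a handler that prefers a user-registered formatter over its built-in default. All names here (Handler, register_formatter, the formatter signatures) are illustrative only and do not come from the actual djl_python codebase.

```python
# Hypothetical sketch: a service honoring a user-supplied output formatter.
# Names and signatures are illustrative, not the real djl_python API.

from typing import Callable, Optional


def default_stream_formatter(chunk: dict) -> str:
    # Stand-in for the predefined formatter set by the service.
    return f"data: {chunk['text']}\n\n"


class Handler:
    def __init__(self) -> None:
        self.custom_formatter: Optional[Callable[[dict], str]] = None

    def register_formatter(self, fn: Callable[[dict], str]) -> None:
        # Analogous to decorating a function with @output_formatter in model.py.
        self.custom_formatter = fn

    def format_output(self, chunk: dict) -> str:
        # Prefer the user-supplied formatter; fall back to the default.
        formatter = self.custom_formatter or default_stream_formatter
        return formatter(chunk)


handler = Handler()
handler.register_formatter(lambda chunk: chunk["text"].upper())
print(handler.format_output({"text": "hello"}))  # custom formatter wins
```

The key point is the fallback expression in format_output: today the async service effectively ignores the registered custom formatter, whereas the requested behavior is to check for it first.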
Current limitations: the ability to specify a custom output formatter is limited to text generation, as stated in the documentation:
> TextGenerationOutput: This subclass of RequestOutput is specific to text generation tasks. Right now this is the only task supported for custom output formatter. Each text generation task can generate multiple sequences.
The output formatters used by the current async service, however, operate on a richer set of protocols, such as ChatCompletionResponse and CompletionResponse.
Will this change the current API? How?
The API will probably need to be adapted to accept user-supplied formatters for these richer response protocols.
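One possible shape for such an adapted API is a formatter that receives a chat-completion-style response object rather than a TextGenerationOutput. The sketch below uses hypothetical dataclasses standing in for the real protocol classes (the actual ChatCompletionResponse fields may differ).

```python
# Hypothetical sketch of an adapted API: a custom formatter receiving a
# ChatCompletionResponse-like payload. Field names are illustrative and
# are not the actual djl_python protocol classes.

from dataclasses import dataclass
from typing import List


@dataclass
class ChatChoice:
    role: str
    content: str


@dataclass
class ChatCompletionLike:
    model: str
    choices: List[ChatChoice]


def custom_chat_formatter(resp: ChatCompletionLike) -> str:
    # User-defined: flatten the richer protocol to plain text,
    # one line per generated choice.
    return "\n".join(choice.content for choice in resp.choices)


resp = ChatCompletionLike(model="demo", choices=[ChatChoice("assistant", "hi")])
print(custom_chat_formatter(resp))  # -> hi
```

A service-side registry could then dispatch on the response type, so a user formatter for chat completions coexists with the default formatters for other protocols.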
Who will benefit from this enhancement?
Users who want finer control over the service output.