Concise Description:
vLLM v0.6.0 provides a 2.7x throughput improvement and a 5x latency reduction over the previous version (v0.5.3).
DLC image/dockerfile:
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-neuronx-sdk2.19.1
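For context, a minimal sketch of how one of these LMI DLC images can be deployed with the SageMaker Python SDK today; the model ID, environment keys, instance type, and endpoint name below are illustrative assumptions, not part of this request.

```python
import sagemaker
from sagemaker.model import Model

# Sketch: deploy the LMI (vLLM backend) DLC image to a SageMaker endpoint.
# Model ID, env keys, instance type, and endpoint name are placeholders.
session = sagemaker.Session()
role = sagemaker.get_execution_role()

image_uri = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "djl-inference:0.29.0-lmi11.0.0-cu124"
)

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        "OPTION_ROLLING_BATCH": "vllm",                         # assumed: select vLLM backend
        "OPTION_TENSOR_PARALLEL_DEGREE": "1",
    },
    sagemaker_session=session,
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder GPU instance
    endpoint_name="lmi-vllm-demo",
)
```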
Is your feature request related to a problem? Please describe.
Improve the performance of LMI containers.
Describe the solution you'd like
Update the vLLM library in LMI containers to v0.6.0.
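Once the container is updated, a quick sanity check (a sketch, assuming it is run inside the LMI container's Python environment) would confirm the bundled vLLM version:

```python
# Verify that the container ships vLLM v0.6.x after the proposed update.
import vllm

assert vllm.__version__.startswith("0.6."), f"unexpected vLLM version: {vllm.__version__}"
print("vLLM version:", vllm.__version__)
```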