v0.15.0rc1
Pre-release
This pre-release aligns vLLM-Omni with upstream vLLM v0.15.0.
Highlights
- Rebase to Upstream vLLM v0.15.0: vLLM-Omni is now fully aligned with the latest vLLM v0.15.0 core, bringing in all the latest upstream features, bug fixes, and performance improvements (#1159).
- Tensor Parallelism for LongCat-Image: We have added Tensor Parallelism (TP) support for the `LongCat-Image` and `LongCat-Image-Edit` models, significantly improving the inference speed and scalability of these vision-language models (#926). A hedged usage sketch follows this list.
- TeaCache Optimization: Introduced Coefficient Estimation for TeaCache, further refining the efficiency of our caching mechanisms for optimized generation (#940). See the second sketch after this list.
- Alignment & Stability: Error handling is now aligned with upstream vLLM (#1122), improving consistency of failure reporting.
- Update paper link: An initial paper on arXiv introduces our design and reports some performance test results (#1169).
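As a quick illustration of the new TP support, here is a minimal sketch assuming vLLM-Omni keeps vLLM's standard `tensor_parallel_size` engine argument; the model ID below is illustrative, and the exact multimodal generation call may differ from what is shown:

```python
# Minimal sketch: enabling tensor parallelism for LongCat-Image.
# Assumes vLLM-Omni keeps vLLM's standard engine arguments; the
# model ID is hypothetical and shown only for illustration.
from vllm import LLM

if __name__ == "__main__":
    llm = LLM(
        model="meituan-longcat/LongCat-Image",  # hypothetical model ID
        tensor_parallel_size=2,                 # shard weights across 2 GPUs
    )
```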
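For context on the TeaCache change, below is a minimal sketch of coefficient estimation as the TeaCache technique is generally described: fit a polynomial that predicts the relative change in model output from the relative change in the timestep embedding, then skip recomputation until the accumulated estimate crosses a threshold. Function names, the polynomial degree, and the threshold are illustrative assumptions, not vLLM-Omni's actual code:

```python
# Minimal sketch of TeaCache-style coefficient estimation (a reading of
# the general technique, not vLLM-Omni's implementation). Profiled L1
# relative distances are used to fit polynomial coefficients offline;
# at inference time the fitted polynomial decides when to reuse cache.
import numpy as np

def estimate_coefficients(emb_deltas, out_deltas, degree=4):
    """Fit a polynomial mapping relative timestep-embedding change to
    relative model-output change (both 1-D arrays of profiled values)."""
    return np.polyfit(emb_deltas, out_deltas, degree)

def should_recompute(coeffs, emb_delta, accumulated, threshold=0.1):
    """Accumulate the predicted output change; trigger a fresh forward
    pass (and reset the accumulator) once the estimate crosses the
    threshold, otherwise reuse the cached result."""
    accumulated += float(np.polyval(coeffs, emb_delta))
    if accumulated >= threshold:
        return True, 0.0
    return False, accumulated
```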
What's Changed
Features & Optimizations
- [TeaCache]: Add Coefficient Estimation by @princepride in #940
- [Feature] add Tensor Parallelism to LongCat-Image(-Edit) by @hadipash in #926
Alignment & Integration
- Dev/rebase v0.15.0 by @tzhouam in #1159
- [Misc] Align error handling with upstream vLLM v0.14.0 by @ceanna93 in #1122
- [Misc] Bump version to 0.14.0 by @ywang96 in #1128
Infrastructure (CI/CD) & Documentation
- [Doc] First stable release of vLLM-Omni by @ywang96 in #1129
- [CI]: Bagel E2E Smoked Test by @princepride in #1074
- [CI] Refactor test_sequence_parallel.py and add a warmup run by @mxuax in #1165
- [CI] Temporarily remove slow tests. by @congw729 in #1143
- [Debug] Clear Dockerfile.ci to accelerate build image by @tzhouam in #1172
- [Debug] Correct Unreasonable Long Timeout by @tzhouam in #1175
- [Docs] Update paper link by @hsliuustc0106 in #1169
Full Changelog: v0.14.0...v0.15.0rc1