v1.7.0 FP8 support with chunked prefill
This release:
- 📚 Supports FP8 quantization with chunked prefill and prefix caching
- 🐛 Fixes a bug in our graph comparison tests on Spyre
- ⬆️ Adds vLLM 0.14.1 support
What's Changed
- ⬆️ Support vllm 0.14.1 by @joerunde in #663
- test: fix test_compare_graphs by @tjohnson31415 in #671
- ⚡ Implement fp8 with chunked prefill with static scaling by @joerunde in #661
Full Changelog: v1.6.1...v1.7.0