Open
Labels
Low Precision, Performance (related to improving performance), deepseek (related to deepseek 671b)
Description
Is your feature request related to a problem? Please describe.
Track the effort to accelerate in-flight quantization during refit when vLLM uses FP8-precision weights.