Speedup quantization in refit for FP8 GRPO #1467

@guyueh1

Is your feature request related to a problem? Please describe.
Track the effort to accelerate in-flight quantization during refit when vLLM uses FP8-precision weights.
