scopophobic commented Jan 27, 2026

Refactor FP8 dequantization and detection using registry pattern

Summary

Refactors the FP8 dequantization and detection logic to use a registry pattern, making FP8 support more explicit, maintainable, and extensible. This is a pure refactoring: no behavior changes, fully backward compatible.

What Changed

  • Added FP8_DEQUANT_REGISTRY with a @register_fp8_layer() decorator (sketched after this list)
  • Registered CompressedLinear and FP8Linear handlers
  • Refactored is_fp8_linear() to use registry-first detection
  • Refactored convert_fp8_layer_to_linear() to use registry dispatch
  • Updated convert_fp8_model_to_16b_model() to support all registered types
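
Below is a minimal, hypothetical sketch of the pattern. The names FP8_DEQUANT_REGISTRY, register_fp8_layer(), is_fp8_linear(), and convert_fp8_layer_to_linear() come from this PR, but the signatures and internals here are assumptions, not the actual implementation in auto_round/utils/model.py:

```python
from typing import Callable, Dict, Type

import torch

# Maps an FP8 layer class to a handler that dequantizes it into a
# plain 16-bit torch.nn.Linear.
FP8_DEQUANT_REGISTRY: Dict[Type[torch.nn.Module], Callable] = {}


def register_fp8_layer(layer_cls: Type[torch.nn.Module]):
    """Decorator factory: register a dequantization handler for an FP8 layer type."""

    def decorator(handler: Callable) -> Callable:
        FP8_DEQUANT_REGISTRY[layer_cls] = handler
        return handler

    return decorator


def is_fp8_linear(module: torch.nn.Module) -> bool:
    # Registry-first detection: a module counts as an FP8 linear only if
    # a handler is registered for its type, so detection and
    # dequantization cannot drift apart.
    return any(isinstance(module, cls) for cls in FP8_DEQUANT_REGISTRY)


def convert_fp8_layer_to_linear(module: torch.nn.Module, dtype=torch.bfloat16):
    # Registry dispatch: hand the module to the handler registered for
    # its type instead of walking a hard-coded isinstance chain.
    for cls, handler in FP8_DEQUANT_REGISTRY.items():
        if isinstance(module, cls):
            return handler(module, dtype=dtype)
    raise TypeError(f"No FP8 handler registered for {type(module).__name__}")
```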

Benefits

  • Explicit: Only registered layer types are supported
  • Consistent: Detection and dequantization stay in sync
  • Extensible: Adding a new FP8 type requires only a handler plus registration (see the example after this list)
  • Maintainable: Clear separation of concerns
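
To illustrate the extensibility point, this is roughly what registering a new type could look like. NewFP8Linear and its weight_scale attribute are hypothetical, used only to show the flow:

```python
@register_fp8_layer(NewFP8Linear)  # NewFP8Linear is a hypothetical FP8 layer class
def _dequant_new_fp8_linear(module, dtype=torch.bfloat16):
    linear = torch.nn.Linear(
        module.in_features,
        module.out_features,
        bias=module.bias is not None,
        dtype=dtype,
    )
    # Upcast the FP8 weight, apply its (assumed per-tensor) scale, then
    # cast down to the target 16-bit dtype.
    weight = (module.weight.to(torch.float32) * module.weight_scale).to(dtype)
    linear.weight.data.copy_(weight)
    if module.bias is not None:
        linear.bias.data.copy_(module.bias.to(dtype))
    return linear
```

With this single registration in place, is_fp8_linear() and convert_fp8_layer_to_linear() pick up the new type automatically, with no further changes.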

Files Changed

  • auto_round/utils/model.py (only file modified)

Type: Refactoring | Breaking: No | Migration: No

Signed-off-by: Adithyan Madhu <adithyanworkmail@gmail.com>
yiliu30 commented Jan 27, 2026

Hi @scopophobic, the CI is currently blocked due to the Transformers v5 upgrade. I’ll get back to you once it’s fixed.
