Conversation

@rebel-jongho
Collaborator

Pull Request Description

Type of Change

  • New Model Support
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please describe):

Changes Overview

Motivation and Context

Checklist

  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective or that my feature works (If needed)

Additional Information

Related Issues


Conventional commit

type(optional scope): description

Type candidate

  • Model Updates
    • model: Add new models or fix bugs in existing models
      • ex) Add LlavaNext
      • ex) Bugfix Whisper
  • Enhancements
    • performance: Optimize specific models or the library itself
      • ex) Loading RBLNModel faster
      • ex) Optimizing Memory Usage of DecoderOnlyModel
  • Code Refactor
    • refactor: Rearrange the class architecture or make other structural changes
      • ex) Refactor Seq2Seq
  • Documentation
    • doc: Update docstring only
  • Library Dependencies
    • dependency: Update requirements or other dependency specifications
  • Other
    • other: None of the above
      • ex) ci update
      • ex) pdm update

@rebel-jongho requested a review from Copilot on September 8, 2025, 06:39
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the quantization layer creation system to use explicit quantized layer classes instead of dynamically modifying linear layers. The changes introduce separate QIntLinear and QFloatLinear classes with their own forward methods, replacing the previous approach of monkey-patching forward methods onto existing layers.

Key changes:

  • Introduction of explicit quantized layer classes (QLinear, QIntLinear, QFloatLinear)
  • Refactored layer creation methods to use these new classes instead of dynamic modification
  • Updated parameter handling to support proper data type management for scales
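
The contrast between the two approaches can be sketched as follows. This is only an illustration, not the code in this PR: to stay dependency-free it uses a tiny stand-in `Linear` class instead of `torch.nn.Linear`, the helper `patch_with_scale` is hypothetical, and the class bodies are assumptions. Only the class names `QLinear`, `QIntLinear`, and `QFloatLinear` come from the review summary.

```python
import types


class Linear:
    """Tiny stand-in for a linear layer without bias (illustration only)."""

    def __init__(self, weight):
        self.weight = weight  # list of rows: [out_features][in_features]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


# Previous style (hypothetical helper): patch a new forward onto an instance.
# The object is still a plain Linear, so its behavior is not visible in its type.
def patch_with_scale(layer, scales):
    def forward(self, x):
        return [
            s * sum(w * xi for w, xi in zip(row, x))
            for row, s in zip(self.weight, self.scales)
        ]

    layer.scales = scales
    layer.forward = types.MethodType(forward, layer)
    return layer


# Refactored style: explicit classes that own their forward method.
class QLinear(Linear):
    """Assumed common base: stored weights plus one scale per output row."""

    def __init__(self, weight, scales):
        super().__init__(weight)
        self.scales = scales

    def dequantized_weight(self):
        return [[w * s for w in row] for row, s in zip(self.weight, self.scales)]

    def forward(self, x):
        rows = self.dequantized_weight()
        return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


class QIntLinear(QLinear):
    """Assumption: weights are stored as integers."""

    def __init__(self, weight, scales):
        super().__init__([[int(w) for w in row] for row in weight], scales)


class QFloatLinear(QLinear):
    """Assumption: weights are stored as floats, scales kept as floats too."""

    def __init__(self, weight, scales):
        super().__init__(weight, [float(s) for s in scales])


patched = patch_with_scale(Linear([[1, 2], [3, 4]]), [0.5, 2.0])
explicit = QIntLinear([[1, 2], [3, 4]], [0.5, 2.0])
```

Both objects compute the same result for the same input, but only `explicit` can be recognized with `isinstance(explicit, QLinear)`; the patched layer remains an ordinary `Linear` whose modified behavior lives in an instance attribute.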

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
src/optimum/rbln/transformers/utils/rbln_quantization.py | Refactored quantization logic to use explicit layer classes and improved scale parameter handling
src/optimum/rbln/transformers/utils/qlinear.py | Added new quantized linear layer classes with explicit forward implementations


2 participants