🚀 Summary of Changes

  • Add MPLinearKernel for RBLN, which supports compressed-tensors models with w4a16 channel/group quantization and w8a16 channel quantization. Compiler support will be included in the next release.
  • Monkey-patch the MPLinearKernel selector, since the upstream version doesn't support out-of-tree (OOT) kernel registration (see the sketch after this list).
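
A minimal sketch of the monkey-patch approach, assuming the upstream selector is `choose_mp_linear_kernel` in `vllm.model_executor.layers.quantization.kernels.mixed_precision`; `RBLNMPLinearKernel` and its import path are hypothetical stand-ins for the kernel added in this PR:

```python
# Hedged sketch: prefer the RBLN kernel when it can implement the requested
# quantization scheme, and otherwise defer to the upstream selector.
import vllm.model_executor.layers.quantization.kernels.mixed_precision as mp

from vllm_rbln.kernels import RBLNMPLinearKernel  # hypothetical import path

_upstream_choose = mp.choose_mp_linear_kernel

def _choose_mp_linear_kernel(config, compute_capability=None):
    # can_implement is assumed to return (bool, reason) as in upstream kernels.
    ok, _reason = RBLNMPLinearKernel.can_implement(config)
    if ok:
        return RBLNMPLinearKernel
    return _upstream_choose(config, compute_capability)

mp.choose_mp_linear_kernel = _choose_mp_linear_kernel
```

Patching the module attribute rather than forking the selector keeps the upstream selection logic available as a fallback for schemes the RBLN kernel can't handle.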

✅ Type of Change

  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (bug-fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

Run examples/experimental/offline_inference_basic.py with the model changed to a quantized model such as RedHatAI/phi-4-quantized.w4a16.
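
For reference, a hedged sketch of what such a run boils down to with the vLLM offline API; the actual script may set additional options:

```python
from vllm import LLM, SamplingParams

# Load a compressed-tensors quantized model (w4a16 group quant).
llm = LLM(model="RedHatAI/phi-4-quantized.w4a16")
params = SamplingParams(temperature=0.0, max_tokens=32)

for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```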

📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

