@kinjalpatel27
Contributor

What does this PR do?

Type of change: Bug fix

Overview:
This PR disables quantization of conv1d layers, specifically linear_attn.conv1d (used in Qwen3-Next) and mixer.conv1d (used in Mamba models). These conv1d layers are not usually quantized, so they are now disabled explicitly.
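
As a rough illustration of what "disabling explicitly" could look like, here is a minimal, self-contained sketch that matches quantizer names against wildcard patterns. The pattern strings, the `{"enable": False}` entries, the helper name `is_quantization_enabled`, and the example module paths are all assumptions made for illustration; they are not taken from this PR's diff and may not match the library's actual config.

```python
from fnmatch import fnmatchcase

# Hypothetical disable-list: the patterns are derived from the module names
# mentioned in the overview (linear_attn.conv1d, mixer.conv1d). The dict shape
# is an assumption, not the library's confirmed configuration format.
DISABLED_PATTERNS = {
    "*linear_attn.conv1d*": {"enable": False},
    "*mixer.conv1d*": {"enable": False},
}


def is_quantization_enabled(quantizer_name: str) -> bool:
    """Return False if the name matches any disabled pattern, else True."""
    for pattern, cfg in DISABLED_PATTERNS.items():
        if fnmatchcase(quantizer_name, pattern):
            return cfg["enable"]
    return True


# Illustrative quantizer names; real module paths may differ.
names = [
    "model.layers.0.linear_attn.conv1d.weight_quantizer",
    "backbone.layers.3.mixer.conv1d.input_quantizer",
    "model.layers.0.mlp.gate_proj.weight_quantizer",
]
results = {name: is_quantization_enabled(name) for name in names}

for name, enabled in results.items():
    print(f"{name}: {'quantized' if enabled else 'skipped'}")
```

Under these assumptions, the two conv1d quantizers are skipped while the MLP projection stays quantized, which is the behavior the overview describes.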

Testing

python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-Next-80B-A3B-Thinking --qformat fp8 --export_fmt hf --dataset cnn_dailymail --export_path qwen3-next-80b-thinking-fp8-ptq--nokv --trust_remote_code --inference_pipeline_parallel 1 --batch_size 1 --calib_size 4 --kv_cache_qformat none

python /app/tensorrt_llm/examples/llm-api/quickstart_advanced.py --model_dir qwen3-next-80b-thinking-fp8-ptq--nokv --tp_size 1 --disable_kv_cache_reuse

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: NA
  • Did you update Changelog?: NA

Signed-off-by: Kinjal Patel <[email protected]>
@kinjalpatel27 kinjalpatel27 requested a review from a team as a code owner November 3, 2025 23:07
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.44%. Comparing base (b660d39) to head (9b9eb1a).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   73.39%   73.44%   +0.04%     
==========================================
  Files         180      180              
  Lines       18134    18147      +13     
==========================================
+ Hits        13310    13328      +18     
+ Misses       4824     4819       -5     

@kevalmorabia97 kevalmorabia97 merged commit d10acc0 into main Nov 4, 2025
33 of 41 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kinjal/qwen3-next branch November 4, 2025 05:46