Increased core count for paged SDPA for Qwen #37872

Open
atupe-tt wants to merge 1 commit into main from atupe/qwen-tg-core-optimization

atupe-tt commented Feb 13, 2026

Problem description

Improve decode performance for Qwen on TG.

What's changed

Increased the core count for paged SDPA (decode) from 32 to 48 by growing the compute grid from (8, 4) to (8, 6).

Checklist

  • All post-commit tests
  • Blackhole Post commit
  • cpp-unit-tests
  • New/Existing tests provide coverage for changes

Model tests

If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers models-mandatory and models-extended presets.
The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.

Copilot AI left a comment

Pull request overview

This PR improves decode performance for Qwen on TG by increasing the core count for paged SDPA (Scaled Dot-Product Attention) decode operations from 32 to 48 cores. The change aligns the Qwen-specific model configuration with the base Llama model configuration in model_config.py, which already uses these optimized settings.

Changes:

  • Increased compute grid size from (8, 4) to (8, 6) for PAGED_SDPA_DECODE_PROGCFG
  • Updated core count from 32 to 48 to match the new grid size (8 × 6 = 48)
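The relationship between the two changes above can be sketched in plain Python. This is only an illustration of the arithmetic, not the code in the PR: `SdpaDecodeGrid` and `check_core_count` are hypothetical names introduced here, and the real `PAGED_SDPA_DECODE_PROGCFG` entry is not reproduced. The sketch assumes the declared core count is expected to equal the product of the grid dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdpaDecodeGrid:
    """Hypothetical stand-in for the grid fields of the paged SDPA decode config."""

    grid_x: int
    grid_y: int

    @property
    def num_cores(self) -> int:
        # A rectangular compute grid exposes grid_x * grid_y cores.
        return self.grid_x * self.grid_y


def check_core_count(grid: SdpaDecodeGrid, declared_cores: int) -> int:
    """Return declared_cores if it matches the grid, otherwise raise ValueError."""
    if grid.num_cores != declared_cores:
        raise ValueError(
            f"grid ({grid.grid_x}, {grid.grid_y}) has {grid.num_cores} cores, "
            f"but {declared_cores} were declared"
        )
    return declared_cores


# Values taken from the PR overview: (8, 4) with 32 cores before, (8, 6) with 48 after.
old_cores = check_core_count(SdpaDecodeGrid(8, 4), 32)
new_cores = check_core_count(SdpaDecodeGrid(8, 6), 48)

print(f"{old_cores} -> {new_cores} cores ({new_cores / old_cores:.2f}x)")
# prints "32 -> 48 cores (1.50x)"
```

A check of this kind would catch the case where the grid is enlarged but the separately stored core count is left at its old value, which is why the two bullets above change together.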

yalrawwashTT left a comment

Approving to unblock, pending CI run of galaxy demo
