Add CK-free fallback for fused QKNorm+RoPE+Cache by sunway513 · Pull Request #34 · sunway513/ATOM

sunway513 · 2026-03-09T02:47:03Z

Summary

- Wraps the fused_qk_norm_rope_cache_quant_shuffle call in rope_cache() (attention_mha.py) in a try/except block.
- On CK-free builds, where the fused HIP kernel is unavailable, execution falls through to the existing non-fused Triton path (rotary_emb + q/k_norm + reshape_and_cache) instead of crashing.
- Backs up qkv with qkv.clone() before the fused kernel call and restores it on failure, guarding against partial in-place writes.
- Logs the fallback warning only once, tracked via a class attribute, to avoid log spam on the hot path.
- Adds a q_norm is None guard on the middle branch so the original elif invariant still holds.
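A minimal sketch of the fallback pattern summarized above, assuming nothing about ATOM's internals. Buffer is a hypothetical stand-in for torch.Tensor that models only clone() and copy_(), so the example runs without PyTorch; fused_fn and non_fused_fn are hypothetical placeholders for the fused HIP kernel and the rotary_emb + q/k_norm + reshape_and_cache path. The q_norm is None routing guard is not modeled, and catching a broad Exception is an assumption because the PR does not name the exception type.

```python
import logging

logger = logging.getLogger("atom")


class Buffer:
    """Hypothetical stand-in for torch.Tensor; only clone() and copy_() are modeled."""

    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        # Independent copy, analogous to torch.Tensor.clone().
        return Buffer(self.data)

    def copy_(self, src):
        # In-place overwrite, analogous to torch.Tensor.copy_().
        self.data[:] = src.data
        return self


class RopeCacheFallbackSketch:
    # Class attribute shared by all instances, so the warning fires once per process.
    _fallback_warned = False

    def rope_cache(self, qkv, fused_fn, non_fused_fn):
        # Snapshot qkv in case the fused kernel writes part of it before raising.
        backup = qkv.clone()
        try:
            return fused_fn(qkv)
        except Exception as exc:  # assumption: the exception type is not specified
            qkv.copy_(backup)  # undo any partial in-place writes
            if not RopeCacheFallbackSketch._fallback_warned:
                logger.warning(
                    "Fused QKNorm+RoPE+Cache kernel unavailable (%s); "
                    "falling back to the non-fused path",
                    exc,
                )
                RopeCacheFallbackSketch._fallback_warned = True
            return non_fused_fn(qkv)
```

For example, a fused_fn that overwrites qkv.data[0] and then raises RuntimeError makes rope_cache() return the non-fused result with qkv.data restored, and only the first such failure logs a warning. In this sketch the clone is taken on every call, so the safety net costs one extra copy of qkv even when the fused kernel succeeds.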

Test Results

| # | Test | Result |
|---|------|--------|
| 1 | Import: `from atom.model_ops.attention_mha import PagedAttentionImpl` | PASS |
| 2 | Mock: fused kernel raises → fallback executes (rotary_emb, q_norm, k_norm all called; qkv restored; warning logged once) | PASS |
| 3 | E2E: Qwen3-0.6B with fused kernel disabled | BLOCKED (separate FMHA CK-free JIT issue) |
| 4 | Numerical: fused vs. non-fused cosine similarity | BLOCKED (same FMHA issue) |
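Test 4 is blocked, and the PR does not say how the comparison will be computed. A minimal sketch of the metric, assuming the fused and non-fused outputs are flattened into equal-length float sequences; the 0.999 threshold in outputs_match() is a hypothetical tolerance, not one taken from the PR. With real tensors, torch.nn.functional.cosine_similarity computes the same quantity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, flattened output vectors."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def outputs_match(fused_out, non_fused_out, threshold=0.999):
    # Hypothetical acceptance check: the threshold is an assumption.
    return cosine_similarity(fused_out, non_fused_out) >= threshold
```

Cosine similarity ignores overall scale, so a check like this would typically be paired with an absolute or relative error bound if magnitude drift also matters.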

Known Limitation

E2E tests are blocked by a separate AITER-side issue: JIT compilation of module_fmha_v3_varlen_fwd fails in CK-free builds because the ASM attention kernels still depend on CK-Tile headers (fmha_fwd.hpp). The rope_cache() fallback itself runs correctly through model load and warmup; the logs confirm that the individual module_rope_pos_fwd and module_cache modules loaded successfully.

Shengnan's team is working on removing the CK header dependency from ASM attention kernels in AITER. Once that lands, the full CK-free E2E path (this PR + FMHA fix) will be unblocked.

github-actions · 2026-03-09T02:47:18Z

atom/model_ops/attention_mha.py


+logger = logging.getLogger("atom")
+
 from atom.plugin.prepare import is_plugin_mode, is_vllm


⚠️ [ruff] <E402> (reported by reviewdog 🐶)
Module level import not at top of file

github-actions · 2026-03-09T02:47:18Z

atom/model_ops/attention_mha.py

+logger = logging.getLogger("atom")
+
 from atom.plugin.prepare import is_plugin_mode, is_vllm
 from atom.plugin.attention_mha import PagedAttentionImplDecoratorForPluginMode


⚠️ [ruff] <E402> (reported by reviewdog 🐶)
Module level import not at top of file
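Both E402 warnings come from the new logger assignment sitting above the atom.plugin imports: once a non-import statement appears at module level, ruff flags every module-level import after it. A sketch of an ordering that avoids this; the atom imports from the diff are shown as comments only because they need an ATOM install to run.

```python
import logging

# In attention_mha.py, the atom.plugin imports from the diff belong here, e.g.:
# from atom.plugin.prepare import is_plugin_mode, is_vllm
# from atom.plugin.attention_mha import PagedAttentionImplDecoratorForPluginMode

# Module-level statements come only after the last import, so ruff reports no E402.
logger = logging.getLogger("atom")
```

Adding `# noqa: E402` to each flagged import would also silence ruff, but moving the assignment below the imports keeps the module header conventional.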

Wrap fused_qk_norm_rope_cache_quant_shuffle in try-except so that CK-free builds gracefully fall through to the non-fused Triton path (rotary_emb + q/k_norm + reshape_and_cache) instead of crashing.

Key safety measures:
- qkv.clone() backup before the fused kernel call, restored on failure (protects against partial in-place writes before the exception)
- log-once warning via class attribute to avoid log spam
- q_norm is None guard on the middle path preserves the original elif invariant

sunway513 · 2026-03-09T04:13:37Z

@gyohuangxin @ZhiweiYan-96 @valarLip — requesting review on this CK-free fallback for rope_cache().

What this does: When fused_qk_norm_rope_cache_quant_shuffle (HIP fused kernel) is unavailable in CK-free builds, we catch the exception and fall through to the existing non-fused Triton path (individual rotary_emb → q/k_norm → reshape_and_cache). This is a companion fix to upstream PR ROCm#278 (CK-free Docker builds).

Safety measures: qkv.clone() backup before the fused call (restored on failure), log-once warning, and q_norm is None guard to preserve the original elif routing invariant.

Known limitation: Full E2E test is blocked by a separate AITER-side issue — the ASM attention kernels (module_fmha_v3_varlen_fwd) still depend on CK-Tile headers (fmha_fwd.hpp) even in CK-free builds. Shengnan's team is working on removing this dependency. Once that lands, the full CK-free inference path will be unblocked.

Unit tests (import + mock fallback with 5 assertions) pass cleanly.
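The PR reports the mock fallback test (Test 2) as passing but does not include its code. A sketch of how such a test could be written with unittest.mock, against a hypothetical, simplified FallbackSketch class rather than the real PagedAttentionImpl: qkv is a plain list, the fused kernel is a function that writes into qkv and then raises, and reshape_and_cache is omitted. The five checks mirror Test 2's description (rotary_emb, q_norm, and k_norm called; qkv restored; warning logged once).

```python
import logging
import unittest
from unittest import mock

logger = logging.getLogger("atom")


class FallbackSketch:
    """Hypothetical simplification of rope_cache(); qkv is a plain list here."""

    _warned = False  # class attribute, so the warning is logged once per process

    def __init__(self, fused_kernel, rotary_emb, q_norm, k_norm):
        self.fused_kernel = fused_kernel
        self.rotary_emb = rotary_emb
        self.q_norm = q_norm
        self.k_norm = k_norm

    def rope_cache(self, qkv):
        backup = list(qkv)  # stands in for qkv.clone()
        try:
            return self.fused_kernel(qkv)
        except Exception:
            qkv[:] = backup  # stands in for restoring qkv from the clone
            if not FallbackSketch._warned:
                logger.warning("Fused kernel unavailable; using non-fused path")
                FallbackSketch._warned = True
            # Non-fused path: individual ops (reshape_and_cache omitted here).
            self.rotary_emb(qkv)
            self.q_norm(qkv)
            self.k_norm(qkv)
            return qkv


class TestRopeCacheFallback(unittest.TestCase):
    def test_fused_failure_uses_non_fused_path(self):
        FallbackSketch._warned = False  # isolate from other tests

        def failing_kernel(qkv):
            qkv[0] = 999.0  # partial in-place write before the failure
            raise RuntimeError("HIP fused kernel not available")

        rotary_emb, q_norm, k_norm = mock.Mock(), mock.Mock(), mock.Mock()
        impl = FallbackSketch(failing_kernel, rotary_emb, q_norm, k_norm)
        qkv = [1.0, 2.0, 3.0]

        with self.assertLogs("atom", level="WARNING") as captured:
            impl.rope_cache(qkv)
            impl.rope_cache(qkv)  # a second failure must not log again

        rotary_emb.assert_called_with(qkv)
        q_norm.assert_called_with(qkv)
        k_norm.assert_called_with(qkv)
        self.assertEqual(qkv, [1.0, 2.0, 3.0])
        self.assertEqual(len(captured.records), 1)
```

Saved to a file, it runs with `python -m unittest`.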

sunway513 · 2026-03-09T04:33:47Z

Moved to upstream: ROCm#279

github-actions bot reviewed Mar 9, 2026


sunway513 force-pushed the feat/ckfree-fused-rope-fallback branch from 86890b8 to c1a866a on March 9, 2026 02:49

sunway513 force-pushed the feat/ckfree-fused-rope-fallback branch from c1a866a to 0ca5159 on March 9, 2026 03:58

sunway513 closed this Mar 9, 2026

sunway513 wants to merge 1 commit into main from feat/ckfree-fused-rope-fallback
