
Add CK-free fallback for fused QKNorm+RoPE+Cache#34

Closed
sunway513 wants to merge 1 commit into main from feat/ckfree-fused-rope-fallback


@sunway513
Owner

@sunway513 sunway513 commented Mar 9, 2026

Summary

  • Wraps fused_qk_norm_rope_cache_quant_shuffle in attention_mha.py:rope_cache() with try-except
  • On CK-free builds where the HIP fused kernel is unavailable, gracefully falls through to the existing non-fused Triton path (rotary_emb + q/k_norm + reshape_and_cache)
  • Backs up qkv with qkv.clone() before the fused kernel call and restores it on failure (protects against partial in-place writes)
  • Logs the fallback warning only once, tracked via a class attribute, to avoid log spam on the hot path
  • Adds a q_norm is None guard on the middle path to preserve the original elif invariant
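
The backup, restore, and log-once behaviour listed above can be sketched roughly as follows. This is a minimal, framework-free illustration under stated assumptions, not the code in attention_mha.py: `FakeTensor`, `RopeCacheSketch`, `_fused_kernel`, `_non_fused_path`, and `_fused_fallback_warned` are hypothetical names, and a list-backed object stands in for the real `qkv` tensor.

```python
import logging

logger = logging.getLogger("atom")


class FakeTensor:
    """Hypothetical stand-in for a tensor that supports clone() and copy_()."""

    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        return FakeTensor(self.data)

    def copy_(self, other):
        self.data[:] = other.data
        return self


class RopeCacheSketch:
    # Class attribute, so the warning fires once per process rather than per call.
    _fused_fallback_warned = False

    def __init__(self, fused_kernel, non_fused_path):
        self._fused_kernel = fused_kernel      # assumed to raise when unavailable
        self._non_fused_path = non_fused_path  # rotary_emb + q/k_norm + cache write

    def rope_cache(self, qkv):
        backup = qkv.clone()  # snapshot taken before any in-place write
        try:
            return self._fused_kernel(qkv)
        except Exception as exc:
            qkv.copy_(backup)  # undo any partial in-place write
            cls = type(self)
            if not cls._fused_fallback_warned:
                logger.warning(
                    "Fused kernel unavailable (%s); using non-fused path", exc
                )
                cls._fused_fallback_warned = True
            return self._non_fused_path(qkv)
```

In this sketch the clone is taken on every call, and the handler catches a broad `Exception`; a narrower exception type would be preferable wherever the failure mode is known.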

Related

Test Results

| # | Test | Result |
|---|------|--------|
| 1 | Import: from atom.model_ops.attention_mha import PagedAttentionImpl | PASS |
| 2 | Mock: fused kernel raises → fallback executes (rotary_emb, q_norm, k_norm all called, qkv restored, log-once) | PASS |
| 3 | E2E: Qwen3-0.6B with fused kernel disabled | BLOCKED (separate FMHA CK-free JIT issue) |
| 4 | Numerical: fused vs non-fused cosine similarity | BLOCKED (same FMHA issue) |
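
For reference, the cosine-similarity comparison in test 4 could look something like the sketch below once it is unblocked. The sample values and the 0.999 threshold are assumptions for illustration only; no numerical results exist for this PR yet, and real inputs would be the flattened outputs of the fused and non-fused paths.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length flat sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical outputs, not measured data.
fused_out = [0.10, -0.25, 0.40, 0.05]
non_fused_out = [0.10, -0.25, 0.41, 0.05]

similarity = cosine_similarity(fused_out, non_fused_out)
assert similarity > 0.999  # threshold is an assumption, not taken from the PR
```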

Known Limitation

E2E tests are blocked by a separate AITER-side issue: module_fmha_v3_varlen_fwd JIT compilation fails in CK-free builds because the ASM attention kernels still depend on CK-Tile headers (fmha_fwd.hpp). Our rope_cache fallback works correctly through model load + warmup (logs confirm individual module_rope_pos_fwd and module_cache loaded successfully).

Shengnan's team is working on removing the CK header dependency from ASM attention kernels in AITER. Once that lands, the full CK-free E2E path (this PR + FMHA fix) will be unblocked.


logger = logging.getLogger("atom")

from atom.plugin.prepare import is_plugin_mode, is_vllm

⚠️ [ruff] <E402> reported by reviewdog 🐶
Module level import not at top of file

logger = logging.getLogger("atom")

from atom.plugin.prepare import is_plugin_mode, is_vllm
from atom.plugin.attention_mha import PagedAttentionImplDecoratorForPluginMode

⚠️ [ruff] <E402> reported by reviewdog 🐶
Module level import not at top of file
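
E402 fires here because the `logger = ...` assignment sits above the imports. One common way to clear it is to move the assignment below the last module-level import. A minimal sketch, with a standard-library import standing in for the `atom.plugin` imports so that the snippet runs on its own:

```python
import logging
from collections import OrderedDict  # stand-in for the atom.plugin imports

# Module-level statements come after all imports, which satisfies E402.
logger = logging.getLogger("atom")
```

If the import order is intentional, a per-line `# noqa: E402` comment is the usual alternative.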

@sunway513 sunway513 force-pushed the feat/ckfree-fused-rope-fallback branch from 86890b8 to c1a866a Compare March 9, 2026 02:49
Wrap fused_qk_norm_rope_cache_quant_shuffle in try-except so that
CK-free builds gracefully fall through to the non-fused Triton path
(rotary_emb + q/k_norm + reshape_and_cache) instead of crashing.

Key safety measures:
- qkv.clone() backup before fused kernel call, restored on failure
  (protects against partial in-place writes before exception)
- log-once warning via class attribute to avoid log spam
- q_norm is None guard on middle path preserves original elif invariant
@sunway513 sunway513 force-pushed the feat/ckfree-fused-rope-fallback branch from c1a866a to 0ca5159 Compare March 9, 2026 03:58
@sunway513
Owner Author

@gyohuangxin @ZhiweiYan-96 @valarLip — requesting review on this CK-free fallback for rope_cache().

What this does: When fused_qk_norm_rope_cache_quant_shuffle (HIP fused kernel) is unavailable in CK-free builds, we catch the exception and fall through to the existing non-fused Triton path (individual rotary_emb → q/k_norm → reshape_and_cache). This is a companion fix to upstream PR ROCm#278 (CK-free Docker builds).

Safety measures: qkv.clone() backup before the fused call (restored on failure), log-once warning, and q_norm is None guard to preserve the original elif routing invariant.

Known limitation: Full E2E test is blocked by a separate AITER-side issue — the ASM attention kernels (module_fmha_v3_varlen_fwd) still depend on CK-Tile headers (fmha_fwd.hpp) even in CK-free builds. Shengnan's team is working on removing this dependency. Once that lands, the full CK-free inference path will be unblocked.

Unit tests (import + mock fallback with 5 assertions) pass cleanly.
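
For readers without access to the test file, a mock-based test with five assertions of this kind might be structured along the following lines. This is a hedged sketch rather than the actual test: `run_with_fallback` is a hypothetical stand-in for the guarded section of rope_cache(), a plain list stands in for `qkv`, and the attribute names on the mock are illustrative.

```python
from unittest.mock import MagicMock


def run_with_fallback(impl, qkv):
    """Hypothetical stand-in for the guarded section of rope_cache()."""
    backup = list(qkv)
    try:
        impl.fused_kernel(qkv)
    except Exception:
        qkv[:] = backup  # restore after a partial in-place write
        if not impl.warned:
            impl.log_warning("fused kernel unavailable")
            impl.warned = True
        impl.rotary_emb(qkv)
        impl.q_norm(qkv)
        impl.k_norm(qkv)


def corrupt_then_raise(qkv):
    qkv[0] = -1.0  # simulate a partial write before the failure
    raise RuntimeError("fused kernel not built")


impl = MagicMock()
impl.warned = False
impl.fused_kernel.side_effect = corrupt_then_raise

qkv = [1.0, 2.0, 3.0]
run_with_fallback(impl, qkv)
run_with_fallback(impl, qkv)  # second call checks the log-once behaviour

assert impl.rotary_emb.called           # 1: rotary_emb ran
assert impl.q_norm.called               # 2: q_norm ran
assert impl.k_norm.called               # 3: k_norm ran
assert qkv == [1.0, 2.0, 3.0]           # 4: qkv restored
impl.log_warning.assert_called_once()   # 5: warning logged once
```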

@sunway513
Owner Author

Moved to upstream: ROCm#279

@sunway513 sunway513 closed this Mar 9, 2026