-
Notifications
You must be signed in to change notification settings - Fork 750
Description
InstructionFusion currently allows certain instructions (e.g., kBroadcast, widening kConvert) to be duplicated even when may_duplicate_ == false, via the IsAlwaysDuplicable exception logic.
This implementation weakens the semantic contract of may_duplicate_, introduces hidden policy exceptions, and ignores non-trivial index-computation and register-pressure costs—especially for high-rank broadcasts. While this may have originated as a performance heuristic, it now represents a structural logic flaw that complicates reasoning, maintenance, and future optimization decisions.
Problem 1: Semantic Violation of may_duplicate_
From a reader and maintainer perspective, the meaning of may_duplicate_ is straightforward:
"may_duplicate_ = false" implies that instruction duplication must not occur.
However, the current implementation violates this expectation:
bool IsAlwaysDuplicable(const HloInstruction& instruction) {
return (instruction.opcode() == HloOpcode::kConvert && widening) ||
instruction.opcode() == HloOpcode::kBroadcast;
}
This creates a policy exception path where duplication occurs even when explicitly disabled by the caller. The code itself acknowledges the inconsistency:
// TODO(jlebar): Duplicating instructions when we have a variable called "may
// duplicate" that's equal to false is not pretty.
This is not merely cosmetic:
- It breaks the semantic contract of the flag.
- It forces maintainers to remember undocumented exceptions.
- It makes fusion behavior harder to reason about, debug, and evolve.
In practice, may_duplicate_ no longer means "no duplication," but rather "no duplication except for a hardcoded whitelist."
Problem 2: Ignoring Index-Generation and Register Costs
The rationale for IsAlwaysDuplicable appears to be primarily memory-centric:
"Duplicating these ops reduces memory traffic."
This assumption is incomplete. "Memory-cheap" does not imply "compute-cheap."
For kBroadcast in particular, duplication can significantly increase compute cost:
- High-rank broadcasts require repeated index reconstruction at every consumer.
- Index computation often involves div/mod or equivalent arithmetic operations per element.
- These operations increase register pressure, instruction count, and can reduce occupancy.
In fused kernels with large shapes, duplicating a broadcast can easily shift the bottleneck from memory to compute—especially on architectures where integer division/modulo is expensive. Yet, IsAlwaysDuplicable treats kBroadcast as universally safe, regardless of tensor rank, broadcast dimensionality, or backend-specific cost models.
Problem 3: Structural Smell in the Cost Model
Conceptually, this creates a layering issue:
- may_duplicate_ is supposed to express a global fusion policy.
- IsAlwaysDuplicable bypasses that policy with opcode-based exceptions.
Cost reasoning is currently split across flag semantics, opcode whitelists, and backend heuristics. This results in a system that is pattern-driven rather than policy-driven, which becomes increasingly brittle as the compiler supports more architectures.
Suggested Direction
This issue is raised for correctness and design clarity. Possible directions include:
- Enforce may_duplicate_ strictly: Remove the exception path. If the caller sets it to false, respect it.
- Restrict IsAlwaysDuplicable: Limit the exception to strictly scalar broadcasts where index computation is negligible.
- Explicit Policy: Replace opcode-based exceptions with an explicit, documented duplication policy or cost model hook.
- Cost-Awareness: Make duplication decisions depend on rank/index cost instead of opcode alone.
Why This Matters
This is not about a single missed optimization. It is about keeping compiler reasoning explicit, composable, and honest. Hidden exceptions accumulate technical debt, especially in passes like fusion where behavior is global, performance-sensitive, and highly backend-dependent. Surfacing this inconsistency helps prevent harder-to-debug performance regressions.