Skip to content

Logic Flaw: IsAlwaysDuplicable violates may_duplicate_ semantics and ignores index-computation costs #37117

@red1239109-cmd

Description

@red1239109-cmd

InstructionFusion currently allows certain instructions (e.g., kBroadcast, widening kConvert) to be duplicated even when may_duplicate_ == false, via the IsAlwaysDuplicable exception logic.

This implementation weakens the semantic contract of may_duplicate_, introduces hidden policy exceptions, and ignores non-trivial index-computation and register-pressure costs—especially for high-rank broadcasts. While this may have originated as a performance heuristic, it now represents a structural logic flaw that complicates reasoning, maintenance, and future optimization decisions.

Problem 1: Semantic Violation of may_duplicate_

From a reader and maintainer perspective, the meaning of may_duplicate_ is straightforward:
"may_duplicate_ = false" implies that instruction duplication must not occur.

However, the current implementation violates this expectation:

bool IsAlwaysDuplicable(const HloInstruction& instruction) {
  return (instruction.opcode() == HloOpcode::kConvert && widening) ||
         instruction.opcode() == HloOpcode::kBroadcast;
}

This creates a policy exception path where duplication occurs even when explicitly disabled by the caller. The code itself acknowledges the inconsistency:

// TODO(jlebar): Duplicating instructions when we have a variable called "may
// duplicate" that's equal to false is not pretty.

This is not merely cosmetic:

  1. It breaks the semantic contract of the flag.
  2. It forces maintainers to remember undocumented exceptions.
  3. It makes fusion behavior harder to reason about, debug, and evolve.

In practice, may_duplicate_ no longer means "no duplication," but rather "no duplication except for a hardcoded whitelist."

Problem 2: Ignoring Index-Generation and Register Costs

The rationale for IsAlwaysDuplicable appears to be primarily memory-centric:
"Duplicating these ops reduces memory traffic."

This assumption is incomplete. "Memory-cheap" does not imply "compute-cheap."

For kBroadcast in particular, duplication can significantly increase compute cost:

  • High-rank broadcasts require repeated index reconstruction at every consumer.
  • Index computation often involves div/mod or equivalent arithmetic operations per element.
  • These operations increase register pressure, instruction count, and can reduce occupancy.

In fused kernels with large shapes, duplicating a broadcast can easily shift the bottleneck from memory to compute—especially on architectures where integer division/modulo is expensive. Yet, IsAlwaysDuplicable treats kBroadcast as universally safe, regardless of tensor rank, broadcast dimensionality, or backend-specific cost models.

Problem 3: Structural Smell in the Cost Model

Conceptually, this creates a layering issue:

  • may_duplicate_ is supposed to express a global fusion policy.
  • IsAlwaysDuplicable bypasses that policy with opcode-based exceptions.

Cost reasoning is currently split across flag semantics, opcode whitelists, and backend heuristics. This results in a system that is pattern-driven rather than policy-driven, which becomes increasingly brittle as the compiler supports more architectures.

Suggested Direction

This issue is raised for correctness and design clarity. Possible directions include:

  1. Enforce may_duplicate_ strictly: Remove the exception path. If the caller sets it to false, respect it.
  2. Restrict IsAlwaysDuplicable: Limit the exception to strictly scalar broadcasts where index computation is negligible.
  3. Explicit Policy: Replace opcode-based exceptions with an explicit, documented duplication policy or cost model hook.
  4. Cost-Awareness: Make duplication decisions depend on rank/index cost instead of opcode alone.

Why This Matters

This is not about a single missed optimization. It is about keeping compiler reasoning explicit, composable, and honest. Hidden exceptions accumulate technical debt, especially in passes like fusion where behavior is global, performance-sensitive, and highly backend-dependent. Surfacing this inconsistency helps prevent harder-to-debug performance regressions.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions