Logic Flaw: IsAlwaysDuplicable violates may_duplicate_ semantics and ignores index-computation costs

InstructionFusion currently allows certain instructions (e.g., kBroadcast, widening kConvert) to be duplicated even when may_duplicate_ == false, via the IsAlwaysDuplicable exception logic.

This implementation weakens the semantic contract of may_duplicate_, introduces hidden policy exceptions, and ignores non-trivial index-computation and register-pressure costs—especially for high-rank broadcasts. While this may have originated as a performance heuristic, it now represents a structural logic flaw that complicates reasoning, maintenance, and future optimization decisions.

**Problem 1: Semantic Violation of may_duplicate_**

From a reader and maintainer perspective, the meaning of may_duplicate_ is straightforward:
"may_duplicate_ = false" implies that instruction duplication must not occur.

However, the current implementation violates this expectation:

```cpp
bool IsAlwaysDuplicable(const HloInstruction& instruction) {
  return (instruction.opcode() == HloOpcode::kConvert && widening) ||
         instruction.opcode() == HloOpcode::kBroadcast;
}

```

This creates a policy exception path where duplication occurs even when explicitly disabled by the caller. The code itself acknowledges the inconsistency:

```cpp
// TODO(jlebar): Duplicating instructions when we have a variable called "may
// duplicate" that's equal to false is not pretty.

```

This is not merely cosmetic:

1. It breaks the semantic contract of the flag.
2. It forces maintainers to remember undocumented exceptions.
3. It makes fusion behavior harder to reason about, debug, and evolve.

In practice, may_duplicate_ no longer means "no duplication," but rather "no duplication except for a hardcoded whitelist."

**Problem 2: Ignoring Index-Generation and Register Costs**

The rationale for IsAlwaysDuplicable appears to be primarily memory-centric:
"Duplicating these ops reduces memory traffic."

This assumption is incomplete. "Memory-cheap" does not imply "compute-cheap."

For kBroadcast in particular, duplication can significantly increase compute cost:

* High-rank broadcasts require repeated index reconstruction at every consumer.
* Index computation often involves div/mod or equivalent arithmetic operations per element.
* These operations increase register pressure, instruction count, and can reduce occupancy.

In fused kernels with large shapes, duplicating a broadcast can easily shift the bottleneck from memory to compute—especially on architectures where integer division/modulo is expensive. Yet, IsAlwaysDuplicable treats kBroadcast as universally safe, regardless of tensor rank, broadcast dimensionality, or backend-specific cost models.

**Problem 3: Structural Smell in the Cost Model**

Conceptually, this creates a layering issue:

* may_duplicate_ is supposed to express a global fusion policy.
* IsAlwaysDuplicable bypasses that policy with opcode-based exceptions.

Cost reasoning is currently split across flag semantics, opcode whitelists, and backend heuristics. This results in a system that is pattern-driven rather than policy-driven, which becomes increasingly brittle as the compiler supports more architectures.

**Suggested Direction**

This issue is raised for correctness and design clarity. Possible directions include:

1. Enforce may_duplicate_ strictly: Remove the exception path. If the caller sets it to false, respect it.
2. Restrict IsAlwaysDuplicable: Limit the exception to strictly scalar broadcasts where index computation is negligible.
3. Explicit Policy: Replace opcode-based exceptions with an explicit, documented duplication policy or cost model hook.
4. Cost-Awareness: Make duplication decisions depend on rank/index cost instead of opcode alone.

**Why This Matters**

This is not about a single missed optimization. It is about keeping compiler reasoning explicit, composable, and honest. Hidden exceptions accumulate technical debt, especially in passes like fusion where behavior is global, performance-sensitive, and highly backend-dependent. Surfacing this inconsistency helps prevent harder-to-debug performance regressions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logic Flaw: IsAlwaysDuplicable violates may_duplicate_ semantics and ignores index-computation costs #37117

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Logic Flaw: IsAlwaysDuplicable violates may_duplicate_ semantics and ignores index-computation costs #37117

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions