- Measure how different predicate distributions affect warp or wave efficiency.
- How much slowdown does divergent control flow cause when the amount of useful work stays roughly the same?
uniform_true, alternating, random_p25, random_p50, random_p75
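As a rough host-side illustration, the five patterns named above could be generated deterministically along the following lines. This is a sketch under assumptions, not the experiment's actual generator: `make_predicates`, `divergent_warp_fraction`, the seed value, the 32-lane warp size, and the reading of `alternating` as per-lane alternation and of `random_pNN` as "branch taken with probability NN%" are all hypothetical.

```python
import random

# Assumption: 32-lane warps; some GPUs use 64-lane waves instead.
WARP_SIZE = 32

PATTERNS = ["uniform_true", "alternating", "random_p25", "random_p50", "random_p75"]


def make_predicates(pattern, n, seed=1234):
    """Return n booleans (one per thread) for the named pattern."""
    if pattern == "uniform_true":
        return [True] * n
    if pattern == "alternating":
        # Assumed meaning: even-numbered threads take the branch.
        return [i % 2 == 0 for i in range(n)]
    if pattern.startswith("random_p"):
        # Assumed meaning: the suffix is the taken-probability in percent.
        p = int(pattern[len("random_p"):]) / 100.0
        rng = random.Random(seed)  # fixed seed keeps every run identical
        return [rng.random() < p for _ in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")


def divergent_warp_fraction(preds, warp_size=WARP_SIZE):
    """Fraction of warps whose lanes disagree on the predicate.

    A trailing partial warp, if any, is counted as a warp.
    """
    warps = [preds[i:i + warp_size] for i in range(0, len(preds), warp_size)]
    if not warps:
        return 0.0
    divergent = sum(1 for w in warps if any(w) and not all(w))
    return divergent / len(warps)


if __name__ == "__main__":
    for name in PATTERNS:
        preds = make_predicates(name, 1 << 16)
        taken = sum(preds) / len(preds)
        print(name, round(taken, 3), round(divergent_warp_fraction(preds), 3))
```

Under this simple model, independent per-lane random predicates leave almost every 32-lane warp mixed: the chance that a warp is uniform is p^32 + (1-p)^32, which is on the order of 1 in 10,000 even at p = 0.75. Whether a warp is mixed therefore depends far more on how predicates are laid out across lanes than on the taken-probability alone.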
- Run the same kernel body with controlled branch masks and deterministic predicate generation.
- Keep writes and arithmetic comparable across variants so that branch coherence is the dominant change.
- Median GPU time by branch pattern.
- Slowdown relative to the uniform baseline.
- Divergence sensitivity across several predicate mixes.
- Divergence penalties depend on pattern shape, not just branch probability.
- This experiment is strongest when read together with memory-behavior experiments that also vary access coherence.
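The median-time and slowdown metrics listed earlier can be derived from raw timing samples with a few lines of post-processing. The sketch below assumes a hypothetical input shape (a dict mapping each pattern name to a list of GPU times in milliseconds) and a hypothetical helper name, `summarize`; neither comes from the experiment itself.

```python
from statistics import median


def summarize(times_ms, baseline="uniform_true"):
    """Map each pattern to (median time, slowdown vs. the baseline median).

    times_ms: dict of pattern name -> list of timing samples (assumed ms).
    baseline: pattern whose median defines a slowdown of 1.0.
    """
    medians = {name: median(samples) for name, samples in times_ms.items()}
    base = medians[baseline]
    return {name: (m, m / base) for name, m in medians.items()}
```

Using the median rather than the mean keeps a single slow run from dominating the result, and dividing by the uniform baseline makes the patterns directly comparable as slowdown factors.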