SDXL CLIP encoder perf regression #37840

## Summary
After the commit [Respect MM throttle level in minimal_matmul](https://github.com/tenstorrent/tt-metal/commit/bc6d2860911b65ab3a96a38f4dc4dc3ec981b57b), the throttle is applied to the `minimal_matmul` operation. This affects all Linear layers in the CLIP encoder, since they run on a full 8×8 grid (64 cores), which is above the 48-core throttle threshold for WH. As a result, SDXL image generation is slower by `~8ms`.

## Minimal matmul on 40 cores vs 64 cores experiment results
```
pytest models/experimental/stable_diffusion_xl_base/tests/test_sdxl_perf.py::test_sdxl_perf_device[test_sdxl_clip_encoder_1] models/experimental/stable_diffusion_xl_base/tests/test_sdxl_perf.py::test_sdxl_perf_device[test_sdxl_clip_encoder_2]
```

**64 cores + throttle:**
- encoder_1: AVG DEVICE KERNEL DURATION [ns] 14663277.0 is outside of expected range (12915873.57, 13309250.429999998)
- encoder_2: AVG DEVICE KERNEL DURATION [ns] 70378029.0 is outside of expected range (62319927.74, 64863598.26)

**40 cores (below the throttle threshold, so no throttle):**
- encoder_1: AVG DEVICE KERNEL DURATION [ns] 11939170.0 is outside of expected range (12915873.57, 13309250.429999998)
- encoder_2: AVG DEVICE KERNEL DURATION [ns] 58080141.0 is outside of expected range (62319927.74, 64863598.26)

**64 cores, no throttle:**
- encoder_1: "AVG DEVICE KERNEL DURATION [ns]": 13163609.0
- encoder_2: "AVG DEVICE KERNEL DURATION [ns]": 63109927.0
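
To make the comparison easier to read, here is a small sketch that converts the durations above into deltas against the unthrottled 64-core run. The numbers are copied from the results in this issue; the dictionary layout and the helper name `delta_vs_baseline` are only illustrative and are not part of the test suite.

```python
# Average device kernel durations in nanoseconds, copied from the results above.
durations_ns = {
    "64 cores + throttle": {"encoder_1": 14663277.0, "encoder_2": 70378029.0},
    "40 cores": {"encoder_1": 11939170.0, "encoder_2": 58080141.0},
    "64 cores, no throttle": {"encoder_1": 13163609.0, "encoder_2": 63109927.0},
}

# The unthrottled 64-core run is used as the reference point.
BASELINE = "64 cores, no throttle"


def delta_vs_baseline(config):
    """Return {encoder: (delta_ms, delta_percent)} relative to the baseline run."""
    result = {}
    for encoder, ns in durations_ns[config].items():
        base = durations_ns[BASELINE][encoder]
        diff = ns - base
        result[encoder] = (diff / 1e6, 100.0 * diff / base)
    return result


if __name__ == "__main__":
    for config in durations_ns:
        if config == BASELINE:
            continue
        deltas = delta_vs_baseline(config)
        total_ms = sum(ms for ms, _ in deltas.values())
        for encoder, (ms, pct) in deltas.items():
            print(f"{config:22s} {encoder}: {ms:+.2f} ms ({pct:+.1f}%)")
        print(f"{config:22s} total:     {total_ms:+.2f} ms")
```

With these inputs the throttled run is about 1.5 ms slower on encoder_1 and about 7.3 ms slower on encoder_2 (roughly 8.8 ms combined), which is in line with the `~8ms` impact mentioned in the summary, while the 40-core run is faster than the baseline on both encoders.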

## Potential solutions
- Use a smaller core grid (≤48 cores) for CLIP encoder matmuls to stay below the throttle threshold. Initial testing with 8×5 (40 cores) shows it outperforms the 64-core setup, both with throttle and without it (encoder_1: 11.94 ms vs 13.16 ms unthrottled; encoder_2: 58.08 ms vs 63.11 ms unthrottled).

- Pass explicit throttle_level=0 via compute_kernel_config for CLIP encoder operations, since we haven't observed hangs without throttle on these shapes.
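
For the first option, the grid choice can be sketched as a plain-Python check. This is only an illustration under the assumption stated in this issue (grids with more than 48 cores are throttled on WH, grids with ≤48 cores are not); `is_throttled` and `largest_unthrottled_grid` are hypothetical helpers, not tt-metal or ttnn APIs.

```python
from typing import Optional, Tuple

# Assumption from the issue text: on WH, grids with more than 48 cores are
# throttled, and grids with <= 48 cores stay unthrottled.
THROTTLE_THRESHOLD_CORES = 48


def is_throttled(grid: Tuple[int, int],
                 threshold: int = THROTTLE_THRESHOLD_CORES) -> bool:
    """Return True if a (x, y) core grid exceeds the throttle threshold."""
    return grid[0] * grid[1] > threshold


def largest_unthrottled_grid(max_x: int = 8, max_y: int = 8,
                             threshold: int = THROTTLE_THRESHOLD_CORES
                             ) -> Optional[Tuple[int, int]]:
    """Return the rectangular grid with the most cores that is not throttled.

    Ties are resolved in favor of the first grid found (smaller x).
    Returns None if no grid fits under the threshold.
    """
    best = None
    for x in range(1, max_x + 1):
        for y in range(1, max_y + 1):
            cores = x * y
            if cores <= threshold and (best is None or cores > best[0] * best[1]):
                best = (x, y)
    return best


if __name__ == "__main__":
    print(is_throttled((8, 8)))        # full 8x8 grid used by the CLIP encoder today
    print(is_throttled((8, 5)))        # 8x5 grid used in the experiment above
    print(largest_unthrottled_grid())  # largest candidate under the assumed threshold
```

Only the 8×5 grid has been measured here. Any other grid this helper suggests (for example one with exactly 48 cores) would still need to be benchmarked before drawing conclusions.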

