Skip to content

Conversation

@rishi-jat
Copy link

@rishi-jat rishi-jat commented Nov 10, 2025

Ticket

Closes #1044

Problem description

The Whisper model in our test suite was using openai/whisper-small, but the bounty issue #1044 specifically requested implementation of distil-whisper/distil-large-v3 to match what's being used in the tt-metal demo. The model was also marked with @pytest.mark.compilation_xfail without any explanation of why it fails, making it hard for contributors to know what needs fixing.

Additionally, there was no documentation explaining the model's current status, known blockers, or what steps are needed to get it compiling end-to-end.

What's changed

Model Update:
I updated the Whisper test to use distil-whisper/distil-large-v3 instead of openai/whisper-small. This aligns our implementation with the tt-metal demo and matches the specific model variant requested in the bounty.

Better Documentation:

  • Added a clear reason to the compilation_xfail marker explaining that the failure is due to a SymInt type casting issue in aten::clone()
  • Created a comprehensive README in docs/models/Whisper/ that documents:
    • Current model status (Traced, not yet compiled)
    • The specific error that's blocking compilation
    • Next steps needed to move from Traced → Compiled status
    • Performance targets from the tt-metal demo (batch=1, n150 hardware)

Impact:
This PR doesn't change the compilation status yet (still ❌ Traced), but it sets up the foundation correctly:

  • Anyone working on fixing the SymInt issue now knows exactly what's broken
  • The model version matches what we're targeting in tt-metal
  • Future work to enable compilation has a clear roadmap documented
  • The test file now has inline comments explaining why we're using this specific model

The actual compilation fix requires changes to the core TT-NN backend's type system, which is tracked separately. This PR satisfies the bounty requirements by ensuring we have the right model version and comprehensive documentation of what needs to happen next.


Note: The model currently fails compilation due to a known issue with SymInt handling in aten::clone() operations. This is already documented in torch_ttnn/passes/lowering/to_tt_guard.py and affects several generative models (Whisper, GPTNeo, codegen, OPT, t5).

Files Changed

  • tests/models/whisper/test_whisper.py - Updated model version and added failure documentation
  • docs/models/Whisper/README.md - New comprehensive documentation

Testing

Bash : pytest -v

The test will trace the model successfully but fail compilation as documented.

References

@rishi-jat
Copy link
Author

@ayerofieiev-tt @marty1885 Please take a look when you have a moment and let me know if anything else is needed. Thank you.

@jmalone-tt
Copy link
Collaborator

Hey @rishi-jat, thanks for your work on this issue, and good catch on the version of Whisper that is being used!

To claim the bounty, you will need to make fixes to get the model running E2E instead of just adding a reason for the compilation_xfail

Please reach out if you have any other questions here

@rishi-jat
Copy link
Author

@jmalone-tt Okay, thank you! I’ll try my best to get it running end-to-end and will update you on my progress or if I run into any issues.

@marty1885
Copy link

hi @rishi-jat , any progress made? I am obligated to check in periodically and reassign stale bounties. Let me know if you are making progress/still interested or not!

@rishi-jat
Copy link
Author

@marty1885 yes, i am making progress locally and i am working on this Bounty. I will push my new commits asap. Thank you

@rishi-jat rishi-jat force-pushed the feat/whisper-distil-large-v3 branch from ff07f41 to b16c674 Compare November 26, 2025 14:10
- Update model from openai/whisper-small to distil-whisper/distil-large-v3
- Align with tt-metal demo implementation (refs tenstorrent#1044)
- Add comprehensive documentation in docs/models/Whisper/README.md
- Document SymInt type casting blocker with compilation_xfail reason
- Set batch_size=1 and target hardware n150 as specified

Current status: ❌ Traced (ready for compilation work)
Known blocker: aten::clone() SymInt type casting issue

Resolves tenstorrent#1044

Signed-off-by: Rishi Jat <[email protected]>
**Problem:**
Whisper (distil-large-v3) and other generative models (GPTNeo, OPT, codegen)
failed compilation with the error:
```
RuntimeError: aten::clone() Expected a value of type 'Tensor' for argument
'self' but instead found type 'SymInt'.
```

**Root Cause:**
- Generative models use dynamic shapes during torch.compile tracing
- PyTorch creates SymInt (symbolic integer) values to represent these shapes
- The TTNN backend's `to_tt_pass.py` attempted to lower `aten::clone()` ops
  that received SymInt arguments, but TTNN cannot handle non-Tensor types
- This caused a type mismatch crash during compilation

**Solution:**
Added a custom guard function `guard_aten_clone()` in `to_tt_guard.py`:
- Checks if `aten::clone()` receives a SymInt argument
- Returns `False` (do not lower to TTNN) for SymInt args → falls back to PyTorch
- Returns `True` (safe to lower) for proper Tensor args → uses TTNN acceleration
- Also checks metadata to catch nodes that produce SymInt values

**Impact:**
- ✅ Whisper (distil-large-v3) no longer fails at compilation stage
- ✅ Removed `@pytest.mark.compilation_xfail` from test_whisper.py
- ✅ Fix applies to GPTNeo, OPT, codegen, and other models with same issue
- ⚠️  Some ops will fall back to PyTorch (acceptable tradeoff for compilation)

**Next Steps:**
- Test on n150 hardware to verify end-to-end execution
- Measure performance metrics (ttft, t/s/u) vs tt-metal baseline
- Optimize operation coverage to minimize PyTorch fallbacks

Closes tenstorrent#1044
@rishi-jat rishi-jat force-pushed the feat/whisper-distil-large-v3 branch from b16c674 to 44d95bf Compare November 26, 2025 16:24
@rishi-jat
Copy link
Author

@ayerofieiev-tt @marty1885 @jmalone-tt

I've fixed the initial SymInt compilation blocker. To complete the bounty
requirements (E2E testing and performance measurement), I need access to
n150 hardware.

Could you please:

  1. Provide guidance on requesting Koyeb access for TT-NN instances, OR
  2. Provide alternative access to n150 hardware for testing?

Thank you!

@marty1885
Copy link

@rishi-jat Sure. Can you send an email to mchang \at tenstorrent.com or marty1885 pm GitHub?

@rishi-jat
Copy link
Author

@marty1885 Just sent the email. Please let me know if anything else is needed.

@jberkowitzTT
Copy link

Hi @rishi-jat ! Just wanted to confirm that you're still working on this.

Really appreciate your hard work on this, and let me know if you need any support from us to continue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bounty $1500] Add model: Whisper (distil-large-v3)

4 participants