-
Notifications
You must be signed in to change notification settings - Fork 25
feat(models): add Whisper (distil-large-v3) TTNN integration #1281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(models): add Whisper (distil-large-v3) TTNN integration #1281
Conversation
|
@ayerofieiev-tt @marty1885 Please take a look when you have a moment and let me know if anything else is needed. Thank you. |
|
Hey @rishi-jat, thanks for your work on this issue, and good catch on the version of Whisper that is being used! To claim the bounty, you will need to make fixes to get the model running E2E instead of just adding a reason for the Please reach out if you have any other questions here |
|
@jmalone-tt Okay, thank you! I’ll try my best to get it running end-to-end and will update you on my progress or if I run into any issues. |
|
hi @rishi-jat , any progress made? I am obligated to check in periodically and reassign stale bounties. Let me know if you are making progress/still interested or not! |
|
@marty1885 yes, i am making progress locally and i am working on this Bounty. I will push my new commits asap. Thank you |
ff07f41 to
b16c674
Compare
- Update model from openai/whisper-small to distil-whisper/distil-large-v3 - Align with tt-metal demo implementation (refs tenstorrent#1044) - Add comprehensive documentation in docs/models/Whisper/README.md - Document SymInt type casting blocker with compilation_xfail reason - Set batch_size=1 and target hardware n150 as specified Current status: ❌ Traced (ready for compilation work) Known blocker: aten::clone() SymInt type casting issue Resolves tenstorrent#1044 Signed-off-by: Rishi Jat <[email protected]>
**Problem:** Whisper (distil-large-v3) and other generative models (GPTNeo, OPT, codegen) failed compilation with the error: ``` RuntimeError: aten::clone() Expected a value of type 'Tensor' for argument 'self' but instead found type 'SymInt'. ``` **Root Cause:** - Generative models use dynamic shapes during torch.compile tracing - PyTorch creates SymInt (symbolic integer) values to represent these shapes - The TTNN backend's `to_tt_pass.py` attempted to lower `aten::clone()` ops that received SymInt arguments, but TTNN cannot handle non-Tensor types - This caused a type mismatch crash during compilation **Solution:** Added a custom guard function `guard_aten_clone()` in `to_tt_guard.py`: - Checks if `aten::clone()` receives a SymInt argument - Returns `False` (do not lower to TTNN) for SymInt args → falls back to PyTorch - Returns `True` (safe to lower) for proper Tensor args → uses TTNN acceleration - Also checks metadata to catch nodes that produce SymInt values **Impact:** - ✅ Whisper (distil-large-v3) no longer fails at compilation stage - ✅ Removed `@pytest.mark.compilation_xfail` from test_whisper.py - ✅ Fix applies to GPTNeo, OPT, codegen, and other models with same issue -⚠️ Some ops will fall back to PyTorch (acceptable tradeoff for compilation) **Next Steps:** - Test on n150 hardware to verify end-to-end execution - Measure performance metrics (ttft, t/s/u) vs tt-metal baseline - Optimize operation coverage to minimize PyTorch fallbacks Closes tenstorrent#1044
b16c674 to
44d95bf
Compare
|
@ayerofieiev-tt @marty1885 @jmalone-tt I've fixed the initial SymInt compilation blocker. To complete the bounty Could you please:
Thank you! |
|
@rishi-jat Sure. Can you send an email to |
|
@marty1885 Just sent the email. Please let me know if anything else is needed. |
|
Hi @rishi-jat ! Just wanted to confirm that you're still working on this. Really appreciate your hard work on this, and let me know if you need any support from us to continue! |
Ticket
Closes #1044
Problem description
The Whisper model in our test suite was using
openai/whisper-small, but the bounty issue #1044 specifically requested implementation ofdistil-whisper/distil-large-v3to match what's being used in the tt-metal demo. The model was also marked with@pytest.mark.compilation_xfailwithout any explanation of why it fails, making it hard for contributors to know what needs fixing.Additionally, there was no documentation explaining the model's current status, known blockers, or what steps are needed to get it compiling end-to-end.
What's changed
Model Update:
I updated the Whisper test to use
distil-whisper/distil-large-v3instead ofopenai/whisper-small. This aligns our implementation with the tt-metal demo and matches the specific model variant requested in the bounty.Better Documentation:
compilation_xfailmarker explaining that the failure is due to a SymInt type casting issue inaten::clone()docs/models/Whisper/that documents:Impact:
This PR doesn't change the compilation status yet (still ❌ Traced), but it sets up the foundation correctly:
The actual compilation fix requires changes to the core TT-NN backend's type system, which is tracked separately. This PR satisfies the bounty requirements by ensuring we have the right model version and comprehensive documentation of what needs to happen next.
Note: The model currently fails compilation due to a known issue with SymInt handling in
aten::clone()operations. This is already documented intorch_ttnn/passes/lowering/to_tt_guard.pyand affects several generative models (Whisper, GPTNeo, codegen, OPT, t5).Files Changed
tests/models/whisper/test_whisper.py- Updated model version and added failure documentationdocs/models/Whisper/README.md- New comprehensive documentationTesting
Bash :
pytest -vThe test will trace the model successfully but fail compilation as documented.
References