Releases: tenstorrent/tt-metal
v0.66.0-rc2
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21407568736
📦 Uncategorized
- update pandr
- PR: #36384
v0.65.1-rc17
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21304257849
- no changes
v0.66.0-rc1
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21264445638
📦 Uncategorized
- Fix reading into pinned memory on tunneled devices
- PR: #35810
- SFPI 7.16.0 168
- PR: #35849
- [skip ci] make workflow yaml as template for analyzing ND failures workflow
- PR: #35860
- [skip_ci] Add CODEOWNERS entry for llk_api/llk_sfpu
- PR: #35695
- [UMD Bump] Automated UMD Bump 08.01.2026
- PR: #35440
- [TT Transformers] DRAM Prefetcher Bring up on BH with Ring MM Unit test
- PR: #35709
- [skip ci] Remove docker-job subdirectory workaround (phase 1)
- PR: #35867
- [skip ci] Move install_uv and update create_venv
- PR: #35862
- Rename ops params and input structs
- PR: #35857
- fix(ttml): resolve nanobind duplicate type registration errors
- PR: #35423
- [skip ci] Add download-artifact-with-retry action to fix corrupted .deb downloads
- PR: #35853
- Add Qwen Image CI tests
- PR: #35800
- Adding pi0 model to TTNN
- PR: #35833
- Add support for other dtypes and L1 for multicore pad OP
- PR: #35869
- Revert changes to create_arange_vector_of_bfloat16
- PR: #35715
- Add support of choosing position_ids in testing MLA
- PR: #35789
- #35313 fix sdpa with attn sinks
- PR: #35817
- [UPSAMPLE] Add floating point scale factor support to TTNN upsample
- PR: #35508
- Move uv to base stage
- PR: #35896
- [TT-Transformers] Enable fused rotary and paged cache update ops in attention module
- PR: #35111
- Fix Wan postprocess spatial output
- PR: #35871
- Check for disallowed params combination in chunked SDPA
- PR: #35811
- Use distributed LN in TT-DiT models
- PR: #35831
- Add support for paged KV cache and chunked prefill to ring distributed sdpa
- PR: #35742
- migrate to HF cross attention vision transformer of mLlama
- PR: #35750
- Add checks for cgroup memory since Docker uses namespaces to limit things
- PR: #35450
- Allow docs deployment to be from main
- PR: #35910
- #0: Add actual device perf check in ops post commit
- PR: #35473
- Add user configurable max packet size to fabric
- PR: #35848
- L2 nightly test failure with ttnn.where()
- PR: #35879
- [Bug fix] Altering ALU config from TRISC0
- PR: #35090
- CCL Program Cache Updates
- PR: #35400
- Data Movement Program Cache Fixes
- PR: #35429
- [Fabric] Pkt hdr updates - support for upto 4X64 mesh
- PR: #35494
- Z Router device changes
- PR: #34561
- [skip ci] Delete ttnn/api/ttnn/Untitled
- PR: #35951
- Enable MeshWorkload in ttnn.generic op
- PR: #35323
- Fix Quasar FW compilation
- PR: #35926
- Allow Logical to Physical Pinnings in MGD
- PR: #34996
- Fix BH LB shapes for prefetcher unit test
- PR: #35909
- Fix attention dense out matmul on BH LB
- PR: #35875
- Issue #34453 : send_next_data no longer requires remote_receiver_buffer_index
- PR: #35460
- Make noc_mode global in brisc.cc
- PR: #35781
- Created matmul lab 1 for universities
- PR: #35456
- Remove cmake version check from CI
- PR: #35965
- Update trace region size for new reduce scatter binary requirements
- PR: #35935
- Topology Solver: Index Data Structures
- PR: #34386
- Reduce compile time by 44%
- PR: #35943
- [UMD Bump] Automated UMD Bump 15.01.2026
- PR: #35881
- Add initial prototype of fusion and new matmul for Deepseek Blitz
- PR: #34088
- Fix reflection operator<< ordering for unity builds
- PR: #35874
- Upgrade default toolchain from Clang 17 to Clang 20
- PR: #35887
- Remove VC0 from template parameter name
- PR: #35959
- Add unit test for LazyWeight
- PR: #35569
- fix qwen image vae device config
- PR: #35947
- Matmul - Add Initial Deepseek MLA Pytests
- PR: #35961
- #35637: Correct and move tt-metal and tt-nn tutorials from custom to main repository
- PR: #35723
- Deprecate public API variants of MeshDevice::[get_device|is_local] and replace them with internal calls where possible
- PR: #32823
- [tt-train] Maximal core grid in tt-train matmuls
- PR: #35519
- Suggesting uv pip to install requirements if possible
- PR: #35984
- [Conv_transpose2d] Added nightly ulp test
- PR: #35981
- DeepSeek V3: Add progress bars to model state creation
- PR: #35894
- Adding operation and runtime info logs to use with triage
- PR: #33666
- fix sub-optimal digamma decomposition
- PR: #35940
- Update triton to existing distributed whl version
- PR: #35882
- [skip ci] Set watcher to 120s
- PR: #35989
- [skip ci] Disable QBGE in CI to unblock pipelines
- PR: #35995
- Modify Quasar DM cores FW build to only build a single FW binary that is shared amongst all the DM cores
- PR: #35496
- Change the padding from optional to float
- PR: #35785
- Refactor create_venv and install_uv
- PR: #35933
- tighten t3k unit test timeouts
- PR: #35982
- [skip ci] #0: Install uv into basic dev image because now it's being used for Python jobs (tutorials nightly L2)
- PR: #35908
- Small fixes for 2xBH QBAE config files
- PR: #36014
- [skip ci] Move BH-LB from pipeline-perf label to pipeline-functional
- PR: #35934
- Updated ownership of new kernel files under ttnn
- PR: #35927
- reduce t3k demo timeouts
- PR: #36007
- update galaxy test timeouts
- PR: #36010
- Patch pipelines using pip
- PR: #36044
- Int8 Matmul test
- PR: #35570
- patch clang tidy setup
- PR: #36050
- [skip ci] Update stable_diffusion README
- PR: #35939
- [skip ci] Create TTML onboarding guide
- PR: #36021
- [skip ci] Updated URLs to point to https://raw.githubusercontent.com
- PR: #36018
- Mock device
- PR: #34194
- Add ethernet bw utilization metric from npe to tracy performance report
- PR: #35967
- TTML python module: add ModuleBase interface abstract base class for composing TTML Modules in python
- PR: #34122
- #35532: Deepseek blitz RoPE
- PR: #35870
- Refactor tensor operations
- PR: #35798
- Fix T3K unit tests due to changed Attention class arguments + Disable qk fused ops for VL models
- PR: #35994
- Bump ttsim version to v1.3.0
- PR: #36047
- Put all device operations into ttnn::prim
- PR: #35888
- Fix assertion in UDM gtests
- PR: #36057
- SDXL Enabling TP=2 on Galaxy
- PR: #35988
- PDL 20 core: Update PCC values and perf due to #35494
- PR: #35992
- [skip ci] Remove quick links from README
- PR: #36062
- Topology Solver: Search Heuristics and Consistency Checking
- PR: #34387
- Get our Lead Models (Deepseekv3) running on its own scheduled sweep run
- PR: #35793
- update uv to latest release
- PR: #36016
- Fix ring on dim 1 in mesh graph
- PR: #36059
- [skip ci] Improve ttml onboarding guide
- PR: #36067
- Updated QSR FW files to the newest version
- PR: #35955
- [skip ci] Fix previous mistake in code-analysis.yaml
- PR: #36072
- [skip ci] Delete cloc.sh
- PR: #36071
- Bump tt-exalens version
- PR: #35834
- [skip ci] allow venv override for tt_bisect test
- PR: #36081
- ttnn::typecast row-major support
- PR: #34596
- test_mla_2d & test_decoder_block for deepseek v3 now compatible up to 128k tokens
- PR: #35948
- [UMD Bump] Automated UMD Bump 19.01.2026
- PR: #36077
- Add host-side typecast
- PR: #34494
- #34137: add check for out_subblock_h dividing m in matmul
- PR: #36060
- #35078: add in1 batch size check for mm with fused batch
- PR: #36028
- Fix argument order in recordNocEventWithID
- PR: #36051
- Added utility functions to insert NOPs for compute debugging
- PR: #35421
- PDL: Replace old SHA-pinned setup-job action in CI
- PR: #36083
- Programming Multiple Meshes tech report
- PR: #35272
- Update mcast core to be on bottom row for deepseek
- PR: #36042
- [skip ci] Fix Clang Static Analyzer workflow
- PR: #36073
- Add BH QB GE llama 8b to vllm nightly
- PR: #35877
- #35997: Add matmul shapes for Deepseek Blitz
- PR: #36022
- Fix t3k deepseek unit test regression
- PR: #35745
- Manual UMD bump 19.01.2026
- PR: #36110
- #35573: Add support for WH/batching to deepseek_moe_gate
- PR: #36104
- Multi-Host Upstream Testing for Exabox
- PR: #35851
- [build-system] Fix install_dependencies.sh to install clang-20
- PR: #36099
- Fix dead store in fold.cpp
- PR: #36053
- [skip ci] Add model traced json part 1
- PR: #36128
- [skip_ci] Add Scaleout Section to Top Level README
- PR: #36125
- Implement simplified compute kernel syntax
- PR: #35282
- Update Panoptic DeepLab 20 core unit tests to match conv and upsample shapes in actual model
- PR: #36084
- Allow factory system descriptor to initialize from multiple cable descriptors
- PR: #34225
- Add def_rw properties to c++ args in nanobind
- PR: #36061
- [skip ci] Lowered expected device perf for n300 functional_unet
- PR: #36135
- Fix UDM tests
- PR: #36105
- chore: update LLK submodule to 4505a59
- PR: #36108
- Haibo sun/issue#34302
- PR: #35667
- Haibo sun/issue#31236 Added 2.0 API for all_from_all and all_to_all DM Tests
- PR: #35946
- Fix Mochi VAE test
- PR: #36093
- Fix addcdiv
- PR: #35748
- Add Quasar tile counter definitions
- PR: #36024
- Support batch broadcasting for mask in SDPA
- PR: #36119
- Fix and modernize interleaved stick tests
- PR: #36017
- Add blackhole galaxy tests to galaxy nightly
- PR: #36009
- Fix dead store in RingbufferCache randomized test
- PR: #36048
- Disable stress noc mcast by default
- PR: #36033
- Repeat pgm dispatch benchmarks to reduce flakiness
- PR: #35499
- Copy mesh descriptor to TTNN report
- PR: #36102
- Improving Topology Mapper messaging
- PR:...
v0.66.0-dev20260122
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21230965235
📦 Uncategorized
- Revert "added check and recalculation for block sharded tensors before interleaved to sharded is invoked in reshape op"
- PR: #36200
- Enhancing hang detection
- PR: #35102
- Fix unit_tests_api
- PR: #36190
- Remove Boost library dependencies from Docker image
- PR: #36183
- Add tests for traced UF models (qwen 3 32B on LB and whisper on BH)
- PR: #36147
- Capture NoC Debug State on the Host
- PR: #35775
- [skip ci] Consolidate PATs in codeowners-group-analysis workflow
- PR: #35855
- Fixing PCC failure in PR #35888
- PR: #36207
- [skip ci] pr-gate to use smaller test docker image
- PR: #36178
- Fix dead store in embeddings_rm_program_factory.cpp
- PR: #36052
- TTTv2 RMSNorm modules
- PR: #35975
- Fix undefined behavior in tt_fabric_test_common.hpp when TT_MESH_ID unset
- PR: #36130
- Update eltwise kernels to sync with updated format
- PR: #36210
- [tt-train] Remove TT_METAL_HOME environment variable dependency
- PR: #35390
- Remove unused legacy files from tests/tt_eager
- PR: #36213
- [ROTATE] Implementation of the rotate nearest op
- PR: #34326
- Minimal matmul with split outputs
- PR: #35907
- Adjust Qwen2.5 perf/TTFT targets and demo timeout
- PR: #36079
- [WATCHER] Skip profiler and triage tests with watcher
- PR: #36139
- Reduce hang detection timeout
- PR: #36228
- [skip ci] Update tt-installer arguments in INSTALLING.md
- PR: #36192
- Adding Qwen3-Embedding Model support
- PR: #35941
- DeepSeek DRAM streaming Matmul
- PR: #36164
- TTML module: import ttnn in python code instead of C++ code
- PR: #36162
- widen wh glx fabric perf margins
- PR: #36256
- [Galaxy Llama demo] Fix ttnn.add call with passing memory_config for mixed input mem_configs
- PR: #36235
- Get maximum number of available eth links for CCLs in TTT
- PR: #35397
- Revert "Haibo sun/issue#34302 (#35667)"
- PR: #36244
- [ROTATE] Skipping the function testing sharded memory accesses for bh due to bh post commit failing
- PR: #36259
- [skip ci] Add Cursor command for workflow analysis and triggering
- PR: #36266
- #36161 Removed incorrect assignment
- PR: #36177
- [tt-train] Add custom Python operations support for TTML autograd
- PR: #36076
- #36090: add cb front calls for cb_max in large softmax
- PR: #36249
- Faster intermediate accuracy exponential functions for SDPA
- PR: #35618
v0.66.0-dev20260121
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21192383271
📦 Uncategorized
- Multi-Host Upstream Testing for Exabox
- PR: #35851
- [build-system] Fix install_dependencies.sh to install clang-20
- PR: #36099
- Fix dead store in fold.cpp
- PR: #36053
- [skip ci] Add model traced json part 1
- PR: #36128
- [skip_ci] Add Scaleout Section to Top Level README
- PR: #36125
- Implement simplified compute kernel syntax
- PR: #35282
- Update Panoptic DeepLab 20 core unit tests to match conv and upsample shapes in actual model
- PR: #36084
- Allow factory system descriptor to initialize from multiple cable descriptors
- PR: #34225
- Add def_rw properties to c++ args in nanobind
- PR: #36061
- [skip ci] Lowered expected device perf for n300 functional_unet
- PR: #36135
- Fix UDM tests
- PR: #36105
- chore: update LLK submodule to 4505a59
- PR: #36108
- Haibo sun/issue#34302
- PR: #35667
- Haibo sun/issue#31236 Added 2.0 API for all_from_all and all_to_all DM Tests
- PR: #35946
- Fix Mochi VAE test
- PR: #36093
- Fix addcdiv
- PR: #35748
- Add Quasar tile counter definitions
- PR: #36024
- Support batch broadcasting for mask in SDPA
- PR: #36119
- Fix and modernize interleaved stick tests
- PR: #36017
- Add blackhole galaxy tests to galaxy nightly
- PR: #36009
- Fix dead store in RingbufferCache randomized test
- PR: #36048
- Disable stress noc mcast by default
- PR: #36033
- Repeat pgm dispatch benchmarks to reduce flakiness
- PR: #35499
- Copy mesh descriptor to TTNN report
- PR: #36102
- Improving Topology Mapper messaging
- PR: #36115
- Remove tt-telemetry remnants from tt-metal
- PR: #36166
- Fix pad_value in transpose test
- PR: #36153
- Remove batched matmul from prefetcher unit tests
- PR: #36112
- added check and recalculation for block sharded tensors before interleaved to sharded is invoked in reshape op
- PR: #36015
- Z Link Bugfixes
- PR: #36127
- Fix Multi-Host Deployment (Physical) pipeline
- PR: #36117
- Fix Quasar JIT build so that kernels can compile on all DM cores
- PR: #36111
- Add add_prefetch_relay_linear_h to DeviceCommand
- PR: #35949
- SFPI 7.17.0 182
- PR: #36171
- Add multi-mesh tests to Galaxy CI
- PR: #36005
v0.66.0-dev20260120
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21155184783
📦 Uncategorized
- Bump tt-exalens version
- PR: #35834
- [skip ci] allow venv override for tt_bisect test
- PR: #36081
- ttnn::typecast row-major support
- PR: #34596
- test_mla_2d & test_decoder_block for deepseek v3 now compatible up to 128k tokens
- PR: #35948
- [UMD Bump] Automated UMD Bump 19.01.2026
- PR: #36077
- Add host-side typecast
- PR: #34494
- #34137: add check for out_subblock_h dividing m in matmul
- PR: #36060
- #35078: add in1 batch size check for mm with fused batch
- PR: #36028
- Fix argument order in recordNocEventWithID
- PR: #36051
- Added utility functions to insert NOPs for compute debugging
- PR: #35421
- PDL: Replace old SHA-pinned setup-job action in CI
- PR: #36083
- Programming Multiple Meshes tech report
- PR: #35272
- Update mcast core to be on bottom row for deepseek
- PR: #36042
- [skip ci] Fix Clang Static Analyzer workflow
- PR: #36073
- Add BH QB GE llama 8b to vllm nightly
- PR: #35877
- #35997: Add matmul shapes for Deepseek Blitz
- PR: #36022
- Fix t3k deepseek unit test regression
- PR: #35745
- Manual UMD bump 19.01.2026
- PR: #36110
- #35573: Add support for WH/batching to deepseek_moe_gate
- PR: #36104
v0.66.0-dev20260119
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21121257297
📦 Uncategorized
v0.66.0-dev20260118
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21103172965
📦 Uncategorized
- [skip ci] Updated URLs to point to https://raw.githubusercontent.com
- PR: #36018
- Mock device
- PR: #34194
- Add ethernet bw utilization metric from npe to tracy performance report
- PR: #35967
- TTML python module: add ModuleBase interface abstract base class for composing TTML Modules in python
- PR: #34122
- #35532: Deepseek blitz RoPE
- PR: #35870
- Refactor tensor operations
- PR: #35798
- Fix T3K unit tests due to changed Attention class arguments + Disable qk fused ops for VL models
- PR: #35994
- Bump ttsim version to v1.3.0
- PR: #36047
- Put all device operations into ttnn::prim
- PR: #35888
- Fix assertion in UDM gtests
- PR: #36057
- SDXL Enabling TP=2 on Galaxy
- PR: #35988
- PDL 20 core: Update PCC values and perf due to #35494
- PR: #35992
- [skip ci] Remove quick links from README
- PR: #36062
- Topology Solver: Search Heuristics and Consistency Checking
- PR: #34387
- Get our Lead Models (Deepseekv3) running on its own scheduled sweep run
- PR: #35793
- update uv to latest release
- PR: #36016
v0.66.0-dev20260117
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21085001852
📦 Uncategorized
- Fix BH LB shapes for prefetcher unit test
- PR: #35909
- Fix attention dense out matmul on BH LB
- PR: #35875
- Issue #34453 : send_next_data no longer requires remote_receiver_buffer_index
- PR: #35460
- Make noc_mode global in brisc.cc
- PR: #35781
- Created matmul lab 1 for universities
- PR: #35456
- Remove cmake version check from CI
- PR: #35965
- Update trace region size for new reduce scatter binary requirements
- PR: #35935
- Topology Solver: Index Data Structures
- PR: #34386
- Reduce compile time by 44%
- PR: #35943
- [UMD Bump] Automated UMD Bump 15.01.2026
- PR: #35881
- Add initial prototype of fusion and new matmul for Deepseek Blitz
- PR: #34088
- Fix reflection operator<< ordering for unity builds
- PR: #35874
- Upgrade default toolchain from Clang 17 to Clang 20
- PR: #35887
- Remove VC0 from template parameter name
- PR: #35959
- Add unit test for LazyWeight
- PR: #35569
- fix qwen image vae device config
- PR: #35947
- Matmul - Add Initial Deepseek MLA Pytests
- PR: #35961
- #35637: Correct and move tt-metal and tt-nn tutorials from custom to main repository
- PR: #35723
- Deprecate public API variants of MeshDevice::[get_device|is_local] and replace them with internal calls where possible
- PR: #32823
- [tt-train] Maximal core grid in tt-train matmuls
- PR: #35519
- Suggesting uv pip to install requirements if possible
- PR: #35984
- [Conv_transpose2d] Added nightly ulp test
- PR: #35981
- DeepSeek V3: Add progress bars to model state creation
- PR: #35894
- Adding operation and runtime info logs to use with triage
- PR: #33666
- fix sub-optimal digamma decomposition
- PR: #35940
- Update triton to existing distributed whl version
- PR: #35882
- [skip ci] Set watcher to 120s
- PR: #35989
- [skip ci] Disable QBGE in CI to unblock pipelines
- PR: #35995
- Modify Quasar DM cores FW build to only build a single FW binary that is shared amongst all the DM cores
- PR: #35496
- Change the padding from optional to float
- PR: #35785
- Refactor create_venv and install_uv
- PR: #35933
- tighten t3k unit test timeouts
- PR: #35982
- [skip ci] #0: Install uv into basic dev image because now it's being used for Python jobs (tutorials nightly L2)
- PR: #35908
- Small fixes for 2xBH QBAE config files
- PR: #36014
- [skip ci] Move BH-LB from pipeline-perf label to pipeline-functional
- PR: #35934
- Updated ownership of new kernel files under ttnn
- PR: #35927
- reduce t3k demo timeouts
- PR: #36007
- update galaxy test timeouts
- PR: #36010
- Patch pipelines using pip
- PR: #36044
- Int8 Matmul test
- PR: #35570
- patch clang tidy setup
- PR: #36050
- [skip ci] Update stable_diffusion README
- PR: #35939
- [skip ci] Create TTML onboarding guide
- PR: #36021
v0.65.1-rc16
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21085016832
📦 Uncategorized
- Seed fixes
- PR: #35906