Releases: tenstorrent/tt-metal
v0.66.0-dev20251230
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20585723919
📦 Uncategorized
- Add missing tensix_neo_reg.h for quasar
- PR: #34547
- Bilinear upsample sharding restriction and optimizations
- PR: #34797
- [skip ci] readability-avoid-unconditional-preprocessor-if
- PR: #35117
- #0: Update clip encoder margin
- PR: #35118
- Add option to disable progress bar in tt-triage
- PR: #35110
- #33539: Add uint32 and uint16 support for rsub
- PR: #33768
- Fix RMSNorm test to generate reference IO on-the-fly
- PR: #34930
- #0: disable llama3 from single card demo tests
- PR: #35121
- Add compute kernel API for stochastic rounding
- PR: #34498
- PDL: Move matmul configs to model_configs.py
- PR: #35122
- Sampling - Update Docs, Create Example
- PR: #34897
- [UNET] Bumping the average kernel samples per second threshold
- PR: #35091
- modernize-use-starts-ends-with
- PR: #35132
- Add vector to nanobind
- PR: #35135
- Enable hang detection on llk tests
- PR: #34903
- #34951: use output mem config for compute_output_specs in generic reduce
- PR: #35147
- Moving more tests to CPU only
- PR: #35136
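For readers unfamiliar with the stochastic rounding mentioned in this release, the general technique can be sketched in plain Python. This is an illustration of the idea only, not the compute kernel API added in PR #34498; the function name `fp32_to_bf16_stochastic` and the choice of an fp32 to bf16 conversion are assumptions made for the example.

```python
import random
import struct


def fp32_to_bf16_stochastic(x: float, rng: random.Random) -> int:
    """Return the 16-bit bf16 pattern for x using stochastic rounding.

    Hypothetical helper for illustration; not part of tt-metal.
    """
    # Reinterpret the value as its 32-bit IEEE-754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]

    # Pass inf through unchanged and keep NaN a NaN by forcing a mantissa bit.
    if (bits & 0x7F800000) == 0x7F800000:
        upper = bits >> 16
        return (upper | 0x0040) if (bits & 0x007FFFFF) else upper

    # Add uniform 16-bit noise to the bits that will be discarded, then
    # truncate. A carry into the kept bits happens with probability equal
    # to the discarded fraction, so the rounding is unbiased on average.
    noise = rng.getrandbits(16)
    return ((bits + noise) >> 16) & 0xFFFF


if __name__ == "__main__":
    rng = random.Random(0)
    x = 1.0 + 2.0 ** -9  # one quarter of the way between two bf16 neighbours
    samples = [fp32_to_bf16_stochastic(x, rng) for _ in range(10_000)]
    print("fraction rounded up:", sum(s == 0x3F81 for s in samples) / len(samples))
```

Values that are already exactly representable in bf16 have all-zero discarded bits, so the added noise can never carry and they are returned unchanged.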
v0.65.1-rc12
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20585736852
📦 Uncategorized
v0.66.0-dev20251229
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20561689015
📦 Uncategorized
v0.65.1-rc11
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20573786687
📦 Uncategorized
- Remove prefetcher dangling reference from previous test
- PR: #35061
v0.66.0-dev20251228
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20546330059
📦 Uncategorized
v0.65.1-rc10
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20546342097
📦 Uncategorized
- Add prefill sampling support to TTT models
- PR: #35021
v0.66.0-dev20251227
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20531742521
📦 Uncategorized
- chore: update LLK submodule to eab2948
- PR: #35074
- Remove pybind11. Nanobind is stable.
- PR: #35044
- Move PDL to 110 cores BH P150
- PR: #35056
- [UMD Bump] Automated UMD Bump 25.12.2025
- PR: #35068
- [tt-triage] Additional script info
- PR: #35065
- Skipping SGD cpp test with segmentation fault until it gets resolved
- PR: #35084
- Skipping bad pcc test_split.py breaking L2 Nightly until it gets resolved
- PR: #35086
- SDXL refiner and img_to_img seed issue fix
- PR: #34988
- Fixing program selection error in untilize migration to fix test_slice failure
- PR: #34981
- [skip ci] Fix Galaxy Quick hard coded test params
- PR: #35093
v0.66.0-dev20251226
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20512964236
📦 Uncategorized
- Finish moving hard coded params to nanobind
- PR: #35043
- chore: update LLK submodule to 282cea4
- PR: #35046
- Fix conv1d to use width dimension instead of height for 1D convolution
- PR: #34957
- [CONV] Fix null check and memory validation in ConvTranspose2d DRAM path
- PR: #35032
- SDXL CI encoder perf
- PR: #35050
- chore: update LLK submodule to e5d5906
- PR: #35052
- Remove N300 mistral7b test from demo tests
- PR: #35031
- Enable hang detection and calling tt-triage on all CI workflows
- PR: #34950
- Auto slicing in VAE module
- PR: #34323
- Reserving 16 bytes for debug bus atomic reading
- PR: #35029
- chore: update LLK submodule to a34968b
- PR: #35054
- docs: update dprint.h include path in documentation
- PR: #35058
- Revert "Reserving 16 bytes for debug bus atomic reading (#35029)"
- PR: #35063
- Update to overlay register map from
- PR: #32640
- chore: update LLK submodule to c9aeed6
- PR: #35070
v0.65.1-rc9
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20512973248
- no changes
v0.66.0-dev20251225
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20496087066
📦 Uncategorized
- add explicit torch dtype
- PR: #34984
- #34990: [skip ci] Disable hanging llama3 galaxy quick prefill and decode tests while team deals with other fires
- PR: #34991
- Add 3DETR Model to TTNN
- PR: #34659
- Add OpenVLA model to ttnn
- PR: #34985
- #34993: Add initial cpu-only fabric tests job in merge gate
- PR: #34995
- Support non-tile-aligned widths in LayerNorm (interleaved), and use dest registers more effectively
- PR: #34584
- [skip ci] Enhance codeowners-group-analysis workflow
- PR: #34879
- Python Multi-Mesh APIs and Programming Example
- PR: #34870
- Use optimised fp32→bf16 typecast with RNE; fp32_to_[u]int32; uint16_to_uint32.
- PR: #34376
- Fix start_tile_id when work is split over batches in MatmulMultiCoreConfig
- PR: #34875
- chore: update LLK submodule to 38295fe
- PR: #35012
- Reduce code duplication in Panoptic DeepLab demo and E2E test.
- PR: #34792
- TT-Triage Fix dump_running_operations.py to use ElfVariable for watcher_kernel_id
- PR: #35022
- [CONV] Defaulting to fp32 dest accumulation in conv ops if inputs are FP32
- PR: #34952
- [skip ci] ci: allow specifying architecture for L2 tests
- PR: #35027
- [skip ci] ci: update handling of different architectures when starting L2 nightly jobs via LLK uplift workflow
- PR: #35030
- Improve error when attempting to initialize in-use device.
- PR: #34559
- [skip ci] ci: parametrize device perf tests with architecture parameter
- PR: #35034
- Fix I2S corruption issue
- PR: #34942
- readability-make-member-function-const
- PR: #34396
- Add module level device fixture
- PR: #34876
- Fix pytensor ownership
- PR: #35010
- TT-Switch MeshDevice API fix and Distributed Context barrier
- PR: #33096
- Unify LLK HW Configs
- PR: #34036
- Split DeepSeek V3 tests into separate unit and module test jobs
- PR: #34953
- Fix silu initialization mismatch causing PCC errors in Stable Diffusion
- PR: #35038
- SDXL CI fix: pcc
- PR: #35019
- SDXL CI fix: perf
- PR: #35035
- Pool2D Fix for openpdn_mnist
- PR: #34938
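As background for the "fp32→bf16 typecast with RNE" entry in this release, the following is a minimal host-side sketch of round-to-nearest-even (RNE) conversion from fp32 to bf16. It shows the standard bit-level technique only and is not the kernel code from PR #34376; the helper names `fp32_to_bf16_rne` and `bf16_to_fp32` are assumptions made for the example.

```python
import struct


def fp32_to_bf16_rne(x: float) -> int:
    """Return the 16-bit bf16 pattern for x, rounding to nearest, ties to even.

    Hypothetical helper for illustration; not part of tt-metal.
    """
    # Reinterpret the value as its 32-bit IEEE-754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]

    # NaN: truncate and force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Add 0x7FFF plus the lowest kept bit. Exact halfway cases then round
    # up only when the kept value is odd, which is the ties-to-even rule.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_fp32(b: int) -> float:
    """Expand a 16-bit bf16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2.0 ** -8, 1.0 + 3 * 2.0 ** -8):
        pattern = fp32_to_bf16_rne(value)
        print(f"{value!r} -> 0x{pattern:04X} -> {bf16_to_fp32(pattern)!r}")
```

The two halfway inputs in the usage loop round in opposite directions (one down, one up), which is the behaviour that distinguishes RNE from plain round-half-up.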