v0.67.0-dev20260211
Pre-release
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog follows, showing the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21888158442
📦 Uncategorized
- Fix dead store in pool multi-core program factory
- PR: #37356
- [skip ci] Add missing JIT files to .deb packages
- PR: #37436
- Update bh eth test for new fabric router handshake
- PR: #37429
- [Deepseek Blitz] Reduce-to-all op (python infra)
- PR: #37338
- Add support for more scenarios when reading from pinned memory
- PR: #37016
- Add utility to generate Blitz Decode Scaleout Configs
- PR: #37458
- Initial Integration PR for Dram Prefetcher + Llama 8b on BH QB and LB
- PR: #36176
- fix bad gpt-oss demo outputs
- PR: #37391
- [skip ci] Fix eltwise ops docs
- PR: #36080
- Decouple ControlPlane and its children from MetalContext
- PR: #37010
- [skip ci] add codeowners for models/common/sampling
- PR: #37374
- Update blitz mcast/gather micro ops
- PR: #37370
- Fix division by zero in pool operation scalar config generation
- PR: #37366
- Relax power-of-2 constraint on SDPA chunk granularity
- PR: #37270
- [skip ci] GPT-OSS skip perf check
- PR: #37467
- [TT-Transformers] Put mesh_partition instead of reduce_scatter for slicing replicated mesh tensor
- PR: #37266
- Calculating arc heartbeats per second with more precision
- PR: #37350
- matmul: disable worker cores if receive padding input data only
- PR: #37226
- SFPI 7.24.0 246
- PR: #37379
- #36917: Add uint16 support for fill op
- PR: #36922
- [tt-train] Remove unnecessary .hpp files from ttml METAL_OPS_FILES
- PR: #37281
- Remap to max supported topk instead of assert
- PR: #37290
- changed reshape tensor layout to TILE for deepseek moe_gate
- PR: #37415
- Relax graph capture conv memory targets
- PR: #37389
- [Watcher] Models-unit-tests turned red after recent commit, keeping it green with additional skips
- PR: #37376
- Relocate host DFB files to correct experimental folder
- PR: #37325
- Fix PCC fluctuation in BGE-large-en vLLM generator
- PR: #37418
- Add 4 link ring to deepseek
- PR: #36855
- TT-Train: CMakeLists.txt: TT_METAL_HOME determination fix, env usage removal
- PR: #37431
- Relax ResNet50 BH batch 32 e2e perf threshold (#37554)
- PR: #37555
- Revamp of binary/unary max/min via SFPLOADMACRO.
- PR: #36928
- Use future.get() instead of wait() to allow proper error propagation.
- PR: #37455
- Automatic DRAM Slicing for Pool2D
- PR: #35774
- Fix unit_tests_debug_tools parallel-safety
- PR: #37335
- Migrate all models to make full use of the Module base class
- PR: #37117
- MLA Optimizations
- PR: #37279
- Remove init_fabric
- PR: #37450
- Fix CB size calculation in non-sharded matmul with transpose-A and user core grid
- PR: #37243
- Haibo sun/issue#31236 Stateful APIs and Trid 2.0 API Tests
- PR: #36492
- Remove EnqueueTerminateCommand and command infrastructure
- PR: #37563
- Fix CMAKE_BINARY_DIR usage in scaleout tools for add_subdirectory consumption
- PR: #37561
- [tt-train] Make training profiler respect TT_METAL_DEVICE_PROFILER
- PR: #37142
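
A note on the entry "Use future.get() instead of wait() to allow proper error propagation" (PR #37455): the sketch below illustrates the general C++ behaviour that title refers to, assuming the code in question uses `std::future`. It is not taken from the PR; `failing_task`, `wait_only_sees_error`, and `get_error_message` are hypothetical names used only for illustration. With `std::future`, `wait()` blocks until the result is ready but leaves any stored exception untouched, while `get()` rethrows it in the calling thread.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical worker that always fails; stands in for any asynchronous task.
int failing_task() {
    throw std::runtime_error("task failed");
}

// wait() only blocks until the shared state is ready. The exception thrown
// by the task stays stored in the future and is never surfaced here.
bool wait_only_sees_error() {
    std::future<int> f = std::async(std::launch::async, failing_task);
    try {
        f.wait();
    } catch (const std::exception&) {
        return true;  // not reached: wait() does not rethrow
    }
    return false;
}

// get() blocks and then rethrows the stored exception in the caller,
// so the failure can be handled or propagated further up.
std::string get_error_message() {
    std::future<int> f = std::async(std::launch::async, failing_task);
    try {
        f.get();
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

In this sketch, `wait_only_sees_error()` returns `false` because the failure is silently dropped, whereas `get_error_message()` returns the text of the exception raised inside the task.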