Release v0.67.0-dev20260214 · tenstorrent/tt-metal

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not to those on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22007703479

📦 Uncategorized

[Refactor] Math Fidelity enum
- PR: #35818
Restore max torch threads for multi-host runs
- PR: #37673
[skip ci] #0: update matmul test timeout for tt-sim
- PR: #37675
Add scatter to sdpa_reduce_to_all OP
- PR: #37590
Update SDPA to use the new fast approximate exponential function
- PR: #36795
Mark init functions as deprecated
- PR: #37454
Fix LM head norm config for qwen3vl
- PR: #37646
Exclude graph_argument_serializer.cpp from unity builds to reduce build time
- PR: #37326
Extend multi-user isolation tests with ASIC/TP2 scenarios
- PR: #36547
Adding Qwen3-VL vllm support
- PR: #37139
Implement models.experimental.ops.composite.launch()
- PR: #36732
Add precompiled headers to tt-train for faster compilation
- PR: #37694
Pipeline reorg cleanups
- PR: #37677
Simplified the way to validate operations
- PR: #37633
Move CRTAs to Kernel groups
- PR: #37592
Fix moreh failure
- PR: #36835
Adding uneven output shard support to untilize
- PR: #37343
PDL: PCC drop on instance embedding fix
- PR: #37372
[WATCHER] Increasing timeout for bh post commit with watcher enabled
- PR: #37653
Add Custom Init for Packing Contiguous Block from DEST
- PR: #37620
round up mem config shapes
- PR: #37693
Fix minor typos in unary max/min comments.
- PR: #37636
Move prefetcher pytest option to avoid breaking CI tests
- PR: #37613
[Gemma3] Fix for gemma3 failing unit tests
- PR: #37644
[GPT-OSS] Add fused op unit tests for MoE
- PR: #35660
Disable stable_diffusion model perf test on blackhole (#37617)
- PR: #37619
Add program configs for Matmul ops in Embedding block to run across 40 cores in the SDXL Refiner
- PR: #37264
[tt-train] Add training log comparison plotting script
- PR: #37531
[skip ci] Enable watcher apc nightly debug
- PR: #37624
Adding test harness to check cache on device compatibility for Deepseek 671B
- PR: #37649
[Watcher] tt-train-cpp-unit tests have new watcher enabled fails due to recent changes
- PR: #37388
chore: update LLK submodule to 346a830
- PR: #37709
removes meta lib dependencies
- PR: #37046
[WATCHER] Following issues are detected when watcher is enabled on BH post commit
- PR: #37744
[skip ci] Add P300-viommu to BHPC multi card fast tests
- PR: #37758
SGLang generator
- PR: #35980
[tt-train] Complete nanoGPT Python impl
- PR: #36688
Add new CI pipeline for Deepseek to test long seq lens and refactor tests
- PR: #36690
Topology Mapper Integration with Topology Solver API
- PR: #35778
Make TP All reduce optional in Post SDPA
- PR: #37700
Fix misleading comment in dataflow_api for multicasts
- PR: #37760
[skip ci] Update llama demo upstream test id's
- PR: #37658
Enable multi-host neighbor-pad and RingAttentionAllGather CCLs
- PR: #37114
LLK API support for 8x32 tilize
- PR: #37481
Upgrade Pillow -> 12.1.1 to fix CVE-2026-25990
- PR: #37691
Fix moreh kernel runtime arg bounds issues (#37193, #37040)
- PR: #37400
Convert Sparse Multicast Static Asserts to Runtime Asserts
- PR: #37581
Do not use internal bh name in builtins
- PR: #37759
Quasar compute API bringup V1.0
- PR: #35206
[Deepseek Blitz] Split q a proj mm on inner dim
- PR: #37687
Reduce to one generic op and fusing it with moe routed expert
- PR: #37411
[TTTv2] Add attention_1d module with comprehensive unit tests
- PR: #36792
Matmul - Add Support for 2D DRAM interleaved in0 + batched height sharded in1
- PR: #37681
Changes for quad module tests CI
- PR: #37601
Subtract grid offset when computing 0-based indices in sharded LN factory
- PR: #37768
Decouple Cluster initialization from HAL
- PR: #37695
Switch llama 8b to DP=4 in vllm nightly
- PR: #37786
A balanced traffic pattern for AG minimal.
- PR: #36607
[skip ci] Remove t3k select pipeline extra-tag inputs
- PR: #37801
#36982: create_q_heads tilizes to 8x32 tiles
- PR: #37574
Enable (very) basic compute kernels
- PR: #37328
Migrate conv operations to free function style
- PR: #36382
Migrate fast dispatch frequent tests to CIv2 runners
- PR: #37803
reduction: migrate to free function binding + generic cleanup
- PR: #37584
Use gh_run_number for Superset dashboard links in Slack notifications
- PR: #37793
Fix race condition in parallel multi-source jit build
- PR: #37805
chore: update LLK submodule to f7cf929
- PR: #37798
Move SDPA and MLA tests from tt_eager/misc to ttnn/operations/sdpa
- PR: #37713
Revert "A balanced traffic pattern for AG minimal. (#36607)"
- PR: #37832
[skip ci] Fix galaxy perf tests yaml (bad merge)
- PR: #37836
[DM] Update data movement multi_interleaved tests
- PR: #37626
SDXL clip encoder perf targets updated
- PR: #37837
Fix timeouts in vllm nightly
- PR: #37842
DeepSeek Blitz moe fusion
- PR: #37757
Generate Welford reciprocals in Python and pass into distributed layernorm ops
- PR: #37080
Fix TTTv2 MLP 1d from model args mismatch + BH Stress test pytest id
- PR: #37589
[skip ci] Fix Package and release workflow
- PR: #37844
Update compute kernel API to reflect new changes to fast tilize
- PR: #37736
Fix timeouts for qwen in vllm nightly
- PR: #37854
[skip ci] Add back missing schedule to BH demos
- PR: #37862
Pool2D Alignment Fixes for Watcher
- PR: #37599
Add LLK_ASSERTs for verifying tile index in dest accumulator
- PR: #37780
Make mm respect first core from subdevice
- PR: #37511
Add TTTv2 rmsnorm module unit tests to T3K e2e pipeline
- PR: #37800
Unify kernel and firmware JIT build deduplication into JitBuildCache
- PR: #37452
