Releases: tenstorrent/tt-metal

v0.67.0-dev20260203

03 Feb 10:09
Immutable release. Only release title and notes can be modified.
3d6d9a0

v0.67.0-dev20260203 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21611868169

📦 Uncategorized

  • Restore optimized reduce_c codepath for SDPA prefill
  • Add TTNN PCH
  • #34887: Update LLK Tile API
  • ci: parallel execution of workflows during llk dry-run
  • Add LLK API changes for llk_unpack_tilize
  • Add deprecated warning for NOC blitz write
  • [skip ci]: upstream image: allow unstrict modes for ssh authorized keys
  • SFPI 7.22.0 229
  • Fix import in gemma conftest
  • [skip ci] Enable weka permissions input for galaxy and t3k demo tests
  • [WATCHER] Skipping currently failing tests from models-unit-tests group
  • Fix validation_tools test to expect passing RMSNorm metrics
  • PDL: Optimize matmuls for BH150 110 cores
  • [skip ci] Add .claude to .gitignore
  • Reduce ring joint SDPA binary size
  • [skip ci] BH nightly: flip default to true for multi card unit tests
  • [skip ci] #36488: Version command for tt-installer
  • [skip ci] Change os version to platform in release build test publish workflow

v0.67.0-dev20260202

03 Feb 02:00
dc2d241

v0.67.0-dev20260202 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21573363912

📦 Uncategorized

  • [skip ci] Update CODEOWNERS and enhance Slack group handling in workflows
  • Deepseek V3 resolve CI failure
  • Reduce number of noc writes using in padding path of transpose tile hc again
  • Added Fabric Sparse Multicast Messaging
  • [skip ci] Optimize CODEOWNERS to use metalium-developers-triage team

v0.67.0-dev20260201

01 Feb 19:20
be36d59

v0.67.0-dev20260201 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21553622687

📦 Uncategorized

  • Add logging of dispatch program command info
  • [Blitz Decode] Changes to setup and teardown FD Manually
  • [skip ci] Expand the CMake policy range
  • Cleanup unnecessary code from HWCommandQueue
  • Fix binary/unary int32 max/min.
  • Deepseek 671B prefill fully functional up to 64k tokens
  • Revert "Reduce number of noc writes using in padding path of transpose tile hc (#36842)"
  • [tt-train] Add causal mask support for SDPA backward

v0.67.0-dev20260131

01 Feb 00:15
bb5be5b

v0.67.0-dev20260131 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21535500407

📦 Uncategorized

  • Capture missing write barriers at end of kernel
  • #0: Fix LLK short inits hardcoding num_faces
  • SFPI 7.21.0 220
  • fix(sweep): propagate config_hash through suite parameters
  • [skip ci] CODEOWNERS: Replace individual callouts with team references
  • Move some constant tensors in MoE to shared state
  • Add KV Cache Branch
  • Add repr to program configs
  • [skip ci] Split Copilot autofix into standalone workflow
  • Sparse matmul: allocate zeroed output tensors
  • [skip ci] Bump dev tag from bigger from semver and rc tag
  • #33701: fix rounding errors for bfloat16 multiplication and division
  • fixed Deepseek reference model and generate_test_inputs_outputs
  • Add custom Compute API for fused eltwise mul + reduce_scalar operation
  • Skip commenting clang tidy errors on forks
  • Skip merge report for forks
  • Migrate in HF the whole decoder text branch, including the cross-attention side branches
  • Xfail teacher-forcing batch divergence outside CI
  • [skip ci] Fix import ttnn ubuntu 24
  • Fix bandwidth test to use physical chip ID when reading back telemetry
  • fix for issue 19309: does not pad tensor properly
  • fix: disable failing DescriptorMergerTest's tests
  • Add time budget controls for Galaxy nightly pipeline -> now Galaxy e2e pipeline
  • Remove pinnings from 4x2 split galaxy unit test
  • Fix YUNet hang issue on Wormhole and code cleanup
  • Add TTTv2 demo model with CI integration (MLP part)
  • Remove tt::Cluster dependency on MetalContext
  • Delete tests for DeepSeekV3 pipeline parallel modules
  • Add sub-mesh tests for galaxy didt/power tests
  • Add support for N300x2 in TTT and TT metal
  • [gpt-oss] perf optimization: all to all ops with tokens on dim -2
  • Add vLLM DummyNoOpModel for testing vLLM host overhead
  • Integrate mul_reduce_scalar into RMSNorm for Deepseek Blitz
  • Reduce number of noc writes using in padding path of transpose tile hc
  • Fix tiled mesh_partition validation: verify no padding is introduced along partition dim
  • Fix wan image to video and implement wan encoder.
  • Fix view on height sharded rm input
  • Fix incorrect core type check in Blackhole Active Erisc status print

v0.66.0-rc7

31 Jan 23:07

v0.66.0-rc7 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21535517640

  • no changes

v0.66.0-dev20260130

31 Jan 01:20
6b55b0c

v0.66.0-dev20260130 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21499957697

📦 Uncategorized

  • #24136 binary op output mem config half config
  • Scaffolding work to support migration of model trace json to PostgreSQL database
  • MoE Deepseek RS
  • fabric neighbor core sweep
  • Generalize sampling all gather
  • Large Kernel Support for MPWI
  • Fix deadcode.DeadStores warning in prepare_conv2d_weights.cpp
  • [gpt-oss] change prefill chunk size to 512
  • [UMD Bump] Automated UMD Bump 28.01.2026
  • [#36353] Create Universal Cluster Configuration for Combined 8x16 Topologies
  • [BEVFormer] Functional BEVFormer Encoder
  • chore: update LLK submodule to 3c99f90
  • Update tt-exalens version from 0.2.6 to 0.3.1
  • update some missed fabric ubench margins
  • Bump OFT perf to keep CI green
  • [#36025] Add validation and troubleshooting documentation for BH Galaxy Exabox clusters
  • Reduce Qwen2.5-VL unit test checkpoint load
  • Delete rank suffix from cache_path in ttrun.
  • Adding ND Sharding Support for the Untilize Op
  • TTNN Core APIs - Update Docs, Create Example
  • Make device close resilient to kernel hangs and timeouts
  • Increase CB limit to 64 on Blackhole
  • last few margins
  • Created matmul lab 2 for universities
  • Add tracing support in deepseek for vLLM flow

v0.66.0-rc5

30 Jan 11:32

v0.66.0-rc5 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21484937709

  • no changes

v0.66.0-dev20260129

29 Jan 22:49
cd51eec

v0.66.0-dev20260129 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21460930760

📦 Uncategorized

  • Filter DPrint output using FabricNodeID
  • Use rename for JIT-generated files for multi-process safety
  • Update ND failures analysis workflow to use incremental error reports
  • [DM]: Add multicast tests with semaphore synchronization
  • Create initial kv cache directly on device
  • Refactor models/demos folder structure
  • [skip ci] Update workflow-status reporting
  • Update Physical Validation Scripts To Use Cabling and Deployment Desc…
  • [skip ci] Add missing timeouts to GitHub workflow jobs and steps
  • [skip ci] Switch ClangSA results repository to organization repo
  • Distro-agnostic workflow overhaul
  • Revert "Distro-agnostic workflow overhaul"
  • Memory Optimization for Intermesh Routing Table Generation
  • #31107 Support binary_ng scalar value direct sharded to L1 for perfor…
  • [Quasar only]: Adding experimental DataflowBuffer support to enable multiple DM threads streaming data into a Neo
  • Distro-agnostic workflow overhaul
  • fix release workflow
  • chore: update LLK submodule to a6db624
  • Watcher RTA/CRTA bound check
  • [gpt-oss] force dp=1 for users_row_sharded=True
  • #31418: Improve bf16 unary pow using 21f approach
  • Cleanup sigmoid implementation
  • Fix gemma tests
  • [Llama 70b glx] Fix prefill line all-gather barrier semaphore fallback
  • Update gemma3 missed references
  • Use simplified compute kernel declaration syntax in ttnn
  • add initial fabric router control mechanism
  • [skip ci] Remove test_suite_bh_pcie_didt_tests from blackhole_loudbox
  • Add INT32 support for Remainder, FMOD in LLK
  • [skip ci] Remove test_bert.cpp - not running in CI for 2+ years
  • Add strided K-tile access to custom b=1 MM for out_w > 1
  • Switch from SFPCAST to SFPABS per tt-isa-documentation guidance
  • from_torch conversion should rely on device Ops
  • [skip ci] Suppress all warnings from third-party dependencies
  • SFPI 7.20.0 216
  • Matmul - Initial Batched Sharded DRAM Implementation
  • Implement skip list for device sampling
  • Define 8x4 grid for Quasar to enable more tests on TT-Sim
  • update margins to fix t3k APC, as they drifted a bit for BW
  • TT-Train: bump clang version from 17 to 20
  • Revert "Cleanup sigmoid implementation"

v0.66.0-rc4

29 Jan 01:02

v0.66.0-rc4 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21449791235

  • no changes

v0.66.0-rc3

28 Jan 18:02

v0.66.0-rc3 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21438675873

  • no changes