Releases: tenstorrent/tt-metal

v0.67.0-dev20260216

16 Feb 03:28
Immutable release. Only release title and notes can be modified.
ecc3ca4

v0.67.0-dev20260216 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not those on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22046210687

📦 Uncategorized

  • Fix broken import in test_deepseek_mla_ops.py after SDPA test migration
  • Add tt_symbiote: PyTorch-to-TTNN transparent acceleration framework

v0.67.0-dev20260215

15 Feb 03:26
Immutable release. Only release title and notes can be modified.
53f7c88

v0.67.0-dev20260215 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22026945186

📦 Uncategorized

  • fix(sweep): correct lead-models Slack notifier's run context, counts, and alerting
  • Propagating new unpack LLK for reduce ops
  • #37471: Output dtype parameter - fix for fp32 dst mode conflict
  • Add indexes to TTNN report db
  • DeepSeek Blitz MLP fusion
  • [skip ci] Move conv test to run last in upstream didt suite
  • Delete Event as it is unused code
  • Kwerblinski tt/37656 blitz lm head
  • fix processor names in watcher tests
  • Migrate experimental operations to use bind_function template and free functions
  • Reorder device params to fix deepseek tests cache paths
  • Split initialization of various components into their own classes
  • Add CQ_PREFETCH_CMD_RELAY_LINEAR_PACKED_H command
  • H<->D Ops for Blitz + Changes to support Async Slow Dispatch
  • Migrate pool and adaptive pool operations to free function style
  • Halo Check Output Grid Matches Input Grid
  • Expose tile dim reconfig template flag in metal
  • TT-triage device and core hardening
  • Improve venv relocatability for distributed and tt-run env inherit
  • #37896: Fix silu_init for BH

v0.66.0

16 Feb 01:20
Immutable release. Only release title and notes can be modified.

v0.66.0 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22041587391

📦 Uncategorized

  • Remove sending lm_head persistent_buffer to DRAM
  • Optimize decode for Llama3-70B for TG for stable branch
  • Add missing configs for gamma3

v0.67.0-dev20260214

14 Feb 07:56
Immutable release. Only release title and notes can be modified.
bfd3b81

v0.67.0-dev20260214 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22007703479

📦 Uncategorized

  • [Refactor] Math Fidelity enum
  • Restore max torch threads for multi-host runs
  • [skip ci] #0: update matmul test timeout for tt-sim
  • Add scatter to sdpa_reduce_to_all OP
  • Update SDPA to use the new fast approximate exponential function
  • Mark init functions as deprecated
  • Fix LM head norm config for qwen3vl
  • Exclude graph_argument_serializer.cpp from unity builds to reduce build time
  • Extend multi-user isolation tests with ASIC/TP2 scenarios
  • Adding Qwen3-VL vllm support
  • Implement models.experimental.ops.composite.launch()
  • Add precompiled headers to tt-train for faster compilation
  • Pipeline reorg cleanups
  • Simplified the way to validate operations
  • Move CRTAs to Kernel groups
  • Fix moreh failure
  • Adding uneven output shard support to untilize
  • PDL: PCC drop on instance embedding fix
  • [WATCHER] Increasing timeout for bh post commit with watcher enabled
  • Add Custom Init for Packing Contiguous Block from DEST
  • round up mem config shapes
  • Fix minor typos in unary max/min comments.
  • Move prefetcher pytest option to avoid breaking CI tests
  • [Gemma3] Fix for gemma3 failing unit tests
  • [GPT-OSS] Add fused op unit tests for MoE
  • Disable stable_diffusion model perf test on blackhole (#37617)
  • Add program configs for Matmul ops in Embedding block to run across 40 cores in the SDXL Refiner
  • [tt-train] Add training log comparison plotting script
  • [skip ci] Enable watcher apc nightly debug
  • Adding test harness to check cache on device compatibility for Deepseek 671B
  • [Watcher] tt-train-cpp-unit tests have new watcher enabled fails due to recent changes
  • chore: update LLK submodule to 346a830
  • removes meta lib dependencies
  • [WATCHER] Following issues are detected when watcher is enabled on BH post commit
  • [skip ci] Add P300-viommu to BHPC multi card fast tests
  • SGLang generator
  • [tt-train] Complete nanoGPT Python impl
  • Add new CI pipeline for Deepseek to test long seq lens and refactor tests
  • Topology Mapper Integration with Topology Solver API
  • Make TP All reduce optional in Post SDPA
  • Fix misleading comment in dataflow_api for multicasts
  • [skip ci] Update llama demo upstream test id's
  • Enable multi-host neighbor-pad and RingAttentionAllGather CCLs
  • LLK API support for 8x32 tilize
  • Upgrade Pillow -> 12.1.1 to fix CVE-2026-25990
  • Fix moreh kernel runtime arg bounds issues (#37193, #37040)
  • Convert Sparse Multicast Static Asserts to Runtime Asserts
  • Do not use internal bh name in builtins
  • Quasar compute API bringup V1.0
  • [Deepseek Blitz] Split q a proj mm on inner dim
  • Reduce to one generic op and fusing it with moe routed expert
  • [TTTv2] Add attention_1d module with comprehensive unit tests
  • Matmul - Add Support for 2D DRAM interleaved in0 + batched height sharded in1
  • Changes for quad module tests CI
  • Subtract grid offset when computing 0-based indices in sharded LN factory
  • Decouple Cluster initialization from HAL
  • Switch llama 8b to DP=4 in vllm nightly
  • A balanced traffic pattern for AG minimal.
  • [skip ci] Remove t3k select pipeline extra-tag inputs
  • #36982: create_q_heads tilizes to 8x32 tiles
  • Enable (very) basic compute kernels
  • Migrate conv operations to free function style
  • Migrate fast dispatch frequent tests to CIv2 runners
  • reduction: migrate to free function binding + generic cleanup
  • Use gh_run_number for Superset dashboard links in Slack notifications
  • Fix race condition in parallel multi-source jit build
  • chore: update LLK submodule to f7cf929
  • Move SDPA and MLA tests from tt_eager/misc to ttnn/operations/sdpa
  • Revert "A balanced traffic pattern for AG minimal. (#36607)"
  • [skip ci] Fix galaxy perf tests yaml (bad merge)
  • [DM] Update data movement multi_interleaved tests
  • SDXL clip encoder perf targets updated
  • Fix timeouts in vllm nightly
  • DeepSeek Blitz moe fusion
  • Generate Welford reciprocals in Python and pass into distributed layernorm ops
  • Fix TTTv2 MLP 1d from model args mismatch + BH Stress test pytest id
  • [skip ci] Fix Package and release workflow
  • Update compute kernel API to reflect new changes to fast tilize
  • Fix timeouts for qwen in vllm nightly
  • [skip ci] Add back missing schedule to BH demos
  • Pool2D Alignment Fixes for Watcher
  • Add LLK_ASSERTs for verifying tile index in dest accumulator
  • Make mm respect first core from subdevice
  • Add TTTv2 rmsnorm module unit tests to T3K e2e pipeline
  • Unify kernel and firmware JIT build deduplication into JitBuildCache

v0.66.0-rc15

14 Feb 09:00
Immutable release. Only release title and notes can be modified.

v0.66.0-rc15 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22007722041

  • no changes

v0.66.0-rc14

13 Feb 18:19
Immutable release. Only release title and notes can be modified.

v0.66.0-rc14 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21983572634

📦 Uncategorized

  • Add missing configs for gamma3

v0.67.0-dev20260212

13 Feb 03:43
Immutable release. Only release title and notes can be modified.
2190abc

v0.67.0-dev20260212 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21928826263

📦 Uncategorized

  • Add socket forward pipeline tests
  • Add memory profiling in deepseek demo
  • [skip ci] Add scripts to analyze exabox scaleout tools results
  • Bump nbconvert from 7.16.6 to 7.17.0 in /docs
  • Enhance tensor serialization in operation_tracer
  • Remove apc select tests and add test select to all post commit
  • Fix apc merge report
  • [skip ci] [#36521] Superpod Testing Documentation
  • noc_semaphore_inc_multicast support
  • [WATCHER] BH Post commit has watcher enabled passed to only some of the jobs
  • [Stable Diffusion 1.4] Revert "[skip ci] Disable pytest timeout for Stable Diffusion device
  • Configure SSH for multi-host MPI in dev image
  • Revert "Configure SSH for multi-host MPI in dev image (#37614)"
  • fix gpt rope post ttt prefetcher merge
  • SDXL fix clip encoders
  • Fix custom_mm leaving counters in nonzero state
  • [skip ci] Remove parallelism as we suspect a race condition somewhere
  • SFPI 7.25.0 252
  • Fix perf counters hitting device side asserts
  • Pool OOM Regression Tests
  • [skip ci] Expand Exabox troubleshooting guide with new operational issues
  • [skip ci] Include conv2d.hpp and deps in the public API
  • Initial Blitz flash mla datamovement
  • Add multihost mpi config to dev image
  • Fix SDPA dht_granularity regression for MLA (DHt != vDHt)
  • [tt-triage] Disable dump_fast_dispatch check on NCRISC
  • Blackhole pipeline re-org
  • Updated blackhole post commit smoke jobs with priority runners
  • Allow dynamic allocation of trace buffers when no trace region is set.
  • 32 Bit Indexing Support for MPWI
  • Respect MM throttle level in minimal_matmul
  • Fix missing wait_for_host_writes flag
  • Fused lerp operation & lerp_tile LLK

v0.66.0-rc12

13 Feb 00:05
Immutable release. Only release title and notes can be modified.

v0.66.0-rc12 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21928847635

  • no changes

v0.67.0-dev20260211

12 Feb 00:57
Immutable release. Only release title and notes can be modified.
c664184

v0.67.0-dev20260211 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21888158442

📦 Uncategorized

  • Fix dead store in pool multi-core program factory
  • [skip ci] Add missing JIT files to .deb packages
  • Update bh eth test for new fabric router handshake
  • [Deepseek Blitz] Reduce-to-all op (python infra)
  • Add support for more scenarios when reading from pinned memory
  • Add utility to generate Blitz Decode Scaleout Configs
  • Initial Integration PR for Dram Prefetcher + Llama 8b on BH QB and LB
  • fix bad gpt-oss demo outputs
  • [skip ci] Fix eltwise ops docs
  • Decouple ControlPlane and its children from MetalContext
  • [skip ci] add codeowners for models/common/sampling
  • Update blitz mcast/gather micro ops
  • Fix division by zero in pool operation scalar config generation
  • Relax power-of-2 constraint on SDPA chunk granularity
  • [skip ci] GPT-OSS skip perf check
  • [TT-Transformers] Put mesh_partition instead of reduce_scatter for slicing replicated mesh tensor
  • Calculating arc heartbeats per second with more precision
  • matmul: disable worker cores if receive padding input data only
  • SFPI 7.24.0 246
  • #36917: Add uint16 support for fill op
  • [tt-train] Remove unnecessary .hpp files from ttml METAL_OPS_FILES
  • Remap to max supported topk instead of assert
  • changed reshape tensor layout to TILE for deepseek moe_gate
  • Relax graph capture conv memory targets
  • [Watcher] Models-unit-tests turned red after recent commit, keeping it green with additional skips
  • Relocate host DFB files to correct experimental folder
  • Fix PCC fluctuation in BGE-large-en vLLM generator
  • Add 4 link ring to deepseek
  • TT-Train: CMakeLists.txt: TT_METAL_HOME determination fix, env usage removal
  • Relax ResNet50 BH batch 32 e2e perf threshold (#37554)
  • Revamp of binary/unary max/min via SFPLOADMACRO.
  • Use future.get() instead of wait() to allow proper error propagation.
  • Automatic DRAM Slicing for Pool2D
  • Fix unit_tests_debug_tools parallel-safety
  • Migrate all models to make full use of the Module base class
  • MLA Optimizations
  • Remove init_fabric
  • Fix CB size calculation in non-sharded matmul with transpose-A and user core grid
  • Haibo sun/issue#31236 Stateful APIs and Trid 2.0 API Tests
  • Remove EnqueueTerminateCommand and command infrastructure
  • Fix CMAKE_BINARY_DIR usage in scaleout tools for add_subdirectory consumption
  • [tt-train] Make training profiler respect TT_METAL_DEVICE_PROFILER

v0.66.0-rc11

12 Feb 01:55
Immutable release. Only release title and notes can be modified.

v0.66.0-rc11 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21888180929

  • no changes