Releases: tenstorrent/tt-metal

v0.66.0-dev20260116

17 Jan 00:37
Immutable release. Only release title and notes can be modified.
5b30cee

v0.66.0-dev20260116 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. There may be differences between the latest main and the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21051111065

📦 Uncategorized

  • Rename ops params and input structs
  • fix(ttml): resolve nanobind duplicate type registration errors
  • [skip ci] Add download-artifact-with-retry action to fix corrupted .deb downloads
  • Add Qwen Image CI tests
  • Adding pi0 model to TTNN
  • Add support for other dtypes and L1 for multicore pad OP
  • Revert changes to create_arange_vector_of_bfloat16
  • Add support of choosing position_ids in testing MLA
  • #35313 fix sdpa with attn sinks
  • [UPSAMPLE] Add floating point scale factor support to TTNN upsample
  • Move uv to base stage
  • [TT-Transformers] Enable fused rotary and paged cache update ops in attention module
  • Fix Wan postprocess spatial output
  • Check for disallowed params combination in chunked SDPA
  • Use distributed LN in TT-DiT models
  • Add support for paged KV cache and chunked prefill to ring distributed sdpa
  • migrate to HF cross attention vision transformer of mLlama
  • Add checks for cgroup memory since Docker uses namespaces to limit things
  • Allow docs deployment to be from main
  • #0: Add actual device perf check in ops post commit
  • Add user configurable max packet size to fabric
  • L2 nightly test failure with ttnn.where()
  • [Bug fix] Altering ALU config from TRISC0
  • CCL Program Cache Updates
  • Data Movement Program Cache Fixes
  • [Fabric] Pkt hdr updates - support for up to 4X64 mesh
  • Z Router device changes
  • [skip ci] Delete ttnn/api/ttnn/Untitled
  • Enable MeshWorkload in ttnn.generic op
  • Fix Quasar FW compilation
  • Allow Logical to Physical Pinnings in MGD

v0.66.0-dev20260115

16 Jan 00:38
Immutable release. Only release title and notes can be modified.
d72a067

v0.66.0-dev20260115 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21014818709

📦 Uncategorized

  • Clean up CMakeLists.txt messages and flags
  • [skip ci] Fix uv command
  • #34755: Add Fused Implementation of Deepseek MOE Gate for Deepseek B1
  • [skip ci] Add suite-level device fixture to reduce CI overhead
  • Remove things related to old device operation
  • Improve trace tracking to fix device perf
  • [skip ci] Fix dead store warning in topology_mapper.cpp
  • Add Wan and Flux BH LB configuration
  • 34250 issue: include of test header to the production codebase
  • 1136 bug: tensor to torch workflow implementation through host conversion
  • Add Functional Qwen-Image on WH
  • [skip CI] Fixes for t3k perf pipeline changes
  • Migrate legacy tt_metal tests to gtest framework
  • #33778: Add uint16 support for bitwise shift ops
  • [skip ci] Fix create_venv.sh and finish uv propagation
  • Add stability test suite for BH GLX 2D Torus (1D and 2D)
  • Add metal api to allow enqueue_read into PinnedMemory.
  • [skip ci] Remove CCL sharded address generator sweep tests (infinite speedup)
  • fix: Add [[maybe_unused]] to benchmark loop variables to silence clang static analyzer
  • #34880: Add llk kernel for addcmul
  • [skip ci] Update the description for the Eth link status check
  • Add ttnn.experimental.isin to TTNN Python and C++ APIs (2nd attempt)
  • Don't conditionally dispatch on individual devices during ttnn.paged_update_cache
  • Config Tensors in DRAM for Pool2D
  • #32879: Simple accurate softplus op
  • LLK uninits for BH
  • Gemma3-27b DP4 on TG added to vLLM-nightly
  • [skip ci] Fix download artifacts script
  • [skip ci] Fix mismatched model name in T3K unit pipeline
  • Revert "Don't conditionally dispatch on individual devices during ttnn.paged_update_cache (#35656)"
  • [DM]: Removing unused mesh_device parameter
  • Add OWL-ViT model using TTNN APIs
  • #35572: Use TensorAccessor for sharded untilize
  • [Fabric] Fix ccl tests after pkt hdr updates
  • Fix models_common_unit_tests in t3000 e2e tests CI
  • Update op perf report reading to support new op type format
  • Fix wormhole llk_uninit missing default values error
  • Fix variable shadowing and improve error handling in pad RM multi-core
  • [skip ci] auto-generate owners from pipeline reorg
  • Restore test_clean_init as standalone executable
  • 2erisc coordinated retrain on BH
  • [tt-train] SDPA Backward Pass operation
  • Avoid including dataflow_api.h in firmware builds.
  • [skip ci] Search multiple pip indexes
  • Update blackhole golden dispatch file
  • SDXL Img2img accuracy
  • Remove redundant return-type usings from device ops
  • #32998: Use bcast scalar with dest reuse for RMSNorm
  • Cache step independent computations in Wan2.2 pipeline
  • Capture src/dst addr and useful NoC counters in NoC Debug Packets
  • Remove dead store for num_cores in embeddings_fused_program_factory
  • Fix reading into pinned memory on tunneled devices
  • SFPI 7.16.0 168
  • [skip ci] make workflow yaml as template for analyzing ND failures workflow
  • [skip_ci] Add CODEOWNERS entry for llk_api/llk_sfpu
  • [UMD Bump] Automated UMD Bump 08.01.2026
  • [TT Transformers] DRAM Prefetcher Bring up on BH with Ring MM Unit test
  • [skip ci] Remove docker-job subdirectory workaround (phase 1)
  • [skip ci] Move install_uv and update create_venv

v0.66.0-dev20260114

15 Jan 01:02
Immutable release. Only release title and notes can be modified.
521c53f

v0.66.0-dev20260114 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20977570832

📦 Uncategorized

  • [tt-train] Revert test skip in NIGHTLY_UnusedParametersInModuleSGD
  • Added Fabric Benchmark Upload Guards
  • Fix Qwen T3k demo + perplexity tests due to missing seq len cutoff in warm up and incorrect max_seq_len
  • Update Mixtral model tests to use HF as reference
  • [skip ci] Fix calling of deploy docs
  • Update conv2d performance targets and threshold
  • #0: Add missing python comparison operator for CoreRangeSet
  • Fix sliding_window SDPA program caching
  • Adding stallwaits to first batch of uninits
  • Fix qwen25_vl unit tests
  • [skip ci] Add philei-tt & jmalone-tt to tt-train codeowners
  • Fabric tests were missing from merge gate status checks
  • Add 6u cyclic multiprocess tests to CI
  • [skip ci] Add check-prs cursor command for PR status monitoring
  • [skip ci] Add @mateusznowakTT to CODEOWNERS
  • [skip ci] Switch from pip to uv pip
  • Bump versions of deps that are so old pip is compiling it from scratch
  • Improve triage debug messages
  • Adding Warning when downgrading Mesh shape because of Connectivity
  • Cluster validation updates for characterizing BH Link Health
  • Replace assert()/TT_ASSERT() with reliable checks in tests
  • #28087 revert the binary compute core optimization revert and more changes
  • Allow multiple output tensors
  • declaring rta and crta thread_local, fixing linker values
  • [skip ci] Metal Profiler Tech Report Update
  • Topology Solver: Adjacency Graphs and Constraints API
  • Added codeowners for docs without owners
  • Support two risc in UDM mode
  • Add specialized Distributed Layernorm for DiT models
  • #35670: create a new job to determine the runner labels for Git Dispatch workflow
  • Add a script to count the number of pytest including parametrize expansions given a path
  • Fix N150 profiler
  • ci: Change pr-gate default build-type from ASanCoverage to ASan

v0.66.0-dev20260113

13 Jan 17:14
Immutable release. Only release title and notes can be modified.

v0.66.0-dev20260113 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20939760521

📦 Uncategorized

  • [Fabric] Disable 80B header for 2D
  • Add override_output_sharding_config param to BlockShardedStrategyConfiguration
  • [skip ci] Enhance run_conv2d_short_sweep function to accept additional params
  • Deepseek module changes to ensure compatibility with higher sequence lengths
  • Improving error messages across owned scripts
  • Ring Attention datamovement optimization
  • Improve performance of accurate exponential
  • Fix dead store warnings in ternary_program_factory.cpp
  • Optimize SD Profiler Reads
  • [skip CI] Fixes for t3k demo pipeline changes
  • Add DeepSeekV3 unit tests to T3K unit and APC pipelines
  • Trigger 2x WH GLX similar to T3K multihosts
  • Remove '_no_pack' Tilize Variants
  • Fix parameter shadowing bug in BlockRep constructor
  • Use multicast when initializing metal context
  • Increased timeout for t3k integration llama3 test
  • Add dst addrs to NoC async read/write debug packets
  • Optimize page size in traces for performance.
  • Test fixes after moving 2.0 into experimental
  • 33696: Remove sub_device_manager_tracker from device
  • [skip ci] upstream image: give other users r/o permissions to the home directory
  • fix tracy .str conversion for when special_parent_text col is empty
  • Update tt-logger version to 1.1.7
  • Move DPRINT parsing logic to separate class
  • Fix Qwen garbage output
  • Cleanup dispatch_core_common.hpp
  • Remove metal_soc_descriptor.h from public Runtime API
  • Fix OOM in XQKV prefill matmul on P100 Llama 8b

v0.66.0-dev20260112

13 Jan 00:34
Immutable release. Only release title and notes can be modified.
b493a7b

v0.66.0-dev20260112 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20904408807

📦 Uncategorized

  • Improve accuracy of atan/atan2
  • Fix static analyzer false positive in device_operation.hpp
  • Modify unary_bcast API in metal to add new data formats
  • Fix ring matmul runtime arg hang and bad outputs in llama70b
  • [skip CI] Fixes for t3k pipeline changes

v0.66.0-dev20260111

12 Jan 00:36
0741d5c

v0.66.0-dev20260111 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20886677172

📦 Uncategorized

  • [Fabric] Add device freq validation for perf modes
  • Cleanup rms norm function
  • [skip ci] Cleanup "using" in llrt.hpp
  • Migrate op to new infra: matmul
  • Vit bh combined tech report
  • Strip out unused symbols for the bfloat utilities
  • Add time budget controls for t3k pipelines and rename frequent, nightly, model perf to integration, e2e, perf
  • fix-matmul-wrong-clang-tidy-fix
  • Trace Deepseek V3 on 1x Galaxy
  • Fix clang-tidy misc-unused-params warnings

v0.66.0-dev20260110

11 Jan 00:39
320939d

v0.66.0-dev20260110 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20869506581

📦 Uncategorized

  • Migrate op to new infra: FusedRMSNormPreAllGather
  • Force Single ERISC Kernel Execution in run_cluster_validation
  • Telemetry: Add fabric bandwidth telemetry metrics (v2)
  • Fix models/common/tests CI failure
  • Increase Vovnet threshold
  • [skip ci] #35313 [GPT-OSS] disable 4k prefill unit tests
  • Relax T3K Qwen2.5-Coder-32B CI target
  • Fix T3K Mixtral Perplexity tests with missing is_mixture_of_experts flag
  • #32289: remove duplicate file
  • Fixes after prefix caching
  • [tt-train] Add KV cache support to tt-train's LLaMA
  • Changing DM PCC check to bitwise comparison
  • Improve accuracy of tanh on float32
  • migrate vision encoder unit test to HF
  • #35236: remove deepseek blitz ops tests from models unit tests
  • add missing pytest import
  • Sagarwal/profiler noc trace bug
  • Update on profiler CI options and remove other nightlies
  • Consolidate fabric init postcodes and telemetry status
  • PR: Fix rotary_embedding_llama sweep test with proper golden function
  • Mbahnas/vit bh hires 1211
  • Updating CB doc to indicate there is only 1 reader and 1 writer
  • Generalize timeStampedData function
  • Moving 2.0 apis into experimental and updating compute kernels to use CB abstraction
  • [Fabric] Fix ubench pipeline
  • TT-Transformers version 2 modules -- MLP
  • #35342: Revert PR #32045
  • Bump ttsim version to v1.2.0
  • use subordinate_sync correctly

v0.65.1

10 Jan 10:21
558a196

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20869526990

📦 Uncategorized

  • Remove prefetcher dangling reference from previous test
  • Fix batched prefill pcc issue
  • Llama-3.1-8B decode TSU optimizations
  • [skip ci] Re-gen Docker containers (#35305)

v0.65.0

13 Jan 22:36
Immutable release. Only release title and notes can be modified.

TT-Metal v0.65.0 Release Notes

This release contains significant improvements and new features.

Changes

See CHANGELOG.txt for detailed commit history.

Installation

Refer to INSTALLING.md for installation instructions.

Model Updates

New

  • New Op Infrastructure Enablement for LLM & Diffusion Models
    Core transformer execution paths (QKV, rotary embeddings, SDPA decode) migrated to the new op infra, forming the backbone for scalable LLM and diffusion support.
    PR #33209 – Migrate op to new infra: sdpa_decode

Model Performance & Accuracy Updates

Improvements and New Features

Full Changelog: v0.64.5...v0.65.0

v0.66.0-dev20260109

09 Jan 14:22
f18f1d8

v0.66.0-dev20260109 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20836658890

📦 Uncategorized

  • [skip ci] Add timeout to package installation step
  • Expanding module tests to ensure added seq len functionality for Deepseek 671B model
  • Fix BH performance: Remove unnecessary NOC_BRCST_EXCLUDE resets
  • Graph tracing improvement
  • #35326: Add Deepseek Blitz unit tests to CI
  • Make allocate_tensor_on_device private, use create_device_tensor instead
  • [Fabric] Add infra for dynamic packet header sizing
  • Add sweeps for new model traced ops
  • Improve Out of Memory Error Message
  • [skip ci] update gpt-oss README
  • Add teacher forcing demo test for Deepseek 671B model
  • [DM] Update data movement tests
  • #34947: ttnn_tracer_model ttnn tutorial fix
  • Add memory usage tracking for DRAM & L1 in training loop
  • #0: [skip ci] Add P100 support in git bisect
  • Update ttexalens reference version to 0.2.0
  • [skip ci] Enable t3k demo tests cron job
  • adds TT_METAL_JIT_ANALYTICS environment variable
  • Add support for Automatic Prefix Caching in TT-Transformers
  • Reenable fabric manager tests in Galaxy Quick
  • #32983: Remove some initial calls to test_system_health as it's being deprecated
  • Expose Hyperparams to Standard Namespace AG & RS
  • Strip unused symbols in sub_device.hpp
  • Launch dispatch kernels in parallel on multiple devices
  • [skip ci] Update Wheel Artifact Naming Convention in CI
  • Reduce channel count when not all channels are needed.
  • allow subordinate_sync_t per architecture
  • [skip ci] Add bh demo tests and bh multi card test to release testing
  • [skip ci] Optimize clang-tidy presets: disable tt-train and switch to Debug config
  • Apascual/30094 test mixtral decoder against hf
  • [skip ci] update merge gate alerts
  • [TT-Train] GSM8K Finetuning example with dashboard and Galaxy support
  • Fix swapped BASE_DIRS in kernel_helper_functions CMakeLists.txt
  • Moved get_batch_size to shape file
  • Move compute_flat_indices to shape
  • Add owners of vLLM integration tech report
  • feat: refactor import_tracy_op_logs
  • Migrate op to new infra: all_gather_async
  • [skip ci] zstd for .debs
  • #35441: Fix ttnn.visualize_tensor() crash on multi-host systems
  • Haibo sun/issue#29156