17 Jan 00:37

Immutable

v0.66.0-dev20260116

5b30cee

v0.66.0-dev20260116 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not the versions on the main branch. There may be differences between the latest main and the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21051111065

📦 Uncategorized

Rename ops params and input structs
- PR: #35857
fix(ttml): resolve nanobind duplicate type registration errors
- PR: #35423
[skip ci] Add download-artifact-with-retry action to fix corrupted .deb downloads
- PR: #35853
Add Qwen Image CI tests
- PR: #35800
Adding pi0 model to TTNN
- PR: #35833
Add support for other dtypes and L1 for multicore pad OP
- PR: #35869
Revert changes to create_arange_vector_of_bfloat16
- PR: #35715
Add support of choosing position_ids in testing MLA
- PR: #35789
#35313 fix sdpa with attn sinks
- PR: #35817
[UPSAMPLE] Add floating point scale factor support to TTNN upsample
- PR: #35508
Move uv to base stage
- PR: #35896
[TT-Transformers] Enable fused rotary and paged cache update ops in attention module
- PR: #35111
Fix Wan postprocess spatial output
- PR: #35871
Check for disallowed params combination in chunked SDPA
- PR: #35811
Use distributed LN in TT-DiT models
- PR: #35831
Add support for paged KV cache and chunked prefill to ring distributed sdpa
- PR: #35742
migrate to HF cross attention vision transformer of mLlama
- PR: #35750
Add checks for cgroup memory since Docker uses namespaces to limit things
- PR: #35450
Allow docs deployment to be from main
- PR: #35910
#0: Add actual device perf check in ops post commit
- PR: #35473
Add user configurable max packet size to fabric
- PR: #35848
L2 nightly test failure with ttnn.where()
- PR: #35879
[Bug fix] Altering ALU config from TRISC0
- PR: #35090
CCL Program Cache Updates
- PR: #35400
Data Movement Program Cache Fixes
- PR: #35429
[Fabric] Pkt hdr updates - support for up to 4X64 mesh
- PR: #35494
Z Router device changes
- PR: #34561
[skip ci] Delete ttnn/api/ttnn/Untitled
- PR: #35951
Enable MeshWorkload in ttnn.generic op
- PR: #35323
Fix Quasar FW compilation
- PR: #35926
Allow Logical to Physical Pinnings in MGD
- PR: #34996

Assets 27

16 Jan 00:38

github-actions

Immutable

v0.66.0-dev20260115

d72a067

v0.66.0-dev20260115 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21014818709

📦 Uncategorized

Clean up CMakeLists.txt messages and flags
- PR: #35553
[skip ci] Fix uv command
- PR: #35799
#34755: Add Fused Implementation of Deepseek MOE Gate for Deepseek B1
- PR: #35759
[skip ci] Add suite-level device fixture to reduce CI overhead
- PR: #35783
Remove things related to old device operation
- PR: #35720
Improve trace tracking to fix device perf
- PR: #35791
[skip ci] Fix dead store warning in topology_mapper.cpp
- PR: #35693
Add Wan and Flux BH LB configuration
- PR: #35360
34250 issue: include of test header to the production codebase
- PR: #34406
1136 bug: tensor to torch workflow implementation through host conversion
- PR: #35266
Add Functional Qwen-Image on WH
- PR: #34627
[skip CI] Fixes for t3k perf pipeline changes
- PR: #35801
Migrate legacy tt_metal tests to gtest framework
- PR: #35680
#33778: Add uint16 support for bitwise shift ops
- PR: #35164
[skip ci] Fix create_venv.sh and finish uv propagation
- PR: #35804
Add stability test suite for BH GLX 2D Torus (1D and 2D)
- PR: #35718
Add metal api to all enqueue_read into PinnedMemory.
- PR: #28957
[skip ci] Remove CCL sharded address generator sweep tests (infinite speedup)
- PR: #35797
fix: Add [[maybe_unused]] to benchmark loop variables to silence clang static analyzer
- PR: #35794
#34880: Add llk kernel for addcmul
- PR: #35221
[skip ci] Update the description for the Eth link status check
- PR: #35796
Add ttnn.experimental.isin to TTNN Python and C++ APIs (2nd attempt)
- PR: #29607
Don't conditionally dispatch on individual devices during ttnn.paged_update_cache
- PR: #35656
Config Tensors in DRAM for Pool2D
- PR: #35212
#32879: Simple accurate softplus op
- PR: #33766
LLK uninits for BH
- PR: #35645
Gemma3-27b DP4 on TG added to vLLM-nightly
- PR: #35752
[skip ci] Fix download artifacts script
- PR: #35824
[skip ci] Fix mismatched model name in T3K unit pipeline
- PR: #35672
Revert "Don't conditionally dispatch on individual devices during ttnn.paged_update_cache (#35656)"
- PR: #35829
[DM]: Removing unused mesh_device parameter
- PR: #35786
Add OWL-ViT model using TTNN APIs
- PR: #35461
#35572: Use TensorAccessor for sharded untilize
- PR: #35686
[Fabric] Fix ccl tests after pkt hdr updates
- PR: #35559
Fix models_common_unit_tests in t3000 e2e tests CI
- PR: #35803
Update op perf report reading to support new op type format
- PR: #35821
Fix wormhole llk_uninit missing default values error
- PR: #35822
Fix variable shadowing and improve error handling in pad RM multi-core
- PR: #35700
[skip ci] auto-generate owners from pipeline reorg
- PR: #35777
Restore test_clean_init as standalone executable
- PR: #35835
2erisc coordinated retrain on BH
- PR: #35666
[tt-train] SDPA Backward Pass operation
- PR: #29259
Avoid including dataflow_api.h in firmware builds.
- PR: #35345
[skip ci] Search multiple pip indexes
- PR: #35840
Update blackhole golden dispatch file
- PR: #35782
SDXL Img2img accuracy
- PR: #35737
Remove redundant return-type usings from device ops
- PR: #35808
#32998: Use bcast scalar with dest reuse for RMSNorm
- PR: #35843
Cache step independent computations in Wan2.2 pipeline
- PR: #34237
Capture src/dst addr and useful NoC counters in NoC Debug Packets
- PR: #35682
Remove dead store for num_cores in embeddings_fused_program_factory
- PR: #35703
Fix reading into pinned memory on tunneled devices
- PR: #35810
SFPI 7.16.0 168
- PR: #35849
[skip ci] make workflow yaml as template for analyzing ND failures workflow
- PR: #35860
[skip_ci] Add CODEOWNERS entry for llk_api/llk_sfpu
- PR: #35695
[UMD Bump] Automated UMD Bump 08.01.2026
- PR: #35440
[TT Transformers] DRAM Prefetcher Bring up on BH with Ring MM Unit test
- PR: #35709
[skip ci] Remove docker-job subdirectory workaround (phase 1)
- PR: #35867
[skip ci] Move install_uv and update create_venv
- PR: #35862

Assets 27

15 Jan 01:02

github-actions

Immutable

v0.66.0-dev20260114

521c53f

v0.66.0-dev20260114 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20977570832

📦 Uncategorized

[tt-train] Revert test skip in NIGHTLY_UnusedParametersInModuleSGD
- PR: #35524
Added Fabric Benchmark Upload Guards
- PR: #35677
Fix Qwen T3k demo + perplexity tests due to missing seq len cutoff in warm up and incorrect max_seq_len
- PR: #35255
Update Mixtral model tests to use HF as reference
- PR: #35644
[skip ci] Fix calling of deploy docs
- PR: #35735
Update conv2d performance targets and threshold
- PR: #35729
#0: Add missing python comparison operator for CoreRangeSet
- PR: #35756
Fix sliding_window SDPA program caching
- PR: #35749
Adding stallwaits to first batch of uninits
- PR: #35287
Fix qwen25_vl unit tests
- PR: #35754
[skip ci] Add philei-tt & jmalone-tt to tt-train codeowners
- PR: #35764
Fabric tests were missing from merge gate status checks
- PR: #35717
Add 6u cyclic multiprocess tests to CI
- PR: #35472
[skip ci] Add check-prs cursor command for PR status monitoring
- PR: #35712
[skip ci] Add @mateusznowakTT to CODEOWNERS
- PR: #35768
[skip ci] Switch from pip to uv pip
- PR: #35707
Bump versions of deps that are so old pip is compiling it from scratch
- PR: #35767
Improve triage debug messages
- PR: #35676
Adding Warning when downgrading Mesh shape because of Connectivity
- PR: #35771
Cluster validation updates for characterizing BH Link Health
- PR: #35714
Replace assert()/TT_ASSERT() with reliable checks in tests
- PR: #35665
#28087 revert the binary compute core optimization revert and more changes
- PR: #35420
Allow multiple output tensors
- PR: #32193
declaring rta and crta thread_local, fixing linker values
- PR: #35545
[skip ci] Metal Profiler Tech Report Update
- PR: #35412
Topology Solver: Adjacency Graphs and Constraints API
- PR: #35769
Added codeowners for docs without owners
- PR: #35763
Support two risc in UDM mode
- PR: #35327
Add specialized Distributed Layernorm for DiT models
- PR: #35657
#35670: create a new job to determine the runner labels for Git Dispatch workflow
- PR: #35671
Add a script to count the number of pytest including parametrize expansions given a path
- PR: #35705
Fix N150 profiler
- PR: #35787
ci: Change pr-gate default build-type from ASanCoverage to ASan
- PR: #35779

Contributors

mateusznowakTT

Assets 27

13 Jan 17:14

github-actions

Immutable

v0.66.0-dev20260113

199056b

v0.66.0-dev20260113 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20939760521

📦 Uncategorized

[Fabric] Disable 80B header for 2D
- PR: #35584
Add override_output_sharding_config param to BlockShardedStrategyConfiguration
- PR: #35465
[skip ci] Enhance run_conv2d_short_sweep function to accept additional params
- PR: #35640
Deepseek module changes to ensure compatibility with higher sequence lengths
- PR: #35370
Improving error messages across owned scripts
- PR: #35649
Ring Attention datamovement optimization
- PR: #34929
Improve performance of accurate exponential
- PR: #32968
Fix dead store warnings in ternary_program_factory.cpp
- PR: #35583
Optimize SD Profiler Reads
- PR: #35581
[skip CI] Fixes for t3k demo pipeline changes
- PR: #35668
Add DeepSeekV3 unit tests to T3K unit and APC pipelines
- PR: #35409
Trigger 2x WH GLX similar to T3K multihosts
- PR: #35554
Remove '_no_pack' Tilize Variants
- PR: #35557
Fix parameter shadowing bug in BlockRep constructor
- PR: #35579
Use multicast when initializing metal context
- PR: #35188
Increased timeout for t3k integration llama3 test
- PR: #35674
Add dst addrs to NoC async read/write debug packets
- PR: #35414
Optimize page size in traces for performance.
- PR: #34752
Test fixes after moving 2.0 into experimental
- PR: #35675
33696: Remove sub_device_manager_tracker from device
- PR: #35452
[skip ci] upstream image: give other users r/o permissions to the home directory
- PR: #35642
fix tracy .str conversion for when special_parent_text col is empty
- PR: #35687
Update tt-logger version to 1.1.7
- PR: #35599
Move DPRINT parsing logic to separate class
- PR: #33161
Fix Qwen garbage output
- PR: #35555
Cleanup dispatch_core_common.hpp
- PR: #35489
Remove metal_soc_descriptor.h from public Runtime API
- PR: #34178
Fix OOM in XQKV prefill matmul on P100 Llama 8b
- PR: #35683

Assets 27

13 Jan 00:34

github-actions

Immutable

v0.66.0-dev20260112

b493a7b

v0.66.0-dev20260112 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20904408807

📦 Uncategorized

Improve accuracy of atan/atan2
- PR: #35470
Fix static analyzer false positive in device_operation.hpp
- PR: #35588
Modify unary_bcast API in metal to add new data formats
- PR: #35304
Fix ring matmul runtime arg hang and bad outputs in llama70b
- PR: #35368
[skip CI] Fixes for t3k pipeline changes
- PR: #35602

Assets 27

12 Jan 00:36

github-actions

v0.66.0-dev20260111

0741d5c

v0.66.0-dev20260111 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20886677172

📦 Uncategorized

[Fabric] Add device freq validation for perf modes
- PR: #35301
Cleanup rms norm function
- PR: #35455
[skip ci] Cleanup "using" in llrt.hpp
- PR: #35568
Migrate op to new infra: matmul
- PR: #34466
Vit bh combined tech report
- PR: #35567
Strip out unused symbols for the bfloat utilities
- PR: #34364
Add time budget controls for t3k pipelines and renames frequent, nightly, model perf to integration, e2e, perf
- PR: #35551
fix-matmul-wrong-clang-tidy-fix
- PR: #35582
Trace Deepseek V3 on 1x Galaxy
- PR: #35507
Fix clang-tidy misc-unused-params warnings
- PR: #35433

Assets 26

11 Jan 00:39

github-actions

v0.66.0-dev20260110

320939d

v0.66.0-dev20260110 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20869506581

📦 Uncategorized

Migrate op to new infra: FusedRMSNormPreAllGather
- PR: #35488
Force Single ERISC Kernel Execution in run_cluster_validation
- PR: #35476
Telemetry: Add fabric bandwidth telemetry metrics (v2)
- PR: #34816
Fix models/common/tests CI failure
- PR: #35482
Increase Vovnet threshold
- PR: #35267
[skip ci] #35313 [GPT-OSS] disable 4k prefill unit tests
- PR: #35505
Relax T3K Qwen2.5-Coder-32B CI target
- PR: #35324
Fix T3K Mixtral Perplexity tests with missing is_mixture_of_experts flag
- PR: #35254
#32289: remove duplicate file
- PR: #32637
Fixes after prefix caching
- PR: #35503
[tt-train] Add KV cache support to tt-train's LLaMA
- PR: #33169
Changing DM PCC check to bitwise comparison
- PR: #35403
Improve accuracy of tanh on float32
- PR: #34927
migrate vision encoder unit test to HF
- PR: #35448
#35236: remove deepseek blitz ops tests from models unit tests
- PR: #35518
add missing pytest import
- PR: #35517
Sagarwal/profiler noc trace bug
- PR: #33730
Update on profiler CI options and remove other nightlies
- PR: #35513
Consolidate fabric init postcodes and telemetry status
- PR: #35481
PR: Fix rotary_embedding_llama sweep test with proper golden function
- PR: #34742
Mbahnas/vit bh hires 1211
- PR: #35426
Updating CB doc to indicate there is only 1 reader and 1 writer
- PR: #34282
Generalize timeStampedData function
- PR: #35479
Moving 2.0 apis into experimental and updating compute kernels to use CB abstraction
- PR: #35495
[Fabric] Fix ubench pipeline
- PR: #35471
TT-Transformers version 2 modules -- MLP
- PR: #35095
#35342: Revert PR #32045
- PR: #35492
Bump ttsim version to v1.2.0
- PR: #35562
use subordinate_sync correctly
- PR: #35566

Assets 26

10 Jan 10:21

github-actions

v0.65.1

558a196

v0.65.1

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20869526990

📦 Uncategorized

Remove prefetcher dangling reference from previous test
- PR: #35061
Fix batched prefill pcc issue
- PR: #35059
Llama-3.1-8B decode TSU optimizations
- PR: #35142
[skip ci] Re-gen Docker containers (#35305)
- PR: #35438

What's Changed

MM/Fused/Reduce Docs Touchups by @edwinleeTT in #33510
[skip ci] TG Resnet50 test_perf_trace_2cqs tweak by @astancovTT in #33672
Making check arc robust to firmware version when calculating uptime by @adjordjevic-TT in #33592
[skip ci] Update tt_transformers docs (and comments) to remove mentions of LLAMA_DIR by @gwangTT in #33669
Migrate op to new infra: send_async by @kevinwuTT in #33005
Migrate op to new infra: recv_async by @kevinwuTT in #33200
#33584: SDXL demo/accuracy test e2e time reporting by @ipotkonjak-tt in #33646
split_work_to_cores pybind in ttnn module by @arichinsTT in #32997
Migrate op to new infra: rotary_embedding_llama_fused_qk by @philei-tt in #33655
Fix GraphQL query in set-opened-on workflow by @jbakerTT in #33687
Adding 1x glx demo test + testing in CI for Deepseek V3 by @yalrawwashTT in #33508
Fix motif galaxy demo test by @sadesoyeTT in #33688
[Reduce APC] Remove cpp N150 runs from APC by @kkabilarTT in #33683
Fix torch reference tensor in sharded layernorm tests by @rmillerTT in #33564
#32710: Migrate op to new infra: nlp_create_qkv_heads_decode by @ssundaramTT in #33516
Removing devicePool from the API by @mfiltser-TT in #33668
add more cores to 2x harvesting to support further harvested P150s by @yugaoTT in #33613
Migrate op to new infra: conv2d by @awliu-TT in #33019
Enable 2 ERISC mode on bh glx upstream tests by @nnyamagoudar-TT in #33571
chore: update LLK submodule to 91fa6c2 by @fvranicTT in #33685
Migrate op to new infra: nlp_create_qkv_heads_vit by @philei-tt in #33658
Migrate op to new infra: nlp_kv_cache_load_slice by @philei-tt in #33650
[skip ci] updating auto triage token by @ebanerjeeTT in #33700
#0: Fix (Galaxy) Demo - Motif job by adding NO_PROMPT variable by @dimitri-tenstorrent in #33712
1D to support 1x32 chip routing by @daminakaTT in #32575
Fix check_noc_status on non-default setups by @jbaumanTT in #33522
Increasing timeout for manual hang detection in triage tests, enable logging when test fails by @adjordjevic-TT in #33670
Migrate op to new infra: nlp_create_qkv_heads_segformer by @philei-tt in #33664
Added a fix to the invalid test in test_strided_all_gather_minimal_matmul_async for t3k by @jvegaTT in #33717
fix to layout regression by @jvegaTT in #33665
add missing write barrier after noc_semaphore_set by @kpaigwar in #33710
[skip ci] Fix and simplify set-opened-on workflow by @jbakerTT in #33724
[skip ci] Add libc++ to CMakePresets by @afuller-TT in #33727
Add dynamic power throttling to BH by @rdjogoTT in #33627
Migrate op to new infra: bcast by @vtsilytskyiTT in #33657
Remove checkout from setup-job action to eliminate SHA-pinned rollout hazard by @Copilot in #33691
Add additional argument handling in graph serializer by @dgomezTT in #29563
New Model: TG Qwen3-32b with TG Llama3-70b Optimizations at 65 t/s/u by @ricozhu-TT in #31018
Migrate op to new infra: prod_nc by @shutovilyaep in #33562
Migrate op to new infra: prod_all by @shutovilyaep in #33568
Migrate op to new infra: argmax by @shutovilyaep in #33310
enhance fetching time from dram to l1 cmddat_q through prefetching by @mdingTT in #33537
Whisper - decoder and encoder optimization by @mbahnasTT in #33450
[skip ci] Adding support for 4x timeshare by @akirby-TT in #33693
Remove silent defaults from DeepSeekV3 demo and tests by @esmalTT in #33686
optimize bh fabric rx ack credit path by @SeanNijjar in #33524
#32626: Remove Moreh operations from docs by @mgajewskiTT in #33765
Set auto triage to false and revert 8dfb324 by @dpopovTT in #33774
Add ClosetBox fabric test configuration by @jpanasiukTT in #33596
Implement in-memory GSD/FSD validation to avoid disk I/O by @jpanasiukTT in #33055
Re-enable Triage tests by @afuller-TT in #33529
Autopacketization support for fabric data movement - Part 1 by @tlevinTT in #33081
Revert "Implement in-memory GSD/FSD validation to avoid disk I/O (#33055)" by @tt-rkim in #33792
Fix set-opened-on workflow: null handling, token validation, and pagination by @jbakerTT in #33781
Run paged llama attention prefill unit test instead of default attention by @alingTT in #33681
Decreasing low threshold for heartbeat per seconds to avoid ND test fail by @adjordjevic-TT in #33788
Add full grid worker forwarding channels for UDM Mux [4/n] by @yugaoTT in #33364
SFPI 7.12.0 by @nathan-TT in #33720
[skip ci] Fix GraphQL date type in set-opened-on workflow by @jbakerTT in #33797
Update Falcon7b PCC and expected output jsons after layernorm op changes broke tests by @skhorasganiTT in #33786
Fixing print tests with ND failures in tt-sim by @kstevensTT in #33520
Enabling tt-triage in APC by @tt-vjovanovic in #33798
Allow the profiler DRAM buffer size to be dynamically allocated depending on a user-specified op count by @sagarwalTT in #33004
remove deprecated fabric latency tests by @SeanNijjar in #33762
Fixing 1D Mapping Algorithm in Mesh Device for flipped coords by @Riddy21 in #33499
[Reduce APC] Run ccl from cpp-unit-tests on merge gate by @kkabilarTT in #33731
Updated CODEOWNERS to include all files in ttnn/cpp/ttnn/deprecated/ by @fplavecTT in #33807
[Fabric] fabric unicast scatter multi-chunk by @daminakaTT in #32395
Optimize fused strided all gather and minimal matmul to read local slice from AG input by @jonathansuTT in #33703
Improve accuracy of accurate sigmoid_tile by @nmauriceTT in #31266
[skip ci] Remove cron jobs from t3k and galaxy demo tests by @dpopovTT in #33814
[Fabric] telemetry to be controlled by env var in more detail by @daminakaTT in #33523
[skip ci] Optimize set-opened-on workflow by hardcoding IDs by @jbakerTT in #33810
TT-Fabric Intermesh Traffic VC (VC1) Support [1/n] by @ubcheema in #33750
[skip ci] Refactor wheel building CI job by @afuller-TT in #33530
Mi...

Contributors

jasondavies, yieldthought, and 121 other contributors

Assets 17

13 Jan 22:36

dpopovTT

Immutable

v0.65.0

d50ae8a

v0.65.0 Latest

TT-Metal v0.65.0 Release Notes

This release contains significant improvements and new features.

Changes

See CHANGELOG.txt for detailed commit history.

Installation

Refer to INSTALLING.md for installation instructions.

Model Updates

New

New Op Infrastructure Enablement for LLM & Diffusion Models
Core transformer execution paths (QKV, rotary embeddings, SDPA decode) migrated to the new op infra, forming the backbone for scalable LLM and diffusion support.
PR #33209 – Migrate op to new infra: sdpa_decode

Model Performance & Accuracy Updates

Stable Diffusion / SDXL Accuracy Fix
Corrected SDXL VAE accuracy issues that impacted image quality and downstream validation.
PR #33156 – SDXL vae batch encode accuracy fix

Improvements and New Features

Sub-Core Grid Scaling Across Ops
Enabled sub-core grid support for core unary ops, unblocking better utilization and scaling on large devices.
PR #33157 – Add sub_core_grids to unary infra and ops

Numerical Accuracy Fixes in Core Math Ops
Fixed accuracy issues in exponential-related ops that directly affect model convergence and output quality.
PR #33139 – Fix expm1 accuracy

Large-Kernel Support
Added support for huge kernels, enabling execution of larger and more complex workloads without fragmentation.
PR #32956 – Huge kernel support

Improved Error Propagation in Build System
Ensured exceptions in build threads correctly propagate to the main thread, preventing silent failures.
PR #33205 – Ensure exceptions in build threads are propagated

Fabric Router Heartbeat
Added heartbeat support to the fabric router, significantly improving detection of stalled or unhealthy links.
PR #31255 – Fabric router heartbeat feature

Telemetry Firmware Visibility
Exposed remaining firmware versions via telemetry, improving fleet visibility and debugging.
PR #33158 – Telemetry: Expose remaining firmware versions

CI & Workflow Hardening
Embedded pytest commands directly into Galaxy workflows, reducing CI flakiness and improving debuggability.
PR #32991 – Embed Pytest commands in Galaxy workflows

Full Changelog: v0.64.5...v0.65.0

Assets 27

CHANGELOG.txt
sha256:b0bdcf5157482e2476886f00e61f653737f838e31350a17c26d59e17e4e528a7
16.2 KB 2026-01-10T00:22:48Z

INSTALLING.md
sha256:1d4fb01e6b2344868a1c09453b6c1c3b1d353bb6bfc243f7e3615bfc5153d4af
8.54 KB 2026-01-10T00:22:48Z

MODEL_UPDATES.md
sha256:85628f5132241adcbe1cd00c5b556fcea7703e2afebc6ea39d3f5a8d7a565c60
13.3 KB 2026-01-10T00:22:48Z

README.md
sha256:e7420ab263cffca5b610560942738c877789455cd639afbfaf576f555391022c
15.4 KB 2026-01-10T00:22:48Z

tt-metalium-dev_0.65.0.rc14.ubuntu22.04_amd64.deb
sha256:05f785c5ae75629870c1c7f4668016b4f23f097c253ae18d67db015c854737d2
703 KB 2026-01-13T22:35:26Z

tt-metalium-dev_0.65.0.rc14.ubuntu24.04_amd64.deb
sha256:6d3755900e0180aa2f1eff8118b66d47bf32d0f93ff7fa2aa0e7c5fab5d98240
703 KB 2026-01-13T22:35:26Z

tt-metalium-examples_0.65.0.rc14.ubuntu22.04_amd64.deb
sha256:76c538ccf44311d08cf54be7f096d8a74636a1ebbf1bbe3e724ceadba5617424
66.4 KB 2026-01-13T22:35:26Z

tt-metalium-examples_0.65.0.rc14.ubuntu24.04_amd64.deb
sha256:d406a8ae98e75cf2bd12aa850d7b5ffad53f0259bd722658afb154f7d7094046
66.4 KB 2026-01-13T22:35:27Z

tt-metalium-jit_0.65.0.rc14.ubuntu22.04_amd64.deb
sha256:e5d1e65f61e06a02f318f9959de5425efc787117aeebe4bfe3fd6a501b3413e7
152 MB 2026-01-13T22:35:27Z

tt-metalium-jit_0.65.0.rc14.ubuntu24.04_amd64.deb
sha256:545a0215f1a456eca5b3933f2747baba65d7aeb9c3e22b8b0930c6672c1605f8
152 MB 2026-01-13T22:35:27Z

Source code (zip)
2026-01-10T00:20:04Z

Source code (tar.gz)
2026-01-10T00:20:04Z

Release attestation (json)
2026-01-10T00:20:04Z
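
Each asset listed above is paired with a `sha256:` digest. As a minimal sketch of how a downloaded asset could be checked against its listed digest, assuming the file has already been saved locally: the helper names `sha256_of` and `matches_listing` are hypothetical and are not part of any tt-metal tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in fixed-size chunks so large .deb files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Path, listed: str) -> bool:
    """Compare a file against a 'sha256:<hex>' string as shown in the asset list."""
    algo, _, expected = listed.partition(":")
    if algo != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {listed!r}")
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_listing(Path("README.md"), "sha256:e7420ab263cffca5b610560942738c877789455cd639afbfaf576f555391022c")` should return `True` only if the local file is byte-identical to the published README.md asset.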

09 Jan 14:22

github-actions

v0.66.0-dev20260109

f18f1d8

v0.66.0-dev20260109 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/20836658890

📦 Uncategorized

[skip ci] Add timeout to package installation step
- PR: #35417
Expanding module tests to ensure added seq len functionality for Deepseek 671B model
- PR: #34177
Fix BH performance: Remove unnecessary NOC_BRCST_EXCLUDE resets
- PR: #34368
Graph tracing improvement
- PR: #34263
#35326: Add Deepseek Blitz unit tests to CI
- PR: #35344
Make allocate_tensor_on_device private, use create_device_tensor instead
- PR: #32948
[Fabric] Add infra for dynamic packet header sizing
- PR: #34976
Add sweeps for new model traced ops
- PR: #35361
Improve Out of Memory Error Message
- PR: #32150
[skip ci] update gpt-oss README
- PR: #35398
Add teacher forcing demo test for Deepseek 671B model
- PR: #33967
[DM] Update data movement tests
- PR: #35026
#34947: ttnn_tracer_model ttnn tutorial fix
- PR: #35372
Add memory usage tracking for DRAM & L1 in training loop
- PR: #35316
#0: [skip ci] Add P100 support in git bisect
- PR: #35427
Update ttexalens reference version to 0.2.0
- PR: #35451
[skip ci] Enable t3k demo tests cron job
- PR: #35453
adds TT_METAL_JIT_ANALYTICS environment variable
- PR: #35388
Add support for Automatic Prefix Caching in TT-Transformers
- PR: #33883
Reenable fabric manager tests in Galaxy Quick
- PR: #35402
#32983: Remove some initial calls to test_system_health as it's being deprecated
- PR: #35094
Expose Hyperparams to Standard Namespace AG & RS
- PR: #35322
Strip unused symbols in sub_device.hpp
- PR: #34348
Launch dispatch kernels in parallel on multiple devices
- PR: #34750
[skip ci] Update Wheel Artifact Naming Convention in CI
- PR: #35432
Reduce channel count when not all channels are needed.
- PR: #35155
allow subordinate_sync_t per architecture
- PR: #35399
[skip ci] Add bh demo tests and bh multi card test to release testing
- PR: #35469
[skip ci] Optimize clang-tidy presets: disable tt-train and switch to Debug config
- PR: #35475
Apascual/30094 test mixtral decoder against hf
- PR: #35138
[skip ci] update merge gate alerts
- PR: #35478
[TT-Train] GSM8K Finetuning example with dashboard and Galaxy support
- PR: #31108
Fix swapped BASE_DIRS in kernel_helper_functions CMakeLists.txt
- PR: #35477
Moved get_batch_size to shape file
- PR: #32873
Move compute_flat_indices to shape
- PR: #32862
Add owners of vLLM integration tech report
- PR: #35480
feat: refactor import_tracy_op_logs
- PR: #35310
Migrate op to new infra: all_gather_async
- PR: #34975
[skip ci] zstd for .debs
- PR: #35466
#35441: Fix ttnn.visualize_tensor() crash on multi-host systems
- PR: #35464
Haibo sun/issue#29156
- PR: #35349

Assets 26

Releases: tenstorrent/tt-metal
