Releases · tenstorrent/tt-metal

11 Feb 00:55

github-actions

Immutable

v0.67.0-dev20260210

807ee3d

v0.67.0-dev20260210 Pre-release


Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release rather than the versions on the main branch. The latest main may differ from the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21846858892

📦 Uncategorized

[skip ci] Disable pytest timeout for Stable Diffusion device perf tests
- PR: #37301
Implement KV store-and-forward chain optimization for non-causal SDPA
- PR: #37285
[GPT-OSS] Add high throughput model to vLLM nightly
- PR: #37192
Update ResNet50 batch_size=32 performance target for Blackhole
- PR: #37373
Improve tracing tooling to provide the whole inputs for ttnn operations
- PR: #35924
SDXL Relax encoder2 perf targets
- PR: #37375
chore: update LLK submodule to 7e7cf4f
- PR: #37358
Set medgemma's max_prefill_chunk_size the same as gemma-3
- PR: #37305
Fix setuptools pkg_resources issue
- PR: #37417
Update SDXL VAE device perf targets after SDPA KV chain forwarding optimization
- PR: #37382
In post sdpa op, mcast to 13x10 grid
- PR: #37427
Removed program cache when no_dispatch
- PR: #36772
Fix FP32 precision loss in untilize for wide tensors
- PR: #37333
[skip ci] Remove Fabric Sanity Benchmark from BH post-commit tests
- PR: #37430
DeepSeek Blitz MOE routed expert
- PR: #37294
Bump blackhole deepseek blitz op tests timeout
- PR: #37369
Add fix for Qsr packet_tag breaking compilation
- PR: #37428
Use up-to-date main() declaration in all kernels & docs
- PR: #37147
[skip ci] #0: remove mamba from perf models yaml
- PR: #37387
Increase coverage of unpack reconfig
- PR: #36987
Add 4 chunks scatter_write and extra ring optimization to all_to_all_async_generic
- PR: #37339
Optimize decode for Llama3-70B for TG
- PR: #37359
[skip ci] Optimize pkg-resources patch
- PR: #37438
Add the accuracy_tips tech report
- PR: #37146


10 Feb 20:45

github-actions

Immutable

v0.66.0-rc10

719fbb9

v0.66.0-rc10 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21846883882

📦 Uncategorized

Remove sending lm_head persistent_buffer to DRAM
- PR: #37386
Optimize decode for Llama3-70B for TG for stable branch
- PR: #37360


10 Feb 00:52

github-actions

Immutable

v0.67.0-dev20260209

44e8cdc

v0.67.0-dev20260209 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21808434016

📦 Uncategorized

Add Multi-Threaded Test using H <-> D Sockets
- PR: #37341
Migrate device headers from tt_metal/include to tt_metal/hw/inc
- PR: #36583


10 Feb 10:34

github-actions

Immutable

v0.66.0-rc9

49ab40b

v0.66.0-rc9 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21820952145

no changes


09 Feb 00:58

github-actions

Immutable

v0.67.0-dev20260208

2a9fa01

v0.67.0-dev20260208 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21789607116

📦 Uncategorized

Bump ttsim version to v1.3.4
- PR: #37330
Add Support For Fused Shared-Expert Kernel
- PR: #37332
Remove assert on ARCH_NAME in data collection step in workflows
- PR: #37300
Sparse checkout optimizations for workflows
- PR: #37299
Fix various problems seen in test_prefetcher and test_dispatcher
- PR: #37233
chore: update LLK submodule to 02a4c57
- PR: #37351


08 Feb 00:52

github-actions

Immutable

v0.67.0-dev20260207

430d1f4

v0.67.0-dev20260207 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21770820481

📦 Uncategorized

[skip ci] Update vLLM nightly to test sampling
- PR: #37092
Fuse TP/SP Broadcast into pre_sdpa
- PR: #37143
[skip ci] set-cpu-governor VM handling
- PR: #37248
Move tt_dit out of experimental directory
- PR: #34053
chore: update LLK submodule to e050aab
- PR: #37194
#36149: Add native llk kernel for addcdiv
- PR: #36634
Fix Clang Static Analyzer warning: virtual call in GraphProcessor constructor
- PR: #37255
[gpt-oss] attention decode optimizations
- PR: #37190
Fix Blackhole op performance model FPU and DRAM utilization calculations
- PR: #36902
[Gemma3] Test fix: Ref MLP uses float32 for long sequences
- PR: #37263
Fix undefined behavior in fabric worker memory allocation
- PR: #37182
Encapsulate noc non blocking reads in cq into separate files
- PR: #36959
Expose Parameters in all gather
- PR: #37234
[Watcher] In order to get tt-train-cpp-unit group of tests green there was a need to skip some tests with watcher
- PR: #37261
SFPI 7.23.0 243
- PR: #37271
Using known interval to calculate uptime in check arc
- PR: #37277
Fix ttnn.{gcd,lcm} docs.
- PR: #37189
[Watcher] ttnn-unit-test group skips with watcher
- PR: #37043
Add 32x4 quad BH rankbindings file
- PR: #37115
move fabric benchmark test and update golden
- PR: #37224
#37259: add ifdef guard for layernorm kernels
- PR: #37284
[Quasar DFB]: Add support for multi-threaded producer/consumer + make blocked consumer use remapper
- PR: #36916
Add time budget controls for Galaxy model perf -> Galaxy perf pipeline
- PR: #37240
[skip ci] bring back BH GLX tests in CI
- PR: #37297
Add fabric telemetry neighbor node id exchange
- PR: #36872
Remove harvesting info from build_key when coordinate virtualization is enabled
- PR: #37135
[skip ci] Update CODEOWNERS for programming_examples
- PR: #37293
#0: add models timeout for bh
- PR: #37245
Make perf test timeout explicit for stable_diffusion_1_4 model
- PR: #37289
Fix DeepSeek V3 config loading when model path is a symlink
- PR: #37287
Refactor TTNN tests to use shared config for CI and TTSim
- PR: #37095
Add time budget controls for Galaxy demo pipeline
- PR: #37238
Remove tests/scripts/run_tests.sh and stress-fast-dispatch-build-and-unit-tests.yaml pipeline
- PR: #37251
#36852 BinaryNg kernel deadlocks with reshard
- PR: #37229
Fix segfaults on ttnn.ones, ttnn.zeros, ttnn.empty
- PR: #37272
[Merge stable to main] Llama3.3-70b and 3.1-8b - Fix sampling parameters
- PR: #36476
D2H Sockets
- PR: #37164
Fuse Post SDPA with TP All Reduce.
- PR: #37241
[skip ci] Increase timeout for blackhole deepseek blitz tests
- PR: #37331
#36881: add validation check for sharded softmax
- PR: #37329


07 Feb 03:50

github-actions

Immutable

v0.67.0-dev20260206

0474b44

v0.67.0-dev20260206 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21734080303

📦 Uncategorized

Topology Mapper Pinning Regression Tests
- PR: #37002
Remove deprecated Grayskull (tt::ARCH::GRAYSKULL) architecture support
- PR: #36897
latency result superset export
- PR: #36822
[BEVFormer] Update PCC
- PR: #36935
SDXL Refiner Matmul memory configs optimization
- PR: #37049
Fix conv2d reader kernel runtime arg mismatch for height-sharded conv
- PR: #37109
[TT-Transformers] Reduce batch-32 prompt length to avoid some tokenizers going over 1024 tokens
- PR: #37120
SDXL override global timeout
- PR: #37177
Remove unused type declarations in tt_metal identified by clangd
- PR: #37155
[skip ci] Reduce CMake install message verbosity for incremental builds
- PR: #37084
#36225: Handle Binary_op_type for mixed dtypes - FPU for EQ
- PR: #37165
SDXL disable timeout
- PR: #37181
add per core compile args
- PR: #37070
Add fused post sdpa op
- PR: #37082
#28532: Add Installer validation as a CI workflow
- PR: #36656
Add time budget controls for Galaxy frequent pipeline -> now Galaxy integration pipeline
- PR: #36857
[skip ci] fix(copilot-autofix-clangsa): fix broken pipe error in jq query
- PR: #37179
Improve custom_mm to performantly cover more shapes and enable transpose
- PR: #37121
[skip ci] Add workflow comparison script for CI analysis
- PR: #36995
[tt-train] TP+DP Llama training
- PR: #35284
#23354 more data type support for llk bcast
- PR: #36054
Fix noc debugging tool test when run back to back
- PR: #37140
[skip ci] Increase timeouts for longer running BH multicard model tests
- PR: #37206
Fix hard-coded action hash causing CI errors
- PR: #37158
Add versioning system to fabric telemetry
- PR: #36998
Increase hang detection timeout for data movement tests
- PR: #37214
#0 - Tests scripts update
- PR: #37116
[deepseek] Fix test_model decode reference for non‑zero position ids
- PR: #37006
Lower Tensor Utilities to Runtime Staging Area
- PR: #36621
Add commands to do packed large linear reads/unicast writes
- PR: #36664
[skip ci] Add pytest timeout flags to long-running model tests in CI
- PR: #37174
Add new all_to_all_dispatch variant for DeepSeek that supports multiple algorithms, fabric mux variants, and persistent buffer/semaphore optimizations
- PR: #36831
Add scattered core support for gather operation
- PR: #37145
Support for local tile reduce using DST accum
- PR: #37151
Add support for 'export TT_METAL_DISABLE_SFPLOADMACRO=1'
- PR: #37222
Remove unused types from ttnn
- PR: #37163
Optimized number of workers for ReduceScatterMinimalMatmul for Llama 70B on Galaxy
- PR: #36992
Extend CB tests
- PR: #37230


06 Feb 03:11

github-actions

Immutable

v0.67.0-dev20260205

c1416c2

v0.67.0-dev20260205 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21694041502

📦 Uncategorized

TTNN Tensor Creation APIs - Update Docs, Create Example
- PR: #36789
[skip ci]: Adding additional codeowners to dataflow buffers
- PR: #37064
[skip ci] Split out file lists from infra definitions for code ownership purposes
- PR: #36875
H2D Sockets
- PR: #36909
Move blitz CCL to generic framework and fuse CCL Broadcast and RMSNorm
- PR: #36679
Add fabric API for querying neighboring devices
- PR: #36957
Fuse KV Cache to Main
- PR: #36988
Add option to pass chunk_start_idx as tensor.
- PR: #36936
[UMD Bump] Automated UMD Bump 02.02.2026
- PR: #36953
Fix demo test workflows failing on schedule triggers
- PR: #37083
[ROTATE] Implementation of rotate bilinear operation
- PR: #34212
chore: update LLK submodule to ace8fa5
- PR: #37102
[#36026] Improve analyze_validation_results.sh for faster operator triage
- PR: #36728
#37098: add missing unused runtime arg to softmax
- PR: #37110
[#37052] Merge cluster configs and allow overlapping hostnames for intra pod config merge
- PR: #36770
[skip ci] temp skip BH GLX tests until another glx is back in CI
- PR: #36871
[tt-train] AdamW as a fused operation
- PR: #33076
Implemented tilize support for width sharded case
- PR: #36836
Simplifying untilize ND sharding kernel logic
- PR: #37001
Matmul - Port Batched DRAM MM to BH
- PR: #36859
Quad GLX Deepseek CI Improvements
- PR: #37017
Make perf test timeout explicit for DiT models
- PR: #37093
Update docker to use zstd / OCI
- PR: #36840
TT-Train TTML python module: prefer CPP artifacts built by build_metal.sh, fallback to standalone 'uv pip install' artifacts
- PR: #36511
#36225: inf/-inf fix for ttnn.eq
- PR: #36636
[skip ci] revamp clang tidy job
- PR: #37090
don't use cache write around when not needed
- PR: #37137
Add performance tests for mla deepseek
- PR: #36748
Re-enable saving T3K perf data
- PR: #37009


06 Feb 01:03

github-actions

Immutable

v0.66.0-rc8

f87c34a

v0.66.0-rc8 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21694062915

no changes


04 Feb 11:19

github-actions

Immutable

v0.67.0-dev20260204

099f579

v0.67.0-dev20260204 Pre-release


Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/21653460697

📦 Uncategorized

Bump ttsim version to v1.3.3
- PR: #36951
Remove pytest-xdist and custom watchdog system
- PR: #36209
[skip ci] Remove unnecessary TT_METAL_HOME export instructions from programming examples
- PR: #37000
Add SfpuType::[unary_]{max,min}_[u]int32.
- PR: #36893
fix(sweep): correct schedule mappings for lead models and model trace…
- PR: #36834
Exabox CPU only Mock Tests
- PR: #36980
#35535: Fuse QRoPE and QNoPE (matmul3) into fused kernel
- PR: #36264
[skip ci] Fix broken sweep workflow
- PR: #37018
Move RTA sentinel constants
- PR: #36880
[skip ci] Comment out reset_tensix call in teardown
- PR: #37022
Refactor gemma model config
- PR: #36939
Use transaction IDs in kernels with unaligned access
- PR: #33905
Fix non-deterministic pytest test collection in ttnn unit tests
- PR: #36910
#36094: enable large tensor rms norm
- PR: #36979
Bump exalens version
- PR: #37038
Skipping cores in reset when dumping debug bus signals in triage
- PR: #37037
Fix CI failures related to #35077
- PR: #36934
[gpt-oss] fix long context demo
- PR: #37041
move possibly unused var inside usage scope
- PR: #37039
[#36107] Cluster Validation Performance Improvements
- PR: #36405
Add more watcher fields to triage DispatcherCoreData
- PR: #36994
Support writing to device from pinned memory
- PR: #36212
Update targets for galaxy Whisper test
- PR: #37042
Update CI tests for wan2.2 with support for image to video
- PR: #37015
Disabling dumping debug bus signals in CI
- PR: #37058
Add documentation for NOC debug dump. Rename env to TT_METAL_NOC_DEBUG_DUMP
- PR: #36976
Revert "#36094: enable large tensor rms norm"
- PR: #37055
Remove OFT and 20 core PDL from CI due to OOM issue
- PR: #36812
Add support for 1x16 fabric testing
- PR: #36874
chore: update LLK submodule to 80e7617
- PR: #37024
#37021: use preferred read/write nocs in ema
- PR: #37072
issue:32603 - memcpy functions should be renamed. resolved
- PR: #35899
#37020: add ifndef to ln kernel to avoid reading undefined vars
- PR: #37067
Fix DPRINT << TSLICE so it prints correct tiles in a loop
- PR: #36868
DeepSeek reduce_to_one
- PR: #36819

