Summary of Changes
- New README.md and Model Support doc structure
- meta-llama/Llama-3.3-70B-Instruct
- TTFT optimizations across the 2k-64k sequence-length range for the WH-Galaxy implementation
- Decode TPS optimizations across the whole sequence-length range for the WH-Galaxy implementation
- Qwen/Qwen3-32B
- TTFT optimizations across the 1k-64k ISL range for the WH-Galaxy implementation
- Decode TPS optimizations across the whole sequence-length range for the WH-Galaxy implementation
- meta-llama/Llama-3.1-8B-Instruct
- TTFT optimizations across the 1-4k sequence-length range for the WH-Galaxy implementation
- openai/gpt-oss-20b
- currently experimental; performance optimizations and additional vLLM features are planned for the next release
- supported devices: T3K, Galaxy
- maximum model length set to 1024 on T3K, 128k on Galaxy
- maximum concurrency set to 1 on T3K, 32 on Galaxy
- openai/gpt-oss-120b
- currently experimental; performance optimizations and additional vLLM features are planned for the next release
- supported devices: T3K, Galaxy
- maximum model length set to 1024 on T3K, 128k on Galaxy
- maximum concurrency set to 1 on T3K, 32 on Galaxy
- BAAI/bge-large-en-v1.5
- supported devices: N150, N300, N300 with TP=2, T3K, Galaxy
- maximum ISL set to 384
- maximum concurrency set to 8, or 16 if TP=2
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- Qwen/Qwen3-Embedding-8B
- supported devices: N150, N300, N300 with TP=2, T3K, Galaxy
- configurable maximum ISL up to 1024 for N150 and Galaxy, or 4096 for N300 and T3K
- configurable maximum concurrency up to 32 on T3K
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- genmo/mochi-1-preview
- supported devices: T3K, Galaxy
- DP1 for both
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- Wan-AI/Wan2.2-T2V-A14B
- supported devices: T3K, Galaxy
- DP1 for both
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
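The per-device serving limits listed above can be captured in a single lookup table. The sketch below is purely illustrative: the name `SERVING_LIMITS`, the dict shape, and the `limits_for` helper are invented here and are not the actual contents of `tt-media-server/config/constants.py`; only the numeric limits are taken from these release notes.

```python
# Hypothetical sketch of the per-device serving limits listed in these
# release notes. The dict name and structure are illustrative only; the
# real defaults live in tt-media-server/config/constants.py.
SERVING_LIMITS = {
    "openai/gpt-oss-20b": {
        "T3K":    {"max_model_len": 1024,       "max_concurrency": 1},
        "Galaxy": {"max_model_len": 128 * 1024, "max_concurrency": 32},
    },
    "openai/gpt-oss-120b": {
        "T3K":    {"max_model_len": 1024,       "max_concurrency": 1},
        "Galaxy": {"max_model_len": 128 * 1024, "max_concurrency": 32},
    },
    "BAAI/bge-large-en-v1.5": {
        # Max ISL 384; concurrency 8 by default, or 16 with TP=2.
        "default":  {"max_isl": 384, "max_concurrency": 8},
        "N300_TP2": {"max_isl": 384, "max_concurrency": 16},
    },
}


def limits_for(model: str, device: str) -> dict:
    """Return the serving limits for a model/device pair, falling back
    to the model's default entry when the device has no override."""
    per_model = SERVING_LIMITS[model]
    return per_model.get(device, per_model.get("default"))
```

A structure like this makes the device-dependent limits easy to audit at a glance; the authoritative values remain the defaults in `constants.py`.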
SW versions recommended for Wormhole Galaxy:
- tt-smi: 3.0.38
- Firmware: 19.2.0
- tt-kmd: 2.5.0
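To sanity-check a Wormhole Galaxy host against these recommendations, a plain version comparison suffices. This is a generic sketch, not a Tenstorrent tool: the recommended versions are copied from the list above, and obtaining the installed versions (e.g. from `tt-smi` output) is left to the reader.

```python
# Recommended SW versions for Wormhole Galaxy, from the release notes.
RECOMMENDED = {"tt-smi": "3.0.38", "firmware": "19.2.0", "tt-kmd": "2.5.0"}


def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '3.0.38' into a comparable int tuple."""
    return tuple(int(part) for part in v.split("."))


def meets_recommendation(component: str, installed: str) -> bool:
    """True if the installed version is at least the recommended one."""
    return parse_version(installed) >= parse_version(RECOMMENDED[component])
```

Comparing int tuples (rather than raw strings) avoids the classic pitfall where `"19.10.0" < "19.2.0"` lexicographically.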
Model Spec Release Updates
This section lists model specification changes for this release: the TT-Metal commit bump and any status change per model implementation.
| Impl | Model Arch | Weights | Devices | TT-Metal Commit Change | Status Change | CI Job Link |
|---|---|---|---|---|---|---|
| qwen3_32b_galaxy | Qwen3-32B | Qwen/Qwen3-32B | GALAXY | a9b09e0 → 65718bb | No change | N/A |
| llama3_70b_galaxy | Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.1-70B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Llama-70B | GALAXY | a9b09e0 → 65718bb | No change | N/A |
| tt_transformers | Llama-3.1-8B | meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-8B-Instruct | GALAXY, GALAXY_T3K | a9b09e0 → 65718bb | No change | N/A |
| tt_transformers | mochi-1-preview | genmo/mochi-1-preview | T3K, GALAXY | c180ef7 → 65718bb | No change | N/A |
| tt_transformers | Wan2.2-T2V-A14B-Diffusers | Wan-AI/Wan2.2-T2V-A14B-Diffusers | T3K, GALAXY | c180ef7 → 65718bb | No change | N/A |
| tt_transformers | stable-diffusion-xl-base-1.0 | stabilityai/stable-diffusion-xl-base-1.0, stabilityai/stable-diffusion-xl-base-1.0-img-2-img | N150, N300, T3K, GALAXY | a9b09e0 → 65718bb | No change | N/A |
| whisper | whisper-large-v3 | openai/whisper-large-v3, distil-whisper/distil-large-v3 | N150, GALAXY, T3K | a9b09e0 → 65718bb | No change | N/A |
| tt_vllm_plugin | bge-large-en-v1.5 | BAAI/bge-large-en-v1.5 | N150, N300, T3K, GALAXY | 2496be4 → 65718bb | No change | N/A |
| tt_transformers | Qwen3-Embedding-8B | Qwen/Qwen3-Embedding-8B | N150, N300, T3K, GALAXY | 2496be4 → 65718bb | No change | N/A |
Release Artifacts Summary
Images Promoted from Models CI
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-65718bb-409b1cd
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-65718bb
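Pulling the promoted images is a plain `docker pull` per tag. The helper below is a hypothetical sketch (it assumes Docker is installed and you are authenticated against ghcr.io); building the commands separately from running them keeps the tag list testable on its own.

```python
import subprocess

# Images promoted from Models CI for this release (from the list above).
# docker pull takes the registry path without the https:// scheme.
PROMOTED_IMAGES = [
    "ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-65718bb-409b1cd",
    "ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-65718bb",
]


def pull_commands(images):
    """Build the `docker pull` argv for each image tag."""
    return [["docker", "pull", image] for image in images]


def pull_all(images=PROMOTED_IMAGES):
    """Pull every image; assumes docker is on PATH and ghcr.io auth is set up."""
    for cmd in pull_commands(images):
        subprocess.run(cmd, check=True)
```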
Existing Images with Models CI reference
Images that already exist on remote and have a valid Models CI image available.
No existing images with Models CI reference.
Existing Images without Models CI reference
Images that already exist on remote but have no valid Models CI reference (manually built/pushed).
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.5.0-fbbbd2da8cfab49ddf43d28dd9c0813a3c3ee2bd
- https://ghcr.io/tenstorrent/tt-shield/tt-media-inference-server-forge:a9b09e0b611da6deb4d8972e8296148fd864e5fd_98dcf62_60920940673
Total: 5
Docker Images Requiring New Builds
Note: Model Specs added outside of Models CI need their Docker images built manually; they appear here when no matching image already exists on the remote. This happens by design whenever a release increments the VERSION file.
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-0b10c51-3499ffa
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-13f44c5-0edd242
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-17a5973-aa4ae1e
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-20edc39-03cb300
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-25305db-6e67d2d
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-55fd115-aa4ae1e
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-5b5db8a-e771fff
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-60ffb199-3499ffa1
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-9b67e09-a91b644
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-ae65ee5-35f023f
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-c18569e-b2894d3
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-c254ee3-c4f2327
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-e95ffa5-48eba14
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-v0.61.1-rc1-5cbc982
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-v0.62.0-rc33-e7c329b
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-a9b09e0
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-be88351
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-c180ef7
Total: 18