Summary of Changes
- New README.md and Model Support doc structure
- meta-llama/Llama-3.3-70B-Instruct
- TTFT optimizations across the 2k-64k sequence-length range for the WH-Galaxy implementation
- Decode TPS optimizations across the whole sequence-length range for the WH-Galaxy implementation
- Qwen/Qwen3-32B
- TTFT optimizations across the 1k-64k ISL range for the WH-Galaxy implementation
- Decode TPS optimizations across the whole sequence-length range for the WH-Galaxy implementation
- meta-llama/Llama-3.1-8B-Instruct
- TTFT optimizations across the 1-4k sequence-length range for the WH-Galaxy implementation
- openai/gpt-oss-20b
- currently experimental; performance optimizations and additional vLLM features are planned for the next release
- supported devices: T3K, Galaxy
- maximum model length set to 1024 on T3K, 128k on Galaxy
- maximum concurrency set to 1 on T3K, 32 on Galaxy
- openai/gpt-oss-120b
- currently experimental; performance optimizations and additional vLLM features are planned for the next release
- supported devices: T3K, Galaxy
- maximum model length set to 1024 on T3K, 128k on Galaxy
- maximum concurrency set to 1 on T3K, 32 on Galaxy
- BAAI/bge-large-en-v1.5
- supported devices: N150, N300, N300 with TP=2, T3K, Galaxy
- maximum ISL set to 384
- maximum concurrency set to 8, or 16 if TP=2
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- Qwen/Qwen3-Embedding-8B
- supported devices: N150, N300, N300 with TP=2, T3K, Galaxy
- configurable maximum ISL up to 1024 for N150 and Galaxy, or 4096 for N300 and T3K
- configurable maximum concurrency up to 32 on T3K
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- genmo/mochi-1-preview
- supported devices: T3K, Galaxy
- DP1 for both
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
- Wan-AI/Wan2.2-T2V-A14B
- supported devices: T3K, Galaxy
- DP1 for both
- added default configuration in https://github.com/tenstorrent/tt-inference-server/blob/dev/tt-media-server/config/constants.py
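The per-device serving limits listed above can be captured in a single lookup table. The sketch below is purely illustrative: the name `SERVING_LIMITS`, the dict shape, and the `limits_for` helper are invented here and are not the actual contents of `tt-media-server/config/constants.py`; only the numeric limits are taken from these release notes.

```python
# Hypothetical sketch of the per-device serving limits listed in these
# release notes. The dict name and structure are illustrative only; the
# real defaults live in tt-media-server/config/constants.py.
SERVING_LIMITS = {
    "openai/gpt-oss-20b": {
        "T3K":    {"max_model_len": 1024,       "max_concurrency": 1},
        "Galaxy": {"max_model_len": 128 * 1024, "max_concurrency": 32},
    },
    "openai/gpt-oss-120b": {
        "T3K":    {"max_model_len": 1024,       "max_concurrency": 1},
        "Galaxy": {"max_model_len": 128 * 1024, "max_concurrency": 32},
    },
    "BAAI/bge-large-en-v1.5": {
        # Max ISL 384; concurrency 8 by default, or 16 with TP=2.
        "default":  {"max_isl": 384, "max_concurrency": 8},
        "N300_TP2": {"max_isl": 384, "max_concurrency": 16},
    },
}


def limits_for(model: str, device: str) -> dict:
    """Return the serving limits for a model/device pair, falling back
    to the model's default entry when the device has no override."""
    per_model = SERVING_LIMITS[model]
    return per_model.get(device, per_model.get("default"))
```

A structure like this makes the device-dependent limits easy to audit at a glance; the authoritative values remain the defaults in `constants.py`.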
SW versions recommended for Wormhole Galaxy:
- tt-smi: 3.0.38
- Firmware: 19.2.0
- tt-kmd: 2.5.0
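To sanity-check a Wormhole Galaxy host against these recommendations, a plain version comparison suffices. This is a generic sketch, not a Tenstorrent tool: the recommended versions are copied from the list above, and obtaining the installed versions (e.g. from `tt-smi` output) is left to the reader.

```python
# Recommended SW versions for Wormhole Galaxy, from the release notes.
RECOMMENDED = {"tt-smi": "3.0.38", "firmware": "19.2.0", "tt-kmd": "2.5.0"}


def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '3.0.38' into a comparable int tuple."""
    return tuple(int(part) for part in v.split("."))


def meets_recommendation(component: str, installed: str) -> bool:
    """True if the installed version is at least the recommended one."""
    return parse_version(installed) >= parse_version(RECOMMENDED[component])
```

Comparing int tuples (rather than raw strings) avoids the classic pitfall where `"19.10.0" < "19.2.0"` lexicographically.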
Model Spec Release Updates
This section lists model specification changes for this release: the TT-Metal commit bump and any status change per model implementation.
| Impl | Model Arch | Weights | Devices | TT-Metal Commit Change | Status Change | CI Job Link |
|---|---|---|---|---|---|---|
| qwen3_32b_galaxy | Qwen3-32B | Qwen/Qwen3-32B | GALAXY | a9b09e0 → 65718bb | No change | N/A |
| llama3_70b_galaxy | Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.1-70B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Llama-70B | GALAXY | a9b09e0 → 65718bb | No change | N/A |
| tt_transformers | Llama-3.1-8B | meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-8B-Instruct | GALAXY, GALAXY_T3K | a9b09e0 → 65718bb | No change | N/A |
| tt_transformers | mochi-1-preview | genmo/mochi-1-preview | T3K, GALAXY | c180ef7 → 65718bb | No change | N/A |
| tt_transformers | Wan2.2-T2V-A14B-Diffusers | Wan-AI/Wan2.2-T2V-A14B-Diffusers | T3K, GALAXY | c180ef7 → 65718bb | No change | N/A |
| tt_transformers | stable-diffusion-xl-base-1.0 | stabilityai/stable-diffusion-xl-base-1.0, stabilityai/stable-diffusion-xl-base-1.0-img-2-img | N150, N300, T3K, GALAXY | a9b09e0 → 65718bb | No change | N/A |
| whisper | whisper-large-v3 | openai/whisper-large-v3, distil-whisper/distil-large-v3 | N150, GALAXY, T3K | a9b09e0 → 65718bb | No change | N/A |
| tt_vllm_plugin | bge-large-en-v1.5 | BAAI/bge-large-en-v1.5 | N150, N300, T3K, GALAXY | 2496be4 → 65718bb | No change | N/A |
| tt_transformers | Qwen3-Embedding-8B | Qwen/Qwen3-Embedding-8B | N150, N300, T3K, GALAXY | 2496be4 → 65718bb | No change | N/A |
Release Artifacts Summary
Images Promoted from Models CI
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-65718bb-409b1cd
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-65718bb
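Pulling the promoted images is a plain `docker pull` per tag. The helper below is a hypothetical sketch (it assumes Docker is installed and you are authenticated against ghcr.io); building the commands separately from running them keeps the tag list testable on its own.

```python
import subprocess

# Images promoted from Models CI for this release (from the list above).
# docker pull takes the registry path without the https:// scheme.
PROMOTED_IMAGES = [
    "ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-65718bb-409b1cd",
    "ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-65718bb",
]


def pull_commands(images):
    """Build the `docker pull` argv for each image tag."""
    return [["docker", "pull", image] for image in images]


def pull_all(images=PROMOTED_IMAGES):
    """Pull every image; assumes docker is on PATH and ghcr.io auth is set up."""
    for cmd in pull_commands(images):
        subprocess.run(cmd, check=True)
```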
Existing Images with Models CI reference
Images that already exist on remote and have a valid Models CI image available.
No existing images with Models CI reference.
Existing Images without Models CI reference
Images that already exist on remote but have no valid Models CI reference (manually built/pushed).
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.5.0-fbbbd2da8cfab49ddf43d28dd9c0813a3c3ee2bd
- https://ghcr.io/tenstorrent/tt-shield/tt-media-inference-server-forge:a9b09e0b611da6deb4d8972e8296148fd864e5fd_98dcf62_60920940673
Total: 5
Docker Images Requiring New Builds
Note: Model Specs added outside of Models CI need their Docker images built manually; they appear here when no matching image already exists on the remote. This happens by design whenever a release increments the VERSION file.
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-0b10c51-3499ffa
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-13f44c5-0edd242
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-17a5973-aa4ae1e
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-20edc39-03cb300
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-25305db-6e67d2d
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-55fd115-aa4ae1e
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-5b5db8a-e771fff
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-60ffb199-3499ffa1
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-9b67e09-a91b644
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-ae65ee5-35f023f
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-c18569e-b2894d3
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-c254ee3-c4f2327
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-e95ffa5-48eba14
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-v0.61.1-rc1-5cbc982
- https://ghcr.io/tenstorrent/tt-inference-server/vllm-tt-metal-src-release-ubuntu-22.04-amd64:0.9.0-v0.62.0-rc33-e7c329b
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-a9b09e0
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-be88351
- https://ghcr.io/tenstorrent/tt-media-inference-server:0.9.0-c180ef7
Total: 18