TT-Inference-Server

Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.

Official Repository

https://github.com/tenstorrent/tt-inference-server

Getting Started

Please follow the setup instructions for the model you want to serve. Each model name in the table below links to the corresponding implementation.

Note: models with Status [🛠️ Experimental] are under active development. If you encounter setup or stability problems with any model, please file an issue and our team will address it.

Model Support

For an automated, pre-configured vLLM inference server using Docker, please see the Model Readiness Workflows User Guide. The table below lists the default model implementations supported.

| Model Weights | Hardware | Status | tt-metal commit | vLLM commit | Docker Image |
|---|---|---|---|---|---|
| AFM-4.5B | n300, WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | ae65ee5 | 35f023f | 0.3.0-ae65ee5-35f023f |
| gemma-3-1b-it | n150 | 🛠️ Experimental | c254ee3 | c4f2327 | 0.3.0-c254ee3-c4f2327 |
| gemma-3-4b-it, medgemma-4b-it | n300, n150 | 🛠️ Experimental | c254ee3 | c4f2327 | 0.3.0-c254ee3-c4f2327 |
| gemma-3-27b-it, medgemma-27b-it | WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | c254ee3 | c4f2327 | 0.3.0-c254ee3-c4f2327 |
| Qwen2.5-VL-3B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🛠️ Experimental | 5bf679a | 48eba14 | 0.3.0-5bf679a-48eba14 |
| Qwen2.5-VL-7B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🛠️ Experimental | 5bf679a | 48eba14 | 0.3.0-5bf679a-48eba14 |
| Qwen2.5-VL-32B-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | 5bf679a | 48eba14 | 0.3.0-5bf679a-48eba14 |
| Qwen2.5-VL-72B-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | 5bf679a | 48eba14 | 0.3.0-5bf679a-48eba14 |
| Qwen3-8B | n150, n300, WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | e95ffa5 | 48eba14 | 0.3.0-e95ffa5-48eba14 |
| Qwen3-32B | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | e95ffa5 | 48eba14 | 0.3.0-e95ffa5-48eba14 |
| Mistral-7B-Instruct-v0.3 | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 9b67e09 | a91b644 | 0.3.0-9b67e09-a91b644 |
| QwQ-32B | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | e95ffa5 | 48eba14 | 0.3.0-e95ffa5-48eba14 |
| Qwen2.5-72B, Qwen2.5-72B-Instruct | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟡 Functional | 13f44c5 | 0edd242 | 0.3.0-13f44c5-0edd242 |
| Qwen2.5-7B, Qwen2.5-7B-Instruct | n300 | 🛠️ Experimental | 5b5db8a | e771fff | 0.3.0-5b5db8a-e771fff |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | Galaxy | 🟢 Complete | e95ffa5 | 48eba14 | 0.3.0-e95ffa5-48eba14 |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | 9b67e09 | a91b644 | 0.3.0-9b67e09-a91b644 |
| Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-70B-Instruct, DeepSeek-R1-Distill-Llama-70B | BH-QuietBox (P150X4) | 🟡 Functional | 55fd115 | aa4ae1e | 0.3.0-55fd115-aa4ae1e |
| Llama-3.2-11B-Vision, Llama-3.2-11B-Vision-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | v0.61.1-rc1 | 5cbc982 | 0.3.0-v0.61.1-rc1-5cbc982 |
| Llama-3.2-90B-Vision, Llama-3.2-90B-Vision-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🟡 Functional | v0.61.1-rc1 | 5cbc982 | 0.3.0-v0.61.1-rc1-5cbc982 |
| Llama-3.2-1B, Llama-3.2-1B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟡 Functional | 9b67e09 | a91b644 | 0.3.0-9b67e09-a91b644 |
| Llama-3.2-3B, Llama-3.2-3B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟡 Functional | 20edc39 | 03cb300 | 0.3.0-20edc39-03cb300 |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 9b67e09 | a91b644 | 0.3.0-9b67e09-a91b644 |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | p100, p150 | 🛠️ Experimental | 55fd115 | aa4ae1e | 0.3.0-55fd115-aa4ae1e |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | BH-QuietBox (P150X4) | 🟢 Complete | 55fd115 | aa4ae1e | 0.3.0-55fd115-aa4ae1e |
| Llama-3.1-8B, Llama-3.1-8B-Instruct | Galaxy | 🟡 Functional | e95ffa5 | 48eba14 | 0.3.0-e95ffa5-48eba14 |
| Qwen2.5-Coder-32B-Instruct | WH-QuietBox/WH-LoudBox (T3K) | 🛠️ Experimental | 17a5973 | aa4ae1e | 0.3.0-17a5973-aa4ae1e |
| stable-diffusion-xl-base-1.0 | Galaxy, n300, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 13f44c5 | N/A | 0.4.0-e95ffa59adbe39237525161272141cbbb603c686 |
| stable-diffusion-3.5-large | WH-QuietBox/WH-LoudBox (T3K), Galaxy | 🟢 Complete | 13f44c5 | N/A | 0.4.0-e95ffa59adbe39237525161272141cbbb603c686 |
| whisper-large-v3, distil-large-v3 | Galaxy, WH-QuietBox/WH-LoudBox (T3K), n150 | 🟢 Complete | 13f44c5 | N/A | 0.4.0-e95ffa59adbe39237525161272141cbbb603c686 |
| resnet-50 | n300, n150 | 🛠️ Experimental | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
| vovnet | n300, n150 | 🛠️ Experimental | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
| mobilenetv2 | n300, n150 | 🛠️ Experimental | 2496be4 | N/A | 0.2.0-2496be4518bca0a7a5b497a4cda3cfe7e2f59756 |
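
For the rows above that list a vLLM commit, the Docker Image value appears to be the release version, the tt-metal commit, and the vLLM commit joined with hyphens. The Python sketch below illustrates that observed pattern only. `compose_image_tag` is a hypothetical helper, not part of this repository, and the pattern is inferred from the table rather than documented. It does not apply to rows whose vLLM commit is N/A, since those images are tagged with a full 40-character commit hash instead.

```python
def compose_image_tag(release: str, tt_metal_commit: str, vllm_commit: str) -> str:
    """Join the three fields in the order observed in the Docker Image column.

    Assumption: the tag layout is <release>-<tt-metal commit>-<vLLM commit>,
    as inferred from the vLLM rows of the table. Hypothetical helper.
    """
    return "-".join([release, tt_metal_commit, vllm_commit])


# Values taken from the AFM-4.5B row of the table.
print(compose_image_tag("0.3.0", "ae65ee5", "35f023f"))  # 0.3.0-ae65ee5-35f023f
```

Because a tt-metal reference can itself contain hyphens (for example `v0.61.1-rc1`), building a tag from known fields is more reliable than splitting an existing tag on hyphens.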