tt-inference-server is the fastest way to deploy and test models for serving inference on Tenstorrent hardware.
On first run, please see the prerequisites guide for general Tenstorrent hardware and software setup.
For the quickstart guide and details specific to your model, select your model and hardware configuration in the Model Support pages and tables below. Alternatively, you can browse all models supported on your Tenstorrent hardware.
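Once a model server is deployed, a quick way to test it is with a plain HTTP request from a client. The sketch below assumes the deployed server exposes an OpenAI-compatible completions endpoint on localhost port 8000; the URL, port, and model name are illustrative assumptions and will depend on your deployment.

```python
# Minimal sketch of testing a deployed inference server from a client.
# Assumes an OpenAI-compatible /v1/completions endpoint on localhost:8000;
# the URL, port, and model name below are illustrative, not fixed by this repo.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # replace with your deployed model
        "prompt": "What is Tenstorrent hardware?",
        "max_tokens": 64,
        "temperature": 0,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```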
Browse models by type:
- LLM Models - Large Language Models
- VLM Models - Vision-Language Models
- Video Models - Video generation models
- Image Models - Image generation models
- Audio Models - Speech-to-text models
- TTS Models - Text-to-speech models
- Embedding Models - Text embedding models
- CNN Models - Convolutional Neural Networks
Browse models by hardware:
For details on the workflow automation for:
- deploying inference servers
- running E2E performance benchmarks (see the sketch below)
- running accuracy evals

see:
- benchmarking/README.md for benchmarking
- evals/README.md for evals
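As a rough illustration of what an E2E performance benchmark measures, the sketch below times a single request from the client side. It assumes the same OpenAI-compatible endpoint as the example above and is not the repo's benchmarking harness; use benchmarking/README.md for the real workflow.

```python
# Minimal sketch of a client-side end-to-end timing measurement.
# Assumes an OpenAI-compatible /v1/completions endpoint on localhost:8000;
# URL and model name are illustrative assumptions.
import time
import requests

URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # replace with your deployed model
    "prompt": "Explain inference serving in one paragraph.",
    "max_tokens": 128,
    "temperature": 0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
elapsed = time.perf_counter() - start

generated = resp.json()["usage"]["completion_tokens"]
print(f"generated {generated} tokens in {elapsed:.2f}s "
      f"({generated / elapsed:.1f} tokens/s end-to-end)")
```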
Developer documentation: docs/README.md
Release documentation: scripts/release/README.md
If you encounter setup or stability problems with any model, please file an issue and our team will address it.