tt-inference-server is the fastest way to deploy and test models for serving inference on Tenstorrent hardware.
On first run, please see the prerequisites guide for general Tenstorrent hardware and software setup.
For the quickstart guide and details specific to your model, select your model and hardware configuration in the Model Support pages and tables below. Alternatively, you can browse all models supported on your Tenstorrent hardware.
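Once a model server is deployed, a quick way to test it is with a plain HTTP request from a client. The sketch below assumes the deployed server exposes an OpenAI-compatible completions endpoint on localhost port 8000; the URL, port, and model name are illustrative assumptions and will depend on your deployment.

```python
# Minimal sketch of testing a deployed inference server from a client.
# Assumes an OpenAI-compatible /v1/completions endpoint on localhost:8000;
# the URL, port, and model name below are illustrative, not fixed by this repo.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # replace with your deployed model
        "prompt": "What is Tenstorrent hardware?",
        "max_tokens": 64,
        "temperature": 0,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```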
Browse models by type:
- LLM Models - Large Language Models
- VLM Models - Vision-Language Models
- Video Models - Video generation models
- Image Models - Image generation models
- Audio Models - Speech-to-text models
- TTS Models - Text-to-speech models
- Embedding Models - Text embedding models
- CNN Models - Convolutional Neural Networks
Browse models by hardware:
For details on the workflow automation for:
- deploying inference servers
- running E2E performance benchmarks (see the sketch below)
- running accuracy evals

see:
- benchmarking/README.md for benchmarking
- evals/README.md for evals
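As a rough illustration of what an E2E performance benchmark measures, the sketch below times a single request from the client side. It assumes the same OpenAI-compatible endpoint as the example above and is not the repo's benchmarking harness; use benchmarking/README.md for the real workflow.

```python
# Minimal sketch of a client-side end-to-end timing measurement.
# Assumes an OpenAI-compatible /v1/completions endpoint on localhost:8000;
# URL and model name are illustrative assumptions.
import time
import requests

URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # replace with your deployed model
    "prompt": "Explain inference serving in one paragraph.",
    "max_tokens": 128,
    "temperature": 0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
elapsed = time.perf_counter() - start

generated = resp.json()["usage"]["completion_tokens"]
print(f"generated {generated} tokens in {elapsed:.2f}s "
      f"({generated / elapsed:.1f} tokens/s end-to-end)")
```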
Developer documentation: docs/README.md
Release documentation: scripts/release/README.md
If you encounter setup or stability problems with any model, please file an issue and our team will address it.