Conversation
Remove inference-server submodule and route benchmarking to artifact repo; integrate inference server as artifact
Contributor
Pull request overview
This PR removes the tt-inference-server git submodule and switches TT Studio to consume tt-inference-server as a versioned release artifact, while introducing a dedicated inference-api FastAPI service that wraps the artifact’s run.py.
Changes:
- Remove the `tt-inference-server` submodule and `.gitmodules` entry, replacing it with an artifact-based workflow managed from `.artifacts/tt-inference-server`.
- Update `run.py` to download and extract a configurable `tt-inference-server` release tarball, wire environment variables for the artifact, and start an `inference-api` FastAPI app instead of the old submodule-based server.
- Add a new `inference-api` package (FastAPI app, requirements, API endpoints) that integrates with the artifact's `run.py` and workflows, including deployment progress and log streaming endpoints (see the sketch after this list).
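As a rough illustration, here is a minimal FastAPI sketch of a service in this shape. The routes, payload fields, and in-memory run store are illustrative assumptions, not the PR's actual API:

```python
# Minimal sketch of an inference-api-style FastAPI service.
# Routes, payloads, and the in-memory run store are hypothetical.
import time

from fastapi import FastAPI, HTTPException

app = FastAPI(title="inference-api")

# In-memory tracking of deployment runs; a real service would
# persist this and guard it against concurrent access.
runs: dict[str, dict] = {}

@app.post("/deployments/{run_id}")
def start_deployment(run_id: str) -> dict:
    # Record a new run; the artifact's run.py would be invoked here.
    runs[run_id] = {"status": "starting", "last_updated": time.time()}
    return runs[run_id]

@app.get("/deployments/{run_id}/progress")
def get_progress(run_id: str) -> dict:
    run = runs.get(run_id)
    if run is None:
        raise HTTPException(status_code=404, detail="unknown run")
    return run
```

Such an app would be served with `uvicorn main:app`, matching the `main.py` entry point listed below.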
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| tt-inference-server | Removes the git submodule reference so the repo no longer embeds tt-inference-server directly. |
| run.py | Switches setup from git submodules to downloaded artifacts, configures the artifact directory/ENV, and starts the new inference-api instead of the embedded FastAPI server. |
| inference-api/requirements.txt | Defines dependencies for the new inference API (FastAPI, uvicorn, docker, pydantic, etc.). |
| inference-api/main.py | Entry point exposing the FastAPI app from api.py for uvicorn. |
| inference-api/api.py | New FastAPI service that imports run.py and workflows from the artifact, manages deployment runs, progress/log tracking, and exposes model/workflow/device listing endpoints. |
| inference-api/__init__.py | Package marker file with SPDX headers. |
| app/.env.default | Adds TT_INFERENCE_ARTIFACT_VERSION default to configure which tt-inference-server release to use. |
| .gitmodules | Deletes the tt-inference-server submodule configuration. |
| .gitignore | Ignores .artifacts, common archive formats, and inference-api virtualenv/bytecode. |
anirudTT
reviewed
Feb 6, 2026
| "last_updated": time.time() | ||
| }) | ||
|
|
||
| # Add stalled detection (>120s no updates) |
Collaborator
I think we need to change this logic; it might be causing deployment issues for larger models.
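For context, stalled-run detection of the kind quoted above usually reduces to a timestamp comparison. A minimal sketch, assuming a progress dict with a `last_updated` field (the helper name is hypothetical; the 120-second threshold comes from the quoted comment):

```python
import time

STALL_THRESHOLD_S = 120  # from "# Add stalled detection (>120s no updates)"

def is_stalled(progress: dict) -> bool:
    # A run is treated as stalled when no progress update has
    # arrived within the threshold window.
    return time.time() - progress.get("last_updated", 0.0) > STALL_THRESHOLD_S
```

A fixed threshold like this is consistent with the concern above: larger models can plausibly go well past 120s between updates (e.g. during long downloads or compilation) while still being healthy.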
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…torrent/tt-studio into rumeza/externalize-tt-infer
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Collaborator
* Enhance inference server configuration with reconfiguration options and improved artifact info logging. Added support for reconfiguring inference server artifacts and updated metadata writing to include detailed configuration information and instructions.
* Implement artifact re-download logic based on version and branch changes in inference server configuration. Added proactive sudo authentication request for artifact management to prevent permission issues during cleanup.
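A minimal sketch of the re-download decision described here, assuming the artifact directory keeps a small JSON metadata file; the file name, keys, and helper are hypothetical:

```python
import json
from pathlib import Path

def needs_redownload(artifact_dir: Path, version: str, branch: str | None) -> bool:
    # Re-download when the recorded artifact no longer matches the
    # requested version/branch. The metadata layout is an assumption.
    meta_path = artifact_dir / "artifact_info.json"
    if not meta_path.exists():
        return True
    meta = json.loads(meta_path.read_text())
    return meta.get("version") != version or meta.get("branch") != branch
```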

This PR pivots the usage of tt-inference-server to a release artifact instead of a git submodule.
How the artifact is created:
The artifact is a .tar.gz of the tt-inference-server repository source code.
GitHub releases: When a release/tag is created in the tt-inference-server repo, GitHub automatically generates a source code tarball at: https://github.com/tenstorrent/tt-inference-server/archive/refs/tags/{version}.tar.gz
run.py then downloads this tarball and extracts it to .artifacts/tt-inference-server/.
`TT_INFERENCE_ARTIFACT_VERSION=v0.8.0`

You can change which tt-inference-server version is used by updating this value in the env file (see `app/.env.default`).
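Putting the pieces together, here is a minimal sketch of the download-and-extract step, reading the version from `TT_INFERENCE_ARTIFACT_VERSION`. The helper name and the use of `urllib`/`tarfile` are assumptions about how `run.py` might do this, not its actual code:

```python
import os
import tarfile
import urllib.request
from pathlib import Path

def fetch_inference_server_artifact(
    dest: Path = Path(".artifacts/tt-inference-server"),
) -> Path:
    # Download the GitHub source tarball for the pinned release tag
    # and extract it under the artifact directory.
    version = os.environ.get("TT_INFERENCE_ARTIFACT_VERSION", "v0.8.0")
    url = (
        "https://github.com/tenstorrent/tt-inference-server/"
        f"archive/refs/tags/{version}.tar.gz"
    )
    dest.mkdir(parents=True, exist_ok=True)
    tarball = dest / f"tt-inference-server-{version}.tar.gz"
    urllib.request.urlretrieve(url, tarball)
    with tarfile.open(tarball, "r:gz") as tf:
        tf.extractall(dest)  # GitHub tarballs contain one top-level dir
    return dest
```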

