
Rumeza/externalize tt infer#588

Merged
rfatimaTT merged 14 commits into dev from rumeza/externalize-tt-infer
Feb 12, 2026

Conversation

@rfatimaTT (Collaborator) commented Jan 22, 2026

This PR pivots tt-inference-server usage to a release artifact instead of a git submodule.

How the artifact is created:

The artifact is a .tar.gz of the tt-inference-server repository source code.
GitHub releases: When a release/tag is created in the tt-inference-server repo, GitHub automatically generates a source code tarball at: https://github.com/tenstorrent/tt-inference-server/archive/refs/tags/{version}.tar.gz

run.py downloads this tarball and extracts it to .artifacts/tt-inference-server/.

TT_INFERENCE_ARTIFACT_VERSION=v0.8.0

You can change the tt-inference-server version by updating this variable in your env file.
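
To make the flow concrete, here is a minimal sketch of what the download-and-extract step could look like; the function name, caching behavior, and directory layout are assumptions, not the actual run.py code.

```python
# Hedged sketch of the artifact download step; names are assumptions,
# not the actual run.py implementation.
import os
import tarfile
import urllib.request
from pathlib import Path

ARTIFACT_DIR = Path(".artifacts/tt-inference-server")

def fetch_inference_server_artifact() -> Path:
    version = os.environ.get("TT_INFERENCE_ARTIFACT_VERSION", "v0.8.0")
    url = (
        "https://github.com/tenstorrent/tt-inference-server/"
        f"archive/refs/tags/{version}.tar.gz"
    )
    ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
    tarball = ARTIFACT_DIR / f"tt-inference-server-{version}.tar.gz"
    if not tarball.exists():
        # Download the GitHub-generated source tarball for the tagged release.
        urllib.request.urlretrieve(url, tarball)
    with tarfile.open(tarball, "r:gz") as tf:
        # Extract the repository source under the artifact directory.
        tf.extractall(ARTIFACT_DIR)
    return ARTIFACT_DIR
```

Overriding the version is then a one-line env change, e.g. TT_INFERENCE_ARTIFACT_VERSION=v0.9.0 (assuming such a tag exists in the tt-inference-server repo).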

Rumeza Fatima and others added 2 commits January 22, 2026 17:03
remove inference-server submodule and route benchmarking to artifact repo

integrate inference server as artifact

Copilot AI left a comment


Pull request overview

This PR removes the tt-inference-server git submodule and switches TT Studio to consume tt-inference-server as a versioned release artifact, while introducing a dedicated inference-api FastAPI service that wraps the artifact’s run.py.

Changes:

  • Remove the tt-inference-server submodule and .gitmodules entry, replacing it with an artifact-based workflow managed from .artifacts/tt-inference-server.
  • Update run.py to download and extract a configurable tt-inference-server release tarball, wire environment variables for the artifact, and start an inference-api FastAPI app instead of the old submodule-based server.
  • Add a new inference-api package (FastAPI app, requirements, API endpoints) that integrates with the artifact’s run.py and workflows, including deployment progress and log streaming endpoints; a sketch of such a service follows this list.
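
To make the shape of the new service concrete, here is a hedged sketch of a FastAPI app with deployment progress and log streaming endpoints; the routes, payloads, and in-memory state are illustrative assumptions, not the actual inference-api/api.py.

```python
# Illustrative sketch only: endpoint paths and the `runs` store are assumptions.
import time
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse

app = FastAPI(title="inference-api")

# In-memory deployment state keyed by run id; the real service drives the
# artifact's run.py workflows and tracks their progress and logs.
runs: dict[str, dict] = {}

@app.post("/deployments/{run_id}")
def start_deployment(run_id: str) -> dict:
    runs[run_id] = {"progress": 0, "logs": [], "last_updated": time.time()}
    return {"run_id": run_id, "status": "started"}

@app.get("/deployments/{run_id}/progress")
def get_progress(run_id: str) -> dict:
    if run_id not in runs:
        raise HTTPException(status_code=404, detail="unknown run")
    return {"run_id": run_id, "progress": runs[run_id]["progress"]}

@app.get("/deployments/{run_id}/logs")
def stream_logs(run_id: str) -> StreamingResponse:
    if run_id not in runs:
        raise HTTPException(status_code=404, detail="unknown run")
    def iter_lines():
        # Stream accumulated log lines back to the client as plain text.
        for line in runs[run_id]["logs"]:
            yield line + "\n"
    return StreamingResponse(iter_lines(), media_type="text/plain")
```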

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tt-inference-server | Removes the git submodule reference so the repo no longer embeds tt-inference-server directly. |
| run.py | Switches setup from git submodules to downloaded artifacts, configures the artifact directory/ENV, and starts the new inference-api instead of the embedded FastAPI server. |
| inference-api/requirements.txt | Defines dependencies for the new inference API (FastAPI, uvicorn, docker, pydantic, etc.). |
| inference-api/main.py | Entry point exposing the FastAPI app from api.py for uvicorn. |
| inference-api/api.py | New FastAPI service that imports run.py and workflows from the artifact, manages deployment runs, progress/log tracking, and exposes model/workflow/device listing endpoints. |
| inference-api/__init__.py | Package marker file with SPDX headers. |
| app/.env.default | Adds the TT_INFERENCE_ARTIFACT_VERSION default to configure which tt-inference-server release to use. |
| .gitmodules | Deletes the tt-inference-server submodule configuration. |
| .gitignore | Ignores .artifacts, common archive formats, and the inference-api virtualenv/bytecode. |


@anirudTT anirudTT linked an issue Jan 22, 2026 that may be closed by this pull request
"last_updated": time.time()
})

# Add stalled detection (>120s no updates)
A collaborator commented on this snippet:

I think we need to change this logic; it might be causing deployment issues for larger models.
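
One possible direction, sketched under assumptions (the setting name and default are hypothetical): make the stall threshold configurable and generous enough for large-model deployments rather than hardcoding 120s.

```python
# Hypothetical sketch: a configurable stall timeout instead of a fixed 120s.
import os
import time

STALL_TIMEOUT_SECONDS = float(os.environ.get("STALL_TIMEOUT_SECONDS", "600"))

def is_stalled(last_updated: float, now: float | None = None) -> bool:
    """True if no progress update has arrived within the configured timeout."""
    now = time.time() if now is None else now
    return (now - last_updated) > STALL_TIMEOUT_SECONDS
```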

rfatimaTT and others added 3 commits February 6, 2026 12:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@anirudTT anirudTT self-requested a review February 6, 2026 18:08
rfatimaTT and others added 2 commits February 6, 2026 15:47
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@anirudTT (Collaborator) commented Feb 11, 2026

  • tested on n300

* Enhance inference server configuration with reconfiguration options and improved artifact info logging. Added support for reconfiguring inference server artifacts and updated metadata writing to include detailed configuration information and instructions.

* Implement artifact re-download logic based on version and branch changes in inference server configuration. Added proactive sudo authentication request for artifact management to prevent permission issues during cleanup.
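
As a hedged illustration of that re-download decision (the metadata file name and keys are assumptions, not the committed code): compare the requested version/branch against what was recorded when the artifact was last extracted.

```python
# Illustrative sketch: decide whether to re-download based on recorded metadata.
import json
from pathlib import Path

METADATA_FILE = Path(".artifacts/tt-inference-server/artifact_info.json")  # assumed name

def needs_redownload(requested_version: str, requested_branch: str | None = None) -> bool:
    if not METADATA_FILE.exists():
        return True  # nothing recorded yet, fetch a fresh artifact
    info = json.loads(METADATA_FILE.read_text())
    return (
        info.get("version") != requested_version
        or info.get("branch") != requested_branch
    )
```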
@anirudTT (Collaborator) left a comment

LGTM

@rfatimaTT rfatimaTT merged commit 63b21de into dev Feb 12, 2026
4 checks passed


Development

Successfully merging this pull request may close these issues.

Externalize TT Inference Server FAST API Code
