Conversation
✅ Test Results - PASSED

🎉 All tests passed! This PR is ready for review.

✅ Test Coverage Report: Coverage of Changed Lines
bgoelTT left a comment:
Before we uplift the commits, can you please execute a Models CI dispatch run that proves the full benchmark and accuracy-evaluation workflows complete?
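For reference, a dispatch run can typically be triggered from the GitHub CLI along these lines; the workflow file name, repo, and input names below are assumptions for illustration, not the actual Models CI definitions:

```bash
# Hypothetical sketch: workflow file, branch, and inputs are placeholders,
# not the real tt-shield Models CI configuration.
gh workflow run models-ci.yaml \
  --repo tenstorrent/tt-shield \
  --ref <uplift-branch> \
  -f model=Qwen3-8B \
  -f workflow=release
```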
I ran the release workflow with this uplift commit, and it worked fine for everything except Galaxy. Galaxy seems to either hang or fail partway through, so I’m currently looking into it.

## Tenstorrent Model Release Summary: Qwen3-8B on n150

### Metadata: Qwen3-8B on n150

```json
{
"report_id": "id_tt-transformers_Qwen3-8B_n150_2026-01-30_00-53-40",
"model_name": "Qwen3-8B",
"model_id": "id_tt-transformers_Qwen3-8B_n150",
"model_spec_json": "/home/kyamaguchi/tt-inference-server/workflow_logs/run_specs/tt_model_spec_2026-01-29_23-59-57_id_tt-transformers_Qwen3-8B_n150_release_ELQmmwze.json",
"model_repo": "Qwen/Qwen3-8B",
"model_impl": "tt-transformers",
"inference_engine": "vLLM",
"device": "n150",
"server_mode": "docker",
"tt_metal_commit": "41345ac",
"vllm_commit": "628d4dc",
"run_command": "python run.py --model Qwen3-8B --device n150 --workflow release --docker-server"
}
```

### Performance Benchmark Sweeps for Qwen3-8B on n150

#### vLLM Text-to-Text Performance Benchmark Sweeps for Qwen3-8B on n150
Note: all metrics are means across the benchmark run unless otherwise stated.
## Tenstorrent Model Release Summary: Qwen3-8B on t3k

### Metadata: Qwen3-8B on t3k

```json
{
"report_id": "id_tt-transformers_Qwen3-8B_t3k_2026-01-29_12-56-00",
"model_name": "Qwen3-8B",
"model_id": "id_tt-transformers_Qwen3-8B_t3k",
"model_spec_json": "/home/kyamaguchi/tt-inference-server/workflow_logs/run_specs/tt_model_spec_2026-01-29_12-18-43_id_tt-transformers_Qwen3-8B_t3k_release_dHOFti0Q.json",
"model_repo": "Qwen/Qwen3-8B",
"model_impl": "tt-transformers",
"inference_engine": "vLLM",
"device": "t3k",
"server_mode": "docker",
"tt_metal_commit": "41345ac",
"vllm_commit": "628d4dc",
"run_command": "python run.py --model Qwen3-8B --device t3k --workflow release --docker-server"
}
```

### Performance Benchmark Sweeps for Qwen3-8B on t3k

#### vLLM Text-to-Text Performance Benchmark Sweeps for Qwen3-8B on t3k
Note: all metrics are means across the benchmark run unless otherwise stated.
## Tenstorrent Model Release Summary: Qwen3-8B on galaxy_t3k

### Metadata: Qwen3-8B on galaxy_t3k

```json
{
"report_id": "id_tt-transformers_Qwen3-8B_galaxy_t3k_2026-01-30_02-05-08",
"model_name": "Qwen3-8B",
"model_id": "id_tt-transformers_Qwen3-8B_galaxy_t3k",
"model_spec_json": "/home/ubuntu/works/tt-inference-server/workflow_logs/run_specs/tt_model_spec_2026-01-30_01-41-49_id_tt-transformers_Qwen3-8B_galaxy_t3k_release_ur9k6LIg.json",
"model_repo": "Qwen/Qwen3-8B",
"model_impl": "tt-transformers",
"inference_engine": "vLLM",
"device": "galaxy_t3k",
"server_mode": "docker",
"tt_metal_commit": "41345ac",
"vllm_commit": "628d4dc",
"run_command": "python run.py --model Qwen3-8B --device galaxy_t3k --workflow release --docker-server"
}
```

### Performance Benchmark Sweeps for Qwen3-8B on galaxy_t3k

#### vLLM Text-to-Text Performance Benchmark Sweeps for Qwen3-8B on galaxy_t3k
Note: all metrics are means across the benchmark run unless otherwise stated.
Also, for the N150 Models CI, the performance benchmark results are being produced, but the run still appears to be failing, so I’m investigating that as well. https://github.com/tenstorrent/tt-shield/actions/runs/21421020337/job/61980149524
@sott0n the benchmark workflow is passing; it returns status code 0. What is failing is the accuracy evaluations, due to our fork of lm-eval-harness being rebased. I noticed you were using this branch of tt-inference-server, which explains why it would fail - you'll need to rebase. Now, for the Galaxy hangs: do you have a Models CI run to examine?
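For reference, rebasing the branch onto the updated base typically looks like this; the remote and branch names below are assumptions, not the actual fork layout:

```bash
# Assumed remote/branch names; adjust to the actual repositories.
git fetch upstream
git checkout my-feature-branch
git rebase upstream/main
# Resolve any conflicts, then update the remote branch:
git push --force-with-lease origin my-feature-branch
```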
@bgoelTT No, I don’t have a Models CI run yet. I tested it on a local Galaxy setup and observed that it hangs. However, I saw the same behavior with the current commit on the dev branch (not this uplift), so I’ll also check it in the Models CI.
@sott0n can you post the Models CI run showing this uplift change?
@tstescoTT Sorry for the lack of updates. The Models CI seems to be failing partway through, so I’ve been running the release on a local 6U system. However, it appears to hang during execution, so I’ve been bisecting tt-metal commits to identify which one introduced the issue; I haven’t been able to pinpoint the root cause yet. From the inference-server policy perspective, would it be problematic for the Qwen3-8B release to be split per device?
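In case it's useful, `git bisect run` can automate this kind of search. A rough sketch, where the commit hashes and the probe script are placeholders, not the actual tt-metal revisions:

```bash
# Placeholder commits and probe script; adjust to the real range under test.
cd tt-metal
git bisect start <bad-commit> <good-commit>
# check_hang.sh is a hypothetical script that starts the server, sends a
# request with a timeout, and exits non-zero if the run hangs or fails.
git bisect run ./check_hang.sh
git bisect reset
```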
Running Qwen3-8B on N150 with tt-inference-server v0.8.0 can result in an out-of-memory (OOM) error, as reported by the Discord community. The issue has been resolved by this vLLM commit, which explicitly sets `max_tokens_all_users`. This PR uplifts that fix to address the OOM issue and ensure Qwen3-8B runs correctly on N150.
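For context on why the cap matters, here is a back-of-the-envelope KV-cache sizing sketch; the layer/head counts are assumptions for a Qwen3-8B-class model and the cap value is hypothetical, not the value set by the actual fix:

```bash
# Illustrative numbers only: model dimensions are assumptions, and 65536 is
# a hypothetical max_tokens_all_users cap.
LAYERS=36; KV_HEADS=8; HEAD_DIM=128; BYTES_PER_EL=2   # bf16
per_token=$((2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_EL))  # K + V
total=$((per_token * 65536))
echo "KV cache per token: ${per_token} bytes"
echo "KV cache at cap:    $((total / 1024 / 1024)) MiB"
```

The point of the sketch: KV-cache memory grows linearly with the total tokens admitted across all concurrent users, so an explicit cap bounds it up front instead of letting concurrency push it past what a single N150 can hold.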
Fixes #1869