Remove assert on ARCH_NAME in data collection step in workflows #37300

Merged
roseli-TT merged 11 commits into main from roseli/0-remove-arch-name-req-save-env-data
Feb 7, 2026

@roseli-TT roseli-TT commented Feb 6, 2026

We would like to gradually remove reliance on ARCH_NAME env var.

Recently ARCH_NAME was removed from Galaxy demo pipelines. No tests need the env var set, but the data collection step in the workflow still has an assert on ARCH_NAME.

This PR removes the hard assert on ARCH_NAME in the data collection script. Instead, the script tries to infer the device type from the runner name (CIv2) or from the SKU config (CIv1).
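
To illustrate the fallback order described above, here is a minimal sketch. It is not the code in `infra/data_collection/github/utils.py`: the function names, the `KNOWN_DEVICE_TYPES` tuple, the runner-name token format, and the choice to still honor `ARCH_NAME` when it happens to be set are all assumptions made for the example.

```python
import re
from typing import Mapping, Optional

# Hypothetical set of device tokens, taken from the SKU examples mentioned
# in this thread (n150, t3k, p150). The real list lives in the repo.
KNOWN_DEVICE_TYPES = ("n150", "t3k", "p150")


def device_from_runner_name(runner_name: str) -> Optional[str]:
    """Return the first known device token found in a runner name, if any."""
    # Split on anything that is not a lowercase letter or digit, so that
    # "tt-ubuntu-2204-n150-stable" yields the token "n150".
    for token in re.split(r"[^a-z0-9]+", runner_name.lower()):
        if token in KNOWN_DEVICE_TYPES:
            return token
    return None


def infer_device_type(
    env: Mapping[str, str], sku_device: Optional[str] = None
) -> Optional[str]:
    """Best-effort device inference that never asserts.

    Assumed order: explicit ARCH_NAME, then RUNNER_NAME (CIv2),
    then a value the caller looked up in the SKU config (CIv1).
    """
    arch = env.get("ARCH_NAME")
    if arch:
        return arch
    from_runner = device_from_runner_name(env.get("RUNNER_NAME", ""))
    if from_runner:
        return from_runner
    # May be None: callers decide how to record an unknown device.
    return sku_device
```

Returning `None` instead of asserting lets the data collection step finish even when nothing can be inferred.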

Testing:

@roseli-TT roseli-TT force-pushed the roseli/0-remove-arch-name-req-save-env-data branch from b675b7f to cb0d360 Compare February 6, 2026 21:07
github-merge-queue bot pushed a commit that referenced this pull request Feb 6, 2026
Change galaxy demo pipeline to use [reorganized
pipeline](https://tenstorrent.atlassian.net/wiki/spaces/MI6/pages/1396506680/Proposed+pipeline+and+test+organization+changes)
format.

- format Galaxy Demo Pipeline to use `*tests.yaml` and time budgets
- format Galaxy Demo Pipeline to use `.github/sku_config.yaml` to map
SKUs in `*tests.yaml` to
machine `runs-on` labels for infra team use
- removes `ARCH_NAME` from workflow

This PR does not affect how the pipeline is used. Only affects adding
tests in the future.

**About pipeline reorg:**

Devs should only have to interface with tests/pipeline_reorg/*tests.yaml
to add their tests instead of adding them directly to the github actions
workflow files.

Tests are subject to a team budget .github/time_budget.yaml that
dictates budgets per team, per pipeline, per SKU aka machine type (eg.
n150, t3k, p150, etc).

Team names and budgets are in flux - please message me if you have any
concerns.


Testing

- [x] Select 1 test
https://github.com/tenstorrent/tt-metal/actions/runs/21732479327
- [x] Select all tests
https://github.com/tenstorrent/tt-metal/actions/runs/21732581550
ARCH_NAME-related failures in the Save Environment Data step are addressed in
#37300.
The tests themselves still pass.
- [x] Invoke from Galaxy select your own pipeline
https://github.com/tenstorrent/tt-metal/actions/runs/21732620140
@roseli-TT roseli-TT marked this pull request as ready for review February 6, 2026 21:35
@roseli-TT roseli-TT requested review from a team as code owners February 6, 2026 21:35
Copilot AI review requested due to automatic review settings February 6, 2026 21:35
Copilot AI left a comment

Pull request overview

Removes the workflow’s hard dependency on ARCH_NAME by updating the benchmark environment data-collection code to infer device type from runner metadata (CIv2 runner naming) or repo config (CIv1 SKU config), and updates the Galaxy demo workflow to stop exporting ARCH_NAME.

Changes:

  • Add device type inference logic (RUNNER_NAME / .github/sku_config.yaml) and remove the hard ARCH_NAME assert in benchmark environment JSON creation.
  • Refactor repo-root path resolution used by benchmark artifact discovery.
  • Comment out ARCH_NAME injection in the Galaxy demo workflow container environment.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| infra/data_collection/github/utils.py | Adds device-type inference and removes hard ARCH_NAME requirement for environment completion; refactors repo-root path usage. |
| .github/workflows/galaxy-demo-tests-impl.yaml | Stops exporting ARCH_NAME into the container environment for Galaxy demo jobs. |

@roseli-TT roseli-TT force-pushed the roseli/0-remove-arch-name-req-save-env-data branch from 66e151f to 95f9364 Compare February 6, 2026 22:27
roseli-TT and others added 2 commits February 6, 2026 17:58
Co-authored-by: William Ly <williamly@tenstorrent.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

sku_config_path = _get_repo_root() / ".github" / "sku_config.yaml"
if sku_config_path.exists():
    with open(sku_config_path) as f:
        config = yaml.safe_load(f)
Copilot AI Feb 6, 2026

yaml.safe_load(f) can return None (e.g., empty YAML file), in which case config.get(...) will raise an AttributeError. Consider defaulting to an empty dict (e.g., config = yaml.safe_load(f) or {}) before accessing .get() to make the inference logic robust.

Suggested change
- config = yaml.safe_load(f)
+ config = yaml.safe_load(f) or {}
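
As a small illustration of why the `or {}` guard matters, the sketch below takes the already-parsed value so that it runs without PyYAML. The helper name `sku_entry` and the assumption that the config is a top-level mapping keyed by SKU name are hypothetical and are not taken from `.github/sku_config.yaml`.

```python
from typing import Any, Optional


def sku_entry(parsed: Any, sku: str) -> Optional[Any]:
    """Look up one SKU in a parsed YAML document without raising.

    `parsed` stands in for the result of yaml.safe_load(f), which is
    None for an empty file. Hypothetical schema: a top-level mapping
    keyed by SKU name.
    """
    # An empty document parses to None; normalise it to an empty dict
    # so that .get() is always available.
    config = parsed or {}
    # A document whose top level is a list or scalar is treated as
    # "no information" rather than an error.
    if not isinstance(config, dict):
        return None
    return config.get(sku)
```

Without the guard, `None.get(...)` raises `AttributeError`, which would reintroduce a hard failure in the step this PR is trying to make tolerant.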

@roseli-TT roseli-TT added this pull request to the merge queue Feb 7, 2026
Merged via the queue into main with commit 28a9863 Feb 7, 2026
103 checks passed
@roseli-TT roseli-TT deleted the roseli/0-remove-arch-name-req-save-env-data branch February 7, 2026 19:40
adrian-pascual-bernal pushed a commit that referenced this pull request Feb 10, 2026
adrian-pascual-bernal pushed a commit that referenced this pull request Feb 10, 2026
We would like to gradually remove reliance on `ARCH_NAME` env var.

Recently `ARCH_NAME` was removed from Galaxy demo pipelines. No tests
need the env var set, but the data collection step in the workflow still
has an assert on `ARCH_NAME`.

This PR removes the hard assert on `ARCH_NAME` in the data collection
script and tries to infer from the runner name for CIv2 or the SKU
config for CIv1.

Testing:

- [x] Galaxy demo
https://github.com/tenstorrent/tt-metal/actions/runs/21768966673

---------

Co-authored-by: William Ly <williamly@tenstorrent.com>
ssundaramTT pushed a commit that referenced this pull request Feb 10, 2026
ssundaramTT pushed a commit that referenced this pull request Feb 10, 2026