
Add qb2-blackhole support to Nightly CI support + 4x TP tests in onPush/onPR#3174

Open
kmabeeTT wants to merge 3 commits into main from kmabee/qb2_nightly_job

Conversation

@kmabeeTT
Contributor

@kmabeeTT kmabeeTT commented Feb 5, 2026

Ticket

None

Problem description

  • We have our first QuietBox 2 machine in CI this week and need to start qualifying our models in preparation for product launch.
  • It is a single machine for now, without an s3-bucket mirror; more machines are coming in the cloud later this month. Until then we need to limit usage of this machine by not running QB2 tests at the same frequency as other archs.
  • We'd also like TP tests in onPush/onPR, which were missing until now; the CI machine is otherwise idle during the daytime, so we can use it for this.

What's changed

  • Add QB2-specific nightly CI jobs via dedicated model-test-passing-qb2.json and model-test-xfail-qb2.json, used only in a new QB2 Nightly on the main branch (excluded from the official nightly for now).
  • Add qb2-blackhole to ALLOWED_ARCHES and default_archs in conftest.py.
  • Add qb2-blackhole to supported_archs in ~420 test config entries.
  • Add ~50 explicit qb2-blackhole arch_overrides, modelled after p150 to start; refine/improve later.
  • Add qb2-blackhole arch_overrides for 2 entries with lowered PCC (0.98) and 8 s3-bucket-missing failures.
  • Tag 4 TP tests for QB2 onPush/onPR and update model-test-push.json (<10 min runtime).
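As a rough illustration of the config shape described above, a per-arch override lookup could work as sketched below. All names here (`entry`, `required_pcc`, the `required_pcc` key) are hypothetical; the real schema lives in model-test-passing-qb2.json and conftest.py and may differ.

```python
# Hypothetical sketch; the actual schema in the repo's JSON configs may differ.
ALLOWED_ARCHES = {"p150", "qb2-blackhole"}  # qb2-blackhole newly added

entry = {
    "model": "example_model",
    "supported_archs": ["p150", "qb2-blackhole"],
    "arch_overrides": {
        # QB2 currently accepts a lowered PCC threshold (0.98)
        "qb2-blackhole": {"required_pcc": 0.98},
    },
}

def required_pcc(entry, arch, default=0.99):
    """Resolve the PCC threshold for an arch, falling back to the default."""
    if arch not in entry["supported_archs"]:
        raise ValueError(f"{arch} not supported for {entry['model']}")
    return entry.get("arch_overrides", {}).get(arch, {}).get("required_pcc", default)

print(required_pcc(entry, "qb2-blackhole"))  # 0.98
print(required_pcc(entry, "p150"))           # 0.99 (no override, default applies)
```

The point of the override layer is that only the ~50 entries that actually differ on QB2 need an `arch_overrides` block; everything else inherits the shared defaults.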

Checklist

 - Add qb2 specific nightly CI jobs (main branch only, limited CI resources)
 - Add qb2-blackhole to ALLOWED_ARCHES and default_archs in conftest.py
 - Add qb2-blackhole to supported_archs in ~450 test config entries
 - Add ~65 explicit qb2-blackhole arch_overrides
 - Add model-test-xfail-qb2.json
 - Add qb2-blackhole arch_overrides for 2 entries with lowered PCC (0.98) and 8 s3-bucket-missing failures
@codecov-commenter

codecov-commenter commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.37%. Comparing base (72a9d8f) to head (5de2f77).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3174      +/-   ##
==========================================
- Coverage   28.38%   28.37%   -0.01%     
==========================================
  Files          31       33       +2     
  Lines        4154     4088      -66     
==========================================
- Hits         1179     1160      -19     
+ Misses       2975     2928      -47     

☔ View full report in Codecov by Sentry.

@vmilosevic
Contributor

I wouldn't include this in the nightly tests until we have a more stable setup with some redundancy (at least 3 runners behind a shared label), since we use the nightly as a regression gate for releases and this could block us.
I would rather add it to the experimental nightly instead.

- Create new schedule-nightly-qb2.yml workflow for QB2-specific nightly tests
  to reduce risk on official nightly job while we only have single CI machine
- Remove QB2 jobs from main schedule-nightly.yml workflow
- Update workflow-run-collect-data.yml to collect data from "On nightly QB2" workflow
- QB2 nightly workflow runs at same time as main nightly (cron: '0 0 * * *')
- Includes both model-test-passing-qb2.json and model-test-xfail-qb2.json jobs
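
The dedicated workflow described above might look roughly like the sketch below. This is a guess at the shape only: the runner label, job name, and test-runner command are all assumptions, not the actual contents of schedule-nightly-qb2.yml.

```yaml
# schedule-nightly-qb2.yml (sketch; actual workflow contents may differ)
name: "On nightly QB2"
on:
  schedule:
    - cron: '0 0 * * *'    # same time as the main nightly
  workflow_dispatch: {}    # allow manual runs while the single-machine setup stabilizes
jobs:
  qb2-nightly:
    runs-on: [self-hosted, qb2-blackhole]   # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - name: Run QB2 passing and xfail model tests
        run: ./run_model_tests.sh model-test-passing-qb2.json model-test-xfail-qb2.json  # placeholder command
```

Keeping this in its own workflow file means a hung or offline QB2 runner only fails "On nightly QB2" rather than the official nightly regression gate.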
@kmabeeTT
Contributor Author

kmabeeTT commented Feb 5, 2026

Thanks @vmilosevic, after an offline discussion I pushed a change here to split these tests into their own dedicated QB2 nightly job (TL;DR: folks want these stable, expected-passing tests reported on in superset, while the experimental nightly is for not-yet-stable models and isn't reported on). Take another look?

@ndrakulicTT
Contributor

Do we need both single_chip and data_parallel for all the models, given that the graph difference should only be the mesh_shard on the model inputs? Or is this needed for reporting on superset?
