[float8nocompile] Temporarily disable float8nocompile CI tests due to flakiness on sm89 #1902

Merged: 2 commits merged into main on Mar 24, 2025


danielvegamyhre
Contributor

float8 does not work properly on sm89; the tests only pass on H100s.

danielvegamyhre added the "ci" and "topic: bug fix" labels on Mar 14, 2025

pytorch-bot bot commented Mar 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1902

❌ 1 New Failure

As of commit 7196d56 with merge base c376285 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label on Mar 14, 2025
@msaroufim
Member

msaroufim commented Mar 15, 2025

You will almost never have signal on main in a reasonable timeframe if you merge this (16h as of this comment). There's hardly ever any H100 available in pool.

@jainapurva
Contributor

> You will almost never have signal on main in a reasonable timeframe if you merge this (16h as of this comment). There's hardly ever any H100 available in pool.

Agreed, we can't run a lot of tests on H100. This test is flaky on sm89, so we can either disable the flaky tests on sm89 or run them on H100 at reduced frequency (nightly, or only on commits to main).
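
For illustration only, the first option (skipping the tests on sm89 rather than removing the CI job) could look roughly like the sketch below. This is not what the PR does; the PR comments out the CI jobs. The helper name `should_skip_float8_tests` is hypothetical, and the sketch assumes sm89 reports compute capability (8, 9) and H100 reports (9, 0).

```python
from typing import Tuple

# Assumption: H100 (sm90) reports compute capability (9, 0); sm89 reports (8, 9).
MIN_CAPABILITY_FOR_FLOAT8_TESTS: Tuple[int, int] = (9, 0)


def should_skip_float8_tests(capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: skip on any device older than sm90.

    Tuples compare element by element, so (8, 9) < (9, 0) is True.
    """
    return capability < MIN_CAPABILITY_FOR_FLOAT8_TESTS


# In a real test module, the capability would come from PyTorch, for example:
#
#   import pytest, torch
#   capability = torch.cuda.get_device_capability()
#   pytestmark = pytest.mark.skipif(
#       should_skip_float8_tests(capability),
#       reason="float8nocompile tests are flaky on sm89",
#   )
```

A skip like this keeps the job in CI on the available runners, at the cost of giving no float8nocompile signal until H100 capacity is available.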

@danielvegamyhre
Contributor Author

OK, let's disable this test for now, and I'll take a look at why it's flaky on sm89.

danielvegamyhre changed the title from "[float8nocompile] Only run float8nocompile CI on h100" to "[float8nocompile] Temporarily disable float8nocompile CI tests due to flakiness on sm89" on Mar 21, 2025
@danielvegamyhre
Contributor Author

@jainapurva this is ready for another look when you have a chance. I just commented out the jobs: ... for now.

@jainapurva
Contributor

> OK, let's disable this test for now, and I'll take a look at why it's flaky on sm89.

This looks good.

danielvegamyhre merged commit 84ce855 into main on Mar 24, 2025
18 checks passed
Labels: ci, CLA Signed, topic: bug fix