# [CI] Skip failing tests #5100
Base: main

Changes from 3 commits: 7166c71 · 29af1c9 · 8af4dcc · 1b2bd08 · 2683a95
**Member:** Are these test failures/skips unique to torch version 2.11? What about 2.12: https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/skip_tests/pytorch_2.12.py ?
```python
@@ -67,6 +67,95 @@
        "torch": [
            "test_cpp_warnings_have_python_context_cuda",
        ],
        "utils": [
            # ROCm devel/runtime-dependent UT. Skip in the PyTorch full-suite lane;
            # this is expected to run in the separate ROCm devel UT step.
            "test_load_standalone",
        ],
        "multiprocessing": [
            # ROCm devel/runtime-dependent UTs. Skip in the PyTorch full-suite
            # lane; these are expected to run in the separate ROCm devel UT step.
            "(test_fs and not test_fs_)",
            "test_fs_is_shared",
            "test_fs_pool",
            "test_fs_preserve_sharing",
            "test_fs_sharing",
        ],
        "serialization": [
            # TestSerialization - NJT weights_only import check
            # TestOldSerialization - CI env assertion
            "test_debug_set_in_ci",
        ],
        "modules": [
            # TestModuleCUDA - CTCLoss cpu/gpu parity scalar mismatch
            "test_cpu_gpu_parity_nn_CTCLoss_cuda_float32",
            # TestModuleCUDA - CTCLoss forward scalar mismatch
            "test_forward_nn_CTCLoss_cuda_float32",
        ],
        "export": [
            # TestExportOnFakeCudaCUDA - subprocess import fails: missing librocm_sysdeps_liblzma.so.5
```
**Member:** This rather sounds like a bug that should be fixed.

**Contributor:** I agree, `librocm_sysdeps_liblzma` must be available; otherwise it is a bug.
```python
            "test_fake_export___getitem___cuda_float32",
            "test_fake_export_nn_functional_batch_norm_cuda_float32",
            "test_fake_export_nn_functional_batch_norm_without_cudnn_cuda_float32",
            "test_fake_export_nn_functional_conv2d_cuda_float32",
            "test_fake_export_nn_functional_instance_norm_cuda_float32",
            "test_fake_export_nn_functional_multi_margin_loss_cuda_float32",
            "test_fake_export_nn_functional_scaled_dot_product_attention_cuda_float32",
            "test_fake_export_nonzero_cuda_float32",
            "test_preserve_original_behavior_cuda",
        ],
        "inductor": [
            # inductor/test_aot_inductor_package: AOTI C++ package tests need
            # more complete CMake/runtime library-path handling in the wheel CI lane.
```
**Member:** As the comment already says this might need proper handling in the wheel, why are tests skipped instead of aiming for a fix?
```python
            "test_compile_after_package_multi_arch",
            "test_compile_after_package_static",
            "test_compile_standalone_cos",
            "test_compile_with_exporter",
            "test_compile_with_exporter_weights",
            # Also failed on https://github.com/ROCm/TheRock/actions/runs/24898379109
```
**Member:** Logs will expire, so this needs an issue to track.

**Contributor:** And please add a proper error message here, so we know why it was added to the skip list.
```python
            "test_flex_attention_logging_cuda",
            "test_linalg_eig_stride_consistency_cuda",
            "test_linalg_eig_stride_consistency_dynamic_shapes_cuda",
            "test_repeated_calling_cuda",
            # New test.
            "test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_amp",
            # Passed in https://github.com/ROCm/TheRock/actions/runs/24898379109
            "test_return_aux_deprecation_warnings_cuda_float16",
```
**Member** (on lines +123 to +124): If this is passing, why is it on the exclude list? As above, logs will expire and this might need a proper issue.
```python
        ],
        "functorch": [
            # Passed on https://github.com/ROCm/TheRock/actions/runs/24898379109
            "test_torch_return_types_returns_cuda",
```
**Contributor:** So are these tests flaky, or why are they passing in one run but added to the skip list here?
```python
        ],
        "jit_fuser_te": [
            # Passed on https://github.com/ROCm/TheRock/actions/runs/24898379109
```
**Contributor:** Same question here.
```python
            "test_binary_div_ops",
            "test_binary_ops",
            "test_binary_tensor_scalar_ops",
            "test_ternary_norm_ops",
            "test_ternary_ops",
            "test_unary_ops",
            "test_where_ops",
        ],
        "torch_config_hash_determinism": [
            # Passed on https://github.com/ROCm/TheRock/actions/runs/24898379109
```
**Contributor:** Same.
```python
            "test_inductor_config_hash_portable_deterministic",
            "test_inductor_config_hash_portable_without_ignore",
        ],
        "spectral_ops": [
            # New failures.
```
**Contributor:** New failures — why? ROCm or PyTorch? It would be good to triage a bit when it is a new failure, to know where the regression comes from. And please, again: add more error messages as comments here so we can better categorize the test failures.
```python
            # Skipping these tests for a cleaner run.
            # These tests passed in https://github.com/ROCm/TheRock/actions/runs/25360031025
            "test_istft_against_librosa_cuda_float64",
            "test_stft_cuda_float64",
        ],
    },
    "gfx942": {
        "cuda": [
```
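As a rough illustration of how a skip map like the one in this diff might be consumed, the per-module pattern lists can be folded into a single negated pytest `-k` expression. This is a hedged sketch, not TheRock's actual runner code: `SKIPS` and `k_expression` are hypothetical names, and the real harness may deselect tests differently.

```python
# Hypothetical sketch: turn a {module: [skip patterns]} map into one
# pytest -k filter that deselects every listed test. Entries like
# "(test_fs and not test_fs_)" are already -k sub-expressions, so they
# can be OR-ed together and negated as a whole.
SKIPS = {
    "multiprocessing": [
        "(test_fs and not test_fs_)",
        "test_fs_is_shared",
    ],
}


def k_expression(module: str, skips: dict[str, list[str]]) -> str:
    """Join a module's skip patterns into a single negated -k filter."""
    patterns = skips.get(module, [])
    if not patterns:
        return ""
    return "not (" + " or ".join(patterns) + ")"


# Would be passed to pytest as: pytest test_multiprocessing.py -k "<expr>"
print(k_expression("multiprocessing", SKIPS))
```

Wrapping the OR-joined patterns in a single `not (...)` keeps the precedence correct even when an individual pattern contains its own `and`/`not` operators.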
**Member:** This workflow makes an attempt to be portable to Windows; see the `if: ${{ runner.os == 'Windows' }}` steps below. These commands that will only ever work on Linux should be behind an `if: ${{ runner.os == 'Linux' }}` condition.

That being said, we may also want to segment the tests that need `build-essential` somehow, similar to tests that need ROCm development headers/files. Users should get a functional install of PyTorch with just `pip install torch`, and anything that needs to be separately installed at the operating system level should be treated with care. If only 1% of the software/tests need additional packages, we don't want to miss that suddenly growing to 50%+ since all of our test workflows preinstall packages.

**Contributor:** Agree. My old comment (which I never sent) was "why is this needed?". It definitely needs to be flagged behind Linux and should have some explanation of why it is needed in the first place.
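A minimal sketch of the gating the reviewer is asking for, assuming the step in question installs `build-essential` via apt (the step name and package list here are illustrative, not taken from the actual workflow):

```yaml
steps:
  - name: Install Linux-only build prerequisites
    # Gate commands that only work on Linux so the workflow stays
    # portable to Windows runners, matching the existing
    # `runner.os == 'Windows'` steps elsewhere in this workflow.
    if: ${{ runner.os == 'Linux' }}
    run: |
      sudo apt-get update
      sudo apt-get install -y build-essential
```

The `runner.os` context is set by GitHub Actions per runner, so on a Windows runner this step is simply skipped rather than failing.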