[ALMIOPEN-1696] Re-enabling certain MIOpen tests with ASan enabled. #7904
Draft
NolanHannaAMD wants to merge 1 commit into
Motivation
This PR tries to reduce the number of tests disabled under ASan, increasing test coverage while still avoiding test breakages on CI. These tests were disabled in an early sweep when many issues were being encountered. Since then, several fixes have gone in, and the goal is to reduce the list to a more manageable size. The tests that remain disabled are split into 2 groups: one that may be re-enabled after further testing on other architectures/environments, and one that should remain disabled until the underlying issues are addressed.
Putting this PR up as a draft: the work has been started, but a full ASan gtest run is still necessary. All enabled tests passed when run individually.
Note: Work on the CK hangs is still ongoing and will hopefully address the remaining hangs. Once that work is complete, more tests should be able to come off this list.
Technical Details
Two categories of tests were re-enabled and two remain disabled. A full run of the gtests with ASan enabled should be done before this PR is considered for merge.
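Two of the cleanups described below (entries whose source file no longer exists, and a duplicated entry) can be found mechanically. The following is a minimal sketch, not the method used for this PR: it assumes the disabled list has already been read into a Python list of file names, and that `test_dir` points at the directory holding the gtest sources. Both the function name and its inputs are hypothetical.

```python
from collections import Counter
from pathlib import Path


def audit_disabled_list(entries, test_dir):
    """Return (duplicates, missing) for a list of disabled test file names.

    entries:  file names as they appear in the disabled list (hypothetical input)
    test_dir: directory assumed to contain the gtest source files
    """
    counts = Counter(entries)
    # Names that appear more than once in the list.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # Names with no matching file on disk; these entries are stale.
    root = Path(test_dir)
    missing = sorted(name for name in counts if not (root / name).is_file())
    return duplicates, missing
```

Entries reported as `missing` correspond to the "definitely safe" category, and entries reported as `duplicates` can be collapsed to a single line.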
Definitely safe - reenabled
The following files no longer exist and therefore cannot cause issues when re-enabled:
- ck_builder_shared.cpp: deleted in PR #5347 ([MIOpen][CK Builder] Remove CK builder integration)
- ck_builder_xdl.cpp: deleted in PR #5347 ([MIOpen][CK Builder] Remove CK builder integration)
- graphapi_conv_bias_res_add_activ_fwd.cpp: test file deleted
- unit_conv_solver_ConvHipImplicitGemmBwdXdlops.cpp: deleted in PR #3953 (Deprecate ck non-grouped convolution fwd and bwd solver)
- unit_conv_solver_ConvHipImplicitGemmFwdXdlops.cpp: deleted in PR #3953 (Deprecate ck non-grouped convolution fwd and bwd solver)
Potentially safe - reenabled
The following tests all passed when run individually, but still need to pass within a full gtest run before this is merged:
- bad_fusion_plan.cpp
- cba_find2_infer.cpp
- cba_infer.cpp
- conv_activ_infer.cpp
- conv_ai_3d_kernel_tuning_utils.cpp
- find_2_conv.cpp
- find_db.cpp
- find_mode_trust_verify.cpp
- fused_conv_bias_res_add_activ.cpp
- group_conv_deterministic_split_k.cpp
- group_conv2d_fwd.cpp
- group_conv2d_bwd.cpp
- group_conv2d_wrw.cpp
- kernel_tuning_net.cpp
- miopendriver_gemm.cpp
- miopendriver_regression_big_tensor.cpp (also has a duplicate entry that can be dropped)
- miopendriver_regression_half.cpp
- perf_config_HipImplicitGemm3DGroupFwdXdlops.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicBwdXdlops.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicFwdXdlops.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicWrwXdlops.cpp
- unit_conv_solver_ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC.cpp
- unit_conv_solver_ConvHipImplicitGemm3DGroupBwdXdlops.cpp
- unit_conv_solver_ConvHipImplicitGemmGroupFwdXdlops.cpp
- unit_conv_solver_ConvHipImplicitGemmGroupWrwXdlops.cpp
- unit_implicitgemm_ck_util.cpp
Untested - remain on disabled list
The following tests were skipped during individual testing because of the test architecture or a disabled framework:
- conv_ck_igemm_fwd_v6r1_dlops_nchw.cpp: solver not supported on gfx942
- conv_hip_igemm_xdlops.cpp: test_drive<> framework disabled
- conv_igemm_mlir_xdlops_bwd_wrw.cpp: test_drive<> framework disabled
- conv_igemm_mlir_xdlops_fwd.cpp: test_drive<> framework disabled
- smoke_solver_ConvCkIgemmFwdV6r1DlopsNchw.cpp: test_drive<> framework disabled
- smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_fp32_fp16.cpp: test_drive<> framework disabled
- smoke_solver_ConvAsmImplicitGemmGTCDynamicXdlopsNHWC_bf16.cpp: test_drive<> framework disabled
- unit_conv_solver_ConvHipImplicitGemmBwdDataV1R1Xdlops.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmBwdDataV4R1Xdlops.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmForwardV4R4Xdlops.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmForwardV4R5Xdlops.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmWrwV4R4Xdlops.cpp: gfx942 not supported
- unit_conv_solver_ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm.cpp: gfx942 not supported
Unsafe
- group_conv3d_fwd.cpp: CK 3D conv kernel hang
- group_conv3d_bwd.cpp: CK 3D conv kernel hang
- group_conv3d_wrw.cpp: CK 3D conv kernel hang
- miopendriver_conv_immed.cpp: BFP16 subprocess hangs >60min
- miopendriver_conv2d_trans.cpp: BFP16 trans conv extremely slow >27min
- miopendriver_regression_half_gfx9.cpp: 3D FP16 subprocess hangs >25min
- unit_conv_solver_ConvHipImplicitGemm3DGroupFwdXdlops.cpp: CK 3D fwd hang
- unit_conv_solver_ConvHipImplicitGemm3DGroupWrwXdlops.cpp: CK 3D wrw hang
- unit_conv_solver_ConvCkGroupedConvFwd.cpp: ASAN GPU crash in CK GridwiseGroupedConv2DFwd
- unit_conv_solver_ConvHipImplicitGemmGroupBwdXdlops.cpp: ASAN GPU crash on TF32 dilation>
Test Plan
With an ASan-enabled build, run the full MIOpen gtest suite that runs on CI to verify that no problems are encountered.
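Because several of the known failures are hangs rather than crashes, it may help to run each test command under a wall-clock limit so that a hang is reported instead of stalling the whole run. The following is a rough sketch under stated assumptions, not the CI procedure: the helper names are hypothetical, and the binary path in the usage comment is a placeholder. `--gtest_filter` is the standard GoogleTest filter flag.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run one test command and return 'pass', 'fail', or 'hang'."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "hang"
    return "pass" if result.returncode == 0 else "fail"


def triage(commands, timeout_s):
    """Map each named command to its outcome."""
    return {name: run_with_timeout(cmd, timeout_s) for name, cmd in commands.items()}


# Hypothetical usage; the binary path and filter pattern are placeholders:
# triage({"find_db": ["./bin/miopen_gtest", "--gtest_filter=*FindDb*"]}, timeout_s=1800)
```

Only the direct child is killed on timeout, so a test that spawns its own subprocess (as the miopendriver tests appear to) may leave that subprocess running and need separate cleanup.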
Test Result
TBD. If no issues are encountered, this should be safe to merge. If any hangs or other failures are encountered, the affected tests should be moved to the top disabled list, with a specific comment noting that the issue only manifests on a full run.
Risk Assessment
Currently, I would rate this as MEDIUM because the full gtest suite still needs to be re-run together under ASan. Issues could affect the ASan CI and reintroduce failures that were experienced before; however, this will not impact non-ASan builds at all. Once the full test run is complete, if no issues are encountered, I would downgrade this to LOW.