Description
In CMSSW_14_2_ROCM_X_2024-11-06-2300 we observe multiple unit test and RelVal failures:
What failed | Description |
---|---|
DataFormats/SoATemplate/testRocmSoALayoutAndView_t | HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception |
HeterogeneousCore/AlpakaInterface/alpakaTestBufferROCmAsync | HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception |
HeterogeneousCore/AlpakaInterface/alpakaTestPrefixScanROCmAsync | Many "Device-side assertion '0 == blockDimension % warpSize' failed." messages, followed by HSA_STATUS_ERROR_EXCEPTION |
Relval 141.008583 step 2 | ModuleTypeResolverAlpaka had no backends available because of the combination of the job configuration and accelerator availability of on the machine. The job sees accelerators |
Relval 29834.403 step 2 | HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources |
Relval 29834.404 step 2 | StdException |
Relval 141.008507 step 3 | ModuleTypeResolverAlpaka had no backends available because of the combination of the job configuration and accelerator availability of on the machine. The job sees accelerators |
Relval 141.008508 step 3 | Fatal exception: Unable to choose current device because CUDAService is not preset or disabled. If CUDAService was not explicitly disabled in the configuration, the probable cause is that there is no GPU or there is some problem in the CUDA runtime or drivers. |
Relval 141.008513 step 3 | ModuleTypeResolverAlpaka had no backends available because of the combination of the job configuration and accelerator availability of on the machine. The job sees accelerators |
Relval 141.008514 step 3 | BadAlloc |
Relval 141.008523 step 3 | ModuleTypeResolverAlpaka had no backends available because of the combination of the job configuration and accelerator availability of on the machine. The job sees accelerators |
Relval 141.008524 step 3 | BadAlloc |
Relval 12834.402 step 3 | SIGSEGV in roc::DmaBlitManager::hsaCopyStaged |
Relval 13034.402 step 3 | SIGABRT |
Relval 13034.404 step 3 | SIGABRT |
Relval 13034.406 step 3 | SIGABRT |
Relval 13034.408 step 3 | SIGABRT |
Relval 13050.402 step 3 | SIGABRT |
Relval 13050.404 step 3 | SIGABRT |
Relval 13050.406 step 3 | SIGSEGV in roc::DmaBlitManager::hsaCopyStaged |
Relval 13050.408 step 3 | SIGABRT |
Relval 13061.402 step 3 | SIGSEGV in roc::DmaBlitManager::hsaCopyStaged |
Relval 29634.402 step 3 | SIGABRT |
Relval 29834.402 step 3 | SIGABRT |
Relval 160.03502 step 4 | BadAlloc |
(The SIGABRTs are either HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception
or HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources.)