@makortel
Contributor

PR description:

Backport of #48824. Original description:

While testing the ROOT PR that adds an option to disable header parsing during TClass::GetClass() calls (root-project/root#18402), it was noticed that the type alias gets listed in the rootmap file only if the alias is requested before the real type (see root-project/root#19705 for more details).

In the meantime, having the type alias in the rootmap file is necessary to avoid header parsing when executing the read rules that use the type alias names, e.g.

rule.fSource = type + "::Layout layout_;";

, and therefore it seemed worthwhile to change the dictionaries with PortableHost{Collection,Object}.
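
To make the dependency on the alias name concrete, here is a minimal sketch of how a rule source string like the one above is composed, together with a toy model of the rootmap lookup. The helper names `makeRuleSource` and `needsHeaderParsing` and the `demo::` type names are hypothetical and exist only for illustration; they are not part of CMSSW or ROOT, and the real lookup inside ROOT is more involved.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: composes the source declaration of a read rule from a
// type name, mirroring the expression `type + "::Layout layout_;"`.
std::string makeRuleSource(const std::string& type) {
  assert(!type.empty() && "type name must not be empty");
  return type + "::Layout layout_;";
}

// Toy model (an assumption, not ROOT's actual logic): a name that is absent
// from the set of rootmap entries would force header parsing to resolve it.
bool needsHeaderParsing(const std::set<std::string>& rootmapEntries,
                        const std::string& typeName) {
  return rootmapEntries.find(typeName) == rootmapEntries.end();
}
```

In this model, if `type` is an alias name, the rule source starts with that alias, so the alias itself (and not only the aliased-to type) has to be among the rootmap entries for `needsHeaderParsing` to return `false`.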

In the present state this PR demonstrates what the impact would be for the PortableTestObjects. If deemed viable, the next steps would be to update DataFormats/Portable README and scripts, and then update all the other classes_def.xml files that declare these portable data products.

PR validation:

None beyond #48824

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Backport of #48824 (using the same branch).

…heir aliased types in classes_def.xml

In ROOT presently the type alias gets listed in the rootmap file only
if the alias is requested before the real type. Having the type alias
in the rootmap file is necessary to avoid header parsing for the
execution of the read rules that use the type alias names.
…aliased-to types

This should avoid ROOT header parsing
They are not in DataFormats, so they are allowed to be transient only.
@cmsbuild
Contributor

A new Pull Request was created by @makortel for CMSSW_16_0_X.

It involves the following packages:

  • DataFormats/BeamSpot (reconstruction)
  • DataFormats/EcalRecHit (reconstruction)
  • DataFormats/HGCalReco (reconstruction)
  • DataFormats/HcalDigi (simulation)
  • DataFormats/HcalRecHit (reconstruction)
  • DataFormats/ParticleFlowReco (reconstruction)
  • DataFormats/Portable (heterogeneous)
  • DataFormats/PortableTestObjects (heterogeneous)
  • DataFormats/SiPixelClusterSoA (heterogeneous, reconstruction)
  • DataFormats/SiPixelDigiSoA (heterogeneous, reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@Moanwar, @civanch, @cmsbuild, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @ReyerBand, @VinInn, @VourMa, @abdoulline, @argiro, @bsunanda, @dgulhan, @dkotlins, @elusian, @felicepantaleo, @ferencek, @gpetruc, @hatakeyamak, @lgray, @mariadalfonso, @missirol, @mmasciov, @mmusich, @mroguljic, @mtosi, @rchatter, @rovere, @thomreis, @tsusa, @wang0jin this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jan 12, 2026

cms-bot internal usage

@makortel
Contributor Author

enable gpu

@makortel
Contributor Author

@cmsbuild, please test

@cmsbuild
Contributor

-1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1bc021/50529/summary.html
COMMIT: d001897
CMSSW: CMSSW_16_0_X_2026-01-12-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49774/50529/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 4 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 55
  • DQMHistoTests: Total histograms compared: 4513958
  • DQMHistoTests: Total failures: 67
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4513871
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 54 files compared)
  • Checked 235 log files, 208 edm output root files, 55 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 234 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 31134
  • DQMHistoTests: Total nulls: 9
  • DQMHistoTests: Total successes: 118228
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

  • You potentially added 9 lines to the logs
  • Reco comparison results: 241 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 32407
  • DQMHistoTests: Total nulls: 12
  • DQMHistoTests: Total successes: 116952
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

  • You potentially removed 7 lines from the logs
  • Reco comparison results: 239 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 27798
  • DQMHistoTests: Total nulls: 11
  • DQMHistoTests: Total successes: 121562
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

  • You potentially removed 14 lines from the logs
  • Reco comparison results: 249 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 30097
  • DQMHistoTests: Total nulls: 13
  • DQMHistoTests: Total successes: 119261
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor Author

On NVIDIA H100 some of the workflows failed with

----- Begin Fatal Exception 12-Jan-2026 19:18:50 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'MC_Ele5_Open_Unseeded'
   [2] Calling method for module HGCalSoARecHitsLayerClustersProducer@alpaka/'hltHgcalSoARecHitsLayerClustersProducer'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/alpaka/2.0.0-8493f1d11d0378dc14d6ea6ecfc69ac5/include/alpaka/mem/buf/uniformCudaHip/traits/BufUniformCudaHipRtTraits.hpp(283) 'TApi::mallocAsync(&memPtr, static_cast<std::size_t>(width) * sizeof(TElem), queue.getNativeHandle())' returned error  : 'cudaErrorNotSupported': 'operation not supported'!
----- End Fatal Exception -------------------------------------------------

The symptoms look like #47270 with

AlpakaServiceCudaAsync succesfully initialised.
Found 1 device:
  - NVIDIA H100L-2-24C MIG 2g.24gb

@fwyzard
Contributor

fwyzard commented Jan 13, 2026

+heterogeneous

@fwyzard
Contributor

fwyzard commented Jan 13, 2026

The failing tests were running on grid1165624 with the NVIDIA driver 580.x:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+

while the passing ones ran on ngt-nvidia-h100-01 with the NVIDIA driver 570.x:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+

@smuzaffar do we have different runners for the H100 tests?

@fwyzard
Contributor

fwyzard commented Jan 13, 2026

ignore tests-rejected with external-failure
