@makortel makortel commented Dec 22, 2025

PR description:

This PR removes the CUDA-dependent modules from DQM/SiPixelHeterogeneous. Their inclusion in runTheMatrix workflow configurations surfaced through failures that followed the CUDADataFormats dictionary removal in #49656 (comment). Since all direct CUDA components are slated for removal (#45844), this PR proposes removing these modules as well. They appear to have been superseded by more generic ones in #45206.

Resolves cms-sw/framework-team#1742

PR validation:

Workflows 11634.5 and 34434.5 succeeded.

These modules seem to have been superseded by more generic ones:
- SiPixelCompareVertexSoA -> SiPixelCompareVertices
- SiPixel*CompareRecHitsSoA -> SiPixel*CompareRecHits
- SiPixel*CompareTrackSoA -> SiPixel*CompareTracks
- SiPixel*MonitorRecHitsSoA -> SiPixel*MonitorRecHitsSoAAlpaka
- SiPixel*MonitorTrackSoA -> SiPixel*MonitorTrackSoAAlpaka
- SiPixelMonitorVertexSoA -> SiPixelMonitorVertexSoAAlpaka
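
The mapping above can be turned into a small lookup, for example to check whether a configuration still refers to one of the removed plugins. This is only a sketch: the helper name `suggest_replacement` is hypothetical, and it assumes that the `*` in each entry stands for the same text on both sides of the arrow (for example `Phase1`).

```python
from fnmatch import fnmatchcase

# Old plugin-name pattern -> suggested replacement, copied from the list above.
SUPERSEDED = {
    "SiPixelCompareVertexSoA": "SiPixelCompareVertices",
    "SiPixel*CompareRecHitsSoA": "SiPixel*CompareRecHits",
    "SiPixel*CompareTrackSoA": "SiPixel*CompareTracks",
    "SiPixel*MonitorRecHitsSoA": "SiPixel*MonitorRecHitsSoAAlpaka",
    "SiPixel*MonitorTrackSoA": "SiPixel*MonitorTrackSoAAlpaka",
    "SiPixelMonitorVertexSoA": "SiPixelMonitorVertexSoAAlpaka",
}


def suggest_replacement(plugin_name):
    """Return the suggested replacement name, or None if nothing matches."""
    for old, new in SUPERSEDED.items():
        # fnmatchcase matches the whole string, case-sensitively.
        if not fnmatchcase(plugin_name, old):
            continue
        if "*" not in old:
            return new
        # Recover the text matched by '*' and reuse it in the new name.
        prefix, suffix = old.split("*")
        middle = plugin_name[len(prefix):len(plugin_name) - len(suffix)]
        return new.replace("*", middle)
    return None
```

Because the match covers the whole name, a plugin that already carries the `Alpaka` suffix is not reported as obsolete.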
@cmsbuild
cmsbuild commented Dec 22, 2025

cms-bot internal usage

@makortel

FYI @cms-sw/heterogeneous-l2

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49697/47255

@cmsbuild

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • DQM/SiPixelHeterogeneous (dqm)

@cmsbuild, @ctarricone, @gabrielmscampos, @nothingface0, @rseidita can you please review it and eventually sign? Thanks.
@fioriNTU, @idebruyn, @jandrea, @mmusich, @threus this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel

test parameters:

  • workflows = 136.8855,136.8885,11634.5,34434.5
  • enable = gpu

@makortel

@cmsbuild, please test


  # Run-3 sequence
- monitorpixelSoASource = cms.Sequence(siPixelPhase1MonitorRecHitsSoA * siPixelPhase1MonitorTrackSoA * siPixelMonitorVertexSoA)
+ monitorpixelSoASource = cms.Sequence()

It is not clear to me whether these empty sequences still serve any purpose other than being placeholders that various modifiers replace via toReplaceWith() below.
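
The placeholder pattern in question can be illustrated with a plain-Python stand-in. This is not the CMSSW API: the `Sequence` and `Modifier` classes below are toy models, the module labels and the `build` helper are hypothetical, and the sketch assumes that a replacement happens in place so that anything already referring to the placeholder sees the new contents.

```python
class Sequence:
    """Toy stand-in for cms.Sequence: an ordered list of module labels."""

    def __init__(self, *modules):
        self.modules = list(modules)


class Modifier:
    """Toy stand-in for a CMSSW modifier that is either active or not."""

    def __init__(self, chosen):
        self.chosen = chosen

    def toReplaceWith(self, target, replacement):
        # Swap the contents of `target` in place, and only when this
        # modifier is active; otherwise the placeholder stays as it is.
        if self.chosen:
            target.modules = list(replacement.modules)


def build(modifier_active):
    # Empty placeholder, like the one this PR leaves behind.
    monitorpixelSoASource = Sequence()
    # Hypothetical parent that already refers to the placeholder.
    parent = [monitorpixelSoASource]
    # Hypothetical replacement sequence and module labels.
    replacement = Sequence("hypotheticalMonitorRecHits", "hypotheticalMonitorTracks")
    Modifier(modifier_active).toReplaceWith(monitorpixelSoASource, replacement)
    return parent[0].modules
```

With an inactive modifier the parent still sees an empty sequence, so under these assumptions the placeholder only matters as a stable name that modifiers and parent sequences can refer to.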

Comment on lines 128 to 130
monitorpixelSoACompareSource = cms.Sequence(siPixelPhase1MonitorRawDataACPU *
                                            siPixelPhase1MonitorRawDataAGPU *
                                            siPixelPhase1MonitorRecHitsSoACPU *
                                            siPixelPhase1MonitorRecHitsSoAGPU *
                                            siPixelPhase1CompareRecHitsSoA *
                                            siPixelPhase1MonitorTrackSoAGPU *
                                            siPixelPhase1MonitorTrackSoACPU *
                                            siPixelPhase1CompareTrackSoA *
                                            siPixelMonitorVertexSoACPU *
                                            siPixelMonitorVertexSoAGPU *
                                            siPixelCompareVertexSoA *
                                            siPixelPhase1RawDataErrorComparator)

It is not clear to me if the remaining 3 modules in this Sequence would be useful, or if it would be better to remove them as well.

@cmsbuild

-1

Failed Tests: RelVals-NVIDIA_T4
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-046ac5/50392/summary.html
COMMIT: d99a3f5
CMSSW: CMSSW_16_1_X_2025-12-22-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49697/50392/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-NVIDIA_T4

ValueError: Undefined workflows: 29834.751, 29834.404, 29834.402, 29834.704, 29834.403

Comparison Summary

There are some workflows for which there are errors in the baseline:
11634.5 step 3
136.8855 step 3
136.8885 step 3
34434.5 step 3
The comparison results for these workflows could be incomplete.
This most likely means that the IB has errors in the RelVals. The error does NOT come from this pull request.

Summary:

  • You potentially added 99 lines to the logs
  • Reco comparison results: 12 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4280553
  • DQMHistoTests: Total failures: 15
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4280518
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 239 log files, 204 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

@makortel

> ValueError: Undefined workflows: 29834.751, 29834.404, 29834.402, 29834.704, 29834.403

I wonder what this means in practice

@makortel

> > ValueError: Undefined workflows: 29834.751, 29834.404, 29834.402, 29834.704, 29834.403
>
> I wonder what this means in practice

I opened an issue #49700

Development

Successfully merging this pull request may close these issues.

Remove obsolete CUDA-using modules from DQM/SiPixelHeterogeneous