Implement stream::FixedQueueEDProducer#50627
Conversation
cms-bot internal usage

enable gpu

please test
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48825 ERROR: Build errors found during clang-tidy run.
stream::FixedQueueEDProducer is a stream EDProducer with a fixed association of device queues to framework streams.
This ensures that PyTorch sees only a limited number of device queues, reducing the overall device memory utilisation.
Force-pushed from 0dff13b to 8b5045e
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50627/48826
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@fwyzard, @hjkwon260, @makortel, @valsdav, @y19y19 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
+1
Size: This PR adds an extra 44KB to the repository. The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic. You can see more details here: Comparison Summary.
The workflows 2025.0010001, 2025.0000002, 2024.0070001, 2024.0060001, 2024.0050001, 2024.0040001, 2024.0030001, 2024.0020001, 2024.0010001, 2024.0000001, 2023.0020001 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons.
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
Max Memory Comparisons exceeding threshold: @cms-sw/core-l2, I found 6 workflow step(s) with memory usage exceeding the error threshold.
PR description:
Implement a new kind of alpaka stream::EDProducer with a fixed association of device queues (e.g. CUDA streams) to framework streams. This is useful for external software that associates resources to the device queues, for example the PyTorch device memory caching allocator.
Migrating the PyTorch alpaka modules from stream::EDProducer to stream::FixedQueueEDProducer ensures that PyTorch sees only a limited number of device queues, reducing the overall device memory utilisation.
For more background information see the presentation "ML inference on GPUs in CMSSW with PyTorch" by @EmanueleCoradin at the CMS developments with GPUs meeting on March 30th, 2026.
PR validation:
All unit tests pass.