Slow remote reads in CMSSW_15_0_2 #47750

Open
@davidlange6

Description

Production is using CMSSW_15_0_2 to run a remini-renano workflow and seeing rather poor CPU efficiencies.

An example log is /eos/cms/store/logs/prod/recent/PRODUCTION/pdmvserv_Run2024I_EGamma0_MINIv6NANOv15_250321_075704_7868/DataProcessing/cmsgwms-submit9.fnal.gov-2270711-1-log.tar

(There are others in the same area; this is just the one I grabbed.)

Notably, the log shows quite a number of pauses of O(6-8 minutes).
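For anyone who wants to locate such pauses in their own logs, here is a minimal sketch that scans for gaps between consecutive "Begin processing" lines. It assumes the usual `Begin processing the Nth record. Run ..., Event ..., LumiSection ... on stream ... at DD-Mon-YYYY HH:MM:SS.mmm TZ` line format; the function name `find_pauses` and the 300 s threshold are hypothetical choices for illustration, not something taken from the production job.

```python
import re
from datetime import datetime

# Assumed line format (adjust the pattern if your log differs):
# Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 21-Mar-2025 09:00:00.000 CET
RECORD_RE = re.compile(
    r"Begin processing the (\d+)\w* record\..* at "
    r"(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d+)"
)


def find_pauses(lines, min_gap_s=300.0):
    """Return (gap_seconds, record_before, record_after) for each gap >= min_gap_s.

    The timezone suffix is ignored, so gaps across a DST change would be off.
    """
    pauses = []
    prev = None  # (timestamp, record number) of the previous matching line
    for line in lines:
        m = RECORD_RE.search(line)
        if not m:
            continue
        record = int(m.group(1))
        stamp = datetime.strptime(m.group(2), "%d-%b-%Y %H:%M:%S.%f")
        if prev is not None:
            gap = (stamp - prev[0]).total_seconds()
            if gap >= min_gap_s:
                pauses.append((gap, prev[1], record))
        prev = (stamp, record)
    return pauses
```

Passing an open file handle for the extracted job log as `lines` should list every stall of five minutes or more, together with the record numbers on either side of it.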

Running from a node at CERN, copying the three input files locally takes a few minutes, and the workflow then runs at reasonable efficiency (e.g., it finished after 40 minutes, while the job with remote file reads was 10% done after 70 minutes).

Running on the original (remote) files also shows pauses between events being processed, though not at the same event numbers as in the original job. The files in my example are:

xrdcp root://xrootd-cms.infn.it//store/data/Run2024I/EGamma0/AOD/PromptReco-v1/000/386/605/00000/9ad0cdbe-0470-45a8-90fa-9ccbd0ef4087.root .
xrdcp root://xrootd-cms.infn.it//store/data/Run2024I/EGamma0/AOD/PromptReco-v1/000/386/605/00000/5d7bb4b0-674e-4584-afe4-e71b63a95c1b.root .
xrdcp root://xrootd-cms.infn.it//store/data/Run2024I/EGamma0/AOD/PromptReco-v1/000/386/605/00000/6c68d214-3fd6-4b7b-9809-d0c8705a24cf.root .

(The files are at T2_US_Vanderbilt, so not very close to CERN. I believe the original job ran in Bari, which would be even further away.)
