Updates for `runTheMatrix.py`: input checks, GPUs repartition, input recycling #47377
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47377/43734
|
A new Pull Request was created by @AdrianoDee for master. It involves the following packages:
@AdrianoDee, @Moanwar, @cmsbuild, @DickyChant, @miquork, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
test parameters:
|
enable gpu |
please test |
-1 Failed Tests: RelVals, RelVals-INPUT
GPU Comparison Summary:
|
There seems to be an unrelated problem with the CUDA drivers on the worker node. |
please test |
+1 Size: This PR adds an extra 16KB to repository
Comparison Summary:
CUDA Comparison Summary:
ROCM Comparison Summary:
|
+pdmv |
+Upgrade |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
REMINDER @mandrenguyen, @sextonkennedy, @rappoccio, @antoniovilela: This PR was tested with #47669, please check if they should be merged together |
This PR and #47669 can be merged independently. |
+1 |
PR description:
This PR proposes a few modifications to `runTheMatrix.py` and related packages. It would add the possibility to:

- check whether the default samples for the requested workflows are actually defined. This is done via the `-c`/`--checkInputs` flag. This should solve "[RFC] Minimal test of Configuration/PyReleaseValidation/python/relval_steps.py validity" #46910 if the routine PR tests run `runTheMatrix.py -n -c`;
- have a workflow start from a specific step (`GEN`, `SIM`, `DIGI`, ...) with the option `--startFrom STEP`. This will remove all the steps before the one that runs a `cmsDriver.py` with `-s STEP, [...]`;
- use a different file as input with `--recycle`. This is intended to be used either together with `--startFrom` or on wfs that, as first step, use a pre-existing input;
- have duplicate wfs in input with the option `-l WF, WF, WF [...]` together with `--allowDuplicates`. Each wf would run in a different job (if specified) and `_jobX` is appended to the work area to avoid using the same folder.

In addition, when running with the `-gpu` option and multiple jobs with `-j N`, each job would now be assigned to a different GPU. Available GPUs may also be selected on the basis of the compute capability (only for NVIDIA) with the already existing `--cuda-capabilities`, or by name with the already existing `--force-gpu-name`. If more jobs than available GPUs are requested, the job-to-GPU assignment restarts from the first available GPU and continues until all jobs are assigned. So, e.g., with 8 jobs and 3 GPUs, the jobs are distributed as:

    [0, 3, 6]
    [1, 4, 7]
    [2, 5]
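The wrap-around assignment described above can be illustrated with a minimal Python sketch. It is not the code added by this PR: the function name `assign_jobs_to_gpus` is hypothetical, and it assumes that jobs and GPUs are simply numbered from 0 and that job `i` goes to GPU `i % n_gpus`.

```python
def assign_jobs_to_gpus(n_jobs, n_gpus):
    """Return, for each GPU, the list of job indices assigned to it.

    Hypothetical helper: jobs are handed out in order, wrapping back to
    the first GPU once every GPU has received a job.
    """
    if n_gpus <= 0:
        raise ValueError("at least one GPU is required")
    assignment = [[] for _ in range(n_gpus)]
    for job in range(n_jobs):
        # Job i goes to GPU i modulo the number of available GPUs.
        assignment[job % n_gpus].append(job)
    return assignment


if __name__ == "__main__":
    for jobs in assign_jobs_to_gpus(8, 3):
        print(jobs)
```

With 8 jobs and 3 GPUs this prints the three lists shown in the example, one per GPU.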
This should solve #47337
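As a rough illustration of the `--startFrom STEP` behaviour described in the list above, the following sketch drops every command that precedes the first `cmsDriver.py` call whose `-s` list begins with the requested step. It is only a guess at the logic, not the PR's implementation: the helper names, the example commands, and the choice to match only the first entry of the `-s` list are all assumptions.

```python
import shlex


def first_step(command):
    """Return the first step passed to -s/--step in a cmsDriver.py command, or None."""
    tokens = shlex.split(command)
    if not tokens or not tokens[0].endswith("cmsDriver.py"):
        return None
    for i, token in enumerate(tokens[:-1]):
        if token in ("-s", "--step"):
            # Steps are comma separated and may carry a ":sequence" suffix.
            return tokens[i + 1].split(",")[0].split(":")[0]
    return None


def trim_to_start(commands, start_step):
    """Drop all commands before the one whose step list begins with start_step."""
    for i, command in enumerate(commands):
        if first_step(command) == start_step:
            return commands[i:]
    raise ValueError(f"no cmsDriver.py command starts with step {start_step!r}")


if __name__ == "__main__":
    # Hypothetical workflow commands, for illustration only.
    workflow = [
        "cmsDriver.py TTbar_cfi -s GEN,SIM -n 10",
        "cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW -n 10",
        "cmsDriver.py step3 -s RAW2DIGI,RECO -n 10",
    ]
    for command in trim_to_start(workflow, "DIGI"):
        print(command)
```

In this sketch a request for a step that does not open any `-s` list raises an error rather than running the full workflow; the actual option may behave differently.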