
Dirac CWL Executor + testing with LHCb Workflows (simulation and analysis productions)#57

Closed
ryuwd wants to merge 29 commits into DIRACGrid:main from ryuwd:roneil-lhcb-wf

Conversation

@ryuwd
Contributor

@ryuwd ryuwd commented Nov 19, 2025

for #52

Adds a new dirac-cwl-run utility for running CWL workflows with input-data resolution, following the concepts introduced in #69: Dirac LFNs, the "replica catalog", and a proof of concept for using input-data queries (in LHCb, the "Bookkeeping") to pre-fill empty input-data workflow parameters.

Also adds converters for LHCb production YAMLs --> CWL workflow files.

Tested with LHCb simulation and analysis productions workflows.
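The input-data resolution idea described above can be sketched roughly as follows: LFNs listed in the workflow inputs are looked up in a local "replica catalog" mapping and replaced by concrete PFNs before execution. The function name, catalog layout, and replica-selection policy here are illustrative assumptions, not the actual dirac-cwl-run API.

```python
# Illustrative sketch only: resolve_lfns and the JSON catalog layout are
# assumptions, not the real dirac-cwl-run interface.
import json
from pathlib import Path


def resolve_lfns(lfns: list[str], catalog_path: Path) -> list[str]:
    """Map each LFN to a PFN via a JSON replica catalog {lfn: [pfn, ...]}."""
    catalog: dict[str, list[str]] = json.loads(catalog_path.read_text())
    resolved = []
    for lfn in lfns:
        replicas = catalog.get(lfn)
        if not replicas:
            raise KeyError(f"No replica found for {lfn}")
        resolved.append(replicas[0])  # naive policy: first replica wins
    return resolved
```

A real implementation would also need a replica-selection policy (e.g. preferring local storage elements) rather than simply taking the first entry.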

@ryuwd ryuwd changed the title LHCb Workflows LHCb Workflows (simulation) Dec 1, 2025
- 0
events: -1
priority: 2
multicore: true
Contributor


FYI, with the latest changes made to dirac-production-request-launch, multicore is going to be false for analysis productions.
It's going to be merged within the week if everything works as expected 🤞

for step_index, step in enumerate(steps):
step_name = _sanitizeStepName(step.get("name", f"step_{step_index}"))
step_names.append(step_name)
cwl_step = _buildCWLStep(production, step, step_index, workflow_inputs, step_names)
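For context, a step-name sanitizer like the `_sanitizeStepName` referenced above might look like the hypothetical sketch below: CWL step ids are used as identifiers, so characters outside a safe set get replaced. This is an illustration of the idea only, not the converter's actual code.

```python
# Hypothetical sketch of a step-name sanitizer; not the real _sanitizeStepName.
import re


def sanitize_step_name(name: str) -> str:
    """Lower-case the name and replace anything outside [a-z0-9_-] with '_'."""
    cleaned = re.sub(r"[^a-z0-9_-]", "_", name.lower())
    # Avoid ids that start with a digit or hyphen
    if cleaned and (cleaned[0].isdigit() or cleaned[0] == "-"):
        cleaned = "s_" + cleaned
    return cleaned
```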
Contributor


I guess you need the changes I am making to lb-mc to integrate the submission-info section here to build subworkflows (that would become transformations).

@ryuwd ryuwd changed the title LHCb Workflows (simulation) LHCb Workflows (simulation and analysis productions) Dec 24, 2025
@ryuwd ryuwd changed the title LHCb Workflows (simulation and analysis productions) Dirac CWL Executor + testing with LHCb Workflows (simulation and analysis productions) Dec 29, 2025
Contributor

@aldbr aldbr left a comment


Thanks! That looks very promising 🙂

Contributor


Oops: .yam -> .yaml

Comment on lines +6 to +20
- class: dirac:inputDataset
event_type: '90000000'
conditions_dict:
inTCKs: ALL
inProPass: Real Data/Reco17/Stripping29r2p2
configName: LHCb
inFileType: CHARM.MDST
configVersion: Collision17
inProductionID: ALL
inDataQualityFlag: OK
launch_parameters:
end_run:
start_run:
run_numbers:
conditions_description: Beam6500GeV-VeloClosed-MagUp
Contributor


A bit of context about the dirac:execution-hooks hint because I feel like the concepts are very similar here, but I might be wrong.

The current dirac:execution-hooks hint is used at 2 different locations:

  • transformation system: use specific parameters to query a FileCatalog/Bookkeeping to get LFNs and attach them to a job to submit
  • job: use the hooks to pre/post process the jobs and the specific parameters to register the outputs in a specific part of the FileCatalog/Bookkeeping

Now you are right, the current dirac:execution-hooks hint is not global to a CWL production, but actually defined in every transformation.
I think the reason is that I initially wrote CWL workflows where the input parameters were used to build the query (based on an example provided by CTAO).

Example with mandelbrot:

Then CTAO wrote an example where they were using parameters not defined as inputs of the workflow: https://github.com/aldbr/dirac-cwl-proto/blob/dc2cee46c1b804a6ec5add6da5dd7123c990630e/src/dirac_cwl_proto/execution_hooks/plugins/core.py#L32-L35

Is that something you actually need @arrabito?

In this PR, you also define query parameters not defined as inputs of the workflow @ryuwd .

If that's the case for everyone, then maybe we should have a discussion about:

  • are the query parameters always different from the input parameters of the workflows? Can we use both as query parameters?
  • are the query parameters the same for the whole workflow? (e.g. production with 3 transformations that would have the same query parameters)

Then we should think whether and how we should separate:

  • execution hooks (pre/post processing of the jobs),
  • query parameters (used by the transformation system to get LFNs, as well as the jobs to store LFNs at the end),
  • other input/output parameters (e.g. output data and sandbox).

In the meantime, just for your context, you could have potentially created an execution hooks plugin such as (but you would have to define it for every transformation, which I agree, is not convenient in this context):

class LHCbBasedPlugin(ExecutionHooksBasePlugin):

    # LFN parameters
    in_tcks: str = Field(...)
    in_pro_pass: Optional[str] = Field(...)
    config_name: Optional[str] = Field(...)
    in_file_type: Optional[str] = Field(...)
    ...
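A dependency-free, runnable variant of the plugin sketched above, using dataclasses instead of pydantic so it stands alone. The field names mirror the query parameters from the YAML hint; the class name and `to_query` helper are assumptions for illustration, not part of the actual plugin interface.

```python
# Illustration only: the class name and to_query helper are assumed, not the
# real ExecutionHooksBasePlugin API.
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LHCbInputQuery:
    in_tcks: str = "ALL"
    in_pro_pass: Optional[str] = None
    config_name: Optional[str] = None
    in_file_type: Optional[str] = None

    def to_query(self) -> dict:
        """Drop unset fields so only meaningful parameters reach the query."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```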


Generated by ProductionRequestToCWL converter
inputs:
- id: production-id
Contributor


I am just curious, why do you need the production-id here?
Same question for the prod-job-id.

I would have been tempted, if possible, to remove anything DIRAC-related from the CWL workflow itself, and move DIRAC-specific attributes to a hint (maybe this is not possible).

Contributor Author


This was addressed in newer versions of the LHCb CWL generator in LbAPCommon

doc: Production Job ID
default: 6789
type: int
- id: input-data
Contributor


Out of curiosity, what do you expect here? LFNs you got from your bk query? Is this going to work if you have a very large number of files?

I had issues with some workflows with my converter. You can see how I dealt with it in the archive I gave you a few weeks/months ago, but basically I did:

input-data:
  class: File
  contents: '["/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023715_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023714_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023753_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023754_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023685_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023697_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023710_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023711_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023693_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023686_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023708_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023682_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023674_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023688_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023673_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023680_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023675_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023706_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023681_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023691_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023699_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023712_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023695_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023713_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023704_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023696_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023676_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023732_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023719_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023747_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023716_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023724_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023728_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023723_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023740_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023749_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023743_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023748_1.charm_d02hh_dvntuple.root"]'
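In this pattern the input-data File carries a JSON-encoded list of LFNs in its `contents`, so a consuming tool simply reads and parses the staged file. The sketch below shows that consumption side; the function name is illustrative, not from the converter.

```python
# Illustration of consuming an input-data file whose contents are a JSON
# list of LFNs; read_input_data is an assumed name, not real project code.
import json
from pathlib import Path


def read_input_data(path: Path) -> list[str]:
    """Parse the JSON list of LFNs from a staged input-data file."""
    lfns = json.loads(path.read_text())
    if not isinstance(lfns, list):
        raise ValueError("input-data file must contain a JSON list of LFNs")
    return lfns
```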

configuration:
cpu: '1000000'
priority: 2
multicore: true
Contributor


  • priority would likely go in the scheduling hint (though I think it's going to disappear in diracx with the work on the new matcher)
  • cpu would likely be defined in the CWL itself as a requirement, no?
  • multicore would also be part of the CWL requirements
  • I think output_se, remove_inputs_flags, output_file_mask, output_mode, input_data_policy would be defined outside of configuration (because configuration contains your query parameters for the FC/Bookkeeping)
  • do you need to use ancestor_depth and events as query parameters? Do you know why they're here?

logger = logging.getLogger("dirac-cwl-run")


class DiracPathMapper(PathMapper):
Contributor


I fail to understand why we need this DiracPathMapper, can you explain please?

Because I would have naively thought that adding the replica catalog to the active CommandLineTool directory would be enough: the application needing the replica catalog would read it and use whatever PFNs it needs.

Contributor Author

@ryuwd ryuwd Jan 21, 2026


My naive understanding is that CWL needs to be told how to handle paths in various ways, and this class is the mechanism for that; StdFsAccess deals with filesystem access. I'm not sure yet if there's a cleaner way to do all this. The nice feature is that we can map LFNs directly to PFNs using the replica catalog before the paths reach the command line. I'm not sure how good an idea this is for LHCb jobs, but it seems like a sensible default that hides this detail from end users writing their workflows.

Contributor Author


The better answer is that the CWL PathMapper can't deal with LFNs, so we need to write our own to pass them through or resolve them to PFNs using the replica map.
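The core of such a path mapper can be sketched in a few lines: any path that has a known replica is rewritten to a PFN, and everything else passes through untouched. This mirrors the intent only; the real class subclasses cwltool's PathMapper, which is deliberately not imported here.

```python
# Minimal sketch of the LFN-to-PFN mapping idea; map_path is an assumed
# name, and the real DiracPathMapper subclasses cwltool's PathMapper.
def map_path(path: str, replica_map: dict[str, str]) -> str:
    """Return the PFN for an LFN if known, otherwise the path unchanged."""
    return replica_map.get(path, path)
```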



# Apply the monkey-patch
CommandLineTool.make_path_mapper = staticmethod(_custom_make_path_mapper)
Contributor


I guess you would need to override the CommandLineTool to avoid the monkey patch.
Overall, I think that would be a good idea to ask for guidance (just to make sure we are going in the right directions here) in the CWL forum. What do you think?
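The override approach suggested here can be sketched with plain classes (the real code would subclass cwltool's CommandLineTool): putting the hook on a subclass keeps the base class untouched, unlike the module-wide monkey-patch. The class bodies below are stand-ins, not cwltool's actual signatures.

```python
# Stand-in classes illustrating subclass-override instead of monkey-patching;
# cwltool's real CommandLineTool/make_path_mapper signatures may differ.
class CommandLineTool:
    @staticmethod
    def make_path_mapper(reffiles, stagedir, runtime_context, separate_dirs):
        return ("default", reffiles)  # pretend default PathMapper factory


class DiracCommandLineTool(CommandLineTool):
    @staticmethod
    def make_path_mapper(reffiles, stagedir, runtime_context, separate_dirs):
        # Plug the LFN-aware mapper in here; the base class stays unpatched
        return ("dirac", reffiles)
```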

Contributor Author


I think that is ideal

Contributor Author


#115 has dealt with this, but I ran into some bizarre issues with the mypyc-distributed cwltool, which might need some further discussion.

@ryuwd
Contributor Author

ryuwd commented Jan 24, 2026

New plan:

  1. Remove replica catalog stuff and import from diracx

  2. Refactoring / cleanup of the executor

  3. Remove converters (now lives in analysis productions, eventually may move simulation part to lbmcsubmit)

  4. Regenerate some example CWL files that can be used in tests.

  5. Write tests of the cwl executor.

@ryuwd
Contributor Author

ryuwd commented Feb 2, 2026

Moving the executor to #94

@aldbr aldbr removed a link to an issue Feb 9, 2026
@ryuwd ryuwd closed this Feb 17, 2026
@ryuwd
Contributor Author

ryuwd commented Feb 17, 2026

Another PR will be opened, focused purely on tests with the Dirac CWL executor and LHCb workflows.



Successfully merging this pull request may close these issues.

How to deal with input data resolution in different execution contexts and with a replica catalog
