Adding support for HTCondor pools without shared filesystems #14

@jhiemstrawisc

Many HTCondor pools don't have a shared filesystem, but this plugin relies on one.

I started looking into how the plugin constructs its submit file, and I suspect a good first step toward relaxing its dependence on a shared filesystem would be to remove a few absolute paths and rely on HTCondor's file transfer mechanism to make these files available in each job's scratch directory on the execution point (EP).

In particular, I noticed several file paths in the construction of the submit file's arguments parameter:

-m snakemake --snakefile /access/point/path/to/Snakefile --target-jobs 'log_parameters:algorithm=omicsintegrator1,params=params-PU62FNV' --allowed-rules 'log_parameters' --cores 1 --attempt 1 --force-use-threads  --wait-for-files '/access/point/path/to/.snakemake/tmp.nej38zse' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers input software-env code mtime params --conda-frontend mamba --shared-fs-usage sources software-deployment persistence storage-local-copies input-output source-cache --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /access/point/path/to/example_config.yaml --latency-wait 5 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /access/point/path/to/miniconda3/envs/spras/bin --default-resources base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= --mode remote
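A minimal Python sketch of the path rewriting described above, assuming only the --snakefile and --configfiles values need to be shipped to the EP: it swaps each absolute path after those flags for its base name and collects the original paths for transfer_input_files. The function name relativize_arguments is hypothetical and not part of the plugin. Absolute paths after other flags, such as --wait-for-files or --scheduler-solver-path, are deliberately left alone, since their role on an EP without a shared filesystem is still an open question.

```python
import os
import shlex

# Assumption: only these flags carry files that should be transferred.
# Other absolute paths in the arguments are left untouched on purpose.
FILE_FLAGS = {"--snakefile", "--configfiles"}


def relativize_arguments(arguments: str) -> tuple[str, list[str]]:
    """Hypothetical helper: replace absolute paths after FILE_FLAGS with base names.

    Returns the rewritten argument string and the absolute paths that would
    need to be listed in transfer_input_files.
    """
    to_transfer: list[str] = []
    out: list[str] = []
    current_flag = None
    for tok in shlex.split(arguments):
        if tok.startswith("-"):
            # Any new flag ends the value list of the previous flag.
            current_flag = tok if tok in FILE_FLAGS else None
            out.append(tok)
        elif current_flag and os.path.isabs(tok):
            # HTCondor places transferred files in scratch under their base name.
            to_transfer.append(tok)
            out.append(os.path.basename(tok))
        else:
            out.append(tok)
    return shlex.join(out), to_transfer
```

Because --configfiles can take several values, the loop keeps rewriting until the next flag appears rather than consuming exactly one token.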

I don't know Snakemake well enough to know what all of these arguments do, but files like the Snakefile and the config file could be provided to each job at the EP by modifying the submit_dict to contain something like:

executable = python
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /access/point/path/to/Snakefile, /access/point/path/to/input, /access/point/path/to/example_config.yaml, ...
arguments = "-m snakemake --snakefile Snakefile ... --configfiles example_config.yaml ... "
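The same submit description, sketched as the kind of Python dict the plugin's submit_dict could hold. build_submit_dict is a hypothetical helper, and the interpreter path /usr/bin/python3 is an assumption: with file transfer enabled, HTCondor transfers the executable from the access point unless transfer_executable is False, and a relative executable such as a bare python is resolved against the submit directory, so it may not point at an interpreter as intended.

```python
def build_submit_dict(arguments: str, input_files: list[str]) -> dict[str, str]:
    """Hypothetical helper: a file-transfer submit description as a plain dict."""
    return {
        # Assumption: every EP has an interpreter at this path.
        "executable": "/usr/bin/python3",
        # Use the EP's interpreter instead of shipping one from the access point.
        "transfer_executable": "False",
        "should_transfer_files": "YES",
        "when_to_transfer_output": "ON_EXIT",
        # HTCondor expects a comma-separated list of paths here.
        "transfer_input_files": ", ".join(input_files),
        # Wrap in double quotes (HTCondor's newer argument syntax),
        # doubling any embedded double quotes as that syntax requires.
        "arguments": '"' + arguments.replace('"', '""') + '"',
    }
```

Keeping every value a string mirrors how the commands would appear in a submit file, so the dict can be checked against the plain-text version line by line.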

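When HTCondor transfers these files to the execution point, it places each one directly in the job's scratch directory under its base name, flattening the access-point paths, so the arguments can refer to the files by name alone.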
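Because flattening keeps only each file's base name, two inputs with the same name from different directories would collide in the scratch directory. A small pre-submit check could catch that; check_flattened_names is a hypothetical function, not part of the plugin, and it assumes every entry in transfer_input_files is a plain file rather than a directory.

```python
import os


def check_flattened_names(paths: list[str]) -> dict[str, str]:
    """Map each scratch-directory name to the access-point path it comes from.

    Raises ValueError if two paths share a base name, since flattening would
    put both on the same file in the job's scratch directory.
    """
    seen: dict[str, str] = {}
    for path in paths:
        name = os.path.basename(path)
        if name in seen:
            raise ValueError(f"{path!r} and {seen[name]!r} both flatten to {name!r}")
        seen[name] = path
    return seen
```

Returning the mapping, rather than just a pass/fail result, makes it easy to rewrite the arguments using the names the job will actually see.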

What other blockers are there to making something like this work? Perhaps one route to consider is making an HTCondor storage provider plugin that sets some of this up?