Overusing threads argument when running parallel commands #55

@holmeso


Hi,
It looks like the threads argument passed to /opt/bin/run_clairs is sometimes used twice in the same parallel command: once to set the number of concurrent jobs (`-j`), and again as the thread count passed to the command that parallel runs.

e.g. lines 1066-1077 of https://github.com/HKU-BAL/ClairS/blob/main/run_clairs:
```python
...
pt_command = '( ' + time + args.parallel
pt_command += ' --joblog ' + args.output_dir + '/logs/clair3_log/parallel_4_phase_tumor.log'
pt_command += ' -j ' + str(args.threads)
pt_command += ' ' + args.clair3_option.longphase + ' phase '
pt_command += ' -s ' + clair3_output_path + '/vcf/{1}.vcf'
pt_command += ' -b ' + (args.tumor_bam_fn if not args.use_normal_bam_for_intermediate_phasing else args.normal_bam_fn)
pt_command += ' -r ' + args.ref_fn
pt_command += ' -t ' + str(args.threads)
pt_command += ' --indels ' if args.use_heterozygous_indel_for_intermediate_phasing else ""
pt_command += ' -o ' + clair3_output_path + '/phased_output/tumor_phased_{1}'
pt_command += ' --ont' if args.platform == 'ont' else ' --pb'
pt_command += ' :::: ' + args.output_dir + '/tmp/CONTIGS'
...
```

So if you run /opt/bin/run_clairs with --threads 24, parallel can start up to 24 concurrent longphase jobs, each of which is given `-t 24`, i.e. potentially 24 × 24 = 576 threads for this one command.
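The worst case is the product of the two settings, capped by the number of contigs, since parallel starts one job per line of the CONTIGS file and so never runs more jobs than there are lines. A minimal sketch of that arithmetic (the function name `peak_threads` and the contig counts are illustrative assumptions, not part of ClairS):

```python
def peak_threads(threads: int, n_contigs: int) -> int:
    """Worst-case concurrent threads for `parallel -j threads ... -t threads`.

    parallel runs at most `threads` jobs at once, and never more jobs than
    there are input lines (contigs); each longphase job is passed
    `-t threads`.
    """
    concurrent_jobs = min(threads, n_contigs)
    return concurrent_jobs * threads


print(peak_threads(24, 25))  # 24 jobs x 24 threads = 576
print(peak_threads(24, 3))   # only 3 contigs: 3 jobs x 24 threads = 72
```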
I'm happy to submit a PR with suggested changes if that would be helpful.
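One possible direction for such a change, sketched under assumptions: split the --threads budget so that jobs × threads-per-job never exceeds it. The helper `split_thread_budget` below is hypothetical and not part of ClairS. In a patch, its first return value would replace `str(args.threads)` in the `-j` option and its second would replace it in longphase's `-t` option; `n_tasks` could be the number of lines in `tmp/CONTIGS` (also an assumption).

```python
from typing import Tuple


def split_thread_budget(total_threads: int, n_tasks: int) -> Tuple[int, int]:
    """Hypothetical helper: divide a thread budget between parallel jobs.

    Returns (jobs, threads_per_job) such that jobs * threads_per_job
    <= total_threads, with at least one job and one thread per job.
    """
    if total_threads < 1 or n_tasks < 1:
        raise ValueError("total_threads and n_tasks must be positive")
    # Never launch more concurrent jobs than there are tasks (contigs).
    jobs = min(total_threads, n_tasks)
    # Share the budget evenly; integer division keeps the product
    # within total_threads.
    threads_per_job = total_threads // jobs
    return jobs, threads_per_job


print(split_thread_budget(24, 25))  # many contigs: (24, 1)
print(split_thread_budget(24, 3))   # few contigs: (3, 8)
```

This favours many single-threaded jobs when there are at least as many contigs as threads; an alternative design would fix threads-per-job and derive the job count from it instead.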
Thanks!