Description of the bug
I'm trying to run the pipeline, but BUSCO always fails. It runs successfully on some bins, and I can check the statistics inside the job's folder, but at some point it throws an error and the job fails. I've been trying to troubleshoot this to no avail. Could you point me in the right direction? My params and custom resources are as follows.
Custom resources (this is just to avoid errors with CATPACK and to give metaspades enough resources to finish on the first attempt):
process {
    resourceLimits = [
        cpus: 48,
        memory: 799.GB
    ]
    withName: METASPADES {
        cpus = { params.spades_fix_cpus != -1 ? params.spades_fix_cpus : (20 * task.attempt) }
        memory = { 128.GB * (2 ** (task.attempt - 1)) }
    }
    withName: 'NFCORE_MAG:MAG:CATPACK:CATPACK_BINS' {
        cpus = { 12 * task.attempt }
        time = { 24.h * task.attempt }
    }
    withName: 'NFCORE_MAG:MAG:CATPACK:CATPACK_SUMMARISE_BINS' {
        ext.when = false
    }
}
params {
    skip_cat_summarise = true
}
Params json:
{
  "input": "\/home\/masalgar\/DeiC-KU-L59\/users\/masalgar\/CF_metagenome\/mag_samples.csv",
  "outdir": "\/home\/masalgar\/DeiC-KU-L59\/users\/masalgar\/CF_metagenome\/mag\/",
  "multiqc_title": "CF_Metagenome mag assembly",
  "skip_clipping": true,
  "skip_shortread_qc": true,
  "cat_db": "\/home\/masalgar\/DeiC-KU-L59\/databases\/CAT\/nr\/",
  "run_checkm2": true,
  "checkm2_db": "\/home\/masalgar\/DeiC-KU-L59\/databases\/CheckM2\/CheckM2_database\/uniref100.KO.1.dmnd",
  "gtdb_db": "\/home\/masalgar\/DeiC-KU-L59\/databases\/GTDB\/gtdbtk_data.tar.gz",
  "skip_prokka": true,
  "exclude_unbins_from_postbinning": true,
  "run_busco": true,
  "busco_clean": true,
  "busco_db": "\/home\/masalgar\/DeiC-KU-L59\/databases\/BUSCO\/",
  "refine_bins_dastool": true,
  "gtdbtk_pplacer_useram": true,
  "postbinning_input": "refined_bins_only",
  "run_gunc": true,
  "gunc_db": "\/home\/masalgar\/DeiC-KU-L59\/databases\/GUNC\/gunc_db_progenomes2.1.dmnd",
  "generate_bigmag_file": true
}
Command used and terminal output
nextflow run nf-core/mag -r 5.4.0 -profile apptainer -resume \
-name CF_m_mag_trimmed-and-dedup_final5 \
-c /home/masalgar/DeiC-KU-L59/users/masalgar/CF_metagenome/mag_custom_resources.config \
-work-dir /home/masalgar/DeiC-KU-L59/users/masalgar/tmp \
-params-file /home/masalgar/DeiC-KU-L59/users/masalgar/CF_metagenome/mag_CF_metagenome_parms.json
Relevant files
I've attached nextflow.json and the log files of one of the failed BUSCO jobs.
.nextflow.log
.command.log
.command.err.txt
.command.out.txt
The error from the .err file is as follows:
Exception in thread "main" java.lang.NumberFormatException: Cannot parse null string
at java.base/java.lang.Integer.parseInt(Integer.java:550)
at java.base/java.lang.Integer.<init>(Integer.java:1065)
at phylolab.taxonamic.PPlacerJSONMerger.relabelJson(PPlacerJSONMerger.java:172)
at phylolab.taxonamic.PPlacerJSONMerger.main(PPlacerJSONMerger.java:288)
Traceback (most recent call last):
File "/opt/conda/bin/run_sepp.py", line 26, in <module>
ExhaustiveAlgorithm().run()
File "/opt/conda/lib/python3.12/site-packages/sepp/algorithm.py", line 205, in run
self.merge_results()
File "/opt/conda/lib/python3.12/site-packages/sepp/exhaustive.py", line 292, in merge_results
mergeJsonJob.run()
File "/opt/conda/lib/python3.12/site-packages/sepp/jobs.py", line 150, in run
raise JobError("\n".join([
sepp.scheduler.JobError: The following execution failed:
java -jar /opt/conda/bin/seppJsonMerger.jar - - /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/placement_files/output_placement.json
json locations: [/faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_0/pplacer.extended.0.38dx0iyf.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_0/pplacer.extended.1.ec1om_7a.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_0/pplacer.extended.2.x_nnddst.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_1/pplacer.extended.0.i0uwb1ad.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_1/pplacer.extended.1.jmtuj2t6.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_1/pplacer.extended.2.xx22nr8m.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_2/pplacer.extended.0.69nlt327.jplace, 
/faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_2/pplacer.extended.1.zxe6bwxa.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_4/pplacer.extended.0.bsqyufq8.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_7/pplacer.extended.0.nhkmd5t2.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_7/pplacer.extended.1._ymc4c_0.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_7/pplacer.extended.2.q7g73_u9.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_8/pplacer.extended.0.og_13lua.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_8/pplacer.extended.1.87wbxge5.jplace, 
/faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_8/pplacer.extended.2.onn18c8q.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_12/pplacer.extended.0.1p2kflh0.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_14/pplacer.extended.0.f2l4_sot.jplace, /faststorage/project/DeiC-KU-L59/users/masalgar/tmp/1a/8ab5b582aba99b1387baa123069741/CF_metagenome_dedup-auto-busco/MEGAHIT-MaxBin2Refined-CF_metagenome_dedup.003.fa/auto_lineage/run_bacteria_odb12/sepp_tmp_files/output._tenenrn/root/P_14/pplacer.extended.1.ubgw_ugq.jplace]
Exception in thread "main" java.lang.NumberFormatException: Cannot parse null string
at java.base/java.lang.Integer.parseInt(Integer.java:550)
at java.base/java.lang.Integer.<init>(Integer.java:1065)
at phylolab.taxonamic.PPlacerJSONMerger.relabelJson(PPlacerJSONMerger.java:172)
at phylolab.taxonamic.PPlacerJSONMerger.main(PPlacerJSONMerger.java:288)
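To help narrow down which of the jplace files listed above trips the merger, a quick check along these lines might be useful. This is only a sketch: it assumes the files are plain JSON in the usual jplace layout (top-level `tree` and `placements` keys), and the helper names `summarize_jplace` and `scan` are made up for illustration. It does not reproduce what seppJsonMerger.jar does internally.

```python
import json
import sys
from pathlib import Path


def summarize_jplace(path):
    """Return a small summary of one .jplace file, or the parse error."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        # Unreadable or truncated file: report it instead of stopping the scan.
        return {"path": str(path), "ok": False, "error": str(exc)}
    if not isinstance(data, dict):
        return {"path": str(path), "ok": False,
                "error": "top-level JSON value is not an object"}
    return {
        "path": str(path),
        "ok": True,
        # Assumed jplace keys: "tree" (Newick string) and "placements" (list).
        "has_tree": bool(data.get("tree")),
        "n_placements": len(data.get("placements") or []),
    }


def scan(directory):
    """Summarize every .jplace file found below `directory`, sorted by path."""
    return [summarize_jplace(p) for p in sorted(Path(directory).rglob("*.jplace"))]


if __name__ == "__main__":
    # Usage: python check_jplace.py <path to sepp_tmp_files>
    for row in scan(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(row)
```

Pointing it at the `sepp_tmp_files` directory of the failed task prints one summary per file. A file that fails to parse, or that has no tree, would be a candidate to look at first.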
System information
No response