Description
Spectre crashes with a ValueError (see below) when encountering a contig header with additional attributes that are not ID or length. This happens because of a failing integer conversion.
How to Reproduce
Pass this VCF from the nf-core test data to Spectre: https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/homo_sapiens/illumina/vcf/NA12878_GIAB.chr22.vcf.gz
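To check whether a given VCF is affected before running Spectre, something like the following could list the contig header lines that carry attributes beyond ID and length. This is only a sketch: the helper name `contigs_with_extra_attributes` is made up for illustration, and the comma split is naive (it does not handle quoted values that contain commas).

```python
import gzip


def contigs_with_extra_attributes(lines):
    """Yield contig header lines that carry attributes other than ID and length.

    Hypothetical helper for illustration; not part of Spectre.
    """
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over (#CHROM line or records)
        if not line.startswith("##contig=<"):
            continue
        # Strip the '##contig=<' prefix and the closing '>'.
        body = line.strip()[len("##contig=<"):].rstrip(">")
        # Naive split: assumes no commas inside quoted attribute values.
        keys = {field.split("=", 1)[0] for field in body.split(",")}
        if keys - {"ID", "length"}:
            yield line.strip()


# Usage (the local file name is an assumption):
# with gzip.open("NA12878_GIAB.chr22.vcf.gz", "rt") as handle:
#     for hit in contigs_with_extra_attributes(handle):
#         print(hit)
```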
The cause is probably the regex that parses the header: group 2 captures everything after length=, including any trailing attributes. See vcf_parser.py.
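A minimal sketch of the suspected failure and one possible fix. The `greedy` pattern is a guess that reproduces the symptom, not a copy of the regex in vcf_parser.py. The contig ID `chr1` in the sample header is assumed; the length and assembly values are taken from the error message. `parse_contig` is a hypothetical helper, not an existing Spectre function.

```python
import re

# Sample header; length and assembly come from the ValueError, the ID is assumed.
header = (
    "##contig=<ID=chr1,length=248956422,"
    "assembly=human_GRCh38_no_alt_analysis_set.fasta>"
)

# Hypothetical pattern (an assumption, not Spectre's actual regex):
# group 2 swallows everything between 'length=' and the closing '>'.
greedy = re.compile(r"##contig=<ID=(.+),length=(.+)>")

# Stricter sketch: the ID stops at the first comma, the length is digits only.
# It assumes 'length' directly follows 'ID', as in the header above.
strict = re.compile(r"##contig=<ID=([^,>]+),length=(\d+)")


def parse_contig(line):
    """Return (contig_id, contig_length), or None if the line does not match."""
    match = strict.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


try:
    int(greedy.search(header).group(2))
except ValueError as err:
    print(f"greedy pattern fails: {err}")

print(parse_contig(header))
```

Restricting group 2 to digits means trailing attributes such as assembly are ignored instead of being passed to int().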
Spectre Log:
spectre::INFO> Spectre version: 0.3.2
spectre::INFO> Spectre enabled
spectre::INFO> Extraction of metadata is activated
spectre::INFO> Generating metadata based on reference: genome.fasta.gz
spectre::INFO> Writing metadata to file: out/metadata.mdr
spectre::INFO> Reading >chr22
spectre::INFO> Calculating bp statistics
spectre::INFO> Calculating N positions
spectre::INFO> Writing report
spectre::INFO> out
spectre::INFO> No blacklist was provided
spectre::INFO> Spectre calculating for: /home/xschmy/projects/nf-core-modules/.nf-test/tests/20bb1cc1452c60cbf2cd84a22373f0/work/c6/f1f84104a4905b7e68fd0cd9539ebe and bin size: 1000
/opt/conda/lib/python3.9/site-packages/numpy/_core/fromnumeric.py:3596: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/lib/python3.9/site-packages/numpy/_core/_methods.py:138: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
spectre::INFO> Data normalization and outlier removal (right tail)
spectre::INFO> Parsing VCF to AF freqs file
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 120, in outside_spectre_worker
worker.cnv_call()
File "/opt/conda/lib/python3.9/site-packages/spectre/spectreCNV.py", line 100, in cnv_call
self.cnv_analysis.vcf_based_af_bins_annotation()
File "/opt/conda/lib/python3.9/site-packages/spectre/analysis/analysis.py", line 118, in vcf_based_af_bins_annotation
vcf_df = pl.from_pandas(vcf.vcf_to_dataframe(self.snv_file))
File "/opt/conda/lib/python3.9/site-packages/spectre/util/vcf_parser.py", line 65, in vcf_to_dataframe
[chr_id, chr_len] = [chr_id_len.group(1), int(chr_id_len.group(2))]
ValueError: invalid literal for int() with base 10: '248956422,assembly=human_GRCh38_no_alt_analysis_set.fasta'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/bin/spectre", line 10, in <module>
sys.exit(run_main())
File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 510, in run_main
spectre_run.spectre_exe()
File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 274, in spectre_exe
results = pool.map(outside_spectre_worker, tuple(spectre_instructions))
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: invalid literal for int() with base 10: '248956422,assembly=human_GRCh38_no_alt_analysis_set.fasta'