Spectre fails with ValueError when contig header contains attributes other than ID and length #6

Open
@Schmytzi

Description

Spectre crashes with a ValueError (see the log below) when a VCF contig header line contains attributes other than ID and length, such as assembly. The crash comes from a failed integer conversion of the contig length.

How to Reproduce

Pass this VCF from the nf-core test data to Spectre: https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/homo_sapiens/illumina/vcf/NA12878_GIAB.chr22.vcf.gz

The cause is probably the regex that parses the contig header in vcf_parser.py: group 2 captures everything after length=, including any trailing attributes, and that whole string is then passed to int().
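
One possible fix, as a minimal sketch: instead of capturing everything after length=, split the header into key=value attributes and convert only the length value. The function name parse_contig_header and both patterns are hypothetical and are not taken from vcf_parser.py. The contig ID in the example line is also assumed; the length and assembly values come from the traceback.

```python
import re

# Hypothetical patterns, not the ones used in Spectre's vcf_parser.py.
_CONTIG = re.compile(r"^##contig=<(.+)>\s*$")
# One key=value attribute; the value is either quoted or runs to the next comma.
_ATTR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_contig_header(line):
    """Return (contig_id, length) for a ##contig header line, else None."""
    match = _CONTIG.match(line)
    if match is None:
        return None
    attrs = dict(_ATTR.findall(match.group(1)))
    if "ID" not in attrs or "length" not in attrs:
        return None
    # Extra attributes such as assembly= are parsed but ignored here.
    return attrs["ID"], int(attrs["length"])


# Contig ID "chr1" is assumed for illustration.
header = (
    "##contig=<ID=chr1,length=248956422,"
    "assembly=human_GRCh38_no_alt_analysis_set.fasta>"
)
print(parse_contig_header(header))
```

A narrower alternative would be to keep a single regex but restrict group 2 to digits, for example length=(\d+). That is a smaller change, but it still assumes length directly follows ID.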

Spectre Log:

    spectre::INFO> Spectre version: 0.3.2
    spectre::INFO> Spectre enabled
    spectre::INFO> Extraction of metadata is activated
    spectre::INFO> Generating metadata based on reference: genome.fasta.gz
    spectre::INFO> Writing metadata to file: out/metadata.mdr
    spectre::INFO> Reading >chr22
    spectre::INFO> Calculating bp statistics
    spectre::INFO> Calculating N positions
    spectre::INFO> Writing report
    spectre::INFO> out
    spectre::INFO> No blacklist was provided
    spectre::INFO> Spectre calculating for: /home/xschmy/projects/nf-core-modules/.nf-test/tests/20bb1cc1452c60cbf2cd84a22373f0/work/c6/f1f84104a4905b7e68fd0cd9539ebe and bin size: 1000
    /opt/conda/lib/python3.9/site-packages/numpy/_core/fromnumeric.py:3596: RuntimeWarning: Mean of empty slice.
      return _methods._mean(a, axis=axis, dtype=dtype,
    /opt/conda/lib/python3.9/site-packages/numpy/_core/_methods.py:138: RuntimeWarning: invalid value encountered in scalar divide
      ret = ret.dtype.type(ret / rcount)
    spectre::INFO> Data normalization and outlier removal (right tail)
    spectre::INFO> Parsing VCF to AF freqs file
    multiprocessing.pool.RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
        return list(map(*args))
      File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 120, in outside_spectre_worker
        worker.cnv_call()
      File "/opt/conda/lib/python3.9/site-packages/spectre/spectreCNV.py", line 100, in cnv_call
        self.cnv_analysis.vcf_based_af_bins_annotation()
      File "/opt/conda/lib/python3.9/site-packages/spectre/analysis/analysis.py", line 118, in vcf_based_af_bins_annotation
        vcf_df = pl.from_pandas(vcf.vcf_to_dataframe(self.snv_file))
      File "/opt/conda/lib/python3.9/site-packages/spectre/util/vcf_parser.py", line 65, in vcf_to_dataframe
        [chr_id, chr_len] = [chr_id_len.group(1), int(chr_id_len.group(2))]
    ValueError: invalid literal for int() with base 10: '248956422,assembly=human_GRCh38_no_alt_analysis_set.fasta'
    """
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/opt/conda/bin/spectre", line 10, in <module>
        sys.exit(run_main())
      File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 510, in run_main
        spectre_run.spectre_exe()
      File "/opt/conda/lib/python3.9/site-packages/spectre/main.py", line 274, in spectre_exe
        results = pool.map(outside_spectre_worker, tuple(spectre_instructions))
      File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 771, in get
        raise self._value
    ValueError: invalid literal for int() with base 10: '248956422,assembly=human_GRCh38_no_alt_analysis_set.fasta'
