SnpEff Chromosome Mapping Error — Notes

What happened

Running snpEff Mycobacterium_tuberculosis_h37rv filtered_variants.recode.vcf produced ERROR_CHROMOSOME_NOT_FOUND for all variant records.

Root cause

The VCF file (called from reads aligned to GCF_000195955.2_ASM19595v2_genomic.fna) uses the chromosome identifier NC_000962.3. The pre-built SnpEff database Mycobacterium_tuberculosis_h37rv was built from a reference that names the chromosome differently, so SnpEff cannot find NC_000962.3 in the database and flags every record.
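
One way to confirm this kind of mismatch is to compare the chromosome names in the VCF with the sequence IDs in a reference FASTA. The following is a minimal Python sketch and not part of the original workflow; the function names are hypothetical, and the chromosome names of the pre-built database itself would still need to be obtained separately.

```python
def vcf_contigs(vcf_path):
    """Return the set of names found in the CHROM column of a VCF file."""
    names = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Skip header lines ("##..." and "#CHROM...") and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0].strip())
    return names


def fasta_ids(fasta_path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def missing_contigs(vcf_path, fasta_path):
    """Names used in the VCF that do not appear as sequence IDs in the FASTA."""
    return vcf_contigs(vcf_path) - fasta_ids(fasta_path)
```

For example, `missing_contigs("filtered_variants.recode.vcf", "sequences.fa")` returns an empty set when every chromosome name in the VCF is also a sequence ID in the FASTA.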

Fix attempted

A custom SnpEff database was built using the matching .fna and .gff files:

  1. Added to snpEff.config: MTB.genome : Mycobacterium_tuberculosis_H37Rv
  2. Created directory: ~/snpeff_custom/data/MTB/ with sequences.fa and genes.gff
  3. Ran: snpEff build -gff3 -v MTB -dataDir ~/snpeff_custom/data
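
Before running the build, it can help to check that the genome directory contains the files SnpEff will look for. The sketch below assumes the layout from step 2 (sequences.fa and genes.gff under the MTB directory) and also reports whether cds.fa and protein.fa are present; the function name `check_genome_dir` is hypothetical.

```python
from pathlib import Path

# Inputs placed in the genome directory in step 2.
REQUIRED = ("sequences.fa", "genes.gff")
# Files SnpEff uses for its CDS/protein checks during the build.
CHECK_FILES = ("cds.fa", "protein.fa")


def check_genome_dir(data_dir, genome):
    """Map each expected file name to True/False depending on whether it exists."""
    genome_dir = Path(data_dir).expanduser() / genome
    return {name: (genome_dir / name).is_file() for name in REQUIRED + CHECK_FILES}


if __name__ == "__main__":
    # Paths taken from the steps above; adjust to the actual setup.
    for name, present in check_genome_dir("~/snpeff_custom/data", "MTB").items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```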

Why it still failed

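During the build, SnpEff checks the new database against CDS and protein sequence files (cds.fa and protein.fa) in the genome's data directory. These are separate FASTA files, not part of the GFF3 annotation, and for bacterial genomes downloaded from NCBI they are often missing or not formatted as SnpEff expects. The database build failed at the CDS/protein check step.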
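
A possible next step, not attempted in these notes, is to rerun the build with the checks skipped. The SnpEff documentation describes `-noCheckCds` and `-noCheckProtein` options for the build command; whether the installed version supports them is an assumption and should be confirmed against the build command's help output. The sketch below only assembles the command line; the helper name is hypothetical, and a database built this way has not been validated against CDS or protein sequences.

```python
import os
import shlex


def snpeff_build_command(genome, data_dir, skip_checks=False):
    """Assemble (but do not run) a `snpEff build` command as an argument list."""
    cmd = ["snpEff", "build", "-gff3", "-v",
           "-dataDir", os.path.expanduser(data_dir)]
    if skip_checks:
        # Assumed to be supported by the installed SnpEff version.
        cmd += ["-noCheckCds", "-noCheckProtein"]
    cmd.append(genome)
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(snpeff_build_command("MTB", "~/snpeff_custom/data",
                                          skip_checks=True)))
```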

Impact

Functional annotation was not completed. The filtered VCF (filtered_variants.recode.vcf), containing 3,525 high-quality variants (Phred Q ≥ 30), is available in results/task5/ and can be annotated once a SnpEff database whose chromosome identifier matches NC_000962.3 has been built successfully.