SnpEff Chromosome Mapping Error — Notes

What happened

Running snpEff Mycobacterium_tuberculosis_h37rv filtered_variants.recode.vcf produced ERROR_CHROMOSOME_NOT_FOUND for all variant records.

Root cause

The VCF file (produced by aligning to GCF_000195955.2_ASM19595v2_genomic.fna) uses the chromosome identifier NC_000962.3. The pre-built SnpEff database Mycobacterium_tuberculosis_h37rv was built from a different reference that names the chromosome differently, so SnpEff cannot match any VCF record to a sequence in the database.
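
One way to confirm this kind of mismatch is to compare the CHROM values in the VCF with the sequence IDs of the reference behind the database. The following is a minimal Python sketch and was not part of the original workflow. The function names are hypothetical, the sample records are made up, and the `Chromosome` header is a placeholder, not necessarily the ID the pre-built database uses.

```python
def vcf_chromosomes(lines):
    """Collect the distinct CHROM values from VCF data lines."""
    chroms = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chroms.add(line.split("\t", 1)[0])
    return chroms


def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">") and len(line.strip()) > 1:
            ids.add(line[1:].split()[0])
    return ids


def missing_chromosomes(vcf_lines, fasta_lines):
    """Return VCF chromosome names that have no matching FASTA sequence ID."""
    return sorted(vcf_chromosomes(vcf_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "NC_000962.3\t100\t.\tA\tG\t50\tPASS\t.\n",
    ]
    fasta = [">Chromosome\n", "ACGT\n"]  # placeholder header (assumption)
    print(missing_chromosomes(vcf, fasta))
```

With real files, the same functions can be given open file handles, since both iterate line by line. An empty result means every chromosome named in the VCF is present in the reference.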

Fix attempted

A custom SnpEff database was built using the matching .fna and .gff files:

- Added to snpEff.config: MTB.genome : Mycobacterium_tuberculosis_H37Rv
- Created directory: ~/snpeff_custom/data/MTB/ with sequences.fa and genes.gff
- Ran: snpEff build -gff3 -v MTB -dataDir ~/snpeff_custom/data
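
The first two steps above (registering the genome and laying out its directory) can be sketched in Python as follows. This is an illustration under stated assumptions, not the script that was actually used: `prepare_snpeff_genome` and its parameters are hypothetical names, and the sketch only copies files and appends the config entry; it does not run `snpEff build`.

```python
import shutil
from pathlib import Path


def prepare_snpeff_genome(data_dir, genome_id, description,
                          fasta_src, gff_src, config_path):
    """Lay out <data_dir>/<genome_id>/ and register the genome in the config."""
    genome_dir = Path(data_dir) / genome_id
    genome_dir.mkdir(parents=True, exist_ok=True)

    # File names as used in the notes: sequences.fa and genes.gff.
    shutil.copyfile(fasta_src, genome_dir / "sequences.fa")
    shutil.copyfile(gff_src, genome_dir / "genes.gff")

    # Append the "<id>.genome : <description>" entry only if it is not there yet.
    config_path = Path(config_path)
    entry = f"{genome_id}.genome : {description}"
    existing = config_path.read_text() if config_path.exists() else ""
    if entry not in existing.splitlines():
        with config_path.open("a") as handle:
            if existing and not existing.endswith("\n"):
                handle.write("\n")  # keep the new entry on its own line
            handle.write(entry + "\n")
    return genome_dir
```

A call matching the notes would pass `"MTB"` as `genome_id`, `"Mycobacterium_tuberculosis_H37Rv"` as `description`, and the expanded `~/snpeff_custom/data` path as `data_dir`. Calling it twice leaves a single config entry.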

Why it still failed

During the database build, SnpEff validates the annotation against two additional sequence files, cds.fa and protein.fa. These files are separate from the GFF3 itself and are often missing or incorrectly formatted for bacterial annotations downloaded from NCBI. The database build failed at the CDS/protein check step.
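
A small pre-flight check can flag this before starting a build. The sketch below is a hedged Python illustration, not part of the original attempt. It assumes the validation files sit in the genome directory next to sequences.fa, and it tests only that each file exists and begins with a FASTA header, which is far less than SnpEff itself checks. The function name is hypothetical.

```python
from pathlib import Path

# Validation files named in the notes.
VALIDATION_FILES = ("cds.fa", "protein.fa")


def check_validation_files(genome_dir):
    """Return {file name: problem} for validation files that look unusable."""
    problems = {}
    for name in VALIDATION_FILES:
        path = Path(genome_dir) / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        with path.open() as handle:
            # First non-blank line, or "" if the file is empty.
            first = next((line for line in handle if line.strip()), "")
        if not first.startswith(">"):
            problems[name] = "not FASTA"
    return problems
```

An empty dictionary means both files are present and start with a `>` header. It does not guarantee that the sequence IDs inside them match the transcripts in the GFF3.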

Impact

Functional annotation was not completed. The filtered VCF (filtered_variants.recode.vcf) with 3,525 high-quality variants (Phred Q ≥ 30) is available in results/task5/ and can be annotated with a correctly configured SnpEff database.
