
schatzlab/Watershed-SNV-WDL

Watershed pipeline for SNVs (WDL implementation)

Description of inputs:

  • File cadd_anno_header: File containing VCF header lines describing the annotations.
    • See the VCF 4.2 specification for full details.
    • For example, one line could be: ##INFO=<ID=SIFTval,Number=1,Type=String,Description="SIFT score">
    • It is best to set Type=String, which is robust to missing values that may be coded in unpredictable formats.
  • File cadd_cache: CADD annotations in tabular format.
  • File cadd_cache_idx: Tabix index (.tbi) file for cadd_cache.
  • File cadd_cols2keep: File indicating which columns of cadd_cache to keep. See the bcftools annotate -C documentation for full details, but briefly:
    • Columns must be listed in order they appear in cadd_cache.
    • Columns representing chromosome, position, reference and alternate alleles must be labelled CHROM,POS,REF,ALT.
    • Columns to drop are listed as -. Columns to keep are given a name.
  • File chr_rename_file: A file with two columns of chromosome codes: one with the chromosome names used in your VCFs, and the other with the chromosomes named as 1,2,...,22,X.
    • This is used to make VCFs that follow the chr1,chr2,... naming scheme compatible with the cadd_cache.
  • File chr_unrename_file: Similar to chr_rename_file, but maps the chromosome codes back to their original names.
  • File gerp_bw: BigWig (.bw) file of GERP scores downloadable here (used by VEP's loftee plugin).
  • File human_ancestor_seq: Human ancestor sequence file downloadable here (used by VEP's loftee plugin).
  • File phylocsf_db: SQL database of PhyloCSF metrics downloadable here (used by VEP's loftee plugin).
  • File phylop100_bw: BigWig (.bw) file of phyloP100way scores, downloadable from UCSC here.
    • These scores represent the degree to which genomic positions are conserved in a collection of 100 non-human vertebrate species. For more information, see this page of the UCSC Genome Browser site.
  • Array[File] vcfs: VCF (or BCF) file(s) to be annotated.
    • The files must contain INFO/AC and INFO/AN fields at minimum.
  • File vep_cache: Cache file (v115) for Ensembl's Variant Effect Predictor (VEP), downloadable here.
  • (Optional) File filter_regions: File of regions to filter the vcfs by, one region per line.
  • (Optional) File filter_samples: File of sample ids to filter the vcfs by, one id per line.
  • (Optional) Int n_cpu: Number of cores to allocate. More cores will make the workflow finish more quickly, but will also cost more.
    • For example, a run that took 3 h 45 min and cost $1.75 on 8 cores took 1 h 30 min and cost $2.73 on 32 cores.
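
The small helper files described above (cadd_anno_header, cadd_cols2keep, chr_rename_file and chr_unrename_file) can be generated rather than written by hand. The following is a minimal Python sketch, not part of this repository. The cache column layout, the PHRED column and its description, and all function names are hypothetical. It also assumes that the rename file lists the old name first and the new name second, separated by a tab, which is the convention of bcftools annotate --rename-chrs; check this against the workflow before use.

```python
# Hypothetical column layout of cadd_cache; replace with the real header order.
CACHE_COLUMNS = ["Chrom", "Pos", "Ref", "Alt", "Type", "SIFTval", "PHRED"]

# Cache column -> label to use. Columns missing from this mapping are dropped.
KEEP = {
    "Chrom": "CHROM", "Pos": "POS", "Ref": "REF", "Alt": "ALT",
    "SIFTval": "SIFTval", "PHRED": "PHRED",
}

# Hypothetical descriptions for the kept annotation columns.
DESCRIPTIONS = {"SIFTval": "SIFT score", "PHRED": "CADD PHRED-scaled score"}

KEY_COLUMNS = {"CHROM", "POS", "REF", "ALT"}


def cols2keep_lines(cache_columns, keep):
    """One entry per cache column, in cache order; '-' marks a dropped column."""
    return [keep.get(col, "-") for col in cache_columns]


def anno_header_lines(keep, descriptions):
    """One ##INFO line per kept annotation, typed as String (see above)."""
    return [
        f'##INFO=<ID={name},Number=1,Type=String,Description="{descriptions[name]}">'
        for name in keep.values()
        if name not in KEY_COLUMNS
    ]


def chr_rename_lines(prefix="chr"):
    """Map chr1..chr22,chrX to 1..22,X (assumed order: old name, tab, new name)."""
    names = [str(i) for i in range(1, 23)] + ["X"]
    return [f"{prefix}{name}\t{name}" for name in names]


def chr_unrename_lines(rename_lines):
    """Swap the two columns so the original chromosome names are restored."""
    return ["\t".join(reversed(line.split("\t"))) for line in rename_lines]


if __name__ == "__main__":
    rename = chr_rename_lines()
    for title, lines in [
        ("cadd_cols2keep", cols2keep_lines(CACHE_COLUMNS, KEEP)),
        ("cadd_anno_header", anno_header_lines(KEEP, DESCRIPTIONS)),
        ("chr_rename_file", rename),
        ("chr_unrename_file", chr_unrename_lines(rename)),
    ]:
        print(f"# {title}")
        print("\n".join(lines))
```

Each list holds the lines of one file, so writing it out one entry per line gives a file that could be passed as the corresponding input.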
