slivar_vcf_extraction





AIM

Use slivar on a VCF file to:

  • annotate variants (annotations are added to the INFO field if the -g option of slivar is used).
  • add familial relationship information (added to the INFO field if the --trio, --family-expr, --group-expr or --sample-expr option of slivar is used).
  • filter variants (i.e., remove lines according to quality, family criteria, etc.).


Return both an indexed .vcf.gz and a .tsv.gz file.



WARNINGS


  • Use nextflow DSL1. To install DSL1 and use it when DSL2 is already installed, see these java and nextflow instructions. They install the nextflow-dsl1 command used below.

  • The code uses the following slivar commands (see this slivar webpage for details):

    slivar expr --js ${fun} -g ${annot1} -g ${annot2} --vcf ${vcf} --ped ${ped} ${sample_expr} ${pedigree_expr} ${filter} -o "res.vcf"
    slivar tsv --ped ${ped} -s ${tsv_sample} ${tsv_info} res.vcf > res.tsv
    

    Thus, pay attention to the family_expr, sample and info parameters in the nextflow.config file.



CONTENT


Files and folder | Description
main.nf | File that can be executed using a Linux terminal, a macOS terminal or Windows 10 WSL2.
nextflow.config | Parameter settings for the main.nf file. Users have to open this file, set the desired settings and save these modifications before execution.
bin folder | Contains files required by the main.nf file.
Licence.txt | Licence of the release.



INPUT


Required files
A Variant Call Format (VCF) file (compressed or not).
A JavaScript file (passed to slivar with --js) containing functions for the slivar --family-expr option. This file is present in the bin folder described above.
A pedigree file.
A CADD annotation file.
A gnomAD annotation file.

The example dataset used in the nextflow.config file is available at https://zenodo.org/records/10723664.


Files | Description
example.vcf.gz | VCF file. Available here.
pedigree.txt | Pedigree file. Available here.
cadd-1.6-SNVs-phred10-GRCh37.zip | CADD variant annotation v1.6 filtered at phred10. Available here.
gnomad-2.1.1-genome-GRCh37.zip | gnomAD variant annotation v2.1.1. Available here.
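
Because the pipeline reads its input paths from nextflow.config, a quick check that each file exists and is not empty can save a failed run. The sketch below is only an illustration: missing_inputs is a hypothetical helper that is not part of this repository, and the paths passed to it are whatever you have set in your own configuration.

```shell
#!/usr/bin/env bash
# missing_inputs is a hypothetical helper, not part of the pipeline.
# It prints every path that is absent or empty and returns 1 if any was found.
missing_inputs() {
    local f status=0
    for f in "$@"; do
        # -s is true only for an existing file with a size greater than zero
        if [ ! -s "$f" ]; then
            printf 'missing or empty: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Example call with two file names of the example dataset (not run here):
# missing_inputs example.vcf.gz pedigree.txt
```

The function only tests presence and size. It does not validate the VCF or pedigree content.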



HOW TO RUN

1. Prerequisite

Installation of:
  • nextflow DSL1
  • Graphviz (sudo apt install graphviz on Ubuntu Linux)
  • Apptainer

2. Local running (personal computer)

2.1. main.nf file in the personal computer

  • Mount a server if required:
DRIVE="Z" # change the letter to fit the correct drive
sudo mkdir /mnt/share
sudo mount -t drvfs $DRIVE: /mnt/share

Warning: if the drive is not mounted, nextflow may do nothing, or display a message like:

Launching `main.nf` [loving_morse] - revision: d5aabe528b
/mnt/share/Users
  • Run the following command from the folder containing the main.nf and nextflow.config files (example: \\wsl$\Ubuntu-20.04\home\gael):
nextflow-dsl1 run main.nf -c nextflow.config

with -c to specify the name of the config file used.

2.2. main.nf file in the public git repository

Run the following command from where you want the results:

nextflow-dsl1 run gael-millot/slivar_vcf_extraction # github, or nextflow-dsl1 run http://github.com/gael-millot/slivar_vcf_extraction
nextflow-dsl1 run -hub pasteur gmillot/slivar_vcf_extraction -r v1.0.0 # gitlab

3. Distant running (example with the Pasteur cluster)

3.1. Pre-execution

Copy-paste this after having modified the EXEC_PATH variable:

EXEC_PATH="/pasteur/helix/projects/BioIT/gmillot/slivar_vcf_extraction" # where the bin folder of the main.nf script is located
export CONF_BEFORE=/opt/gensoft/exe # on maestro

export JAVA_CONF=java/13.0.2
export JAVA_CONF_AFTER=bin/java # on maestro
export APP_CONF=apptainer/1.3.5
export APP_CONF_AFTER=bin/apptainer # on maestro
export GIT_CONF=git/2.39.1
export GIT_CONF_AFTER=bin/git # on maestro
export GRAPHVIZ_CONF=graphviz/2.42.3
export GRAPHVIZ_CONF_AFTER=bin/graphviz # on maestro

MODULES="${CONF_BEFORE}/${JAVA_CONF}/${JAVA_CONF_AFTER},${CONF_BEFORE}/${APP_CONF}/${APP_CONF_AFTER},${CONF_BEFORE}/${GIT_CONF}/${GIT_CONF_AFTER},${CONF_BEFORE}/${GRAPHVIZ_CONF}/${GRAPHVIZ_CONF_AFTER}"
cd ${EXEC_PATH}
chmod 755 ${EXEC_PATH}/bin/*.*
module load ${JAVA_CONF} ${APP_CONF} ${GIT_CONF} ${GRAPHVIZ_CONF}

3.2. main.nf file in a cluster folder

Modify the second line of the code below, and run from where the main.nf and nextflow.config files are (which has been set thanks to the EXEC_PATH variable above):

HOME_INI=$HOME
HOME="${HELIXHOME}/slivar_vcf_extraction/" # $HOME changed to allow the creation of .nextflow into /$HELIXHOME/slivar_vcf_extraction/, for instance. See NXF_HOME in the nextflow software script
nextflow-dsl1 run main.nf -c nextflow.config
HOME=$HOME_INI
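
The save-and-restore of HOME shown above leaves HOME modified if the session is interrupted before the last line. One possible alternative, given here as a sketch only, is to change HOME inside a subshell so that the calling shell is never touched. The function name run_with_home is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# run_with_home is a hypothetical helper, not part of the pipeline.
# It runs a command with HOME set to another directory. The parentheses open
# a subshell, so the HOME of the calling shell stays unchanged, even when the
# command fails.
run_with_home() {
    local new_home="$1"
    shift
    ( export HOME="$new_home"; "$@" )
}

# Intended use on the cluster (not run here):
# run_with_home "${HELIXHOME}/slivar_vcf_extraction/" nextflow-dsl1 run main.nf -c nextflow.config
```

The exit status of run_with_home is the exit status of the command it ran, so it can still be tested by the calling script.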

3.3. main.nf file in the public git repository

Modify the first and third lines of the code below, and run (results will be where the EXEC_PATH variable has been set above):

VERSION="v1.0"
HOME_INI=$HOME
HOME="${HELIXHOME}/slivar_vcf_extraction/" # $HOME changed to allow the creation of .nextflow into /$HELIXHOME/slivar_vcf_extraction/, for instance. See NXF_HOME in the nextflow software script
nextflow-dsl1 run gael-millot/slivar_vcf_extraction -r $VERSION -c $HOME/nextflow.config #github, or nextflow-dsl1 run http://github.com/gael-millot/slivar_vcf_extraction -r $VERSION -c $HOME/nextflow.config
nextflow-dsl1 run -hub pasteur gmillot/slivar_vcf_extraction -r $VERSION -c $HOME/nextflow.config # gitlab
HOME=$HOME_INI

4. Error messages and solutions

Message 1

Unknown error accessing project `gmillot/slivar_vcf_extraction` -- Repository may be corrupted: /pasteur/sonic/homes/gmillot/.nextflow/assets/gmillot/slivar_vcf_extraction

Purge using:

rm -rf /pasteur/sonic/homes/gmillot/.nextflow/assets/gmillot*

Message 2

WARN: Cannot read project manifest -- Cause: Remote resource not found: https://gitlab.pasteur.fr/api/v4/projects/gmillot%2Fslivar_vcf_extraction

Contact Gael Millot (the remote repository is not public).

Message 3

permission denied

Use chmod to change the user rights. Example linked to files in the bin folder:

chmod 755 bin/*.*
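
Note that the bin/*.* pattern only matches file names that contain a dot, so a script without an extension would keep its previous rights. To see which files still lack the execute bit, something like the following sketch can help. The function name list_non_executable is hypothetical and not part of this repository, and -maxdepth assumes a GNU or BSD find.

```shell
#!/usr/bin/env bash
# list_non_executable is a hypothetical helper, not part of the pipeline.
# It lists the regular files of a directory (default: bin) whose owner
# execute bit is not set.
list_non_executable() {
    local dir="${1:-bin}"
    find "$dir" -maxdepth 1 -type f ! -perm -u+x | sort
}

# Example (not run here): list_non_executable bin
```

An empty output means every file directly inside the folder is executable by its owner.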



OUTPUT

An example of results obtained with the dataset is present at this address: https://zenodo.org/records/10723664/files/slivar_vcf_extraction_1709139998.zip

Files and folder | Description
reports | Folder containing all the reports of the different processes, including the nextflow.config file used.
res.vcf.gz | Annotated and filtered VCF file.
res.vcf.gz.tbi | Index file of res.vcf.gz.
res.tsv.gz | VCF file converted into a table, each row representing a different variant and a different patient. Column descriptions (depending on the tsv_info parameter):
  • mode: slivar info: filtering operated.
  • family_id: ID of the family.
  • sample_id: code of the patient.
  • chr:pos:ref:alt: chromosome, position (in bp), reference allele, alternative allele.
  • genotype(sample,dad,mom): 1: , .:no info.
  • AC: allele count in genotypes, for each ALT allele, in the same order as listed.
  • AF: allele Frequency, for each ALT allele, in the same order as listed.
  • AN: total number of alleles in called genotypes.
  • BaseQRankSum: Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities.
  • DB: dbSNP Membership.
  • DP: Approximate read depth; some reads may have been filtered.
  • ExcessHet: Phred-scaled p-value for exact test of excess heterozygosity.
  • FS: Phred-scaled p-value using Fisher's exact test to detect strand bias.
  • InbreedingCoeff: Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation.
  • MLEAC: Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed.
  • MLEAF: Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed.
  • MQ: RMS Mapping Quality.
  • MQRankSum: Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities.
  • QD: Variant Confidence/Quality by Depth.
  • ReadPosRankSum: Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias.
  • SOR: Symmetric Odds Ratio of 2x2 contingency table to detect strand bias.
  • VQSLOD: Log odds of being a true variant versus being false under the trained Gaussian mixture model.
  • culprit: The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out.
  • CSQ: Consequence annotations from Ensembl VEP. See the VCF file header for the subfield descriptions.
  • cadd_phred: CADD_PHRED from VEP Genome anNOTATion (gnotate), i.e., cadd_phred field of the VCF file.
  • gno_non_neuro_af_all: gnomAD non-neuro allele frequency, all populations, i.e., gno_non_neuro_af_all field of the VCF file.
  • gno_non_neuro_af_nfe: gnomAD non-neuro allele frequency, non-Finnish Europeans, i.e., gno_non_neuro_af_nfe field of the VCF file.
  • gno_non_neuro_nhomalt_all: gnomAD non-neuro number of homozygous alternative genotypes, all populations, i.e., gno_non_neuro_nhomalt_all field of the VCF file.
  • gno_non_neuro_nhomalt_nfe: gnomAD non-neuro number of homozygous alternative genotypes, non-Finnish Europeans, i.e., gno_non_neuro_nhomalt_nfe field of the VCF file.
  • highest_impact_order: impact order (lower is higher) of this variant across all genes and transcripts it overlaps. This integer can be used as an index into the order list to get the actual impact.
  • aff_only: Affected patient codes.
  • depths(sample,dad,mom): slivar info.
  • allele_balance(sample,dad,mom): slivar info.



VERSIONS

The different releases are tagged here



LICENCE

This package of scripts can be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details at https://www.gnu.org/licenses or in the attached Licence.txt file.



CITATION

Not yet published



CREDITS

Freddy Cliquet, GHFC, Institut Pasteur, Paris, France

Gael A. Millot, Hub, Institut Pasteur, Paris, France



ACKNOWLEDGEMENTS

The developers & maintainers of the mentioned software and packages.

Special acknowledgement to Brent Pedersen, Utrecht, The Netherlands, for the release of slivar.



WHAT'S NEW IN

v2.5

  • In the nextflow.config file, downgrade apptainer -> singularity because it does not work otherwise.

v2.4

  • In the nextflow.config file, upgrade singularity -> apptainer.

v2.3

  • Dataset and results are in zenodo.
  • Transfer to GitHub.

v2.2

  • README improved.

v2.0

  • Compression added, tsv file optional.

v1.0

  • Everything.
