As input, the user provides one or more VCF files (Danecek *et al.* 2011) and a threshold value giving the maximum proportion of missing data allowed per individual.

For each input VCF file, the tool calculates the proportion of missing genotypes for each individual (or sample) and removes the individuals whose proportion exceeds the user-defined threshold; the remaining individuals are written to the output VCF file.

Example: if MAX_MISSING_IND = 0.25 (default value), all individuals with missing genotypes at more than 25% of the loci in the input dataset are removed from the output VCF file.
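The filtering rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: genotypes are assumed to be already extracted as one GT string per locus for each individual, a call is assumed to count as missing only when all of its alleles are `.` (so a partially missing call such as `./1` counts as called), and an individual is kept when its proportion is less than or equal to the threshold. The helper names `is_missing`, `missing_proportion` and `filter_individuals` are hypothetical.

```python
from typing import Dict, List


def is_missing(gt: str) -> bool:
    """Assumption: a GT call is missing only if every allele is '.' ('.', './.', '.|.')."""
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def missing_proportion(calls: List[str]) -> float:
    """Fraction of loci at which one individual's genotype is missing."""
    if not calls:
        return 0.0
    return sum(is_missing(gt) for gt in calls) / len(calls)


def filter_individuals(genotypes: Dict[str, List[str]],
                       max_missing_ind: float = 0.25) -> List[str]:
    """Names of individuals whose missing proportion does not exceed the threshold."""
    return [name for name, calls in genotypes.items()
            if missing_proportion(calls) <= max_missing_ind]


# Usage: three individuals genotyped at four loci (hypothetical data).
genotypes = {
    "ind1": ["0/0", "0/1", "1/1", "0/0"],  # 0 of 4 missing -> 0.00
    "ind2": ["./.", "0/1", "1/1", "0/0"],  # 1 of 4 missing -> 0.25
    "ind3": ["./.", ".|.", "1/1", "0/0"],  # 2 of 4 missing -> 0.50
}
print(filter_individuals(genotypes))       # ['ind1', 'ind2']
print(filter_individuals(genotypes, 0.0))  # ['ind1']
print(filter_individuals(genotypes, 1.0))  # ['ind1', 'ind2', 'ind3']
```

The `<=` comparison matches the rule of removing only individuals *above* the threshold, which is why 0.0 keeps only complete individuals and 1.0 keeps everyone.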

Tips
====
Threshold values for missing data depend on the study objectives.
They generally range from 20% to 75% of loci for low-stringency filters and from 5% to 25% of loci for high-stringency filters (Hemstrom *et al.* 2024).
Using 0.0 (0%) means that only individuals with no missing data are retained for further analysis.
Using 1.0 (100%) means that no filtering is applied.

See also the VCF format specification, which includes example records: https://samtools.github.io/hts-specs/VCFv4.5.pdf
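The specification linked above fixes where genotypes sit in each record: eight fixed columns (CHROM to INFO), then FORMAT, then one column per sample, with GT given as the first FORMAT sub-field when present. The sketch below shows how per-sample GT strings could be pulled from an uncompressed, tab-separated VCF and how only the retained sample columns could be written back out. It assumes every record lists GT in FORMAT; the function names are hypothetical and this is not the tool's own code.

```python
from typing import List

FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def sample_names(header_line: str) -> List[str]:
    """Sample names from the '#CHROM' header line (all columns after FORMAT)."""
    return header_line.rstrip("\n").split("\t")[FIXED_COLUMNS:]


def genotypes_from_record(record: str) -> List[str]:
    """GT value of every sample in one VCF data line (assumes GT is in FORMAT)."""
    fields = record.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # raises ValueError if GT is absent
    calls = []
    for sample in fields[FIXED_COLUMNS:]:
        values = sample.split(":")
        # Defensive: treat an absent GT value as a missing call.
        calls.append(values[gt_index] if gt_index < len(values) else ".")
    return calls


def keep_columns(line: str, keep: List[int]) -> str:
    """Rebuild a '#CHROM' header or data line with only the selected sample columns."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[:FIXED_COLUMNS] + [fields[FIXED_COLUMNS + i] for i in keep])


# Usage on a hypothetical three-sample record.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3"
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:.\t1/1:8"
print(sample_names(header))           # ['ind1', 'ind2', 'ind3']
print(genotypes_from_record(record))  # ['0/1', './.', '1/1']
print(keep_columns(record, [0, 2]))   # fixed columns plus ind1 and ind3 only
```

Note that this sketch does not recompute INFO fields such as AC or AN, whose values can become stale once samples are dropped.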
]]></help>
<citations>
<citation type="bibtex">
@article{10.1093/gigascience/giab008,
author = {Danecek, Petr and Bonfield, James K and Liddle, Jennifer and Marshall, John and Ohan, Valeriu and Pollard, Martin O and Whitwham, Andrew and Keane, Thomas and McCarthy, Shane A and Davies, Robert M and Li, Heng},
title = "{Twelve years of SAMtools and BCFtools}",
journal = {GigaScience},
volume = {10},
number = {2},
year = {2021},
abstract = "{SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed \\>1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.}",
author = {Hemstrom, William and Grummer, Jared A. and Luikart, Gordon and Beja-Pereira, Albano and Waples, Robin S. and Funk, W. Chris and Shafer, Aaron B. A. and Allendorf, Frederick W.},
title = {Next-generation data filtering in the genomics era},
journal = {Nature Reviews Genetics},
volume = {25},
number = {11},
pages = {750--767},
year = {2024},
doi = {10.1038/s41576-024-00738-6},
url = {https://doi.org/10.1038/s41576-024-00738-6}
}
</citation>
<citation type="bibtex">
@article{Danecek2011,
author = {Danecek, Petr and Auton, Adam and Abecasis, Goncalo and Albers, Cornelis A. and Banks, Eric and DePristo, Mark A. and Handsaker, Robert E. and Lunter, Gerton and Marth, Gabor T. and Sherry, Stephen T. and McVean, Gil and Durbin, Richard and {1000 Genomes Project Analysis Group}},