RADdata
Lindsay Clark edited this page Sep 27, 2019 · 17 revisions
"RADdata" is an S3 class for storing all data and parameters pertaining to a GBS or RAD-seq dataset.
- alleleDepth: Read depth for each allele in each taxon. Stored as an integer matrix with taxa in rows and alleles in columns. Missing data should be recorded as zeros (no reads) rather than NA.
- alleles2loc: An integer vector with one value for each column of alleleDepth. Each number identifies the locus to which the allele belongs. A locus can have any number of alleles assigned to it (including zero).
- locTable: A data frame with locus names as row names. There must be at least as many rows as the highest value of alleles2loc; each number in alleles2loc corresponds to a row index in locTable. No columns are required, although if provided, a column named "Chr" will be used to indicate chromosome identity and a column named "Pos" will be used to indicate physical position.
- possiblePloidies: A list in which each item is an integer vector. Each vector indicates an inheritance pattern that SNPs in the dataset might obey. 2 indicates diploid, 4 indicates autotetraploid, c(2, 2) indicates allotetraploid, etc.
- alleleNucleotides: A character vector with one value for each column of alleleDepth, indicating the DNA sequence for that allele. Typically only the sequence at variable sites is provided, although including all sites spanning the region that contains variable sites will enable better investigation of mutations in CDS later. The attribute "Variable_sites_only" indicates whether only sequence at variable sites is provided.
- locDepth: A matrix with taxa in rows and loci in columns, with read depth summed across all alleles for each locus. Column names are locus numbers rather than locus names. See GetLocDepth for retrieving the same matrix but with locus names as column names.
- depthRatio: A numeric matrix with taxa in rows and alleles in columns. Calculated as alleleDepth / locDepth. Used by other polyRAD functions for rough estimation of genotypes and allele frequency.
- antiAlleleDepth: An integer matrix with taxa in rows and alleles in columns. For each allele, the number of reads from the locus that do NOT belong to that allele. Calculated as locDepth - alleleDepth. Used for likelihood estimations by AddGenotypeLikelihood.
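The relationships among alleleDepth, alleles2loc, locDepth, depthRatio, and antiAlleleDepth can be sketched in base R. This is only an illustration on made-up data, not polyRAD's own code: the taxon, allele, and locus names are hypothetical, and the step that expands locDepth to one column per allele is an assumption about how the loci-by-column matrix is lined up with the alleles-by-column matrices.

```r
# Toy data (hypothetical names): 3 taxa, 5 alleles belonging to 2 loci.
# matrix() fills column by column, so each row of numbers below is one allele.
alleleDepth <- matrix(c(10L, 0L, 5L,
                         2L, 8L, 5L,
                         4L, 0L, 3L,
                         0L, 6L, 3L,
                         1L, 0L, 0L),
                      nrow = 3, ncol = 5,
                      dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                      c("loc1_A", "loc1_C",
                                        "loc2_G", "loc2_T", "loc2_GT")))

# One locus index per allele (column of alleleDepth).
alleles2loc <- c(1L, 1L, 2L, 2L, 2L)

# Locus names are row names; "Chr" and "Pos" are the optional columns.
locTable <- data.frame(Chr = c(1L, 1L), Pos = c(1500L, 20300L),
                       row.names = c("loc1", "loc2"))

# Sum read depth across the alleles of each locus.
# Column names of the result are locus numbers ("1", "2").
locDepth <- t(rowsum(t(alleleDepth), alleles2loc))

# Assumption: repeat each locus column once per allele so that the
# dimensions match alleleDepth before dividing or subtracting.
expandedLocDepth <- locDepth[, as.character(alleles2loc), drop = FALSE]

depthRatio <- alleleDepth / expandedLocDepth      # NaN where a locus has no reads
antiAlleleDepth <- expandedLocDepth - alleleDepth
dimnames(antiAlleleDepth) <- dimnames(alleleDepth)
```

Within a locus, the depthRatio values for a taxon sum to one whenever that taxon has at least one read at the locus.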
- alleleFreq: Allele frequencies. This is a vector of values ranging from zero to one, with one value per allele. Added by AlleleFreqHWE and AlleleFreqMapping. This vector additionally has an attribute called "type" that indicates which parameters were used for estimating allele frequency; this can be "individual frequency", "posterior prob", or "depth ratio".
- depthSamplingPermutations: A numeric matrix with taxa in rows and alleles in columns. It is calculated as log(locDepth choose alleleDepth), i.e. the log of the binomial coefficient. This is used as a coefficient for likelihood estimations done by AddGenotypeLikelihood.
- genotypeLikelihood: Genotype likelihoods, i.e. the probability of the observed read count distribution for each allele and taxon, given each possible ploidy and genotype. It is formatted as a list of arrays. There is one array in the list for each possible ploidy, ignoring differences between auto- and allopolyploidy. For each array, the first dimension represents allele copy number ranging from zero to the ploidy, the second dimension is taxa, and the third dimension is alleles. Added by AddGenotypeLikelihood.
- priorProb: Prior probabilities of genotypes, i.e. expected genotype frequencies in the population. This is formatted as a list, with one list item per possible ploidy, counting differences between auto- and allopolyploid inheritance modes. For AddGenotypePriorProb_Mapping2Parents and AddGenotypePriorProb_HWE: Each list item is a matrix, with allele copy number (from zero to the total ploidy) in rows, and alleles in columns. Each value is the probability of sampling an individual with that allele copy number from the population. For AddGenotypePriorProb_ByTaxa: Each list item is an array, with allele copy number in the first dimension, taxa in the second dimension, and alleles in the third dimension. Each value is the probability of sampling an individual with that allele copy number from the population local to the taxon.
- priorProbPloidies: A list in the same format as possiblePloidies, and the same length as priorProb. Each item in the list is a vector indicating the inheritance mode for the corresponding matrix in priorProb. Added by AddGenotypePriorProb_Mapping2Parents, AddGenotypePriorProb_HWE, and AddGenotypePriorProb_ByTaxa.
- ploidyLikelihood: Likelihoods estimated for inheritance modes using AddPloidyLikelihood. This slot is likely to be removed from the package.
- ploidyChiSq: Chi-squared values estimated for each inheritance mode and allele, stored in a matrix with inheritance mode in rows (same order as priorProb) and alleles in columns. Low values indicate that genotype likelihoods and prior probabilities are a good match. Added by AddPloidyChiSq.
- ploidyChiSqP: P-values derived from ploidyChiSq, in a matrix of the same dimensions. Added by AddPloidyChiSq.
- priorTimesLikelihood: Genotype priors multiplied by genotype likelihoods. A list of arrays, with one list element for each element of priorProb (each inheritance mode), and array dimensions identical to genotypeLikelihood. Added by AddPriorTimesLikelihood. May be eliminated in the future.
- posteriorProb: Genotype posterior probabilities. A list of arrays, with one list element for each element of priorProb (each inheritance mode), and array dimensions identical to genotypeLikelihood, with allele copy number in the first dimension, taxa in the second dimension, and alleles in the third dimension. Values should range from zero to one. Added by AddGenotypePosteriorProb.
- alleleFreqByTaxa: Estimated allele frequencies for the local population to which each taxon belongs. Matrix with taxa in rows and alleles in columns, and values ranging from zero to one. Added by AddAlleleFreqByTaxa.
- PCA: A matrix of principal component analysis scores with taxa in rows and PC axes in columns. Added by AddPCA.
- alleleLinkages: A list with one item per allele in the dataset. Each item is a list of two vectors, with allele numbers in the first column and correlation coefficients in the second column, listing alleles that can be used for predicting the genotype at a given allele. Added by AddAlleleLinkages.
- priorProbLD: A list of arrays in the same dimensions as posteriorProb; there is one array for each possible ploidy, and arrays have allele copy number in the first dimension, taxa in the second dimension, and alleles in the third dimension. These are prior genotype probabilities based on linked loci only. Added by AddGenotypePriorProb_LD.
- likelyGeno_donor and likelyGeno_recurrent: Matrices formatted like the output of GetLikelyGen; these have alleles in columns and possible ploidies in rows, ignoring differences between auto- and allopolyploid types. Rows are named by total ploidy. Numbers in the matrix indicate the likely allele copy number. These slots are added by AddGenotypePriorProb_Mapping2Parents to indicate the likely genotypes of the two parents, after correction based on progeny allele frequencies.
- donorPloidies and recurrentPloidies: Lists in the same format as possiblePloidies indicating the parent ploidy corresponding to each ploidy listed in priorProbPloidies. Added by AddGenotypePriorProb_Mapping2Parents.
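To make the array layouts above concrete, here is a rough base R sketch of how a log binomial coefficient, a genotype likelihood, a population-level prior, and a posterior could fit together for one diploid locus. It is not polyRAD's model: the plain binomial likelihood, the errorRate value, the Hardy-Weinberg prior, and all data are simplifying assumptions chosen only to show the dimensions (allele copy number, then taxa, then alleles).

```r
ploidy <- 2L
copyNum <- 0:ploidy

# Toy read depths (hypothetical): 2 taxa, 2 alleles of a single locus.
alleleDepth <- matrix(c(9L, 5L, 1L, 5L), nrow = 2,
                      dimnames = list(c("taxonA", "taxonB"),
                                      c("loc1_A", "loc1_C")))
# Locus depth repeated for each allele so dimensions match alleleDepth.
expandedLocDepth <- matrix(rowSums(alleleDepth), nrow = nrow(alleleDepth),
                           ncol = ncol(alleleDepth),
                           dimnames = dimnames(alleleDepth))
antiAlleleDepth <- expandedLocDepth - alleleDepth

# log(locDepth choose alleleDepth): numeric, taxa x alleles.
depthSamplingPermutations <- matrix(lchoose(expandedLocDepth, alleleDepth),
                                    nrow = nrow(alleleDepth),
                                    dimnames = dimnames(alleleDepth))

# Assumed read sampling probability per copy number, with a small
# hypothetical error rate so that probabilities are never exactly 0 or 1.
errorRate <- 0.001
pRead <- pmin(pmax(copyNum / ploidy, errorRate), 1 - errorRate)

# Likelihood array: copy number x taxa x alleles (simple binomial sketch).
genotypeLikelihood <- array(NA_real_,
                            dim = c(length(copyNum), dim(alleleDepth)),
                            dimnames = c(list(as.character(copyNum)),
                                         dimnames(alleleDepth)))
for (i in seq_along(copyNum)) {
  genotypeLikelihood[i, , ] <- exp(depthSamplingPermutations +
                                   alleleDepth * log(pRead[i]) +
                                   antiAlleleDepth * log(1 - pRead[i]))
}

# Population-level prior: copy number in rows, alleles in columns.
alleleFreq <- c(loc1_A = 0.6, loc1_C = 0.4)   # made-up frequencies
priorProb <- sapply(alleleFreq, function(f) dbinom(copyNum, ploidy, f))
rownames(priorProb) <- as.character(copyNum)

# Prior times likelihood, same dimensions as genotypeLikelihood.
priorTimesLikelihood <- genotypeLikelihood
for (tx in seq_len(nrow(alleleDepth))) {
  priorTimesLikelihood[, tx, ] <- genotypeLikelihood[, tx, ] * priorProb
}

# Posterior: normalize across copy numbers for each taxon and allele.
totals <- apply(priorTimesLikelihood, c(2, 3), sum)
posteriorProb <- sweep(priorTimesLikelihood, c(2, 3), totals, "/")
```

In a RADdata object each of these would be one element of a list (one element per ploidy or inheritance mode); the sketch shows a single element only.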
- "taxa": A character vector listing all taxa names, in the same order as the rows of alleleDepth. Can be retrieved using GetTaxa.
- "nTaxa": An integer indicating the number of taxa in the dataset. Retrieved with the nTaxa function.
- "nLoci": An integer indicating the number of loci in locTable.
- "contamRate": A number ranging from zero to one (although in practice probably less than 0.01) indicating the expected sample cross-contamination rate. Can later be edited using SetContamRate or retrieved using GetContamRate.
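As an illustration of how values like these can travel with an S3 object, the base R sketch below stores them as attributes on a list. The class name "toyRADdata", the assumption that the object is a plain list, and all data are hypothetical; in polyRAD itself the accessor functions named above (GetTaxa, nTaxa, GetContamRate, SetContamRate) are the intended interface.

```r
# Toy read depths (hypothetical): 2 taxa, 3 alleles.
alleleDepth <- matrix(c(10L, 0L, 2L, 8L, 4L, 3L), nrow = 2,
                      dimnames = list(c("taxonA", "taxonB"),
                                      c("loc1_A", "loc1_C", "loc2_G")))
# Zero-column data frame: only the locus names (row names) are required.
locTable <- data.frame(row.names = c("loc1", "loc2"))

# Assumption: a list carrying a class attribute plus extra attributes.
toy <- structure(
  list(alleleDepth = alleleDepth,
       alleles2loc = c(1L, 1L, 2L),
       locTable = locTable),
  class = "toyRADdata",            # hypothetical class name, not "RADdata"
  taxa = rownames(alleleDepth),
  nTaxa = nrow(alleleDepth),
  nLoci = nrow(locTable),
  contamRate = 0.001
)

# Reading and editing attributes directly with base R.
attr(toy, "taxa", exact = TRUE)
attr(toy, "contamRate") <- 0.005
```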
- "donorParent": A character string indicating the name of the taxon that is the donor parent, if the dataset represents a mapping population. Added by SetDonorParent, retrieved by GetDonorParent.
- "recurrentParent": A character string indicating the name of the taxon that is the recurrent parent, if the dataset represents a mapping population. Added by SetRecurrentParent, retrieved by GetRecurrentParent. If no backcrossing took place, it does not matter which parent is listed as "donor" or "recurrent".
- "blankTaxa": A character vector indicating names of any taxa that were blanks, i.e. barcoded reactions to which no genomic DNA was added during library creation. These can be useful for estimating the contamination rate, and many polyRAD functions additionally exclude taxa listed as blanks from their calculations. Added by SetBlankTaxa and retrieved by GetBlankTaxa.
- "alleleFreqType": A character string indicating how allele frequencies were estimated. "mapping" and "HWE" are the currently possible values. Added by AlleleFreqHWE and AlleleFreqMapping.
- "priorType": How the prior genotype probabilities were calculated. AddGenotypePriorProb_Mapping2Parents and AddGenotypePriorProb_HWE record "population" for this attribute to indicate that probabilities were estimated on a population basis. AddGenotypePriorProb_ByTaxa records "taxon" for this attribute to indicate that probabilities were estimated on an individual basis.
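The role of "blankTaxa" can be sketched in base R as follows. This is a rough illustration under stated assumptions, not polyRAD's estimator: the toy object, its class name, the data, and the ratio of mean blank depth to mean non-blank depth used as a contamination estimate are all hypothetical.

```r
# Toy read depths (hypothetical): two real samples and one blank.
alleleDepth <- matrix(c(40L, 35L, 1L, 20L, 25L, 0L), nrow = 3,
                      dimnames = list(c("parent1", "progeny1", "blank1"),
                                      c("loc1_A", "loc1_C")))
toy <- structure(list(alleleDepth = alleleDepth),
                 class = "toyRADdata",          # hypothetical class name
                 taxa = rownames(alleleDepth))

# Attributes added after object creation.
attr(toy, "donorParent") <- "parent1"
attr(toy, "blankTaxa") <- "blank1"

# Flag blanks so they can be excluded from downstream calculations.
isBlank <- attr(toy, "taxa") %in% attr(toy, "blankTaxa")
nonBlankDepth <- toy$alleleDepth[!isBlank, , drop = FALSE]

# Assumed rough estimate: reads seen in blanks relative to real samples.
depthPerTaxon <- rowSums(toy$alleleDepth)
roughContamRate <- mean(depthPerTaxon[isBlank]) / mean(depthPerTaxon[!isBlank])
```

With these toy counts the blank has 1 read against a mean of 60 reads per real sample, so roughContamRate is 1/60.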