Commit 2b30444

added column 'bonus' to janno column definition file to express the concept of less important columns that should not spark a warning in the validator if missing completely for a dataset
1 parent e11c316 commit 2b30444

1 file changed: +37 -37 lines changed


janno_columns.tsv

Lines changed: 37 additions & 37 deletions
@@ -1,37 +1,37 @@
-janno_column_name description data_type multi choice range choice_options range_lower range_upper mandatory unique
-Individual_ID id as defined by the genetics laboratory, needs to be unique (e.g. I1234, BOT001), needs to fit to the values in the poseidon package .fam file, if multiple datasets exist for the same individual different IDs are required (e.g. loschbour_snpAD) String FALSE FALSE FALSE TRUE TRUE
-Collection_ID id as defined by the provider/owner of a sample (e.g. grave 40 skeleton 2) String FALSE FALSE FALSE FALSE FALSE
-Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; in case of multiple libraries String TRUE FALSE FALSE FALSE FALSE
-Country present-day political country String FALSE FALSE FALSE FALSE FALSE
-Location unspecified location information like administrative or topographic region or mountains/rivers/lakes/cities nearby String FALSE FALSE FALSE FALSE FALSE
-Site site name String FALSE FALSE FALSE FALSE FALSE
-Latitude latitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -90 90 FALSE FALSE
-Longitude longitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -180 180 FALSE FALSE
-Date_C14_Labnr labnr of C14 date, multiple values separated by ; in case of multiple dates String TRUE FALSE FALSE FALSE FALSE
-Date_C14_Uncal_BP uncalibrated years BP (as in before 1950AD), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE
-Date_C14_Uncal_BP_Err standard deviation (1 sigma ±), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE
-Date_BC_AD_Median calibrated median age for C14 dates, or simple mid-points for archaeological intervals, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
-Date_BC_AD_Start lower (older) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
-Date_BC_AD_Stop upper (more recent) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
-Date_Type “C14“ if directly from the individual, “contextual“ if based on archaeology or other C14 dates from the site, “modern” for present-day individuals String FALSE TRUE FALSE C14;contextual;modern FALSE FALSE
-No_of_Libraries number of libraries Integer FALSE FALSE FALSE FALSE FALSE
-Data_Type specifics of data generation method, multiple values separated by ; String TRUE TRUE FALSE Shotgun;1240K;OtherCapture;ReferenceGenome FALSE FALSE
-Genotype_Ploidy ploidy of the genotypes String FALSE TRUE FALSE diploid;haploid FALSE FALSE
-Group_Name ideally Eisenmann rule + underscore flags, e.g. to annotate relatives or outliers or low coverage, multiple entries separated by ; to accommodate different labels, value must equal the group name in the .fam file (in case of multiple entries the first one) String TRUE FALSE FALSE TRUE FALSE
-Genetic_Sex “F“, “M“ or “U“ because eigenstrat and plink formats only support these three. Edge cases (XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a note added Char FALSE TRUE FALSE F;M;U TRUE FALSE
-Nr_autosomal_SNPs number of autosomal SNPs covered for 1240K capture or SG data pulldown Integer FALSE FALSE FALSE FALSE FALSE
-Coverage_1240K average X-fold coverage across 1240K SNP sites after quality filtering (internal data), NOT the % SNPs of 1.2M possible Float FALSE FALSE FALSE FALSE FALSE
-MT_Haplogroup mitochondrial haplogroup after phylotree.org as reported by Haplofind or Haplogrep String FALSE FALSE FALSE FALSE FALSE
-Y_Haplogroup Y-chromosome haplogroup reported as published, for internal data, please follow syntax with main branch + most terminal derived Y-SNP (e.g. R1b-P312) String FALSE FALSE FALSE FALSE FALSE
-Endogenous % endogenous DNA as estimated from SG libraries (before capture), as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries report only the highest value Float FALSE FALSE TRUE 0 100 FALSE FALSE
-UDG “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE
-Library_Built “ds” for double stranded, “ss” for single stranded, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;other FALSE FALSE
-Damage % damage on 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 100 FALSE FALSE
-Xcontam if male for captured library, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE
-Xcontam_stderr standard error of ANGSD X contamination estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE
-mtContam mitochondrial contamination rate as estimated by ContamMix and/or Schmutzi, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE
-mtContam_stderr Standard error of ContamMix/Schmutzi estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE
-Primary_Contact Project lead or first author String FALSE FALSE FALSE FALSE FALSE
-Publication_Status bibtex key (e.g. “AuthorJournalYear“) or “unpublished“ String FALSE FALSE FALSE FALSE FALSE
-Note wildcard comments. e.g. note down aneuploidies here String FALSE FALSE FALSE FALSE FALSE
-Keywords Arbitrary tags separated by ; (e.g. for custom sorting purposes) String TRUE FALSE FALSE FALSE FALSE
+janno_column_name description data_type multi choice range choice_options range_lower range_upper mandatory unique bonus
+Individual_ID id as defined by the genetics laboratory, needs to be unique (e.g. I1234, BOT001), needs to fit to the values in the poseidon package .fam file, if multiple datasets exist for the same individual different IDs are required (e.g. loschbour_snpAD) String FALSE FALSE FALSE TRUE TRUE FALSE
+Collection_ID id as defined by the provider/owner of a sample (e.g. grave 40 skeleton 2) String FALSE FALSE FALSE FALSE FALSE TRUE
+Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; in case of multiple libraries String TRUE FALSE FALSE FALSE FALSE FALSE
+Country present-day political country String FALSE FALSE FALSE FALSE FALSE FALSE
+Location unspecified location information like administrative or topographic region or mountains/rivers/lakes/cities nearby String FALSE FALSE FALSE FALSE FALSE TRUE
+Site site name String FALSE FALSE FALSE FALSE FALSE FALSE
+Latitude latitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -90 90 FALSE FALSE FALSE
+Longitude longitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -180 180 FALSE FALSE FALSE
+Date_C14_Labnr labnr of C14 date, multiple values separated by ; in case of multiple dates String TRUE FALSE FALSE FALSE FALSE FALSE
+Date_C14_Uncal_BP uncalibrated years BP (as in before 1950AD), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE FALSE
+Date_C14_Uncal_BP_Err standard deviation (1 sigma ±), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE FALSE
+Date_BC_AD_Median calibrated median age for C14 dates, or simple mid-points for archaeological intervals, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
+Date_BC_AD_Start lower (older) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
+Date_BC_AD_Stop upper (more recent) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
+Date_Type “C14“ if directly from the individual, “contextual“ if based on archaeology or other C14 dates from the site, “modern” for present-day individuals String FALSE TRUE FALSE C14;contextual;modern FALSE FALSE FALSE
+No_of_Libraries number of libraries Integer FALSE FALSE FALSE FALSE FALSE FALSE
+Data_Type specifics of data generation method, multiple values separated by ; String TRUE TRUE FALSE Shotgun;1240K;OtherCapture;ReferenceGenome FALSE FALSE FALSE
+Genotype_Ploidy ploidy of the genotypes String FALSE TRUE FALSE diploid;haploid FALSE FALSE FALSE
+Group_Name ideally Eisenmann rule + underscore flags, e.g. to annotate relatives or outliers or low coverage, multiple entries separated by ; to accommodate different labels, value must equal the group name in the .fam file (in case of multiple entries the first one) String TRUE FALSE FALSE TRUE FALSE FALSE
+Genetic_Sex “F“, “M“ or “U“ because eigenstrat and plink formats only support these three. Edge cases (XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a note added Char FALSE TRUE FALSE F;M;U TRUE FALSE FALSE
+Nr_autosomal_SNPs number of autosomal SNPs covered for 1240K capture or SG data pulldown Integer FALSE FALSE FALSE FALSE FALSE FALSE
+Coverage_1240K average X-fold coverage across 1240K SNP sites after quality filtering (internal data), NOT the % SNPs of 1.2M possible Float FALSE FALSE FALSE FALSE FALSE FALSE
+MT_Haplogroup mitochondrial haplogroup after phylotree.org as reported by Haplofind or Haplogrep String FALSE FALSE FALSE FALSE FALSE FALSE
+Y_Haplogroup Y-chromosome haplogroup reported as published, for internal data, please follow syntax with main branch + most terminal derived Y-SNP (e.g. R1b-P312) String FALSE FALSE FALSE FALSE FALSE FALSE
+Endogenous % endogenous DNA as estimated from SG libraries (before capture), as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries report only the highest value Float FALSE FALSE TRUE 0 100 FALSE FALSE FALSE
+UDG “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE FALSE
+Library_Built “ds” for double stranded, “ss” for single stranded, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;other FALSE FALSE FALSE
+Damage % damage on 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 100 FALSE FALSE FALSE
+Xcontam if male for captured library, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE FALSE
+Xcontam_stderr standard error of ANGSD X contamination estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE FALSE
+mtContam mitochondrial contamination rate as estimated by ContamMix and/or Schmutzi, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE FALSE
+mtContam_stderr Standard error of ContamMix/Schmutzi estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE FALSE
+Primary_Contact Project lead or first author String FALSE FALSE FALSE FALSE FALSE FALSE
+Publication_Status bibtex key (e.g. “AuthorJournalYear“) or “unpublished“ String FALSE FALSE FALSE FALSE FALSE FALSE
+Note wildcard comments. e.g. note down aneuploidies here String FALSE FALSE FALSE FALSE FALSE TRUE
+Keywords Arbitrary tags separated by ; (e.g. for custom sorting purposes) String TRUE FALSE FALSE FALSE FALSE TRUE
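
The commit message describes 'bonus' as a flag for columns whose complete absence should not trigger a validator warning. A minimal Python sketch of how a validator might read that flag follows. It is not the project's actual validator: the function name `check_missing_columns`, the reduced three-column excerpt, and the rule that a missing mandatory column is an error while any other missing non-bonus column is a warning are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical excerpt of janno_columns.tsv, reduced to the flags used here.
# The TRUE/FALSE values for these four rows match the new side of the diff.
COLUMN_DEFS = (
    "janno_column_name\tmandatory\tbonus\n"
    "Individual_ID\tTRUE\tFALSE\n"
    "Collection_ID\tFALSE\tTRUE\n"
    "Country\tFALSE\tFALSE\n"
    "Note\tFALSE\tTRUE\n"
)


def check_missing_columns(defs_tsv, present_columns):
    """Return (errors, warnings) for columns absent from a dataset.

    Assumed policy: a missing mandatory column is an error, a missing
    bonus column is ignored, and any other missing column is a warning.
    """
    errors, warnings = [], []
    reader = csv.DictReader(io.StringIO(defs_tsv), delimiter="\t")
    for row in reader:
        name = row["janno_column_name"]
        if name in present_columns:
            continue  # column exists in the dataset, nothing to report
        if row["mandatory"] == "TRUE":
            errors.append(name)
        elif row["bonus"] != "TRUE":
            warnings.append(name)
        # bonus columns that are missing completely are skipped silently
    return errors, warnings


# A dataset that only provides Individual_ID and Collection_ID.
errors, warnings = check_missing_columns(
    COLUMN_DEFS, {"Individual_ID", "Collection_ID"}
)
print("errors:", errors)
print("warnings:", warnings)
```

With this excerpt, the missing `Country` column is reported as a warning, while the missing `Note` column is not reported at all because it is marked as bonus.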
