Commit 2a72e08

Merge pull request #42 from poseidon-framework/janno_column_names_and_order
changes for Poseidon 2.5.0
2 parents db03dc7 + 09199b6 commit 2a72e08

1 file changed: +45 -39 lines changed

janno_columns.tsv

Lines changed: 45 additions & 39 deletions
@@ -1,39 +1,45 @@
1-
janno_column_name description data_type multi choice range choice_options range_lower range_upper mandatory unique bonus
2-
Individual_ID id as defined by the genetics laboratory, needs to be unique (e.g. I1234, BOT001), needs to fit to the values in the poseidon package .fam file, if multiple datasets exist for the same individual different IDs are required (e.g. loschbour_snpAD) String FALSE FALSE FALSE TRUE TRUE FALSE
3-
Collection_ID id as defined by the provider/owner of a sample (e.g. grave 40 skeleton 2) String FALSE FALSE FALSE FALSE FALSE TRUE
4-
Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; in case of multiple libraries String TRUE FALSE FALSE FALSE FALSE FALSE
5-
Country present-day political country String FALSE FALSE FALSE FALSE FALSE FALSE
6-
Location unspecified location information like administrative or topographic region or mountains/rivers/lakes/cities nearby String FALSE FALSE FALSE FALSE FALSE TRUE
7-
Site site name String FALSE FALSE FALSE FALSE FALSE FALSE
8-
Latitude latitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -90 90 FALSE FALSE FALSE
9-
Longitude longitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -180 180 FALSE FALSE FALSE
10-
Date_C14_Labnr labnr of C14 date, multiple values separated by ; in case of multiple dates String TRUE FALSE FALSE FALSE FALSE FALSE
11-
Date_C14_Uncal_BP uncalibrated years BP (as in before 1950AD), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE FALSE
12-
Date_C14_Uncal_BP_Err standard deviation (1 sigma ±), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE FALSE
13-
Date_BC_AD_Median calibrated median age for C14 dates, or simple mid-points for archaeological intervals, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
14-
Date_BC_AD_Start lower (older) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
15-
Date_BC_AD_Stop upper (more recent) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE FALSE
16-
Date_Type “C14“ if directly from the individual, “contextual“ if based on archaeology or other C14 dates from the site, “modern” for present-day individuals String FALSE TRUE FALSE C14;contextual;modern FALSE FALSE FALSE
17-
No_of_Libraries number of libraries Integer FALSE FALSE FALSE FALSE FALSE FALSE
18-
Data_Type specifics of data generation method, multiple values separated by ; String TRUE TRUE FALSE Shotgun;1240K;OtherCapture;ReferenceGenome FALSE FALSE FALSE
19-
Genotype_Ploidy ploidy of the genotypes String FALSE TRUE FALSE diploid;haploid FALSE FALSE FALSE
20-
Group_Name ideally Eisenmann rule + underscore flags, e.g. to annotate relatives or outliers or low coverage, multiple entries separated by ; to accommodate different labels, value must equal the group name in the .fam file (in case of multiple entries the first one) String TRUE FALSE FALSE TRUE FALSE FALSE
21-
Genetic_Sex “F“, “M“ or “U“ because eigenstrat and plink formats only support these three. Edge cases (XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a note added Char FALSE TRUE FALSE F;M;U TRUE FALSE FALSE
22-
Nr_autosomal_SNPs number of autosomal SNPs covered for 1240K capture or SG data pulldown Integer FALSE FALSE FALSE FALSE FALSE FALSE
23-
Coverage_1240K average X-fold coverage across 1240K SNP sites after quality filtering (internal data), NOT the % SNPs of 1.2M possible Float FALSE FALSE FALSE FALSE FALSE FALSE
24-
MT_Haplogroup mitochondrial haplogroup after phylotree.org as reported by Haplofind or Haplogrep String FALSE FALSE FALSE FALSE FALSE FALSE
25-
Y_Haplogroup Y-chromosome haplogroup reported as published, for internal data, please follow syntax with main branch + most terminal derived Y-SNP (e.g. R1b-P312) String FALSE FALSE FALSE FALSE FALSE FALSE
26-
Endogenous % endogenous DNA as estimated from SG libraries (before capture), as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries report only the highest value Float FALSE FALSE TRUE 0 100 FALSE FALSE FALSE
27-
UDG “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE FALSE
28-
Library_Built “ds” for double stranded, “ss” for single stranded, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;other FALSE FALSE FALSE
29-
Damage % damage on 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 100 FALSE FALSE FALSE
30-
Xcontam if male for captured library, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE FALSE
31-
Xcontam_stderr standard error of ANGSD X contamination estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE FALSE
32-
mtContam mitochondrial contamination rate as estimated by ContamMix and/or Schmutzi, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 1 FALSE FALSE FALSE
33-
mtContam_stderr Standard error of ContamMix/Schmutzi estimate, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 Inf FALSE FALSE FALSE
34-
Genetic_Source_Accession_IDs ENA or SRA Accession ID(s) pointing to the source data used to generate the genotyping data. If multiple are given they should be arranged by descending specificity (e.g. project id > sample id > sequencing run id). String TRUE FALSE FALSE FALSE FALSE FALSE
35-
Data_Preparation_Pipeline_URL URL pointing to a description of the pipeline used to generate the genotype data from the source data String FALSE FALSE FALSE FALSE FALSE FALSE
36-
Primary_Contact Project lead or first author String FALSE FALSE FALSE FALSE FALSE FALSE
37-
Publication_Status bibtex key (e.g. “AuthorJournalYear“) or “unpublished“ String TRUE FALSE FALSE FALSE FALSE FALSE
38-
Note wildcard comments. e.g. note down aneuploidies here String FALSE FALSE FALSE FALSE FALSE TRUE
39-
Keywords Arbitrary tags separated by ; (e.g. for custom sorting purposes) String TRUE FALSE FALSE FALSE FALSE TRUE
1+
janno_column_name description data_type multi choice range choice_options range_lower range_upper mandatory unique
2+
Poseidon_ID id as defined by the genetics laboratory, needs to be unique (e.g. I1234, BOT001), needs to fit to the values in the poseidon package .fam file, if multiple datasets exist for the same individual different IDs are required (e.g. loschbour_snpAD) String FALSE FALSE FALSE TRUE TRUE
3+
Genetic_Sex “F“, “M“ or “U“ because eigenstrat and plink formats only support these three, edge cases (XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a note added Char FALSE TRUE FALSE F;M;U TRUE FALSE
4+
Group_Name ideally Eisenmann rule + underscore flags, e.g. to annotate relatives or outliers or low coverage, multiple entries separated by ; to accommodate different labels, value must equal the group name in the .fam file (in case of multiple entries the first one) String TRUE FALSE FALSE TRUE FALSE
5+
Alternative_IDs other identifiers for the same individual, e.g. IDs in other databases or popular names (e.g. Ötzi/Iceman) String TRUE FALSE FALSE FALSE FALSE
6+
Relation_To other individuals (by Poseidon_ID) that are related/identical to this individual, multiple entries separated by ; String TRUE FALSE FALSE FALSE FALSE
7+
Relation_Degree relationship degree for relatives mentioned in Related_To, multiple values separated by ; in the same order as Related_To in case of multiple relations String TRUE TRUE FALSE identical;first;second;thirdToFifth;sixthToTenth;unrelated;other FALSE FALSE
8+
Relation_Type relationship type for relatives mentioned in Related_To as an arbitrary string (e.g. sister_of, child_of, nephew_of, ...), multiple values separated by ; in the same order as Related_To in case of multiple relations String TRUE FALSE FALSE FALSE FALSE
9+
Relation_Note arbitrary comments about the relations of this individual String FALSE FALSE FALSE FALSE FALSE
10+
Collection_ID id as defined by the provider/owner of a sample (e.g. grave 40 skeleton 2) String FALSE FALSE FALSE FALSE FALSE
11+
Country present-day political country String FALSE FALSE FALSE FALSE FALSE
12+
Location unspecified location information like administrative or topographic region or mountains/rivers/lakes/cities nearby String FALSE FALSE FALSE FALSE FALSE
13+
Site site name String FALSE FALSE FALSE FALSE FALSE
14+
Latitude latitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -90 90 FALSE FALSE
15+
Longitude longitude with up to 5 places after the decimal point Float FALSE FALSE TRUE -180 180 FALSE FALSE
16+
Date_Type “C14“ if directly from the individual, “contextual“ if based on archaeology or other C14 dates from the site, “modern” for present-day individuals String FALSE TRUE FALSE C14;contextual;modern FALSE FALSE
17+
Date_C14_Labnr labnr of C14 date, multiple values separated by ; in case of multiple dates String TRUE FALSE FALSE FALSE FALSE
18+
Date_C14_Uncal_BP uncalibrated years BP (as in before 1950AD), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE
19+
Date_C14_Uncal_BP_Err standard deviation (1 sigma ±), as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates Integer TRUE FALSE TRUE 0 Inf FALSE FALSE
20+
Date_BC_AD_Start lower (older) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
21+
Date_BC_AD_Median calibrated median age for C14 dates, or simple mid-points for archaeological intervals, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
22+
Date_BC_AD_Stop upper (more recent) bound for the age, negative numbers for BC, positive numbers for AD, in case of C14 dates 95% interval post calibration, 2000 for modern samples Integer FALSE FALSE TRUE -Inf 2050 FALSE FALSE
23+
Date_Note a free text field for arbitrary comments about the dating information String FALSE FALSE FALSE FALSE FALSE
24+
MT_Haplogroup mitochondrial haplogroup after phylotree.org as reported by Haplofind or Haplogrep String FALSE FALSE FALSE FALSE FALSE
25+
Y_Haplogroup Y-chromosome haplogroup reported as published, for internal data, please follow syntax with main branch + most terminal derived Y-SNP (e.g. R1b-P312) String FALSE FALSE FALSE FALSE FALSE
26+
Source_Tissue skeletal/tissue/source elements, specific bone name should be reported with an underscore (e.g. bone_phalanx), multiple values separated by ; in case of multiple libraries String TRUE FALSE FALSE FALSE FALSE
27+
Nr_Libraries number of libraries Integer FALSE FALSE FALSE FALSE FALSE
28+
Capture_Type specifics of data generation method, multiple values separated by ; String TRUE TRUE FALSE Shotgun;1240K;OtherCapture;ReferenceGenome FALSE FALSE
29+
UDG “mixed” in case multiple libraries with different UDG treatment were merged String FALSE TRUE FALSE minus;half;plus;mixed FALSE FALSE
30+
Library_Built “ds” for double stranded, “ss” for single stranded, “mixed” in case multiple libraries with different protocols were merged String FALSE TRUE FALSE ds;ss;other FALSE FALSE
31+
Genotype_Ploidy ploidy of the genotypes String FALSE TRUE FALSE diploid;haploid FALSE FALSE
32+
Data_Preparation_Pipeline_URL URL pointing to a description of the pipeline used to generate the genotype data from the source data String FALSE FALSE FALSE FALSE FALSE
33+
Endogenous % endogenous DNA as estimated from SG libraries (before capture), as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries report only the highest value Float FALSE FALSE TRUE 0 100 FALSE FALSE
34+
Nr_SNPs number of SNPs covered Integer FALSE FALSE FALSE FALSE FALSE
35+
Coverage_on_Target_SNPs average X-fold coverage across targeted SNP sites after quality filtering (internal data) Float FALSE FALSE FALSE FALSE FALSE
36+
Damage % damage on 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries report a value from the merged read alignment Float FALSE FALSE TRUE 0 100 FALSE FALSE
37+
Contamination (modern) contamination as measured by the method in Contamination_Measure, multiple values can be separated by ;, Contamination_Err, Contamination_Meas and Contamination_Note must have the same number and order of entries, in case of multiple libraries report a value from the merged read alignment String TRUE FALSE FALSE FALSE FALSE
38+
Contamination_Err (modern) contamination estimate error String TRUE FALSE FALSE FALSE FALSE
39+
Contamination_Meas method to measure contamination, should be a software tool (ANGSD, Schmutzi, …) and the respective software versions, details should go to Contamination_Note String TRUE FALSE FALSE FALSE FALSE
40+
Contamination_Note arbitrary comments about the contamination estimate String FALSE FALSE FALSE FALSE FALSE
41+
Genetic_Source_Accession_IDs ENA or SRA Accession ID(s) pointing to the source data used to generate the genotyping data, if multiple are given they should be arranged by descending specificity (e.g. project id > sample id > sequencing run id) String TRUE FALSE FALSE FALSE FALSE
42+
Primary_Contact Project lead or first author String FALSE FALSE FALSE FALSE FALSE
43+
Publication bibtex key (e.g. “AuthorJournalYear“) or “unpublished“ String TRUE FALSE FALSE FALSE FALSE
44+
Note wildcard comments, e.g. note down aneuploidies here String FALSE FALSE FALSE FALSE FALSE
45+
Keywords arbitrary tags separated by ; (e.g. for custom sorting purposes) String TRUE FALSE FALSE FALSE FALSE
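
The schema columns in this file (data_type, multi, choice, range, choice_options, range_lower, range_upper) read like validation rules for individual .janno cells. The Python sketch below shows one way such rules could be applied. It is not the Poseidon tooling: `ColumnSpec`, `SPECS` and `validate_cell` are hypothetical names, only four columns are copied over from the added lines, range bounds are assumed to be inclusive, and the handling of missing values is left out because the diff does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical in-memory form of one row of janno_columns.tsv."""
    name: str
    data_type: str                              # "String", "Char", "Integer" or "Float"
    multi: bool = False                         # TRUE means several values separated by ;
    choices: Optional[Tuple[str, ...]] = None   # from choice_options, if choice is TRUE
    lower: float = float("-inf")                # from range_lower, if range is TRUE
    upper: float = float("inf")                 # from range_upper, if range is TRUE


# Four example rows, with values copied from the added (+) lines of the diff.
SPECS = {
    "Genetic_Sex": ColumnSpec("Genetic_Sex", "Char", choices=("F", "M", "U")),
    "Latitude": ColumnSpec("Latitude", "Float", lower=-90, upper=90),
    "Date_C14_Uncal_BP": ColumnSpec("Date_C14_Uncal_BP", "Integer", multi=True, lower=0),
    "Relation_Degree": ColumnSpec(
        "Relation_Degree", "String", multi=True,
        choices=("identical", "first", "second", "thirdToFifth",
                 "sixthToTenth", "unrelated", "other"),
    ),
}


def validate_cell(spec: ColumnSpec, raw: str) -> List[str]:
    """Return a list of problems for one cell; an empty list means it passed."""
    values = raw.split(";") if spec.multi else [raw]
    problems = []
    for value in values:
        # Choice columns: the value must be one of the listed options.
        if spec.choices is not None and value not in spec.choices:
            problems.append(f"{spec.name}: {value!r} is not an allowed option")
            continue
        # Numeric columns: parse first, then check the (assumed inclusive) range.
        if spec.data_type in ("Integer", "Float"):
            try:
                number = int(value) if spec.data_type == "Integer" else float(value)
            except ValueError:
                problems.append(f"{spec.name}: {value!r} is not a valid {spec.data_type}")
                continue
            if not spec.lower <= number <= spec.upper:
                problems.append(f"{spec.name}: {number} is outside the allowed range")
    return problems


if __name__ == "__main__":
    print(validate_cell(SPECS["Latitude"], "91"))
    print(validate_cell(SPECS["Relation_Degree"], "first;cousin"))
```

Returning a list of messages instead of raising on the first failure lets a caller collect every problem in a multi-value cell in one pass.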
