2
-
poseidonVersion	0		poseidon v.2 package version (e.g. 2.0.1)	String		TRUE	TRUE
2
+
poseidonVersion	0		Poseidon package format version (e.g. 2.0.1)	String		TRUE	TRUE
3
3
title	0		title of the package	String		TRUE	TRUE
4
4
description	0		some descriptive words about the package	String		FALSE	TRUE
5
-
contributor	0		list of contributors to the package (not the data producer/publication author, but the poseidon package creator), each with name and email	Array		TRUE	TRUE
5
+
contributor	0		list of contributors to the package (not the data producer/publication author, but the Poseidon package creator), each with name and email	Array		TRUE	TRUE
6
6
name	1	contributor	name of one contributor	String		TRUE	FALSE
7
7
email	1	contributor	email of one contributor (must be a valid email address)	String	Email	TRUE	FALSE
8
-
packageVersion	0		version of the package (should be changed/incremented when the package is changed)	String		FALSE	TRUE
9
-
lastModified	0		date of last modification of the poseidon package (should be updated when the package is changed)	Date	YYYY-MM-DD	FALSE	TRUE
10
-
bibFile	0		file name (.bib)	String	Path	FALSE	TRUE
8
+
packageVersion	0		version of the package (should be changed/incremented when the package is changed)	String		TRUE	TRUE
9
+
lastModified	0		date of last modification of the Poseidon package (should be updated when the package is changed)	Date	YYYY-MM-DD	TRUE	TRUE
11
10
genotypeData	0		genotype file name section			TRUE	TRUE
12
-
format	1	genotypeData	file format definition, only allows PLINK right now	String		TRUE	TRUE
13
-
genoFile	1	genotypeData	file name (.bed)	String	Path	TRUE	TRUE
14
-
snpFile	1	genotypeData	file name (.bim)	String	Path	TRUE	TRUE
15
-
indFile	1	genotypeData	file name (.fam)	String	Path	TRUE	TRUE
16
-
jannoFile	0		file name (.janno)	String	Path	TRUE	TRUE
11
+
format	1	genotypeData	file format definition, allows EIGENSTRAT and PLINK	String		TRUE	TRUE
12
+
genoFile	1	genotypeData	relative path to genoFile	String	Path	TRUE	TRUE
13
+
genoFileChkSum	1	genotypeData	md5 checksum of the genoFile	String		FALSE	TRUE
14
+
snpFile	1	genotypeData	relative path to snpFile	String	Path	TRUE	TRUE
15
+
snpFileChkSum	1	genotypeData	md5 checksum of the snpFile	String		FALSE	TRUE
16
+
indFile	1	genotypeData	relative path to indFile	String	Path	TRUE	TRUE
17
+
indFileChkSum	1	genotypeData	md5 checksum of the indFile	String		FALSE	TRUE
18
+
jannoFile	0		relative path to jannoFile	String	Path	FALSE	TRUE
19
+
jannoFileChkSum	1	genotypeData	md5 checksum of the jannoFile	String		FALSE	TRUE
20
+
bibFile	0		relative path to bibFile	String	Path	FALSE	TRUE
21
+
bibFileChkSum	1	genotypeData	md5 checksum of the bibFile	String		FALSE	TRUE
22
+
readmeFile	0		relative path to readmeFile	String	Path	FALSE	TRUE
23
+
changelogFile	0		relative path to changelogFile	String	Path	FALSE	TRUE
Poseidon v.2 is a solution for genotype data organisation established within the Department of Archaeogenetics at the Max Planck Institute for the Science of Human History (MPI-SHH) in Jena.
3
+
Poseidon is a solution for genotype data organisation established within the Department of Archaeogenetics at the Max Planck Institute for the Science of Human History (MPI-SHH) in Jena.
The `POSEIDON.yml` file lists metainformation in a standardized, machine-readable format.
47
+
The `POSEIDON.yml` file lists relative file paths and metainformation in a standardized, machine-readable format.
51
48
52
-
-The `POSEIDON.yml` file must be a valid [YAML file](https://yaml.org/).
53
-
-The fields of the `POSEIDON.yml` file are documented in the [POSEIDON_yml_fields.tsv file](https://github.com/poseidon-framework/poseidon2-schema/blob/master/POSEIDON_yml_fields.tsv) in this repository.
49
+
- It must be a valid [YAML file](https://yaml.org/).
50
+
- Its fields are documented in the [POSEIDON_yml_fields.tsv file](https://github.com/poseidon-framework/poseidon2-schema/blob/master/POSEIDON_yml_fields.tsv) in this repository.
54
51
55
52
Example:
56
53
57
54
```
58
-
poseidonVersion: 2.0.1
59
-
title: Schiffels_2016
60
-
description: Genetic data published in Schiffels et al. 2016
55
+
poseidonVersion: 2.0.2
56
+
title: Switzerland_LNBA_Roswita
57
+
description: LNBA Switzerland genetic data not yet published # optional
When a package is modified in any way (e.g. updates of the context information in the `.janno` file), then the `packageVersion` field should be incremented and the `lastModified` field updated to the current date.
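The update rule for `packageVersion` and `lastModified` can be sketched in a few lines of Python. This is only an illustration: it assumes a three-part `MAJOR.MINOR.PATCH` version string (the changelog example uses values such as `1.1.1` and `1.2.0`, but the schema does not prescribe a scheme), and the helper names `bump_version` and `updated_metadata` are hypothetical.

```python
from datetime import date


def bump_version(version, part="patch"):
    """Increment one component of a dotted MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def updated_metadata(metadata, part="patch", today=None):
    """Return a copy of the metadata with packageVersion bumped and lastModified refreshed."""
    today = today or date.today()
    updated = dict(metadata)
    updated["packageVersion"] = bump_version(metadata["packageVersion"], part)
    updated["lastModified"] = today.isoformat()  # YYYY-MM-DD, as the field table requires
    return updated
```

Which component to increment for which kind of change is left to the package maintainer here; the sketch only shows that both fields change together.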
78
82
79
-
### The `X.janno` file [mandatory]
80
-
81
-
The `.janno` file is a UTF-8 encoded, tab-separated text file with a header line. It holds a clearly defined set of context information (columns) for each sample (rows) in a package.
82
-
83
-
- The variables (columns), variable types and possible content of the janno file are documented in the [janno_columns.tsv file](https://github.com/poseidon-framework/poseidon2-schema/blob/master/janno_columns.tsv) in this repository.
84
-
- A `.janno` file must have all of these columns in exactly this order with exactly these column names.
85
-
- If information is unknown or a variable does not apply for a certain sample, then the respective cell(s) can be filled with the NULL value `n/a`. Ideally, a `.janno` file should have the least number of n/a-values possible.
86
-
- The order of the samples (rows) in the `.janno` file must be equal to the order in the files that hold the genetic data.
87
-
- The values in the columns **Individual_ID** and **Group_Name** must be equal to the terms used in the first and second column of the `.fam` file.
88
-
- Multiple columns of the `.janno` file are list columns that hold multiple values (either strings or numerics) separated by `;`
89
-
- The decimal separator for all floating point numbers is `.`
83
+
### Genotype data
90
84
91
-
### The `X.bed`, `X.bim`, `X.fam` files [mandatory]
85
+
Genotype data in Poseidon packages is stored either in PLINK (binary) or EIGENSTRAT format.
The README.txt file contains arbitrary, human-readable information.
98
-
99
-
Example:
95
+
The `.janno` file is a tab-separated text file with a header line. It holds a clearly defined set of context information (columns) for each sample (rows) in a package.
100
96
101
-
```
102
-
This package contains a rather interesting set of samples.
103
-
@Uebertruplf_2021 even claimed that they are the most important for this particular area and time period.
104
-
```
97
+
- The variables (columns), variable types and possible content of the janno file are documented in the [janno_columns.tsv file](https://github.com/poseidon-framework/poseidon2-schema/blob/master/janno_columns.tsv) in this repository.
98
+
- A `.janno` file must have all of these columns in exactly this order with exactly these column names.
99
+
- If information is unknown or a variable does not apply for a certain sample, then the respective cell(s) can be filled with the NULL value `n/a`.
100
+
- The order of the samples (rows) in the `.janno` file must be equal to the order in the files that hold the genetic data.
101
+
- The values in the columns **Individual_ID** and **Group_Name** must be equal to the terms used in the genetic data files.
102
+
- Multiple columns of the `.janno` file are list columns that hold multiple values (either strings or numerics) separated by `;`.
103
+
- The decimal separator for all floating point numbers is `.`.
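The rules in this list can be illustrated with a small Python sketch that reads a `.janno` file using the standard `csv` module. It is a sketch under assumptions, not the official parser: the function names are hypothetical, the caller must say which columns are list columns (the authoritative set is in `janno_columns.tsv`), and values are kept as strings rather than converted to numbers.

```python
import csv
import io

NULL_VALUE = "n/a"
LIST_SEPARATOR = ";"


def parse_cell(raw, is_list):
    """Convert one raw .janno cell: `n/a` becomes None, list cells are split on `;`."""
    if raw is None or raw == NULL_VALUE:
        return None
    if is_list:
        return [item.strip() for item in raw.split(LIST_SEPARATOR)]
    return raw


def read_janno(handle, list_columns=()):
    """Read tab-separated .janno text into one dict per sample, keeping row order."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {name: parse_cell(value, name in list_columns) for name, value in row.items()}
        for row in reader
    ]


def ids_match(janno_rows, genotype_ids):
    """Check that Individual_ID values appear in the same order as in the genotype data."""
    return [row["Individual_ID"] for row in janno_rows] == list(genotype_ids)


# "Example_List" is a made-up column name used only for this illustration.
sample = "Individual_ID\tGroup_Name\tExample_List\nS1\tPopA\t1.5;2.5\nS2\tPopB\tn/a\n"
rows = read_janno(io.StringIO(sample), list_columns={"Example_List"})
```

Because the decimal separator is always `.`, numeric strings such as `"1.5"` can be passed to `float()` directly when a numeric column is needed.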
105
104
106
-
### The `CHANGELOG.txt` file[optional]
105
+
### The `.bib` file
107
106
108
-
Documentation of important changes in the history of a package.
107
+
[BibTeX](http://www.bibtex.org/) file with all references listed in the `.janno` file. The BibTeX keys must match the ones used in the `.janno` file.
109
108
110
109
Example:
111
110
112
111
```
113
-
- 2021_10_01: Fixed a spelling mistake in the site name "Hosenacker"->"Rosenacker".
114
-
- 2021_05_05: The authors of @Gassenhauer_2021 made some previously restricted samples for their publication available later and we added them.
publisher = {Proceedings of the National Academy of Sciences},
118
+
volume = {113},
119
+
number = {2},
120
+
pages = {368--373},
121
+
author = {Lara M. Cassidy and Rui Martiniano and Eileen M. Murphy and Matthew D. Teasdale and James Mallory and Barrie Hartwell and Daniel G. Bradley},
122
+
title = {Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome},
123
+
journal = {Proceedings of the National Academy of Sciences}
124
+
}
116
125
```
117
126
118
-
### The `LITERATURE.bib` file [optional]
119
-
120
-
Bibtex file with all references mentioned in `POSEIDON.yml`, `README.txt` and `CHANGELOG.txt`
121
-
122
-
***
123
-
124
-
## Naming Poseidon v.2 `package`s
127
+
### The `README.txt` file
125
128
126
-
The naming of packages should follow a simple scheme:
129
+
Informal information accompanying the package.
127
130
128
-
Ancient published: YEAR_NAME_IDENTIFIER
129
-
130
-
```
131
-
2018_Lamnidis_Fennoscandia
132
-
2019_Wang_Caucasus
133
-
2019_Flegontov_PaleoEskimo
134
-
```
135
-
136
-
Ancient unpublished: IDENTIFIER_NAME
131
+
Example:
137
132
138
133
```
139
-
Switzerland_LNBA_Roswita
140
-
Italy_Mesolithic_Paul
141
-
SouthEastAsia_Simon
134
+
This package contains a rather interesting set of samples relevant for the peopling of the Territory of Christmas Island in the Indian Ocean. We consider this especially relevant, because ...
142
135
```
143
136
144
-
Modern published: YEAR_(NAME)_IDENTIFIER
137
+
### The `CHANGELOG.txt` file
145
138
146
-
```
147
-
2015_1000_Genomes-1240K_haploid_pulldown
148
-
2016_Mallick_SGDP1240K_diploid_pulldown
149
-
2014_Lazaridis_HOmodern
150
-
2016_Lazaridis_HOmodern
151
-
2019_Flegontov_HO_NewSiberian
152
-
2018_Lipson_SEA
153
-
```
139
+
Documentation of important changes in the history of a package.
154
140
155
-
Modern unpublished: IDENTIFIER_NAME
141
+
Example:
156
142
157
143
```
158
-
Eurasia_newHO_Paul
159
-
Afrika_newHO_Andrea
160
-
```
161
-
162
-
Identifiers can be somewhat informal as long as the project is ongoing, they just need to be unique. As soon as a project gets published, we create a final version of the respective package with the YEAR_NAME_IDENTIFIER label.
163
-
164
-
External projects can be integrated similarly by using their publication name, or by temporary internal identifiers such as `Iron_Age_Boston_Share`.
144
+
## 1.2.0
145
+
- Fixed a spelling mistake in the site name "Hosenacker"->"Rosenacker".
165
146
166
-
***
147
+
## 1.1.1
148
+
- Added mtDNA contamination estimation to .janno file
167
149
168
-
## DAG internal procedures
150
+
## 1.1.0
151
+
- The authors of @Gassenhauer_2021 made some previously restricted samples for their publication available later and we added them.
169
152
170
-
Individual contributors would create packages in dedicated poseidon folders in their user project directories, e.g. `/project1/user/xyz/poseidon/2018_Lamnidis_Fennoscandia`. That way, subfolders belong to individual maintainers and are writable only by them.
171
-
172
-
The poseidon admins would then link these packages into the official `/projects1/poseidon` repo, located on the HPC storage unit of the MPI-SHH, where we distinguish ancient and modern genotype data: