
Adding Clearinghouse documentation #175


Open: wants to merge 36 commits into base: master
36 commits:

d5e6b61 added tag_querying documentation for the new tagging functionality. (Woolly-at-EBI, Oct 13, 2023)
8d30ebe ignoring ../.vscode/ (Woolly-at-EBI, Oct 17, 2023)
e93de1b Created initial markdown for the sample checklist introductions and b… (Woolly-at-EBI, Jan 16, 2024)
e6a1f9f Created initial markdown for the sample checklist MIxS_V6.2 update (Woolly-at-EBI, Jan 16, 2024)
60cb778 added sample checklists docs (Woolly-at-EBI, Jan 16, 2024)
95578cd Multiple edits to improve the clarity. (Woolly-at-EBI, Jan 17, 2024)
aba8bed fixed typos and formatting errors (Woolly-at-EBI, Jan 17, 2024)
3c22989 fixed table (Woolly-at-EBI, Jan 17, 2024)
c8075e0 initial clearinghouse_submission_template.sh (Woolly-at-EBI, Jan 19, 2024)
6213e65 initial clearinghouse_rtd (Woolly-at-EBI, Jan 19, 2024)
2c43168 added images and use cases to clearinghouse_rtd (Woolly-at-EBI, Jan 19, 2024)
cc4c761 Update clearinghouse_for_ENA_users.md (z-w123, Jan 26, 2024)
6ac19aa Update clearinghouse_for_ENA_users.md (z-w123, Jan 26, 2024)
7f1afe0 Update clearinghouse_for_ENA_users.md (z-w123, Jan 26, 2024)
b5a357a Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
6bfa430 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
16ef64e Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
9eff519 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
db90635 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
1a5c86b Add files via upload (z-w123, Jan 30, 2024)
e54f24d Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
81bd4a9 update covid curation image (z-w123, Jan 30, 2024)
5614406 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
ea8fbd8 Add files via upload (z-w123, Jan 30, 2024)
a19c123 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
94f8f5c Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
5f80fd8 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
0e00543 Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
d3d2fad Update clearinghouse_for_ENA_users.md (z-w123, Jan 30, 2024)
fd23f3a Update clearinghouse_for_ENA_users.md (z-w123, Jan 31, 2024)
fe74e90 Update clearinghouse_for_ENA_users.md (z-w123, Jan 31, 2024)
097a1cd Update clearinghouse_for_ENA_users.md (z-w123, Jan 31, 2024)
3878ec4 Update clearinghouse_for_ENA_users.md (z-w123, Jan 31, 2024)
86148fe Merge pull request #1 from z-w123/patch-5 (Woolly-at-EBI, Feb 2, 2024)
53580fd Update clearinghouse_submission_template.sh (Woolly-at-EBI, Feb 2, 2024)
a330123 Add files via upload (z-w123, Feb 22, 2024)
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
.idea
.DS_Store
.vscode/
5 changes: 5 additions & 0 deletions retrieval/programmatic-access/tag_querying.rst
@@ -258,3 +258,8 @@ Miscellaneous
The tags are all less than 21 Unicode characters in length.

N.B. The tags described in this page are not to be confused with Locus Tags.
405 changes: 405 additions & 0 deletions retrieval/programmatic-access/tag_querying_highleveltable.rst


262 changes: 262 additions & 0 deletions retrieval/programmatic-access/tag_querying_works.rst
@@ -0,0 +1,262 @@
========================
Text Tags On ENA Objects
========================

-----------------
Table of Contents
-----------------

* What are Tags and Why are they Useful?
* How many Tags can an Object Possess?
* What Tags are Available?
* How are the Tags Created?
* Miscellaneous

.. _my-reference-label:

--------------------------------------
What are Tags and Why are they Useful?
--------------------------------------
Tags are controlled textual annotations attached to ENA objects, such as samples and taxonomy records.

The purpose of tags is to make searching and filtering much easier.

Examples:

* Find all pathogen-related samples by using the “pathogen” tag (this tag drives the data coverage of the `Pathogens Portal <https://www.pathogensportal.org>`_).
* Use the “marine:high_confidence” tag to find all samples that are highly likely to come from a marine environment.
* Find all ENA records that have a corresponding record cross-referenced in `WoRMS - World Register of Marine Species <https://www.marinespecies.org/>`_ by searching for the “xref:worms” tag.

The tagging system has proved useful for determining which objects belong to domain-specific data portals, such as the Pathogens Portal. Conversely, tags make it easy to extract subsets of data from which a new data portal can be built rapidly.
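
Searches like these can also be scripted. The sketch below only builds a request URL and sends nothing; it assumes that the ENA Portal API search endpoint (``https://www.ebi.ac.uk/ena/portal/api/search``) accepts a query of the form ``tag="..."`` and can return a ``tag`` field. The helper name ``tag_search_url`` and its default fields are hypothetical, so check them against the Portal API documentation before relying on them.

.. code-block:: python

   from urllib.parse import urlencode

   # Assumed ENA Portal API search endpoint; verify against the API documentation.
   PORTAL_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


   def tag_search_url(tag, result="sample", fields=("sample_accession", "tag"), limit=10):
       """Return a URL that searches one result type for objects carrying ``tag``.

       Hypothetical helper: assumes the Portal API accepts a query of the
       form tag="<tag>" and can return a "tag" field.
       """
       params = {
           "result": result,
           "query": f'tag="{tag}"',
           "fields": ",".join(fields),
           "limit": limit,
           "format": "tsv",
       }
       # urlencode percent-encodes the quotes, '=' and ':' in the query value.
       return f"{PORTAL_SEARCH_URL}?{urlencode(params)}"


   print(tag_search_url("marine:high_confidence"))
   print(tag_search_url("xref:worms", result="taxon", fields=("tax_id", "tag")))

The resulting URL can be opened in a browser or passed to any HTTP client; the second call changes ``fields`` because sample-specific fields do not apply to other result types.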

------------------------------------
How many Tags can an Object Possess?
------------------------------------
An object, such as a sample, can have zero, one or multiple tags.

For example, a sample could be tagged as both “marine:high_confidence” and “terrestrial:low_confidence”.

------------------------
What Tags are Available?
------------------------

Most sample and taxonomy tags have the format ``high_level_tag:low_level_tag``. The high-level tag provides context for the more granular low-level tag.
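
Because tags follow this pattern and an object may carry several of them, a client can split and group them with plain string handling. A minimal sketch, assuming an object's tags are available as a list of strings; ``split_tag`` and ``group_by_high_level`` are hypothetical names.

.. code-block:: python

   def split_tag(tag):
       """Split "high_level_tag:low_level_tag" into its two parts.

       A tag without a colon (for example "pathogen") gets an empty
       low-level part.
       """
       high, _, low = tag.partition(":")
       return high, low


   def group_by_high_level(tags):
       """Map each high-level tag to the low-level tags seen with it."""
       groups = {}
       for tag in tags:
           high, low = split_tag(tag)
           low_tags = groups.setdefault(high, [])  # bare high-level tags still get a key
           if low:
               low_tags.append(low)
       return groups


   print(group_by_high_level(["marine:high_confidence", "terrestrial:low_confidence"]))
   # {'marine': ['high_confidence'], 'terrestrial': ['low_confidence']}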


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Table of Object High Level Tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. csv-table:: High Level Tags
   :header: "high level tag", "description", "object type"
   :widths: 20, 300, 50

   "pathogen", "The object has been automatically determined to belong to the Pathogens Portal.", "assembly; sample; sequence; study; secondary_study; taxonomy"
   "coastal_brackish", "The object has been automatically determined, by evaluating GPS coordinates and other parameters, to have some evidence of being collected from a coastal or brackish environment.", "read_run; sample; taxonomy"
   "freshwater", "The object has been automatically determined, by evaluating GPS coordinates and other parameters, to have some evidence of being collected from a freshwater environment.", "read_run; sample; taxonomy"
   "marine", "The object has been automatically determined, by evaluating GPS coordinates and other parameters, to have some evidence of being collected from a marine environment.", "read_run; sample; taxonomy"
   "terrestrial", "The object has been automatically determined, by evaluating GPS coordinates and other parameters, to have some evidence of being collected from a terrestrial environment.", "read_run; sample; taxonomy"
   "datahub", "The object has been automatically determined to belong to a datahub. Currently tags have been generated for the `FAANG <https://data.faang.org/home>`_ and `Pathogen <https://www.pathogensportal.org/datahubs>`_ datahubs.", "analysis; read_run; sample; secondary_study"
   "xref", "The object is cross-referenced by a record in an external repository. Currently tags have been generated for WoRMS and UniEuk.", "Depends on how the object was submitted"
   "covid19", "The object has been automatically determined to belong to the COVID-19 Data Portal.", "analysis; read_run; sample; sequence; study"



^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Table of All Object High and Low Level Tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These tags are applied to several types of objects, especially samples and taxonomy records. See the object type column of the high-level tag table above for the objects each tag applies to.
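
The object type column can also be kept as a lookup table, for example to check whether a tag is expected on a given kind of record before querying for it. The sketch below transcribes the high-level tag table; ``HIGH_LEVEL_OBJECT_TYPES`` and ``may_carry`` are hypothetical names, and ``xref`` is deliberately left out because its object types depend on how the object was submitted.

.. code-block:: python

   from typing import Optional

   # Object types per high-level tag, transcribed from the "High Level Tags" table.
   # "xref" is omitted: its object types depend on how the object was submitted.
   HIGH_LEVEL_OBJECT_TYPES = {
       "pathogen": {"assembly", "sample", "sequence", "study", "secondary_study", "taxonomy"},
       "coastal_brackish": {"read_run", "sample", "taxonomy"},
       "freshwater": {"read_run", "sample", "taxonomy"},
       "marine": {"read_run", "sample", "taxonomy"},
       "terrestrial": {"read_run", "sample", "taxonomy"},
       "datahub": {"analysis", "read_run", "sample", "secondary_study"},
       "covid19": {"analysis", "read_run", "sample", "sequence", "study"},
   }


   def may_carry(tag: str, object_type: str) -> Optional[bool]:
       """Return whether objects of this type can carry the tag's high-level tag.

       Returns None when the high-level tag is not in the table (e.g. "xref").
       """
       high = tag.partition(":")[0]
       allowed = HIGH_LEVEL_OBJECT_TYPES.get(high)
       if allowed is None:
           return None
       return object_type in allowed


   print(may_carry("marine:high_confidence", "read_run"))
   print(may_carry("pathogen:virus", "read_run"))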

.. list-table:: Object High and Low Level Tags
   :widths: 15 10 30 10 10
   :header-rows: 1

   * - tag presentation
     - high level tag
     - low level tag
     - description
     - comment
   * - pathogen
     - pathogen
     -
     - Object is some type of pathogen.
     - There will likely be other low level tags to provide context.
   * - pathogen:priority
     - pathogen
     - priority
     -
     -
   * - pathogen:bacterium
     - pathogen
     - bacterium
     - Object is of a bacterial organism.
     - At the time of writing, the bacterium is not necessarily pathogenic.
   * - pathogen:fungus
     - pathogen
     - fungus
     - Object is of a fungal organism.
     - At the time of writing, the fungus is not necessarily pathogenic.
   * - pathogen:helminth
     - pathogen
     - helminth
     - Object is of a helminth organism.
     - At the time of writing, the helminth is not necessarily pathogenic.
   * - pathogen:protozoan
     - pathogen
     - protozoan
     - Object is of a protozoan organism.
     - At the time of writing, the protozoan is not necessarily pathogenic.
   * - pathogen:virus
     - pathogen
     - virus
     - Object is of a virus.
     - At the time of writing, the virus is not necessarily pathogenic.
   * - coastal_brackish
     - coastal_brackish
     -
     - Some evidence that the object is associated with a “coastal or brackish” environment.
     - There will likely be other low level tags to provide context.
   * - coastal_brackish:high_confidence
     - coastal_brackish
     - high_confidence
     - Strong evidence that the object is associated with a “coastal or brackish” environment.
     -
   * - coastal_brackish:medium_confidence
     - coastal_brackish
     - medium_confidence
     - Moderate evidence that the object is associated with a “coastal or brackish” environment.
     -
   * - coastal_brackish:low_confidence
     - coastal_brackish
     - low_confidence
     - Weak evidence that the object is associated with a “coastal or brackish” environment.
     -
   * - freshwater
     - freshwater
     -
     - Some evidence that the object is associated with a “freshwater” environment.
     - There will likely be other low level tags to provide context.
   * - freshwater:high_confidence
     - freshwater
     - high_confidence
     - Strong evidence that the object is associated with a freshwater environment.
     -
   * - freshwater:medium_confidence
     - freshwater
     - medium_confidence
     - Moderate evidence that the object is associated with a freshwater environment.
     -
   * - freshwater:low_confidence
     - freshwater
     - low_confidence
     - Weak evidence that the object is associated with a freshwater environment.
     -
   * - marine
     - marine
     -
     - Some evidence that the object is associated with a “marine” environment.
     - There will likely be other low level tags to provide context.
   * - marine:high_confidence
     - marine
     - high_confidence
     - Strong evidence that the object is associated with a marine environment.
     -
   * - marine:medium_confidence
     - marine
     - medium_confidence
     - Moderate evidence that the object is associated with a marine environment.
     -
   * - marine:low_confidence
     - marine
     - low_confidence
     - Weak evidence that the object is associated with a marine environment.
     -
   * - terrestrial
     - terrestrial
     -
     - Some evidence that the object is associated with a terrestrial (land) environment.
     - There will likely be other low level tags to provide context.
   * - terrestrial:high_confidence
     - terrestrial
     - high_confidence
     - Strong evidence that the object is associated with a terrestrial (land) environment.
     -
   * - terrestrial:medium_confidence
     - terrestrial
     - medium_confidence
     - Moderate evidence that the object is associated with a terrestrial (land) environment.
     -
   * - terrestrial:low_confidence
     - terrestrial
     - low_confidence
     - Weak evidence that the object is associated with a terrestrial (land) environment.
     -
   * - datahub:faang
     - datahub
     - faang
     - Object is a `Functional Annotation of ANimal Genomes project (FAANG) <https://data.faang.org/home>`_ sample and is present in that datahub.
     -
   * - datahub:metagenome
     - datahub
     - metagenome
     - Object is a metagenome and is present in that datahub.
     -
   * - xref:arrayexpress
     - xref
     - arrayexpress
     - Object associated with an `ArrayExpress <https://www.ebi.ac.uk/biostudies/arrayexpress>`_ record.
     - An xref is available that links to ArrayExpress.
   * - xref:europepmc
     - xref
     - europepmc
     - Object associated with a `Europe PMC <https://europepmc.org>`_ record.
     - An xref is available that links to Europe PMC.
   * - xref:pubmed
     - xref
     - pubmed
     - Object associated with an `NCBI PubMed <https://pubmed.ncbi.nlm.nih.gov>`_ record.
     - An xref is available that links to NCBI PubMed.
   * - xref:worms
     - xref
     - worms
     - Object associated with a `WoRMS <https://www.marinespecies.org/>`_ record.
     - An xref is available that links to WoRMS.
   * - xref:unieuk
     - xref
     - unieuk
     - Object associated with a `UniEuk (Universal taxonomic framework and integrated reference gene databases for Eukaryotic biology, ecology, and evolution) <https://unieuk.net>`_ record.
     - An xref is available that links to UniEuk.
   * - covid19
     -
     - covid19
     - Object associated with COVID-19.
     -
   * - covid19Host
     -
     - covid19Host
     - Object associated with a COVID-19 host.
     -
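
The confidence levels in the table form an ordered scale, so a client can keep only objects tagged at or above a chosen threshold. A minimal sketch, assuming an object's tags are available as a list of strings; ``best_confidence`` and ``meets_confidence`` are hypothetical names, and the numeric ranks are only a local ordering, not values published by ENA.

.. code-block:: python

   # Local ordering of the confidence low-level tags listed in the table above.
   CONFIDENCE_RANK = {"low_confidence": 1, "medium_confidence": 2, "high_confidence": 3}
   ENVIRONMENTS = {"coastal_brackish", "freshwater", "marine", "terrestrial"}


   def best_confidence(tags, environment):
       """Return the highest confidence rank recorded for an environment (0 if none)."""
       best = 0
       for tag in tags:
           high, _, low = tag.partition(":")
           if high == environment:
               best = max(best, CONFIDENCE_RANK.get(low, 0))
       return best


   def meets_confidence(tags, environment, minimum="medium_confidence"):
       """True if the tags place the object in ``environment`` at ``minimum`` or better."""
       if environment not in ENVIRONMENTS:
           raise ValueError(f"unknown environment tag: {environment!r}")
       return best_confidence(tags, environment) >= CONFIDENCE_RANK[minimum]


   sample_tags = ["marine:high_confidence", "terrestrial:low_confidence"]
   print(meets_confidence(sample_tags, "marine"))       # True
   print(meets_confidence(sample_tags, "terrestrial"))  # False

Lowering ``minimum`` to ``"low_confidence"`` trades precision for recall, which mirrors the difference between searching for ``marine:high_confidence`` and searching for any ``marine`` tag.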

-------------------------
How are the Tags Created?
-------------------------

Tags are typically assigned by automatic processes that analyse the user-supplied metadata of an object.

For example, “marine” sample records are identified systematically by combining geo-coordinates with taxonomic evidence. Each identification is further qualified by a confidence level, which is determined by the combination of evidence on the record that supports it.

This process evolves continuously: the classification algorithms and rule sets can be updated as new insights are obtained, so the assigned tags are refreshed regularly. The system is also flexible enough that new classifications can be created easily, defining new high-level contextual groupings of ENA data that make discovery more intuitive for particular user communities.


-------------
Miscellaneous
-------------

The tags are all less than 21 Unicode characters in length.
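
Full tags in the table, such as “coastal_brackish:medium_confidence”, are longer than 20 characters, so this limit most plausibly applies to each high-level and low-level part separately. The sketch below checks a tag under that assumption; ``tag_problems`` is a hypothetical helper, not an ENA validation rule.

.. code-block:: python

   # Limit taken from the text above ("less than 21 Unicode characters"),
   # assumed here to apply to each colon-separated component.
   MAX_COMPONENT_LENGTH = 20


   def tag_problems(tag):
       """Return a list of problems found in a tag string (empty if it looks valid)."""
       problems = []
       parts = tag.split(":")
       if len(parts) > 2:
           problems.append("more than one ':' separator")
       for part in parts:
           if not part:
               problems.append("empty component")
           elif len(part) > MAX_COMPONENT_LENGTH:
               # len() counts Unicode code points, not bytes.
               problems.append(f"component {part!r} has {len(part)} characters")
       return problems


   print(tag_problems("coastal_brackish:medium_confidence"))  # []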

N.B. The tags described on this page are not to be confused with Locus Tags.


1 change: 1 addition & 0 deletions submit/annotations/bearer_file
@@ -0,0 +1 @@