
Commit a450c95

Author: Kenneth Daily
Merge pull request #344 from Sage-Bionetworks/develop
Develop
2 parents 9ee4515 + a879140 commit a450c95


8 files changed, +403 -80 lines changed

CONTRIBUTING.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
## About this project
2+
3+
Welcome! This project is for managing annotations and controlled vocabularies for use in Synapse. These have been developed with communities supported by Sage Bionetworks in mind, but are not restricted to that use. If you want the files in your projects to be discoverable in the same fashion as the Sage-supported communities, then feel free to use these!
4+
5+
## Contributing
6+
7+
This contributing document focuses on the guidelines for users related to the Sage Bionetworks supported communities - that is to say Sage Bionetworks employees and members of the communities who are responsible for metadata and annotations.
8+
9+
## Guidelines for proposing new terms
10+
11+
Our strategy is to rely on annotation terms and definitions that have already been made and standardized whenever possible for use with Sage Bionetworks supported communities. In general, we will not include terms in this repository that are not needed and vetted by our communities - but don't let that stop you from using this! Feel free to fork and include terminology that you require for your own use.
12+
13+
If you are proposing a new term, then we require a source for the definition. The first place to look for an existing term is the [EMBL-EBI Ontology Lookup Service](https://www.ebi.ac.uk/ols). We have some preferred ontology term sources: EDAM, EFO, OBI, and NCIT. It's OK if your term comes from another source, but use the preferred sources whenever possible. If your term does not currently exist, or has a different definition than existing ones, then:
14+
15+
1. Provide a different source URL - this may be a Wikipedia entry, link to a commercial web site, or other URL.
16+
1. If you are a Sage Bionetworks employee and cannot find a source URL, then use "Sage Bionetworks" as the `source` and your own definition.
17+
2. If you are not a Sage Bionetworks employee (and are not working with a Sage Bionetworks supported community), it is up to you to devise a strategy for controlling how new terms are added.
18+
19+
## Guidelines for specific term types
20+
21+
In some situations (e.g. drug names), terms are not always well-captured by the ontologies found in the Ontology Lookup Service. We've defined some best practices for contributing these terms here.
22+
23+
### Contribution of drug terms
24+
25+
The preferred first-pass strategy for chemical name annotation is to search the EMBL-EBI ontology lookup service to find names, descriptions, and sources. Typically, the NCI Thesaurus will provide a suitable description for drugs and other biologically active molecules. In situations where the query molecule is not found in [EMBL-EBI Ontology Lookup Service](https://www.ebi.ac.uk/ols), a helpful secondary location to find chemical descriptions is [MeSH](https://meshb.nlm.nih.gov/).
26+
27+
Example:
28+
29+
```
30+
{
31+
"value": "DEFACTINIB",
32+
"description": "An orally bioavailable, small-molecule focal adhesion kinase (FAK) inhibitor with potential antiangiogenic and antineoplastic activities.",
33+
"source": "http://purl.obolibrary.org/obo/NCIT_C79809"
34+
},
35+
```
36+
37+
In situations where novel molecules (such as newly-synthesized research compounds or proprietary pharmaceutical molecules) require annotation, the only suitable description and source might be the paper describing the synthesis or discovery, or information from the pharmaceutical company that created the identifier.
38+
39+
Example:
40+
41+
```
42+
{
43+
"value": "IPC-12345",
44+
"description": "A small-molecule target of importance 4 (TOI4) inhibitor with potential antineoplastic activities.",
45+
"source": "Important Pharma Company"
46+
},
47+
{
48+
"value": "BestChemist-00913",
49+
"description": "An investigational small molecule discovered by Best Chemist et al.",
50+
"source": "PubMed Link Goes Here"
51+
},
52+
```
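
The guidelines above ask that every term carry a value, a description, and a source. A minimal Python sketch of that check follows, assuming entries shaped like the examples in this document. The function name `missing_fields` is hypothetical and not part of this repository.

```python
# Fields that every term entry is expected to carry, per the guidelines above.
REQUIRED_FIELDS = ("value", "description", "source")


def missing_fields(term):
    """Return the required fields that are absent or blank in a term entry."""
    missing = []
    for field in REQUIRED_FIELDS:
        content = term.get(field)
        # A field counts as missing if it is not a string or is only whitespace.
        if not isinstance(content, str) or not content.strip():
            missing.append(field)
    return missing


entries = [
    {
        "value": "DEFACTINIB",
        "description": "An orally bioavailable, small-molecule focal adhesion kinase (FAK) inhibitor with potential antiangiogenic and antineoplastic activities.",
        "source": "http://purl.obolibrary.org/obo/NCIT_C79809",
    },
    {
        "value": "DNAmethylationImputation",
        "description": "",
        "source": "https://dx.doi.org/10.1534/genetics.115.185967",
    },
]

for entry in entries:
    print(entry["value"], missing_fields(entry))
```

A check like this could be run before opening a pull request to spot entries that still need a definition or a source.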
53+
54+
55+
## Contribution procedure
56+
57+
Again, this is focused on Sage Bionetworks supported communities and employees. This focuses on the logistical components to contributing - for the technical components, please see the [Development](https://github.com/Sage-Bionetworks/synapseAnnotations#development) section of the [README.md](README.md) document.
58+
59+
1. Propose a change, either through a Github [issue](https://github.com/Sage-Bionetworks/synapseAnnotations/issues) or [pull request](https://github.com/Sage-Bionetworks/synapseAnnotations/pulls). Your change should be as atomic as possible - e.g., don't lump together many unrelated changes into a single issue or pull request. You may be requested to split them out.
60+
1. Label your issue or pull request with the appropriate labels. For example, if you are suggesting a new value be added to an existing key, then `create value` would be the appropriate label.
61+
1. Assign the issue to yourself and anyone else who will be involved in completing it. At a minimum, the issue creator should be assigned initially.
62+
1. If this is a pull request, request a review from someone in Github - this can be found under 'Reviewers' on the right side of the screen when viewing a Github pull request. It's fine if they can review your pull request without meeting. Otherwise, set up a meeting with your reviewer.
63+
1. If your reviewer has no problems with the change, then the change can be merged.
64+
The issue creator is responsible for merging. Note that you can use [keywords](https://help.github.com/articles/closing-issues-using-keywords/) to close issues via your pull request. See the [Development](https://github.com/Sage-Bionetworks/synapseAnnotations#development) section of the [README.md](README.md) document for the merging procedure.
65+
1. If you and the reviewer decide that a larger discussion is necessary, the issue can be brought to the larger annotations working group for discussion.

README.md

Lines changed: 25 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,30 @@
11
[![Build Status](https://travis-ci.org/Sage-Bionetworks/synapseAnnotations.svg?branch=master)](https://travis-ci.org/Sage-Bionetworks/synapseAnnotations)
22

3+
> Use our [Annotation UI](https://shiny.synapse.org/users/nsanati/annotationUI/) application to easily view and search our existing annotation definitions:
4+
>
5+
36
# Introduction
47

5-
Sage Bionetworks derived standards for annotating content in Synapse.
8+
Sage Bionetworks derived standards for annotating content in Synapse. This repository provides a mechanism for defining, managing, and implementing controlled vocabularies when annotating content in Synapse. The standards here are developed for Sage Bionetworks supported communities and consortiums by the [Sage Bionetworks Synapse Annotations working group](https://www.synapse.org/annotation) but are available for any other use.
9+
10+
# Annotation definitions
611

7-
# Schemas
12+
This repository contains schemas that define the things we want to annotate as well as the controlled values that may be used. You can think of these as 'columns' of a table, each with a limited set of values that can occur in each column.
813

9-
Schemas are stored here in [Synapse Table Schema](http://docs.synapse.org/articles/tables.html) format. A schema is a list of [`Column Model`s](http://docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnModel.html) in JSON format.
14+
Our schema definitions are stored here in [Synapse Table Schema](http://docs.synapse.org/articles/tables.html) format. A schema is a list of [`Column Model`s](http://docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnModel.html) in JSON format. Using this format allows us to use them in a straightforward manner with other features of Synapse, including [file views](http://docs.synapse.org/articles/fileviews.html) and [tables](http://docs.synapse.org/articles/tables.html).
1015

1116
Column types are required, and the valid types can be found [here](http://docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnType.html).
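
Since a schema is a JSON list of column models and each one needs a column type, a quick check can be sketched in Python. This is an illustration only: `check_columns` and `exampleKey` are hypothetical names, and `KNOWN_TYPES` is an assumed subset of the types listed in the linked ColumnType reference, which remains the authoritative list.

```python
import json

# Assumption: an illustrative subset of Synapse column types, not the full list.
KNOWN_TYPES = {"STRING", "INTEGER", "DOUBLE", "BOOLEAN", "DATE"}


def check_columns(schema):
    """Return (name, problem) pairs for column models without a usable columnType."""
    problems = []
    for column in schema:
        name = column.get("name", "<unnamed>")
        column_type = column.get("columnType")
        if column_type is None:
            problems.append((name, "missing columnType"))
        elif column_type not in KNOWN_TYPES:
            problems.append((name, "unrecognized columnType: " + str(column_type)))
    return problems


# Two column models: the first is complete, the second lacks a columnType.
schema = json.loads("""
[
  {"name": "compoundDoseRange", "columnType": "STRING", "maximumSize": 250, "enumValues": []},
  {"name": "exampleKey", "maximumSize": 250, "enumValues": []}
]
""")

print(check_columns(schema))  # → [('exampleKey', 'missing columnType')]
```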
1217

18+
# Organization
19+
20+
All schema definitions can be found in the [synapseAnnotations/data/](synapseAnnotations/data/) folder. There are three high level schemas: [experimental data](synapseAnnotations/data/experimentalData.json), [tool](synapseAnnotations/data/tool.json), and [analysis](synapseAnnotations/data/analysis.json).
21+
22+
Schemas for specific communities and consortia are also defined, such as for the [neurodegenerative diseases consortiums](synapseAnnotations/data/neuro.json), [cancer consortiums](synapseAnnotations/data/cancer.json), and specific groups such as (but not limited to) [Project GENIE](synapseAnnotations/data/genie.json).
23+
1324
# Development
1425

26+
This section discusses the technical steps for developing on this repository. See the [CONTRIBUTING.md](CONTRIBUTING.md) document for more information on how to contribute annotations to this project.
27+
1528
Internal development can be performed by branching from `develop` to your own feature branch, making changes, pushing the branch to this repository, and opening a pull request. Pull requests against the `develop` branch require a review before merging. The only pull requests that will go to `master` are from `develop`, and will trigger a new release (see below for release procedures). If you are editing using the Github web site, make sure you switch to the `develop` branch first before clicking the `Edit this file` button. If you accidentally open a pull request against `master`, you can change this in your pull request using the `Edit` button.
1629

1730
All pushed branches and pull requests are also tested through the continuous integration service [Travis CI](https://travis-ci.org/Sage-Bionetworks/synapseAnnotations). All JSON files are linted using [demjson's](http://deron.meranda.us/python/demjson/) `jsonlint` command line program.
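
To catch syntax errors before pushing, a rough local check can be sketched with Python's standard `json` module. This is a stand-in, not demjson's `jsonlint`, and the two tools may differ in what they accept or warn about. The function name `lint_json_files` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def lint_json_files(directory):
    """Return {path: error message} for every *.json file that fails to parse."""
    errors = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors[str(path)] = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return errors


# Demonstration on a throwaway directory: one valid file, one with a trailing comma.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.json").write_text('[{"name": "assay", "columnType": "STRING"}]', encoding="utf-8")
    Path(tmp, "bad.json").write_text('[{"name": "assay",}]', encoding="utf-8")
    for path, message in lint_json_files(tmp).items():
        print(f"{Path(path).name}: {message}")
```

Pointing `lint_json_files` at the folder that holds the schema files would report each file that does not parse, along with the position of the first error.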
@@ -33,37 +46,33 @@ pip install -r requirements.txt
3346
1. Make changes on your feature branch.
3447
1. Request and complete a review from someone on the team.
3548
1. When review is completed, note it to be reviewed and merged at the weekly meeting.
36-
1. Finalize merge into the `master` branch.
49+
1. Finalize merge into the `develop` branch.
3750
1. Update the version and make a versioned release (with assistance from @teslajoy)
3851

3952
# Release Versioning Annotations
4053
Releases are made through Github tags and are available on the [Releases](https://github.com/Sage-Bionetworks/synapseAnnotations/releases) page.
4154

42-
The release version structure **v0.0.0** follows [semantic versioning](http://semver.org/) guidelines. New releases are made using the following rules:
55+
The release version structure **vX.X.X** follows [semantic versioning](http://semver.org/) guidelines. New releases are made using the following rules:
4356

44-
Major version **v0** increments by:
57+
Major version increments by:
4558
1. Changes in data structure (ex. yaml to json or json to mongodb)
4659
2. Changes to existing keys
4760
3. Changes to existing values
4861

49-
Minor version **.0.** increments by:
62+
Minor version increments by:
5063
1. Adding keys
5164
2. Adding values
5265

53-
Patch version **.0** increments by:
66+
Patch version increments by:
5467
1. Errors or corrections that don't break the API
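
The increment rules above can be sketched as a small Python function. The change labels (`data_structure`, `change_key`, and so on) and the name `next_version` are hypothetical, chosen only to mirror the lists above. Resetting the lower components to zero follows the semantic versioning guidelines.

```python
# Hypothetical labels mirroring the rules listed above.
MAJOR_CHANGES = {"data_structure", "change_key", "change_value"}
MINOR_CHANGES = {"add_key", "add_value"}
PATCH_CHANGES = {"correction"}


def next_version(current, change):
    """Return the next vX.Y.Z tag for the given kind of change."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change in MAJOR_CHANGES:
        return f"v{major + 1}.0.0"
    if change in MINOR_CHANGES:
        return f"v{major}.{minor + 1}.0"
    if change in PATCH_CHANGES:
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(next_version("v1.2.3", "add_value"))  # → v1.3.0
```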
5568

56-
To optimize usability, the release tags should be placed on two required and one optional locations:
57-
1. A Synapse Project annotation as a single value vs. list of versions and defined by the key **`annotationVersion`**.
58-
2. The Shiny application [Annotation UI](https://github.com/Sage-Bionetworks/annotationUI)'s **title**
69+
To optimize usability, the release tags should be placed in two required locations and one optional location:
70+
1. A Synapse Project annotation as a single value defined by the key **`annotationReleaseVersion`**.
71+
2. The Shiny application [Annotation UI](https://github.com/Sage-Bionetworks/annotationUI)'s **title**
5972
3. OPTIONAL: Documented in a Synapse Project wiki.
6073

61-
**Note:** _Git Commit messages including the words **added** or **changed** would facilitate the release version incrementation process_
62-
6374
## Update `CHANGELOG.md` and release notes
6475

65-
After drafting a release, use this [Ruby package](https://github.com/skywinder/github-changelog-generator) to autogenerate a `CHANGELOG.md` locally that can be committed to the repository. It requires a Github Personal Access Token.
76+
After drafting a release, use this [Ruby package](https://github.com/skywinder/github-changelog-generator) to auto-generate a `CHANGELOG.md` locally that can be committed to the repository. It requires a [Github Personal Access Token](https://github.com/settings/tokens).
6677

6778
```
68-
69-

synapseAnnotations/data/analysis.json

Lines changed: 81 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -276,7 +276,7 @@
276276
},
277277
{
278278
"name": "alignmentMethod",
279-
"description": "DNA or RNA sequence alignement method",
279+
"description": "DNA or RNA sequence alignment method",
280280
"columnType": "STRING",
281281
"maximumSize": 250,
282282
"enumValues": [
@@ -384,11 +384,36 @@
384384
"columnType": "STRING",
385385
"maximumSize": 250,
386386
"enumValues": [
387+
{
388+
"value": "Variant calling",
389+
"description": "Identify and map genomic alterations, including single nucleotide polymorphisms, short indels and structural variants, in a genome sequence.",
390+
"source": "http://edamontology.org/operation_3227"
391+
},
392+
{
393+
"value": "Sequence alignment",
394+
"description": "Alignment of reads to a reference genome",
395+
"source": "http://edamontology.org/operation_0292"
396+
},
387397
{
388398
"value": "genotypeImputation",
389399
"description": "The statistical inference of unobserved genotypes",
390400
"source": ""
391401
},
402+
{
403+
"value": "DNAmethylationImputation",
404+
"description": "",
405+
"source": "https://dx.doi.org/10.1534/genetics.115.185967"
406+
},
407+
{
408+
"value": "polygenicRiskScore",
409+
"description": "A sample level estimate of the genetic component of disease risk based on genome-wide association studies",
410+
"source": "http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003348"
411+
},
412+
{
413+
"value": "Enrichment Analysis",
414+
"description": "Over-representation analysis",
415+
"source": "http://edamontology.org/operation_3501"
416+
},
392417
{
393418
"value": "statisticalNetworkReconstruction",
394419
"description": "",
@@ -493,6 +518,56 @@
493518
"value": "peakCalling",
494519
"description": "",
495520
"source": ""
521+
},
522+
{
523+
"value": "Copy number estimation",
524+
"description": "Estimate the number of copies of loci of particular gene(s) in DNA sequences typically from gene-expression profiling technology based on microarray hybridisation-based experiments.",
525+
"source": "http://edamontology.org/operation_3233"
526+
},
527+
{
528+
"value": "Gene expression profile comparison",
529+
"description": "Comparison of gene expression profiles.",
530+
"source": "http://edamontology.org/operation_0315"
531+
},
532+
{
533+
"value":"de-novo assembly",
534+
"description": "",
535+
"source": "http://purl.obolibrary.org/obo/GENEPIO_0001628"
536+
},
537+
{
538+
"value":"correlation",
539+
"description":"The degree to which two or more quantities or events are linearly associated, a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the others.",
540+
"source":"http://purl.obolibrary.org/obo/NCIT_C48834"
541+
},
542+
{
543+
"value":"network analysis",
544+
"description":"A data transformation that takes as input data that describes biological networks in terms of the node (a.k.a. vertex) and edge graph elements and their characteristics and generates as output properties of the constituent nodes and edges, the sub-graphs, and the entire network.",
545+
"source":"http://purl.obolibrary.org/obo/OBI_0200080"
546+
},
547+
{
548+
"value":"purity",
549+
"description":"A quantitative assessment of the homogeneity or uniformity of a mixture. Alternatively, purity refers to the degree of being free of contaminants or heterogeneous components",
550+
"source":"http://purl.obolibrary.org/obo/NCIT_C62352"
551+
},
552+
{
553+
"value":"dose response study",
554+
"description":"A study of the effect of dose changes on the efficacy of a drug in order to determine the dose-response relationship and optimal dose of a therapy.",
555+
"source":"http://purl.obolibrary.org/obo/NCIT_C127803"
556+
},
557+
{
558+
"value":"assessment",
559+
"description":"The final result of a determination of the value, significance, or extent of.",
560+
"source":"http://purl.obolibrary.org/obo/NCIT_C25217"
561+
},
562+
{
563+
"value":"quality control",
564+
"description":"A technique used to ensure a certain level of quality in a product or service.",
565+
"source":"http://purl.obolibrary.org/obo/ERO_0001219"
566+
},
567+
{
568+
"value":"comparison",
569+
"description":"The examination of two or more people or things in order to detect similarities and differences.",
570+
"source":"http://purl.obolibrary.org/obo/NCIT_C49156"
496571
}
497572
]
498573
},
@@ -569,6 +644,11 @@
569644
"value": "GaussianProcessRegression",
570645
"description": "",
571646
"source": ""
647+
},
648+
{
649+
"value": "elastic net",
650+
"description": "Elastic net is a shrinkage and selection regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.",
651+
"source": "http://purl.enanomapper.org/onto/ENM_8000083"
572652
}
573653
]
574654
},

synapseAnnotations/data/compoundScreen.json

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -518,5 +518,12 @@
518518
"columnType":"STRING",
519519
"maximumSize": 250,
520520
"enumValues": []
521-
}
521+
},
522+
{
523+
"name": "compoundDoseRange",
524+
"description": "The minimum and maximum values of a treatment range; e.g. 1-25",
525+
"columnType": "STRING",
526+
"maximumSize": 250,
527+
"enumValues": []
528+
}
522529
]
