Mapping between BioKno and BioLink
This is a first mapping between the two models.
Background, Knetminer
Background, BioLink Model
General questions
Common attributes
I've seen the biolink:Attribute class and its subclasses. We have a number of common data-value properties in Knetminer (see below), which we use to attach plain data values (numbers, strings) to nodes and relations, for instance: name, description, p-value, score, provenance, evidence. I couldn't find the equivalents of these in BioLink. Should I look at some other part of the model?
Qualifiers and publications attached to Association
How is this managed in models like property graphs and Neo4j? Normally, in these models a relation has two endpoints, a predicate identifier and multiple attributes such as strings or numbers, but it can't point at further nodes; the only way would be URI attributes or similar. Is it managed this way? Are there many datasets using these features?
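To make the question concrete, here is a minimal sketch of the "URI attributes" workaround on a property-graph relation. Everything in it is an assumption for illustration: the node identifiers, the `ex:` publication identifiers, the attribute names and the helper `referenced_node_ids` are hypothetical, and are not taken from the BioLink specification or from a Knetminer dataset.

```python
# Hypothetical relation record, as it could be stored in a property graph.
# A relation has two endpoints, a predicate and plain attributes only.
edge = {
    "subject": "bk:gene_p53",      # placeholder node identifier
    "predicate": "bk:enc",
    "object": "bk:protein_p53",    # placeholder node identifier
    # Plain data values fit naturally as relation attributes.
    "score": 0.87,
    # Pointers to further nodes can only be kept as identifier strings.
    "publications": ["ex:pub1", "ex:pub2"],
}


def referenced_node_ids(edge, uri_keys=("publications", "qualifiers")):
    """Collect the identifiers an edge points at through URI-like attributes."""
    ids = []
    for key in uri_keys:
        # A missing attribute simply contributes no identifiers.
        ids.extend(edge.get(key, []))
    return ids


print(referenced_node_ids(edge))
```

With this layout, resolving a publication requires a second lookup by identifier, since the graph database itself does not see a link between the relation and the publication node.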
Subclasses of Association
For example, let's take GeneToDiseaseAssociation. When generating data in BioLink format, should I always use this specific association? Or is it there for possible inference?
Namely, say I have the statement "genep53 encodes p53". Would it be fine to say this is an instance of Association and then, possibly, let a reasoner entail that it's also a GeneToDiseaseAssociation (from the type of genep53)? Or should I detect what genep53 is and use the appropriate Association subtype?
The former case is easier to implement. The latter is more complicated, because Knetminer data aren't always clean enough to make entailments like the one above from the node type (ie, in some datasets, some relations falling under things like GeneRegulatoryRelationship haven't been defined using very standard vocabularies, so it's hard to recognise them).
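The second option could be sketched as a lookup on the endpoint types, falling back to the generic Association when the types are unknown or not covered. This is only an illustration of the trade-off: the table `SUBTYPE_BY_ENDPOINTS` and the function `association_type` are hypothetical names, and the single entry shown is an assumption about how GeneToDiseaseAssociation would be selected.

```python
# Hypothetical lookup: (subject type, object type) -> Association subtype.
# Only one illustrative entry is listed; a real table would need guidance
# from the BioLink model and from real datasets.
SUBTYPE_BY_ENDPOINTS = {
    ("Gene", "Disease"): "GeneToDiseaseAssociation",
}


def association_type(subject_type, object_type):
    """Return the specific subtype when the endpoint types are recognised,
    otherwise the generic Association."""
    return SUBTYPE_BY_ENDPOINTS.get((subject_type, object_type), "Association")


print(association_type("Gene", "Disease"))
print(association_type("Gene", None))  # node type not recognised
```

The fallback keeps the export usable for datasets where node types are unreliable, at the cost of leaving the more specific typing to a later reasoning step.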
Degree of formality for certain relations
For instance, biolink:has_participant has occurrent as its domain. In formal OWL ontologies, this entails that, for example, the particular P53 coming from a particular sample has participated in a particular reaction. But do you really model data this way? Knetminer has a has_participant property, but this links things like apoptosis, intended as the description of a process that can happen at some point, and the concept of the protein named P53, which could have a huge number of instances. In other words, we use this property to link the corresponding continuants, not their specific instances.
Mappings, classes
- bk:Disease = biolink:Disease
- bk:Molecule < biolink:MolecularEntity
- bk:Compound = biolink:ChemicalSubstance
- bk:Drug = biolink:Drug
- bk:Metabolite = biolink:Metabolite
- bk:MoleculeComplex = biolink:MacromolecularComplex ?
- bk:Protein = biolink:Protein
- bk:Enzyme ? (biolink:Protein + biolink:qualifiers)
- bk:TF ? (biolink:Protein + biolink:qualifiers)
- bk:Process ? intended as reaction, transport, and other BioPax processes
- bk:Reaction
- bk:Transport
- bk:Experiment ? biolink:AdministrativeEntity + biolink:qualifiers
- bk:Tissue ? biolink:BiologicalEntity + biolink:qualifiers
- bk:OntologyTerms = biolink:OntologyClass TODO: subclasses
- bk:Path = biolink:Pathway ? (TODO: subclasses)
- bk:Publication = biolink:Publication
- bk:Gene = biolink:Gene
- bk:Treatment > biolink:Treatment (our treatment is general, not just exposure to substance)
Mapping, relations (ie, object properties)
- bk:enc < biolink:has_gene_product (encodes: links a gene to the protein it expresses, or to other molecular entities, eg, ncRNA; probably better to add biolink:qualifiers)
- bk:en_by < biolink:produced_by (encoded by)
- bk:attributeUnit < biolink:QuantityValue (TODO: is it fine to entail an attribute is a QuantityValue too?)
- bk:asso_wi = biolink:related_to (this is associated_with)
- bk:cooc_wi < biolink:coexists_with (this is co-occurs with, and is often used to match entities that co-occur in the same publications)
- bk:produces = biolink:produces
- bk:produced_by = biolink:produced_by
- bk:has_participant > biolink:has_participant (> because in our case the domain is more generic than occurrent)
- bk:participates_in > biolink:participates_in (same as above)
- bk:occ_in = biolink:occurs_in
- bk:has_part = biolink:has_part
- bk:part_of = biolink:part_of
- bk:publication_features < biolink:related_to (this is a subproperty of dc:subject and schema:about; it probably needs a new property in biolink, or the use of biolink:related_to + biolink:qualifiers)
- bk:related biolink:related_to
- bk:cs_by < biolink:contributes_to (this is "consumed by"; it should be a subproperty of the inverse of biolink:has_input)
- bk:consumed_by < biolink:contributes_to
- bk:consumes < biolink:has_input
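As a sketch of how the list above could be used when exporting data, the mappings can be kept as a table that records the mapping kind (=, < or >) next to the target. The reading assumed here is that `=` and `<` allow a bk relation to be rewritten as the BioLink one (a statement with a subproperty entails the same statement with its superproperty), while `>` does not, because the bk relation is the more general one. The names `RELATION_MAPPINGS` and `to_biolink` are hypothetical, and only a few of the entries above are included.

```python
# A few of the relation mappings listed above: bk relation -> (kind, target).
RELATION_MAPPINGS = {
    "bk:enc": ("<", "biolink:has_gene_product"),
    "bk:produces": ("=", "biolink:produces"),
    "bk:part_of": ("=", "biolink:part_of"),
    "bk:has_participant": (">", "biolink:has_participant"),
}


def to_biolink(bk_relation):
    """Return the BioLink relation that can safely replace bk_relation,
    or None when the mapping is unknown or bk is the broader relation."""
    kind, target = RELATION_MAPPINGS.get(bk_relation, (None, None))
    if kind in ("=", "<"):
        return target
    return None


for rel in ("bk:enc", "bk:produces", "bk:has_participant"):
    print(rel, "->", to_biolink(rel))
```

Relations that come back as None would need either a new BioLink property or a manual decision, which matches the open points noted in the list.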
Mapping, attributes (ie, data properties)
- TODO: relevant ones are: title, description, comment, creation date, p-value, score, evidence (including evidence code), provenance. For most of them, I cannot find equivalents in biolink, we need real datasets and guidance from them.
TODO
So far, I've gone in one direction only (bk->biolink). We need to check the other direction, to see if there are biolink entities that should be mapped in bk with additions.