Mapping to BioLink Model #1

@marco-brandizi

Mapping between BioKno and BioLink

This is a first mapping between the two models.

Background, Knetminer

Background, BioLink Model

General questions

Common attributes

I've seen the biolink:Attribute class and its subclasses. Knetminer has a number of common data-value properties (see below), which we use to attach plain data values (numbers, strings) to nodes and relations, for instance: name, description, p-value, score, provenance, evidence. I couldn't find equivalents for these in BioLink. Should I look at some other part of the model?

qualifiers and publications attached to Association

How is this managed in models like property graphs and Neo4j? Normally, in these models a relation has two endpoints, a predicate identifier and multiple attributes like strings or numbers, but you can't point at further nodes from a relation; the only way would be URI-valued attributes or the like. Is it managed this way? Do we have many datasets using these things?
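To make the question concrete, here is a minimal sketch of the workaround mentioned above: node-valued slots on an association are stored as lists of identifier strings on the edge. The `Edge` class, the `flatten_association` helper and all `ex:` identifiers are hypothetical and only illustrate the idea; this is not how BioLink or Neo4j prescribe it.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A property-graph style relation: two endpoints, a predicate, plain attributes."""
    subject: str
    predicate: str
    object: str
    properties: dict = field(default_factory=dict)


def flatten_association(subject, predicate, obj, qualifiers=(), publications=()):
    """Keep node-valued slots as lists of CURIE strings on the edge (assumed convention)."""
    properties = {}
    if qualifiers:
        properties["qualifiers"] = sorted(qualifiers)
    if publications:
        properties["publications"] = sorted(publications)
    return Edge(subject, predicate, obj, properties)


# Hypothetical identifiers, for illustration only.
edge = flatten_association(
    "ex:gene_p53",
    "biolink:has_gene_product",
    "ex:protein_p53",
    qualifiers=["ex:qualifier_1"],
    publications=["ex:pub_2", "ex:pub_1"],
)
print(edge.properties)
```

With this convention the publications and qualifiers are no longer traversable as nodes, so a query has to join on the identifier strings.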

subclasses of Association

For example, let's take GeneToDiseaseAssociation. When generating data in BioLink format, should I always use this specific association? Or is it there for possible inference?

Namely, say I have genep53 encodes p53, would it be fine to say this is an instance of Association and then, possibly, some reasoner can entail that it's also a GeneToDiseaseAssociation (from the type of genep53)? Or should I detect what genep53 is and use the appropriate Association subtype?

The former case is easier to implement. The latter is more complicated, because Knetminer data aren't always clean enough to make entailments like the one above from the node type (i.e., in some datasets, some relations falling under things like GeneRegulatoryRelationship haven't been defined using very standard vocabularies, so it's hard to recognise them).
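The second option could be sketched as a lookup on the endpoint types, falling back to the generic Association when the types are unknown or not covered. The table below is a hypothetical, deliberately incomplete example (only the pair named in this issue is listed), and `association_type` is an invented helper name.

```python
# Assumed lookup: (subject category, object category) -> Association subtype.
# Only one illustrative entry; a real table would need guidance from BioLink.
ASSOCIATION_BY_ENDPOINTS = {
    ("biolink:Gene", "biolink:Disease"): "biolink:GeneToDiseaseAssociation",
}

GENERIC_ASSOCIATION = "biolink:Association"


def association_type(subject_category, object_category):
    """Pick the most specific known subtype, else the generic Association."""
    key = (subject_category, object_category)
    return ASSOCIATION_BY_ENDPOINTS.get(key, GENERIC_ASSOCIATION)


print(association_type("biolink:Gene", "biolink:Disease"))
print(association_type("biolink:Gene", None))  # endpoint type not recognised
```

The fallback keeps the export usable when the source data are too noisy to classify, at the cost of less specific typing.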

Degree of formality for certain relations

For instance, biolink:has_participant has occurrent as its domain. In formal OWL ontologies, this entails that, for example, the particular P53 coming from a particular sample has participated in a particular reaction. But do you really model data this way? Knetminer has a has_participant property, but this links things like apoptosis, intended as the description of a process that can happen at some point, and the concept of the protein named P53, which could have zillions of instances. In other words, we use this property to link the corresponding continuants, not their specific instances.

Mappings, classes

  • bk:Disease = biolink:Disease
  • bk:Molecule < biolink:MolecularEntity
  • bk:Compound = biolink:ChemicalSubstance
  • bk:Drug = biolink:Drug
  • bk:Metabolite = biolink:Metabolite
  • bk:MoleculeComplex = biolink:MacromolecularComplex ?
  • bk:Protein = biolink:Protein
  • bk:Enzyme ? (biolink:Protein + biolink:qualifiers)
  • bk:TF ? (biolink:Protein + biolink:qualifiers)
  • bk:Process ? intended as reaction, transport, and other BioPax processes
  • bk:Reaction
  • bk:Transport
  • bk:Experiment ? biolink:AdministrativeEntity + biolink:qualifiers
  • bk:Tissue ? biolink:BiologicalEntity + biolink:qualifiers
  • bk:OntologyTerms = biolink:OntologyClass TODO: subclasses
  • bk:Path = biolink:Pathway ? (TODO: subclasses)
  • bk:Publication = biolink:Publication
  • bk:Gene = biolink:Gene
  • bk:Treatment > biolink:Treatment (our treatment is general, not just exposure to substance)
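One possible reading of the notation in this list is `=` as owl:equivalentClass and `<` / `>` as rdfs:subClassOf in one direction or the other. Under that assumption, a mapping line can be turned into a Turtle-like axiom as sketched below; entries marked with `?` are treated as undecided and skipped. The `to_axiom` helper is hypothetical.

```python
import re

# Assumed interpretation of the operators used in the mapping list.
TEMPLATES = {
    "=": "{left} owl:equivalentClass {right} .",
    "<": "{left} rdfs:subClassOf {right} .",
    ">": "{right} rdfs:subClassOf {left} .",
}

# bk term, operator, biolink term, then an optional "?" marking doubt.
LINE = re.compile(r"^(bk:\w+)\s*([=<>])\s*(biolink:\w+)\s*(\?)?")


def to_axiom(line):
    """Return a Turtle-like axiom for a mapping line, or None if undecided."""
    text = line.strip().lstrip("•").strip()
    match = LINE.match(text)
    if match is None or match.group(4):
        return None
    left, operator, right, _ = match.groups()
    return TEMPLATES[operator].format(left=left, right=right)


print(to_axiom("  • bk:Molecule < biolink:MolecularEntity"))
print(to_axiom("  • bk:Treatment > biolink:Treatment (our treatment is general)"))
print(to_axiom("  • bk:MoleculeComplex = biolink:MacromolecularComplex ?"))
```

Lines without an operator, such as the bk:Enzyme entry, also yield `None`, so only the settled mappings produce axioms.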

Mapping, relations (ie, object properties)

  • bk:enc < biolink:has_gene_product (encodes; links a gene to the protein it expresses, or to other molecular entities, e.g. ncRNA; probably better to add a biolink:qualifiers)
  • bk:en_by < biolink:produced_by (encoded by)
  • bk:attributeUnit < biolink:QuantityValue (TODO: is it fine to entail an attribute is a QuantityValue too?)
  • bk:asso_wi = biolink:related_to (this is associated_with)
  • bk:cooc_wi < biolink:coexists_with (this is "co-occurs with" and is often used to match entities that co-occur in the same publications)
  • bk:produces = biolink:produces
  • bk:produced_by = biolink:produced_by
  • bk:has_participant > biolink:has_participant (> because in our case the domain is more generic than occurrent)
  • bk:participates_in > biolink:participates_in (same as above)
  • bk:occ_in = biolink:occurs_in
  • bk:has_part = biolink:has_part
  • bk:part_of = biolink:part_of
  • bk:publication_features < biolink:related_to (this is subproperty of dc:subject, schema:about, probably needs a new property in biolink, or to use biolink:related_to + biolink:qualifiers)
  • bk:related biolink:related_to
  • bk:cs_by < biolink:contributes_to (this is "consumed by"; should be a subproperty of the inverse of biolink:has_input)
  • bk:consumed_by < biolink:contributes_to
  • bk:consumes < biolink:has_input
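When exporting edges, the direction of each mapping matters. If a bk property is equal to or narrower than the BioLink one (`=` or `<`), replacing the predicate is safe, since the broader statement follows from the narrower one. If the bk property is broader (`>`), the substitution would claim more than the data say. The sketch below, under that assumption, falls back to biolink:related_to in the `>` case and for unmapped predicates; `RELATION_MAP` holds only a few entries from the list above and `translate_predicate` is a hypothetical helper.

```python
# Subset of the relation mappings above: bk predicate -> (operator, biolink predicate).
RELATION_MAP = {
    "bk:produces": ("=", "biolink:produces"),
    "bk:part_of": ("=", "biolink:part_of"),
    "bk:enc": ("<", "biolink:has_gene_product"),
    "bk:has_participant": (">", "biolink:has_participant"),
}

# Assumed generic fallback predicate.
FALLBACK = "biolink:related_to"


def translate_predicate(bk_predicate):
    """Return a BioLink predicate that the bk statement safely entails."""
    operator, target = RELATION_MAP.get(bk_predicate, (None, FALLBACK))
    if operator in ("=", "<"):
        return target
    # Broader or unmapped bk predicates cannot be narrowed without more evidence.
    return FALLBACK


for predicate in ("bk:enc", "bk:has_participant", "bk:unknown"):
    print(predicate, "->", translate_predicate(predicate))
```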

Mapping, attributes (ie, data properties)

  • TODO: relevant ones are: title, description, comment, creation date, p-value, score, evidence (including evidence code), provenance. For most of them, I cannot find equivalents in biolink, we need real datasets and guidance from them.

TODO

So far, I've gone in one direction only (bk->biolink). We need to check the other direction, to see if there are biolink entities that should be mapped in bk with additions.
