TokenIssues

Rationale for this approach on tokens (a.k.a. "evidence")

Whether tokens should form a discrete class was a major point of discussion in the model. Originally we created the "Token" class to represent the evidence that was used to document an Occurrence. However, after some discussion we got rid of it because I (Steve) don't really think that the various types of tokens (images, text records, etc.) should be subclasses of one kind of thing. A token may have many uses. It can serve as the object of the hasEvidence property, but it could also serve many other uses, such as an illustration for a book, a source of DNA, the basis of a species description, etc. Tokens themselves don't really share any properties that I can think of, so there didn't seem to be a compelling reason to define a class for them. However, after further discussion, we put the class back into the model with the understanding that, since in RDF a resource can have more than one rdf:type, a resource could be an instance of the Token class (due to its relationships with other DSW classes) and also an instance of another class on the basis of the properties that describe its nature.
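The multiple-typing idea can be sketched with plain Python tuples standing in for RDF triples. This is only an illustration: the ex: identifiers and the ex:StillImage class name are assumptions made up for the example, and no RDF library is used.

```python
# Each triple is a (subject, predicate, object) tuple.
# All ex: identifiers are hypothetical and exist only for this sketch.
triples = {
    ("ex:image1", "rdf:type", "dsw:Token"),        # typed by its role as evidence
    ("ex:image1", "rdf:type", "ex:StillImage"),    # typed by its own nature
    ("ex:image1", "dsw:evidenceFor", "ex:occurrence1"),
}

def types_of(resource, graph):
    """Return every rdf:type asserted for a resource."""
    return {o for s, p, o in graph if s == resource and p == "rdf:type"}

image_types = types_of("ex:image1", triples)
print(sorted(image_types))
```

Nothing forces the two types to be related by subclassing; the resource simply carries both.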

Some of the kinds of resources that can be Tokens already have classes/types defined in other vocabularies (such as MultiMediaObject in the Audubon Core MRTG schema) and we don't need to re-define them here. One kind of token that does not have a class definition elsewhere (outside of TDWG) is Specimen. For that reason, we have defined Specimen as a class. Several terms that are currently listed under dwc:Occurrence probably actually belong in the class we have defined as Specimen. We have noted them on the ClassSpecimen page. Classes such as MultiMediaObject are NOT subclasses of Token because we do not assert that all MultiMediaObjects are Tokens. However, a resource like a StillImage can be both a MultiMediaObject AND a Token if it serves as evidenceFor an Occurrence, is derivedFrom an IndividualOrganism, or isBasisForIdentification of an Identification.

General principles

1. Resource type determines appropriate object and data properties. One of the problems with the existing Darwin Core (DwC) standard is that there is some overloading of classes. In particular, the dwc:Occurrence class includes properties of both the instance of documentation (i.e. the Occurrence) and the resource that serves as the evidence that the Occurrence occurred (i.e. the specimen). In dsw, we differentiate these things. The purpose of an Occurrence is to record the "where" and "when" of an individual organism. Thus Events are linked to Occurrences via the dsw:atEvent property. The atEvent property should not be applied to IndividualOrganisms or Specimens. Likewise, the properties of an organism will depend on how it is typed. A documented organism will be rdf:type dsw:IndividualOrganism and therefore will have dsw:hasIdentification and dsw:hasOccurrence properties. If that organism is also documented as part of a collection (i.e. also rdf:type dsw:LivingSpecimen), then it will also have properties appropriate for specimens, such as dwc:catalogNumber.
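As a rough illustration of the rule that dsw:atEvent belongs only on Occurrences, a data provider could run a lint-style check like the one below. This is a sketch under assumptions: the function name misplaced_at_event and the ex: identifiers are hypothetical, and the check treats the rule as validation, whereas an RDFS/OWL domain declaration would instead license a reasoner to infer a type.

```python
def misplaced_at_event(graph):
    """Return subjects that use dsw:atEvent without being typed dsw:Occurrence."""
    occurrences = {s for s, p, o in graph
                   if p == "rdf:type" and o == "dsw:Occurrence"}
    return {s for s, p, o in graph
            if p == "dsw:atEvent" and s not in occurrences}

# Hypothetical data: the second atEvent statement is attached to the organism.
graph = {
    ("ex:occ1", "rdf:type", "dsw:Occurrence"),
    ("ex:occ1", "dsw:atEvent", "ex:event1"),
    ("ex:tree1", "rdf:type", "dsw:IndividualOrganism"),
    ("ex:tree1", "dsw:atEvent", "ex:event1"),  # misplaced: belongs on an Occurrence
}

flagged = misplaced_at_event(graph)
print(sorted(flagged))
```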

2. Separation of object properties based on purpose. In the same way we're avoiding overloading resource types, we are avoiding overloading token object properties. In Baskauf (2010; https://journals.ku.edu/index.php/jbi/article/view/3664) I made the mistake of using the properties sernec:derivativeOccurrence/sernec:derivedFrom to serve the purposes of tracking the origin of tokens and also to show that in some circumstances they documented the presence of organisms. In dsw, these purposes have been separated into three kinds of properties: those that indicate that a token documents an Occurrence (dsw:hasEvidence/dsw:evidenceFor), those that indicate how a token is related to a "parent" resource (any relationship such as dcterms:isPartOf or foaf:depicts), and those that facilitate tracking the provenance of a token (dsw:derivedFrom/dsw:hasDerivative). A fourth property pair (dsw:identifiedBasedOn/dsw:isBasisForIdentification) is used to relate tokens and Identifications for which they served as evidence. The diagram above shows three of the four types of properties; the fourth is illustrated below. We have defined domains and ranges of the object properties based on their appropriate use with the classes included in dsw. See the wiki page ClassToken for more discussion of this.

3. Separation of data properties based on resource type. Because we intend that existing DwC terms be used as properties, we have not felt it necessary to incorporate them formally into the darwin-sw ontology. In cases where it seems clear that a DwC term should serve as a property of a particular dsw class, we have listed that term on the corresponding class wiki page. However, since this placement may be subject to debate and since some terms may function as properties of multiple classes, we have not formally defined domains and ranges for these properties.

Examples of token properties and relationships

Below are four examples of situations where tokens are used to document an IndividualOrganism. For each example, a diagram and explanation are provided. Following the four examples, there is an explanation of how dsw object properties are used to define the three types of token relationships described in general principle #2 above.

The tree in the forest is the IndividualOrganism. At Event 1, a branch is removed from the tree to become an herbarium specimen (PreservedSpecimen). At a later time that specimen is imaged (MultiMediaObject 1). At a different time (Event 2), a digital photograph (MultiMediaObject 2) is taken.

An insect in the wild (IndividualOrganism) is photographed (MultiMediaObject 1). The insect is then collected and pinned as a specimen (PreservedSpecimen 1). Both of these actions are considered to be a part of the single Event. At the museum, the pinned specimen is photographed (MultiMediaObject 2). At a later time, the specimen is taken apart and those parts are assigned identifiers as new specimens. A foreleg (PreservedSpecimen 2) is photographed under the microscope (MultiMediaObject 3).

Note: it would also be possible to model this by specifying a second type (dsw:PreservedSpecimen) for the IndividualOrganism itself since although dead, the entire organism became a preserved specimen. In that case, the IndividualOrganism and PreservedSpecimen 1 bubbles would be merged and the dsw:hasEvidence arrow would point back to the merged bubble. See the next example for this approach. Which model is correct? I think either could be used depending on the community consensus that emerges on what can be included within the definition of IndividualOrganism and Specimen. See the wiki page on the IndividualOrganism class for references related to this discussion.

John Wieczorek's favorite wildebeest calf (IndividualOrganism) is photographed (MultiMediaObject 1) as it is being captured for a zoo. At the zoo, it becomes a living specimen (LivingSpecimen). At a later time, a blood sample is taken from the calf (PreservedSpecimen if that use is allowed). DNA is extracted and sequenced (DnaSequence).

In this case, the calf is clearly both an IndividualOrganism and a LivingSpecimen because during at least part of its life it is both at the same time. This contrasts with the insect in the previous example, which may or may not be considered to still be an IndividualOrganism after it was killed and pinned in the museum; that is a philosophical question that has no one correct answer and would need to be decided by consensus, or perhaps metadata providers could choose to model it as they wished. The calf's status as a LivingSpecimen has a defined beginning (the dwc:eventTime of its capture), and as a LivingSpecimen it serves as evidence for its own Occurrence because it is in captivity and under the control of an institution which has an interest in providing biodiversity metadata, (presumably) has accession records, and can be examined at will. This is different from the tree in the forest, which cannot necessarily be examined at will and is not necessarily under the control of an institution with a vested interest in tracking its status.

This is the most complex example. A tree in a forest in Borneo is photographed (MultiMediaObject 1). At the same time (Event), a cutting is taken from a branch. This cutting is propagated into a specimen in a botanical garden (LivingSpecimen). When the living specimen matured, it was photographed as a part of the garden's documentation project (MultiMediaObject 2) and a branch of the living specimen was collected to become a specimen (PreservedSpecimen 1) in the herbarium of a neighboring university. That herbarium specimen was imaged (MultiMediaObject 3). Part of the tissue from the herbarium specimen (PreservedSpecimen 2) was collected and sent to a different university that was trying to generate a phylogenetic tree for the family that contained the tree. The resulting sequence (DnaSequence) was submitted to GenBank.

Note: although not shown here, the LivingSpecimen could simultaneously be typed as an IndividualOrganism with a separate GUID from the source IndividualOrganism. In that case, MultiMediaObject 2 and PreservedSpecimen 1 would serve as evidence for two Occurrences that documented the presence of the IndividualOrganism in the botanical garden. The dsw:hasDerivative property of the source IndividualOrganism (or a dsw:derivedFrom property of the IndividualOrganism in the garden) allows a semantic client to know that Identifications of either IndividualOrganism should apply to both individuals.

The diagram could be even more complex if an herbarium specimen had been collected from the original tree in Borneo. Darwin-sw allows for any degree of complexity of relationships.

Types of token relationships illustrated in the above examples

General principle #2 above states that in darwin-sw different object properties are used to describe how tokens are related to each other, according to the purpose that one wishes to achieve in describing the relationship.

Documentation of an Occurrence

In an Occurrence, we wish to document that the IndividualOrganism was present at an Event. In its simplest form, an Occurrence may simply be asserted by the person or persons named as the value of dwc:recordedBy (i.e. whoever observed the organism) without any evidence. One could call such an Occurrence an observation. (Observations need not be totally lacking in associated resources, since they could be documented by physical resources, such as lab notebooks, which could be scanned with a representation delivered electronically.) However, it is preferable to have some kind of objective resource that was collected at the time at which the Occurrence was recorded. In darwin-sw, an Occurrence can have the property dsw:hasEvidence to connect the Occurrence resource to the token that documents it. A token can be assigned the inverse property dsw:evidenceFor, which would assert exactly the same relationship.
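Because dsw:hasEvidence and dsw:evidenceFor assert the same relationship from opposite ends, a client has to look in both directions when gathering the evidence for an Occurrence. The sketch below uses plain Python tuples as stand-ins for triples; the ex: identifiers and the function name evidence_for are made up for the illustration.

```python
# (subject, predicate, object) tuples; the ex: identifiers are hypothetical.
graph = {
    ("ex:occ1", "dsw:hasEvidence", "ex:pinnedInsect"),  # stated from the Occurrence
    ("ex:liveImage", "dsw:evidenceFor", "ex:occ1"),     # stated from the token
}

def evidence_for(occurrence, triples):
    """Collect tokens linked to an Occurrence by either inverse property."""
    forward = {o for s, p, o in triples
               if s == occurrence and p == "dsw:hasEvidence"}
    backward = {s for s, p, o in triples
                if o == occurrence and p == "dsw:evidenceFor"}
    return forward | backward

tokens = evidence_for("ex:occ1", graph)
print(sorted(tokens))
```

A reasoner that knows the two properties are inverses would make this lookup unnecessary; the function only shows what that inference amounts to.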

In the tree example above, each of the two Occurrences of the tree is documented by a token: Occurrence 1 by the herbarium specimen and Occurrence 2 by the live plant image.

In the insect example above, the single Occurrence of the insect documented by the collection has two pieces of evidence: the image of the live insect, and the pinned insect itself in the museum collection.

In the wildebeest example above, the Occurrence of the wildebeest calf in the wild has two forms of evidence: the image of the calf taken at the capture, and the calf itself as a living specimen in the zoo.

In the botanical garden example above, the Occurrence of the tree in Borneo is documented by the live plant image taken when the tree was observed in the field and by the living specimen in the botanical garden that was grown from the cutting taken from the tree.

Relationship of a token to a parent resource

There are relationships defined by vocabularies functioning outside the biodiversity informatics community, such as Dublin Core (DCMI) and Friend of a Friend (FOAF), that can be used to explain how two resources are related. Because terms from these vocabularies are widely understood, it is desirable to use them to describe relationships among resources in the context of darwin-sw so that general-purpose semantic clients which are not familiar with dsw can "understand" the connection between the resources.

In the tree example above, foaf:depiction is used to indicate that the live-plant image is a picture of the tree and that the herbarium specimen image is a picture of the specimen. The inverse property foaf:depicts could be used as a property of the images and impart the same relationship. dcterms:hasPart is used as a property of the tree to indicate that the specimen was part of it. The inverse property dcterms:isPartOf could have been used as a property of the specimen to indicate the same relationship.

In the insect example, dcterms:hasPart is used to indicate that the pinned specimen is part of (in this case the whole of) the insect and that the leg specimen is part of the pinned specimen. foaf:depiction is used to relate the images to the resources they depict.

In the wildebeest example, dcterms:hasPart and foaf:depiction are used as above. I don't know if there is a standard term to indicate the relationship between a DNA sequence and its source.

In the botanical garden example, the same terms are used as above, but in a more complex network.

Tracking provenance of a token

In darwin-sw, the term dsw:hasDerivative indicates that a resource of any sort is derived from the subject resource. The inverse property dsw:derivedFrom serves the same purpose but has the derived resource as the subject. Both of these properties are defined in the ontology as type owl:TransitiveProperty. This means that if a subject resource A (e.g. a specimen image) is related to object resource B (e.g. the specimen) by the transitive derivedFrom property, and if subject resource B (e.g. the specimen) is also related to object resource C (e.g. the tree from which the specimen was collected) by the transitive derivedFrom property, then resource A (the specimen image) is also derivedFrom resource C (the tree). Thus, no matter how many steps removed a derived resource is from the original source, the relationship of that derived resource to the original source (e.g. the organism in the wild) can always be inferred by a client through a chain of derivedFrom (or hasDerivative) relationships.
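The A, B, C inference described above can be sketched as a small transitive-closure computation. This is a minimal illustration, not how an OWL reasoner is implemented; the function name derived_from_closure and the ex: identifiers are assumptions introduced for the example.

```python
def derived_from_closure(triples):
    """Return all (derived, source) pairs implied by transitive dsw:derivedFrom."""
    pairs = {(s, o) for s, p, o in triples if p == "dsw:derivedFrom"}
    changed = True
    while changed:  # repeat until no new pair can be inferred
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))  # A derivedFrom B, B derivedFrom C => A derivedFrom C
                    changed = True
    return pairs

# Only the two direct statements are asserted (hypothetical identifiers).
graph = {
    ("ex:specimenImage", "dsw:derivedFrom", "ex:specimen"),
    ("ex:specimen", "dsw:derivedFrom", "ex:tree"),
}

inferred = derived_from_closure(graph)
print(("ex:specimenImage", "ex:tree") in inferred)
```

The image-to-tree link is never stated in the data; it appears only because the property is treated as transitive.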

The ability to do this kind of tracking seems somewhat trivial in simple cases (such as the tree specimen or wildebeest - see below). However, in more complex cases (e.g. the insect case below), or in cases where metadata for diverse types of resources are being managed across several institutions (e.g. the botanical garden example below), the ability to track provenance through the Internet using globally unique identifiers could be very important.

The tree example above illustrates how a client can use multiple hasDerivative properties to infer that the specimen image is derived from the tree.

In the insect example above, each set of colored arrows shows a chain of hasDerivative relationships that would show that the terminal resource was derived from the insect. If the same arrows were pointed in the other direction, they would show how derivedFrom could be used to infer that each terminal resource was derived from the insect.

Chains of hasDerivative properties in the wildebeest example.

In even the most complex example, hasDerivative/derivedFrom can be used to determine the source IndividualOrganism. This is particularly important in situations such as botanical gardens where living specimens are frequently propagated and sent to other institutions. As long as each institution is careful to maintain a record of the source from which its specimens are derived, it is possible (in theory) to track each specimen to the source individual in the wild. Whether this level of diligence would be attained remains to be seen, but this system would at least make it possible.
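For the botanical garden example, tracing a resource back to its source might look like the sketch below. It assumes each resource records a single immediate source, uses hypothetical ex: identifiers for the resources named in the example, and represents each hasDerivative statement as a (source, derivative) pair.

```python
# hasDerivative statements for the botanical garden example; ex: names are hypothetical.
has_derivative = [
    ("ex:borneoTree", "ex:fieldImage"),
    ("ex:borneoTree", "ex:gardenPlant"),
    ("ex:gardenPlant", "ex:gardenImage"),
    ("ex:gardenPlant", "ex:herbariumSheet"),
    ("ex:herbariumSheet", "ex:sheetImage"),
    ("ex:herbariumSheet", "ex:tissueSample"),
    ("ex:tissueSample", "ex:dnaSequence"),
]

# Invert the statements so each derivative points at its immediate source.
parent = {child: source for source, child in has_derivative}

def trace_to_source(resource):
    """Follow derivation links upward until a resource with no recorded source."""
    path = [resource]
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in path:  # guard against accidental cycles in the data
            break
        path.append(nxt)
    return path

chain = trace_to_source("ex:dnaSequence")
print(" <- ".join(chain))
```

In practice each link would be held by a different institution, so a client would dereference one globally unique identifier at a time rather than read a single list.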

Although not shown in any of these examples, derivedFrom could be used to track the provenance of multiple electronic resources. Any of the following examples could be related to their source IndividualOrganism using derivedFrom: a still image cropped from a larger image, a still image captured from a film, an image used as part of a key, or an image generated from non-image digital data.

dsw:idBasedOn/dsw:isBasisForId

This pair of inverse properties is used to connect a token with a dwc:Identification when the token was examined and used as the basis for identifying the IndividualOrganism. In simple cases, such as a single specimen per IndividualOrganism, the benefit of this term is not so clear. But in more complex situations, such as duplicate specimens from the same individual organism which end up in different institutional collections, it would be important to know the basis by which the determiner decided to assign that individual organism to a particular taxon. Another useful situation would be where a specimen is used as a voucher for a set of live organism images. One might place greater weight on the Identification if it were based on the specimen, the specimen and live organism images, or many live organism images, than on a single live organism image that does not convey much information.
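One way a client might act on this is to score each Identification by the tokens it was based on. The sketch below is purely illustrative: the weights, the token type labels, the ex: identifiers, and the function name support are all assumptions, and nothing in darwin-sw prescribes a weighting scheme.

```python
# Hypothetical weights per token type; the numbers are placeholders, not part of darwin-sw.
WEIGHTS = {"PreservedSpecimen": 3, "StillImage": 1}

# Hypothetical tokens and their types.
token_type = {
    "ex:voucher": "PreservedSpecimen",
    "ex:img1": "StillImage",
    "ex:img2": "StillImage",
}

# Identification -> tokens it was based on (the idBasedOn direction).
id_based_on = {
    "ex:idA": ["ex:voucher", "ex:img1"],
    "ex:idB": ["ex:img2"],
}

def support(identification):
    """Sum the placeholder weights of the tokens behind an Identification."""
    return sum(WEIGHTS.get(token_type[t], 0) for t in id_based_on[identification])

ranked = sorted(id_based_on, key=support, reverse=True)
print(ranked)
```

Here the Identification backed by the voucher specimen plus an image ranks above the one backed by a single image.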

A reference related to a philosophical discussion about the nature of a "document"

Note the section entitled "The antelope as document" where the nature of a document is similar to what we might call a token or specimen. It's a document in the sense that it documents something.

Buckland, Michael K. What is a "Document"? Journal of the American Society for Information Science 48, no. 9 (Sept 1997): 804-809.
