Steve Baskauf edited this page May 28, 2015

Design principles

Basic principles

  1. If a Darwin Core (DwC) or Dublin Core (DCMI) Class exists for (essentially) the same concept that we would like to use, we will use the same Class name and the dwc: or dcterms: namespace.
  2. We will link DwC superclasses (Occurrence, Event, dcterms:Location, GeologicalContext, Identification, Taxon) with new dsw namespace object properties, which will have an explicit domain and range. (Note: we have not dealt with GeologicalContext in DSW thus far.)
  3. New concepts that we think are fundamental to the SW-ization of DwC will be added as new Classes with new linking predicates (with domain and range specified). Examples are IndividualOrganism and Token.
  4. We propose a re-conceptualization of Occurrence as not the physical or digital record of the organism, but the abstract presence of the organism in space and time. These Occurrences are documented (hasEvidence) by a range of tokens.
  5. We will avoid using most of the http://rs.tdwg.org/ontology/voc/ vocabulary because its status and permanence are not clear. This means we need to re-create several classes, such as Specimen. However, the Taxon class and its properties appear to be in use, so we will refer to them for the moment.
  6. For the numerous data properties in DwC, we specify neither domains nor ranges, but we add a comment in our ontology about the suggested domain and the suggested data type of the range (e.g., dwc:eventDate, dwc:identifiedBy). However, where a DwC data property clearly stands for a relationship to another resource, we will add a corresponding DSW object property with an explicit URI range (e.g., dwc:recordedBy => dsw:recBy).
  7. In accordance with Linked Open Data (LOD) principles, we expect all referenced resources to have URIs.
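
To make principles 2, 6 and 7 concrete, here is a minimal Python sketch (standard library only) that represents a few DSW-style statements as plain (subject, predicate, object) tuples. The dwc: namespace URI is the published Darwin Core one. The dsw: namespace URI, the example.org resources, and the `URI` marker class are assumptions made for illustration and are not part of DSW. The sketch contrasts the literal-valued dwc:recordedBy with the URI-valued dsw:recBy and checks that object properties never carry literal values.

```python
# Minimal sketch: DSW-style statements as (subject, predicate, object) triples.
# The dwc: namespace is the published Darwin Core one; the dsw: namespace URI
# and every example.org resource are assumptions made for illustration.
DWC = "http://rs.tdwg.org/dwc/terms/"
DSW = "http://purl.org/dsw/"  # assumed DSW namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base for resource URIs


class URI(str):
    """Marks a value as a URI reference rather than a literal string."""


occurrence = URI(EX + "occurrence/1")
specimen = URI(EX + "specimen/1")
collector = URI(EX + "agent/jdoe")

triples = [
    (occurrence, URI(RDF_TYPE), URI(DWC + "Occurrence")),
    (specimen, URI(RDF_TYPE), URI(DSW + "Token")),
    # Object property linking two classes: the Occurrence is documented by a Token.
    (occurrence, URI(DSW + "hasEvidence"), specimen),
    # DwC data property: a plain literal, with no declared domain or range.
    (occurrence, URI(DWC + "recordedBy"), "J. Doe"),
    # Parallel DSW object property whose value is the URI of the agent.
    (occurrence, URI(DSW + "recBy"), collector),
]

# Object properties; per principle 7, their values should be URIs.
OBJECT_PROPERTIES = {DSW + "hasEvidence", DSW + "recBy"}


def misused_object_properties(graph):
    """Return triples that use an object property but give it a literal value."""
    return [t for t in graph if t[1] in OBJECT_PROPERTIES and not isinstance(t[2], URI)]
```

With the statements above, `misused_object_properties(triples)` returns an empty list. Appending `(occurrence, URI(DSW + "recBy"), "J. Doe")` would be flagged, because dsw:recBy is meant to point at an agent's URI rather than at a name string.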

Some elaboration of design principles

Design principles based on comments made on the tdwg-content list during Sep-Oct 2010

Posted on the list by Steve Baskauf 2010-10-23 (http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001742.html)

Modified on 2011-01-25 to reflect its application in the darwin-sw (DSW) ontology

Origin and meaning of DSW classes

We have a number of kinds of "things" (henceforth referred to as "resources") that are useful for describing and organizing the metadata we collect in our attempts to document biodiversity. For many of these types of resources, the Darwin Core (DwC) standard defines classes that group the terms used to describe the properties of instances of that class. Describing a class helps us understand what kinds of resources are its instances. In the darwin-sw ontology, we formally define classes that can be used to specify the rdf:type of resources, and we have imported the Darwin Core classes themselves. Because DwC does not specify properties of classes, or domains and ranges for terms, no formal "meaning" is ascribed to its classes in the context of the Darwin Core standard (other than the descriptions in the term definitions). However, there does seem to be a general consensus in the DwC community about what those classes "mean", and we intend this general understanding of the Darwin Core classes to apply to those classes in DSW. To document this consensus, each class in DSW has a wiki page that cross-references the description of that class to relevant posts on the tdwg-content email list discussing what that class "means".

In contrast to DwC, which is designed to be a "general-purpose" vocabulary, DSW is designed to reflect a particular outlook on the relationships among classes of biodiversity resources. No one is required to agree with this outlook, but those who conceptualize the relationships among classes differently should not use DSW. Using the DSW object properties to represent relationships for which they were not intended would be "naughty" (sensu Bob Morris [1]), because the domains and ranges declared for those properties could lead a reasoner to infer unintended types for the resources involved. (See the following section for more on this.)

In the specific case of the dsw:Taxon class, that class is also defined to be equivalent to the Taxon/TaxonConcept class in the TDWG Ontology. This allows DSW to be connected to existing instances of taxa described by the TDWG Ontology and, by inference, makes dsw:Taxon equivalent to taxon concepts as they are described by the TDWG Taxon Concept Schema (TCS) standard.

[1] http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001726.html

Appropriate properties of instances of the DSW classes

In the Darwin Core standard, a conscious decision was made not to define rdfs:domain formally for terms. This decision was made to provide flexibility in the way the terms can be used and to prevent semantic clients from drawing incorrect or silly conclusions about what kind of things resources are. However, this decision does not excuse us from thinking carefully about whether a term can appropriately be applied to a resource that is a member of some class (e.g. should we say that a digital photograph has a scientific name?). Placing a term within a DwC class suggests that the term is appropriately applied as a property of an instance of that class. In general, if DwC has an appropriate term that can be used as a property in DSW, we will use it rather than defining a new property. However, because some classes we recognize in DSW do not exist in DwC, some DwC terms listed under a DwC class are not appropriate properties of the corresponding DSW class. On the DSW wiki pages, we suggest specific properties of DwC classes when it is clear (at least to us!) where they belong, although we do not go so far as to declare domains for those terms.

We do explicitly declare domains and ranges for the object properties that we have defined to "connect" the classes. These domains and ranges reflect what we consider to be the community consensus about the relationships among classes. We do this because the existing DwC standard does not formally define relationships among classes; such relationships are only suggested by the xxxxxID terms (e.g. taxonID, occurrenceID, etc.). However, there is significant ambiguity in the intended use of those terms, and because they were not designed to be used in a linked data context, we felt the need to define object properties specifically designed to relate the DSW classes. For more discussion on this topic, see the wiki page ClassesAndTypes.
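
The following minimal Python sketch shows why explicit domains and ranges matter. It implements only the two RDFS entailment rules for rdfs:domain (rdfs2) and rdfs:range (rdfs3), over prefixed names stored as plain strings. Declaring dwc:Occurrence as the domain and dsw:Token as the range of dsw:hasEvidence is an assumption made for illustration, and the ex: resources are hypothetical. Misusing dsw:hasEvidence to link an identification to an image leads the reasoner to conclude that the identification is an Occurrence. A DwC data property without a declared domain, such as dwc:scientificName on an image, triggers no type inference at all.

```python
# Minimal sketch of RDFS entailment rules rdfs2 (domain) and rdfs3 (range),
# using prefixed names as plain strings. The domain/range pair declared for
# dsw:hasEvidence is assumed for illustration; ex: resources are hypothetical.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

SCHEMA = {
    ("dsw:hasEvidence", RDFS_DOMAIN, "dwc:Occurrence"),
    ("dsw:hasEvidence", RDFS_RANGE, "dsw:Token"),
}


def infer_types(graph):
    """Return the graph plus every rdf:type triple entailed by rdfs:domain/rdfs:range.

    A single pass suffices here because the inferred rdf:type triples do not
    themselves match any domain or range declaration in SCHEMA.
    """
    graph = set(graph)
    domains = [(p, cls) for p, rel, cls in graph if rel == RDFS_DOMAIN]
    ranges = [(p, cls) for p, rel, cls in graph if rel == RDFS_RANGE]
    inferred = set()
    for s, p, o in graph:
        inferred |= {(s, RDF_TYPE, cls) for prop, cls in domains if p == prop}  # rdfs2
        inferred |= {(o, RDF_TYPE, cls) for prop, cls in ranges if p == prop}  # rdfs3
    return graph | inferred


# Intended use: an Occurrence documented by a specimen.
intended = SCHEMA | {("ex:occurrence1", "dsw:hasEvidence", "ex:specimen1")}

misused = SCHEMA | {
    # "Naughty" use: linking an identification to the image it was based on.
    ("ex:identification1", "dsw:hasEvidence", "ex:image1"),
    # DwC data property with no declared domain: no type is inferred from it.
    ("ex:image1", "dwc:scientificName", "Quercus alba"),
}
```

In `infer_types(misused)`, ex:identification1 acquires the type dwc:Occurrence, which is exactly the kind of unintended type declaration described above, while the dwc:scientificName statement adds nothing.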

How complex should DSW be?

Advantages of more "flat" or more "fully normalized" models

When users want to "flatten" and simplify their databases, they tend to eliminate one-to-many (1:M) relationships in favor of one-to-one (1:1) relationships. This produces differences like those between http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif (which allows 1:M relationships between Occurrences and Events and between Events and Locations) and http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif (which "atomizes" every Occurrence by giving it its own separate eventTime and Location information).

  1. There is nothing intrinsically "right" or "wrong" about either of these approaches, because they each have their own advantages. The 1:M approach is more efficient, but results in a more complicated database, while the 1:1 approach results in a simpler database but may require repeating some or many term values in the records.
  2. The choices that users make in these situations are the cause of much of the disagreement about whether a certain class should exist, since people taking the 1:1 approach "collapse" the relationship diagram and eliminate classes they don't need, while people taking the 1:M approach need instances of the class to act as nodes that connect their "many" resources to some other thing.
  3. This collapsing of the diagram is also the reason for some disagreement about whether a term belongs in a certain class or not. In the example above, 1:1 people would say that eventDate is a property of an Occurrence, while 1:M people would say that eventDate is a property of an Event.
  4. The choice of users on this issue influences their decision about whether or not to create resources that are instances of classes and hence to assign them identifiers. If users take the 1:M approach, they need identifiers for resources that act as connecting nodes so that they can refer to those resources in the metadata of the many things they are connecting to them. If users take the 1:1 approach, they will probably skip creating explicit resources (and their corresponding identifiers) for the class that they are "collapsing" out of the diagram.
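
The difference between the two diagrams can be sketched in a few lines of Python. The dataclasses below are a hypothetical normalized (1:M) model in which Occurrences share an Event and Events share a Location. `flatten` collapses that model into one record per Occurrence, using the DwC term names occurrenceID, eventDate and locality. All class names, field names and example values other than those DwC terms are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    location_id: str
    locality: str


@dataclass(frozen=True)
class Event:
    event_id: str
    event_date: str
    location_id: str  # many Events may share one Location (1:M)


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    event_id: str  # many Occurrences may share one Event (1:M)


def flatten(occurrences, events, locations):
    """Collapse the 1:M model into one self-contained record per Occurrence (the 1:1 view)."""
    events_by_id = {e.event_id: e for e in events}
    locations_by_id = {loc.location_id: loc for loc in locations}
    records = []
    for occ in occurrences:
        event = events_by_id[occ.event_id]
        location = locations_by_id[event.location_id]
        records.append({
            "occurrenceID": occ.occurrence_id,
            "eventDate": event.event_date,  # copied into every record from this Event
            "locality": location.locality,  # copied into every record from this Location
        })
    return records


# Two occurrences observed during the same event at the same place (hypothetical data).
locations = [Location("ex:loc1", "Example locality")]
events = [Event("ex:event1", "2010-10-23", "ex:loc1")]
occurrences = [Occurrence("ex:occ1", "ex:event1"), Occurrence("ex:occ2", "ex:event1")]
flat = flatten(occurrences, events, locations)
```

In the normalized form, the identifier ex:event1 is what lets both Occurrences refer to a single Event (point 4 above). In the flattened records that identifier disappears, and eventDate and locality are simply repeated (point 1).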

Criteria for deciding on the level of complexity

We take the perspective that the "correct" relationship diagram is not necessarily one that caters to a certain "correct" philosophical point of view. Rather, the "right" diagram is the one that allows users to define the relationships that they need for the organization of their metadata in the simplest manner, and which provides the most clarity about what resources of various kinds are, and how they are connected.

  1. "Right", as I have defined it above, depends on how broadly the relationship diagram is intended to apply. An individual person or organization with limited interests may have a relationship diagram that is simpler than the one at http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif, or might choose to add classes for other things of personal interest. An organization focused on different issues or with broader interests might opt for many more or different classes connected to those shown in the diagram.
  2. Given what was said in point 1, what is "right" for DSW will be defined by the needs of the Darwin Core constituency. In evaluating alternative conceptual systems for organizing resources, we must ask to what extent an alternative allows broad segments of the DwC constituency to organize their metadata in an efficient and conceptually sensible way. If one alternative is more broadly applicable and conceptually clear than another, then that alternative is "better" regardless of the philosophical underpinnings of the arguments for or against it. Since DSW seeks to define a broadly applicable system, a "fully normalized" model (i.e. one that provides the means to define 1:M relationships in whatever parts of the model a significant fraction of the constituency needs them) is probably the best.

Classifying resources according to their type, not their use

We believe that there should be a separation between what a resource IS and what we want to use a resource FOR. In technical terms, we need to separate the "type" of a resource from its fitness for use. A digital image IS a digital image. It might be used FOR documenting that an organism was at a particular location at a particular time, but it could also be used as evidence for an Identification, to illustrate a character, as part of a visual key, as media for an educational presentation, as art, and probably for many other things. Much of the confusion about "what is an Occurrence" probably comes from a failure to make this distinction. (See the wiki page TokenIssues for more discussion on this.)

A consequence of a fully normalized model is that each node of the model can represent a distinct type of resource. When a model is simplified by "collapsing" the parts where the user has only 1:1 relationships, the resulting nodes become entities that combine the nodes that would have existed in the fully normalized model. For example, in a model where each individual organism is documented in a single occurrence record by a single specimen, the resulting entity is not clearly defined and is assumed to have the properties of all three things. This is probably the cause of the confusion about what a dwc:Occurrence represents: it has traditionally been assumed to have properties of the organism, the occurrence, and the specimen all wrapped into one. Under these circumstances, specifying the type of the resource becomes difficult.
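
As an illustration of how a collapsed record conflates three kinds of resources, the following Python sketch splits one such record back into an organism, an occurrence, and a specimen. Each resulting node has exactly one rdf:type, and the nodes are linked by object properties. The assignment of fields to nodes, the dsw:Specimen class name, the dsw:occurrenceOf property, and the ex: identifiers are all assumptions made for illustration. dsw:hasEvidence, IndividualOrganism and dwc:Occurrence are the names used in the principles above.

```python
# Hypothetical assignment of fields in a collapsed record to the node each describes.
FIELD_OWNER = {
    "sex": "organism",
    "recordedBy": "occurrence",
    "catalogNumber": "specimen",
    "preparations": "specimen",
}

# One rdf:type per node; dsw:Specimen is an assumed class name.
NODE_TYPE = {
    "organism": "dsw:IndividualOrganism",
    "occurrence": "dwc:Occurrence",
    "specimen": "dsw:Specimen",
}


def split_record(record_id, record):
    """Split a collapsed organism/occurrence/specimen record into three typed, linked nodes."""
    ids = {kind: f"ex:{kind}/{record_id}" for kind in NODE_TYPE}  # hypothetical URIs
    nodes = {kind: {"rdf:type": cls} for kind, cls in NODE_TYPE.items()}
    for field, value in record.items():
        nodes[FIELD_OWNER[field]][field] = value  # unknown fields raise KeyError
    # Relate the nodes with object properties instead of merging them into one entity.
    nodes["occurrence"]["dsw:occurrenceOf"] = ids["organism"]  # assumed property name
    nodes["occurrence"]["dsw:hasEvidence"] = ids["specimen"]
    return {ids[kind]: props for kind, props in nodes.items()}


graph = split_record("42", {
    "sex": "female",
    "recordedBy": "J. Doe",
    "catalogNumber": "ABC-0042",
    "preparations": "skin",
})
```

Because each node carries its own type, the specimen remains a specimen whatever it is later used FOR. Its role as evidence is expressed only by the dsw:hasEvidence link from the occurrence.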
