Steve Baskauf edited this page May 28, 2015 · 1 revision

Introduction

Beginning on 2013-04-03, there was an off-list email discussion about the nature of specimens. Most of the participants in the discussion were present at the RCN4GSC planning meeting in Seattle from 2013-03-26 through 2013-03-29, although several were not. This page was created with the implied consent of the participants to archive that conversation. The discussion was not directly related to Darwin-SW and was placed in this wiki simply as a matter of convenience.

Email thread

2013-04-03T13:04:17-0500 Initial Post: What is a specimen/What is a document? (Steve Baskauf)

-------- Original Message --------
Subject: What is a specimen/What is a document?
Date: Wed, 03 Apr 2013 13:04:17 -0500
From: Steve Baskauf [email protected]
Organization: Vanderbilt University Dept. of Biological Sciences
To: John Deck [email protected], Ramona Walls [email protected], Robert Robbins [email protected], Leonard Krishtalka [email protected], Pelin Yilmaz [email protected], John Wieczorek [email protected], Stan Blum [email protected], Rob Guralnick [email protected], John Deck [email protected], [email protected]
CC: Cam Webb [email protected]
References: <CAH_VTFUzwMjXNmhQefbPBBUHhkbOp20iGv9607-s-LDL78deRA@mail.gmail.com>

At the RCN4GSC meeting, there was an informal discussion that started with a discussion of the sampling process and drifted to the topic of "what is a living specimen?". The Darwin Core type vocabulary includes dwctype:LivingSpecimen as a term theoretically suitable for use as an object of dwc:basisOfRecord . In that discussion, I brought up the case of the Bicentennial Oak which is part of the Vanderbilt Arboretum. The Vanderbilt Arboretum is a collection (having URI http://biocol.org/urn:lsid:biocol.org:col:35259 ) which contains living specimens in the form of trees. Most of these trees were planted intentionally and that planting could be broadly construed to be part of a material sampling process. However, the Bicentennial Oak (having URI http://bioimages.vanderbilt.edu/vanderbilt/7-314 ) predates the university. I believe that it is a dwctype:LivingSpecimen even though nobody actually did anything to cause it to be there in the arboretum. I asserted that it was different than a tree in a forest (e.g. http://bioimages.vanderbilt.edu/ind-baskauf/51010 ) which is an organism we could document but which I would not call a dwctype:LivingSpecimen. I said that there was something about the process of accessioning the tree, asserting that it was part of the arboretum, and making some kind of commitment that it would remain available for examination in the future that made it a dwctype:LivingSpecimen , things that don't apply to a tree that we just find in the forest. That was about all the further we got in the discussion before we moved on to other things.

I've had this kind of discussion before about what makes a starfish in a jar in a museum collection different from a starfish on a beach. Aside from deadness (which isn't really relevant because the starfish could be alive in a tank), what makes the two different? I usually come across as a nutcase after such discussions. However, I coincidentally was made aware of http://people.ischool.berkeley.edu/~buckland/whatdoc.html during a discussion about "documents" in the context of Audubon Core. The section "The antelope as document" on that webpage gets at exactly the issue I was discussing above. In that section, the term "document" is used rather than "specimen", but I would assert that they are functionally the same. There, Briet's rules for determining when an object has become a document (think specimen) involve:

  1. intentionality

  2. processing

  3. perceived to be a document (specimen)

Applying these rules to the Bicentennial Oak and the tree in the forest, the Bicentennial Oak is a document/specimen because we intend for it to be one, we have processed it (e.g. it has been included in the records of the arboretum), and we perceive it to be one (we advertise it as part of the arboretum, a thing we don't do for the tree in the forest).

With reference to the Darwin-SW viewpoint, we have invented a class we call dsw:Token . There is no requirement for class membership other than that the dsw:Token should be used as some kind of evidence:

  1. evidence that an organism was at a place at a certain time (i.e. "organism at a place and time" is what DSW considers to be an Occurrence; the evidence is the object of the dsw:hasEvidence property with the Occurrence as the subject)
  2. a physical or information-resource voucher of the organism (the object of dcterms:hasPart or foaf:depiction with the organism as the subject)
  3. evidence for an identification instance (the object of dsw:idBasedOn with the Identification instance as the subject)

This is very similar to Briet's concept of a "document". She considers that the antelope in a zoo is a document because "it has become physical evidence being used by those who study it" in a way that it was not when it was running wild on the plains of Africa. If you drill down and look at the RDF of the Bicentennial Oak and the tree in the forest (http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf and http://bioimages.vanderbilt.edu/ind-baskauf/51010.rdf respectively), you'll notice that the Oak serves as the evidence for the Occurrence of itself:

http://bioimages.vanderbilt.edu/vanderbilt/7-314#occ dsw:hasEvidence http://bioimages.vanderbilt.edu/vanderbilt/7-314

while the tree in the forest does not have itself as the evidence for its own occurrence. That's because the Bicentennial Oak meets the requirements of Briet while the tree in the forest doesn't.
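Note that wasn't in the original email: the comparison above can be sketched in a few lines of Python. This is only an illustration and not the actual Bioimages RDF. Triples are held as plain tuples, prefixed names such as dsw:hasEvidence are left as strings, and the forest tree's occurrence URI and the image URI shown as its evidence are assumptions made up for the example.

```python
# Triples as (subject, predicate, object) tuples; prefixed names stay as plain strings.
OAK = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
FOREST_TREE = "http://bioimages.vanderbilt.edu/ind-baskauf/51010"

triples = {
    # Stated in the email: the Oak is the evidence for its own Occurrence.
    (OAK + "#occ", "dsw:hasEvidence", OAK),
    # Assumption for illustration: the forest tree's Occurrence is evidenced
    # by some other resource (a hypothetical image URI), not by the tree itself.
    (FOREST_TREE + "#occ", "dsw:hasEvidence", "http://example.org/image/1"),
}


def is_own_evidence(organism, occurrence, graph):
    """True if the organism itself is the object of dsw:hasEvidence for the occurrence."""
    return (occurrence, "dsw:hasEvidence", organism) in graph


print(is_own_evidence(OAK, OAK + "#occ", triples))                  # True
print(is_own_evidence(FOREST_TREE, FOREST_TREE + "#occ", triples))  # False
```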

I realize this is a rather esoteric point, but if we are getting serious about ontology development and thinking about how we intend to use dwc:basisOfRecord with the Darwin Core type vocabulary, we should consider these points. It seems to me that dwc:basisOfRecord has (or should have) a lot in common with dsw:hasEvidence. The object of dwc:basisOfRecord is the evidence upon which the record is based and could be a dwctype:PreservedSpecimen, dwctype:LivingSpecimen, dwctype:MachineObservation, dwctype:MaterialSample, etc.

Steve

2013-04-03T12:15:38-0600 The difference is a set of material processing steps (Rob Guralnick)

Steve --- This is my failure of imagination or cogency, but the difference between a starfish in a jar in a museum collection and a starfish on a beach is a set of material processing steps that happen during collection and curation. That simple, really.

Best, Rob

2013-04-03T18:28:03+0000 Agrees with Rob (Kris Krishtalka)

And Rob beat me to this comment. Agreed.

Kris

2013-04-03T15:34:03-0300 basisOfRecord is the type of evidence (John Wieczorek)

Right, 2 in Briet's Rules. But 2 carries along 1 and 3 anyway, so I fail to see a lack of simplicity.

Key in Steve's assessment, for me, is the realization that basisOfRecord is really the type of evidence.

2013-04-03T13:46:44-0500 Briet's actual rule (Steve Baskauf)

The actual rule as inferred on the webpage is:

"The objects have to be processed: They have to be made into documents"

It does not say that it is a material process nor does it say that it has to be a sampling process. Maybe it can be an "accessioning" process. The point is that people intend for it to be a "document" (a thing that documents, a.k.a. specimen) and take action accordingly.

2013-04-03T13:30:50-0500 A car isn't a specimen (Steve Baskauf)

Rob,

Well, it is true that the starfish is a product of material processing steps. But I would say that is not what makes it a specimen. My car is the product of material processing steps when it goes from iron ore to a car. But that doesn't make it a specimen.

I'm interested in what makes something a specimen in the broadest sense. Your material processing steps definition doesn't work for a specimen which has no material processing steps, like the Bicentennial Oak. You could argue that you don't consider the Bicentennial Oak to be a specimen for that reason, but then the Arboretum would have to distinguish between its accessioned objects that got planted by people and the ones that weren't. I think they would be happier to think that all of the trees in their collection were specimens regardless of whether somebody planted them or not.

Steve

2013-04-03T19:57:21+0000 A specimen is processed and accessioned (Kris Krishtalka)

The car-iron ore analogy is not fitting here, as it starts as iron ore, which through processing is manufactured into at least a car shell. The starfish is not being manufactured into anything. It is the same starfish on the beach as it is in the jar.

In practice, of course, a specimen becomes one when it crosses the threshold of the repository. On the highway, it's roadkill. In the door of the museum, it's a specimen, because (1) between the door and the jar the material processing occurs; and (2) the repository, through that material processing, which includes accessioning, takes legal possession of the object.

Kris

2013-04-03T17:13:54-0300 OK, complicated (John Wieczorek)

OK, so it's more complicated in (someone's) reality, not less. ;-)

2013-04-03T14:44:21-0700 Questions about many cases; the role of data collection; definitions according to biologists and computer scientists (Robert J Robbins)

Let's start with the Bicentennial Oak. It was there before the arboretum. The arboretum "grew up" around the oak. It seems to me that what makes it a "specimen" is that someone decided to start paying attention to it, either by labelling it in the field, or noting its existence in records, or in some other way.

Chances are, there are also some fungal assemblages in the arboretum, that like the oak, were there before the arboretum, but which are not specimens because no one is paying attention to them.

Let's jump over the starfish and consider some other organisms. We all seem to agree that a starfish becomes a specimen when I pick it up, take it back to the museum, and do whatever accessioning stuff needs to be done.

What if I leave it in the field, but tag it or otherwise label it, then periodically collect data about it. Specimen or wild? Is the physical tagging somehow important?

How about a bird that I have banded and released? Specimen, or not?

Researchers who work on Orcas can identify individual whales by their markings. Often an individual whale is studied for years without ever being touched physically -- observations only. Are these observed Orcas specimens or not? Suppose we take a tissue sample from one observed whale. Does that have any effect on the specimen status of the tissue sample? What about of the whale?

Suppose a whale dies and the carcass is recovered. Surely then it is a specimen. Does that retroactively affect its specimen status in any way?

How about a bird that is observed and counted in a Christmas Bird Count, but no other data are available? Specimen or not? Suppose during a Christmas Bird Count a bird flies into a window, so I pick up the body and later turn it into a study skin. Specimen? What if I collect the body only to verify the information on the bird count (it's some warbler and I can confirm the species only with the bird in hand) and then I discard the body? Specimen (deaccessioned?) or not?

I get the feeling from what has been discussed earlier that specimenhood seems to lie in the relationship between an individual organism and the collection of data about that individual organism.

But what if the organism is not an individual, such as a lichen? With some lichens, I can "capture an individual" (a composite with one fungal component and one algal component). Then I can treat it to kill off the algae and induce the surviving fungal component to merge with a different algal component. Under this circumstance, how many specimens are involved?

What about lichens that turn out to be more complex than just one fungus and one alga?

What about a biofilm? I scrape it off the rock and put it in a jar. Specimen?


I think that Steve has raised some very subtle, yet very important ontological issues here -- some that cut to the heart of new thinking about biodiversity.

If our very fundamental ontological concepts in biodiversity and collections management end up being anchored on the ontological concept of "individual organism" and if "individual organism" turns out to be a "useful approximation", not a "fundamental construct in objective reality", then there is some serious ontological work ahead of us...


Years ago when I was involved in designing and deploying an institutional HR database, there was a lot of concern about defining an employee, vs a patient, vs a board member, vs a volunteer, vs ...

As we looked at how other institutions had handled these distinctions, we found one place that just had a collection of "people of interest to our organization". That became (more or less) the root class for all classes of people. Then all of the classes that were actually more important to the functioning of the institution became daughter classes of this master class.


One of the challenges here is that, speaking broadly, biologists tend to be interested in central tendencies whereas computer scientists are interested in boundary properties. An information system to support biological research must handle central tendencies (i.e., typical cases) well enough (and conceptually simply enough) to seem useful to a typical biologist. It must also handle boundary conditions (outliers, weird situations, special cases) with sufficient robustness to serve as a functioning information system.

When the genome data base was built at Hopkins as part of the human genome project, to make it work like biologists we had to build in way more complexity and subtlety than any biologist thought necessary (or even reasonable). In some cases, we could never get enough complexity into it to make it generically useful. One would think that a database of human genes would need some good, comprehensive, exhaustive definition of "gene" in order to work.

However, I consistently ran into situations where I would ask a biologist for a definition of a gene, then apply that definition to a complex region of some genome and say to the biologist that "according to your definition of gene, there are nine genes in that region." At this point the biologist might say, "No, there are only four genes there." Even if I showed that the definition REQUIRED that one say that there were nine genes present, I would still get a response to the effect that (a) there were only four genes present, (b) the definition was fine as is (meaning that it satisfied a central tendency definition of gene), and (c) I should quit being so obsessed over details that did not affect one's ability to do biology.

2013-04-03T15:14:38-0700 How we define classes (Ramona Walls)

It doesn't matter semantically what you call a class (although it does matter to the people using the ontology), but it matters a lot how you define it. If you want the class "specimen" to include live plants in an arboretum, then define it so it encompasses them. The key is to think of the characteristics that are necessary and sufficient for your class, then include them in the logical definition (as well as possible). So far in the BCO, we have only defined a class "material sample", and it specifies that some material sampling process took place (which includes selection, extraction, and submission). Clearly this does not cover a tree in an arboretum. However, we could make a broader class called something like "institutional specimen" that would cover material samples plus live specimens. This is straightforward, once we agree on the criteria for an institutional specimen (and that doesn't seem very hard).

We don't currently have any restrictions on what kind of material goes into a material sample, so it could be an organism, a collection of organisms, or a rock.

Ramona

2013-04-03T15:15:52-0700 Roles, processes, and samples sensu BCO/OBI (John Deck)

Applying BCO/OBI to this issue....

BCO/OBI (as subprojects of BFO) models material samples, processes, and roles somewhat separately from each other. Each of these concepts lives in a different part of the BFO hierarchy. This gives some flexibility in assigning roles independent of the sampling process or what type of entity it is. Thus, the important distinction here is between "role" and "material sample". The definition of "role" in OBI is:

"A realizable entity the manifestation of which brings about some result or end that is not essential to a continuant in virtue of the kind of thing that it is"

E.g. the tree in the arboretum can have multiple roles: a "living specimen" (as Steve suggests), a "preserved specimen" (if a leaf was put in a herbarium back in 1900), and even a "cultural artifact" (e.g. suppose John Brown was hanged on this particular tree as well), in addition to being just a tree. I think separating "roles" and "processes" here is especially important in considering how we want to go about integrating MIxS and DwC, as certainly the roles we think about assigning to objects will vary depending on the type of investigation one is intending. And the point being, the "objects" in question need to be able to have multiple "roles" as well as multiple types of sampling "processes". Examples of processes that can act on this tree are: observation (information stored in someone's head or written down), processes that yield an information artifact of the entity (e.g. photographs), and physical extraction (taking a leaf from the tree).

John

2013-04-04T06:34 +1 for Ramona's assessment (Stan Blum)

+1 for Ramona's assessment. I'm going to try to follow you down that road, John D.

-Stan

From: Ramona Walls [email protected]

John, I think you are ready to take over editing of the BCO!

Ramona

2013-04-04T09:15:57+0200 MIMARKS perspective (Pelin Yilmaz)

I wanted to share our take on specimen in MIxS, but reading through this whole thread I realized we haven't given as much thought to the description as we should have.

Nevertheless, the MIMARKS checklist (the MIxS that covers single gene amplification metadata) distinguishes between two investigation types: survey and specimen. MIMARKS-survey is supposed to apply to bulk sampling and sequencing type studies of single genes (the usual study type in microbial ecology), while we defined MIMARKS-specimen for marker gene sequences obtained from any material identifiable by means of specimens. Here, I think the idea behind it was to apply the MIMARKS-specimen checklist to pure cultures of microbial organisms.

I realize we are a bit vague, but I'm looking forward to this group resolving what a specimen is, so that we can apply the description to MIMARKS.

Pelin

2013-04-04T15:56:20-0500 Process/resource creation event (Steve Baskauf)

John Deck wrote:

multiple types of sampling "processes". Examples of processes that can act on this tree are: observation (information stored in someones head or written down), processes that yield an information artifact of the entity (e.g. photographs), and physical extraction (taking a leaf from the tree).

Ooo! I'm loving this. Replace "process" with "resource creation event" and you get Figs. 2 and 6 from Biodiversity Informatics, 7, 2010, pp. 17-44 (https://journals.ku.edu/index.php/jbi/article/view/3664 ).

Note that wasn't in original email: I've created another wiki page that displays the images that 
were figures in the paper so that they can be examined without having to download the paper.  
They may be useful as starting points for further discussion.

http://code.google.com/p/darwin-sw/wiki/ProcessResourceDiagrams

However, I disagree when you say "the tree in the arboretum can have multiple roles: a "living specimen" (as Steve suggests), a "preserved specimen" (if a leaf was put in a herbarium back in 1900)". The leaf is NOT just the tree having a different role. The leaf has a dcterms:hasPart relationship with the tree. It isn't the tree.

Steve

2013-04-04T13:59:52-0700 Roles of the tree (Ramona Walls)

True, in that case, the role of the tree was as the source of the specimen rather than as a preserved specimen itself. However, John was right on when he said that a tree could have many roles.

2013-04-04T14:14:37-0700 Roles across transitive relationships (John Deck)

However, I disagree when you say "the tree in the arboretum can have multiple roles: a "living specimen" (as Steve suggests), a "preserved specimen" (if a leaf was put in a herbarium back in 1900)". The leaf is NOT just the tree having a different role. The leaf has a dcterms:hasPart relationship with the tree. It isn't the tree.

As Ramona indicated this wasn't necessarily the point of the statements (and maybe not the best example!). However, another way to think about this... suppose we used ro:part_of to describe the relationship between tree and leaf here, ro:part_of being transitive (see http://obofoundry.org/ro/). Seems to me that we can infer roles applying across transitive relationships. Logically, this makes sense to me, and is the way BiSciCol would handle it. OTOH, you indicated dcterms:hasPart, and this is NOT declared as transitive and hence the role would not apply as you suggested. Perhaps the difference comes from the communities that created the relationship terms (biologists vs. librarians) and probably worth a separate thread to cover all the implications of the types of relationship operators applied in different situations.
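Note that wasn't in the original email: the contrast between a transitive and a non-transitive relationship can be sketched as below. This is a minimal illustration, not how BiSciCol or any reasoner actually implements it. The intermediate "branch" entity and the set of predicates treated as transitive are assumptions made for the example, and whether roles should propagate along the inferred pairs remains the open question raised in the message above.

```python
# Predicates treated as transitive in this sketch; dcterms:hasPart is deliberately absent.
TRANSITIVE = {"ro:part_of"}

triples = [
    ("leaf", "ro:part_of", "branch"),      # "branch" is a hypothetical intermediate part
    ("branch", "ro:part_of", "tree"),
    ("tree", "dcterms:hasPart", "branch"),
    ("branch", "dcterms:hasPart", "leaf"),
]


def closure(graph, predicate):
    """Return the (subject, object) pairs for a predicate, chaining them only if it is transitive."""
    pairs = {(s, o) for s, p, o in graph if p == predicate}
    if predicate not in TRANSITIVE:
        return pairs
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


print(("leaf", "tree") in closure(triples, "ro:part_of"))       # True: inferred by chaining
print(("tree", "leaf") in closure(triples, "dcterms:hasPart"))  # False: no chaining applied
```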

2013-04-04T14:14:37-0700 Biologists vs. librarians (Robert Robbins)

With regard to the librarian vs biologist distinction:

Librarians deal with the defined attributes of human-created artifacts.

Biologists (and scientists in general) deal with the discovered attributes of real-world (i.e., not human-created) objects.

When it comes to building ontologies, this distinction is potentially very significant.

2013-04-04T22:07:40-0500 Distinction between a generic organism and a specimen (Steve Baskauf)

I want to thank everyone who's contributed to this thread so far. I think that the questions asked and points made are very important and I would like to archive it somewhere (either the DSW or TDWG RDF sites) so that what was said isn't lost. However, since this was not on a public list, I don't want to post it if anyone objects. Let me know if you would prefer me not to include your contribution.

I'm actually rather excited about this discussion because it is digging into important issues that go beyond what the collections community has typically concerned itself with (specimens on a pin, in a jar, or glued to paper) and includes situations where we collect data on organisms that aren't specimens. Whatever system for recording metadata that we devise needs to be able to handle most, if not all, of the situations described in Robert's email below.

I want to take some pieces of what several people have said and apply them to the questions in Robert's email, which to a large extent deal with the question of making the distinction between specimens and generic organisms. In Kris' email, he distinguished between roadkill and a specimen by noting two types of processing that happen to a specimen and that don't happen to roadkill: 1. material processing and 2. taking legal possession of the object. In a general way, I would agree with what he said for preserved specimens. They have to be subjected to some kind of material processing step that preserves them; that's what makes them "preserved" specimens. As for the second step, I think I would prefer to say "taking responsibility for" the object, rather than "taking legal possession of" the object. I think that for most specimens both would be true, but I can imagine there might be situations where an institution takes responsibility for a specimen without asserting a legal right of possession. But I'm not too concerned about the distinction at this point.

If we consider the broader case of a "biological specimen" superclass which includes both living and preserved specimens, and if we allow for things like the Bicentennial Oak to be living specimens without any material processing step occurring, then it's really the second step which gives something "specimenhood". John's point about entities being capable of having multiple roles is actually critical here. If we ask ourselves why we want to have specimens, it is not because we require them to "be" a particular kind of thing (well, I guess we are restricting specimens to "be" material objects), but because we expect them to be capable of fulfilling certain kinds of roles. In particular, we expect them to be able to serve as evidence that a member of some taxon occurred at a certain time and place, serve as a source of tissue which was known to have come from a particular organism, be used for taxonomic identification of the source population, etc. (In any given instance, they may not actually fulfill all of these roles, but we would expect them to be ABLE to fulfill at least one of them.) We cannot expect them to fulfill those roles if we have not taken responsibility for/taken legal possession of them. If a user of our collection desires for a specimen to fulfill one of those roles, we need to be able to ensure that the user can access that specimen at will, and to be able to do that we have to exert some degree of control over it.

If you apply this criterion to the examples below, then the distinction between a generic organism and a specimen becomes clear in most instances. The Bicentennial Oak is a biological specimen because Vanderbilt has taken responsibility for keeping track of where it is, asserting control over it by not letting somebody sell or log it, and making it accessible to people who might want to sample or examine it for any of the purposes I listed above. The fungal assemblage is not being tracked or protected, so it isn't a specimen. The tagged starfish on the reef isn't a specimen because nobody is making sure that it doesn't get eaten or collected by a tourist. The bird banded and released is not a specimen because it can't be produced on demand. The observed bird is a specimen if the study skin is put in a museum where its existence is controlled, but not if the body is discarded. I don't think that just collecting information about the organisms is enough to make them specimens, even if the data collection is repeated for years.

Ramona says that the key to defining a class is to think of the characteristics that are necessary and sufficient for your class to encompass all of the instances that belong in it. It seems to me that the critical characteristic for specimens is the aspect of a material object that someone has "taken responsibility for/taken legal possession of it for the purpose of allowing access to it to allow it to serve in some sort of scientific role" such as the ones I enumerated above. I say that because all of the things that seem like they should be specimens (to me) would be included under that definition and things that I don't think should be specimens are NOT included. Criteria such as "subjected to labeling/tagging" or "being observed or measured" don't seem to work for me. In these examples, I've concerned myself with biological specimens by defining the kinds of roles that biologists want their specimens to play. If we broadened the "specimen" class to include non-biological specimens (e.g. rocks in a museum), then the "taken responsibility for/taken legal possession of it for the purpose of allowing access to it to allow it to serve in some sort of scientific role" still seems to work, but there might be a different set of specific roles we expect it to fulfill.
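Note that wasn't in the original email: the criterion proposed above can be written as a simple membership test and applied to the examples discussed. This is a rough sketch only. The two boolean fields are a simplification introduced for the example, and the values assigned to each case follow the judgments given in this message rather than any agreed definition.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    name: str
    is_material: bool            # a physical object rather than a digital record
    responsibility_taken: bool   # someone tracks it, controls it, and keeps it accessible


def is_specimen(thing):
    """Proposed criterion: a material object that someone has taken responsibility for."""
    return thing.is_material and thing.responsibility_taken


examples = [
    Thing("Bicentennial Oak", True, True),
    Thing("fungal assemblage in the arboretum", True, False),
    Thing("tagged starfish on the reef", True, False),
    Thing("banded and released bird", True, False),
    Thing("bird study skin in a museum", True, True),
    Thing("discarded bird body", True, False),
]

specimens = [t.name for t in examples if is_specimen(t)]
print(specimens)  # ['Bicentennial Oak', 'bird study skin in a museum']
```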

I've gone back and looked at my original email about "documents" and "tokens". If one considers "documents" to broadly include things that "document" something, and "tokens" to be things that serve as "evidence" of some sort (i.e. sensu Darwin-SW), then they are very similar, if not the same because they both serve in a sort of vouchering role. However, I'm thinking that I was probably wrong to equate them with specimens. I think that tokens/documents have possible roles that overlap with specimens, because several common roles that we want specimens to play are evidential. But I could imagine that somebody might intend for a biological specimen to serve in the role as a source of material for some qualitative analysis which is unrelated to serving as any kind of evidence of presence or taxonomic identity. In addition, I think that specimens probably need to be physical objects, whereas "tokens" could be digital entities such as sensor logs or images. (The actual Briet rules for determining if an object was a document included "materiality", but I left that out of my post because her manifesto was published in 1951 and I don't think people were really thinking about digital documents at that time.)

If I get around to it, I think a Venn diagram would be very useful to go with this.

Steve

2013-04-05T10:52:01-0500 Specimen-hood (Andrea Thomer)

Hey Steve et al,

Rob forwarded me your documents/specimens thread once it got to talk of Briet and the distinction between biologists' and librarians' ways of viewing the world -- I'm really interested in this thread, have written on similar topics in some of my doctoral coursework here at Illinois, and consequently wanted to add in my $.02 on a couple points:

  1. I really like the idea that specimens are collected objects that someone has taken some sort of responsibility for -- this jibes with a lot of work on cultural heritage objects I've read -- but I would push it one step further, or perhaps generalize the statement a bit more. Specimens are collected, natural objects that have undergone a) material processing (physical preservation processes), and b) social processing (the assumption of legal responsibility via accessioning processes; the specimens' use in different scientific studies).

  2. While Bob brings up a good point that librarians (and cultural heritage curators for that matter) deal with defined attributes of human-created objects, we can't forget that specimens are not entirely 'natural' -- those social processes described either confer or derive human-created or -interpreted attributes to these natural objects.

  3. Thus, one of the fundamental tensions underlying specimen-hood is between their human-conferred vs. 'naturally-occurring' attributes. It's this tension -- and interpretive flexibility -- that allows their use in multiple settings, whether in different scientific applications or even in completely non-scientific contexts (see Star and Griesemer's boundary objects paper for more on this -- which I almost dare not muddy the discussion with, because boundary objects tend to take over whatever discourse they touch... but it's relevant).

Does this make sense? Or am I just unnecessarily re-hashing your guys' discussion plus, like, Latour? Either way -- I'm very, very interested in all this -- if you all wind up working toward a paper or anything along this vein I'd love to help out.

Best, Andrea

2013-04-05T11:27:29-0500 Specimen-like things that aren't specimens (Steve Baskauf)

This stuff has been rolling around in my head and that has caused me to remember something else that is related. In 2008, Bruce Kirchoff and I wrote a paper (Vulpia 7:16-30, http://www.cals.ncsu.edu/plantbiology/ncsc/vulpia/pdf/Baskauf_&_Kirchoff_Digital_Plant_Images.pdf ) whose title was "Digital Plant Images as Specimens...". Aspects of this paper are what propelled me into getting involved first in SERNEC, then TDWG. One of the points of that paper was that if properly taken and archived, sets of digital images of live plants could serve many (but not all) of the functions of specimens. At some point, I was actually calling such sets "specimens", although I probably wouldn't do that now.

In the context of this current discussion, I can frame the viewpoint that we took in that paper by saying that there are a number of roles that biological specimens (both preserved and living) can take on which can also be taken by sets of images. For example, they can be used in most of the evidential roles I mentioned below. If someone takes responsibility for making sure that they are preserved, findable, and accessible (as I've done in Bioimages vs. a random plant picture on somebody's Facebook page), then they have the same characteristic of "taken responsibility for/taken legal possession of it for the purpose of allowing access to it to allow it to serve in some sort of scientific role" as I've claimed is critical for traditional specimens. What they DON'T have is physicalness. You can't extract DNA from them or conduct a qualitative analysis on their atoms. So what sets of archived live plant images are is something that shares many roles with specimens, but which lacks the particular roles that can only be filled by physical materials. This class of "specimen-like but not physical specimen" things doesn't have a name (at least yet).

Another thing to put on the Venn diagram...

Steve

2013-04-05T09:51:57-0700 Specimen as used in the biomedical community (OBI) (Ramona Walls)

This is in response to Andrea's comments (which I think there is general agreement on) that a specimen undergoes some "social processing (the assumption of legal responsibility via accessioning processes; the specimens' use in different scientific studies)".

I don't know if those two points (legal responsibility and use in scientific study) are both requirements or if it could be one or the other. The first is much more restrictive than the second.

The word specimen, as it is used in the biomedical community (see the OBI), is broad in the sense that it does not include an assumption of legal responsibility for maintenance of the specimen (a biomedical specimen may be destroyed), but it does include the use of the specimen in a scientific study (what they call an investigation). Because material entity is a sub-class of specimen, it inherits these properties, and it does not have any additional restrictions that the entity be preserved. Therefore, if we want a class for a specimen for which some institution has taken legal responsibility, I think we will need to create a more specific class with a more descriptive name, like "museum specimen" or "institutional specimen".

I know I said that the names don't matter for reasoning, but they are in fact something that some users will fixate on and which can cause a lot of problems in getting people to adopt an ontology, so it is probably best not to get too attached to the label "specimen" in this case.

On the positive side, I think we are talking about creating a new and very useful class that is not yet in OBI, and thereby making a valuable contribution to biodiversity informatics.

Ramona

2013-04-05T10:14:58-0700 Cases where there are organisms but no specimens (Robert J Robbins)

Couple of thoughts:

  1. I am very supportive of getting these ideas out to a wider audience, so enthusiastically support the idea of posting some of the discussion. We might also want to consider rolling up some of these ideas into the paper that will come out of our meeting.

  2. On a bigger picture viewpoint: the original stated goal of the RCN4GSC award was to bring together genomics and other communities, including biodiversity, collections management, community ecology, long term ecological research, etc. Early on, we decided to focus on MIxS and DwC because these were established standards across which there was an obvious need for harmonization. However, now that this work is well underway, we should not lose sight of the bigger picture which includes, essentially, all of biology, but with special emphasis on biodiversity, community ecology, etc.

At a fairly general level, what we are trying to do is develop approaches for managing data and metadata so that related information that might be stored in different repositories, and that might have been collected for different purposes, can be "reunited" to facilitate further investigations or analyses.

A lot of our discussion below has focussed on "specimen", as a unit in a collection, because we have been emphasizing collection-management information systems. This puts the idea of "specimen" very much in the middle of conceptual analyses. But, a person engaged in field work on community ecology has many of the same needs as collections-focussed research (e.g., the need to accurately denote the location and conditions under which the data collection, as opposed to specimen collection, was done). Also, as part of a trap, release, and recapture study, blood samples might be taken for storage and later analysis. Blood samples might also be taken as an organism is being accessioned into becoming a specimen in a collection.

From the point of view of, say, a population geneticist, access to DNA sequences from organisms localized accurately in space and time would be equally valuable, whether or not the organism had ever been accessioned into specimen status. This suggests that however we end up representing the attachment of sequence data to an organismal occurrence at a particular place and time, it would be a mistake to make the specimen-hood of the organism a critical part of that representation. Biologists wishing to look at, say, allele occurrence across different environments (perhaps to produce isoclines of allele frequency) would be justifiably annoyed if there were significantly different approaches to managing such information, depending upon whether the sequences were generated from "specimens" that were later accessioned into a collection or from "subjects" that were released immediately after the blood draw.

As Steve mentioned at the meeting, while it is a mistake to try to solve all problems all at once, it is also important to keep an eye on future problems, not far over the horizon, to make sure that today's solutions do not impede tomorrow's efforts. In that context, I suggest that as we develop our efforts to integrate sequence data with traditional data on organisms collected in the wild to become specimens, we should take care to do so in a way that will not impede other efforts to integrate sequence data with other data on organisms that are studied or observed in the wild (or in the lab) without ever becoming specimens in a collection.

Since any organism, whether studied in the wild or in captivity, might someday become a specimen (but most won't), that suggests that "specimen" is a subset of "organism of interest". Since blood draws and other tissue extraction that could lead to sequencing can be taken from any "organism of interest", not just from specimens, it is important that the underlying data models reflect that appropriately.

Furthermore, since an "organism of interest" could be a lichen, there is no requirement that an "organism of interest" need be "taxonomically homogeneous" (if taxonomic homogeneity is taken to imply genotypic homogeneity).
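
The data-model point in this message can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `OrganismOfInterest` and `SequenceRecord`, the field names, and the identifiers are hypothetical and do not come from Darwin Core, MIxS, or any proposal discussed in the thread. It only shows one way sequence data could attach to an organism at a place and time, with specimen status as an optional property rather than a prerequisite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrganismOfInterest:
    """Any organism that is studied, whether or not it is ever accessioned."""
    organism_id: str
    # Hypothetical field: filled in only if the organism becomes a specimen.
    catalog_number: Optional[str] = None

    @property
    def is_specimen(self) -> bool:
        return self.catalog_number is not None


@dataclass
class SequenceRecord:
    """A sequence tied to an organism at a place and time."""
    sequence_id: str
    source: OrganismOfInterest
    latitude: float
    longitude: float
    date: str


# One subject released after a blood draw, one later accessioned.
released = OrganismOfInterest("subject-1")
accessioned = OrganismOfInterest("subject-2", catalog_number="COLL-0001")

records = [
    SequenceRecord("seq-A", released, 36.1, -86.8, "2013-04-01"),
    SequenceRecord("seq-B", accessioned, 36.2, -86.7, "2013-04-02"),
]

# A query by location treats both records the same way.
north_of_36 = [r.sequence_id for r in records if r.latitude > 36.0]
```

In this sketch "specimen" is simply the subset of organisms of interest that have a catalog number, so the query over `records` never needs to ask about specimen status.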

2013-04-05T15:39:46-0700 Material target of observation (Ramona Walls)

In the BCO there is a term for "material target of observation" which is fairly broad, but could cover the kinds of situations you describe where an organism (or other material entity) is observed but not collected. We are really just starting out with the BCO, but I think everyone's intention has been to cover not only entities that are collected, but also those that are observed, measured, photographed, etc.

Ramona

2013-04-06T20:29:51-0600 samples/inferred samples (Robert Guralnick)

Hi all --- I have been following this thread, enjoying it tremendously. I agree with Bob Robbins that these discussions form a basis for something akin to a published meeting report/paper. It certainly is the clearest exposition of how to think about specimens I have heard, and given that I curate a collection of those things, it is not only philosophically (or ontologically) helpful, but also rather practically useful.

I was thinking about Bob's comments regarding needs of other communities not simply those interested in "specimens". I resonate with that comment but wonder if we need to think through different ways to model those needs on a case by case basis as opposed to something like an "organism of interest" (although I doubt that was Bob's intent).

I have been thinking about this because one of the outcomes of the meeting was to submit a Darwin Core ticket to include a new set of terms that represents "samples". This has actually been submitted, and I am appending it below for your full perusal. It represents something of a step forward, in my opinion. However, I realize that a soil sample containing a lot of critters and genes is really different from a blood sample or tissue sample taken from one individual.

I have also been thinking about cases such as gene sequences. I agree with Bob that a geneticist is not necessarily deeply interested in the fate of the specimens (thereby perhaps itself defining something important and related back to "roles" and "intentionality"), but there is still an inferred specimen in those kinds of analyses, and it's true for metagenomic sequencing as well. It's even still true for those doing functional analyses where the genes are more of interest given some environmental characteristic (e.g. metal resistance genes in contaminated soils).

Anyway, my main reason for writing was to pass along the Darwin Core ticket for your perusal. Feedback welcome and hopefully this will go to the TDWG's Technical Architecture Group and might even get approved!

Best, Rob

Included next was the proposal for DwC term additions, not pasted in here

The URL for the DwC term addition proposal is http://code.google.com/p/darwincore/issues/detail?id=167

2013-04-08T12:47:43-0500 What things are/what things do and the dwc:Individual class proposal (Steve Baskauf)

I had some thoughts which came to mind earlier but which I didn't have time to put into writing. This thread has helped to clarify in my mind some of the reasons why the discussion from Oct-Nov 2010 about my proposal for the dwc:Individual class was so long and painful (see http://code.google.com/p/darwin-sw/wiki/ClassIndividual for background).

I think that when people try to define a class, they usually believe that their task is to describe what something "is". However, as I think Ramona has pointed out, it is quite possible to define a class according to what it "does", i.e. the roles that class members can or must have to be included in it. I've pondered this for a while and although I think that it is fine to be careful to distinguish between describing what something "is" and the roles that thing can have, I think that to some extent this is an artificial distinction because very often our ideas about what something "is" are actually defined by what we want that thing to do. For example, if we want to define a class we call "Fathers", we could say that the members must "be" males and they must have children. But what does it mean to "be" a male? Again one could define that by listing the kinds of things that males must be able to do. Even something as basic as "material object", which could be defined as things that have mass, could be related to what those things can "do". In that case, things that have mass have inertia, which means that what they "do" is resist changes to their motion; we could say that they have the role of resisting acceleration. In many cases, the sets of roles that we have in mind about what we expect classes of things to "do" are so firmly entrenched in our brains that we lose sight of the fact that we actually (perhaps subconsciously) have defined what they "are" by the roles we want them to play. I'm sure that I'm going to get into trouble here for playing so fast and loose with "roles" and "occurrents", but hopefully you can get the gist of what I'm trying to say.

In the case of the dwc:Individual class proposal (i.e. a proposal to add a class for organism-like things), many people in the TDWG community (the museum people) couldn't see why that class was needed. If their organisms were dead and on a pin or in a jar, the organisms could still serve all of the purposes that they cared about. However, a purpose that I cared about was the one that was implied in the DwC term dwc:individualID (http://rs.tdwg.org/dwc/terms/#individualID ), i.e. something that was capable of being resampled or monitored.

I'm going to engage in a brief aside to define "resampled or monitored". In this context, what I'm talking about is NOT synonymous with a material sampling process, although a material sampling process could be related to it. What I'm talking about is a documented occurrence of an organism or small group of organisms at a particular location at a particular time. Hilmar Lapp succinctly defined it as "a tuple of (Individual, Event), and has properties for referring to the Individual and the Event" (http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001782.html with much more of the discussion summarized at http://code.google.com/p/darwin-sw/wiki/ClassOccurrence ) where the Individual is an organism or small group of organisms. An instance of this is what Cam and I assert to be dwc:Occurrence in DSW, although it was pretty clear at the RCN4GSC meeting that this is NOT what other people (notably Tuco) consider dwc:Occurrence to be. So it probably needs some other name. For the moment I'll call it a dsw:Occurrence to differentiate it from the more nebulous dwc:Occurrence. A dsw:Occurrence instance is what the agent who is the object of dwc:recordedBy creates. It is a thing that is instantiated when someone observes a particular orca and notes the time and location (makes an observation?). It is the thing that is instantiated when a collector records the time and place where she collected a preserved specimen (which might be all or only part of the organism). It is the thing that is instantiated when I take an image of an organism and record the time and place where I did it, with the intention that the image should serve as evidence that it happened (as opposed to somebody taking a picture of a pretty flower for their Facebook page).
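
Hilmar Lapp's "tuple of (Individual, Event)" can be written down directly. The following is only a minimal sketch of that idea: the class names mirror the terms used above, but the field names (`individual_id`, `event_date`, `locality`, `recorded_by`) and all of the values are hypothetical placeholders, not Darwin Core or DSW definitions.

```python
from typing import NamedTuple


class Individual(NamedTuple):
    individual_id: str


class Event(NamedTuple):
    event_date: str
    locality: str


class Occurrence(NamedTuple):
    """A documented presence of an Individual at an Event (time and place)."""
    individual: Individual
    event: Event
    recorded_by: str


# The same individual documented on two different occasions.
orca = Individual("orca-1")
first = Occurrence(orca, Event("2013-03-26", "locality A"), "observer 1")
second = Occurrence(orca, Event("2013-03-29", "locality A"), "observer 2")
```

Here `first` and `second` are distinct occurrences that refer to the same individual, which is the sense of "resampled or monitored" used in the aside.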

So returning from the aside to the issue of the dwc:Individual class proposal. There were three things that I wanted an instance of a dwc:Individual to "do":

  • to be capable of being monitored multiple times (i.e. to be associated with multiple dsw:Occurrence instances)
  • to be the anchor point for one or more dwc:Identification instances.
  • to be the source of material or electronic representations of itself (e.g. material samples a.k.a. preserved specimens and various forms of electronic depictions a.k.a. images, sounds, etc.)

When I first started talking about the proposal, I had in mind individual biological organisms because most of the time those would be the things that people would monitor, identify, and sample. But in practice, people carry out those kinds of processes on entities that are actually small populations of organisms (e.g. clumps of moss) or entities where it is not clear where one organism ends and another begins (e.g. coral reef, fungal nets). I didn't really care about the distinction between those kinds of things and actual individual biological organisms. As long as the entity could be monitored repeatedly, have one or more identification instances, and be "sampled" in some way, I was fine with including it in the definition of dwc:Individual. Even things like packs of wolves and tissue cultures probably could fit in the definition if one was open minded enough.

But then people (OK, it was mostly Rich Pyle) wanted the class to include more kinds of things. Why couldn't dwc:Individuals be dead? (I said "it's pointless to resample them".) Why couldn't they be vials of mixed species plankton sitting on a shelf? (I said "you can't hang unambiguous Identification instances on them.") Why couldn't they be whole ecosystems? When the discussion was framed about what a dwc:Individual "was", it was very difficult to explain why stretching the boundaries in some ways (allowing wolf packs) was OK, but in other ways (plankton trawls) was not. I can see now that the problem boils down to the fact that I wanted to define the class based on the roles that I wanted its instances to be able to take on, and not actually on an abstract idea about what an Individual "was". I complained that if the definition of Individual were expanded too far, it would make it impossible to do what I wanted the class to "do".

At a certain point, I think I came right out and said that a dwc:Individual was just the node that connected dwc:Identification instances with [dsw:]Observation instances, and which also connects to evidential sorts of things derived from it. Kevin Richards said it in a different way in http://lists.tdwg.org/pipermail/tdwg-content/2010-November/001956.html "the Individual more closely resembles a many-to-many joining table in a database (ie doesn't serve much use other than connecting two tables/classes together". Both of these ways of defining an Individual reflected what I wanted dwc:Individual instances to "do" more than what they were. Kevin even went so far as to say that my concept of Individuals didn't "normally relate to a 'real world' type of object" at all, although I think that there are "real world" objects that fit into the roles I've laid out for dwc:Individual. But in any case, these definitions were really based on the roles I wanted Individuals to play more than on the kind of thing they were.
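
The "joining table" reading of an Individual can be sketched as a join on a shared identifier. This is a hedged illustration only: the record layouts, the helper `join_via_individual`, and the identifiers and values are all hypothetical and are not taken from Darwin Core, DSW, or Kevin Richards' message.

```python
from collections import defaultdict

# Hypothetical records, each keyed by the identifier of the individual.
identifications = [
    ("tree-1", "Quercus alba"),
    ("tree-1", "Quercus stellata"),  # a second, competing opinion
]
occurrences = [
    ("tree-1", "2010-10-01"),
    ("tree-1", "2013-04-05"),
]


def join_via_individual(identifications, occurrences):
    """Pair every identification with every occurrence of the same individual."""
    names_by_individual = defaultdict(list)
    for individual_id, taxon in identifications:
        names_by_individual[individual_id].append(taxon)

    pairs = []
    for individual_id, date in occurrences:
        for taxon in names_by_individual.get(individual_id, []):
            pairs.append((taxon, date))
    return pairs


pairs = join_via_individual(identifications, occurrences)
```

In this sketch the individual carries no data of its own beyond its identifier; its only job is to let two identifications and two occurrences be related many-to-many, which is the role-based definition described above.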

Anyway, the reason why I've gone on about this is not to convince anyone that the DSW view of "IndividualOrganism" (which is what we decided to call the proposed dwc:Individual in order to disambiguate it from Individual sensu OWL) is correct. The point is that I don't think that it is going to be productive to set out to define various classes for organisms, populations, tissue cultures, specimens, ecosystems, etc. unless we also set out the roles that we expect these things to play. What do we want them to "do"? I think that this is why in the discussion of BFO and BCO at the meeting I kept asking the question of what we were actually going to "do" with the classes we define when we are done. I think that there is real potential for using the language of BFO/BCO to describe clearly and logically the relationships among the classes that we think we need to describe the biodiversity informatics universe. But I don't think that the process will be productive unless we have a clear idea about what we expect these classes to "do" when we are done because those ideas must inform the way we describe them. I think we have a real start on this process because of the questions that have been asked in this thread and the use-cases that were raised in the meeting. But I would feel pretty hesitant about starting to write a paper without feeling a lot more clear about what we would be trying to accomplish with it.

Below I've pasted in some bits of previous emails relevant to what I said above.

Steve


Bit number 1:

Robert Guralnick wrote:

I was thinking about Bob's comments regarding needs of other communities not simply those interested in "specimens". I resonate with that comment but wonder if we need to think through different ways to model those needs on a case by case basis as opposed to something like an "organism of interest" (although I doubt that was Bob's intent).


Bit number 2:

On Fri, Apr 5, 2013 at 10:14 AM, Robert J Robbins [email protected] wrote:

A lot of our discussion below has focussed on "specimen", as a unit in a collection, because we have been emphasizing collection-management information systems. This puts the idea of "specimen" very much in the middle of conceptual analyses. But, a person engaged in field work on community ecology has many of the same needs as collections-focussed research (e.g., the need to accurately denote the location and conditions under which the data collection, as opposed to specimen collection, was done). Also, as part of a trap, release, and recapture study, blood samples might be taken for storage and later analysis. Blood samples might also be taken as an organism is being accessioned into becoming a specimen in a collection.

From the point of view of, say, a population geneticist, access to DNA sequences from organisms localized accurately in space and time would be equally valuable, whether or not the organism had ever been accessioned into specimen status. This suggests that however we end up representing the attachment of sequence data to an organismal occurrence at a particular place and time, it would be a mistake to make the specimen-hood of the organism a critical part of that representation. Biologists wishing to look at, say, allele occurrence across different environments (perhaps to produce isoclines of allele frequency) would be justifiably annoyed if there were significantly different approaches to managing such information, depending upon whether the sequences were generated from "specimens" that were later accessioned into a collection or from "subjects" that were released immediately after the blood draw.

2013-04-08T15:09:10-0300 Caution moving forward (John Wieczorek)

This is a really informative realization, both about the past discussion in tdwg-content and about our current challenges moving forward. I would like to propose a little caution moving forward, and that is to limit our modeling to what we already know we have to do rather than commit now to modeling what we think might be interesting some day. If we get what we know right first, we'll be better off as we move forward.
