
Namespaces and Vocabulary

johnataylor edited this page Feb 24, 2016 · 5 revisions

If we want others to understand what we are saying, choosing the right words helps. At first, however, the most important thing is to be consistent. Looking back at our Book examples, in addition to identifying and labeling the key entities, we also made an effort to use the same property names throughout. These property names form our vocabulary.

Because the ultimate value in our data comes when we combine it with other data on the network, it is natural to use namespaces for our property names to avoid unforeseeable future conflicts. Namespaces turn out to be one of the most important simplifying techniques in software. Unfortunately, raw JSON lacks explicit namespaces. However, raw JSON has typically been served from a REST endpoint, so in a practical sense it has often inherited an implicit namespace from that endpoint. JSON-LD significantly strengthens these informal semantics by introducing namespaces explicitly in the document, through a @context that maps short property names to full IRIs. The good news is that it does so in a light, non-invasive way that shouldn't trip up any JavaScript programmers.
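To make this concrete, here is a minimal sketch in Python of what the @context buys us. It assumes a hypothetical vocabulary namespace, http://example.org/books/, and illustrative book values; resolve and expand_terms are our own simplified helpers, not a conforming JSON-LD processor (real expansion also handles value objects, arrays, @type and base IRIs, and drops properties that do not map to an IRI). The document keeps the short keys title and author that a JavaScript programmer would use, while the context resolves each one to a full, namespaced IRI.

```python
import json

# A book record in the style of our earlier examples (illustrative values).
# The "@context" maps the short keys to full IRIs. The namespace
# http://example.org/books/ is a hypothetical vocabulary, not a real one.
book = {
    "@context": {
        "ex": "http://example.org/books/",
        "title": "ex:title",
        "author": "ex:author",
    },
    "@id": "http://example.org/books/1",
    "title": "An Example Book",
    "author": "A. Writer",
}


def resolve(term, context):
    """Resolve a term or compact IRI ("prefix:suffix") to a full IRI."""
    value = context.get(term, term)
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        # "http://..." is already absolute; only expand known prefixes.
        if prefix in context and not suffix.startswith("//"):
            return context[prefix] + suffix
    return value


def expand_terms(doc):
    """Replace each property name with its full IRI.

    Keywords such as "@id" pass through unchanged and "@context" is dropped.
    A simplification: only plain string term definitions are handled.
    """
    context = doc.get("@context", {})
    return {
        (key if key.startswith("@") else resolve(key, context)): value
        for key, value in doc.items()
        if key != "@context"
    }


# Plain access such as book["title"] is unaffected; the context only
# adds meaning on top of the short keys.
print(json.dumps(expand_terms(book), indent=2))
```

The design choice worth noting is that nothing about the original keys changes: code that ignores @context keeps working, and code that honors it sees unambiguous IRIs.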

Adopting a common vocabulary really helps when we start to share our data. Luckily, there are industry-standard vocabularies that we are free to pick up and use, and there is really no downside to doing so. A great example is Dublin Core, which includes a set of terms for describing published works, such as title, creator, publisher and date; another such effort is http://schema.org.
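As a rough sketch of why a shared vocabulary pays off, consider two documents that use different local property names. The Dublin Core Terms namespace http://purl.org/dc/terms/ and its title and creator terms are real; the two documents, their local names and the get_by_iri helper are hypothetical. To keep the sketch short, each @context maps keys straight to full IRIs, so no prefix resolution is needed. Because both contexts point at the same Dublin Core IRIs, a consumer that knows only Dublin Core can read both.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace

# Two hypothetical publishers picked different local names, but each
# @context maps them onto the same Dublin Core terms.
doc_a = {
    "@context": {"title": DCTERMS + "title", "author": DCTERMS + "creator"},
    "title": "An Example Book",
    "author": "A. Writer",
}
doc_b = {
    "@context": {"name": DCTERMS + "title", "writtenBy": DCTERMS + "creator"},
    "name": "Another Example",
    "writtenBy": "B. Author",
}


def get_by_iri(doc, iri):
    """Return the value of the first property whose context maps to iri."""
    context = doc.get("@context", {})
    for key, value in doc.items():
        if context.get(key) == iri:
            return value
    return None  # the document does not use this term


# A hypothetical indexer that knows only Dublin Core reads both documents
# without knowing either publisher's local names.
for doc in (doc_a, doc_b):
    title = get_by_iri(doc, DCTERMS + "title")
    creator = get_by_iri(doc, DCTERMS + "creator")
    print(f"{title} by {creator}")
```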

One of the key motivations for adopting a standard vocabulary is that it allows our data to be consumed by agents we had not anticipated. For example, search engines, whether public or private, immediately become that little bit smarter about indexing our documents. That can't be a bad thing.

Perhaps a little ironically, the example we introduced in our first couple of articles was metadata describing a published book. In hindsight, we could simply have used Dublin Core rather than struggling to find our own words. The good news is that the mistake is easily corrected: all we need to do is evaluate a series of rules that express the equivalence between our property names and the Dublin Core terms.
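Here is one way such rules might look, as a hedged sketch rather than a prescribed method. We assume the data has already been expanded into (subject, predicate, object) triples, that our home-grown properties live under the hypothetical namespace http://example.org/books/, and that each rule simply declares one of our properties equivalent to a Dublin Core term (in RDF this is the kind of statement owl:equivalentProperty makes). apply_equivalences is an illustrative helper that keeps adding inferred triples until nothing new appears; it is not a full OWL reasoner.

```python
EX = "http://example.org/books/"       # hypothetical home-grown vocabulary
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace

# Each rule declares one of our properties equivalent to a Dublin Core term.
EQUIVALENCES = [
    (EX + "title", DCTERMS + "title"),
    (EX + "author", DCTERMS + "creator"),
    (EX + "publisher", DCTERMS + "publisher"),
]


def apply_equivalences(triples, rules):
    """Return the input triples plus every triple implied by the rules.

    Equivalence is symmetric, so each rule is indexed in both directions,
    and the loop repeats until no new triple appears, which also covers
    chains of equivalences.
    """
    equivalents = {}
    for a, b in rules:
        equivalents.setdefault(a, set()).add(b)
        equivalents.setdefault(b, set()).add(a)

    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            for q in equivalents.get(p, ()):
                if (s, q, o) not in result:
                    result.add((s, q, o))
                    changed = True
    return result


# Illustrative legacy data using our own property names.
book = EX + "1"
legacy = {
    (book, EX + "title", "An Example Book"),
    (book, EX + "author", "A. Writer"),
}

for triple in sorted(apply_equivalences(legacy, EQUIVALENCES)):
    print(triple)
```

Because the original triples are kept, existing consumers of our own vocabulary continue to work, while Dublin Core-aware consumers see the standard terms.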

So these are our priorities when handling legacy data: first, get the URIs right and consistent (this may or may not be easy); second, use a consistent vocabulary (that is certainly easy); and finally, use an industry-standard vocabulary where it fits. This comes last only because, as we have just seen, a home-grown vocabulary can be corrected later.

