Skip to content

Commit 7d9391c

Browse files
committed
Ported another post
1 parent 5822d73 commit 7d9391c

1 file changed

Lines changed: 89 additions & 0 deletions

File tree

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
---
2+
layout: post
3+
title: "Open Data: license, rights, aggregation, clean interfaces?"
4+
date: 2009-05-18
5+
blogger-link: https://chem-bla-ics.blogspot.com/2009/05/open-data-license-rights-aggregation.html
6+
doi: 10.59350/hvqxm-xnq47
7+
tags: opendata nmrshiftdb rdf dbpedia bio2rdf
8+
---
9+
10+
A [recent post](http://blog.openwetware.org/scienceintheopen/2009/05/15/a-breakthrough-on-data-licensing-for-public-science/) by
11+
[Cameron](http://blog.openwetware.org/scienceintheopen/) on his visit last week with [Nico](http://wwmm.ch.cam.ac.uk/blogs/adams/),
12+
[Peter](http://wwmm.ch.cam.ac.uk/blogs/murrayrust/) and [Jim](http://wwmm.ch.cam.ac.uk/blogs/downing/), discussed
13+
[Open Data](http://en.wikipedia.org/wiki/Open_data) licensing. This lead to an interesting discussion on these matters, and
14+
questions by me on why people care so much about only public domain data (or licensed with
15+
[PDDL](http://www.opendatacommons.org/licenses/pddl/1.0/) or [CC0](http://wiki.creativecommons.org/CC0)).
16+
17+
Open licensing for data has not as much matured as for software, and international law seems to be more confusing about the
18+
issues. I guess that is because data aggregation has been around for way before the computer era. The PDDL and CC0 both try to
19+
overcome this fuzziness. But there is another issue we need to keep in mind. A lot of useful Data was aggregated and made Open
20+
*before* these licenses came about, and use, for example, the [GNU FDL](http://www.gnu.org/copyleft/fdl.html) license, such as
21+
the [NMRShiftDB](http://www.nmrshiftdb.org/).
22+
23+
## Rights
24+
25+
Right now, there are two Open Data camps, much like the BSD-vs-GPL wars in Open Source: one that believes in waiving any rights
26+
on the Data, indicating that facts are free; others that believe that data must be protected to not be eaten by big companies
27+
and lost to the community (e.g. [the WolframAlpha arragnements are suspect](http://friendfeed.com/onssolubility/cf6afd52/should-we-contribute-solubility-data-to)).
28+
29+
Of course, both camps are not that far apart, and both believe Open is important. Interestingly, there are some noteworthy
30+
differences with the Open Source wars. I see parallels between the two, which details an important difference: Open Source has
31+
algorithms (uncopyrightable) and implementations (copyrightable); Open Data has Data (uncopyrightable) and aggregation
32+
(copyrightable). Open Source talks mostly about the implementation, not the algorithm; it's Open Source, not Open Algorithms
33+
after all. In cheminformatics it is even often the case that the algorithms are not even specified and that there only truly
34+
is source.
35+
36+
However, Open Data in title does not make distinction. Data is fairly cheap and acquisition can be automated and computerized;
37+
Aggregation, on the other hand, requires human involvement: curation and thinking about data models, etc. This is where added
38+
value is. Consider an assigned NMR spectrum or the raw data returned from the spectrometer.
39+
40+
It is this added value that people want to protect, not the data itself. I think.
41+
42+
## Aggregation
43+
44+
One important argument that tend to show up when people argument for PDDL and CC0 is that it makes data aggregation easier.
45+
This is most certainly true: if you can do whatever you like with a blob of data, that also means aggregate with any other
46+
blob of data. However, copyleft licenses, like the GNU FDL, require the aggregation to have a compatible license too. It is
47+
the license incompatibilities that make this impossible. Or ... ?
48+
49+
Open Source has matured to such a point that it is fairly clear what the intended behaviour is, regarding derivatives. An
50+
aggregation of software (typically refered to as a distribution) is only a derivative under certain conditions. This makes
51+
it possible to run proprietary software on top of GNU/Linux, which uses the GNU GPL but does not require software to run on
52+
top of it to be GPL too. Unless... unless, not a clear well-defined interface has been used, indicating a strong dependency.
53+
Now, surely, these things have not been confirmed to match actual law in court, but the intentions are clear.
54+
55+
## Clean Data Interfaces?
56+
57+
Now, if we would translate this to Open Data, would there be the equivalent of a clean interface? Can we build a data
58+
distribution with data of various licenses? I think we can! I am not a lawyer and please consider this an invitation
59+
to discuss these matters...
60+
61+
Let's start simlpe... if I put a GNU FDL image in this blog, by linking to it with a open, free, clean HTML interface
62+
(`<img src=""/>`), would that make my blog GNU FDL too? I don't think so. Surely, I would need to list copyright owner,
63+
and actually would be required to put the GNU FDL in my blog too, but hope linking to the license text would suffice too.
64+
(Let's skip fair use at this moment, and assume the use goes beyond fair use). Question: am I not using a clean interface,
65+
and would this not make the image's license no infect my blog?
66+
67+
A more difficult example, consider [rdf.openmolecules.net](http://rdf.openmolecules.net/), which surely aggregated facts,
68+
including data from the NMRShiftDB and [DBPedia](http://dbpedia.org/). I am using a unique identifiers here, the NMRShiftDB
69+
compound ID, and the DBPedia URL, which surely is GNU FDL, and use this to make a `<owl:sameAs>` statement. Again, please do
70+
not consider fair use, which this certainly is. But, let's say I put in some more DBPedia and NMRShiftDB data in this
71+
aggregation. The GNU FDL data on rdf.openmolecules.net would be separate RDF blocks, with proper dc:license, dc:author
72+
annotation. But the block would be part of a larger aggregation. The clean interface here is
73+
[Resource Description Framework](http://en.wikipedia.org/wiki/Resource_description_framework).
74+
75+
This second case does not only affect my rdf.openmolecules.net website, but, for example, [bio2rdf.org](http://bio2rdf.org/)
76+
is also in the same situation and aggregated and distribute DBPedia's GNU FDL data (e.g.
77+
[hexinanose](http://bio2rdf.org/searchns/dbpedia/hexokinase). Does that make the
78+
whole of bio2rdf database GNU FDL. They too use RDF as clean interface.
79+
80+
## Call for Discussion
81+
82+
Despite what one of the two camps like to see, the mere fact of added value when making data aggregations will keep
83+
copyleft license stay around, and instead of trying to convince everyone of the virtues of PDDL- and CC0-like licenses,
84+
we should think about to what extend it really matters.
85+
86+
I can do my data analysis with data sources of various licenses. I can search and retrieve data from various sources
87+
with various licenses. What obstacles are really there that disallow us to do science? Do the data interfaces we have
88+
now not provide enough technical means to address the license incompatibilities? They have in Open Source, why would
89+
that not apply to Open Data too?

0 commit comments

Comments
 (0)