Commit 0e09b6e

Ported from https://github.com/egonw/compact-ids-in-reports (originally planned for FAIR CookBook)
1 parent cdede43 commit 0e09b6e

1 file changed: 140 additions, 0 deletions

@@ -0,0 +1,140 @@
---
layout: post
title: "Using compact identifiers in project reports"
date: 2026-03-29
doi: 10.59350/re9j2-hk972
tags: identifier semweb cito cito:usesMethodIn,includesQuotationFrom:10.1038/sdata.2018.29
  cito:obtainsBackgroundFrom:10.1007/s12021-015-9284-3 cito:usesMethodIn:10.1093/bioinformatics/btaa864
  cito:usesMethodIn:10.1038/s41597-022-01807-3 cito:obtainsBackgroundFrom:10.1038/sdata.2016.18
  cito:includesQuotationFrom:10.1186/s13321-022-00614-7 cito:includesQuotationFrom:10.1186/s13321-020-00448-1
grants:
  - grant:
    title: "FAIR4ChemNL: Accelerating the adoption of universal data standards in chemistry"
    acronym: "FAIR4ChemNL"
    id: doi:10.61686/XVYQV45374
    funder:
      name: "Dutch Research Council"
      ror: 04jsz6e67
#comments:
#  host: social.edu.nl
#  username: egonw
#  id: ...
---

This document describes how you can improve the FAIR-ness of your project report by using
compact identifiers. The approach can be applied to any other document too, and has already been used
in, for example, journal articles and online documentation.

Compact identifiers strike a balance between being compact in writing and being persistent, unique,
and global identifiers. A compact identifier "is a string constructed by concatenating a namespace prefix, a separating colon,
and a locally unique identifier (LUI)" (doi:[10.1038/sdata.2018.29](https://doi.org/10.1038/sdata.2018.29)).
For example, the PDB protein structure [2gc4](https://bioregistry.io/pdb:2gc4) can be written as
*pdb:2gc4*. There is a clear similarity with the SciCrunch [Research Resource Identifiers](https://rrid.site/)
(RRIDs) as used by several journals, like
[eLife](https://elifesciences.org/inside-elife/ff683ecc/rrids-how-did-we-get-here-and-where-are-we-going)
(doi:[10.1007/s12021-015-9284-3](https://doi.org/10.1007/s12021-015-9284-3)).
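
The structure in the quoted definition (prefix, colon, LUI) can be illustrated with a minimal Python sketch. The names `CompactIdentifier` and `parse_compact_identifier` are made up for this illustration and are not part of any existing library. The sketch assumes that the first colon separates the prefix from the LUI, so an LUI may itself contain colons.

```python
from typing import NamedTuple


class CompactIdentifier(NamedTuple):
    """A compact identifier split into its two parts (hypothetical helper type)."""
    prefix: str  # namespace prefix, for example "pdb"
    lui: str     # locally unique identifier, for example "2gc4"


def parse_compact_identifier(text: str) -> CompactIdentifier:
    """Split 'prefix:lui' at the first colon (an assumption of this sketch)."""
    prefix, colon, lui = text.strip().partition(":")
    if not colon or not prefix or not lui:
        raise ValueError(f"not a compact identifier: {text!r}")
    return CompactIdentifier(prefix, lui)


parsed = parse_compact_identifier("pdb:2gc4")
print(parsed.prefix, parsed.lui)
```

A string without a colon, such as a bare `2gc4`, is rejected with a `ValueError`, because without the prefix the identifier is no longer globally unique.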

When the prefixes are defined by community standards, a compact identifier can be resolved.
There currently are multiple providers of prefix files (doi:[10.1038/sdata.2018.29](https://doi.org/10.1038/sdata.2018.29)),
including Identifiers.org (doi:[10.1093/bioinformatics/btaa864](https://doi.org/10.1093/bioinformatics/btaa864))
and Bioregistry (doi:[10.1038/s41597-022-01807-3](https://doi.org/10.1038/s41597-022-01807-3)).
The Bioregistry has an overview of more than twenty registries of prefixes and their metadata
(doi:[10.1038/s41597-022-01807-3](https://doi.org/10.1038/s41597-022-01807-3)). The metadata commonly
includes information on the URL pattern for each identifier. Often there is more than one pattern, as
there may be several databases with information for the same identifier.
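
To make the role of URL patterns concrete, here is a minimal sketch of how a pattern could be expanded into a link. Everything in it is an assumption for illustration: the prefix `exampledb`, the `example.org` URLs, and the `$1` placeholder convention are invented here and do not come from any actual registry. It only shows the idea that one prefix can map to several patterns.

```python
from urllib.parse import quote

# Hypothetical registry excerpt: the prefix, the URLs, and the "$1"
# placeholder are illustrative assumptions, not real registry content.
URL_PATTERNS = {
    "exampledb": [
        "https://db.example.org/entry/$1",
        "https://mirror.example.org/view?id=$1",
    ],
}


def expand(compact_identifier: str, patterns=URL_PATTERNS) -> list[str]:
    """Return one URL per pattern registered for the identifier's prefix."""
    prefix, _, lui = compact_identifier.partition(":")
    if prefix not in patterns:
        raise KeyError(f"unknown prefix: {prefix!r}")
    # Percent-encode the LUI so it is safe to place inside a URL.
    return [p.replace("$1", quote(lui, safe="")) for p in patterns[prefix]]


for url in expand("exampledb:ABC123"):
    print(url)
```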

It is the URL pattern in the registry that allows services to *resolve* the compact identifier
into a link to a database. The above registries correspond to three existing *resolvers* that take a compact
identifier as part of a resolver URL and redirect to the database with the record matching
that identifier:

* Name-to-Thing (N2T): [https://n2t.net/](https://n2t.net/)
* Identifiers.org: [https://identifiers.org/](https://identifiers.org/)
* The Bioregistry: [https://bioregistry.io/](https://bioregistry.io/)

Each of these URLs can be extended with a compact identifier. For example, the PDB entry
mentioned earlier or a taxon record from the Catalogue of Life:

* [https://bioregistry.io/pdb:2gc4](https://bioregistry.io/pdb:2gc4)
* [https://identifiers.org/col:6MB3T](https://identifiers.org/col:6MB3T) (`col` is the prefix for the Catalogue of Life)
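
Extending a resolver URL with a compact identifier is plain string concatenation, as this small sketch shows. The dictionary keys and the function name `resolver_url` are chosen for this illustration only. Whether a given resolver recognizes a given prefix depends on its registry, which the sketch does not check.

```python
# Base URLs of the three resolvers listed above; the dictionary keys are
# labels invented for this sketch.
RESOLVERS = {
    "n2t": "https://n2t.net/",
    "identifiers.org": "https://identifiers.org/",
    "bioregistry": "https://bioregistry.io/",
}


def resolver_url(compact_identifier: str, resolver: str = "bioregistry") -> str:
    """Append the compact identifier to the chosen resolver's base URL."""
    return RESOLVERS[resolver] + compact_identifier


print(resolver_url("pdb:2gc4"))
print(resolver_url("col:6MB3T", resolver="identifiers.org"))
```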

## Why use in reports?

Using persistent identifiers is generally accepted as a good practice that benefits science
and has been part of the ideas of FAIR data (doi:[10.1038/sdata.2016.18](https://doi.org/10.1038/sdata.2016.18))
and of Open Science. Compact
identifiers make it easy to be precise in reports about what things the reports talk about: they
are relatively short but very precise at the same time. They are also much easier to reuse than
labels of things and concepts, which intrinsically carry a certain
level of uncertainty; a database entry commonly has a very specific meaning.

## Example uses

Compact identifiers can be used in two ways. The simplest is to just put the
compact identifier as plain text in the document, possibly in parentheses
(with the compact identifier highlighted here in bold):

<ul>
<i>This report is only about the experimental data of the human (<b>NCBITaxon:9606</b>) cell lines.</i>
</ul>

Or:

<ul>
<i>We found that BRCA1 (<b>ensembl:ENSG00000012048</b>) played an important role.</i>
</ul>

Alternatively, you can add a hyperlink with one of the resolvers, for example, Identifiers.org:

<ul>
<i>We found that BRCA1 (<b><a href="https://identifiers.org/ensembl:ENSG00000012048">ensembl:ENSG00000012048</a></b>) played an important role.</i>
</ul>
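
When a report is generated by a script, the hyperlinked form can be produced automatically. The following is a minimal sketch; the function name `hyperlink` is invented for this illustration, and it assumes the resolver base URL ends with a slash.

```python
from html import escape


def hyperlink(compact_identifier: str, resolver: str = "https://identifiers.org/") -> str:
    """Wrap a compact identifier in an HTML link that points to a resolver."""
    url = resolver + compact_identifier
    # Escape both the attribute value and the visible text for safe HTML output.
    return f'<a href="{escape(url, quote=True)}">{escape(compact_identifier)}</a>'


print(hyperlink("ensembl:ENSG00000012048"))
```

The visible link text stays the compact identifier itself, so the report remains precise even when it is printed or the link is lost.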

### Compact identifiers for material identifiers

The European Registry of Materials proposes using compact identifiers for its
ERM identifiers (doi:[10.1186/s13321-022-00614-7](https://doi.org/10.1186/s13321-022-00614-7)):

<ul>
<i>
For example, the NanoSolveIT project registered a material with the ERM00000001 identifier.
The full Uniform Resource Identifier (URI) for this compound is
https://nanocommons.github.io/identifiers/registry#ERM00000001 which is too long to be used
in documentation. The corresponding compact identifier <b>erm:ERM00000001</b> is easy to use in written
material, analogous to the use of Protein Data Bank (PDB) identifiers for proteins in journals.
</i>
</ul>

### Compact identifiers for citation intent annotations

Compact identifiers have also been used to include citation intentions in journal
articles (doi:[10.1186/s13321-020-00448-1](https://doi.org/10.1186/s13321-020-00448-1),
compact identifier here highlighted in bold):

<ul>
<i>
We take advantage here of the ability to add notes to full form [..] references in bibliographies.
These are referred to as bibnotes. The content of the note will be strictly formatted: it will use
the syntax [<b>cito:usesMethodIn</b>] and formatted in bold. That is, the bibnote starts with the
[ character, followed by one of the CiTO types, and ends with the ] character. If you wish to
provide more than one annotation, you can repeat this syntax, separated by one or more spaces,
for example: [<b>cito:usesMethodIn</b>] [<b>cito:citeAsAuthority</b>].
</i>
</ul>

Note that in this use, the square brackets and bold typeface make the annotations easier to
recognize. Also, note that this document uses this approach to indicate the intention of
why the cited articles are cited.
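
Because the bibnote syntax is strictly formatted, the annotations are easy to extract by machine. The following minimal sketch does so with a regular expression. It assumes the bibnote is available as plain text with the bold markup already removed, and that CiTO type names consist of letters only; the function name `citation_intents` is invented for this illustration.

```python
import re

# One annotation: "[", the "cito:" prefix, a CiTO type made of letters, "]".
BIBNOTE_PATTERN = re.compile(r"\[(cito:[A-Za-z]+)\]")


def citation_intents(bibnote: str) -> list[str]:
    """Return all CiTO compact identifiers found in a plain-text bibnote."""
    return BIBNOTE_PATTERN.findall(bibnote)


print(citation_intents("[cito:usesMethodIn] [cito:citeAsAuthority]"))
```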

## Conclusion

This document described what a compact identifier is, how it helps to link to online
databases, and how compact identifiers can be used in written reports as plain text, optionally
hyperlinked with one of the compact identifier resolvers.

### Acknowledgments

I thank [github:tabbassidaloii](https://n2t.net/github:tabbassidaloii),
[github:cthoyt](https://n2t.net/github:cthoyt), and
[github:larsgw](https://n2t.net/github:larsgw) for their comments on
[this GitHub repo](https://github.com/egonw/compact-ids-in-reports).
