Including from Other Configurations
Typically, extracting RDF triples from even small XML projects
involves writing several XTriples configuration files, since there are
several types of resources in the knowledge graph. Thus, these
configuration files will contain redundant parts, e.g., their <vocabularies>
sections will probably share the same prefix definitions.
Good news: With XTriples, we can reduce this redundancy by keeping the
<vocabularies> in one file and including them in the other files. We can
use XML's generic inclusion mechanism for this:
XInclude.
Assume we have an XTriples configuration in
resources/graph/common.xml, which contains an exhaustive set of
vocabularies.
<?xml-model uri="https://xtriples.lod.academy/xtriples.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<!-- XTriples configuration for extracting RDF triples from central header common.xml -->
<!DOCTYPE xtriples [
<!ATTLIST vocabularies xml:id ID #IMPLIED>
<!ENTITY edbase "https://scdh.zivgitlabpages.uni-muenster.de/hees-alea/edition-ibn-nubatah/">
<!ENTITY entbase "https://scdh.zivgitlabpages.uni-muenster.de/hees-alea/edition-ibn-nubatah/">
]>
<xtriples>
<configuration>
<vocabularies xml:id="vocabs">
<!-- domain of the edition -->
<vocabulary prefix="ed" uri="&edbase;edition/"/>
<!-- editable domain -->
<vocabulary prefix="edable" uri="&edbase;editable/"/>
<vocabulary prefix="edwit" uri="&edbase;editable/"/>
<!-- additional resources of the edition -->
<vocabulary prefix="edad" uri="&edbase;adds/"/>
<!-- textapi resources -->
<vocabulary prefix="edta" uri="&edbase;textapi/"/>
<!-- tbox of the edition, ontology -->
<vocabulary prefix="alea" uri="&edbase;alea#"/>
<!-- domain of local vocabularies, similar to the edition's tbox -->
<vocabulary prefix="edtags" uri="&edbase;tags/"/>
<!--
named entities
Path segment must match values of tei:rs/@type !
Maybe these should point to websites of the DSE web app.
-->
<vocabulary prefix="edprs" uri="&entbase;person#"/>
<vocabulary prefix="edplc" uri="&entbase;place#"/>
<vocabulary prefix="edorg" uri="&entbase;org#"/>
<vocabulary prefix="edevn" uri="&entbase;event#"/>
<vocabulary prefix="edent" uri="&entbase;"/>
<vocabulary prefix="align" uri="https://ontology.scdh.uni-muenster.de/alignment/"/>
<vocabulary prefix="ta" uri="https://scdh.zivgitlabpages.uni-muenster.de/text-api#"/>
<vocabulary prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<vocabulary prefix="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<vocabulary prefix="owl" uri="http://www.w3.org/2002/07/owl#"/>
<vocabulary prefix="schema" uri="https://schema.org/"/>
<vocabulary prefix="foaf" uri="http://xmlns.com/foaf/0.1/"/>
<vocabulary prefix="rel" uri="http://purl.org/vocab/relationship/"/>
<vocabulary prefix="crm" uri="http://www.cidoc-crm.org/cidoc-crm/"/>
<vocabulary prefix="lrm" uri="http://iflastandards.info/ns/lrm/lrmoo/"/>
<vocabulary prefix="frbr" uri="http://iflastandards.info/ns/fr/frbr/frbroo/"/>
<vocabulary prefix="utils"
uri="https://edition-ibn-nubatah.arabistik.uni-muenster.de/textapi/utils/"/>
<vocabulary prefix="diwan" uri="http://scdh.wwu.de/transform/diwan#"/>
<vocabulary prefix="map" uri="http://www.w3.org/2005/xpath-functions/map"/>
<vocabulary prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
<!-- TEI is the default prefix for simple XPath expressions! -->
<vocabulary uri="http://www.tei-c.org/ns/1.0"/>
</vocabularies>
<triples>
<!-- statements -->
</triples>
</configuration>
<collection>
<!-- ... -->
</collection>
</xtriples>
We can include these vocabularies in any other file with XInclude:
<?xml-model uri="https://xtriples.lod.academy/xtriples.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<!-- XTriples configuration for extracting RDF from TEI place registry -->
<xtriples>
<configuration>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="common.xml" xpointer="vocabs"/>
<triples>
<statement>
<subject prefix="edplc">/@xml:id</subject>
<predicate prefix="rdf">type</predicate>
<object prefix="crm" type="uri">E53_Place</object>
</statement>
<!-- ... -->
</triples>
</configuration>
<collection uri="../..?select=[Pp]lace*.xml">
<resource uri="{/TEI/text//listPlace/place[@xml:id and not(ancestor::place)]}"/>
</collection>
</xtriples>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="common.xml"/> tells the XInclude processor to include common.xml. The value of the
href attribute may be a relative path, as in this example, or an
absolute URI.
Note the attribute xpointer="vocabs", which tells the parser (i.e., the
inclusion algorithm) to include only a part of the common.xml file. If
the value of the xpointer attribute is a single bare name (an NCName such
as vocabs), it is interpreted as a shorthand pointer, and the inclusion
algorithm selects the element whose ID equals that name.
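To make the shorthand-pointer lookup concrete, here is a minimal sketch in Python using only the standard library's ElementTree. The helpers include_all and resolve_shorthand are hypothetical; they are not part of XTriples, Saxon, or Xerces. The sketch assumes the included documents are already parsed and keyed by their href value, and it treats xml:id as an ID directly, which stock Xerces does not do without a DTD declaration. A real XInclude processor additionally resolves href against the base URI, adds xml:base, and processes nested includes.

```python
import copy
import xml.etree.ElementTree as ET

# Clark-notation names for the XInclude element and the xml:id attribute.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_shorthand(root, pointer):
    """Return the first element below root whose xml:id equals pointer."""
    for elem in root.iter():
        if elem.get(XML_ID) == pointer:
            return elem
    raise LookupError(f"no element with xml:id={pointer!r}")


def include_all(parent, documents):
    """Replace each xi:include below parent by the node it points to.

    documents maps href values to parsed roots (a hypothetical stand-in for
    resolving href against the base URI). Included nodes are not scanned
    for nested includes, and no xml:base fixup is done.
    """
    for index, child in enumerate(list(parent)):
        if child.tag != XI_INCLUDE:
            include_all(child, documents)
            continue
        target = documents[child.get("href")]
        pointer = child.get("xpointer")
        # Without xpointer, the whole document element is included.
        node = resolve_shorthand(target, pointer) if pointer else target
        node = copy.deepcopy(node)
        node.tail = child.tail  # keep the whitespace that followed xi:include
        parent[index] = node


COMMON = """<xtriples><configuration>
  <vocabularies xml:id="vocabs">
    <vocabulary prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
    <vocabulary prefix="crm" uri="http://www.cidoc-crm.org/cidoc-crm/"/>
  </vocabularies>
</configuration></xtriples>"""

PLACES = """<xtriples><configuration>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="common.xml" xpointer="vocabs"/>
  <triples/>
</configuration></xtriples>"""

places = ET.fromstring(PLACES)
include_all(places, {"common.xml": ET.fromstring(COMMON)})
print([v.get("prefix") for v in places.iter("vocabulary")])  # ['rdf', 'crm']
```

Because each xi:include element is replaced one-for-one, the positions of its siblings stay stable, which is why the loop can write the included node back by index.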
Typically, XIncludes are processed by the XML parser. We can therefore
tell our XSLT processor to use an XInclude-aware parser. In Saxon's
command-line interface, we use the switch -xi for this.
java -jar saxon-10.9.jar -xsl:$XTRIPLES/xsl/extract-collection.xsl -s:MY_XTRIPLES_CONFIG.xml -xi
or, with the tooling:
target/bin/xslt.sh -xsl:$XTRIPLES/xsl/extract-collection.xsl -s:MY_XTRIPLES_CONFIG.xml -xi
However, XInclude support in the existing XML tool chain is far from
perfect, especially when it comes to including parts of files. Xerces,
the most mature XML parser and the one Saxon uses for XInclude-aware
parsing, only partly supports the xpointer attribute of an <include>
element. It supports shorthand pointers (formerly known as bare names),
i.e., pointing to fragments by ID. However, only DTD-based IDs are
supported, while xml:id is not interpreted as an ID by default. That's
why we put this DTD fragment on top of the included file. It tells the
parser that the attribute xml:id on the element vocabularies is an ID.
<!DOCTYPE xtriples [
<!ATTLIST vocabularies xml:id ID #IMPLIED>
]>
Without this declaration, Xerces won't find the fragment to be included. Note that this DTD fragment is only required in the included document, not in the including document.
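What the DTD fragment declares can be checked with Python's standard expat bindings. The minimal sketch below, built around a hypothetical helper declared_ids, reports which element/attribute pairs an internal DTD subset declares as type ID. It only inspects the declarations; it does not reproduce Xerces' fragment lookup.

```python
import xml.parsers.expat


def declared_ids(document):
    """Collect (element, attribute) pairs declared with type ID in the DTD."""
    ids = set()

    def on_attlist(elname, attname, type_, default, required):
        # expat reports each attribute definition of an <!ATTLIST ...> separately.
        if type_ == "ID":
            ids.add((elname, attname))

    # No namespace processing, so the attribute name is reported as "xml:id".
    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return ids


WITH_DTD = """<!DOCTYPE xtriples [
<!ATTLIST vocabularies xml:id ID #IMPLIED>
]>
<xtriples><configuration><vocabularies xml:id="vocabs"/></configuration></xtriples>"""

WITHOUT_DTD = """<xtriples><configuration><vocabularies xml:id="vocabs"/></configuration></xtriples>"""

print(declared_ids(WITH_DTD))     # {('vocabularies', 'xml:id')}
print(declared_ids(WITHOUT_DTD))  # set()
```

On a trimmed copy of the head of common.xml, the sketch reports the pair ('vocabularies', 'xml:id'), which is the information a DTD-based ID lookup relies on; without the fragment, the set is empty.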
BTW, Oxygen runs a patched version of Xerces that also processes
xml:id-based IDs. libxml2, the XML library written in C, also recognizes
xml:id-based IDs.