Including from Other Configurations

Christian Lück edited this page Sep 23, 2025 · 2 revisions

Extracting RDF triples from even a small XML project typically involves writing several XTriples configuration files, since there are several types of resources in the knowledge graph. Parts of these configurations will therefore be redundant, e.g., the <vocabularies> sections will probably share the same prefix definitions.

Good news: With XTriples, we can reduce redundancy by including the same <vocabularies> from one file in other files. We can use XML's generic inclusion mechanism for it: XInclude.

Example

Assume we have an XTriples configuration in resources/graph/common.xml, which contains an exhaustive set of vocabularies.

<?xml-model uri="https://xtriples.lod.academy/xtriples.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<!-- XTriples configuration for extracting RDF triples from central header common.xml -->
<!DOCTYPE xtriples [
   <!ATTLIST vocabularies xml:id ID #IMPLIED>
   <!ENTITY edbase "https://scdh.zivgitlabpages.uni-muenster.de/hees-alea/edition-ibn-nubatah/">
   <!ENTITY entbase "https://scdh.zivgitlabpages.uni-muenster.de/hees-alea/edition-ibn-nubatah/">
]>
<xtriples>
    <configuration>
        <vocabularies xml:id="vocabs">
            <!-- domain of the edition -->
            <vocabulary prefix="ed" uri="&edbase;edition/"/>
            <!-- editable domain -->
            <vocabulary prefix="edable" uri="&edbase;editable/"/>
            <vocabulary prefix="edwit" uri="&edbase;editable/"/>
            <!-- additional resources of the edition -->
            <vocabulary prefix="edad" uri="&edbase;adds/"/>
            <!-- textapi resources -->
            <vocabulary prefix="edta" uri="&edbase;textapi/"/>
            <!-- tbox of the edition, ontology -->
            <vocabulary prefix="alea" uri="&edbase;alea#"/>
            <!-- domain of local vocabularies, similar to the edition's tbox -->
            <vocabulary prefix="edtags" uri="&edbase;tags/"/>
            <!--
                named entities
                Path segment must match values of tei:rs/@type !
                Maybe these should point to websites of the DSE web app.
            -->
            <vocabulary prefix="edprs" uri="&entbase;person#"/>
            <vocabulary prefix="edplc" uri="&entbase;place#"/>
            <vocabulary prefix="edorg" uri="&entbase;org#"/>
            <vocabulary prefix="edevn" uri="&entbase;event#"/>
            <vocabulary prefix="edent" uri="&entbase;"/>

            <vocabulary prefix="align" uri="https://ontology.scdh.uni-muenster.de/alignment/"/>
            <vocabulary prefix="ta" uri="https://scdh.zivgitlabpages.uni-muenster.de/text-api#"/>

            <vocabulary prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
            <vocabulary prefix="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
            <vocabulary prefix="owl" uri="http://www.w3.org/2002/07/owl#"/>
            <vocabulary prefix="schema" uri="https://schema.org/"/>
            <vocabulary prefix="foaf" uri="http://xmlns.com/foaf/0.1/"/>
            <vocabulary prefix="rel" uri="http://purl.org/vocab/relationship/"/>
            <vocabulary prefix="crm" uri="http://www.cidoc-crm.org/cidoc-crm/"/>
            <vocabulary prefix="lrm" uri="http://iflastandards.info/ns/lrm/lrmoo/"/>
            <vocabulary prefix="frbr" uri="http://iflastandards.info/ns/fr/frbr/frbroo/"/>

            <vocabulary prefix="utils"
                uri="https://edition-ibn-nubatah.arabistik.uni-muenster.de/textapi/utils/"/>
            <vocabulary prefix="diwan" uri="http://scdh.wwu.de/transform/diwan#"/>
            <vocabulary prefix="map" uri="http://www.w3.org/2005/xpath-functions/map"/>
            <vocabulary prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
            <!-- TEI is the default prefix for simple XPath expressions! -->
            <vocabulary uri="http://www.tei-c.org/ns/1.0"/>
        </vocabularies>
        <triples>
            <!-- statements -->
        </triples>
    </configuration>
    <collection>
        <!-- ... -->
    </collection>
</xtriples>

We can include these vocabularies in any other file with XInclude:

<?xml-model uri="https://xtriples.lod.academy/xtriples.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<!-- XTriples configuration for extracting RDF from TEI place registry -->
<xtriples>
    <configuration>
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="common.xml" xpointer="vocabs"/>
        <triples>
            <statement>
                <subject prefix="edplc">/@xml:id</subject>
                <predicate prefix="rdf">type</predicate>
                <object prefix="crm" type="uri">E53_Place</object>
            </statement>
            <!-- ... -->
        </triples>
    </configuration>
    <collection uri="../..?select=[Pp]lace*.xml">
        <resource uri="{/TEI/text//listPlace/place[@xml:id and not(ancestor::place)]}"/>
    </collection>
</xtriples>

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="common.xml"/> tells the parser to include common.xml. The value of the href attribute may be a relative path, as in this example, or an absolute URI.

Note the attribute xpointer="vocabs", which tells the parser (more precisely, the inclusion algorithm) to include only a part of the common.xml file. If the value of the xpointer attribute is a single name without a scheme, it is interpreted as a fragment identifier, and the inclusion algorithm selects the element with the given ID.
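To make the effect of the inclusion more tangible, here is an abbreviated sketch of what the place registry configuration looks like to the XSLT processor once the XInclude has been resolved. It is written by hand from the two files above, not taken from actual parser output, and most vocabularies are elided. The entity reference &edbase; appears expanded, because the parser resolves entities while reading common.xml.

```xml
<xtriples>
    <configuration>
        <!-- The xi:include element has been replaced by the element with the ID "vocabs"
             from common.xml. Parsers usually also add an xml:base attribute to the
             included element, which is omitted in this sketch. -->
        <vocabularies xml:id="vocabs">
            <vocabulary prefix="ed" uri="https://scdh.zivgitlabpages.uni-muenster.de/hees-alea/edition-ibn-nubatah/edition/"/>
            <!-- ... remaining vocabularies from common.xml ... -->
        </vocabularies>
        <triples>
            <!-- statements of the including file, unchanged -->
        </triples>
    </configuration>
    <collection uri="../..?select=[Pp]lace*.xml">
        <resource uri="{/TEI/text//listPlace/place[@xml:id and not(ancestor::place)]}"/>
    </collection>
</xtriples>
```

Only the <vocabularies> element is pulled in. The <triples> and <collection> sections of common.xml are left out.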

Processing XIncludes

Typically, XIncludes are processed by the XML parser. We can tell our XSLT processor to use an XInclude-aware parser. In Saxon's command line interface, the switch -xi does this:

java -jar saxon-10.9.jar -xsl:$XTRIPLES/xsl/extract-collection.xsl -s:MY_XTRIPLES_CONFIG.xml -xi

or with the tooling:

target/bin/xslt.sh -xsl:$XTRIPLES/xsl/extract-collection.xsl -s:MY_XTRIPLES_CONFIG.xml -xi

XPointer Support

However, XInclude support is far from perfect in the existing XML tool chain, especially when it comes to including parts of files. Xerces, the most mature XML parser, which Saxon uses for XInclude-aware parsing, only partly supports the xpointer attribute of an <xi:include> element. It supports shorthand pointers (formerly known as bare names), i.e., pointing to fragments by ID. However, only DTD-declared IDs are supported, while xml:id is not interpreted as an ID by default. That's why we put this DTD fragment at the top of the included file. It tells the parser that the attribute xml:id on the element vocabularies is of type ID.

<!DOCTYPE xtriples [
   <!ATTLIST vocabularies xml:id ID #IMPLIED>
]>

Without this declaration, Xerces won't find the fragment to be included. Note that this DTD fragment is only required in the included document, not in the including document.
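If adding a DOCTYPE to the included file is not an option, a child-sequence pointer in the XPointer element() scheme might serve as an alternative, since it addresses the fragment by position instead of by ID. The following is an untested sketch. It assumes that the parser implements the element() scheme (recent Xerces-J versions are documented to do so, but this has not been verified for this setup) and that <vocabularies> is the first child element of <configuration>, which in turn is the first child element of the root element of common.xml.

```xml
<xtriples>
    <configuration>
        <!-- element(/1/1/1): the first child element of the first child element of the
             root element of common.xml, i.e. xtriples/configuration/vocabularies -->
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="common.xml"
            xpointer="element(/1/1/1)"/>
        <triples>
            <!-- statements -->
        </triples>
    </configuration>
    <collection>
        <!-- ... -->
    </collection>
</xtriples>
```

The trade-off is robustness: a positional pointer silently selects a different element as soon as the structure of common.xml changes, whereas the ID-based pointer keeps working.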

BTW, Oxygen runs a patched version of Xerces, which also processes xml:id-based IDs. libxml2, written in C, also recognizes xml:id-based IDs.
