
Unified Representation

johnataylor edited this page Feb 20, 2016 · 1 revision

Even when we have data in XML, there are still many different ways to represent that data. Perhaps we should stop obsessing over the format and focus more on the concepts. Here is the same book we used in the XML 2 JSON article, but expressed in a very different piece of XML.

<?xml version="1.0" encoding="utf-8" ?>
<book xmlns="http://schemas.example.org/alt/library"
      isbn="1-55860-190-2"
      title="Transaction Processing: Concepts and Techniques">
  <author name="Jim Gray" />
  <author name="Andreas Reuter" />
  <published by="Morgan Kaufmann" in="1993" />
</book>

The developer who came up with this XML clearly favored the more compact representation that attributes allow. The net effect, however, is only that this XML snippet is different: equally valid perhaps, but different all the same. And of course, we would really like to understand that these two pieces of XML are actually talking about the same thing. So what we need to do is normalize. Rather than picking a winner between the two formats, we are going to focus on the concepts, the real semantics, in the data - and that means translating this XML into RDF. Here is the transformation:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:src="http://schemas.example.org/alt/library"
  xmlns:library="http://schemas.example.org/library#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  version="1.0">
  <xsl:param name="baseAddress" />
  <xsl:template match="/src:book">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <library:Book>
        <xsl:attribute name="rdf:about">
          <xsl:value-of select="concat($baseAddress, @isbn, '.json')" />
        </xsl:attribute>
        <xsl:if test="@title">
          <library:title>
            <xsl:value-of select="@title" />
          </library:title>
        </xsl:if>
        <xsl:for-each select="*">
          <xsl:choose>
            <xsl:when test="self::src:author">
              <library:author>
                <library:Author>
                  <xsl:attribute name="rdf:about">
                    <xsl:value-of select="concat($baseAddress, 'author/', @name, '.json')" />
                  </xsl:attribute>
                  <library:name>
                    <xsl:value-of select="@name"/>
                  </library:name>
                </library:Author>
              </library:author>
            </xsl:when>
            <xsl:when test="self::src:published">
              <xsl:if test="@by">
                <library:publisher>
                  <xsl:value-of select="@by" />
                </library:publisher>
              </xsl:if>
              <xsl:if test="@in">
                <library:published>
                  <xsl:value-of select="@in" />
                </library:published>
              </xsl:if>
            </xsl:when>
          </xsl:choose>
        </xsl:for-each>
      </library:Book>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>

Again, as with the previous XSLT, the main thing going on here is the assignment of URIs to the key concepts in our domain; in our example that means books and authors. Although this XSLT is different from the one in the previous XML 2 JSON example, the way it manufactures the URIs is exactly the same. In fact, this XSLT produces exactly the same graph from the XML above as the previous XSLT did for its XML. And because we end up with the same graph, it will produce the same JSON.
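To make the URI manufacturing concrete, here is a minimal Python sketch that mirrors the two concat() expressions in the stylesheet. The function names and the base address are hypothetical, chosen only for illustration; the real value of baseAddress is whatever the service passes to the XSLT.

```python
def book_uri(base_address: str, isbn: str) -> str:
    """Mirror concat($baseAddress, @isbn, '.json') from the stylesheet."""
    return f"{base_address}{isbn}.json"


def author_uri(base_address: str, name: str) -> str:
    """Mirror concat($baseAddress, 'author/', @name, '.json')."""
    # Like the XSLT, this does no escaping, so spaces in a name are kept as-is.
    return f"{base_address}author/{name}.json"


# Hypothetical base address, used only for illustration.
BASE = "http://example.org/library/"

print(book_uri(BASE, "1-55860-190-2"))
print(author_uri(BASE, "Jim Gray"))
```

Because the URI depends only on the ISBN or the author name, and not on where that value sat in the source XML, any input shape that carries the same values ends up labeling the same nodes.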

What is going on here is a process of normalization. Instead of simply translating one shape of XML into another, we focused on identifying and labeling the key concepts in the documents. And having labeled the key concepts, the rest of the model is just a matter of associating properties with them.
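The idea can be sketched outside XSLT as well. The Python below is an illustration under stated assumptions, not the service's implementation: it reads the attribute-style book (with the publication element written the way the stylesheet's src:published branch expects, using by and in attributes) and a hypothetical element-style layout invented for this sketch, and reduces both to a set of subject/predicate/object triples. The base address is also hypothetical.

```python
import xml.etree.ElementTree as ET

LIB = "http://schemas.example.org/library#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/library/"  # hypothetical base address

# Attribute-style shape (publication element as the stylesheet expects it).
ALT_XML = """
<book xmlns="http://schemas.example.org/alt/library"
      isbn="1-55860-190-2"
      title="Transaction Processing: Concepts and Techniques">
  <author name="Jim Gray" />
  <author name="Andreas Reuter" />
  <published by="Morgan Kaufmann" in="1993" />
</book>
"""

# Hypothetical element-style shape, invented for this sketch.
ELEMENT_XML = """
<book>
  <isbn>1-55860-190-2</isbn>
  <title>Transaction Processing: Concepts and Techniques</title>
  <author>Jim Gray</author>
  <author>Andreas Reuter</author>
  <publisher>Morgan Kaufmann</publisher>
  <published>1993</published>
</book>
"""


def to_triples(isbn, title, authors, publisher, published):
    """Label the key concepts with URIs, then attach properties to them."""
    book = f"{BASE}{isbn}.json"
    triples = {
        (book, RDF_TYPE, LIB + "Book"),
        (book, LIB + "title", title),
        (book, LIB + "publisher", publisher),
        (book, LIB + "published", published),
    }
    for name in authors:
        author = f"{BASE}author/{name}.json"
        triples |= {
            (book, LIB + "author", author),
            (author, RDF_TYPE, LIB + "Author"),
            (author, LIB + "name", name),
        }
    return triples


def from_alt(xml_text):
    """Read the attribute-style shape."""
    ns = {"s": "http://schemas.example.org/alt/library"}
    root = ET.fromstring(xml_text)
    pub = root.find("s:published", ns)
    authors = [a.get("name") for a in root.findall("s:author", ns)]
    return to_triples(root.get("isbn"), root.get("title"), authors,
                      pub.get("by"), pub.get("in"))


def from_elements(xml_text):
    """Read the hypothetical element-style shape."""
    root = ET.fromstring(xml_text)
    authors = [a.text for a in root.findall("author")]
    return to_triples(root.findtext("isbn"), root.findtext("title"), authors,
                      root.findtext("publisher"), root.findtext("published"))


# Two different XML shapes, one graph.
same_graph = from_alt(ALT_XML) == from_elements(ELEMENT_XML)
print(same_graph)
```

Each reader only knows how to find values in its own XML shape; everything about URIs and properties lives in the shared to_triples step, which is why the two graphs compare equal.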

And, of course, you can give this one a try too:

curl -X POST -T Book7.xml -o Book7.json http://transformwebapplication.azurewebsites.net/xml2json/808c5361b5efcca185d/BookAlt.xslt/BookContext.json/Book

The number 808c5361b5efcca185d identifies the Metadata Repository gist where the service finds the XSLT and the JSON-LD context, and finally "Book" is the JSON-LD framing type.
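As a small sketch of how the request URL is put together, the Python below assembles the same address from its four parts. The helper name transform_url is hypothetical; only the path layout is taken from the curl command above.

```python
SERVICE = "http://transformwebapplication.azurewebsites.net/xml2json"


def transform_url(gist_id: str, xslt_name: str, context_name: str, frame_type: str) -> str:
    """Join the service root with gist id, XSLT file, JSON-LD context file and framing type."""
    return "/".join([SERVICE, gist_id, xslt_name, context_name, frame_type])


url = transform_url("808c5361b5efcca185d", "BookAlt.xslt", "BookContext.json", "Book")
print(url)
```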
