Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc and other brackets #285

@Ozc-Y

Hello. I am trying to familiarize myself with DITA XML 1.3 and the DITA Open Toolkit (DITA-OT) without using proprietary XML editors.

Because DITA states that its RELAX NG XML schemas are normative, I am trying to convert DITA-OT's document-type shells from .rng to .rnc files (XML syntax to compact syntax) using trang.

I can run the conversion from the CLI; however, the output contains extra text at the end of every line within the dita:moduleDesc section:

dita:moduleDesc [
  "\x{a}" ~
  "    "
  dita:moduleTitle [ "DITA Concept Shell" ]
  "\x{a}" ~
  "    "
  dita:headerComment [
    xml:space = "preserve"
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "                   HEADER                                    \x{a}" ~
    "=============================================================\x{a}" ~
    "Darwin Information Typing Architecture (DITA) Version 1.3 Plus Errata 02\x{a}" ~
    "OASIS Standard\x{a}" ~
    "16 January 2018 \x{a}" ~
    "Copyright (c) OASIS Open 2018. All rights reserved. \x{a}" ~
    "Source: http://docs.oasis-open.org/dita/dita/v1.3/errata02/csprd01/complete/part0-overview/dita-v1.3-errata02-csprd01-part0-overview-complete.html\x{a}" ~
    "\x{a}" ~
    "============================================================\x{a}" ~
    " MODULE:    DITA Concept Shell                                 \x{a}" ~
    " VERSION:   1.3                                              \x{a}" ~
    " DATE:      March 2014                                    \x{a}" ~
    "                                                             \x{a}" ~
    "=============================================================\x{a}" ~
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "                   PUBLIC DOCUMENT TYPE DEFINITION           \x{a}" ~
    "                   TYPICAL INVOCATION                        \x{a}" ~
    "                                                             \x{a}" ~
    " Refer to this file by the following public identifier or an \x{a}" ~
    "      appropriate system identifier \x{a}" ~
    "      \x{a}" ~
    'PUBLIC "-//OASIS//DTD DITA Concept//EN"\x{a}' ~
    "\x{a}" ~
    "The public ID above refers to the latest version of this DTD.\x{a}" ~
    "     To refer to this specific version, you may use this value:\x{a}" ~
    "\x{a}" ~
    'PUBLIC "-//OASIS//DTD DITA 1.3 Concept//EN"                       \x{a}' ~
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "SYSTEM:     Darwin Information Typing Architecture (DITA)    \x{a}" ~
    "                                                             \x{a}" ~
    "PURPOSE:    DTD to describe DITA Concepts                    \x{a}" ~
    "                                                             \x{a}" ~
    "ORIGINAL CREATION DATE:                                      \x{a}" ~
    "            March 2001                                       \x{a}" ~
    "                                                             \x{a}" ~
    "            (C) Copyright OASIS Open 2005, 2014.             \x{a}" ~
    "            (C) Copyright IBM Corporation 2001, 2004.        \x{a}" ~
    "            All Rights Reserved.                             \x{a}" ~
    "                                                             \x{a}" ~
    " UPDATES:                                                    \x{a}" ~
    "   2006.06.07 RDA: Added indexing domain                     \x{a}" ~
    "   2006.06.21 RDA: Added props attribute extensions          \x{a}" ~
    "   2008.02.12 RDA: Modify imbeds to use specific 1.2 version \x{a}" ~
    "   2008.04.15 RDA: Added hazard domain                       \x{a}" ~
    "   2014.03.12 RDA: Updated for DITA 1.3. Implemented as \x{a}" ~
    "                   RELAX NG\x{a}" ~
    "=============================================================\x{a}" ~
    "  "
  ]
  "\x{a}" ~
  "    "
  dita:moduleMetadata [
    "\x{a}" ~
    "      "
    dita:moduleType [ "topicshell" ]
    "\x{a}" ~
    "      "
    dita:moduleShortName [ "concept" ]
    "\x{a}" ~
    "      "
    dita:shellPublicIds [
      "\x{a}" ~
      "        "
      dita:dtdShell [
        "-//OASIS//DTD DITA"
        dita:var [ presep = " " name = "ditaver" ]
        " Concept//EN"
      ]
      "\x{a}" ~
      "        "
      dita:rncShell [
        "urn:oasis:names:tc:dita:rnc:concept.rnc"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "        "
      dita:rngShell [
        "urn:oasis:names:tc:dita:rng:concept.rng"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "        "
      dita:xsdShell [
        "urn:oasis:names:tc:dita:xsd:concept.xsd"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "      "
    ]
    "\x{a}" ~
    "    "
  ]
  "\x{a}" ~
  "  "
]

Specifically, the extraneous text I am referring to is this:

"\x{a}" ~
"      "

Interestingly, the number of quoted spaces on the second line seems to track the indentation/nesting depth: when this text appears right after a new level of nesting opens, the number of quoted spaces increases by 2, and when it appears right before a ] that closes a level of nesting, the number decreases by 2.

For example, in the generated concept.rnc the number of quoted spaces increases from 4 to 6 after line 69 (dita:moduleMetadata [).
On the line before this bracket is closed with ] (line 109), the number of quoted spaces drops from 6 back to 4.
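A possible explanation that would fit this pattern: the .rng source is pretty-printed, so the elements inside dita:moduleDesc are separated by whitespace-only text nodes (a newline plus the indentation of the next tag), and trang may be carrying those text nodes into the annotation as string literals. In the compact syntax, \x{a} is the escape for a newline (U+000A) and ~ concatenates string literals. The Python sketch below checks that idea on a hypothetical, abbreviated fragment indented like concept.rng; rnc_literal is a hypothetical helper that imitates the literal layout seen in the output above, not trang's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated fragment indented like the dita:moduleDesc
# block in concept.rng (the real block has more children).
FRAGMENT = """<dita:moduleDesc xmlns:dita="http://dita.oasis-open.org/architecture/2005/">
    <dita:moduleTitle>DITA Concept Shell</dita:moduleTitle>
    <dita:moduleMetadata>
      <dita:moduleType>topicshell</dita:moduleType>
    </dita:moduleMetadata>
  </dita:moduleDesc>"""


def whitespace_nodes(elem):
    """Yield every whitespace-only text node under elem, in document order."""
    if elem.text and not elem.text.strip():
        yield elem.text  # text before the first child
    for child in elem:
        yield from whitespace_nodes(child)
        if child.tail and not child.tail.strip():
            yield child.tail  # text between this child and the next tag


def rnc_literal(text):
    r"""Hypothetical helper imitating the output above: each newline is
    written as the compact-syntax escape \x{a}, a new literal starts after
    it, and the pieces are joined with the ~ concatenation operator.
    Quote characters inside text are not handled."""
    pieces = text.split("\n")
    literals = [piece + "\\x{a}" for piece in pieces[:-1]]
    if pieces[-1]:
        literals.append(pieces[-1])
    return " ~ ".join(f'"{literal}"' for literal in literals)


if __name__ == "__main__":
    for text in whitespace_nodes(ET.fromstring(FRAGMENT)):
        print(rnc_literal(text))
```

Run as a script, it prints five literals of the form "\x{a}" ~ "    " with 4, 4, 6, 4 and 2 spaces. The space count is simply the indentation of the tag that follows each newline, which is consistent with the +2/-2 pattern in the concept.rnc output.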

The same issue seems to occur within a:documentation blocks.

trang also converts the large number of other .rng modules that the shell pulls in, and the resulting .rnc files seem to exhibit the same problem.

I would like to know whether there is a way to sidestep this problem. Simplifying the .rng file with jing first and then converting the resulting simplified RELAX NG XML syntax to .rnc seems to work; however, I have a separate problem with that process, so I would prefer to know whether I can convert DITA-OT document-type shells directly from .rng to .rnc without these artifacts.
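One possible workaround that avoids jing, assuming the extra literals really are the source file's indentation: remove whitespace-only text nodes from annotation elements (those outside the RELAX NG namespace) in a copy of each .rng file, then run trang on the copies. The sketch below uses only Python's standard xml.dom.minidom, which keeps comments and namespace declarations when re-serializing. It has not been checked against trang, and the script name is hypothetical. Because the shell includes other modules, the whole rng directory tree would need to be processed so that the includes resolve to cleaned copies. Text that mixes words with newlines, such as a:documentation content, is left as is.

```python
import sys
from xml.dom import minidom, Node

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def strip_annotation_whitespace(node, in_annotation=False, preserve=False):
    """Remove whitespace-only text nodes inside annotation (non-RELAX NG)
    elements; RELAX NG elements and xml:space="preserve" content are kept."""
    for child in list(node.childNodes):  # copy, because we remove while iterating
        if child.nodeType == Node.ELEMENT_NODE:
            # An element outside the RELAX NG namespace, and everything below
            # it, is annotation content that trang copies into the .rnc file.
            foreign = in_annotation or child.namespaceURI != RNG_NS
            space = child.getAttributeNS(XML_NS, "space")
            child_preserve = (space == "preserve") if space else preserve
            strip_annotation_whitespace(child, foreign, child_preserve)
        elif (child.nodeType == Node.TEXT_NODE and in_annotation
              and not preserve and not child.data.strip()):
            node.removeChild(child)


def clean_rng(xml_text):
    """Return xml_text re-serialized with annotation whitespace removed."""
    doc = minidom.parseString(xml_text)
    strip_annotation_whitespace(doc)
    return doc.toxml()


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python strip_annotation_ws.py IN.rng OUT.rng", file=sys.stderr)
    else:
        doc = minidom.parse(sys.argv[1])
        strip_annotation_whitespace(doc)
        with open(sys.argv[2], "wb") as dst:
            dst.write(doc.toxml(encoding="utf-8"))
```

Only the indentation between annotation child elements is removed. Whitespace inside RELAX NG elements is left alone, since RELAX NG already ignores it and it does not appear to be what trang copies into the output.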

In case it matters, I am using Windows 10 PowerShell in Visual Studio Code's built-in terminal, and java -version (run in the same terminal) prints the following:

openjdk version "23.0.2" 2025-01-21
OpenJDK Runtime Environment (build 23.0.2+7-58)
OpenJDK 64-Bit Server VM (build 23.0.2+7-58, mixed mode, sharing)

A .zip containing the .rng and .rnc files relevant to this issue:

concept-rnc-and-rng.zip

Please let me know if I can supply any further information.

Edit:

The .rng file provided probably cannot be converted on its own, because it pulls in other modules. To fully replicate my conversion environment, download DITA-OT 4.2.4, extract it, navigate to .../dita-ot-4.2.4/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng (use \ instead of / on Windows), and run trang on concept.rng in that directory.

Edit 2:

The problem is not confined to the dita:moduleDesc brackets. In other files, the extra literals also appear elsewhere, in a pattern I do not fully understand. Comments preceded by ## seem intact, but I suspect a:documentation is involved:

  • Line 29 of svg-basic-clip.rnc: a:documentation [ "\x{a}" ~ " SVG.Clip.attrib\x{a}" ~ " " ]
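To map where the pattern occurs across all the generated files, a rough line-based scan may help. The hypothetical Python script below flags lines that contain a double-quoted literal made only of spaces and \x{a} escapes. It ignores single-quoted literals, and it also flags legitimate whitespace values such as presep = " ", so its hits are candidates rather than confirmed artifacts.

```python
import re
import sys
from pathlib import Path

# A double-quoted compact-syntax literal containing only spaces and
# \x{a} newline escapes, e.g. "\x{a}" or "    ".
WS_LITERAL = re.compile(r'"(?: |\\x\{a\})+"')


def find_ws_literals(text):
    """Return (line number, stripped line) for each line holding such a literal."""
    return [(number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if WS_LITERAL.search(line)]


def iter_rnc_files(args):
    """Expand directory arguments to the .rnc files below them, so a
    directory can be passed instead of a shell wildcard."""
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            yield from sorted(path.rglob("*.rnc"))
        else:
            yield path


if __name__ == "__main__":
    for path in iter_rnc_files(sys.argv[1:]):
        text = path.read_text(encoding="utf-8")
        for number, line in find_ws_literals(text):
            print(f"{path}:{number}: {line}")
```

For example, running it with the output directory as its only argument lists every candidate line as path:line: text, which makes it easier to compare the flagged spots with the corresponding elements in the .rng sources.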

For clarity, the following .zip contains all of my generated .rnc files (disregard the concept_simplified files) and some of the corresponding .rng files from the DITA-OT 4.2.4 distribution. The .rng files might not convert without the full DITA-OT archive.

rng-files-with-rnc.zip
