
mdast json from JATS section title saved as "free to add title numbering" (when it shouldn't be) #74

@castedo


This issue could be handled in a variety of ways, possibly (but not necessarily) in this npm package for reading JATS XML.

This issue is also a good example of the difference between desirable behavior in an authoring tool (where an author can edit the source text) and in a reading tool (where the document's digital data are frozen and not editable).

JATS XML files from MDPI, such as PMC10000433.xml.txt, include the heading number inside the title text:

```xml
<body><sec sec-type="intro" id="sec1-cancers-15-01527"><title>1. Introduction</title>
```

which the `jats convert ...` utility converts to:

```json
...
      {
        "type": "block",
        "data": {
          "part": "intro"
        },
        "children": [
          {
            "type": "heading",
            "enumerated": true,
            "label": "sec1-cancers-15-01527",
            "identifier": "sec1-cancers-15-01527",
            "depth": 1,
            "children": [
              {
                "type": "text",
                "value": "1. Introduction"
              }
            ]
          },
...
```

Note that the `1.` is included in the `"value"` field, while the heading is also marked `"enumerated": true`.

Should heading numbers be included in that value? Given how the myst authoring tool works, clearly not. When the output PMC10000433.myst.json is fed into the `myst` CLI tool to generate a website with the book-theme or article-theme template, the HTML output adds its own numbering on top of the numbering already present in the JATS XML. That is, the number prefixing each title appears twice.
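To make the duplication concrete, here is a simplified TypeScript model. It is not myst's actual rendering code; it only assumes that a renderer prepends its own counter (here followed by a single space) to every heading marked `"enumerated": true`. `renderTitle` is a hypothetical helper.

```typescript
// Simplified model of the duplicated numbering. NOT myst's real renderer:
// it assumes only that an enumerated heading gets "<counter> " prepended.
interface TextNode {
  type: "text";
  value: string;
}

interface HeadingNode {
  type: "heading";
  enumerated?: boolean;
  depth: number;
  children: TextNode[];
}

// Hypothetical helper: render a heading title given the counter a
// numbering-aware renderer computed for it.
function renderTitle(heading: HeadingNode, counter: string): string {
  const text = heading.children.map((c) => c.value).join("");
  return heading.enumerated ? `${counter} ${text}` : text;
}

// The heading produced from the MDPI JATS title above.
const fromJats: HeadingNode = {
  type: "heading",
  enumerated: true,
  depth: 1,
  children: [{ type: "text", value: "1. Introduction" }],
};

console.log(renderTitle(fromJats, "1")); // prints "1 1. Introduction"
```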

For an authoring tool, it makes sense that the title text does not include a number, so that an author can choose an output style that adds numbers, or not. But for JATS XML this choice has already been made, and the title text is not meant for reader software to add numbers as an optional feature. It definitely appears that the PMC reader software does not take that liberty; it behaves more like a web browser does with H1/H2 elements: just echo the text, and don't attempt anything fancy.
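One possible reader-side fix, sketched in TypeScript under two assumptions: the tree has the mdast shape shown above, and the downstream renderer honors `"enumerated": false`. The idea is to walk the tree and turn off renderer numbering on any heading whose text already begins with an MDPI-style number such as `1.` or `2.1.`. The names `markPreNumberedHeadings` and `NUMBER_PREFIX` are hypothetical, and the regular expression is only a heuristic.

```typescript
// Minimal mdast node shape, assumed from the JSON output shown above.
interface MdastNode {
  type: string;
  value?: string;
  enumerated?: boolean;
  children?: MdastNode[];
}

// Heuristic (hypothetical): MDPI-style prefixes such as "1. " or "2.1. ".
// Requiring the trailing dot avoids matching titles like "2019 Results".
const NUMBER_PREFIX = /^\d+(?:\.\d+)*\.\s/;

// Concatenate the text of a node and all of its descendants.
function headingText(node: MdastNode): string {
  if (typeof node.value === "string") return node.value;
  return (node.children ?? []).map(headingText).join("");
}

// Recursively disable renderer numbering on headings whose text is already
// numbered; the title text itself is left exactly as it came from JATS.
function markPreNumberedHeadings(node: MdastNode): void {
  if (node.type === "heading" && NUMBER_PREFIX.test(headingText(node))) {
    node.enumerated = false;
  }
  for (const child of node.children ?? []) {
    markPreNumberedHeadings(child);
  }
}

// Example: the heading from PMC10000433 plus a hypothetical unnumbered one.
const tree: MdastNode = {
  type: "root",
  children: [
    {
      type: "block",
      children: [
        { type: "heading", enumerated: true, children: [{ type: "text", value: "1. Introduction" }] },
        { type: "heading", enumerated: true, children: [{ type: "text", value: "Results" }] },
      ],
    },
  ],
};

markPreNumberedHeadings(tree);
const headings = tree.children![0].children!;
// Logs the flags: false for "1. Introduction", true for "Results".
console.log(headings.map((h) => h.enumerated));
```

This keeps the title text exactly as it appears in the JATS XML and only changes the numbering flag, which matches the "just echo the text" behavior of a reading tool. Stripping the prefix from the text instead would be the authoring-tool approach.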
