This repository was archived by the owner on Dec 15, 2022. It is now read-only.

XML DTD Improvements #89

Open
@chpxu

Description

This is similar to #45 but in much more detail. I originally posted this on VSCode's issue page, but was told to go here, since I'm guessing they use the language packs from this repository. I've essentially copied and pasted that issue here.

As a frequent user of XML for planning layouts and storing/retrieving data, I can say that some parts of XML aren't that well covered by VSCode. Specifically, the highlighting for XML DTD (Document Type Definitions) isn't as fleshed out as the rest, and inspecting with the TM Scopes inspector shows the same for the internal structure. In this issue, I'd like to submit my own ideas for improving this. Below is a screenshot of some XML DTD code which I will reference frequently to make my points:
[Screenshot: the XML DTD example code]

I am using the theme "One Dark Pro", but the default themes show the same issue.
The example I am using is taken from w3schools.

This issue covers the following areas, which are my own grouping:

  • Keywords
  • Special Characters
  • Highlighting and Structure - This is interwoven throughout the other two, but I thought I'd mention it for clarity

DISCLAIMER: For those who say "use XSD": I do use it, but sometimes I go back to DTD for historical reasons, or because (at least for me) it's easier to define elements/attributes that only matter to one document without doing too much XSD namespacing. Now onto the bulk!

Keywords

In XML DTD, keywords can be split into several groups (at least this is how I would split them):

  • Declarations: These include <!ELEMENT> and <!ATTLIST>, and the name says it all
  • Modifiers: Such as EMPTY, ANY, #REQUIRED, #IMPLIED
  • Data Keywords: Including ones such as #PCDATA, CDATA
  • Values: Usually something that the user would type in. In w3schools' syntax descriptions, these are placeholders such as attribute-name and attribute-value

As my screenshot above shows, there is some highlighting for Declarations: <!DOCTYPE> and <!ENTITY> are highlighted, with the scopes keyword.other.doctype.xml and keyword.other.entity.xml respectively. However, other keywords, especially <!ELEMENT> and <!ATTLIST>, don't get anything; the scope inspector only shows them as part of meta.internalsubset.xml. The obvious fix would be to give them their own scopes, such as keyword.other.element.xml and keyword.other.attlist.xml, so they get that same purple highlighting (or blue in the default Dark+).

Secondly, I believe Modifier keywords should have a colour similar to that of JavaScript built-in types such as Boolean (green by default), with a scope such as keyword.other.modifier.xml.
Again, these keywords carry meaning and should therefore be highlighted to convey it. This is where it gets more ambiguous, because Data Keywords share similar syntax: some start with # and some don't, as in my examples #PCDATA and CDATA. To me, these should have a charData scope or something of the sort, since these keywords describe what character data can appear (parsed or not) inside an XML element or attribute, so possibly something along the lines of keyword.other.char-data.xml, highlighted in the colour used for number values in JavaScript.
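
To make the three keyword groups concrete, here is a rough sketch in Python regexes (not actual TextMate grammar JSON). The scope names are the ones proposed in this issue and do not exist in the current grammar, and the function name keyword_scopes is made up for illustration:

```python
import re

# Scope names follow the proposal in this issue; they are assumptions,
# not scopes that exist in the current XML grammar.
DECLARATION = re.compile(r"(?<=<!)(ELEMENT|ATTLIST)\b")
MODIFIER = re.compile(r"#REQUIRED|#IMPLIED|\b(?:EMPTY|ANY)\b")
CHAR_DATA = re.compile(r"#PCDATA|\bCDATA\b")


def keyword_scopes(line):
    """Return (text, scope) pairs for the DTD keywords found in one line."""
    found = []
    for match in DECLARATION.finditer(line):
        scope = "keyword.other.{}.xml".format(match.group(1).lower())
        found.append((match.start(), match.group(0), scope))
    for match in MODIFIER.finditer(line):
        found.append((match.start(), match.group(0), "keyword.other.modifier.xml"))
    for match in CHAR_DATA.finditer(line):
        found.append((match.start(), match.group(0), "keyword.other.char-data.xml"))
    found.sort()  # report keywords in the order they appear in the line
    return [(text, scope) for _, text, scope in found]


print(keyword_scopes("<!ELEMENT OPTIONS (#PCDATA)>"))
print(keyword_scopes("WEIGHT CDATA #IMPLIED"))
```

A real grammar would scope these rules to the inside of the DTD so that a plain word like ANY in normal XML text is left alone; the sketch ignores that.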

Now, this part gets really complicated. Welcome to Values! This section gets crazy because there are many different scopes that could be implemented, and some can only appear in certain Declarations etc., so it's just a huge XMessL (see what I did there? I'm sorry). If I take the <!ELEMENT> declaration and give a quick diagnosis:

  • Accepts element-name
  • Accepts category (essentially the Modifiers above; w3schools calls it category, which covers more values)
  • Accepts element-content, including Data Keywords and other elements, along with Special Characters (more on that later)

Some of this is easy, some not. element-name is easy: just highlight it like any HTML/XML element that a theme would. The default is blue, but One Dark Pro recognises it as a variable so it's red; it doesn't matter to me that much though. Modifiers are pretty much explained above. element-content is slightly more challenging, since it uses () and, within the brackets, Data Keywords, special characters and other elements can appear, separated by commas. In my screenshot, good examples are Line 10: <!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)> and Line 23: <!ELEMENT OPTIONS (#PCDATA)>. My solution for the element names inside the brackets is simply to use the element colour. I don't think the brackets themselves need to be highlighted though, since there is enough of a mishmash of XML colours here already, right?

The next step is to diagnose the <!ATTLIST> declaration, as there is a whole lot more that can be put in here:

  • It takes element-name and modifiers as stated above, but also:
  • attribute-name
  • attribute-type
  • attribute-value
  • The attribute-name/type/value group can repeat, so one declaration can define multiple attributes

Good examples from the screenshot are at Lines 19 - 21:

<!ATTLIST SPECIFICATIONS
WEIGHT CDATA #IMPLIED
POWER CDATA #IMPLIED>

and Lines 24 - 26:

<!ATTLIST OPTIONS FINISH (Metal|Polished|Matte) "Matte" 
ADAPTER (Included|Optional|NotApplicable) "Included"
CASE (HardShell|Soft|NotApplicable) "HardShell">

Just to explain briefly, since this issue is already goddamn long enough... The first example best represents the basic syntax. It has the element SPECIFICATIONS, the attribute WEIGHT, the type CDATA (characters, basically) and a modifier of #IMPLIED. The second one looks monstrous at first sight, but really, the only differences are that it contains a default value (in the double quotes) and, in place of the type, a list of allowed options for the attribute value, separated by the special character |. To resolve this structure, the obvious move is to provide scopes such as keyword.other.attlist.attribute-name.xml, replacing attribute-name with the relevant part, such as attribute-type or attribute-value. The words in brackets in example 2 would probably have to be highlighted in the colour used for attribute values in HTML/XML, same as the default value, since, well, they are values after all. Possibly add a .attribute-value-range scope?
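
As a rough illustration of that structure, here is a small Python sketch that splits an <!ATTLIST> declaration into name/type/default groups and labels them with the scopes suggested above. The regexes are simplified (for example, they assume no > inside a default value), where exactly attribute-value-range would attach is my guess, and attlist_scopes is a made-up name:

```python
import re

# Proposed (not existing) scope pattern from this issue.
SCOPE = "keyword.other.attlist.{}.xml"

ATTLIST = re.compile(r"<!ATTLIST\s+(?P<element>[A-Za-z_][\w.-]*)\s+(?P<body>[^>]*)>")

# One attribute definition: a name, then a type (a keyword such as CDATA or
# an enumerated list in parentheses), then a default ("value" or a #KEYWORD).
ATTRIBUTE_DEF = re.compile(
    r"""(?P<name>[A-Za-z_][\w.-]*)\s+
        (?P<type>\([^)]*\)|[A-Z]+)\s+
        (?P<default>"[^"]*"|\#[A-Z]+)""",
    re.VERBOSE,
)


def attlist_scopes(declaration):
    """Return (element_name, [(text, scope), ...]) for one <!ATTLIST ...>."""
    match = ATTLIST.search(declaration)
    if match is None:
        return None, []
    tokens = []
    for attr in ATTRIBUTE_DEF.finditer(match.group("body")):
        type_text = attr.group("type")
        # Enumerated lists get the suggested value-range scope (an assumption).
        type_role = "attribute-value-range" if type_text.startswith("(") else "attribute-type"
        default = attr.group("default")
        # #IMPLIED / #REQUIRED are treated as modifiers, quoted text as a value.
        default_scope = (
            "keyword.other.modifier.xml" if default.startswith("#")
            else SCOPE.format("attribute-value")
        )
        tokens.append((attr.group("name"), SCOPE.format("attribute-name")))
        tokens.append((type_text, SCOPE.format(type_role)))
        tokens.append((default, default_scope))
    return match.group("element"), tokens


example = '<!ATTLIST OPTIONS FINISH (Metal|Polished|Matte) "Matte">'
print(attlist_scopes(example))
```

The same function handles the multi-line declarations from the screenshot, since whitespace between the parts can include newlines.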

That's basically it for this section. Next up is Special Characters! Luckily, this is a much shorter section with less to develop, so don't worry, you're almost done (if you haven't stopped reading already).

Special Characters

So far, we've already touched on some special characters: the + ? | * operators. Their meanings aren't relevant right now, but they usually appear inside (or right after) parentheses and denote whether and how often an element may appear (basically frequency), with | separating alternatives. These should have a scope such as operators.xml and be highlighted in the colour of normal operators in, say, JavaScript (yes, I've been referencing JS the most, but it's the language I'm closest to).
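
For illustration, here is a minimal Python sketch that picks the operators, element names and #PCDATA out of an <!ELEMENT> content model. The labels operator, char_data and name are just placeholder kinds that a grammar could map to the scopes discussed in this issue, and content_model_tokens is a made-up name:

```python
import re

# Each alternative is a named group, so match.lastgroup tells us which kind
# of token matched. Parentheses, commas and ">" match nothing and are skipped.
CONTENT_TOKEN = re.compile(
    r"(?P<operator>[+?*|])|(?P<char_data>#PCDATA)|(?P<name>[A-Za-z_][\w.-]*)"
)


def content_model_tokens(declaration):
    """Return (text, kind) pairs for the content model of one <!ELEMENT ...>."""
    start = declaration.find("(")
    if start == -1:
        return []  # e.g. EMPTY or ANY: no parenthesised content model
    return [(m.group(0), m.lastgroup) for m in CONTENT_TOKEN.finditer(declaration, start)]


line_10 = "<!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)>"
for text, kind in content_model_tokens(line_10):
    print(kind, text)
```

Scanning starts at the first opening parenthesis, so the declaration keyword and the element being declared are not mistaken for content.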

One concern I anticipate is making sure these characters are only highlighted inside a DTD and not in actual XML documents. In actual XML, any text between elements is just text, so it shouldn't be highlighted. Plus, the grammar is scope- and hierarchy-based, so rules that live inside the DTD scope wouldn't highlight anything incorrectly anyway. There are examples of this everywhere.

Finally (yes, I know!), I know I've probably made some errors, grouped things unnecessarily and skipped a few steps. Below I'll provide a few links for reading more about XML DTD so you can understand where I'm coming from, and if any questions pop up, feel free to ask:
Intro to XML DTD
DTD Building Blocks
DTD Elements
DTD Attributes
DTD Elements vs Attributes
DTD Entities
