
[XML] properly handle normalizedString & token #451

Open
@jkowalleck

Description

CycloneDX uses the XML Schema datatypes from http://www.w3.org/2001/XMLSchema, which define normalizedString as follows:

```xml
<xs:simpleType name="normalizedString" id="normalizedString">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>
```

normalizedString represents white space normalized strings. The ·value space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·lexical space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·base type· of normalizedString is string.
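As a reading aid, the value-space rule quoted above can be written as a simple membership check. This is a minimal Python sketch; `is_normalized_string` is a hypothetical helper name, not part of any CycloneDX library.

```python
# Characters excluded from the value space of xs:normalizedString.
_FORBIDDEN = frozenset("\r\n\t")


def is_normalized_string(value: str) -> bool:
    """Return True if `value` contains no carriage return (#xD),
    line feed (#xA) or tab (#x9), i.e. it is a valid normalizedString."""
    return not any(ch in _FORBIDDEN for ch in value)


print(is_normalized_string("foo bar"))   # True
print(is_normalized_string("foo\tbar"))  # False
```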


CycloneDX uses the XML Schema datatypes from http://www.w3.org/2001/XMLSchema, which define token as follows:

```xml
<xs:simpleType name="token" id="token">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>
```

token represents tokenized strings. The ·value space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·lexical space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·base type· of token is normalizedString.
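Likewise, the three conditions of the token value space can be checked one after another. Another minimal sketch; `is_token` is a hypothetical name. Only the space character (#x20) counts for the leading/trailing and double-space rules.

```python
def is_token(value: str) -> bool:
    """Return True if `value` is in the value space of xs:token."""
    # Same restriction as normalizedString: no CR, LF or TAB ...
    if any(ch in "\r\n\t" for ch in value):
        return False
    # ... no leading or trailing space (#x20) ...
    if value.startswith(" ") or value.endswith(" "):
        return False
    # ... and no internal sequence of two or more spaces.
    return "  " not in value


print(is_token("foo bar"))   # True
print(is_token("foo  bar"))  # False
```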


Therefore, when a normalizedString value is normalized, each of the following characters must be replaced by a space (#x20):

  • carriage return: \r (#xD)
  • line feed: \n (#xA)
  • tab: \t (#x9)
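A minimal sketch of this replace step, assuming values are plain Python strings; `normalize_normalized_string` is a hypothetical name. Each of the three characters is mapped one-for-one to a space, so a Windows line ending `\r\n` becomes two spaces, not one, and leading or trailing spaces are kept.

```python
# Map CR, LF and TAB each to a single space (whiteSpace="replace").
_WS_REPLACE = str.maketrans("\r\n\t", "   ")


def normalize_normalized_string(value: str) -> str:
    """Normalize `value` as required for xs:normalizedString.

    The length of the string is preserved; nothing is collapsed or stripped.
    """
    return value.translate(_WS_REPLACE)


print(repr(normalize_normalized_string("a\r\nb\tc")))  # 'a  b c'
```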

Therefore, when a token value is normalized, the following must apply:

  • all of the replacements above
  • each run of consecutive spaces is collapsed to a single space
  • leading and trailing spaces are removed
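The collapse step builds on the replace step. In this sketch (`normalize_token` is a hypothetical name), stripping uses `strip(" ")` rather than a bare `strip()`, because Python's default `strip()` also removes other Unicode whitespace such as the no-break space (U+00A0), which XML Schema leaves untouched.

```python
import re

_WS_REPLACE = str.maketrans("\r\n\t", "   ")
_SPACE_RUN = re.compile(" {2,}")


def normalize_token(value: str) -> str:
    """Normalize `value` as required for xs:token (whiteSpace="collapse")."""
    # Step 1: everything normalizedString does (CR, LF, TAB -> space).
    replaced = value.translate(_WS_REPLACE)
    # Step 2: collapse each run of two or more spaces to one space.
    collapsed = _SPACE_RUN.sub(" ", replaced)
    # Step 3: remove leading and trailing #x20 only.
    return collapsed.strip(" ")


print(repr(normalize_token("  foo \t\n bar  ")))  # 'foo bar'
```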

Only fields that are defined as normalizedString or token in the XML schema are affected!
Other fields MUST NOT be affected!
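To keep the normalization scoped as required, a serializer can look up the schema type of each field before touching its value. This sketch assumes a hypothetical `FIELD_TYPES` table with made-up field names; a real implementation would have to derive the types from the CycloneDX XML schema. Unknown fields and plain `xs:string` fields pass through unchanged.

```python
import re
from typing import Dict

# Hypothetical lookup table with made-up field names. Real types must be
# taken from the CycloneDX XML schema.
FIELD_TYPES: Dict[str, str] = {
    "example-normalized-field": "xs:normalizedString",
    "example-token-field": "xs:token",
    "example-string-field": "xs:string",
}

_WS_REPLACE = str.maketrans("\r\n\t", "   ")
_SPACE_RUN = re.compile(" {2,}")


def normalize_field(field: str, value: str) -> str:
    """Normalize `value` only if `field` is typed normalizedString or token."""
    xsd_type = FIELD_TYPES.get(field)
    if xsd_type == "xs:normalizedString":
        return value.translate(_WS_REPLACE)
    if xsd_type == "xs:token":
        return _SPACE_RUN.sub(" ", value.translate(_WS_REPLACE)).strip(" ")
    # Any other field (xs:string or unknown) MUST NOT be altered.
    return value


print(repr(normalize_field("example-token-field", " a\t b ")))   # 'a b'
print(repr(normalize_field("example-string-field", " a\t b ")))  # ' a\t b '
```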
