"&" sometimes recognized as markup within a CDATA section rather than character data #48

Open
@ScottG489

First, I'm assuming this is the code that powers the backend for https://validator.w3.org/feed/ and possibly https://www.rssboard.org/rss-validator/. However, the bug only occurs on the former website.

It seems that within certain elements that contain a CDATA section, if there is an ampersand (&) followed by a character that isn't a space, then the validator will report the following recommendation:

Invalid HTML: Named entity expected. Got none.

The report also references this help doc.

The exact situation for this seems very specific. This doesn't reproduce for CDATA sections in all elements. Here is a minimal example that will reproduce the potential bug:

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Foo Bar</title>
    <link>https://example.com</link>
    <description><![CDATA[foo &bar]]></description>
    <atom:link href="https://example.com" rel="self" type="application/rss+xml"/>
    <item>
      <description><![CDATA[foo &bar]]></description>
      <guid>http://example.com/123</guid>
    </item>
  </channel>
</rss>

The recommendation will be reported on line 8 (not 5) within the <description> nested within <item>.

I tried the same CDATA section within a <title> nested within <item>, and the recommendation was not reported. As the example shows, it is also not reported for the <description> nested directly within <channel>. There may be other situations that trigger it, but so far I've only been able to reproduce it with the CDATA section inside a <description> nested within an <item>.

This seems to indicate that, in this specific context, the validator treats the "&" inside a CDATA section as markup rather than character data. However, the XML specification's description of CDATA sections states:

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.
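
For comparison, here is a minimal sketch of how a general-purpose XML parser handles the same feed. It uses Python's standard-library `xml.etree.ElementTree`, which is my own choice for illustration and not necessarily what the validator uses internally. It only shows that, at the XML level, both CDATA sections come through as the literal text `foo &bar`, while the same bare ampersand outside a CDATA section is rejected.

```python
import xml.etree.ElementTree as ET

# Same feed as the minimal example in this issue.
FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Foo Bar</title>
    <link>https://example.com</link>
    <description><![CDATA[foo &bar]]></description>
    <atom:link href="https://example.com" rel="self" type="application/rss+xml"/>
    <item>
      <description><![CDATA[foo &bar]]></description>
      <guid>http://example.com/123</guid>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# Paths are relative to the <rss> root element.
channel_description = root.findtext("channel/description")
item_description = root.findtext("channel/item/description")

print(repr(channel_description))
print(repr(item_description))


def parses_ok(xml_text: str) -> bool:
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Outside CDATA, a bare "&bar" is not well-formed XML.
bare_ampersand_ok = parses_ok("<description>foo &bar</description>")
# Inside CDATA, the same characters are plain character data.
cdata_ampersand_ok = parses_ok("<description><![CDATA[foo &bar]]></description>")

print(bare_ampersand_ok, cdata_ampersand_ok)
```

Since the parser returns identical text for both <description> elements, the difference in the validator's behavior presumably comes from a later check applied to the item-level description rather than from the XML parsing itself, though that is only a guess on my part.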

Looking forward to hearing your thoughts.
