Description
First, I'm assuming that this is the code which powers the backend for https://validator.w3.org/feed/ and possibly https://www.rssboard.org/rss-validator/. However, the bug only occurs on the former website.
It seems that within certain elements that contain a CDATA section, if there is an ampersand (&) followed by a character that isn't a space, then the validator will report the following recommendation:
Invalid HTML: Named entity expected. Got none.
The recommendation comes with a reference to this help doc.
The trigger for this seems very specific: it doesn't reproduce for CDATA sections in all elements. Here is a minimal example that will reproduce the potential bug:
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Foo Bar</title>
<link>https://example.com</link>
<description><![CDATA[foo &bar]]></description>
<atom:link href="https://example.com" rel="self" type="application/rss+xml"/>
<item>
<description><![CDATA[foo &bar]]></description>
<guid>http://example.com/123</guid>
</item>
</channel>
</rss>
The recommendation will be reported on line 8 (not 5) within the <description> nested within <item>.
I tried reproducing this issue within a <title> nested within <item> but it did not reproduce. I also tested within a <description> nested within <channel> and it also didn't reproduce. Perhaps there are other situations where it will reproduce, but I've only been able to reproduce it with the CDATA section inside a <description> nested within an <item>.
This seems to indicate that, in this specific context, the validator is recognizing the "&" inside a CDATA section as markup rather than character data. However, the official documentation on CDATA sections specifies that:
Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.
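To illustrate the XML-level behavior described in that quote, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. It is not the validator's own code, and the helper name `description_texts` is made up for this example. It only shows that a conforming XML parser hands back the ampersand in both `<description>` elements as plain character data, while the same text outside a CDATA section is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# The minimal feed from this report, unchanged.
FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Foo Bar</title>
<link>https://example.com</link>
<description><![CDATA[foo &bar]]></description>
<atom:link href="https://example.com" rel="self" type="application/rss+xml"/>
<item>
<description><![CDATA[foo &bar]]></description>
<guid>http://example.com/123</guid>
</item>
</channel>
</rss>"""


def description_texts(feed_xml):
    """Return the text of the channel-level and item-level <description>.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    return {
        "channel": channel.findtext("description"),
        "item": channel.findtext("item/description"),
    }


texts = description_texts(FEED)

# Contrast: the same text without CDATA is not well-formed XML,
# because "&bar" would have to be a complete entity reference.
try:
    ET.fromstring("<description>foo &bar</description>")
    bare_ampersand_rejected = False
except ET.ParseError:
    bare_ampersand_rejected = True
```

Both entries in `texts` come back as the literal string `foo &bar`, so at the XML layer the two `<description>` elements are indistinguishable. Whatever triggers the recommendation for the item-level element therefore seems to happen after XML parsing, though that is a guess from the outside rather than something confirmed in the validator's source.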
Looking forward to hearing your thoughts.