Sanitizer Bypass (in Markdown)

High
EmilStenstrom published GHSA-3rcm-vjrc-p45j Mar 18, 2026

Package

justhtml (pip)

Affected versions

<= 1.11.0

Patched versions

1.12.0

Description

Summary

to_markdown() does not sufficiently escape text content that looks like HTML. As a result, untrusted input that is safe in to_html() can become raw HTML in Markdown output.

This is not specific to tokenizer raw-text states like <title>, <noscript>, or <plaintext>, although those states can trigger the behavior. The root cause is broader: Markdown text serialization leaves angle brackets unescaped in text nodes.

Details

When converting a parsed document to Markdown, text nodes are escaped for a small set of Markdown metacharacters, but HTML-significant characters such as < and > are preserved. That means content parsed as text, including entity-decoded text or text produced by RCDATA/RAWTEXT-style parsing, can be emitted into Markdown as raw HTML.

Examples of affected input include:

  • Text produced from entity-decoded input such as &lt;script&gt;...&lt;/script&gt;
  • Text inside elements like <title>, <textarea>, <noscript> (when parsed as raw text), and <plaintext>
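
The escaping gap described above can be illustrated with a small sketch. The helper below is hypothetical: it is not justhtml's serializer and not the 1.12.0 fix, and the set of Markdown metacharacters it backslash-escapes is an assumption chosen for illustration. It shows one possible way to neutralize HTML-significant characters in a text node before writing it into Markdown, relying on the fact that CommonMark renders entity references such as &lt; as literal characters.

```python
import re

# Assumed set of Markdown metacharacters; a real serializer may escape more.
_MD_META = re.compile(r"([\\`*_\[\]])")


def escape_markdown_text(text: str) -> str:
    """Hypothetical helper: escape a text node for safe Markdown output."""
    # Backslash-escape Markdown metacharacters first.
    text = _MD_META.sub(r"\\\1", text)
    # Then encode HTML-significant characters as entities. "&" goes first
    # so the entities introduced for "<" and ">" are not double-encoded.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


if __name__ == "__main__":
    # Text node content after entity decoding, as in the advisory's example.
    print(escape_markdown_text("<img src=x onerror=alert(1)>"))
```

With escaping of this kind, decoded text containing angle brackets reaches the Markdown renderer as literal characters rather than as a tag.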

This is distinct from actual <script> or <style> elements in the DOM. Those are already dropped by default in to_markdown() unless html_passthrough=True.

Proof of Concept

General case

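from justhtml import JustHTML

# The payload arrives as entity-encoded text, so the parser stores it as a text node.
doc = JustHTML("<p>&lt;img src=x onerror=alert(1)&gt;</p>", fragment=True)

# to_html() escapes the text node again, so the payload stays inert.
print(doc.to_html())
print()
# On affected versions (<= 1.11.0), to_markdown() emits the angle brackets unescaped.
print(doc.to_markdown())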
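
To check whether an environment still runs an affected release, a version guard along these lines may help. This is a minimal sketch: the helper names are hypothetical, the version bounds come from this advisory, and the parser only handles plain dotted numeric versions (a pre-release suffix would raise ValueError).

```python
from importlib import metadata
from typing import Optional, Tuple

# First patched release according to this advisory.
PATCHED: Tuple[int, int, int] = (1, 12, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version, padded to three components."""
    parts = [int(part) for part in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version: str) -> bool:
    """Return True if the given version is 1.12.0 or later."""
    return parse_version(version) >= PATCHED


def installed_justhtml_is_patched() -> Optional[bool]:
    """Return None when justhtml is not installed in this environment."""
    try:
        return is_patched(metadata.version("justhtml"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_justhtml_is_patched())
```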

Severity

High

CVE ID

No known CVE

Weaknesses

Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.

Credits