JustHTML has a Sanitizer Bypass (in Markdown)

Summary

to_markdown() does not sufficiently escape text content that looks like HTML. As a result, untrusted input that is safe in to_html() can become raw HTML in Markdown output.

This is not specific to tokenizer raw-text states like <title>, <noscript>, or <plaintext>, although those states can trigger the behavior. The root cause is broader: Markdown text serialization leaves angle brackets unescaped in text nodes.

Details

When converting a parsed document to Markdown, text nodes are escaped for a small set of Markdown metacharacters, but HTML-significant characters such as < and > are preserved. That means content parsed as text, including entity-decoded text or text produced by RCDATA/RAWTEXT-style parsing, can be emitted into Markdown as raw HTML.
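A minimal sketch of the kind of escaping the paragraph above calls for, using only the standard library. The helper name `escape_markdown_text` and the chosen set of Markdown metacharacters are assumptions for illustration. This is not JustHTML's actual implementation or its patch.

```python
import re

# Hypothetical helper (not part of JustHTML): escape a text node for Markdown output.
# The metacharacter set below is a small illustrative subset, not a complete list.
_MARKDOWN_META = re.compile(r"([\\`*_\[\]])")
_HTML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape_markdown_text(text: str) -> str:
    # Neutralize HTML-significant characters first, so text can never be
    # interpreted as a raw HTML tag by a Markdown renderer.
    text = re.sub(r"[&<>]", lambda m: _HTML_ESCAPES[m.group(0)], text)
    # Then backslash-escape the Markdown metacharacters.
    return _MARKDOWN_META.sub(r"\\\1", text)


print(escape_markdown_text("<img src=x onerror=alert(1)>"))
# prints: &lt;img src=x onerror=alert(1)&gt;
```

Escaping `&` along with `<` and `>` keeps text that already looks like an entity reference from being reinterpreted when the Markdown is rendered.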

Examples of affected input include:

- Text produced from entity-encoded input such as &lt;script&gt;...&lt;/script&gt;, which the parser decodes to the literal text <script>...</script>
- Text inside elements like <title>, <textarea>, <noscript> (when parsed as raw text), and <plaintext>
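The first case can be illustrated without JustHTML. The sketch below uses Python's standard-library `html.parser` as a stand-in parser (an assumption, since JustHTML's internals are not shown here) to show that entity-encoded input ends up as a text node containing literal angle brackets, and that an HTML serializer has to re-escape it to keep it inert.

```python
from html import escape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Records which tags were parsed as elements and which data was parsed as text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


collector = TextCollector()
collector.feed("<p>&lt;img src=x onerror=alert(1)&gt;</p>")
collector.close()

decoded = "".join(collector.text)
print(collector.tags)                # ['p'] : the payload is text, not an <img> element
print(decoded)                       # <img src=x onerror=alert(1)>
print(escape(decoded, quote=False))  # &lt;img src=x onerror=alert(1)&gt;
```

A serializer that writes `decoded` out verbatim, as the Markdown path described above does for `<` and `>`, turns that text back into markup.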

This is distinct from actual <script> or <style> elements in the DOM. Those are already dropped by default in to_markdown() unless html_passthrough=True.

Proof of Concept

General case

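from justhtml import JustHTML

# The payload arrives entity-encoded, so the parser stores it as plain text inside <p>.
doc = JustHTML("<p>&lt;img src=x onerror=alert(1)&gt;</p>", fragment=True)

# to_html() re-escapes the text node, so this output stays inert.
print(doc.to_html())
print()
# In affected versions, to_markdown() leaves < and > unescaped, so the text is emitted as raw HTML.
print(doc.to_markdown())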

References

- https://github.com/EmilStenstrom/justhtml/security/advisories/GHSA-3rcm-vjrc-p45j

EmilStenstrom published to EmilStenstrom/justhtml Mar 18, 2026

Published to the GitHub Advisory Database Mar 18, 2026

Reviewed Mar 18, 2026

Weaknesses

Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
