
MarkdownManager not handling escaped HTML characters correctly #7539

@iafan

Description


Affected Packages

markdown

Version(s)

3.20.0

Bug Description

(Extracting this from this closed ticket, as requested by @bdbch.)

I've encountered a problem with inconsistent escaping in the latest version of Tiptap and its Markdown extension.

For example, if my Markdown file contains a string like foo <bar> baz in plain text (not in a fenced code block), the <bar> part is interpreted as an HTML tag, inserted into the document DOM as is, and rendered as an unknown 'bar' tag (i.e. it is not displayed, but it typically causes a paragraph break in the Tiptap editor). This by itself is not a problem: one can argue that HTML in Markdown should be interpreted as is, and they would be right.
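Why <bar> ends up as markup: in Markdown parsers that follow CommonMark, inline raw HTML is recognized when < is immediately followed by a tag name (an ASCII letter, then letters, digits or hyphens) and closed by >. The sketch below is a simplified check that ignores attributes; looksLikeRawHtmlTag is a hypothetical helper for illustration, not part of Tiptap or of any Markdown parser.

```typescript
// Simplified versions of the CommonMark "open tag" and "closing tag" rules.
// Assumption: attributes are not handled here; the full spec also allows
// attributes, whitespace and line endings inside a tag.
const OPEN_TAG = /^<[A-Za-z][A-Za-z0-9-]*\s*\/?>$/;
const CLOSING_TAG = /^<\/[A-Za-z][A-Za-z0-9-]*\s*>$/;

// Hypothetical helper: does this string look like a raw HTML tag?
function looksLikeRawHtmlTag(s: string): boolean {
  return OPEN_TAG.test(s) || CLOSING_TAG.test(s);
}

console.log(looksLikeRawHtmlTag("<bar>"));       // true: read as an HTML tag
console.log(looksLikeRawHtmlTag("</bar>"));      // true
console.log(looksLikeRawHtmlTag("&lt;bar&gt;")); // false: text with entities
console.log(looksLikeRawHtmlTag("< bar>"));      // false: name must follow "<" directly
```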

The natural workaround is to agree to always escape unsafe characters in the source: foo &lt;bar&gt; baz. But when I load this into the editor via setContent(md, { contentType: 'markdown' }), the user sees the literal string &lt;bar&gt; instead of <bar>. So I get either no escaping of < and > at all or, in effect, double escaping of &.
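A minimal sketch of the source-side escaping described above, assuming only the three characters from this report need handling (escapeHtmlText and escapeWrongOrder are hypothetical helpers). The order matters: & has to be escaped first. The second function escapes in the wrong order and shows what double escaping of & looks like in the source text.

```typescript
// Escape &, < and > so Markdown text cannot be mistaken for HTML.
// "&" goes first: otherwise the "&" in a freshly produced "&lt;"
// would itself be escaped again.
function escapeHtmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Deliberately wrong order, for comparison: produces double-escaped "&".
function escapeWrongOrder(text: string): string {
  return text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;");
}

console.log(escapeHtmlText("foo <bar> baz")); // foo &lt;bar&gt; baz
console.log(escapeHtmlText("a & b"));         // a &amp; b
console.log(escapeWrongOrder("<bar>"));       // &amp;lt;bar&amp;gt;
```

Text escaped the wrong way round (or escaped once more by a layer that does not expect entities) displays as the literal &lt;bar&gt;, which matches the symptom seen in the editor.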

I asked Claude Code to debug the module, and it came up with a small patch in a few places:

The fix I applied locally to MarkdownManager:

  • On parse: decode &lt; &gt; &amp; when creating text nodes from text tokens (in parseInlineTokens and parseFallbackToken)
  • On serialize: encode < > & back to entities when rendering text nodes (in renderNodeToMarkdown and renderNodesWithMarkBoundaries), but skip this for text inside code blocks and inline code since those are already literal
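The two bullets above describe a decode-on-parse / encode-on-serialize pair. Below is a minimal sketch of that idea in isolation. It is not the local patch (which isn't shared here) and not Tiptap's internal code: TextNode, decodeEntities, encodeEntities, parseTextToken and serializeTextNode are hypothetical names, and only the three entities from this report are handled.

```typescript
// Hypothetical, simplified stand-in for an editor text node. "code" marks
// text that came from an inline code span or a fenced code block.
interface TextNode {
  text: string;
  code: boolean;
}

// Parse side: turn entities into literal characters. "&amp;" is decoded
// last, so "&amp;lt;" becomes the literal text "&lt;" rather than "<".
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Serialize side: the inverse, with "&" encoded first.
function encodeEntities(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function parseTextToken(raw: string, code: boolean): TextNode {
  // Code spans and fenced blocks are already literal: leave them untouched.
  return { text: code ? raw : decodeEntities(raw), code };
}

function serializeTextNode(node: TextNode): string {
  return node.code ? node.text : encodeEntities(node.text);
}

const prose = parseTextToken("foo &lt;bar&gt; baz", false);
console.log(prose.text);               // foo <bar> baz
console.log(serializeTextNode(prose)); // foo &lt;bar&gt; baz

const inlineCode = parseTextToken("a < b && c > d", true);
console.log(serializeTextNode(inlineCode)); // a < b && c > d
```

Keying both directions off the same code flag mirrors the report's point that code spans and fenced blocks must be skipped on parse and on serialize alike; otherwise the round trip would not be symmetric.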

It created a small postinstall script that patches the npm package. I'm not sharing it here, as I don't know the policy around AI-generated code and I'm not sure it is the best way to solve the problem. For me, though, this patch gives a clean round trip: &lt;foo&gt; on disk, <foo> in the editor, &lt;foo&gt; saved back. Code spans and fenced blocks are unaffected; I can use raw < and > there just fine. I still have to make sure that &lt;foo&gt;, not <foo>, is on disk initially, since the latter is ambiguous, but I think that's reasonable.

On top of this approach, one could add extra logic to auto-escape tag-like sequences that are not among the tags expected in Markdown, but I think that should be the concern of the calling code rather than of the Tiptap library.
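A rough sketch of what such caller-side auto-escaping might look like, run on the Markdown string before it is handed to setContent. Everything here is an assumption for illustration: escapeUnknownTags and escapeTagsInProse are hypothetical helpers, the allowlist is up to the application, and only backtick fences and single-backtick code spans are recognized (tilde fences and multi-backtick spans are not).

```typescript
// Hypothetical pre-processing step for the calling code. Tag-like
// sequences whose name is not in `allowed` are entity-escaped; allowed
// tags, fenced code and inline code spans are left untouched.
const FENCE_LINE = /^ {0,3}`{3}/; // backtick fence, up to 3 spaces indent
const TAG_LIKE = /<\/?([A-Za-z][A-Za-z0-9-]*)(\s[^<>]*)?\/?>/g;

function escapeTagsInProse(prose: string, allowed: Set<string>): string {
  return prose.replace(TAG_LIKE, (match: string, name: string) =>
    allowed.has(name.toLowerCase())
      ? match
      : match.replace(/</g, "&lt;").replace(/>/g, "&gt;")
  );
}

function escapeUnknownTags(markdown: string, allowed: Set<string>): string {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (FENCE_LINE.test(line)) {
        // Opening or closing fence: toggle state, keep the line as is.
        inFence = !inFence;
        return line;
      }
      if (inFence) return line; // fenced code is literal
      // After splitting on backticks, odd-indexed parts are inline code.
      return line
        .split("`")
        .map((part, i) => (i % 2 === 1 ? part : escapeTagsInProse(part, allowed)))
        .join("`");
    })
    .join("\n");
}

const allowed = new Set(["br", "kbd"]);
const fence = "`".repeat(3);
const md = [
  "foo <bar> baz <kbd>Ctrl</kbd>",
  "Inline `<bar>` stays raw.",
  fence,
  "<bar>",
  fence,
].join("\n");
// Only the first line changes, to: foo &lt;bar&gt; baz <kbd>Ctrl</kbd>
console.log(escapeUnknownTags(md, allowed));
```

Keeping this outside the editor means each application chooses its own allowlist, which is in line with the suggestion above that the calling code, not the library, owns this policy.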

Browser Used

Chrome

Code Example URL

No response

Expected Behavior

I would expect consistent handling of <, > and &, so that literal < and > characters can be shown in unfenced content.

Additional Context (Optional)

No response

Dependency Updates

  • Yes, I've updated all my dependencies.

Metadata


Assignees

No one assigned

Labels

Open Source (the issue or pull request is related to the open source packages of Tiptap)

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
