MarkdownManager not handling escaped HTML characters correctly

### Affected Packages

markdown

### Version(s)

3.20.0 

### Bug Description

_(extracting this from [this closed ticket](https://github.com/ueberdosis/tiptap/issues/4007#issuecomment-3942306876) as per request from @bdbch)_

I've encountered a problem with inconsistent escaping in the latest version of TipTap and its Markdown extension.

For example, if my Markdown file contains a string like `foo <bar> baz` in plain text (not inside a code span or fenced code block), the `<bar>` part is interpreted as an HTML tag, inserted into the document DOM as is, and rendered as an unknown `bar` element (i.e. not displayed visually, but it typically causes a paragraph break in the TipTap editor). This by itself is not a problem: one can argue that HTML in Markdown should be interpreted as is, and that would be right.

The natural workaround is to agree to always escape unsafe characters in the source: `foo &lt;bar&gt; baz`. But when I load this into the editor via `setContent(md, { contentType: 'markdown' })`, the user sees the literal string `&lt;bar&gt;` instead of `<bar>`. So I get either no escaping of `<` and `>` or, essentially, double-escaping of `&`.

I asked Claude Code to debug the module, and it came up with a small patch in a few places:

> The fix I applied locally to `MarkdownManager`:
>
> - **On parse**: decode `&lt;` `&gt;` `&amp;` when creating text nodes from `text` tokens (in `parseInlineTokens` and `parseFallbackToken`)
> - **On serialize**: encode `<` `>` `&` back to entities when rendering text nodes (in `renderNodeToMarkdown` and `renderNodesWithMarkBoundaries`), but skip this for text inside code blocks and inline code since those are already literal

It created a small postinstall script that patches the npm package. I'm not sharing it here as I don't know the policy around AI-generated code, and I'm not sure it is the best way to solve the problem. Still, for me this patch gives a clean round trip: `&lt;foo&gt;` on disk, `<foo>` in the editor, `&lt;foo&gt;` saved back. Code spans and fenced blocks are unaffected, so I can use raw `<` and `>` there just fine. I still have to ensure that I initially have `&lt;foo&gt;` on disk, not `<foo>`, as the latter is ambiguous, which I think is reasonable.
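
For illustration only (this is not the patch itself), here is a minimal sketch of the kind of decode/encode pair described above. The names `decodeTextEntities` and `encodeTextEntities` and the `insideCode` flag are hypothetical and not part of the TipTap API; where exactly they would be called inside `MarkdownManager` is an assumption based on the summary quoted above.

```typescript
// Hypothetical helpers; names are illustrative, not part of the TipTap API.
const ENTITY_TO_CHAR: Record<string, string> = { '&lt;': '<', '&gt;': '>', '&amp;': '&' }
const CHAR_TO_ENTITY: Record<string, string> = { '<': '&lt;', '>': '&gt;', '&': '&amp;' }

// On parse: decode the three entities in a single pass, so that
// "&amp;lt;" becomes "&lt;" rather than being decoded twice into "<".
function decodeTextEntities(text: string): string {
  return text.replace(/&(?:lt|gt|amp);/g, (entity) => ENTITY_TO_CHAR[entity] ?? entity)
}

// On serialize: encode the three characters back to entities,
// except for text inside code spans or code blocks, which is already literal.
function encodeTextEntities(text: string, insideCode = false): string {
  if (insideCode) return text
  return text.replace(/[<>&]/g, (char) => CHAR_TO_ENTITY[char] ?? char)
}

// Round trip: what is on disk -> what the editor shows -> what is saved back.
const onDisk = 'foo &lt;bar&gt; baz'
const inEditor = decodeTextEntities(onDisk) // 'foo <bar> baz'
const savedBack = encodeTextEntities(inEditor) // 'foo &lt;bar&gt; baz'
```

Decoding and encoding in a single `replace` pass each is what keeps the round trip stable for text that itself contains `&amp;`.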

On top of this approach, one could add extra logic to auto-escape tag-like sequences that are not in the set of tags expected in the Markdown, but I think this should be a concern of the calling code rather than of the TipTap library.
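
As a rough sketch of what such caller-side logic could look like: the function name `escapeUnknownTags` and the default allowlist below are made up for illustration and are not part of TipTap. This simplified version does not skip code spans or fenced blocks, so a real pre-processor would need to handle those separately.

```typescript
// Hypothetical caller-side pre-processing step; not a TipTap API.
// The allowlist is an arbitrary example of tags a caller might permit.
const DEFAULT_ALLOWED_TAGS: ReadonlySet<string> = new Set(['br', 'kbd', 'sub', 'sup'])

function escapeUnknownTags(
  markdown: string,
  allowed: ReadonlySet<string> = DEFAULT_ALLOWED_TAGS,
): string {
  // Matches opening, closing, and self-closing tag-like sequences,
  // e.g. <bar>, </bar>, <bar />, <bar attr="x">.
  // Autolinks such as <https://example.com> do not match, because ':' cannot
  // follow the tag name in this pattern.
  const tagLike = /<\/?([A-Za-z][A-Za-z0-9-]*)(\s[^<>]*)?\/?>/g
  return markdown.replace(tagLike, (match: string, name: string) =>
    allowed.has(name.toLowerCase())
      ? match // expected tag: leave it for the Markdown parser
      : match.replace(/</g, '&lt;').replace(/>/g, '&gt;'), // unknown tag: make it literal text
  )
}

const escaped = escapeUnknownTags('foo <bar> baz<br>next') // 'foo &lt;bar&gt; baz<br>next'
```

The output of this step is what would then be passed to `setContent(md, { contentType: 'markdown' })`, assuming the entities are decoded on parse as described above.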

### Browser Used

Chrome

### Code Example URL

_No response_

### Expected Behavior

I would expect consistent handling of `<`, `>`, and `&`, so that literal `<` and `>` can be shown in unfenced content.

### Additional Context (Optional)

_No response_

### Dependency Updates

- [x] Yes, I've updated all my dependencies.

MarkdownManager not handling escaped HTML characters correctly #7539