
Reduce Markdown feature source footprint #433

@codemonkeychris

Feature request

Reduce the source/maintenance footprint of the Markdown feature without weakening Markdown() runtime behavior or CommonMark compatibility.

Context

src\Reactor\Markdown is currently about 9K LOC. At first glance that looks large for a UI framework feature, but most of it is not handwritten Reactor UI code:

  • Md4cParser.Inline.cs / Md4cParser.Block.cs are a C# port of md4c parsing logic.
  • Md4cUnicode.cs includes generated Unicode whitespace, punctuation/symbol, and case-folding tables.
  • Md4cEntity.cs includes the full HTML named entity table.
  • MarkdownHtml.cs is primarily useful for CommonMark/spec/fuzz validation rather than native Reactor rendering.

Proposed work

Investigate and implement safe reductions:

  1. Move MarkdownHtml out of the core Reactor assembly if it is only needed by CommonMark/spec/fuzz tests. A test-only renderer would keep parser validation coverage without shipping an HTML renderer as part of the core runtime surface.
  2. Replace the generated Unicode whitespace table with .NET built-in Unicode category checks where behavior matches md4c/CommonMark requirements.
  3. Replace the generated Unicode punctuation/symbol table with .NET Unicode category checks, preserving the current behavior that treats both punctuation and symbols as punctuation-like for delimiter logic.
  4. Keep the Unicode case-folding table unless we intentionally relax CommonMark reference-label compatibility. .NET lowercasing / ignore-case comparison does not expose full Unicode case folding, especially multi-codepoint folds such as ß -> ss, İ -> i + combining dot, and ligatures.
  5. Keep the HTML named entity table in core for now. Markdown() needs entity decoding for native text rendering, and System.Net.WebUtility.HtmlDecode does not cover the full HTML5/CommonMark entity set. Moving this table to a data file would reduce visible source LOC, but it is not automatically better for runtime: text resources add parsing cost, binary resources add custom loader complexity, and the data still needs to be loaded into an efficient lookup structure.
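For item 2, a minimal sketch of what a category-based whitespace check could look like. It assumes the port classifies code points as int and that the generated table encodes exactly the Unicode Zs (SpaceSeparator) category for non-ASCII input. UnicodeWhitespaceSketch and IsUnicodeWhitespace are hypothetical names, not the current Md4cUnicode.cs API. The ASCII branch follows the CommonMark definition (space, tab, LF, FF, CR); if the existing port's ASCII set differs, that branch should be copied from the port verbatim rather than from this sketch.

```csharp
using System.Globalization;

// Hypothetical replacement for the generated Unicode whitespace table.
// Assumption: the table encodes Unicode "Zs" plus ASCII whitespace.
static class UnicodeWhitespaceSketch
{
    public static bool IsUnicodeWhitespace(int codepoint)
    {
        // ASCII fast path: CommonMark's whitespace characters in this range.
        if (codepoint <= 0x7F)
        {
            return codepoint == 0x20 || codepoint == 0x09 || codepoint == 0x0A
                || codepoint == 0x0C || codepoint == 0x0D;
        }

        // Non-ASCII: ask the runtime's Unicode data for the Zs category.
        // This follows the Unicode version bundled with the .NET runtime,
        // which may differ from the version the generated table was built from.
        return CharUnicodeInfo.GetUnicodeCategory(codepoint) == UnicodeCategory.SpaceSeparator;
    }
}
```

The ASCII fast path skips the category lookup for the most common input. Because the runtime's Unicode version can move between .NET releases, the CommonMark spec tests are the right guard when swapping this in.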
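For item 3, a sketch under similar assumptions: the hypothetical IsPunctuationLike returns true for every Unicode punctuation (P*) and symbol (S*) general category, which matches the behavior the issue describes for delimiter logic. All 32 ASCII punctuation characters fall into P or S (for example $ is Sc, + is Sm, ^ is Sk), so no separate ASCII branch is needed for correctness, although one could be added as a fast path.

```csharp
using System.Globalization;

// Hypothetical replacement for the generated punctuation/symbol table.
// Punctuation (P*) and symbols (S*) are both treated as punctuation-like.
static class UnicodePunctuationSketch
{
    public static bool IsPunctuationLike(int codepoint)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(codepoint))
        {
            // Punctuation categories: Pc, Pd, Ps, Pe, Pi, Pf, Po.
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.OpenPunctuation:
            case UnicodeCategory.ClosePunctuation:
            case UnicodeCategory.InitialQuotePunctuation:
            case UnicodeCategory.FinalQuotePunctuation:
            case UnicodeCategory.OtherPunctuation:
            // Symbol categories: Sm, Sc, Sk, So.
            case UnicodeCategory.MathSymbol:
            case UnicodeCategory.CurrencySymbol:
            case UnicodeCategory.ModifierSymbol:
            case UnicodeCategory.OtherSymbol:
                return true;
            default:
                return false;
        }
    }
}
```

As with whitespace, the spec and fuzz suites should confirm that results match the generated table before it is deleted.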
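The limitation behind item 4 is easy to demonstrate. CommonMark normalizes reference labels with Unicode case folding, so a label written as ß is expected to match one written as SS (both fold to "ss"). The hypothetical helpers below show that neither built-in option mentioned in the issue produces that match: ordinal ignore-case comparison and ToLowerInvariant both apply simple, per-character case mappings and never expand one code point into two.

```csharp
using System;

// Demonstrates why built-in .NET case-insensitive comparison cannot replace
// the Unicode case-folding table for reference-label matching.
static class CaseFoldGapDemo
{
    // Ordinal ignore-case: simple per-character case mapping.
    public static bool OrdinalIgnoreCaseMatch(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Lowercase both sides, then compare ordinally.
    // ToLowerInvariant never changes string length, so "ß" cannot become "ss".
    public static bool LowercaseMatch(string a, string b) =>
        string.Equals(a.ToLowerInvariant(), b.ToLowerInvariant(), StringComparison.Ordinal);
}
```

Both helpers handle ordinary ASCII labels ("foo" vs "FOO") correctly, which is why the gap is easy to miss. Under full case folding, ß (U+00DF) and ẞ (U+1E9E) both fold to "ss"; that is the behavior the existing table preserves.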
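On item 5, whichever storage format is chosen, the entity data ends up in a lookup structure roughly like the sketch below. The entries are a tiny illustrative excerpt, not the contents of Md4cEntity.cs, and EntityLookupSketch and TryDecode are hypothetical names. Two properties matter for CommonMark: entity names are case-sensitive (&Dcaron; and &dcaron; decode to different characters, hence StringComparer.Ordinal), and some entities decode to more than one code point (&ngE; is U+2267 followed by U+0338), so values are strings rather than single chars. With the table in source, the map is still populated at type initialization, but from compiled-in constants rather than from a parsed resource.

```csharp
using System;
using System.Collections.Generic;

// Illustrative shape of an entity lookup. The entries are a tiny excerpt,
// not the full HTML5 table that Md4cEntity.cs carries.
static class EntityLookupSketch
{
    // Entity name (without '&' and ';') -> decoded text.
    private static readonly (string Name, string Value)[] Entries =
    {
        ("amp", "&"),
        ("copy", "\u00A9"),
        ("Dcaron", "\u010E"),
        ("dcaron", "\u010F"),
        ("ngE", "\u2267\u0338"), // decodes to two code points
    };

    // Initialized after Entries (textual order); ordinal because names are case-sensitive.
    private static readonly Dictionary<string, string> Map = BuildMap();

    private static Dictionary<string, string> BuildMap()
    {
        var map = new Dictionary<string, string>(Entries.Length, StringComparer.Ordinal);
        foreach (var (name, value) in Entries)
        {
            map.Add(name, value);
        }
        return map;
    }

    public static bool TryDecode(string name, out string value) => Map.TryGetValue(name, out value);
}
```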

Notes

The goal is mostly to reduce source size, IDE/build/review noise, and shipped surface area. It is not to remove Markdown features or weaken safe defaults for untrusted Markdown content.

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
