Added
- `MdParser\Options::headingAnchors`: when true, every rendered `<hN>` gets an `id` attribute holding a GitHub-style slug of the heading's text. Slugs lowercase ASCII, replace whitespace runs with a single `-`, drop other ASCII punctuation, preserve UTF-8 multibyte bytes, and dedupe collisions with `-1`, `-2`, ... Headings whose text slugifies to nothing (pure punctuation) emit `<hN>` with no id rather than `id=""`. Coexists with `sourcepos`: the `id` lands before `data-sourcepos`.
- `MdParser\Options::nofollowLinks`: when true, every emitted `<a href="...">` gets `rel="nofollow noopener noreferrer"` injected for inline links, reference links, and autolinks. Applies to `toHtml()` and `toInlineHtml()`. Anchors inside fenced or inline code are left untouched because cmark escapes them before they reach the postprocess step. In-document fragment anchors (`href="#..."`, i.e. footnote references and backrefs) are intentionally skipped. Raw `<script>` / `<style>` regions under `unsafe: true` are emitted verbatim so anchor-shaped substrings inside JavaScript or CSS are not corrupted.
- Linux and macOS prebuilt binaries are now attached to every GitHub release (x86_64 + arm64 glibc Linux, x86_64 + arm64 macOS, PHP 8.4 and 8.5, NTS). PIE picks the matching `.so` first and only falls back to a source build for combinations not covered by an asset (e.g. PHP 8.3, Alpine/musl, ZTS). `composer.json` declares `download-url-method: ["pre-packaged-binary", "composer-default"]` to opt into the prebuilt path.

Both new HTML-postprocess flags default to false. They are pure HTML post-passes; XML and AST output are unaffected. The static Parser::html() / Parser::xml() shortcuts use the module defaults and so do not apply either transform.
Heading anchors are positioned by rendering each AST heading standalone and locating its exact byte sequence in the document HTML, rather than by counting line-start <hN> tags. Under unsafe: true, raw HTML headings written directly in the markdown source are normally left alone and do not consume slugs intended for real headings. One documented limitation: if a raw HTML heading produces bytes identical to a later Markdown heading (e.g. <h1>same</h1> followed by # same), the byte-fingerprint search hits the raw heading first, the raw heading absorbs the id, and the real Markdown heading is left without one. A durable fix needs renderer-level heading-id support; until then, unsafe: true callers should not rely on heading-id stability when raw HTML headings can collide with real ones. Pinned in tests/030_anchor_unsafe_collision.phpt.
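
The slug rules listed for `headingAnchors` can be illustrated with a small Python sketch. This is not the extension's C code, only a model of the rules as stated in this changelog. The names `slugify` and `assign_ids` are hypothetical. Two details are assumptions: `-` and `_` are kept (as GitHub's own slugger does), and leading/trailing whitespace is trimmed before slugging. The dedupe counter is a simplification that does not check whether a suffixed slug such as `foo-1` also occurs as a heading in its own right.

```python
import string

# Assumption: '-' and '_' survive, as in GitHub's slugger; the changelog only
# says that "other ASCII punctuation" is dropped.
_DROPPED = set(string.punctuation) - {"-", "_"}


def slugify(text):
    """Return a slug for one heading, or "" if nothing survives."""
    out = []
    in_space = False
    for ch in text.strip():  # assumption: surrounding whitespace is trimmed
        if ch.isascii() and ch.isspace():
            in_space = True  # remember the run, emit one '-' later
            continue
        if in_space:
            out.append("-")
            in_space = False
        if ch in _DROPPED:
            continue
        # Lowercase ASCII only; non-ASCII characters pass through unchanged.
        out.append(ch.lower() if ch.isascii() else ch)
    return "".join(out)


def assign_ids(headings):
    """Map heading texts to ids, using None where no id would be emitted."""
    seen = {}
    ids = []
    for text in headings:
        base = slugify(text)
        if not base:
            ids.append(None)  # pure punctuation: <hN> without an id
            continue
        n = seen.get(base, 0)
        seen[base] = n + 1
        ids.append(base if n == 0 else f"{base}-{n}")
    return ids
```

Under these assumptions, `assign_ids(["Hello, World!", "Hello World", "???"])` yields `["hello-world", "hello-world-1", None]`.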
Changed
- `Parser` now caches a single cmark_parser per instance and reuses it across `toHtml`/`toXml`/`toAst`/`toInlineHtml` calls. `cmark_parser_finish` resets the parser internally on every successful render, so the cached parser holds no state from prior input: no link reference definitions, no inline subject leftovers, no buffered partial input. After a render that did not complete cleanly the parser is rebuilt rather than reused. Pinned in `tests/033_parser_reuse_isolation.phpt`.
- cmark allocations now route through a Zend MM-backed `cmark_mem` (`ecalloc`/`erealloc`/`efree`). cmark-side memory is now counted against `memory_limit`, surfaced by `memory_get_usage()`, and cleaned up by Zend MM on bailout. Out-of-memory under hostile or oversized input goes through PHP's standard `Allowed memory size exhausted` fatal instead of cmark's default-allocator `abort()`.
- AST node-type values, list type / delim values, and table alignment values are now permanent interned strings created at MINIT, eliminating ~1 emalloc + memcpy per AST node on `toAst()`.
- AST key strings (`type`, `children`, `literal`, `level`, ...) are now permanent interned strings created at MINIT via `zend_string_init_interned(..., true)` instead of persistent non-interned `zend_string`s lazy-initialized on the first `toAst()` call. Permanent interned strings skip refcount mutation during `zend_hash_add_new`, so concurrent `toAst()` calls on a ZTS build no longer race the (non-atomic) shared refcount that the previous persistent strings carried.
- AST node array preallocation bumped from `array_init_size(out, 8)` to 16. The worst-case node (a list with `sourcepos: true`) carries 10 keys, so 8 forced a rehash on every list. 16 lands on the next power-of-two HT bucket size and avoids the rehash for every supported node shape.
- HTML postprocess failure messages now distinguish the AST depth cap (heading text exceeded `MDPARSER_MAX_AST_DEPTH`) from cmark iterator and render allocation failures, instead of collapsing all three reasons into the generic "HTML postprocess allocation failure" string.
Fixed
- `Parser::toInlineHtml()` no longer lets block-level markers (`#`, `-`, `>`, `1.`, four-space indent, fenced/HTML blocks, thematic breaks) fire on lines after the first. The source-rewrite step now normalizes `\r\n` and lone `\r` to `\n`, collapses runs of newlines, drops leading/trailing newlines, and inserts a U+200B sentinel at the start of every physical line; the output stripper removes the wrapper plus every per-line sentinel. Multi-line input is therefore guaranteed to render as inline content.
- PHP 8.6 compatibility: replaced `XtOffsetOf` with `offsetof` throughout the wrapper. php-src master removed the `XtOffsetOf` portability macro from `zend_portability.h`; `offsetof` from `<stddef.h>` is the documented replacement and works on every PHP version mdparser supports.
- `config.w32` now lists `mdparser_html_postprocess.c` so Windows builds link successfully.
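
The source-rewrite step described for `Parser::toInlineHtml()` in the list above can be modelled in a few lines of Python. This is a sketch of the stated normalization only, not the extension's C implementation. The function name `normalize_inline_source` is hypothetical, the wrapper that the output stripper later removes is not modelled, and returning an empty string for input that normalizes to nothing is an assumption.

```python
import re

ZWSP = "\u200b"  # U+200B, the per-line sentinel named in the changelog


def normalize_inline_source(src):
    """Normalize newlines and prefix every physical line with a sentinel."""
    # CRLF first, then lone CR, so "\r\n" never becomes two newlines.
    text = src.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse newline runs, then drop leading/trailing newlines.
    text = re.sub(r"\n+", "\n", text).strip("\n")
    if not text:
        return ""  # assumption: nothing left means nothing to prefix
    # With a sentinel in front, '#', '-', '>' and friends no longer sit at
    # the start of a line, so they cannot open a block-level construct.
    return "\n".join(ZWSP + line for line in text.split("\n"))
```

For example, `"# a\r\n\r\n- b\r> c\n"` becomes three lines, each starting with U+200B: `# a`, `- b`, and `> c`.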
Security
- HTML postprocess no longer splices into raw-HTML attribute values, HTML comments, CDATA, or escapable-raw-text element bodies. Under `unsafe: true, tagfilter: false, nofollowLinks: true`, attacker-authored bytes inside `<title>`, `<textarea>`, `<iframe>`, `<noscript>`, `<xmp>`, `<noembed>`, `<noframes>`, `<plaintext>`, `<!-- … -->`, `<![CDATA[ … ]]>`, or quoted attribute values like `<div title='<a href="x">…'>` previously matched the postprocessor's `<a href="` pattern and rewrote bytes inside those regions, producing malformed HTML that could splice attributes onto the surrounding tag. The skip-region scanner now covers all HTML5 raw-text / escapable-raw-text elements + comments + CDATA, and apply_transforms walks tag-by-tag (with quoted-attribute awareness) so positions inside attribute values are never visited as tag-starts. The same logic applies to the heading-anchor fingerprint search in `resolve_heading_offsets`, closing the comment / CDATA / textarea slug-hijack vector. Pinned in `tests/031_postprocess_attribute_safety.phpt`.
- Heading slugs now percent-encode invalid UTF-8 byte sequences (lone continuation bytes, overlong leads, truncated multi-byte sequences) instead of letting them land verbatim in `id="…"`. Valid UTF-8 multi-byte sequences (e.g. `日本語`) still pass through. Reachable when callers turn off `validateUtf8`.
- `Parser::toInlineHtml()` no longer pre-allocates `4 * src_len + 3` for the normalized scratch buffer. Newline-heavy input well below the documented 256 MB cap previously hit a fatal error on the scratch allocation under a tight `memory_limit` (40 MB of `\n` allocated ~168 MB even though the normalized buffer was empty). The scratch buffer now grows on demand via `smart_str` and tracks the actual normalized size. Pinned in `tests/037_toinlinehtml_memory_limit.phpt`.
- `Options` objects built via `ReflectionClass::newInstanceWithoutConstructor()` are now rejected at `Parser::__construct()` with `MdParser\Exception`. Previously, silent reads of uninitialized typed properties returned `IS_NULL`, so the parser cached an all-false mask (notably `validateUtf8: false` and `tagfilter: false`) while `$parser->options` still threw on any property access. The constructor now bails before publishing `$options`, so a half-built Options can never reach cached parser state. Regression test in `tests/029_regressions.phpt`.
- Linux builds are now compiled with `-fvisibility=hidden`. Vendored cmark symbols (`cmark_parser_new`, `cmark_release_plugins`, `CMARK_DEFAULT_MEM_ALLOCATOR`, ...) and wrapper internals no longer appear in `mdparser.so`'s dynamic symbol table; only PHP's required `get_module` is exported. This prevents symbol collisions with other extensions that vendor or link cmark.
- Windows release workflow pins `php/php-windows-builder/*` references to a commit SHA instead of the mutable `@v1` tag, so a moved or compromised tag cannot push DLLs into a release with `contents: write`.
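
The invalid-UTF-8 handling for heading slugs described in the list above can be illustrated in Python. This is a sketch of the idea (valid sequences pass through, every byte of an invalid sequence is percent-encoded), not the extension's C code. The function name `percent_encode_invalid` and the uppercase hex digits are assumptions. It relies on Python's `surrogateescape` error handler, which maps each undecodable byte `0xNN` to the lone surrogate `U+DCNN`.

```python
def percent_encode_invalid(raw: bytes) -> str:
    """Decode UTF-8, turning each undecodable byte into %XX."""
    # Python's strict UTF-8 decoder rejects lone continuation bytes, overlong
    # forms, and truncated sequences; surrogateescape marks those bytes
    # instead of raising.
    decoded = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in decoded:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # An escaped byte: recover its value and percent-encode it.
            out.append("%{:02X}".format(cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, the bytes of `日本語` come back unchanged, while `b"a\x80b"` (a lone continuation byte) becomes `a%80b`.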