11# md4w
22
3- This is a WebAssembly port of
4- [ md4c ] ( https://github.com/mity/md4c ) - a Markdown parser written in C .
3+ A ** Markdown-to-HTML ** Parser written in Zig & C, compiled to WebAssembly for
4+ all JavaScript runtimes .
55
6- - ** Fast** : written in C, compiled to WebAssembly (it's about 2x faster than markdown-it, see [ benchmark] ( #benchmark ) )
7- - ** Simple** : input markdown, output HTML
8- - ** Small** : ` ~25KB ` gzipped
9- - ** Universal** : works in any JavaScript environment
10- - ** Extensible** : supports custom extensions (WIP)
6+ - ** Fast** : written in Zig, powered by [ md4c] ( https://github.com/mity/md4c ) ,
7+ compiled to WebAssembly (it's about 2.5x faster than markdown-it, see
8+ [ benchmark] ( #benchmark ) ).
9+ - ** Small** : ` ~25KB ` gzipped.
10+ - ** Simple** : input markdown, output HTML.
11+ - ** Streaming** : supports a streaming API for large markdown files.
12+ - ** Universal** : works in any JavaScript runtime (Node.js, Deno, Bun, Browsers,
13+ Cloudflare Workers, etc.).
1114
1215## Usage
1316
1417``` js
1518// npm i md4w (Node.js, Bun, Cloudflare Workers, etc.)
16- import { init , mdToHtml } from " md4w" ;
17- // or use the CDN version (Deno, Modern Browsers)
18- import { init , mdToHtml } from " https://esm.sh/md4w" ;
19+ import { init , mdToHtml , mdToReadableHtml } from " md4w" ;
20+ // or use the CDN url (Deno, Browsers)
21+ import { init , mdToHtml , mdToReadableHtml } from " https://esm.sh/md4w" ;
1922
23+ // wait for md4w.wasm to load
2024await init ();
21- console .log (mdToHtml (" # Hello, World!" ));
25+
26+ // markdown -> HTML
27+ const html = mdToHtml (" # Hello, World!" );
28+
29+ // markdown -> HTML (ReadableStream)
30+ const readable = mdToReadableHtml (" # Hello, World!" );
31+ const response = new Response (readable, {
32+ headers: { " Content-Type" : " text/html" },
33+ });
34+ ```
35+
36+ ## Parse Flags
37+
38+ By default, md4w uses the following parse flags:
39+
40+ - ` COLLAPSE_WHITESPACE ` : Collapse non-trivial whitespace into single space.
41+ - ` PERMISSIVE_ATX_HEADERS ` : Do not require space in ATX headers (` ###header ` ).
42+ - ` PERMISSIVE_URL_AUTO_LINKS ` : Recognize URLs as links.
43+ - ` STRIKETHROUGH ` : Support strike-through spans (text enclosed in tilde marks, e.g. ` ~foo bar~ ` ).
44+ - ` TABLES ` : Support GitHub-style tables.
45+ - ` TASK_LISTS ` : Support GitHub-style task lists.
46+
47+ You can use the ` parseFlags ` option to change the parser behavior:
48+
49+ ``` ts
50+ mdToHtml (" # Hello, World!" , {
51+ parseFlags: {
52+ DEFAULT: true ,
53+ NO_HTML: true ,
54+ LATEX_MATHS_PANS: true ,
55+ // ... other parse flags
56+ },
57+ });
2258```
2359
60+ All available parse flags are:
61+
62+ ``` ts
63+ const ParseFlags = {
64+ /** Collapse non-trivial whitespace into single space. */
65+ COLLAPSE_WHITESPACE: 0x0001 ,
66+ /** Do not require space in ATX headers ( ###header ) */
67+ PERMISSIVE_ATX_HEADERS: 0x0002 ,
68+ /** Recognize URLs as links. */
69+ PERMISSIVE_URL_AUTO_LINKS: 0x0004 ,
70+ /** Recognize e-mails as links.*/
71+ PERMISSIVE_EMAIL_AUTO_LINKS: 0x0008 ,
72+ /** Disable indented code blocks. (Only fenced code works.) */
73+ NO_INDENTED_CODE_BLOCKS: 0x0010 ,
74+ /** Disable raw HTML blocks. */
75+ NO_HTML_BLOCKS: 0x0020 ,
76+ /** Disable raw HTML (inline). */
77+ NO_HTML_SPANS: 0x0040 ,
78+ /** Support GitHub-style tables. */
79+ TABLES: 0x0100 ,
80+ /** Support strike-through spans (text enclosed in tilde marks, e.g. ~foo bar~). */
81+ STRIKE_THROUGH: 0x0200 ,
82+ /** Support WWW autolinks (without proto; just 'www.') */
83+ PERMISSIVE_WWW_AUTO_LINKS: 0x0400 ,
84+ /** Support GitHub-style task lists. */
85+ TASKLISTS: 0x0800 ,
86+ /** Support LaTeX math spans ($...$) and LaTeX display math spans ($$...$$). (Note that the HTML renderer outputs them verbatim in a custom tag <x-equation>.) */
87+ LATEX_MATHS_PANS: 0x1000 ,
88+ /** Support wiki-style links ([[link label]] and [[target article|link label]]). (Note that the HTML renderer outputs them in a custom tag <x-wikilink>.) */
89+ WIKI_LINKS: 0x2000 ,
90+ /** Treat `_` as underline instead of ordinary emphasis or strong emphasis. */
91+ UNDERLINE: 0x4000 ,
92+ /** Force all soft line breaks to act as hard line breaks. */
93+ HARD_SOFT_BREAKS: 0x8000 ,
94+ /** Shorthand for NO_HTML_BLOCKS | NO_HTML_SPANS */
95+ NO_HTML: 0x0020 | 0x0040 ,
96+ /** Default flags: COLLAPSE_WHITESPACE | PERMISSIVE_ATX_HEADERS | PERMISSIVE_URL_AUTO_LINKS | TABLES | STRIKE_THROUGH | TASKLISTS */
97+ DEFAULT: 0x0001 | 0x0002 | 0x0004 | 0x0100 | 0x0200 | 0x0800 ,
98+ };
99+ ```
100+
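The flag values above are bit masks, so several flags combine with a bitwise OR. The sketch below illustrates that arithmetic with a subset of the values from the table. `resolveParseFlags` is a hypothetical helper written for this example, not part of md4w's API, and it assumes that a `parseFlags` object is reduced to a single integer by OR-ing every enabled flag.

```javascript
// Subset of the flag values listed in the table above.
const ParseFlags = {
  COLLAPSE_WHITESPACE: 0x0001,
  PERMISSIVE_ATX_HEADERS: 0x0002,
  PERMISSIVE_URL_AUTO_LINKS: 0x0004,
  NO_HTML_BLOCKS: 0x0020,
  NO_HTML_SPANS: 0x0040,
  TABLES: 0x0100,
  STRIKE_THROUGH: 0x0200,
  TASKLISTS: 0x0800,
};
// Shorthands, built from the individual bits.
ParseFlags.NO_HTML = ParseFlags.NO_HTML_BLOCKS | ParseFlags.NO_HTML_SPANS;
ParseFlags.DEFAULT = ParseFlags.COLLAPSE_WHITESPACE |
  ParseFlags.PERMISSIVE_ATX_HEADERS |
  ParseFlags.PERMISSIVE_URL_AUTO_LINKS |
  ParseFlags.TABLES |
  ParseFlags.STRIKE_THROUGH |
  ParseFlags.TASKLISTS;

// Hypothetical helper (an assumption, not md4w's implementation):
// OR together every flag whose value in `options` is truthy.
function resolveParseFlags(options) {
  let mask = 0;
  for (const [name, enabled] of Object.entries(options)) {
    if (!Object.hasOwn(ParseFlags, name)) {
      throw new TypeError(`Unknown parse flag: ${name}`);
    }
    if (enabled) mask |= ParseFlags[name];
  }
  return mask;
}

const mask = resolveParseFlags({ DEFAULT: true, NO_HTML: true });
console.log("0x" + mask.toString(16).padStart(4, "0")); // prints "0x0b67"
```

Because `DEFAULT` is itself a combination of bits, adding `NO_HTML: true` next to it extends the default set rather than replacing it.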
101+ ## Code Highlighter
102+
103+ md4w does not add colors to code blocks by default. However, it provides a
104+ ` setCodeHighlighter ` function that lets you plug in any code highlighter you like.
105+
106+ ``` js
107+ import { setCodeHighlighter } from " md4w" ;
108+
109+ setCodeHighlighter ((code , lang ) => {
110+ // return the highlighted code as an HTML string
111+ return ` <pre><code class="language-js"><span style="color:green">...</span></code></pre>` ;
112+ });
113+ ```
114+
115+ > FYI: The output of the custom code highlighter is not passed back to the
116+ > wasm module, so you don't need to worry about performance.
117+
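A highlighter callback is a plain function from `(code, lang)` to an HTML string, so it can be written and tested on its own. The following is a minimal sketch, not a real syntax highlighter: `escapeHtml`, `KEYWORDS`, and `highlight` are hypothetical names, and the example assumes the callback must return the full `<pre><code>` wrapper, as in the snippet above.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// A tiny, illustrative keyword list; a real highlighter would tokenize properly.
const KEYWORDS = /\b(const|let|function|return|if|else)\b/g;

// Hypothetical highlighter: escapes the code, then colors a few JS keywords.
function highlight(code, lang) {
  const escaped = escapeHtml(code);
  const body = lang === "js"
    ? escaped.replace(KEYWORDS, '<span style="color:green">$1</span>')
    : escaped;
  const classAttr = lang ? ` class="language-${escapeHtml(lang)}"` : "";
  return `<pre><code${classAttr}>${body}</code></pre>`;
}

console.log(highlight("const a = 1 < 2;", "js"));
```

A function like this could then be registered with `setCodeHighlighter(highlight)`. Escaping the code before wrapping it matters, because the returned string is used as HTML.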
118+ ## Streaming API
119+
120+ md4w supports a streaming API for large markdown files. This is also useful for an
121+ HTTP server that streams the response.
122+
123+ ``` js
124+ import { mdToReadableHtml } from " md4w" ;
125+
126+ const largeMarkdown = ` # Hello, World!\n ` .repeat (1_000_000 );
127+ const readable = mdToReadableHtml (largeMarkdown);
128+
129+ // write to file
130+ const file = await Deno .open (" /foo/bar.html" , { write: true , create: true });
131+ await readable .pipeTo (file .writable );
132+
133+ // or send to client
134+ const response = new Response (readable, {
135+ headers: { " Content-Type" : " text/html" },
136+ });
137+ ```
138+
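When the complete HTML is needed as one string (in a test, for example), a stream can be drained chunk by chunk. The sketch below uses only standard Web Streams and `TextDecoder` APIs. It assumes the stream yields `Uint8Array` chunks of UTF-8 text, which this README does not state, and both `streamToString` and the stand-in `demo` stream are hypothetical.

```javascript
// Collect a ReadableStream of UTF-8 bytes into a single string.
async function streamToString(readable) {
  const decoder = new TextDecoder();
  const reader = readable.getReader();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    text += decoder.decode(value, { stream: true });
  }
  // Flush any bytes the decoder is still holding.
  return text + decoder.decode();
}

// Stand-in for a stream such as the one returned by mdToReadableHtml.
const demo = new ReadableStream({
  start(controller) {
    const encoder = new TextEncoder();
    controller.enqueue(encoder.encode("<h1>Hello, "));
    controller.enqueue(encoder.encode("World!</h1>\n"));
    controller.close();
  },
});

streamToString(demo).then((html) => console.log(html));
```

Collecting the whole output this way gives up the memory benefit of streaming, so it is best kept for small inputs.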
139+ ### Buffer Size
140+
141+ By default, md4w uses a buffer size of ` 1KB ` for streaming. You can change it with the ` bufferSize ` option:
142+
143+ ``` js
144+ mdToReadableHtml (largeMarkdown, {
145+ bufferSize: 16 * 1024
146+ });
147+ ```
148+
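To get a feel for what a buffer size means for the consumer, the sketch below splits a byte array into chunks of at most `bufferSize` bytes and counts them. This is only an illustration of the general idea. `chunkBytes` is a hypothetical helper, and md4w's actual flushing behavior may differ.

```javascript
// Hypothetical helper: yield views of `bytes`, each at most `bufferSize` long.
function* chunkBytes(bytes, bufferSize) {
  if (!Number.isInteger(bufferSize) || bufferSize <= 0) {
    throw new RangeError("bufferSize must be a positive integer");
  }
  for (let offset = 0; offset < bytes.length; offset += bufferSize) {
    // subarray() creates a view, so no bytes are copied.
    yield bytes.subarray(offset, offset + bufferSize);
  }
}

// 9000 bytes of sample HTML (9 bytes per line, 1000 lines).
const html = new TextEncoder().encode("<p>x</p>\n".repeat(1000));

console.log([...chunkBytes(html, 1024)].length); // prints 9
console.log([...chunkBytes(html, 16 * 1024)].length); // prints 1
```

A larger buffer means fewer, bigger chunks, while a smaller one delivers the first bytes sooner.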
149+ ### Caveats
150+
151+ The streaming API currently buffers only the HTML output; you still need to load the raw markdown data into memory.
152+
24153## Development
25154
26- The wasm binding layer is written in [ Zig] ( https://ziglang.org/ ) , ensure you
27- have it installed. Also the [ wasm-opt] ( https://github.com/WebAssembly/binaryen ) is
155+ The parser is written in [ Zig] ( https://ziglang.org/ ) , so ensure you have it
156+ installed. [ wasm-opt] ( https://github.com/WebAssembly/binaryen ) is also
28157required to optimize the generated WebAssembly binary.
29158
30159``` bash
@@ -33,8 +162,19 @@ zig build && deno test -A
33162
34163## Benchmark
35164
36- ![ ] ( ./test/benchmark-screenshot.png )
165+ ![ screenshot ] ( ./test/benchmark-screenshot.png )
37166
38167``` bash
39168zig build && deno bench -A test/benchmark.js
40169```
170+
171+ ## Prior Art
172+
173+ - [ md4c] ( https://github.com/mity/md4c ) - C Markdown parser. Fast. SAX-like
174+ interface. Compliant to CommonMark specification.
175+ - [ markdown-wasm] ( https://github.com/rsms/markdown-wasm ) - Very fast Markdown
176+ parser and HTML generator implemented in WebAssembly, based on md4c.
177+
178+ ## License
179+
180+ MIT