Commit cd95152

Update readme
1 parent 5d0f71d commit cd95152

1 file changed

+154
-14
lines changed

README.md

Lines changed: 154 additions & 14 deletions
@@ -1,30 +1,159 @@
11
# md4w
22

3-
This is a WebAssembly port of
4-
[md4c](https://github.com/mity/md4c) - a Markdown parser written in C.
3+
A **Markdown-to-HTML** parser written in Zig & C, compiled to WebAssembly for
4+
all JavaScript runtimes.
55

6-
- **Fast**: written in C, compiled to WebAssembly (it's about 2x faster than markdown-it, see [benchmark](#benchmark))
7-
- **Simple**: input markdown, output HTML
8-
- **Small**: `~25KB` gzipped
9-
- **Universal**: works in any JavaScript environment
10-
- **Extensible**: supports custom extensions (WIP)
6+
- **Fast**: written in Zig, powered by [md4c](https://github.com/mity/md4c),
7+
compiled to WebAssembly (it's about 2.5x faster than markdown-it, see
8+
[benchmark](#benchmark)).
9+
- **Small**: `~25KB` gzipped.
10+
- **Simple**: input markdown, output HTML.
11+
- **Streaming**: supports a streaming API for large markdown files.
12+
- **Universal**: works in any JavaScript runtime (Node.js, Deno, Bun, Browsers,
13+
Cloudflare Workers, etc.).
1114

1215
## Usage
1316

1417
```js
1518
// npm i md4w (Node.js, Bun, Cloudflare Workers, etc.)
16-
import { init, mdToHtml } from "md4w";
17-
// or use the CDN version (Deno, Modern Browsers)
18-
import { init, mdToHtml } from "https://esm.sh/md4w";
19+
import { init, mdToHtml, mdToReadableHtml } from "md4w";
20+
// or use the CDN url (Deno, Browsers)
21+
import { init, mdToHtml, mdToReadableHtml } from "https://esm.sh/md4w";
1922

23+
// wait for md4w.wasm to load
2024
await init();
21-
console.log(mdToHtml("# Hello, World!"));
25+
26+
// markdown -> HTML
27+
const html = mdToHtml("# Hello, World!");
28+
29+
// markdown -> HTML (ReadableStream)
30+
const readable = mdToReadableHtml("# Hello, World!");
31+
const response = new Response(readable, {
32+
headers: { "Content-Type": "text/html" },
33+
});
34+
```
35+
36+
## Parse Flags
37+
38+
By default, md4w uses the following parse flags:
39+
40+
- `COLLAPSE_WHITESPACE`: Collapse non-trivial whitespace into single space.
41+
- `PERMISSIVE_ATX_HEADERS`: Do not require space in ATX headers (`###header`).
42+
- `PERMISSIVE_URL_AUTO_LINKS`: Recognize URLs as links.
43+
- `STRIKETHROUGH`: Support strike-through spans (text enclosed in tilde marks, e.g. `~foo bar~`).
44+
- `TABLES`: Support GitHub-style tables.
45+
- `TASK_LISTS`: Support GitHub-style task lists.
46+
47+
You can use the `parseFlags` option to change the parser behavior:
48+
49+
```ts
50+
mdToHtml("# Hello, World!", {
51+
parseFlags: {
52+
DEFAULT: true,
53+
NO_HTML: true,
54+
LATEX_MATHS_PANS: true,
55+
// ... other parse flags
56+
},
57+
});
2258
```
2359

60+
All available parse flags are:
61+
62+
```ts
63+
const ParseFlags = {
64+
/** Collapse non-trivial whitespace into single space. */
65+
COLLAPSE_WHITESPACE: 0x0001,
66+
/** Do not require space in ATX headers ( ###header ) */
67+
PERMISSIVE_ATX_HEADERS: 0x0002,
68+
/** Recognize URLs as links. */
69+
PERMISSIVE_URL_AUTO_LINKS: 0x0004,
70+
/** Recognize e-mails as links.*/
71+
PERMISSIVE_EMAIL_AUTO_LINKS: 0x0008,
72+
/** Disable indented code blocks. (Only fenced code works.) */
73+
NO_INDENTED_CODE_BLOCKS: 0x0010,
74+
/** Disable raw HTML blocks. */
75+
NO_HTML_BLOCKS: 0x0020,
76+
/** Disable raw HTML (inline). */
77+
NO_HTML_SPANS: 0x0040,
78+
/** Support GitHub-style tables. */
79+
TABLES: 0x0100,
80+
/** Support strike-through spans (text enclosed in tilde marks, e.g. ~foo bar~). */
81+
STRIKE_THROUGH: 0x0200,
82+
/** Support WWW autolinks (without proto; just 'www.') */
83+
PERMISSIVE_WWW_AUTO_LINKS: 0x0400,
84+
/** Support GitHub-style task lists. */
85+
TASKLISTS: 0x0800,
86+
/** Support LaTeX math spans ($...$) and LaTeX display math spans ($$...$$). (Note that the HTML renderer outputs them verbatim in a custom tag <x-equation>.) */
87+
LATEX_MATHS_PANS: 0x1000,
88+
/** Support wiki-style links ([[link label]] and [[target article|link label]]). (Note that the HTML renderer outputs them in a custom tag <x-wikilink>.) */
89+
WIKI_LINKS: 0x2000,
90+
/** Enable the underline extension (`_` denotes an underline instead of ordinary emphasis or strong emphasis). */
91+
UNDERLINE: 0x4000,
92+
/** Force all soft line breaks to act as hard line breaks. */
93+
HARD_SOFT_BREAKS: 0x8000,
94+
/** Shorthand for NO_HTML_BLOCKS | NO_HTML_SPANS */
95+
NO_HTML: 0x0020 | 0x0040,
96+
/** Default flags: COLLAPSE_WHITESPACE | PERMISSIVE_ATX_HEADERS | PERMISSIVE_URL_AUTO_LINKS | TABLES | STRIKE_THROUGH | TASKLISTS */
97+
DEFAULT: 0x0001 | 0x0002 | 0x0004 | 0x0100 | 0x0200 | 0x0800,
98+
};
99+
```
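
Each parse flag occupies its own bit, so a set of flags can be combined with bitwise OR and checked with bitwise AND. The following is a minimal sketch of that arithmetic, using a subset of the values listed above. The `toBitmask` and `hasFlag` helpers are hypothetical names introduced for illustration; they are not part of md4w, and this is not necessarily how md4w resolves the `parseFlags` option internally.

```js
// Subset of the flag values listed in the ParseFlags table above.
const ParseFlags = {
  COLLAPSE_WHITESPACE: 0x0001,
  PERMISSIVE_ATX_HEADERS: 0x0002,
  PERMISSIVE_URL_AUTO_LINKS: 0x0004,
  NO_HTML_BLOCKS: 0x0020,
  NO_HTML_SPANS: 0x0040,
  TABLES: 0x0100,
  STRIKE_THROUGH: 0x0200,
  TASKLISTS: 0x0800,
  LATEX_MATHS_PANS: 0x1000,
};
// Shorthands are built from their component flags.
ParseFlags.NO_HTML = ParseFlags.NO_HTML_BLOCKS | ParseFlags.NO_HTML_SPANS;
ParseFlags.DEFAULT =
  ParseFlags.COLLAPSE_WHITESPACE |
  ParseFlags.PERMISSIVE_ATX_HEADERS |
  ParseFlags.PERMISSIVE_URL_AUTO_LINKS |
  ParseFlags.TABLES |
  ParseFlags.STRIKE_THROUGH |
  ParseFlags.TASKLISTS;

// Hypothetical helper (not part of md4w): OR together every flag set to true.
function toBitmask(options) {
  let mask = 0;
  for (const [name, enabled] of Object.entries(options)) {
    if (!(name in ParseFlags)) throw new Error(`unknown parse flag: ${name}`);
    if (enabled) mask |= ParseFlags[name];
  }
  return mask;
}

// Hypothetical helper (not part of md4w): check whether a flag is set in a mask.
function hasFlag(mask, name) {
  return (mask & ParseFlags[name]) === ParseFlags[name];
}

const mask = toBitmask({ DEFAULT: true, NO_HTML: true, LATEX_MATHS_PANS: true });
console.log("0x" + mask.toString(16)); // prints 0x1b67
console.log(hasFlag(mask, "TABLES")); // prints true
```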
100+
101+
## Code Highlighter
102+
103+
md4w does not add colors to code blocks by default. However, it provides a
104+
`setCodeHighlighter` function that lets you plug in any code highlighter you like.
105+
106+
```js
107+
import { setCodeHighlighter } from "md4w";
108+
109+
setCodeHighlighter((code, lang) => {
110+
// return highlighted code in html
111+
return `<pre><code class="language-${lang}"><span style="color:green">...</span></code></pre>`;
112+
});
113+
```
114+
115+
> FYI: The output of the custom code highlighter is not passed back to the
116+
> wasm module, so there is no need to worry about performance.
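
As a concrete starting point, here is a minimal sketch of a fallback highlighter with the same `(code, lang) => string` shape. It adds no colors; it only escapes the code and tags it with a language class. It assumes the callback receives raw, unescaped source text, which the text above does not confirm. `escapeHtml` and `plainHighlighter` are hypothetical names, not md4w exports.

```js
// Escape characters that are significant in HTML so the code renders literally.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical fallback highlighter: no colors, just escaped code in <pre><code>.
function plainHighlighter(code, lang) {
  const cls = lang ? ` class="language-${escapeHtml(lang)}"` : "";
  return `<pre><code${cls}>${escapeHtml(code)}</code></pre>`;
}

// With md4w installed, it would be registered with:
//   setCodeHighlighter(plainHighlighter);
console.log(plainHighlighter("if (a < b) {}", "js"));
```

A real highlighter library could be swapped in by replacing the body of `plainHighlighter` and keeping the same signature.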
117+
118+
## Streaming API
119+
120+
md4w supports a streaming API for large markdown files. This is also useful for an
121+
HTTP server that streams the response.
122+
123+
```js
124+
import { mdToReadableHtml } from "md4w";
125+
126+
const largeMarkdown = `# Hello, World!\n`.repeat(1_000_000);
127+
const readable = mdToReadableHtml(largeMarkdown);
128+
129+
// write to file
130+
const file = await Deno.open("/foo/bar.html", { write: true, create: true });
131+
readable.pipeTo(file.writable);
132+
133+
// or send to client
134+
const response = new Response(readable, {
135+
headers: { "Content-Type": "text/html" },
136+
});
137+
```
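
Besides piping the stream to a sink, it can also be consumed manually, one chunk at a time. The sketch below assumes the stream yields `Uint8Array` chunks of UTF-8 encoded HTML, which is an assumption rather than something stated above. To stay runnable without md4w, it uses a hypothetical stand-in, `fakeHtmlStream`, in place of the value returned by `mdToReadableHtml`; `collectHtml` is likewise a hypothetical helper.

```js
// Hypothetical stand-in for the stream returned by mdToReadableHtml(),
// assumed here to yield Uint8Array chunks of UTF-8 encoded HTML.
function fakeHtmlStream(chunks) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}

// Read a byte stream chunk by chunk and return the decoded text.
async function collectHtml(readable) {
  const reader = readable.getReader();
  const decoder = new TextDecoder();
  let html = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    html += decoder.decode(value, { stream: true });
  }
  // Flush any bytes still buffered in the decoder.
  return html + decoder.decode();
}

collectHtml(fakeHtmlStream(["<h1>Hello, ", "World!</h1>\n"])).then((html) => {
  console.log(html);
});
```

Note that a `ReadableStream` can only be consumed once, so the same stream cannot be both read manually and passed to a `Response`.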
138+
139+
### Buffer Size
140+
141+
By default, md4w uses a buffer size of `1KB` for streaming. You can change it with the `bufferSize` option:
142+
143+
```js
144+
mdToReadableHtml(largeMarkdown, {
145+
bufferSize: 16 * 1024
146+
});
147+
```
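
To give a feel for the trade-off, the sketch below splits a byte array into chunks of at most `bufferSize` bytes and counts them. It only illustrates the general idea that a larger buffer means fewer, larger chunks. `chunkBytes` is a hypothetical helper, and whether md4w emits chunks of exactly this size is an assumption.

```js
// Hypothetical helper (not part of md4w): split bytes into chunks of at most
// bufferSize bytes each.
function* chunkBytes(bytes, bufferSize) {
  if (!Number.isInteger(bufferSize) || bufferSize <= 0) {
    throw new RangeError("bufferSize must be a positive integer");
  }
  for (let offset = 0; offset < bytes.length; offset += bufferSize) {
    // subarray creates a view over the same memory, so nothing is copied.
    yield bytes.subarray(offset, offset + bufferSize);
  }
}

// Pretend this is 2500 bytes of generated HTML.
const output = new Uint8Array(2500);

const smallChunks = [...chunkBytes(output, 1024)].map((c) => c.length);
const largeChunks = [...chunkBytes(output, 16 * 1024)].map((c) => c.length);

console.log(smallChunks); // [ 1024, 1024, 452 ]
console.log(largeChunks); // [ 2500 ]
```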
148+
149+
### Caveats
150+
151+
The streaming API currently buffers only the HTML output; you still need to load the entire raw markdown input into memory.
152+
24153
## Development
25154

26-
The wasm binding layer is written in [Zig](https://ziglang.org/), ensure you
27-
have it installed. Also the [wasm-opt](https://github.com/WebAssembly/binaryen) is
155+
The parser is written in [Zig](https://ziglang.org/), so ensure you have it
156+
installed. [wasm-opt](https://github.com/WebAssembly/binaryen) is also
28157
required to optimize the generated WebAssembly binary.
29158

30159
```bash
@@ -33,8 +162,19 @@ zig build && deno test -A
33162

34163
## Benchmark
35164

36-
![](./test/benchmark-screenshot.png)
165+
![screenshot](./test/benchmark-screenshot.png)
37166

38167
```bash
39168
zig build && deno bench -A test/benchmark.js
40169
```
170+
171+
## Prior Art
172+
173+
- [md4c](https://github.com/mity/md4c) - C Markdown parser. Fast. SAX-like
174+
interface. Compliant to CommonMark specification.
175+
- [markdown-wasm](https://github.com/rsms/markdown-wasm) - Very fast Markdown
176+
parser and HTML generator implemented in WebAssembly, based on md4c.
177+
178+
## License
179+
180+
MIT
