|
1 | 1 | # tree-sitter-markdown |
2 | | -A markdown parser for tree-sitter |
| 2 | +A markdown parser for [tree-sitter]. |
3 | 3 |
|
4 | | -For now this implements the [CommonMark Spec](https://spec.commonmark.org/). Maybe it will be extended to support [Github flavored markdown](https://github.github.com/gfm/) |
| 4 | +The parser is designed to read markdown according to the [CommonMark Spec], |
| 5 | +but some extensions to the spec from different sources such as [Github flavored |
| 6 | +markdown] are also included. These can be toggled on or off at compile time. |
| 7 | +For specifics, see [Extensions](#extensions). |
5 | 8 |
|
6 | | -## Structure |
| 9 | +## Goals |
7 | 10 |
|
8 | | -The parser is split into two grammars. One for the [block structure](https://spec.commonmark.org/0.30/#blocks-and-inlines) which can be found in `/tree-sitter-markdown` and one for the [inline structure](https://spec.commonmark.org/0.30/#inlines) which is in `/tree-sitter-markdown-inline`. |
9 | | -Because of this the entire document has to be scanned twice in order to be fully parsed. |
10 | | -This is motivated by the [parsing strategy section](https://spec.commonmark.org/0.30/#appendix-a-parsing-strategy) of the CommonMark Spec which suggests doing exactly this: Parsing the document twice, first determining the block structure and then parsing any inline content. |
| 11 | +Even though this parser has existed for some time and the obvious issues are |
| 12 | +mostly solved, there are still many inaccuracies in the output. These stem |
| 13 | +from fitting a complex format such as markdown into the quite restrictive |
| 14 | +tree-sitter parsing rules. |
11 | 15 |
|
12 | | -It also helps managing complexity, which was a problem with earlier versions of this parser, by allowing block and inline structure to be considered separately. This was not the case as tree-sitter's dynamic precedence can create hard-to-predict effects. |
| 16 | +As such, it is not recommended to use this parser where correctness is |
| 17 | +important. The main goal of this parser is to provide syntactic information |
| 18 | +for syntax highlighting in editors such as [neovim] and [helix]. |
13 | 19 |
|
14 | | -## Usage |
| 20 | +## Contributing |
15 | 21 |
|
16 | | -To use the two grammars, first parse the document with the block grammar. Then perform a second parse with the inline grammar using `ts_parser_set_included_ranges` to specify which parts are inline content. These parts are marked as `inline` nodes. Children of those inline nodes should be excluded from these ranges. For an example implementation see `lib.rs` in the `bindings` folder. |
| 22 | +All contributions are welcome. For details refer to [CONTRIBUTING.md]. |
| 23 | + |
| 24 | +## Extensions |
| 25 | + |
| 26 | +Extensions can be enabled at compile time through environment variables. Some |
| 27 | +of them are on by default; these can be disabled with the environment variable |
| 28 | +`NO_DEFAULT_EXTENSIONS`. |
| 29 | + |
| 30 | +| Name | Environment variable | Specification | Default | Also enables | |
| 31 | +|:----:|:--------------------:|:-------------:|:-------:|:------------:| |
| 32 | +| Github flavored markdown | `EXTENSION_GFM` | [link](https://github.github.com/gfm/) | ✓ | Task lists, strikethrough, pipe tables | |
| 33 | +| Task lists | `EXTENSION_TASK_LIST` | [link](https://github.github.com/gfm/#task-list-items-extension-) | ✓ | | |
| 34 | +| Strikethrough | `EXTENSION_STRIKETHROUGH` | [link](https://github.github.com/gfm/#strikethrough-extension-) | ✓ | | |
| 35 | +| Pipe tables | `EXTENSION_PIPE_TABLE` | [link](https://github.github.com/gfm/#tables-extension-) | ✓ | | |
| 36 | +| YAML metadata | `EXTENSION_MINUS_METADATA` | [link](https://gohugo.io/content-management/front-matter/) | ✓ | | |
| 37 | +| TOML metadata | `EXTENSION_PLUS_METADATA` | [link](https://gohugo.io/content-management/front-matter/) | ✓ | | |
| 38 | +| Tags | `EXTENSION_TAGS` | [link](https://help.obsidian.md/Editing+and+formatting/Tags#Tag+format) | | | |
| 39 | + |
| 40 | +## Usage in Editors |
| 41 | + |
| 42 | +For guides on how to use this parser in a specific editor, refer to that |
| 43 | +editor's documentation, e.g. |
| 44 | +* [neovim](https://github.com/nvim-treesitter/nvim-treesitter) |
| 45 | +* [helix](https://docs.helix-editor.com/guides/adding_languages.html) |
| 46 | + |
| 47 | +## Standalone usage |
| 48 | + |
| 49 | +To use the two grammars, first parse the document with the block |
| 50 | +grammar. Then perform a second parse with the inline grammar using |
| 51 | +`ts_parser_set_included_ranges` to specify which parts are inline content. |
| 52 | +These parts are marked as `inline` nodes. Children of those inline nodes should |
| 53 | +be excluded from these ranges. For an example implementation see `lib.rs` in |
| 54 | +the `bindings` folder. |
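
The range computation described in this paragraph can be sketched as follows. This is a minimal illustration, not the code from `lib.rs`: `Span` and `included_spans` are hypothetical names, and `Span` holds byte offsets only, whereas the ranges passed to `ts_parser_set_included_ranges` also carry row and column points. It assumes the children of an `inline` node are given in document order and do not overlap.

```rust
/// Hypothetical stand-in for a byte range in the source text.
/// (An assumption for this sketch; tree-sitter ranges also hold points.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Returns the parts of `inline` that are not covered by any child.
/// These are the spans the inline grammar should be allowed to read.
///
/// Assumes `children` are sorted by `start` and do not overlap.
fn included_spans(inline: Span, children: &[Span]) -> Vec<Span> {
    let mut spans = Vec::new();
    // `cursor` marks the first byte not yet assigned to a span or a child.
    let mut cursor = inline.start;
    for child in children {
        if child.start > cursor {
            // Text between the previous child (or the start) and this child.
            spans.push(Span { start: cursor, end: child.start });
        }
        // Skip over the child itself.
        cursor = cursor.max(child.end);
    }
    if cursor < inline.end {
        // Text after the last child.
        spans.push(Span { start: cursor, end: inline.end });
    }
    spans
}
```

For example, an `inline` node covering bytes 2 to 11 with a single child covering bytes 6 to 8 yields the two spans 2 to 6 and 8 to 11. In a real integration, the spans collected from all `inline` nodes would be converted to tree-sitter ranges and handed to the parser for the second pass.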
| 55 | + |
| 56 | +[CommonMark Spec]: https://spec.commonmark.org/ |
| 57 | +[Github flavored markdown]: https://github.github.com/gfm/ |
| 58 | +[tree-sitter]: https://tree-sitter.github.io/tree-sitter/ |
| 59 | +[neovim]: https://neovim.io/ |
| 60 | +[helix]: https://helix-editor.com/ |
| 61 | +[CONTRIBUTING.md]: https://github.com/MDeiml/tree-sitter-markdown/blob/split_parser/CONTRIBUTING.md |