Commit 3a392ff

Update readme
1 parent 7e7aa9a commit 3a392ff

1 file changed

+54
-9
lines changed

README.md

Lines changed: 54 additions & 9 deletions
@@ -1,16 +1,61 @@
11
# tree-sitter-markdown
2-
A markdown parser for tree-sitter
2+
A markdown parser for [tree-sitter].
33

4-
For now this implements the [CommonMark Spec](https://spec.commonmark.org/). Maybe it will be extended to support [Github flavored markdown](https://github.github.com/gfm/)
4+
The parser is designed to read markdown according to the [CommonMark Spec],
5+
but some extensions to the spec from different sources such as [Github flavored
6+
markdown] are also included. These can be toggled on or off at compile time.
7+
For specifics see [Extensions](#extensions).
58

6-
## Structure
9+
## Goals
710

8-
The parser is split into two grammars. One for the [block structure](https://spec.commonmark.org/0.30/#blocks-and-inlines) which can be found in `/tree-sitter-markdown` and one for the [inline structure](https://spec.commonmark.org/0.30/#inlines) which is in `/tree-sitter-markdown-inline`.
9-
Because of this the entire document has to be scanned twice in order to be fully parsed.
10-
This is motivated by the [parsing strategy section](https://spec.commonmark.org/0.30/#appendix-a-parsing-strategy) of the CommonMark Spec which suggests doing exactly this: Parsing the document twice, first determining the block structure and then parsing any inline content.
11+
Even though this parser has existed for some time and obvious issues are
12+
mostly solved, there are still lots of inaccuracies in the output. These stem
13+
from fitting a complex format such as markdown into the quite restrictive
14+
tree-sitter parsing rules.
1115

12-
It also helps manage complexity, which was a problem with earlier versions of this parser, by allowing block and inline structure to be considered separately. This was previously not the case, as tree-sitter's dynamic precedence can create hard-to-predict effects.
16+
As such it is not recommended to use this parser where correctness is
17+
important. The main goal for this parser is to provide syntactical information
18+
for syntax highlighting in editors such as [neovim] and [helix].
1319

14-
## Usage
20+
## Contributing
1521

16-
To use the two grammars, first parse the document with the block grammar. Then perform a second parse with the inline grammar using `ts_parser_set_included_ranges` to specify which parts are inline content. These parts are marked as `inline` nodes. Children of those inline nodes should be excluded from these ranges. For an example implementation see `lib.rs` in the `bindings` folder.
22+
All contributions are welcome. For details refer to [CONTRIBUTING.md].
23+
24+
## Extensions
25+
26+
Extensions can be enabled at compile time through environment variables. Some
27+
of them are on by default; these can be disabled with the environment variable
28+
`NO_DEFAULT_EXTENSIONS`.
29+
30+
| Name | Environment variable | Specification | Default | Also enables |
31+
|:----:|:--------------------:|:-------------:|:-------:|:------------:|
32+
| Github flavored markdown | `EXTENSION_GFM` | [link](https://github.github.com/gfm/) | ✓ | Task lists, strikethrough, pipe tables |
33+
| Task lists | `EXTENSION_TASK_LIST` | [link](https://github.github.com/gfm/#task-list-items-extension-) | ✓ | |
34+
| Strikethrough | `EXTENSION_STRIKETHROUGH` | [link](https://github.github.com/gfm/#strikethrough-extension-) | ✓ | |
35+
| Pipe tables | `EXTENSION_PIPE_TABLE` | [link](https://github.github.com/gfm/#tables-extension-) | ✓ | |
36+
| YAML metadata | `EXTENSION_MINUS_METADATA` | [link](https://gohugo.io/content-management/front-matter/) | ✓ | |
37+
| TOML metadata | `EXTENSION_PLUS_METADATA` | [link](https://gohugo.io/content-management/front-matter/) | ✓ | |
38+
| Tags | `EXTENSION_TAGS` | [link](https://help.obsidian.md/Editing+and+formatting/Tags#Tag+format) | | |
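
The rules described in this section can be modelled as a small function. The sketch below is only an illustration, not the build logic of this parser: `resolve_extensions` and `IMPLIED` are hypothetical names, the set of default extensions is passed in by the caller rather than hard-coded, and treating any non-empty value as "set" is an assumption.

```python
from typing import Iterable, Mapping, Set

# "Also enables" column of the table: GFM switches on three other extensions.
IMPLIED = {
    "EXTENSION_GFM": {
        "EXTENSION_TASK_LIST",
        "EXTENSION_STRIKETHROUGH",
        "EXTENSION_PIPE_TABLE",
    },
}


def resolve_extensions(env: Mapping[str, str], defaults: Iterable[str]) -> Set[str]:
    """Return the names of the extensions that end up enabled.

    Assumption: a variable counts as set when it has a non-empty value.
    """
    enabled: Set[str] = set()
    # Defaults apply unless NO_DEFAULT_EXTENSIONS is set.
    if not env.get("NO_DEFAULT_EXTENSIONS"):
        enabled.update(defaults)
    # Explicitly requested extensions are always added.
    enabled.update(
        name for name, value in env.items()
        if name.startswith("EXTENSION_") and value
    )
    # Expand extensions that enable other extensions.
    for name in list(enabled):
        enabled.update(IMPLIED.get(name, ()))
    return enabled
```

Under these assumptions, setting both `NO_DEFAULT_EXTENSIONS` and `EXTENSION_GFM` yields GFM plus task lists, strikethrough and pipe tables, and nothing else.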
39+
40+
## Usage in Editors
41+
42+
For guides on how to use this parser in a specific editor, refer to that
43+
editor's specific documentation, e.g.
44+
* [neovim](https://github.com/nvim-treesitter/nvim-treesitter)
45+
* [helix](https://docs.helix-editor.com/guides/adding_languages.html)
46+
47+
## Standalone usage
48+
49+
To use the two grammars, first parse the document with the block
50+
grammar. Then perform a second parse with the inline grammar using
51+
`ts_parser_set_included_ranges` to specify which parts are inline content.
52+
These parts are marked as `inline` nodes. Children of those inline nodes should
53+
be excluded from these ranges. For an example implementation see `lib.rs` in
54+
the `bindings` folder.
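
The range computation for the second parse can be sketched as follows. This is a minimal illustration, not the code from `lib.rs`: `inline_ranges` is a hypothetical helper, and it works on plain byte offsets only, whereas the ranges passed to `ts_parser_set_included_ranges` also carry row and column positions.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def inline_ranges(inline: Range, children: List[Range]) -> List[Range]:
    """Return the parts of an `inline` node not covered by its children."""
    start, end = inline
    result: List[Range] = []
    cursor = start
    for child_start, child_end in sorted(children):
        # Clamp the child to the inline node so stray ranges cannot leak out.
        child_start = max(child_start, start)
        child_end = min(child_end, end)
        if child_start >= child_end:
            continue
        # Keep the gap between the previous child and this one.
        if child_start > cursor:
            result.append((cursor, child_start))
        cursor = max(cursor, child_end)
    # Keep whatever follows the last child.
    if cursor < end:
        result.append((cursor, end))
    return result
```

For an `inline` node spanning bytes 10 to 50 with children at 20 to 25 and 40 to 42, the included ranges would be 10 to 20, 25 to 40 and 42 to 50. Collecting these ranges over all `inline` nodes of the block tree gives the list to hand to the inline parser.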
55+
56+
[CommonMark Spec]: https://spec.commonmark.org/
57+
[Github flavored markdown]: https://github.github.com/gfm/
58+
[tree-sitter]: https://tree-sitter.github.io/tree-sitter/
59+
[neovim]: https://neovim.io/
60+
[helix]: https://helix-editor.com/
61+
[CONTRIBUTING.md]: https://github.com/MDeiml/tree-sitter-markdown/blob/split_parser/CONTRIBUTING.md
