Description
The parser creates massive overlapping strong_emphasis ranges when bold text ending with a colon is followed by more bold text, either on the next line or in a list.
Minimal Reproduction
**What's missing:**
1. **Streaming the work as it happens**
2. **Visibility of what's running**
**Possible solutions:**
1. Push notifications
Expected Behavior
Each bold section should get its own separate strong_emphasis range:
- **What's missing:** → range of ~20 characters
- **Streaming the work as it happens** → range of ~40 characters
- **Visibility of what's running** → range of ~35 characters
- **Possible solutions:** → range of ~25 characters
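As a rough baseline for the expected ranges, here is a minimal Python sketch that finds each `**...**` span line by line with a regular expression. It is not how tree-sitter-markdown works and is not a CommonMark-compliant emphasis parser; the function name `expected_bold_ranges` and the `sample` string are illustrative assumptions, and offsets are counted in Python code points rather than whatever unit the Swift integration uses.

```python
import re

# Assumption: a per-line regex is only a baseline for simple cases like the
# reproduction above. It ignores nesting, escapes, and code spans.
BOLD = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")


def expected_bold_ranges(text):
    """Return (start, length, matched_text) for each **...** span, line by line."""
    ranges = []
    offset = 0  # offset of the current line within the whole text
    for line in text.splitlines(keepends=True):
        for match in BOLD.finditer(line):
            length = match.end() - match.start()
            ranges.append((offset + match.start(), length, match.group(0)))
        offset += len(line)
    return ranges


# The minimal reproduction from this report.
sample = (
    "**What's missing:**\n"
    "1. **Streaming the work as it happens**\n"
    "2. **Visibility of what's running**\n"
    "**Possible solutions:**\n"
    "1. Push notifications\n"
)

for start, length, snippet in expected_bold_ranges(sample):
    print(start, length, snippet)
```

Because the search never crosses a line break, each bold section yields exactly one short range, which is the shape the highlighter needs.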
Actual Behavior
The parser creates massive overlapping ranges:
- Range 1: 155 characters covering **What's missing:**\n1. **Streaming the work as it happens** (incorrect)
- Range 2: 90 characters overlapping range 1 (incorrect)
- Range 3: 36 characters, **Streaming the work as it happens** (correct)
- Range 4: 826 characters covering **Possible solutions:** and everything after it (incorrect)
- Range 5: 786 characters overlapping range 4 (incorrect)
- Range 6: 24 characters, **Possible solutions:** (correct)
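To confirm the overlap programmatically, a small check like the following Python sketch can be run over the captured ranges. The helper name `find_overlaps` is hypothetical, and the start offsets in the example are invented for illustration: the report only records lengths, so only 155, 90, and 36 come from the list above.

```python
def find_overlaps(ranges):
    """Return every pair of (start, length) ranges that share at least one character."""
    ordered = sorted(ranges)
    overlaps = []
    for i, (start_a, length_a) in enumerate(ordered):
        end_a = start_a + length_a
        for start_b, length_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later range can overlap this one
            overlaps.append(((start_a, length_a), (start_b, length_b)))
    return overlaps


# Hypothetical start offsets; the lengths match ranges 1-3 in the list above.
reported_like = [(0, 155), (0, 90), (23, 36)]

# Non-overlapping ranges, one per bold section (illustrative offsets).
well_formed = [(0, 19), (23, 36), (63, 32), (96, 23)]

print(find_overlaps(reported_like))
print(find_overlaps(well_formed))
```

A non-empty result for strong_emphasis captures is a quick signal that the parse tree contains the problematic interpretations described here.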
Analysis
The :** pattern at the end of bold text appears to confuse the parser's delimiter matching:
- Parser sees **What's missing:** but doesn't close it properly
- Continues reading through the newline and numbered list
- Encounters the next ** and creates multiple interpretations of where the bold region ends
- Results in overlapping strong_emphasis captures from different parse tree levels
Impact
This causes syntax highlighting to break:
- When multiple overlapping ranges are applied, the last (largest) range wins
- Large sections of text get incorrectly highlighted as bold
- Affects real-world markdown documents (we encountered this in production)
Pattern Details
The bug specifically occurs with:
- Bold text ending with punctuation (:)
- Followed by a newline
- Followed by another bold section (with or without list markers)
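The three conditions above can be approximated with a regular expression to scan documents for likely triggers. This is a hedged Python sketch: the pattern and the helper name `has_trigger` are assumptions made for illustration, and the pattern only covers the colon case described in this report with simple ordered or bullet list markers.

```python
import re

# Bold text ending in a colon, then a newline, then an optional list marker,
# then the start of another bold section.
TRIGGER = re.compile(
    r"\*\*[^*\n]*:\*\*[ \t]*\n"             # **...:** at the end of a line
    r"(?:[ \t]*(?:\d+[.)]|[-*+])[ \t]+)?"   # optional list marker such as "1. " or "- "
    r"\*\*"                                 # next bold section begins
)


def has_trigger(text):
    """Return True if the text contains the suspected trigger pattern."""
    return TRIGGER.search(text) is not None


print(has_trigger("**What's missing:**\n1. **Item one**"))
print(has_trigger("**What's missing**\n1. **Item one**"))
```

Running a check like this over a corpus could help estimate how often real documents hit the bug before the parser itself is fixed.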
Works correctly:
**What's missing**
1. **Item one**
Breaks:
**What's missing:**
1. **Item one**
Testing Details
Tested with SwiftTreeSitterLayer using:
- tree-sitter-markdown parser (split_parser branch)
- markdown + markdown_inline injection
- Query:
(strong_emphasis) @text.strong
Workaround
Currently filtering ranges that match both:
- Contains the :** pattern
- Length > 50 characters
This filters the incorrect large ranges while keeping legitimate bold text.
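The filter amounts to the following logic, shown here as a minimal Python sketch rather than the actual Swift code used with SwiftTreeSitterLayer. The names `keep_range`, `filter_ranges`, and `MAX_SUSPECT_LENGTH` are hypothetical, and ranges are assumed to be (start, length) pairs indexed in the same unit as the text.

```python
MAX_SUSPECT_LENGTH = 50  # threshold from the workaround described above


def keep_range(text, start, length):
    """Drop a range only if it both contains ':**' and is longer than the threshold."""
    snippet = text[start:start + length]
    return not (":**" in snippet and length > MAX_SUSPECT_LENGTH)


def filter_ranges(text, ranges):
    """Return the (start, length) ranges that pass the heuristic."""
    return [(start, length) for start, length in ranges if keep_range(text, start, length)]


text = "**What's missing:**\n1. **Streaming the work as it happens**\n"

# Illustrative captures: one oversized range plus the two legitimate ones.
captured = [(0, 59), (0, 19), (23, 36)]

print(filter_ranges(text, captured))
```

One limitation of this heuristic: a legitimate bold span that is longer than 50 characters and itself ends in a colon would also be dropped, so the threshold may need tuning per document set.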
Environment
- Parser: tree-sitter-markdown (tree-sitter-grammars organization)
- Branch: split_parser
- Language: markdown + markdown_inline with injection
- Integration: SwiftTreeSitterLayer
Related
This is a known limitation mentioned in the README:
"There are inaccuracies in the output stemming from restricting markdown's complex format to tree-sitter's parsing rules"
However, this specific pattern (:** in bold text) creates particularly problematic overlapping ranges that break syntax highlighting in practice.