
bug: Bold text ending with colon creates massive overlapping ranges #209

@junhongwang418

Description

The parser creates massive, overlapping strong_emphasis ranges when bold text ends with a colon and is followed, on the next line or in a list, by more bold text.

Minimal Reproduction

**What's missing:**
1. **Streaming the work as it happens**
2. **Visibility of what's running**

**Possible solutions:**
1. Push notifications

Expected Behavior

Each bold section should get its own strong_emphasis range:

  • **What's missing:** → range of ~20 characters
  • **Streaming the work as it happens** → range of ~40 characters
  • **Visibility of what's running** → range of ~35 characters
  • **Possible solutions:** → range of ~25 characters
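To make "separate ranges" concrete, here is a minimal Python sketch that lists the spans a correct parse should yield for the reproduction above. It assumes the snippet uses \n line endings and relies on a deliberately naive regex (one bold span per ** pair, never crossing a line break). This is not how tree-sitter-markdown parses emphasis; it only writes down the expected result for this particular input.

```python
import re

# The minimal reproduction from this issue, with "\n" line endings (assumed).
SOURCE = (
    "**What's missing:**\n"
    "1. **Streaming the work as it happens**\n"
    "2. **Visibility of what's running**\n"
    "\n"
    "**Possible solutions:**\n"
    "1. Push notifications\n"
)

# Naive pattern: "**", one or more characters that are neither "*" nor a
# newline, then "**". Good enough for this input; not a CommonMark parser.
STRONG = re.compile(r"\*\*[^*\n]+\*\*")


def expected_strong_ranges(text: str) -> list[tuple[int, int]]:
    """Half-open (start, end) string offsets of each bold span."""
    return [(m.start(), m.end()) for m in STRONG.finditer(text)]


for start, end in expected_strong_ranges(SOURCE):
    # Print each span's offsets, length and text; the spans never overlap.
    print(start, end, end - start, repr(SOURCE[start:end]))
```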

Actual Behavior

Instead, the parser produces six strong_emphasis ranges, four of which are incorrect and overlap other ranges:

  • Range 1: 155 characters covering **What's missing:**\n1. **Streaming the work as it happens** (incorrect)
  • Range 2: 90 characters overlapping range 1 (incorrect)
  • Range 3: 36 characters **Streaming the work as it happens** (correct)
  • Range 4: 826 characters covering **Possible solutions:** and everything after it (incorrect)
  • Range 5: 786 characters overlapping range 4 (incorrect)
  • Range 6: 24 characters **Possible solutions:** (correct)
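A quick way to check a set of captures for this symptom is to test every pair for overlap; a correct parse of the reproduction should yield no overlapping pairs. The Python sketch below assumes captures have already been converted to half-open (start, end) offsets (tree-sitter itself reports byte offsets, so that conversion is left to the caller). The offsets in the usage lines are hypothetical, since the issue only records range lengths.

```python
from itertools import combinations

Range = tuple[int, int]  # half-open (start, end) offsets


def overlaps(a: Range, b: Range) -> bool:
    """True when two half-open ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(ranges: list[Range]) -> list[tuple[Range, Range]]:
    """Every pair of captures that overlap, including one containing another."""
    return [(a, b) for a, b in combinations(ranges, 2) if overlaps(a, b)]


# Hypothetical offsets: a correct 19-character capture, an oversized capture
# that swallows it, and a second correct capture on the next line.
captures = [(0, 19), (0, 155), (23, 59)]
print(overlapping_pairs(captures))
# [((0, 19), (0, 155)), ((0, 155), (23, 59))]
```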

Analysis

The :** sequence that closes the bold text appears to confuse the parser's delimiter matching. A plausible sequence of events:

  1. The parser reads **What's missing:** but does not close the bold span at the final **
  2. It continues reading past the newline into the numbered list
  3. It reaches the next ** and produces several competing interpretations of where the bold region ends
  4. The result is overlapping strong_emphasis captures from different levels of the parse tree
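For reference, CommonMark decides whether a ** run can close emphasis using its left-/right-flanking rules, and under those rules the colon should not prevent closing here. The simplified Python sketch below is not the grammar's own scanner: it classifies a delimiter run within a single line, treats the line start and end as whitespace, and approximates punctuation as ASCII punctuation plus Unicode P* categories. For *, a run can open emphasis if it is left-flanking and close it if it is right-flanking.

```python
import string
import unicodedata
from typing import Optional, Tuple


def _is_whitespace(ch: Optional[str]) -> bool:
    # CommonMark treats the start and end of a line as whitespace.
    return ch is None or ch.isspace()


def _is_punctuation(ch: Optional[str]) -> bool:
    # Approximation of CommonMark's punctuation classes.
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def flanking(line: str, start: int, end: int) -> Tuple[bool, bool]:
    """Classify the delimiter run line[start:end] as (left_flanking, right_flanking)."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    left = not _is_whitespace(after) and (
        not _is_punctuation(after) or _is_whitespace(before) or _is_punctuation(before)
    )
    right = not _is_whitespace(before) and (
        not _is_punctuation(before) or _is_whitespace(after) or _is_punctuation(after)
    )
    return left, right


line = "**What's missing:**"
print(flanking(line, 0, 2))                      # (True, False): can open
print(flanking(line, len(line) - 2, len(line)))  # (False, True): can close
```

By the same rules, a colon would only stop ** from closing if the run were immediately followed by a letter, as in **a:**b, which is not the case in the reproduction.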

Impact

This causes syntax highlighting to break:

  • When multiple overlapping ranges are applied, the last (largest) range wins
  • Large sections of text get incorrectly highlighted as bold
  • Affects real-world markdown documents (we encountered this in production)

Pattern Details

The bug specifically occurs with:

  • Bold text ending with punctuation (:)
  • Followed by a newline
  • Followed by another bold section (with or without list markers)

Works correctly:

**What's missing**
1. **Item one**

Breaks:

**What's missing:**
1. **Item one**

Testing Details

Tested with SwiftTreeSitterLayer using:

  • tree-sitter-markdown parser (split_parser branch)
  • markdown + markdown_inline injection
  • Query: (strong_emphasis) @text.strong

Workaround

As a workaround, we currently discard any range that meets both conditions:

  • Its text contains the :** pattern
  • It is longer than 50 characters

This filters the incorrect large ranges while keeping legitimate bold text.
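The workaround can be written as a small filter. This Python sketch is one reading of the two conditions, assuming each capture is a half-open (start, end) pair of string offsets into the document; the 50-character threshold comes from the workaround, while the max_len parameter name is only illustrative. One trade-off of this heuristic: a genuine bold span longer than 50 characters that ends with a colon also contains :** and would be dropped.

```python
Range = tuple[int, int]  # half-open (start, end) offsets into the document text


def filter_suspect_ranges(text: str, ranges: list[Range], max_len: int = 50) -> list[Range]:
    """Drop captures whose text contains ':**' and that are longer than max_len."""
    return [
        (start, end)
        for start, end in ranges
        if not (":**" in text[start:end] and end - start > max_len)
    ]


doc = "**What's missing:**\n1. **Streaming the work as it happens**\n"
# (0, 19) and (23, 59) are the correct bold spans; (0, len(doc)) mimics an
# oversized capture that runs across the newline.
captures = [(0, 19), (0, len(doc)), (23, 59)]
print(filter_suspect_ranges(doc, captures))  # [(0, 19), (23, 59)]
```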

Environment

  • Parser: tree-sitter-markdown (tree-sitter-grammars organization)
  • Branch: split_parser
  • Language: markdown + markdown_inline with injection
  • Integration: SwiftTreeSitterLayer

Related

The README already acknowledges this class of problem:

"There are inaccuracies in the output stemming from restricting markdown's complex format to tree-sitter's parsing rules"

However, this specific pattern (:** in bold text) creates particularly problematic overlapping ranges that break syntax highlighting in practice.
