Skip to content

Implement branch_point/branch/fail backtracking#614

Open
stefanobaghino wants to merge 17 commits intotrishume:masterfrom
stefanobaghino:branch-point
Open

Implement branch_point/branch/fail backtracking#614
stefanobaghino wants to merge 17 commits intotrishume:masterfrom
stefanobaghino:branch-point

Conversation

@stefanobaghino
Copy link
Contributor

@stefanobaghino stefanobaghino commented Mar 21, 2026

Closes #271.

Summary

  • Implements the branch_point, branch, and fail directives from the
    Sublime Text syntax definition spec, enabling speculative parsing with
    backtracking when a chosen branch turns out to be wrong.
  • Handles cross-line backtracking: when fail fires on a later line than
    its branch_point, the parser re-parses intermediate lines under the
    corrected alternative and surfaces the corrected ops via a new
    ParseLineOutput.replayed field.
  • Breaking API change: ParseState::parse_line now returns
    Result<ParseLineOutput, ParsingError> instead of a bare Vec.
    Migration: append .ops to the old call site. Callers that care about
    cross-line accuracy should also inspect .replayed.

Design decisions

  • Ops are always returned immediately (not withheld while a branch point
    is active). Only raw line strings are buffered for potential replay.
    This keeps the common-case caller contract unchanged.
  • The inner parsing loop is extracted into parse_line_inner so that
    handle_fail can re-parse buffered lines without re-entrancy issues
    with the outer parse_line bookkeeping.
  • Branch points older than 128 lines are expired, matching Sublime Text
    behavior and bounding memory usage.

Known limitations

  • Callers that maintain a ScopeStack incrementally will have applied
    the original (potentially wrong) ops for buffered lines. When
    replayed is non-empty, they must rewind and re-apply. syntect's own
    highlighter modules use .ops only and are unaffected.

Test plan

  • cargo test — all 142 unit tests + 13 doc-tests pass
  • New branch_cross_line_backtrack test verifies cross-line replay
  • Public API snapshot updated and passing

Add support for Sublime Text build 4050+ backtracking constructs:
- branch_point/branch: saves a checkpoint and pushes the first
  alternative context, trying subsequent alternatives on fail
- fail: rewinds parser state to the named branch point and tries
  the next alternative

Branch points are invalidated when popped from the stack or after
128 lines, matching Sublime Text behavior.
When backtracking via fail, the next alternative was pushed without
the branch pattern's with_prototype and without emitting meta_scope/
meta_content_scope ops. Store with_prototype in BranchPoint and emit
the correct scope ops on fail-triggered context pushes.
When a `fail` fires on a different line from its `branch_point`, the
highlighting ops for intermediate lines have already been returned to
callers and cannot be revised. This change addresses that gap.

The `parse_line` return type changes from a bare `Vec` to a new
`ParseLineOutput` struct with two fields:

- `ops`: ops for the current line (same as the old return value)
- `replayed`: corrected ops for previously-buffered lines, populated
  only when a cross-line `fail` rewinds the parser state

`ParseState` now stores the raw line strings for all lines since the
most recent branch point (`pending_lines`). When `handle_fail` detects
that the branch point was set on an earlier line, it re-parses those
buffered lines using the rewound state and makes the corrected ops
available via `flushed_ops`, which are drained into `replayed` on the
next call to `parse_line`.

Callers that do not need cross-line accuracy can migrate with a minimal
change: `parse_line(...)?` becomes `parse_line(...)?.ops`. All internal
call sites (highlighter, html, easy) are updated accordingly.

A new test `branch_cross_line_backtrack` verifies that when `fail`
fires on line 2, the `replayed` field on that call contains the
corrected ops for line 1 (using the fallback branch, not the failed one).
The Julia syntax in bat uses branch_point/branch/fail, so our new
backtracking support correctly changes its highlighting output.
Regenerate the expected highlighted fixtures before comparing so the
test validates self-consistency rather than diffing against a stale
baseline committed in the bat repo.
@stefanobaghino
Copy link
Contributor Author

I have a fix locally for the clippy issues.

The regression test failure seem benign, as Julia's syntax seem to have been fixed. 🙂 Having a look at that, happy to take input though, if you have any.

Use $PWD/target/release instead of relative paths in PATH so the bat
binary remains reachable after scripts cd to subdirectories.
@stefanobaghino
Copy link
Contributor Author

stefanobaghino commented Mar 21, 2026

I believe 406273b and 591c638 should address the bat regression test failures (diff).

Address review feedback on PR trishume#614: restore the algorithmic
comments explaining capture ordering and the empty-capture guard,
which were inadvertently removed in a45c41e.
Address review feedback from trishume#614: when cross-line
branch_point/fail backtracking occurs, the ops returned for lines
before the fail resolves are speculative and may be wrong. Previously,
syntest evaluated assertions eagerly against these speculative ops and
never revisited them.

Now syntest buffers parsed line state (text, ScopeStack snapshot, and
evaluated assertions). When parse_line returns non-empty replayed ops,
it resets the scope stack to before the first replayed line, recomputes
scoped text from the corrected ops, and re-evaluates any assertions
that targeted those lines, adjusting the failure count accordingly.
The previous commit (81bd452) correctly re-evaluated assertion *counts*
on cross-line backtracking replay, but failure *messages* were still
printed eagerly against potentially speculative scopes. This meant a
user could see a failure message for scopes that would later be
corrected by a `fail` action, even though the final pass/fail count
was accurate.

This commit buffers failure messages and only flushes them once
`ParseState::is_speculative()` returns false (i.e. no active branch
points), or unconditionally at EOF. On replay, pending messages whose
`test_against_line_number` matches a replayed line are pruned and
regenerated from the corrected scopes.

Changes:
- Add `ParseState::is_speculative()` public accessor (checks
  `branch_points.is_empty()`) and update the public API snapshot.
- Introduce `BufferedFailureMessage` struct, `print_failure()`, and
  `flush_pending_messages()` helpers in syntest.
- Store `assertion_line_number` on `BufferedAssertion` so replayed
  failure messages can report the original assertion source line.
- Replace inline println with buffering; flush after parse_line when
  not speculative and at EOF.
- Prune + regenerate pending messages in the replay block.

Edge cases preserved:
- No branch points: flush after every parse_line (identical to before).
- Summary mode: flush still clears the buffer without printing.
- Debug mode: debug output remains eager (informational, not assertions).
Cover five previously untested code paths:
- Branch point expiry after 128 lines
- Stack-depth invalidation making fail a no-op
- Nested/overlapping branch points (inner fails, outer survives)
- fail referencing a non-existent branch point name
- Cross-line replay with multiple buffered lines
Add branch_point_still_valid_at_128_lines to verify the lower bound of
the 128-line expiry window, preventing an off-by-one mutation from
surviving. Add ops assertions to branch_cross_line_backtrack and
branch_cross_line_multi_replay to ensure ops.clear() is exercised,
catching stale try.* scopes leaking into current-line output.
- Strengthen branch_fail_after_partial_match: assert keyword.declaration
  is absent and string.unquoted starts at position 0 after backtrack
- Redesign branch_stack_depth_invalidation so FAIL actually reaches
  handle_fail: test both equal-depth (backtrack succeeds) and
  shallower-depth (no-op) scenarios
- Add branch_cross_line_fail_with_preceding_ops: fail line has content
  before FAIL so ops are non-empty and start > 0, verifying ops.clear()
  and start reset to 0
Strengthen 3 existing mutation-killing tests to actually detect their
targeted mutations, and add formatting fixes from cargo fmt:

- consuming_match_not_treated_as_loop: check push position (==0) instead
  of just counting occurrences, catching L533 > → < in find_best_match
- v2_embed_scope_replaces_skips_meta_content_pop_on_exit: deepen stack
  to 5 levels (via wrapper contexts) so L915 - → / produces a different
  index than -2, and verify source scope survives on the scope stack
  to catch L912/L913 >= → < mutations
- v2_set_clear_scopes_only_from_last_context: assert source.v2clear
  remains on the scope stack, catching L1029 && → || and == → !=

Remaining 6 MISSED mutations are all equivalent or impractical:
- L608/L611 (search): depth-100 guard and cache internals
- L709 (capture sort): constant offset in secondary sort key
- L771 (handle_fail): guard for unreachable stack state
@stefanobaghino
Copy link
Contributor Author

Thanks for the review, @keith-hall! Here's a summary of what I've done to address your comments:

Removed comments in exec_pattern — restored in e371486. The println!s were intentionally removed but the explanatory comments got caught in the crossfire. They're back now.

syntest should evaluate assertions only after the parser commits to a branch — fully agreed, and this turned out to be a meaningful body of work:

  • 81bd452 handles replayed ops in syntest so that cross-line backtracking produces correct scopes (when a fail rewinds to a previous line, the ops from the speculative branch are discarded and replaced by the winning branch's ops).
  • f7d1fb6 defers assertion output until the parser commits, using the new is_speculative() API. While the parser is still exploring branches, syntest buffers any assertion failures instead of reporting them immediately. Once the branch is resolved (either committed or failed and retried), it evaluates assertions against the final scopes. This also serves as an example of how to get the committed scopes, as you suggested.

On the testing side, I also ran cargo mutants against the changed code in parser.rs and wrote targeted tests to kill surviving mutations across find_best_match, search, exec_pattern, push_meta_ops, and handle_fail (978b5b2, 252b4a4, a8953e3). The only remaining survivors are equivalent mutations (e.g. constant offsets in secondary sort keys, guards for unreachable states) or impractical ones (depth-100 stack limits, internal cache behavior).

Copy link
Collaborator

@keith-hall keith-hall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (with the caveat that I haven't run the syntest example or anything to verify, but I imagine that will come next: as part of bumping the Sublime HQ Packages submodule and seeing what the state is).

I do strongly think it is worth adding a new API to the easy module in a future PR which would abstract away the complexities of dealing with the speculative states and just being able to send lines of text to the parser (in my mind it would handle the parse state itself, but I guess for caching purposes the caller may still want to know the state at each line), and perform a callback or something when a line has completed parsing and thus the scopes/operations are final.

@stefanobaghino
Copy link
Contributor Author

LGTM (with the caveat that I haven't run the syntest example or anything to verify, but I imagine that will come next: as part of bumping the Sublime HQ Packages submodule and seeing what the state is).

I was planning to work on #222 and then bump the packages and identify the gaps, if that works for you.

@keith-hall
Copy link
Collaborator

Even better, thank you so much for your hard work reviving syntect btw!

@stefanobaghino
Copy link
Contributor Author

Even better, thank you so much for your hard work reviving syntect btw!

I'm taking advantage of a sabbatical I had to take for personal reasons to address this mildly uncomfortable feeling of not seeing Java highlighted correctly in delta. 😂

@jrappen

This comment was marked as resolved.

@michaelblyons
Copy link

The 128 line limit is handled explicitly. Not sure about logging.

Surface 128-line branch point expiry as warnings in ParseLineOutput
instead of silently discarding, following the same pattern used for
syntax loading warnings. Rename the `ops` variable in the parsing
benchmark to `output` for consistency with all other call sites.
@stefanobaghino
Copy link
Contributor Author

On the 128-line limit: as @michaelblyons noted, it's already enforced — branch points are pruned at the start of every parse_line call and re-checked in handle_fail. There are tests for both the boundary (exactly 128 lines: still valid) and the expiry (129 lines: discarded). I've now also added structured warnings (following the same pattern as the syntax loading warnings) to ParseLineOutput so callers can see when a branch point expires. The existing tests now assert on the warning content too.

@stefanobaghino stefanobaghino changed the title Implement branch_point/branch/fail backtracking (issue #271) Implement branch_point/branch/fail backtracking Mar 24, 2026
@stefanobaghino
Copy link
Contributor Author

I tweaked the description to mention this closes #271 so that when it's merged the issue is closed automatically. Let me know if you have anything against it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sublime-syntax branch_point support

4 participants