Implement branch_point/branch/fail backtracking by stefanobaghino · Pull Request #614 · trishume/syntect

stefanobaghino · 2026-03-21T15:42:24Z

Closes #271.

Summary

Implements the branch_point, branch, and fail directives from the
Sublime Text syntax definition spec, enabling speculative parsing with
backtracking when a chosen branch turns out to be wrong.
Handles cross-line backtracking: when fail fires on a later line than
its branch_point, the parser re-parses intermediate lines under the
corrected alternative and surfaces the corrected ops via a new
ParseLineOutput.replayed field.
Breaking API change: ParseState::parse_line now returns
Result<ParseLineOutput, ParsingError> instead of a bare Vec.
Migration: append .ops to the old call site. Callers that care about
cross-line accuracy should also inspect .replayed.

Design decisions

Ops are always returned immediately (not withheld while a branch point
is active). Only raw line strings are buffered for potential replay.
This keeps the common-case caller contract unchanged.
The inner parsing loop is extracted into parse_line_inner so that
handle_fail can re-parse buffered lines without re-entrancy issues
with the outer parse_line bookkeeping.
Branch points older than 128 lines are expired, matching Sublime Text
behavior and bounding memory usage.

Known limitations

Callers that maintain a ScopeStack incrementally will have applied
the original (potentially wrong) ops for buffered lines. When
replayed is non-empty, they must rewind and re-apply. syntect's own
highlighter modules use .ops only and are unaffected.

Test plan

cargo test — all 142 unit tests + 13 doc-tests pass
New branch_cross_line_backtrack test verifies cross-line replay
Public API snapshot updated and passing

Add support for Sublime Text build 4050+ backtracking constructs: - branch_point/branch: saves a checkpoint and pushes the first alternative context, trying subsequent alternatives on fail - fail: rewinds parser state to the named branch point and tries the next alternative Branch points are invalidated when popped from the stack or after 128 lines, matching Sublime Text behavior.

When backtracking via fail, the next alternative was pushed without the branch pattern's with_prototype and without emitting meta_scope/ meta_content_scope ops. Store with_prototype in BranchPoint and emit the correct scope ops on fail-triggered context pushes.

When a `fail` fires on a different line from its `branch_point`, the highlighting ops for intermediate lines have already been returned to callers and cannot be revised. This change addresses that gap. The `parse_line` return type changes from a bare `Vec` to a new `ParseLineOutput` struct with two fields: - `ops`: ops for the current line (same as the old return value) - `replayed`: corrected ops for previously-buffered lines, populated only when a cross-line `fail` rewinds the parser state `ParseState` now stores the raw line strings for all lines since the most recent branch point (`pending_lines`). When `handle_fail` detects that the branch point was set on an earlier line, it re-parses those buffered lines using the rewound state and makes the corrected ops available via `flushed_ops`, which are drained into `replayed` on the next call to `parse_line`. Callers that do not need cross-line accuracy can migrate with a minimal change: `parse_line(...)?` becomes `parse_line(...)?.ops`. All internal call sites (highlighter, html, easy) are updated accordingly. A new test `branch_cross_line_backtrack` verifies that when `fail` fires on line 2, the `replayed` field on that call contains the corrected ops for line 1 (using the fallback branch, not the failed one).

The Julia syntax in bat uses branch_point/branch/fail, so our new backtracking support correctly changes its highlighting output. Regenerate the expected highlighted fixtures before comparing so the test validates self-consistency rather than diffing against a stale baseline committed in the bat repo.

stefanobaghino · 2026-03-21T16:57:27Z

I have a fix locally for the clippy issues.

The regression test failure seem benign, as Julia's syntax seem to have been fixed. 🙂 Having a look at that, happy to take input though, if you have any.

Use $PWD/target/release instead of relative paths in PATH so the bat binary remains reachable after scripts cd to subdirectories.

stefanobaghino · 2026-03-21T19:06:49Z

I believe 406273b and 591c638 should address the bat regression test failures (diff).

src/parsing/parser.rs

examples/syntest.rs

Address review feedback on PR trishume#614: restore the algorithmic comments explaining capture ordering and the empty-capture guard, which were inadvertently removed in a45c41e.

Address review feedback from trishume#614: when cross-line branch_point/fail backtracking occurs, the ops returned for lines before the fail resolves are speculative and may be wrong. Previously, syntest evaluated assertions eagerly against these speculative ops and never revisited them. Now syntest buffers parsed line state (text, ScopeStack snapshot, and evaluated assertions). When parse_line returns non-empty replayed ops, it resets the scope stack to before the first replayed line, recomputes scoped text from the corrected ops, and re-evaluates any assertions that targeted those lines, adjusting the failure count accordingly.

The previous commit (81bd452) correctly re-evaluated assertion *counts* on cross-line backtracking replay, but failure *messages* were still printed eagerly against potentially speculative scopes. This meant a user could see a failure message for scopes that would later be corrected by a `fail` action, even though the final pass/fail count was accurate. This commit buffers failure messages and only flushes them once `ParseState::is_speculative()` returns false (i.e. no active branch points), or unconditionally at EOF. On replay, pending messages whose `test_against_line_number` matches a replayed line are pruned and regenerated from the corrected scopes. Changes: - Add `ParseState::is_speculative()` public accessor (checks `branch_points.is_empty()`) and update the public API snapshot. - Introduce `BufferedFailureMessage` struct, `print_failure()`, and `flush_pending_messages()` helpers in syntest. - Store `assertion_line_number` on `BufferedAssertion` so replayed failure messages can report the original assertion source line. - Replace inline println with buffering; flush after parse_line when not speculative and at EOF. - Prune + regenerate pending messages in the replay block. Edge cases preserved: - No branch points: flush after every parse_line (identical to before). - Summary mode: flush still clears the buffer without printing. - Debug mode: debug output remains eager (informational, not assertions).

Cover five previously untested code paths: - Branch point expiry after 128 lines - Stack-depth invalidation making fail a no-op - Nested/overlapping branch points (inner fails, outer survives) - fail referencing a non-existent branch point name - Cross-line replay with multiple buffered lines

Add branch_point_still_valid_at_128_lines to verify the lower bound of the 128-line expiry window, preventing an off-by-one mutation from surviving. Add ops assertions to branch_cross_line_backtrack and branch_cross_line_multi_replay to ensure ops.clear() is exercised, catching stale try.* scopes leaking into current-line output.

- Strengthen branch_fail_after_partial_match: assert keyword.declaration is absent and string.unquoted starts at position 0 after backtrack - Redesign branch_stack_depth_invalidation so FAIL actually reaches handle_fail: test both equal-depth (backtrack succeeds) and shallower-depth (no-op) scenarios - Add branch_cross_line_fail_with_preceding_ops: fail line has content before FAIL so ops are non-empty and start > 0, verifying ops.clear() and start reset to 0

Strengthen 3 existing mutation-killing tests to actually detect their targeted mutations, and add formatting fixes from cargo fmt: - consuming_match_not_treated_as_loop: check push position (==0) instead of just counting occurrences, catching L533 > → < in find_best_match - v2_embed_scope_replaces_skips_meta_content_pop_on_exit: deepen stack to 5 levels (via wrapper contexts) so L915 - → / produces a different index than -2, and verify source scope survives on the scope stack to catch L912/L913 >= → < mutations - v2_set_clear_scopes_only_from_last_context: assert source.v2clear remains on the scope stack, catching L1029 && → || and == → != Remaining 6 MISSED mutations are all equivalent or impractical: - L608/L611 (search): depth-100 guard and cache internals - L709 (capture sort): constant offset in secondary sort key - L771 (handle_fail): guard for unreachable stack state

stefanobaghino · 2026-03-23T06:25:48Z

Thanks for the review, @keith-hall! Here's a summary of what I've done to address your comments:

Removed comments in exec_pattern — restored in e371486. The println!s were intentionally removed but the explanatory comments got caught in the crossfire. They're back now.

syntest should evaluate assertions only after the parser commits to a branch — fully agreed, and this turned out to be a meaningful body of work:

81bd452 handles replayed ops in syntest so that cross-line backtracking produces correct scopes (when a fail rewinds to a previous line, the ops from the speculative branch are discarded and replaced by the winning branch's ops).
f7d1fb6 defers assertion output until the parser commits, using the new is_speculative() API. While the parser is still exploring branches, syntest buffers any assertion failures instead of reporting them immediately. Once the branch is resolved (either committed or failed and retried), it evaluates assertions against the final scopes. This also serves as an example of how to get the committed scopes, as you suggested.

On the testing side, I also ran cargo mutants against the changed code in parser.rs and wrote targeted tests to kill surviving mutations across find_best_match, search, exec_pattern, push_meta_ops, and handle_fail (978b5b2, 252b4a4, a8953e3). The only remaining survivors are equivalent mutations (e.g. constant offsets in secondary sort keys, guards for unreachable states) or impractical ones (depth-100 stack limits, internal cache behavior).

keith-hall

LGTM (with the caveat that I haven't run the syntest example or anything to verify, but I imagine that will come next: as part of bumping the Sublime HQ Packages submodule and seeing what the state is).

I do strongly think it is worth adding a new API to the easy module in a future PR which would abstract away the complexities of dealing with the speculative states and just being able to send lines of text to the parser (in my mind it would handle the parse state itself, but I guess for caching purposes the caller may still want to know the state at each line), and perform a callback or something when a line has completed parsing and thus the scopes/operations are final.

stefanobaghino · 2026-03-23T07:43:27Z

LGTM (with the caveat that I haven't run the syntest example or anything to verify, but I imagine that will come next: as part of bumping the Sublime HQ Packages submodule and seeing what the state is).

I was planning to work on #222 and then bump the packages and identify the gaps, if that works for you.

keith-hall · 2026-03-23T07:44:48Z

Even better, thank you so much for your hard work reviving syntect btw!

stefanobaghino · 2026-03-23T07:48:17Z

Even better, thank you so much for your hard work reviving syntect btw!

I'm taking advantage of a sabbatical I had to take for personal reasons to address this mildly uncomfortable feeling of not seeing Java highlighted correctly in delta. 😂

michaelblyons · 2026-03-23T12:15:49Z

The 128 line limit is handled explicitly. Not sure about logging.

Surface 128-line branch point expiry as warnings in ParseLineOutput instead of silently discarding, following the same pattern used for syntax loading warnings. Rename the `ops` variable in the parsing benchmark to `output` for consistency with all other call sites.

stefanobaghino · 2026-03-23T13:09:36Z

On the 128-line limit: as @michaelblyons noted, it's already enforced — branch points are pruned at the start of every parse_line call and re-checked in handle_fail. There are tests for both the boundary (exactly 128 lines: still valid) and the expiry (129 lines: discarded). I've now also added structured warnings (following the same pattern as the syntax loading warnings) to ParseLineOutput so callers can see when a branch point expires. The existing tests now assert on the warning content too.

stefanobaghino · 2026-03-24T10:58:21Z

I tweaked the description to mention this closes #271 so that when it's merged the issue is closed automatically. Let me know if you have anything against it.

stefanobaghino added 4 commits March 21, 2026 15:36

Fix cargo fmt formatting issues

3d1429c

stefanobaghino mentioned this pull request Mar 21, 2026

sublime-syntax branch_point support #271

Open

stefanobaghino added 2 commits March 21, 2026 17:25

Fix cargo clippy warnings across codebase

278a2b0

stefanobaghino added 2 commits March 21, 2026 19:40

Fix cargo fmt formatting in error_handling test

17a9929

Fix bat CI job PATH resolution when scripts change directory

591c638

Use $PWD/target/release instead of relative paths in PATH so the bat binary remains reachable after scripts cd to subdirectories.

keith-hall reviewed Mar 21, 2026

View reviewed changes

src/parsing/parser.rs Show resolved Hide resolved

keith-hall reviewed Mar 21, 2026

View reviewed changes

examples/syntest.rs Outdated Show resolved Hide resolved

stefanobaghino added 7 commits March 21, 2026 23:14

Restore explanatory comments in exec_pattern

e371486

Address review feedback on PR trishume#614: restore the algorithmic comments explaining capture ordering and the empty-capture guard, which were inadvertently removed in a45c41e.

stefanobaghino requested a review from keith-hall March 23, 2026 07:24

keith-hall approved these changes Mar 23, 2026

View reviewed changes

This comment was marked as resolved.

Sign in to view

Fix cargo fmt formatting

6d5313e

stefanobaghino changed the title ~~Implement branch_point/branch/fail backtracking (issue #271)~~ Implement branch_point/branch/fail backtracking Mar 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement branch_point/branch/fail backtracking#614

Implement branch_point/branch/fail backtracking#614
stefanobaghino wants to merge 17 commits intotrishume:masterfrom
stefanobaghino:branch-point

stefanobaghino commented Mar 21, 2026 •

edited

Loading

Uh oh!

stefanobaghino commented Mar 21, 2026

Uh oh!

stefanobaghino commented Mar 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

keith-hall left a comment

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

keith-hall commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

This comment was marked as resolved.

michaelblyons commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

stefanobaghino commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design decisions

Known limitations

Test plan

Uh oh!

stefanobaghino commented Mar 21, 2026

Uh oh!

stefanobaghino commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

keith-hall left a comment

Choose a reason for hiding this comment

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

keith-hall commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

This comment was marked as resolved.

michaelblyons commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 23, 2026

Uh oh!

stefanobaghino commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

stefanobaghino commented Mar 21, 2026 •

edited

Loading

stefanobaghino commented Mar 21, 2026 •

edited

Loading