Implement branch_point/branch/fail backtracking#614
Implement branch_point/branch/fail backtracking#614stefanobaghino wants to merge 17 commits intotrishume:masterfrom
Conversation
Add support for Sublime Text build 4050+ backtracking constructs: - branch_point/branch: saves a checkpoint and pushes the first alternative context, trying subsequent alternatives on fail - fail: rewinds parser state to the named branch point and tries the next alternative Branch points are invalidated when popped from the stack or after 128 lines, matching Sublime Text behavior.
When backtracking via fail, the next alternative was pushed without the branch pattern's with_prototype and without emitting meta_scope/ meta_content_scope ops. Store with_prototype in BranchPoint and emit the correct scope ops on fail-triggered context pushes.
When a `fail` fires on a different line from its `branch_point`, the highlighting ops for intermediate lines have already been returned to callers and cannot be revised. This change addresses that gap. The `parse_line` return type changes from a bare `Vec` to a new `ParseLineOutput` struct with two fields: - `ops`: ops for the current line (same as the old return value) - `replayed`: corrected ops for previously-buffered lines, populated only when a cross-line `fail` rewinds the parser state `ParseState` now stores the raw line strings for all lines since the most recent branch point (`pending_lines`). When `handle_fail` detects that the branch point was set on an earlier line, it re-parses those buffered lines using the rewound state and makes the corrected ops available via `flushed_ops`, which are drained into `replayed` on the next call to `parse_line`. Callers that do not need cross-line accuracy can migrate with a minimal change: `parse_line(...)?` becomes `parse_line(...)?.ops`. All internal call sites (highlighter, html, easy) are updated accordingly. A new test `branch_cross_line_backtrack` verifies that when `fail` fires on line 2, the `replayed` field on that call contains the corrected ops for line 1 (using the fallback branch, not the failed one).
The Julia syntax in bat uses branch_point/branch/fail, so our new backtracking support correctly changes its highlighting output. Regenerate the expected highlighted fixtures before comparing so the test validates self-consistency rather than diffing against a stale baseline committed in the bat repo.
|
I have a fix locally for the clippy issues. The regression test failure seem benign, as Julia's syntax seem to have been fixed. 🙂 Having a look at that, happy to take input though, if you have any. |
Use $PWD/target/release instead of relative paths in PATH so the bat binary remains reachable after scripts cd to subdirectories.
Address review feedback on PR trishume#614: restore the algorithmic comments explaining capture ordering and the empty-capture guard, which were inadvertently removed in a45c41e.
Address review feedback from trishume#614: when cross-line branch_point/fail backtracking occurs, the ops returned for lines before the fail resolves are speculative and may be wrong. Previously, syntest evaluated assertions eagerly against these speculative ops and never revisited them. Now syntest buffers parsed line state (text, ScopeStack snapshot, and evaluated assertions). When parse_line returns non-empty replayed ops, it resets the scope stack to before the first replayed line, recomputes scoped text from the corrected ops, and re-evaluates any assertions that targeted those lines, adjusting the failure count accordingly.
The previous commit (81bd452) correctly re-evaluated assertion *counts* on cross-line backtracking replay, but failure *messages* were still printed eagerly against potentially speculative scopes. This meant a user could see a failure message for scopes that would later be corrected by a `fail` action, even though the final pass/fail count was accurate. This commit buffers failure messages and only flushes them once `ParseState::is_speculative()` returns false (i.e. no active branch points), or unconditionally at EOF. On replay, pending messages whose `test_against_line_number` matches a replayed line are pruned and regenerated from the corrected scopes. Changes: - Add `ParseState::is_speculative()` public accessor (checks `branch_points.is_empty()`) and update the public API snapshot. - Introduce `BufferedFailureMessage` struct, `print_failure()`, and `flush_pending_messages()` helpers in syntest. - Store `assertion_line_number` on `BufferedAssertion` so replayed failure messages can report the original assertion source line. - Replace inline println with buffering; flush after parse_line when not speculative and at EOF. - Prune + regenerate pending messages in the replay block. Edge cases preserved: - No branch points: flush after every parse_line (identical to before). - Summary mode: flush still clears the buffer without printing. - Debug mode: debug output remains eager (informational, not assertions).
Cover five previously untested code paths: - Branch point expiry after 128 lines - Stack-depth invalidation making fail a no-op - Nested/overlapping branch points (inner fails, outer survives) - fail referencing a non-existent branch point name - Cross-line replay with multiple buffered lines
Add branch_point_still_valid_at_128_lines to verify the lower bound of the 128-line expiry window, preventing an off-by-one mutation from surviving. Add ops assertions to branch_cross_line_backtrack and branch_cross_line_multi_replay to ensure ops.clear() is exercised, catching stale try.* scopes leaking into current-line output.
- Strengthen branch_fail_after_partial_match: assert keyword.declaration is absent and string.unquoted starts at position 0 after backtrack - Redesign branch_stack_depth_invalidation so FAIL actually reaches handle_fail: test both equal-depth (backtrack succeeds) and shallower-depth (no-op) scenarios - Add branch_cross_line_fail_with_preceding_ops: fail line has content before FAIL so ops are non-empty and start > 0, verifying ops.clear() and start reset to 0
Strengthen 3 existing mutation-killing tests to actually detect their targeted mutations, and add formatting fixes from cargo fmt: - consuming_match_not_treated_as_loop: check push position (==0) instead of just counting occurrences, catching L533 > → < in find_best_match - v2_embed_scope_replaces_skips_meta_content_pop_on_exit: deepen stack to 5 levels (via wrapper contexts) so L915 - → / produces a different index than -2, and verify source scope survives on the scope stack to catch L912/L913 >= → < mutations - v2_set_clear_scopes_only_from_last_context: assert source.v2clear remains on the scope stack, catching L1029 && → || and == → != Remaining 6 MISSED mutations are all equivalent or impractical: - L608/L611 (search): depth-100 guard and cache internals - L709 (capture sort): constant offset in secondary sort key - L771 (handle_fail): guard for unreachable stack state
|
Thanks for the review, @keith-hall! Here's a summary of what I've done to address your comments: Removed comments in syntest should evaluate assertions only after the parser commits to a branch — fully agreed, and this turned out to be a meaningful body of work:
On the testing side, I also ran |
keith-hall
left a comment
There was a problem hiding this comment.
LGTM (with the caveat that I haven't run the syntest example or anything to verify, but I imagine that will come next: as part of bumping the Sublime HQ Packages submodule and seeing what the state is).
I do strongly think it is worth adding a new API to the easy module in a future PR which would abstract away the complexities of dealing with the speculative states and just being able to send lines of text to the parser (in my mind it would handle the parse state itself, but I guess for caching purposes the caller may still want to know the state at each line), and perform a callback or something when a line has completed parsing and thus the scopes/operations are final.
I was planning to work on #222 and then bump the packages and identify the gaps, if that works for you. |
|
Even better, thank you so much for your hard work reviving syntect btw! |
I'm taking advantage of a sabbatical I had to take for personal reasons to address this mildly uncomfortable feeling of not seeing Java highlighted correctly in |
This comment was marked as resolved.
This comment was marked as resolved.
|
The 128 line limit is handled explicitly. Not sure about logging. |
Surface 128-line branch point expiry as warnings in ParseLineOutput instead of silently discarding, following the same pattern used for syntax loading warnings. Rename the `ops` variable in the parsing benchmark to `output` for consistency with all other call sites.
|
On the 128-line limit: as @michaelblyons noted, it's already enforced — branch points are pruned at the start of every |
|
I tweaked the description to mention this closes #271 so that when it's merged the issue is closed automatically. Let me know if you have anything against it. |
Closes #271.
Summary
branch_point,branch, andfaildirectives from theSublime Text syntax definition spec, enabling speculative parsing with
backtracking when a chosen branch turns out to be wrong.
failfires on a later line thanits
branch_point, the parser re-parses intermediate lines under thecorrected alternative and surfaces the corrected ops via a new
ParseLineOutput.replayedfield.ParseState::parse_linenow returnsResult<ParseLineOutput, ParsingError>instead of a bareVec.Migration: append
.opsto the old call site. Callers that care aboutcross-line accuracy should also inspect
.replayed.Design decisions
is active). Only raw line strings are buffered for potential replay.
This keeps the common-case caller contract unchanged.
parse_line_innerso thathandle_failcan re-parse buffered lines without re-entrancy issueswith the outer
parse_linebookkeeping.behavior and bounding memory usage.
Known limitations
ScopeStackincrementally will have appliedthe original (potentially wrong) ops for buffered lines. When
replayedis non-empty, they must rewind and re-apply. syntect's ownhighlighter modules use
.opsonly and are unaffected.Test plan
cargo test— all 142 unit tests + 13 doc-tests passbranch_cross_line_backtracktest verifies cross-line replay