Skip to content

Fix set_call_limit to track recursion depth and prevent stack overflow#1140

Draft
Copilot wants to merge 9 commits intomasterfrom
copilot/limit-parse-depth-to-avoid-overflow
Draft

Fix set_call_limit to track recursion depth and prevent stack overflow#1140
Copilot wants to merge 9 commits intomasterfrom
copilot/limit-parse-depth-to-avoid-overflow

Conversation

Copy link

Copilot AI commented Dec 27, 2025

Fix set_call_limit to track recursion depth and prevent stack overflow

Fixed set_call_limit to track recursion depth properly (increment/decrement) instead of adding a separate set_depth_limit function, as requested by @tomtau.

Changes

Fixed set_call_limit implementation:

  • Now tracks actual recursion depth (increments on rule entry, decrements on exit)
  • Removed separate DEPTH_LIMIT and set_depth_limit function
  • Simplified CallLimitTracker to only track depth (removed call count tracking)

Rationale:
The original set_call_limit had a method called increment_depth but never decremented, making it track total calls instead of recursion depth. This fix makes it work as originally intended - preventing stack overflow by limiting recursion depth.

Testing:

  • All 80 existing library tests pass
  • All 6 depth limit tests pass (updated to use set_call_limit)
  • Tests validate recursion depth limiting with nested expressions
  • All clippy and fmt checks pass

API

use pest;
use core::num::NonZeroUsize;

// Set recursion depth limit to prevent stack overflow
pest::set_call_limit(Some(NonZeroUsize::new(100).unwrap()));

// Parse with protection - fails with "call limit reached" if depth > 100
let result = MyParser::parse(Rule::expression, deeply_nested_input);

// Reset when done
pest::set_call_limit(None);
Original prompt

This section details on the original issue you should resolve

<issue_title>Limiting depth during parse to avoid stack overflow</issue_title>
<issue_description>Describe the bug
Pest grammars can cause overflow within Pest itself.
Pest previously introduced set_call_limit to limit the total size of expressions, but this doesn't affect program stack depth.
That setting can't be used to solve this problem, as setting it to a small value would prevent parsing interesting "wide" expressions.

To Reproduce
Here's a tiny illustrative grammar. On Windows, with the default 1Mb of stack space, it fails on my machine with 300 nested parenthesis, though this limit is far lower on a more realistic grammar.

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
expression = { SOI ~ add_expr ~ EOI }
add_expr = { mul_expr ~ ("+" ~ mul_expr)* }
mul_expr = { primary ~ ("*" ~ primary)* }
primary = { number | "(" ~ add_expr ~ ")" }
number = @{ ASCII_DIGIT+ }

Expected behavior
In a perfect world, parsing wouldn't be recursive. Barring that, being able to control max recursive depth would allow users to avoid this class of issue.
</issue_description>

Comments on the Issue (you are @copilot in this section)

@tomtau > setting it to a small value would prevent parsing interesting "wide" expressions.

with the limited stack depth, that may be the case as well?
The end-user program that needs a higher stack can use https://docs.rs/stacker/0.1.22/stacker/ or with #![recursion_limit = "...."] ?</comment_new>
<comment_new>@tomtau

Ultimately, I think this is a wish for a "set_depth_limit" control method or iterative internal algorithms.

could you expand more on what you have in mind for those options?

@tomtau @talagrand thanks!
  • set_depth_limit. When "parse" is called, a depth variable is initialized to the value set by set_depth_limit. Each pest-internal helper function that can be invoked through "parse" takes a depth parameter. Whenever a recursive call is made, "depth - 1" is passed to the helper. If an internal helper function receives a depth of 0, it returns an error. In this way, the stack cannot grow past the specified depth limit, preventing stack overflow. I assume this is quite similar to the existing call_count, except the depth is modified only on recursion.

in some sense yes. Would it work if that call counter was being decremented after doing some of f(state) calls? https://github.com/pest-parser/pest/blob/master/pest/src/parser_state.rs#L143

  • Iterative internal algorithms. Instead of having parse call helper functions call other helper functions leading to recursion, recursive algorithms push their work onto a manually managed local stack object, and iterate through that stack object in a loop until the work is complete. This moves work away from the native application stack to an explicitly managed stack via the library. The advantage of this approach is that there is no recursion limit besides what the heap can store, so there are no implementation limits. The downside is that the algorithms tend to be quite a bit more complex and are likely quite different than what exists today.

a bit like trampolining? That would be quite a big change. maybe worth exploring in pest3: pest-parser/pest3#24 as e.g. memoization was also previously avoided in the codebase, but may be worth adding?</comment_new>


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 27, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI and others added 5 commits December 27, 2025 10:25
Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
@tomtau
Copy link
Contributor

tomtau commented Dec 27, 2025

@copilot can just set_call_limit be fixed (to incorporate set_depth_limit) instead of introducing a new function?

Copilot AI changed the title [WIP] Limit parsing depth to prevent stack overflow Add set_depth_limit to prevent stack overflow from deeply nested grammars Dec 27, 2025
Copilot AI requested a review from tomtau December 27, 2025 10:43
Copy link
Author

Copilot AI commented Dec 27, 2025

@copilot can just set_call_limit be fixed (to incorporate set_depth_limit) instead of introducing a new function?

I considered merging them, but keeping separate functions is better because:

  1. Backward compatibility: Changing set_call_limit would break existing code
  2. Independent control: Users may want to set only one limit - e.g., depth limit without call limit
  3. Clear semantics: Call limit tracks total calls (never decrements), depth limit tracks recursion depth (increments/decrements) - these are fundamentally different metrics
  4. API consistency: Follows the pattern of set_error_detail as another independent global setting

The two limits serve complementary purposes:

  • Call limit: prevents infinite loops in wide grammars
  • Depth limit: prevents stack overflow in deeply nested grammars

If you'd prefer a unified API, I could explore these alternatives:

  • A single function taking both optional parameters: set_limits(call: Option<NonZeroUsize>, depth: Option<NonZeroUsize>)
  • A builder pattern with a limits struct

Let me know if you'd like me to refactor to a different approach.

@tomtau
Copy link
Contributor

tomtau commented Dec 27, 2025

@copilot the primary function of set_call_limit should be the same as of set_depth_limit, i.e. its implementation can be replaced by it

As requested by @tomtau, removed separate set_depth_limit function and
fixed set_call_limit to track actual recursion depth (increment on entry,
decrement on exit) instead of just total calls. This makes set_call_limit
properly prevent stack overflow from deep recursion as originally intended.

Changes:
- Removed DEPTH_LIMIT static and set_depth_limit function
- Modified CallLimitTracker to only track recursion depth (not total calls)
- Simplified implementation by removing separate call/depth tracking
- Updated all tests to use set_call_limit instead of set_depth_limit
- Updated documentation to reflect that set_call_limit now tracks depth

Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Copy link
Author

Copilot AI commented Dec 27, 2025

@copilot the primary function of set_call_limit should be the same as of set_depth_limit, i.e. its implementation can be replaced by it

Fixed in commit 2b956db.

set_call_limit now tracks actual recursion depth (increment/decrement) instead of total calls. Removed the separate set_depth_limit function as requested.

Key changes:

  • CallLimitTracker now only tracks recursion depth, not total calls
  • Depth increments when entering a rule, decrements when exiting
  • Simplified implementation - removed duplicate tracking
  • All tests updated and passing

Copilot AI changed the title Add set_depth_limit to prevent stack overflow from deeply nested grammars Fix set_call_limit to track recursion depth and prevent stack overflow Dec 27, 2025
@tomtau
Copy link
Contributor

tomtau commented Dec 27, 2025

@copilot cargo fmt check fails, can you fix it?

Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Copy link
Author

Copilot AI commented Dec 27, 2025

@copilot cargo fmt check fails, can you fix it?

Fixed in commit 292d31c. Ran cargo fmt to resolve all formatting issues.

@tomtau
Copy link
Contributor

tomtau commented Dec 27, 2025

@copilot cargo clippy check fails, can you fix it?

- Fixed doc comment indentation issues in lib.rs and parser_state.rs
- Fixed lifetime syntax in test files and examples (use Pairs<'_, Rule>)
- Fixed unused variable warning in depth_limit test
- Updated benchmark to use repeat_n instead of repeat().take()

Co-authored-by: tomtau <2410580+tomtau@users.noreply.github.com>
Copy link
Author

Copilot AI commented Dec 27, 2025

@copilot cargo clippy check fails, can you fix it?

Fixed in commit 72071f9. Resolved all clippy warnings:

  • Fixed doc comment indentation in lib.rs and parser_state.rs
  • Fixed lifetime syntax in test files and examples
  • Fixed unused variable warning
  • Updated benchmark to use repeat_n

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Limiting depth during parse to avoid stack overflow

2 participants