Description
Typically, long sequences of items, such as long array literals or long sequences of statements, are found only in generated files. Such files are rarely interesting to analyze (e.g. in the case of semgrep), so it would be fine to give up on them. The problem is that at least the translation to the OCaml tree uses stack space proportional to the depth of the tree (including the length of lists). On some platforms this results in a segfault, while on others it raises a Stack_overflow exception.
To avoid such a crash, a solution may be to calculate the depth of the tree returned by the tree-sitter parser and return an error if it exceeds some limit. This assumes the tree-sitter parser itself doesn't crash due to insufficient stack space.
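As a rough illustration of the proposed check, here is a minimal Python sketch. It assumes a hypothetical `Node` type with a `children` list; this is not tree-sitter's API, and the names `check_tree`, `TreeTooLarge`, `max_depth`, and `max_children` are made up for the example. The walk uses an explicit stack rather than recursion, so the check itself cannot overflow the call stack. Because the issue mentions long lists as well as deep trees, the sketch also tracks the longest child list; whether that is needed depends on how the real translation handles lists.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not tree-sitter's API)."""
    kind: str
    children: List["Node"] = field(default_factory=list)


class TreeTooLarge(Exception):
    """Raised when the tree exceeds one of the configured limits."""


def check_tree(root: Node, max_depth: int, max_children: int) -> Tuple[int, int]:
    """Return (depth, longest child list), or raise TreeTooLarge early.

    The traversal is iterative: pending nodes live in a heap-allocated
    list instead of on the call stack.
    """
    deepest = 0
    widest = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            raise TreeTooLarge(f"tree depth exceeds limit of {max_depth}")
        if len(node.children) > max_children:
            raise TreeTooLarge(f"child list exceeds limit of {max_children}")
        deepest = max(deepest, depth)
        widest = max(widest, len(node.children))
        stack.extend((child, depth + 1) for child in node.children)
    return deepest, widest


def make_chain(n: int) -> Node:
    """Build a degenerate tree of n nested nodes, iteratively."""
    root = Node("leaf")
    for _ in range(n - 1):
        root = Node("wrapper", [root])
    return root
```

A caller could run `check_tree(tree, max_depth, max_children)` before the translation step and report an error instead of crashing when `TreeTooLarge` is raised. Suitable limit values would have to be found experimentally.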
Here's an example of a large generated C++ file whose parsing results in stack overflows: https://github.com/juce-framework/JUCE/blob/d054f0d14dcac387aebda44ce5d792b5e7a625b3/extras/Projucer/JuceLibraryCode/BinaryData.cpp
Tasks:
- Check whether we can parse the input file above with just tree-sitter-cpp (e.g. with `tree-sitter parse`).
- If so, add a pass to calculate/estimate the depth of the tree returned by tree-sitter before its translation to OCaml.
- Add an option to fail if the tree depth exceeds a limit.
Ideas to avoid complete failure:
- Truncate excessively deep trees/lists without aborting when possible (e.g. tree-sitter's `repeat()` and `repeat1()` constructs).
- Increase the system's stack size limit or ask the user to do so in a last-gasp error message.
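The truncation idea could look something like the following Python sketch. It is only an illustration under assumptions: `Node` is a hypothetical tree type rather than tree-sitter's API, and the `"truncated"` marker node and the `max_children` parameter are invented for the example. Overlong child lists are cut in place, and a marker is appended so that later passes can tell the tree is incomplete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not tree-sitter's API)."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def truncate_long_lists(root: Node, max_children: int) -> int:
    """Cut every child list down to max_children, in place.

    Returns the number of direct children that were dropped. The walk
    uses an explicit stack, so it does not recurse.
    """
    dropped = 0
    stack = [root]
    while stack:
        node = stack.pop()
        extra = len(node.children) - max_children
        if extra > 0:
            del node.children[max_children:]
            # Hypothetical marker so consumers know content was removed.
            node.children.append(Node("truncated"))
            dropped += extra
        stack.extend(node.children)
    return dropped
```

For instance, applying `truncate_long_lists(tree, 10)` to an array node with 1000 items keeps the first 10 items plus one marker node. Whether a truncated tree is still useful for analysis is a separate question that the sketch does not answer.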