Description
Describe the bug
In flexmark 0.64.8, a StackOverflowError occurs when a reference link definition encloses the link URL in angle brackets, as in [definition name]: <link URL>, and the link URL is long. The issue is located in the Parser.
One way to trigger this error is the "Copy as Markdown" feature for an image in Google Docs, which generates Markdown like the following:
**![][image1]**
[image1]: <data:image/png;base64,....>
To Reproduce
The issue can be reproduced with the following code:
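import com.vladsch.flexmark.parser.Parser;

// Build a default parser and feed it a reference definition with a long angle-bracketed URL
Parser parser = Parser.builder().build();
String url = "https://example.com/" + "A".repeat(5000);
String markdown = "[link]: <%s>".formatted(url);
parser.parse(markdown); // throws StackOverflowError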
Expected behavior
The parse method should complete successfully.
Resulting Output
java.lang.StackOverflowError
at com.vladsch.flexmark.util.sequence.SubSequence.charAt(SubSequence.java:115)
at java.base/java.lang.Character.codePointAt(Character.java:9320)
at java.base/java.lang.Character.codePointAt(Character.java:9320)
at java.base/java.util.regex.Pattern$CharProperty.match(Pattern.java:4106)
at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4914)
at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4969)
at java.base/java.util.regex.Pattern$Loop.match(Pattern.java:5078)
at java.base/java.util.regex.Pattern$GroupTail.match(Pattern.java:5000)
at java.base/java.util.regex.Pattern$BranchConn.match(Pattern.java:4878)
at java.base/java.util.regex.Pattern$CharProperty.match(Pattern.java:4110)
at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4914)
...
Additional context
The error occurs at the beginning of InlineParserImpl#parseLinkDestination(), in the line BasedSequence res = match(myParsing.LINK_DESTINATION_ANGLES);. The stack overflow appears to be caused by the regular expression matching process: the stack trace consists of repeating java.util.regex.Pattern frames (Loop, GroupTail, BranchConn, Branch, GroupHead).
Possible Solution
Using the possessive quantifier *+ instead of the greedy quantifier * in Parsing.ST_LINK_DESTINATION_ANGLES_SPC and Parsing.ST_LINK_DESTINATION_ANGLES_NO_SPC might avoid the error. However, I am not certain that this change is correct.
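The difference between the two quantifiers can be illustrated with plain java.util.regex, without flexmark. The sketch below is only an illustration under stated assumptions: the two patterns are simplified stand-ins for an angle-bracketed link destination, not the actual flexmark expressions, and the class and method names are hypothetical. Whether the greedy variant overflows depends on the JVM version and thread stack size, so the code reports the outcome instead of assuming it.

```java
import java.util.regex.Pattern;

public class PossessiveQuantifierSketch {
    // Simplified stand-ins (assumption): NOT the real flexmark patterns.
    // Greedy repetition of an alternation group; the JDK regex engine
    // typically matches this recursively, one set of frames per repetition.
    static final Pattern GREEDY =
            Pattern.compile("<(?:[^<>\\\\]|\\\\.)*>");
    // Same expression with a possessive quantifier (*+), which gives up
    // backtracking into the repeated group.
    static final Pattern POSSESSIVE =
            Pattern.compile("<(?:[^<>\\\\]|\\\\.)*+>");

    // Returns "matched", "no match", or "StackOverflowError" for the whole input.
    static String tryMatch(Pattern pattern, String input) {
        try {
            return pattern.matcher(input).matches() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        String input = "<https://example.com/" + "A".repeat(200_000) + ">";
        // The greedy result depends on the JVM and its stack size.
        System.out.println("greedy:     " + tryMatch(GREEDY, input));
        System.out.println("possessive: " + tryMatch(POSSESSIVE, input));
    }
}
```

In this simplified pattern neither alternative can consume the closing >, so giving up backtracking should not change which inputs match. Whether the same holds for every alternative in the real flexmark patterns would need to be checked against the existing test suite.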