Describe the bug
I was considering using Flexmark as an HTML => text/plain engine for Apache James.
(We currently rely on a homegrown Jsoup-based parser.)
I ran our test suite against flexmark-html2md-converter and hit a StackOverflowError with the following test:
    @Test
    public void deeplyNestedHtmlShouldNotThrowStackOverflow() {
        final int count = 2048;
        String html = Strings.repeat("<div>", count) + "<p>para1</p><p>para2</p>" + Strings.repeat("</div>", count);
        String expectedPlainText = "para1\n\npara2\n\n";
        assertThat(FlexmarkHtmlConverter.builder().build().convert(html))
            .isEqualTo(expectedPlainText);
    }
java.lang.StackOverflowError
    at java.base/java.util.Vector.addElement(Vector.java:616)
    at java.base/java.util.Stack.push(Stack.java:68)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.pushState(FlexmarkHtmlConverter.java:1151)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1696)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
    at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
    at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderNode(FlexmarkHtmlConverter.java:1135)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.render(FlexmarkHtmlConverter.java:1050)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1707)
    at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
    at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
    at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)
The rendering is tree-based and recurses on the Java call stack (the processDiv -> renderChildren -> processHtmlTree cycle repeats in the trace above), so sufficiently deep nesting exhausts the stack.
This feels familiar: our homegrown Jsoup-based parser relied on a similar mechanism and had the same issue. We overcame the limitation by replacing recursion with explicit stacks (in and out) and loops.
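
To illustrate the idea of trading recursion for an explicit stack and a loop, here is a minimal, self-contained sketch. It is not Flexmark's code and not the actual Apache James implementation: `IterativeTextExtractor`, `Node`, `Frame`, `toPlainText` and `deeplyNested` are hypothetical names invented for this example, and a single stack of enter/leave frames stands in for the "in and out" stacks mentioned above. The paragraph rule (emit a blank line after each `<p>`) is likewise an assumption chosen to match the expected output in the test.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IterativeTextExtractor {

    /** Hypothetical minimal tree node standing in for a parsed HTML element or text node. */
    static final class Node {
        final String tag;   // null for text nodes
        final String text;  // null for element nodes
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) { this.tag = tag; this.text = text; }
        static Node element(String tag) { return new Node(tag, null); }
        static Node text(String text) { return new Node(null, text); }
        Node add(Node child) { children.add(child); return this; }
    }

    /** One pending unit of work: a node, plus whether we are entering or leaving it. */
    private static final class Frame {
        final Node node;
        final boolean leaving;
        Frame(Node node, boolean leaving) { this.node = node; this.leaving = leaving; }
    }

    /** Depth-first rendering driven by a heap-allocated stack instead of the call stack. */
    static String toPlainText(Node root) {
        StringBuilder out = new StringBuilder();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, false));
        while (!stack.isEmpty()) {
            Frame frame = stack.pop();
            Node node = frame.node;
            if (frame.leaving) {
                // "Out" step: runs after all children have been rendered.
                if ("p".equals(node.tag)) {
                    out.append("\n\n");
                }
                continue;
            }
            if (node.text != null) {
                out.append(node.text);
                continue;
            }
            // "In" step: schedule the leave frame first, then the children in
            // reverse order so the first child is popped (and rendered) first.
            stack.push(new Frame(node, true));
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(new Frame(node.children.get(i), false));
            }
        }
        return out.toString();
    }

    /** Builds `depth` nested divs around two paragraphs, itself without recursion. */
    static Node deeplyNested(int depth) {
        Node root = Node.element("div");
        Node current = root;
        for (int i = 1; i < depth; i++) {
            Node child = Node.element("div");
            current.add(child);
            current = child;
        }
        current.add(Node.element("p").add(Node.text("para1")));
        current.add(Node.element("p").add(Node.text("para2")));
        return root;
    }
}
```

With this shape, nesting depth is bounded by heap size rather than by the thread's stack size, so `IterativeTextExtractor.toPlainText(IterativeTextExtractor.deeplyNested(2048))` does not grow the call stack at all. Applying the same approach inside the converter would presumably mean restructuring how node renderers call back into `renderChildren`, which is a larger change than this sketch suggests.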