Stackoverflow while rendering markdown from HTML #626

Open
@chibenwa

Describe the bug

I was considering using Flexmark as an HTML => text/plain engine for Apache James.

(We currently rely on a homegrown jsoup-based parser.)

I ran our test suite against flexmark-html2md-converter and hit a StackOverflowError with the following test:

    @Test
    public void deeplyNestedHtmlShouldNotThrowStackOverflow() {
        final int count = 2048;
        String html = Strings.repeat("<div>", count) +  "<p>para1</p><p>para2</p>" + Strings.repeat("</div>", count);
        String expectedPlainText = "para1\n\npara2\n\n";
        
        assertThat(FlexmarkHtmlConverter.builder().build().convert(html))
            .isEqualTo(expectedPlainText);
    }

java.lang.StackOverflowError
	at java.base/java.util.Vector.addElement(Vector.java:616)
	at java.base/java.util.Stack.push(Stack.java:68)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.pushState(FlexmarkHtmlConverter.java:1151)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1696)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
	at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
	at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderNode(FlexmarkHtmlConverter.java:1135)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.render(FlexmarkHtmlConverter.java:1050)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1707)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
	at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
	at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)

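The rendering is tree-based and recurses on the Java call stack: as the trace shows, each nested div adds another round of processDiv, renderChildren, processHtmlTree, render and renderNode frames, so sufficiently deep nesting exhausts the stack.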

This feels familiar: our homegrown jsoup-based parser relied on a similar mechanism and had the same issue. We overcame it by replacing the recursion with explicit stacks (in and out) and loops.
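
To illustrate the idea, here is a minimal sketch of a traversal that keeps its pending work on the heap instead of the call stack. It is not the Apache James code and not a patch for Flexmark: `IterativeTextExtractor`, `Node` and `Frame` are hypothetical names, and a single stack of enter/leave frames stands in for the "in and out" stacks mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeTextExtractor {

    /** Hypothetical minimal DOM node, standing in for a real jsoup node. */
    static final class Node {
        final String tag;   // element name, or null for a text node
        final String text;  // text content, or null for an element
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node element(String tag) { return new Node(tag, null); }

        static Node text(String text) { return new Node(null, text); }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** One pending unit of work: a node, and whether we are entering or leaving it. */
    private static final class Frame {
        final Node node;
        final boolean leaving;

        Frame(Node node, boolean leaving) {
            this.node = node;
            this.leaving = leaving;
        }
    }

    /** Renders the tree to plain text with a loop and an explicit stack, no recursion. */
    static String toPlainText(Node root) {
        StringBuilder out = new StringBuilder();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, false));

        while (!stack.isEmpty()) {
            Frame frame = stack.pop();
            Node node = frame.node;

            if (frame.leaving) {
                // "Out" event: all children are done, so close the block.
                if ("p".equals(node.tag)) {
                    out.append("\n\n");
                }
                continue;
            }
            if (node.text != null) {
                out.append(node.text);
                continue;
            }
            // "In" event: schedule the matching leave frame first, then the
            // children in reverse order so the first child is popped first.
            stack.push(new Frame(node, true));
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(new Frame(node.children.get(i), false));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = Node.element("div")
                .add(Node.element("p").add(Node.text("para1")))
                .add(Node.element("p").add(Node.text("para2")));
        // Wrap in far more levels than the 2048 used in the failing test.
        for (int i = 0; i < 100_000; i++) {
            root = Node.element("div").add(root);
        }
        System.out.println(toPlainText(root).equals("para1\n\npara2\n\n")); // prints true
    }
}
```

With this shape, the depth limit is set by available heap rather than by the thread's stack size, at the cost of allocating one small frame object per node visit.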
