Description
HtmlConverterCoreNodeRenderer.handleTableCell calls String.replaceAll("\\s*\n\\s*", " "),
which can be quite slow. The pattern is simple enough that the call can be replaced with a plain character loop, avoiding the regex engine entirely.
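One possible regex-free replacement is sketched below. It is not the project's code: the class name NewlineCollapser and the method collapseNewlines are made up for illustration. It assumes the default meaning of \s in java.util.regex (space, tab, newline, vertical tab, form feed, carriage return) and relies on the observation that the pattern replaces every maximal whitespace run containing at least one \n with a single space, leaving other runs untouched. The slowdown is presumably due to long whitespace runs without a newline, which the regex rescans from every start position, while this loop visits each character once.

```java
public class NewlineCollapser {
    // Same characters as the default \s class in java.util.regex: [ \t\n\x0B\f\r]
    private static boolean isRegexWhitespace(final char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // Hypothetical drop-in for s.replaceAll("\\s*\n\\s*", " ").
    public static String collapseNewlines(final String s) {
        final int n = s.length();
        final StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            final char c = s.charAt(i);
            if (!isRegexWhitespace(c)) {
                out.append(c);
                i++;
                continue;
            }
            // Consume the whole whitespace run, noting whether it contains a newline.
            final int start = i;
            boolean sawNewline = false;
            while (i < n && isRegexWhitespace(s.charAt(i))) {
                if (s.charAt(i) == '\n') {
                    sawNewline = true;
                }
                i++;
            }
            if (sawNewline) {
                out.append(' ');          // run with a newline collapses to one space
            } else {
                out.append(s, start, i);  // run without a newline is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        final String in = "a  \n\t b   c\r\n\nd";
        // Compare against the original regex on a small sample.
        System.out.println(collapseNewlines(in).equals(in.replaceAll("\\s*\n\\s*", " ")));
    }
}
```

Before swapping it in, it would be worth comparing its output against the existing replaceAll call on real table-cell content, since the equivalence above is an assumption about the pattern and has not been checked against the project's tests.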
To Reproduce
See attached file test.html.txt
import java.nio.file.Files;
import java.nio.file.Path;

public class LoadingTest {
    public static void main(final String[] args) throws Exception {
        // Load the attached HTML sample.
        final String html = Files.readString(Path.of("test.html.txt"));
        final long tic = System.currentTimeMillis();
        com.diffbot.websearch.html.MarkdownNormalizer.markdown(html);
        // Report elapsed wall-clock time in milliseconds.
        System.out.println("took: " + (System.currentTimeMillis() - tic));
    }
}
Expected behavior
The code above takes more than 4000 ms to run on my laptop:
took: 4024
It should take far less time.