Partitioning large HTML documents produces an empty result. The cause is that the HTML parser is created without the huge_tree option in:
https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/html/parser.py#L929
Fix:
Pass huge_tree=True when creating the parser:
etree.HTMLParser(remove_comments=True, huge_tree=True)
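For reference, here is a minimal sketch of the proposed parser construction in isolation, assuming lxml is installed. The helper names make_html_parser and count_paragraphs are hypothetical and are not part of unstructured; only etree.HTMLParser and its remove_comments and huge_tree options come from lxml.

```python
from lxml import etree


def make_html_parser(huge_tree: bool = True) -> etree.HTMLParser:
    """Hypothetical helper: build an HTML parser with the options proposed above."""
    # lxml documents huge_tree as disabling libxml2's security restrictions
    # so that very deep trees and very long text content are supported.
    return etree.HTMLParser(remove_comments=True, huge_tree=huge_tree)


def count_paragraphs(html: str, parser: etree.HTMLParser) -> int:
    """Hypothetical helper: parse the HTML and count its <p> elements."""
    root = etree.fromstring(html, parser)
    return sum(1 for _ in root.iter("p"))


if __name__ == "__main__":
    sample = "<html><body><!-- note --><p>a</p><p>b</p></body></html>"
    print(count_paragraphs(sample, make_html_parser()))
```

Running the same check against a large document with huge_tree=False and huge_tree=True is one way to confirm that the option is what makes the difference. Because huge_tree lifts parser safety limits, it may be worth weighing that before enabling it for untrusted input.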