Description
Describe the bug
The partition_html function fails to extract accordion titles. For example, when applied to the Phonak FAQ page (the URL used in the reproduction below), the extracted elements do not include the FAQ questions, which are the accordion titles on that page.
To Reproduce
from unstructured.partition.html import partition_html
url = "https://www.phonak.com/en-int/support-options/frequently-asked-questions"
elements = partition_html(url=url)
for el in elements:
    print(el.to_dict())
Expected behavior
The extracted elements should include the FAQ questions as titles.
For example, a question like:
"Which Bluetooth profiles are required to support my Phonak hearing aids?"
should appear in the parsed output as a title element.
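To make the expectation checkable, here is a minimal sketch of a helper that reports which expected questions are absent from the Title elements. It works on plain dictionaries shaped like the output of el.to_dict() and assumes the "type" and "text" keys that this output carries. The helper name missing_titles and the sample list are hypothetical and written by hand for illustration; they are not real parser output.

```python
def missing_titles(element_dicts, expected_questions):
    """Return the expected questions that do not appear as Title elements.

    element_dicts: dictionaries shaped like el.to_dict() output
                   (assumed to carry "type" and "text" keys).
    expected_questions: question strings expected to be extracted as titles.
    """
    # Collect the text of every element categorized as a Title.
    title_texts = {
        d.get("text", "").strip()
        for d in element_dicts
        if d.get("type") == "Title"
    }
    # Keep only the questions that have no exact Title match.
    return [q for q in expected_questions if q.strip() not in title_texts]


# Hand-written sample, NOT real output: an answer paragraph with no Title for the question.
sample = [
    {"type": "NarrativeText", "text": "Placeholder answer text."},
]
expected = [
    "Which Bluetooth profiles are required to support my Phonak hearing aids?",
]

print(missing_titles(sample, expected))
```

With the reproduction above, the same check could be run as missing_titles([el.to_dict() for el in elements], expected). An empty list would mean every expected question was extracted as a title; the comparison is an exact match on stripped text, so it may need loosening if whitespace or punctuation differs.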
Environment Info
Python Version: 3.12.6 (tags/v3.12.6:a4a2d2b, Sep 6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
Platform: Windows-11-10.0.22631-SP0
unstructured 0.16.15
unstructured-client 0.29.0
unstructured-inference 0.8.6
unstructured-ingest 0.4.0
unstructured.pytesseract 0.3.13