Add HTML Page Title to Element Metadata in partition_html() #3970
prasannaJosium started this conversation in Ideas
Description:
Currently, when using `partition_html()`, the element metadata doesn't include the HTML page title, which is useful for many use cases. The title is available in the HTML document's `<title>` tag but isn't extracted and included in the element metadata.

Proposed Solution:
Add a `page_title` field to the `ElementMetadata` class and modify the `partition_html()` function to extract the page title and include it in the metadata of each element. This would involve:
- Adding `page_title: Optional[str] = None` to the `ElementMetadata` class
- Using the `FIRST` strategy

Benefits:
Example Usage:
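Until such a field exists, the title can be read separately from the same HTML. The following is only a minimal sketch using the Python standard library; `extract_page_title` and `_TitleParser` are hypothetical names for illustration and are not part of `unstructured`.

```python
from html.parser import HTMLParser
from typing import Optional


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is considered.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace; return None when no title text was found.
        text = " ".join("".join(self._chunks).split())
        return text or None


def extract_page_title(html_text: str) -> Optional[str]:
    """Return the page title of an HTML document, or None if it has none."""
    parser = _TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


html_text = "<html><head><title> My  Page </title></head><body><p>Hi</p></body></html>"
page_title = extract_page_title(html_text)
```

The resulting `page_title` could then be kept alongside the elements returned by `partition_html(text=html_text)`. Whether it can be stored directly on each element's metadata depends on the installed version of the library, so that step is left out here.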
Would you like me to submit a PR with these changes?
If there are other ways to get this done, please let me know.
Cheers