feat/skip_strikethrough parameter #3569

Open
@arisjr

Description

Is your feature request related to a problem? Please describe.
Yes. I'm building a RAG pipeline over a set of Brazilian laws, and I think the problem applies to the wider RAG/LLM community.
(I'm new to RAG.)

Laws and other legislative publications that need to keep track of changes (history) normally don't simply delete text; they strike it through, as in the examples below:

https://www.planalto.gov.br/ccivil_03/_ato2004-2006/2006/decreto/d5948.htm
https://www.justice.gov/oip/freedom-information-act-5-usc-552

I think that including struck-through text in the data may lead the AI to false assumptions, producing wrong results for the analyst.

Describe the solution you'd like
Add a skip_strikethrough parameter to the partition_html function.
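
A rough sketch of the intended behaviour, written as a preprocessing step that runs before partitioning. It uses only the Python standard library. The names `StrikethroughStripper` and `strip_strikethrough` are hypothetical and not part of unstructured, and the set of tags treated as strikethrough (`<s>`, `<strike>`, `<del>`, plus an inline `line-through` style) is an assumption about how such pages are usually marked up. Strikethrough applied through a CSS class or an external stylesheet would not be caught.

```python
from html.parser import HTMLParser

# Assumption: these tags (or an inline "line-through" style) mark struck text.
STRIKE_TAGS = {"s", "strike", "del"}
# Void elements have no closing tag, so they can never open a skipped region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StrikethroughStripper(HTMLParser):
    """Re-emits the HTML it is fed, minus struck-through elements."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_tag = None   # tag name that opened the skipped region
        self.skip_depth = 0    # nesting depth of that same tag name

    @staticmethod
    def _is_struck(tag, attrs):
        if tag in STRIKE_TAGS:
            return True
        style = dict(attrs).get("style") or ""
        return "line-through" in style.lower()

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if tag not in VOID_TAGS and self._is_struck(tag, attrs):
            self.skip_tag, self.skip_depth = tag, 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.skip_tag is None:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_tag is None:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_tag is None:
            self.out.append(f"&#{name};")


def strip_strikethrough(html: str) -> str:
    """Return html with struck-through elements and their contents removed."""
    parser = StrikethroughStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The cleaned string could then be handed to `partition_html(text=...)`. Comments and doctype declarations are dropped by this sketch, which should not matter for text extraction. An unclosed strikethrough tag would swallow the rest of the document, so a real implementation inside the library would need to handle that case.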

Describe alternatives you've considered
None

Additional context
I'm using LangChain's UnstructuredHTMLLoader.

Metadata

Labels: enhancement (New feature or request)