Description
A question, and a feature request if there is no easy answer.
I want to add custom metadata to my documents, for example a UUID for each doc generated by a custom mapping function.
I'm not sure what the 'right solution' is, but a 'metadata phase' in the pipeline would cover it: a step that gets access to each doc and applies a user-supplied mapping function to generate and inject metadata.
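To make the request concrete, here is a minimal sketch of what I mean by a metadata phase. It is plain Python and does not use any existing pipeline API: `Doc`, `metadata_phase`, `uuid_from_source`, and the `source` / `metadata` keys are all hypothetical names I made up for illustration.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical document shape: a dict with a "source" identifier and an
# optional "metadata" dict. A real pipeline would have its own type.
Doc = Dict[str, Any]
MetadataFn = Callable[[Doc], Dict[str, Any]]


def uuid_from_source(doc: Doc) -> Dict[str, Any]:
    """Example mapping function: derive a stable UUID from the doc's source."""
    # uuid5 is deterministic, so re-ingesting the same source gives the same id.
    return {"uuid": str(uuid.uuid5(uuid.NAMESPACE_URL, doc["source"]))}


def metadata_phase(docs: Iterable[Doc], fn: MetadataFn) -> Iterator[Doc]:
    """Apply a user-supplied mapping function and merge its output into metadata."""
    for doc in docs:
        # Existing metadata is kept; keys returned by fn win on conflict.
        merged = {**doc.get("metadata", {}), **fn(doc)}
        yield {**doc, "metadata": merged}


docs = [{"source": "s3://bucket/a.pdf", "metadata": {"lang": "en"}}]
enriched = list(metadata_phase(docs, uuid_from_source))
```

The point is only that the mapping function sees the whole doc and returns a dict that gets merged into its metadata before any later step runs.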
Describe alternatives you've considered
I considered pulling from a data source like OpenSearch (I see I can get metadata and the doc id from OpenSearch by default), but my documents are full binary files that still need to be parsed, and the OpenSearch connector seems to expect only text documents to partition/chunk.
I would want the OpenSearch connector to let me retrieve the binary doc and then parse it according to its type (docx, pptx, pdf, html...).
Maybe there are solutions that I haven't found in the docs.
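For the OpenSearch alternative, this is roughly the behavior I have in mind, sketched in plain Python. It assumes the binary file is stored base64-encoded in the record (which is how OpenSearch `binary` fields hold data) next to a file name. The field names `filename` and `content_b64`, the `decode_record` function, and the parser labels are hypothetical, not part of any existing connector.

```python
import base64
from pathlib import PurePosixPath
from typing import Any, Dict, Tuple

# Hypothetical mapping from file suffix to the parser that should handle it.
PARSER_BY_SUFFIX = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".html": "html",
    ".htm": "html",
}


def decode_record(record: Dict[str, Any]) -> Tuple[str, bytes]:
    """Return (parser label, raw bytes) for one record pulled from the index."""
    raw = base64.b64decode(record["content_b64"])
    suffix = PurePosixPath(record["filename"]).suffix.lower()
    kind = PARSER_BY_SUFFIX.get(suffix)
    if kind is None:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return kind, raw


record = {
    "filename": "Report.PDF",
    "content_b64": base64.b64encode(b"%PDF-1.7").decode("ascii"),
}
kind, raw = decode_record(record)
```

The raw bytes would then go to whichever partitioner handles that type, instead of the record being treated as already-extracted text.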