Description
Requested feature
Several users have asked for images referenced in HTML to be handled and added to the converted DoclingDocument. We would like the flexibility to ignore the images, keep only their references, or embed them when the backend parses the document. This requires a way to pass parsing options to the backend. Currently this is not possible, since the __init__ method of all backends is restricted to the arguments in_doc and path_or_stream.
We could find a solution that is either specific to this backend (e.g., through HTMLFormatOption) or generic to all the DeclarativeDocumentBackend implementations.
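To make the request concrete, here is a minimal sketch of a backend that accepts an options object next to in_doc and path_or_stream. It is not Docling code: ImageMode, HTMLBackendOptions, SketchHTMLBackend, and the dictionary-based picture records are hypothetical names made up for illustration, and the embedded mode only decodes base64 data URIs to keep the example self-contained.

```python
import base64
from dataclasses import dataclass
from enum import Enum
from html.parser import HTMLParser
from io import BytesIO
from pathlib import Path


class ImageMode(str, Enum):
    """Hypothetical image-handling choices named after the three cases in this issue."""
    PLACEHOLDER = "placeholder"  # record that an image exists, nothing more
    REFERENCED = "referenced"    # keep only the src reference
    EMBEDDED = "embedded"        # keep the reference and the image bytes


@dataclass
class HTMLBackendOptions:
    """Hypothetical options object; not part of the current backend API."""
    image_mode: ImageMode = ImageMode.PLACEHOLDER


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def _decode_data_uri(src):
    # Assumption: only base64 data URIs are embedded here. A real backend
    # would also have to resolve relative paths and remote URLs.
    if src and src.startswith("data:") and ";base64," in src:
        return base64.b64decode(src.split(",", 1)[1])
    return None


class SketchHTMLBackend:
    """Toy stand-in for an HTML backend, with one extra 'options' argument."""

    def __init__(self, in_doc, path_or_stream, options=None):
        self.in_doc = in_doc
        self.path_or_stream = path_or_stream
        self.options = options or HTMLBackendOptions()

    def convert(self):
        # Accept either an in-memory stream or a filesystem path.
        if isinstance(self.path_or_stream, BytesIO):
            raw = self.path_or_stream.getvalue()
        else:
            raw = Path(self.path_or_stream).read_bytes()
        collector = _ImgSrcCollector()
        collector.feed(raw.decode("utf-8"))
        return [self._picture(src) for src in collector.sources]

    def _picture(self, src):
        mode = self.options.image_mode
        if mode is ImageMode.PLACEHOLDER:
            return {"uri": None, "data": None}
        if mode is ImageMode.REFERENCED:
            return {"uri": src, "data": None}
        return {"uri": src, "data": _decode_data_uri(src)}
```

With this shape, a caller would choose the behavior at construction time, for example SketchHTMLBackend(in_doc, stream, HTMLBackendOptions(ImageMode.REFERENCED)), and the default stays backward compatible because options is optional.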
#1411 started the discussion on this topic.
Alternatives
An alternative would be to create a separate backend implementation for each image-handling option (placeholder, referenced, and embedded). Commit 5d08b74 points in this direction.
However, this should not be the preferred option, since it is neither efficient nor flexible.