Allow parameters in HTML backend or any DeclarativeDocumentBackend implementation #1963

@ceberam

Description

Requested feature

Some users have asked for images referenced in HTML to be handled and added to the converted DoclingDocument. We would like the flexibility to ignore the images, keep only their references, or embed them when the backend parses the document. This requires passing parsing options to the backend, which is currently not possible because the __init__ method of all backends is restricted to the arguments in_doc and path_or_stream.
The solution could be either specific to this backend (e.g., through HTMLFormatOption) or generic to all DeclarativeDocumentBackend implementations.
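
To make the request concrete, here is a rough, self-contained sketch of what passing options to a backend could look like. It does not use docling at all: ImageMode, HTMLBackendOptions, SketchHTMLBackend and the options keyword are hypothetical names made up for illustration, and only in_doc and path_or_stream come from the current constructor signature described above.

```python
from dataclasses import dataclass
from enum import Enum
from html.parser import HTMLParser
from io import BytesIO
from pathlib import Path
from typing import Optional, Union


class ImageMode(str, Enum):
    """Hypothetical image-handling choices, mirroring the three cases in this issue."""

    PLACEHOLDER = "placeholder"  # ignore the image content, keep an empty picture
    REFERENCED = "referenced"    # keep only the URI from the src attribute
    EMBEDDED = "embedded"        # inline the image data in the document


@dataclass
class HTMLBackendOptions:
    """Hypothetical options object; not an existing docling class."""

    image_mode: ImageMode = ImageMode.PLACEHOLDER


class SketchHTMLBackend(HTMLParser):
    """Stand-in for a declarative backend that accepts parsing options."""

    def __init__(
        self,
        in_doc: object,
        path_or_stream: Union[BytesIO, Path],
        options: Optional[HTMLBackendOptions] = None,  # the proposed extra argument
    ) -> None:
        super().__init__()
        self.in_doc = in_doc
        self.path_or_stream = path_or_stream
        self.options = options or HTMLBackendOptions()
        self.pictures: list = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        mode = self.options.image_mode
        if mode is ImageMode.PLACEHOLDER:
            self.pictures.append({"uri": None, "embed": False})
        elif mode is ImageMode.REFERENCED:
            self.pictures.append({"uri": src, "embed": False})
        else:
            # A real backend would fetch and inline the bytes here;
            # this sketch only records that embedding was requested.
            self.pictures.append({"uri": src, "embed": True})

    def convert(self) -> list:
        """Parse the input and return the collected picture records."""
        if isinstance(self.path_or_stream, BytesIO):
            text = self.path_or_stream.getvalue().decode("utf-8")
        else:
            text = Path(self.path_or_stream).read_text(encoding="utf-8")
        self.feed(text)
        return self.pictures
```

With something along these lines, a caller would pick the behaviour per conversion, e.g. SketchHTMLBackend(in_doc, stream, options=HTMLBackendOptions(image_mode=ImageMode.REFERENCED)), and omitting options would keep a default behaviour. Whether the options object is specific to HTML or shared by all declarative backends is the open design question.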

#1411 started the discussion on this topic.

Alternatives

An alternative would be to create a separate backend implementation for each image-handling option (placeholder, referenced, and embedded). The commit 5d08b74 points in this direction.

However, this should not be the preferred option, since it is neither efficient nor flexible.

Labels

enhancement (New feature or request), html (issue related to html backend)