Supporting re-prompting VLM for picture description if the description is bad

### Requested feature

While piloting `do_picture_description` with [`docling==2.55.1`'s default](https://github.com/docling-project/docling/blob/v2.55.1/docling/datamodel/pipeline_options.py#L223-L226) picture description model [HuggingFaceTB/SmolVLM-256M-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct), I found that almost none of the descriptions are useful. Here are the descriptions I got when processing a large PDF:

- In
- This image is a diagram showing different types of sensors and their response functions. The diagram is labeled with the names of the different types of sensors and their response functions. The diagram is divided into two main sections: the left section shows the response function of the sensor, while the right section shows the response function of the sensor. The response function of the sensor is represented by a line graph, and the response function of the sensor is represented by a curve.
- This
- In
- This
- In
- This
- This
- This
- In
- The
- This
- The x-axis label is "Time (s)."
- The y-axis label is "Membrane
- This
- In

Most of them are a single word. Setting aside `HuggingFaceTB/SmolVLM-256M-Instruct`'s poor performance, this shows that Docling could benefit from a feature where:

1. Some predicate checks whether the response is insufficient
    - For example, checking common failure modes: `lambda x: x.strip().lower() in {"in", "this", "the"}`
2. If it is, the picture description model is re-prompted with the failure details

    ```none
    Describe this image in a few sentences.

    Your previous description of "This" is insufficient.
    ```
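The two steps above could be sketched roughly as follows. This is only an illustration, not Docling's API: `describe` is a hypothetical callable standing in for whatever sends a prompt (plus the image) to the VLM, and `describe_with_retries`, `is_insufficient`, and `max_retries` are names made up for this example.

```python
from typing import Callable


def is_insufficient(description: str) -> bool:
    """Hypothetical predicate: flags the one-word failure modes listed above."""
    return description.strip().lower() in {"in", "this", "the"}


def describe_with_retries(
    describe: Callable[[str], str],  # assumption: wraps the VLM call for one image
    prompt: str,
    is_bad: Callable[[str], bool] = is_insufficient,
    max_retries: int = 2,
) -> str:
    """Call the model, then re-prompt with failure details while the output is bad."""
    description = describe(prompt)
    for _ in range(max_retries):
        if not is_bad(description):
            break
        # Append the failure details to the original prompt, as in the example above.
        retry_prompt = (
            f"{prompt}\n\n"
            f'Your previous description of "{description.strip()}" is insufficient.'
        )
        description = describe(retry_prompt)
    # May still be insufficient if every retry failed; the caller decides what to do.
    return description
```

Capping the number of retries keeps the worst-case cost bounded, since a small model may keep returning the same one-word answer.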

### Alternatives

Expanding the `PictureDescriptionVlmOptions.prompt`:

- Include examples of desirable descriptions
- Mention a minimum word length
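
As a rough sketch of this alternative, the expanded prompt could be assembled like this. The helper `build_prompt`, its parameters, and the example description are hypothetical; the resulting string would be what gets assigned to `PictureDescriptionVlmOptions.prompt`.

```python
def build_prompt(examples: list[str], min_words: int) -> str:
    """Build a description prompt with a minimum length and sample descriptions."""
    lines = [
        f"Describe this image in a few sentences, using at least {min_words} words."
    ]
    if examples:
        lines.append("")
        lines.append("Examples of good descriptions:")
        lines.extend(f"- {example}" for example in examples)
    return "\n".join(lines)


# Hypothetical example description, written for illustration only.
prompt = build_prompt(
    examples=["A line graph of membrane potential over time, with time in seconds on the x-axis."],
    min_words=20,
)
```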

Supporting re-prompting VLM for picture description if the description is bad #2412
