Picture descriptions missing from export_to_markdown() until after JSON round-trip #2615

@lakatosd

Bug

When converting a PDF with do_picture_description=True, the first call to result.document.export_to_markdown() omits the generated picture descriptions. After saving the document to JSON and loading it back with DoclingDocument.load_from_json, export_to_markdown() includes them.

Observations:

  • In the initial result.document, each entry in doc.pictures has meta=None and carries only annotations.
  • After the JSON round-trip, each entry in loaded_doc.pictures has both annotations and meta, and the markdown includes the descriptions.
  • Expected behavior: both the initial document and the loaded one should include the generated picture descriptions in the markdown.
  • It looks like the picture description is generated but not attached to (or not surfaced through) doc.pictures[*].meta before serialization. The JSON round-trip likely triggers a normalization step that merges annotations into meta, which lets the markdown export pick them up.
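The first two observations can be checked with a small helper like the one below. This is a minimal sketch: summarize_pictures is a hypothetical name, and the only attributes it assumes are the ones mentioned in this report (pictures, meta, annotations). It is duck-typed, so the stand-in object at the bottom is only there to show the shape of the output without running a conversion.

```python
from types import SimpleNamespace


def summarize_pictures(doc):
    """Return one (index, has_meta, annotation_count) tuple per picture.

    Hypothetical helper: it only reads the attributes named in this
    report and does not call any docling API.
    """
    rows = []
    for index, picture in enumerate(getattr(doc, "pictures", [])):
        has_meta = getattr(picture, "meta", None) is not None
        annotation_count = len(getattr(picture, "annotations", None) or [])
        rows.append((index, has_meta, annotation_count))
    return rows


# Stand-in for a freshly converted document: one picture, meta unset,
# one annotation. Replace with result.document or loaded_doc.
demo = SimpleNamespace(
    pictures=[SimpleNamespace(meta=None, annotations=["description"])]
)
print(summarize_pictures(demo))  # prints [(0, False, 1)]
```

Calling it on result.document and on loaded_doc should make the difference described above visible side by side, if the report is accurate.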

Steps to reproduce

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, PictureDescriptionApiOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_core.types import DoclingDocument
import os


def openai_compatible_vlm_options(model: str, prompt: str, max_tokens: int = 4096, api_key: str = ""):
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    options = PictureDescriptionApiOptions(
        url="https://api.openai.com/v1/chat/completions",
        params=dict(
            model=model,
            max_tokens=max_tokens,
        ),
        headers=headers,
        prompt=prompt,
        timeout=90,
        scale=2.0,
    )
    return options

converter = DocumentConverter()
pipeline_options = PdfPipelineOptions()
pipeline_options.enable_remote_services = True
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = openai_compatible_vlm_options(
    model="gpt-4.1-mini",
    prompt="Describe the image in three sentences. Be concise and accurate.",
    api_key=os.getenv("OPENAI_API_KEY", ""),
)
converter.format_to_options[InputFormat.PDF] = PdfFormatOption(pipeline_options=pipeline_options)


source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)

markdown = result.document.export_to_markdown()  # does not contain the image description

result.document.save_as_json("docling_test.json")
loaded_doc = DoclingDocument.load_from_json("docling_test.json")
loaded_markdown = loaded_doc.export_to_markdown()  # contains the image description
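Until this is fixed, the round-trip can probably be done in memory instead of through a file. The sketch below assumes that serializing with export_to_dict() and re-validating with pydantic's model_validate_json() goes through the same normalization as save_as_json / load_from_json; that is not confirmed here. roundtrip_in_memory and lines_only_in_second are hypothetical helper names, and the second one is just a quick way to see which markdown lines (for example the descriptions) appear only after the round-trip.

```python
import json


def roundtrip_in_memory(doc):
    """Serialize a document to JSON text and validate it back.

    Assumption: the document exposes export_to_dict() and its class
    exposes model_validate_json(), as a pydantic-based DoclingDocument
    is expected to.
    """
    payload = json.dumps(doc.export_to_dict())
    return type(doc).model_validate_json(payload)


def lines_only_in_second(first_markdown, second_markdown):
    """Return non-blank lines of second_markdown that are absent from first_markdown."""
    seen = set(first_markdown.splitlines())
    return [
        line
        for line in second_markdown.splitlines()
        if line.strip() and line not in seen
    ]
```

With the reproduction above, this would be used as reloaded = roundtrip_in_memory(result.document), followed by lines_only_in_second(markdown, reloaded.export_to_markdown()) to list what the first export left out.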

Docling version

2.61.2

Python version

Python 3.10.19
