Closed
Labels: bug
Description
Bug
When converting a PDF with do_picture_description=True, the initial result.document.export_to_markdown() does not include the generated image descriptions. However, after saving the document to JSON and loading it back (DoclingDocument.load_from_json), export_to_markdown() does include the descriptions.
Observations:
- In the initial result.document, doc.pictures entries have meta=None and only annotations.
- After the JSON round-trip, loaded_doc.pictures entries contain both annotations and meta, and the markdown includes the descriptions.
- Expected behavior: both the initial document and the loaded one should include the generated picture descriptions in the markdown.
- It looks like the picture description metadata is generated but not attached (or not surfaced) to doc.pictures[*].meta before markdown export. The JSON round-trip likely triggers a normalization step that merges annotations into meta, which lets markdown rendering pick them up.
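The suspected mechanism can be illustrated with a toy model. This is a minimal, self-contained sketch under stated assumptions: ToyPicture, to_markdown, normalize and json_round_trip are hypothetical stand-ins, not docling or docling-core APIs. The assumption that the renderer reads descriptions only from meta, and that loading migrates annotations into meta, comes from the hypothesis above rather than from the library source.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ToyPicture:
    """Hypothetical stand-in for a picture item; NOT docling's PictureItem."""
    caption: str
    # Legacy-style annotations, e.g. {"kind": "description", "text": "..."}
    annotations: list = field(default_factory=list)
    # Newer-style metadata; the toy renderer below only reads from here.
    meta: Optional[dict] = None


def to_markdown(pic: ToyPicture) -> str:
    """Render a picture; only meta["description"] is surfaced (assumed behavior)."""
    lines = ["<!-- image -->", pic.caption]
    if pic.meta and "description" in pic.meta:
        lines.append(pic.meta["description"])
    return "\n".join(lines)


def normalize(pic: ToyPicture) -> ToyPicture:
    """Hypothetical load-time migration: copy a description annotation into meta."""
    if pic.meta is None:
        for ann in pic.annotations:
            if ann.get("kind") == "description":
                pic.meta = {"description": ann["text"]}
                break
    return pic


def json_round_trip(pic: ToyPicture) -> ToyPicture:
    """Serialize to JSON and back (a fresh object), running normalize() on load."""
    data = json.loads(json.dumps(asdict(pic)))
    return normalize(ToyPicture(**data))


# Freshly "converted" picture: description stored only as an annotation.
pic = ToyPicture(
    caption="Figure 1: Architecture.",
    annotations=[{"kind": "description", "text": "A pipeline diagram."}],
)
before = to_markdown(pic)
after = to_markdown(json_round_trip(pic))
print("A pipeline diagram." in before)  # False
print("A pipeline diagram." in after)   # True
```

In this toy model, calling normalize() on the in-memory object before export would produce the same markdown as the file round-trip, which is the behavior the report expects from the initial document.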
Steps to reproduce
import os

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, PictureDescriptionApiOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_core.types import DoclingDocument


def openai_compatible_vlm_options(model: str, prompt: str, max_tokens: int = 4096, api_key: str = ""):
    # Picture-description options for an OpenAI-compatible chat completions endpoint.
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    options = PictureDescriptionApiOptions(
        url="https://api.openai.com/v1/chat/completions",
        params=dict(
            model=model,
            max_tokens=max_tokens,
        ),
        headers=headers,
        prompt=prompt,
        timeout=90,
        scale=2.0,
    )
    return options


pipeline_options = PdfPipelineOptions()
pipeline_options.enable_remote_services = True  # required for API-based picture description
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = openai_compatible_vlm_options(
    model="gpt-4.1-mini",
    prompt="Describe the image in three sentences. Be concise and accurate.",
    api_key=os.getenv("OPENAI_API_KEY", ""),
)

converter = DocumentConverter()
converter.format_to_options[InputFormat.PDF] = PdfFormatOption(pipeline_options=pipeline_options)

source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
markdown = result.document.export_to_markdown()  # does not contain the image descriptions

# Round-trip the same document through JSON.
result.document.save_as_json("docling_test.json")
loaded_doc = DoclingDocument.load_from_json("docling_test.json")
loaded_markdown = loaded_doc.export_to_markdown()  # contains the image descriptions

Docling version
2.61.2
Python version
Python 3.10.19
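To turn the reproduction into a self-checking test, the two markdown exports can be compared line by line. A minimal stdlib sketch, assuming only that both exports are plain strings: lines_only_in is a hypothetical helper, and the sample strings are invented placeholders for the markdown and loaded_markdown values produced by the script above.

```python
import difflib


def lines_only_in(second: str, first: str) -> list:
    """Return lines present in `second` but missing from `first`, in order."""
    # ndiff prefixes lines added in `second` with "+ "; unchanged lines get "  ".
    return [
        line[2:]
        for line in difflib.ndiff(first.splitlines(), second.splitlines())
        if line.startswith("+ ")
    ]


# Invented stand-ins for `markdown` and `loaded_markdown` from the reproduction.
markdown = "## Figure\n\n<!-- image -->\n\nText after."
loaded_markdown = "## Figure\n\n<!-- image -->\nA diagram of the pipeline.\n\nText after."

print(lines_only_in(loaded_markdown, markdown))  # ['A diagram of the pipeline.']
```

With the real outputs, assert not lines_only_in(loaded_markdown, markdown) fails while the bug is present and should pass once the initial document surfaces the descriptions.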