VLM is missing/making up information #2444

@tkaisermayer

Bug

I am using the minimal VLM pipeline to generate Markdown from a PDF.
I see that the model is altering the text. Is this a known limitation or a bug?

I used only page 2 from https://arxiv.org/pdf/2408.09869v5.pdf
ExtractPage_code.pdf

The PDF contains this sentence:
"All required model assets are downloaded to a local huggingface datasets cache on first use, unless you choose to pre-install the model assets in advance."

In the generated Markdown it reads:
"All required models are downloaded to a local huggingface dataset once you have downloaded the package."
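A word-level diff makes the change concrete. This is only a sketch using Python's standard `difflib`; the helper name `word_diff` is my own and not part of Docling.

```python
import difflib

# The two sentences quoted above: PDF text vs. generated Markdown.
original = (
    "All required model assets are downloaded to a local huggingface datasets "
    "cache on first use, unless you choose to pre-install the model assets in advance."
)
generated = (
    "All required models are downloaded to a local huggingface dataset "
    "once you have downloaded the package."
)


def word_diff(a: str, b: str) -> tuple[list[str], list[str]]:
    """Return (words dropped from a, words introduced in b) from a word-level diff."""
    a_words, b_words = a.split(), b.split()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b_words[j1:j2])
    return removed, added


removed, added = word_diff(original, generated)
print("dropped from the PDF text:", removed)
print("introduced by the model:", added)
```

The dropped words include the whole "unless you choose to pre-install ..." clause, so the meaning changes and this is not just a formatting difference.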

It even made up a whole paragraph (section 4):

## 4 Document generation

Docling generates a graph from the document. The graph is then used to construct a DCGL (Dedicated Graph-Cognitive Language Lenguaging) pipeline, which extracts the nodes from the document and transforms them into a graph. The graph is then used to construct a DCGL (Dedicated Graph-Cognitive Language Lenguaging) pipeline, which extracts the nodes from the document and transforms them into a graph.
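The made-up paragraph also repeats one sentence verbatim. A small check like the following could flag such repetition automatically. It is a rough sketch: the function name `repeated_sentences` and the naive sentence splitting are my own assumptions, not anything Docling provides.

```python
import re
from collections import Counter


def repeated_sentences(text: str) -> dict[str, int]:
    """Map each sentence that occurs more than once to its number of occurrences."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = Counter(sentences)
    return {sentence: n for sentence, n in counts.items() if n > 1}


# The fabricated paragraph from the output above.
looped = (
    "The graph is then used to construct a DCGL (Dedicated Graph-Cognitive "
    "Language Lenguaging) pipeline, which extracts the nodes from the document "
    "and transforms them into a graph."
)
paragraph = "Docling generates a graph from the document. " + looped + " " + looped

for sentence, n in repeated_sentences(paragraph).items():
    print(f"{n}x: {sentence}")
```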

The resulting markdown is here:
output_ExtractPage_code.md

...

Steps to reproduce

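from pathlib import Path

from docling.datamodel import vlm_model_specs
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline
from docling_core.types.doc import ImageRefMode

source = Path("ExtractPage_code.pdf")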
# create directory for outputs
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)

vlm_pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,  # <-- change the model here
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=vlm_pipeline_options,
        )
    }
)

result = converter.convert(source)
output_path = output_dir / f"output_{source.stem}"
result.document.save_as_markdown(
    output_path.with_suffix(".md"),
    image_mode=ImageRefMode.PLACEHOLDER,
    include_annotations=True,
)
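
To narrow down which parts of the output are fabricated, the generated Markdown could be compared sentence by sentence against a reference text, for example the PDF's embedded text layer. How that reference is obtained is left open here. The sketch below uses only the standard library; `unsupported_sentences` and the `0.8` threshold are hypothetical choices of mine, and the sample strings are just illustrative.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unsupported_sentences(
    generated: str, reference: str, threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return generated sentences whose best match in the reference is below threshold."""
    ref_sentences = split_sentences(reference)
    flagged = []
    for sentence in split_sentences(generated):
        # Best character-level similarity against any reference sentence.
        best = max(
            (
                difflib.SequenceMatcher(
                    None, sentence.lower(), ref.lower(), autojunk=False
                ).ratio()
                for ref in ref_sentences
            ),
            default=0.0,
        )
        if best < threshold:
            flagged.append((sentence, round(best, 2)))
    return flagged


# Illustrative inputs only; in practice these would be the full texts.
reference_text = (
    "Docling is simple to install. "
    "All required model assets are downloaded on first use."
)
generated_text = (
    "Docling is simple to install. "
    "Docling generates a graph from the document."
)

flagged = unsupported_sentences(generated_text, reference_text)
for sentence, score in flagged:
    print(f"no close match (best ratio {score}): {sentence}")
```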

...

Docling version

Docling version: 2.55.1
Docling Core version: 2.48.4
Docling IBM Models version: 3.9.1
Docling Parse version: 4.5.0
Python: cpython-311 (3.11.9)
Platform: Windows-10-10.0.22631-SP0
...

Python version

Python 3.11.9
...
