
[Bug]: DoclingReader takes too much time to get documents when launched from Docker container #19994

@maxvelichko26


Bug Description

I am working on a RAG system whose query engine uses both VectorIndexRetriever and BM25Retriever. The latter requires nodes, which I generate from files loaded through MinioReader with DoclingReader as its file_extractor.
When I run load_data() natively, extracting about 170 .docx and .xlsx files from the MinIO container takes about 4 minutes. When the same code runs inside the Docker container application, the process takes more than an hour.
What could be the cause of this? The Docker container has no resource constraints, and I even added a GPU under deploy in the docker-compose file.
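To check the "no resource constraints" assumption, the same small script could be run natively and inside the main container and the two outputs compared. This is only a minimal diagnostic sketch using the standard library. The function name `visible_resources` is hypothetical, the cgroup path assumes a Linux host with cgroup v2, and a difference in the output would be a hint, not proof of the cause.

```python
import os
from pathlib import Path


def visible_resources() -> dict:
    """Collect the CPU-related resources this process can actually see."""
    info = {"cpu_count": os.cpu_count()}

    # Linux only: the set of CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))

    # cgroup v2 CPU quota (assumption: cgroup v2 is mounted at this path).
    # A value starting with "max" means no quota is set.
    cpu_max = Path("/sys/fs/cgroup/cpu.max")
    try:
        info["cgroup_cpu_max"] = cpu_max.read_text().strip()
    except OSError:
        info["cgroup_cpu_max"] = None

    # Thread-count variables that numeric libraries commonly honor.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        info[var] = os.environ.get(var)
    return info


if __name__ == "__main__":
    for key, value in visible_resources().items():
        print(f"{key}: {value}")
```

If the two environments report different CPU counts or thread settings, that would be worth adding to this report.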

Version

llama-index-readers-docling = 0.4.1

Steps to Reproduce

Build a Docker app with two containers, minio and main, and put some files into MinIO. Then load them from the main container with the function below.

For comparison, run the same function outside Docker, for example from a Jupyter notebook.

Function used to load the documents from MinIO:
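from docling.document_converter import DocumentConverter
from llama_index.readers.docling import DoclingReader
from llama_index.readers.minio import MinioReader


def get_documents_minio():
    # One DoclingReader instance is shared by every supported extension.
    reader = DoclingReader(
        export_type=DoclingReader.ExportType.MARKDOWN,
        doc_converter=DocumentConverter(),
    )
    file_extr = {ext: reader for ext in [".pdf", ".vdx", ".docx", ".xlsx"]}
    minio_reader = MinioReader(
        bucket="example",
        minio_endpoint="127.0.0.1:9000",
        minio_access_key="minioadmin",
        minio_secret_key="minioadmin",
        file_extractor=file_extr,
    )
    # Download every object in the bucket and convert it with Docling.
    documents = minio_reader.load_data()
    return documents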
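To make the native and Docker runs comparable with numbers instead of rough estimates, the call could be wrapped in a small timer. This is a generic sketch, not part of the original script. The helper name `timed` and the `timings` dictionary are hypothetical, and the sleep only stands in for the real `load_data()` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink):
    """Store the wall-clock duration of the wrapped block in sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


timings = {}
# Placeholder workload; in the real script this would wrap
# `documents = minio_reader.load_data()`.
with timed("load_data", timings):
    time.sleep(0.01)

print({label: f"{seconds:.3f}s" for label, seconds in timings.items()})
```

Running this around `load_data()` in both environments gives one figure per run that can be quoted directly.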

Relevant Logs/Tracebacks

I am not sure how relevant this is, but the Docker logs just show the reader working on each document for a long time, anywhere from seconds to multiple minutes.

2025-10-01 15:27:35 2025-10-01 12:27:35,230 - INFO - deleted item in tree at stack: (597,) => #/texts/644
2025-10-01 15:27:35 2025-10-01 12:27:35,247 - INFO - deleted item in tree at stack: (597,) => #/texts/644
2025-10-01 15:27:35 2025-10-01 12:27:35,263 - INFO - deleted item in tree at stack: (597,) => #/texts/644
2025-10-01 15:27:35 2025-10-01 12:27:35,280 - INFO - deleted item in tree at stack: (597,) => #/texts/644
2025-10-01 15:27:35 2025-10-01 12:27:35,298 - INFO - deleted item in tree at stack: (597,) => #/texts/644
2025-10-01 15:27:35 2025-10-01 12:27:35,324 - INFO - deleted item in tree at stack: (597,) => #/texts/646
2025-10-01 15:27:35 2025-10-01 12:27:35,341 - INFO - deleted item in tree at stack: (597,) => #/texts/646
2025-10-01 15:27:35 2025-10-01 12:27:35,358 - INFO - deleted item in tree at stack: (597,) => #/texts/646
2025-10-01 15:27:35 2025-10-01 12:27:35,375 - INFO - deleted item in tree at stack: (597,) => #/texts/646
2025-10-01 15:27:35 2025-10-01 12:27:35,391 - INFO - deleted item in tree at stack: (597,) => #/texts/646
2025-10-01 15:27:35 2025-10-01 12:27:35,416 - INFO - deleted item in tree at stack: (597,) => #/texts/648
2025-10-01 15:27:35 2025-10-01 12:27:35,432 - INFO - deleted item in tree at stack: (597,) => #/texts/648
2025-10-01 15:27:35 2025-10-01 12:27:35,448 - INFO - deleted item in tree at stack: (597,) => #/texts/648
2025-10-01 15:27:35 2025-10-01 12:27:35,464 - INFO - deleted item in tree at stack: (597,) => #/texts/648
2025-10-01 15:27:35 2025-10-01 12:27:35,481 - INFO - deleted item in tree at stack: (597,) => #/texts/648
2025-10-01 15:27:35 2025-10-01 12:27:35,507 - INFO - deleted item in tree at stack: (597,) => #/texts/650
2025-10-01 15:27:35 2025-10-01 12:27:35,523 - INFO - deleted item in tree at stack: (597,) => #/texts/650
2025-10-01 15:27:35 2025-10-01 12:27:35,540 - INFO - deleted item in tree at stack: (597,) => #/texts/650
2025-10-01 15:27:35 2025-10-01 12:27:35,564 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,580 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,598 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,616 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,633 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,649 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,666 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,682 - INFO - deleted item in tree at stack: (604,) => #/texts/656
2025-10-01 15:27:35 2025-10-01 12:27:35,699 - INFO - deleted item in tree at stack: (604,) => #/texts/656
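The excerpt above shows the same item reported several times, with messages a few tens of milliseconds apart. The gaps can be quantified with a short script such as the sketch below. The function name `log_gaps_ms` is hypothetical, the regular expression assumes the exact line format shown in this report, and the numbers only describe logging cadence, not the root cause.

```python
import re
from datetime import datetime

# Matches the application timestamp (the one with milliseconds), not the
# Docker prefix timestamp that precedes it on each line.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - ")

SAMPLE = [
    "2025-10-01 15:27:35 2025-10-01 12:27:35,230 - INFO - deleted item in tree at stack: (597,) => #/texts/644",
    "2025-10-01 15:27:35 2025-10-01 12:27:35,247 - INFO - deleted item in tree at stack: (597,) => #/texts/644",
    "2025-10-01 15:27:35 2025-10-01 12:27:35,263 - INFO - deleted item in tree at stack: (597,) => #/texts/644",
]


def log_gaps_ms(lines):
    """Return the gaps in milliseconds between consecutive log lines."""
    stamps = []
    for line in lines:
        match = STAMP_RE.search(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            )
    return [
        round((later - earlier).total_seconds() * 1000)
        for earlier, later in zip(stamps, stamps[1:])
    ]


print(log_gaps_ms(SAMPLE))  # [17, 16]
```

Feeding the full `docker logs` output through the same function, in both environments, would show whether each step is slower in Docker or whether there are simply many more steps.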
