Skip to content

Conversation

carlodek
Copy link

@carlodek carlodek commented Oct 1, 2025

Image description with LLM into docx docs

What I've done:

  1. Modified converter_utils/docx/pre_process.py to detect images and put the description generated by LLM into the right place.
  2. Moved _llm_caption file into main folder as it will be used by pre_process file too.
  3. Added an image to test it: docx_with_image_test.docx into test_files folder.

How to test:

I've tested it with AzureOpenAI, here it's a code snippet:

from packages.markitdown.src.markitdown import MarkItDown
from openai import AzureOpenAI

if __name__ == "__main__":
    AZURE_OPEN_AI_ENDPOINT = "<your_endpoint>
    AZURE_OPEN_AI_DEPLOYMENT = "<your_deployment>"
    AZURE_OPEN_AI_KEY = "<your_api_key>"
    AZURE_OPEN_AI_API_VERSION = "<your_version>"
    file_path = "tests/test_files/docx_with_image_test.docx"
    client = AzureOpenAI(
        azure_endpoint=AZURE_OPEN_AI_ENDPOINT,
        api_key=AZURE_OPEN_AI_KEY,
        api_version=AZURE_OPEN_AI_API_VERSION
    )
    md = MarkItDown(llm_client=client, llm_model=AZURE_OPEN_AI_DEPLOYMENT, llm_prompt="Please describe the image")
    result = md.convert(file_path)
    print(result.markdown)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants