Allow controlling the maximum image size before feeding an image into LLMImageBlobParser #30391

Open
@alberto-agudo

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from dotenv import load_dotenv

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.document_loaders.parsers.images import LLMImageBlobParser
from langchain_aws.chat_models import ChatBedrock

def main():
    # Include Bedrock credentials
    load_dotenv()
    
    # Ingest document
    # Note you can download this file from: https://documents1.worldbank.org/curated/en/099101824180532047/pdf/BOSIB13bdde89d07f1b3711dd8e86adb477.pdf
    fp = "./data/world-bank-report-example.pdf"
    
    prompt = (
        "You are an assistant tasked with describing images for retrieval. "
        "1. These descriptions will be embedded and used to retrieve the raw image. "
        "Give a concise description of the image that is well optimized for retrieval\n"
        "2. extract all the text from the image. "
        "Do not exclude any content from the page.\n"
        "Format your answer in markdown without explanatory text "
        "and without markdown delimiter ``` at the beginning. "
    )

    # 1) Load and parse documents
    llm_img_parser = ChatBedrock(
        model_id = "anthropic.claude-3-sonnet-20240229-v1:0",
        model_kwargs=dict(temperature=0.1)
    )

    img_parser = LLMImageBlobParser(
        model=llm_img_parser,
        prompt=prompt
    )
    loader = PyMuPDFLoader(
        file_path=fp,
        mode="page",
        extract_images=True, 
        images_parser=img_parser,
        extract_tables="markdown",
        images_inner_format="text"
    )
    
    docs = []
    docs_lazy = loader.lazy_load()
    
    for doc in docs_lazy:
        print(f"Processing doc {doc}")
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)


if __name__ == "__main__": 
    main()

Error Message and Stack Trace (if applicable)

Error raised by bedrock service
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/langchain_aws/llms/bedrock.py", line 956, in _prepare_input_and_invoke
    response = self.client.invoke_model(**request_options)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 570, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/context.py", line 124, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1031, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: messages.0.content.0.image.source: image exceeds 5 MB maximum: 8033316 bytes > 5242880 bytes
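
To put the two numbers in the error message in context, here is a small sketch of the arithmetic. The helper name `linear_scale_to_fit` is hypothetical, and the idea that encoded size shrinks roughly in proportion to pixel count is an assumption, not something the service or LangChain guarantees.

```python
# Numbers copied from the ValidationException above.
LIMIT_BYTES = 5 * 1024 * 1024   # the "5 MB maximum" expressed in bytes
ACTUAL_BYTES = 8_033_316        # size reported for the rejected image


def linear_scale_to_fit(actual_bytes: int, limit_bytes: int) -> float:
    """Rough width/height multiplier needed to get under the limit.

    ASSUMPTION: the encoded size is roughly proportional to the pixel
    count, so halving the area halves the bytes. Real encoders (PNG,
    JPEG) only follow this approximately.
    """
    if actual_bytes <= limit_bytes:
        return 1.0
    return (limit_bytes / actual_bytes) ** 0.5


print(LIMIT_BYTES)                                              # 5242880
print(f"{ACTUAL_BYTES / LIMIT_BYTES:.2f}x over the limit")      # 1.53x over the limit
print(f"{linear_scale_to_fit(ACTUAL_BYTES, LIMIT_BYTES):.2f}")  # 0.81
```

Under that assumption, scaling each side of the offending image to about 80% would be enough, which suggests a modest resize rather than aggressive compression.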

Description

  • I want to use LLMImageBlobParser to extract descriptions from the images in a PDF, as an addition to the PyMuPDF document loader.
  • I am using Anthropic models on Bedrock to parse these images. I am aware that the maximum size for an image passed to Anthropic models is 5 MB.
  • The PDF parsers do not take this size limit into account when they parse images internally. There should be a helper function that resizes images to a maximum size in MB, which the user knows in advance and can specify when constructing, for instance, the PyMuPDF document loader.
  • Here is where the images are created, which is where the size-reduction helper should go:
    img_list = page.get_images()
    images = []
    for img in img_list:
        if self.images_parser:
            xref = img[0]
            pix = pymupdf.Pixmap(doc, xref)
            image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, -1
            )
            image_bytes = io.BytesIO()
            numpy.save(image_bytes, image)
            blob = Blob.from_data(
                image_bytes.getvalue(), mime_type="application/x-npy"
            )
            image_text = next(self.images_parser.lazy_parse(blob)).page_content
  • Another option, probably more modular, would be to modify this part of LLMImageBlobParser instead:
    def _analyze_image(self, img: "Image") -> str:
        """Analyze an image using the provided language model.
        Args:
            img: The image to be analyzed.
        Returns:
            The extracted textual content.
        """
        image_bytes = io.BytesIO()
        img.save(image_bytes, format="PNG")
        img_base64 = base64.b64encode(image_bytes.getvalue()).decode("utf-8")
        msg = self.model.invoke(
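
A minimal sketch of what such a size-reduction helper could look like. None of this exists in LangChain: the names `shrink_to_fit`, `next_size` and `MAX_IMAGE_BYTES` are hypothetical, and the 10% safety margin is an arbitrary choice. The helper only relies on the `size`, `save` and `resize` members that a Pillow `Image` provides, so it does not import Pillow itself. Whether the service measures the raw or the base64-encoded bytes is not established in this issue, so a caller may want to pass a smaller `max_bytes` for headroom.

```python
import io
import math

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5242880, the limit quoted in the error above


def next_size(width, height, current_bytes, max_bytes, margin=0.9):
    """Estimate a smaller (width, height) that should fit in max_bytes.

    ASSUMPTION: encoded size scales roughly with pixel count, so each
    side is scaled by sqrt(max_bytes / current_bytes), times a margin.
    """
    scale = math.sqrt(max_bytes / current_bytes) * margin
    return max(1, int(width * scale)), max(1, int(height * scale))


def shrink_to_fit(img, max_bytes=MAX_IMAGE_BYTES, fmt="PNG"):
    """Re-encode `img`, downscaling until the encoded bytes fit.

    `img` is expected to behave like a Pillow image: `img.size` is
    (width, height), `img.save(fp, format=...)` writes encoded bytes,
    and `img.resize((w, h))` returns a new image.
    Returns a tuple (possibly resized image, encoded bytes).
    """
    while True:
        buf = io.BytesIO()
        img.save(buf, format=fmt)
        data = buf.getvalue()
        if len(data) <= max_bytes:
            return img, data
        if max(img.size) <= 1:
            # Nothing left to shrink; give up rather than loop forever.
            raise ValueError("image cannot be reduced below max_bytes")
        img = img.resize(next_size(*img.size, len(data), max_bytes))
```

One possible wiring, untested here, would be a subclass of LLMImageBlobParser whose `_analyze_image` passes `img` through `shrink_to_fit` and then delegates to `super()._analyze_image(...)` with the resized image. Exposing `max_bytes` as a constructor argument would let the user set the limit that matches their model provider.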

System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.9 (main, Feb 25 2025, 02:40:13) [GCC 12.2.0]

Package Information

langchain_core: 0.3.46
langchain: 0.3.21
langchain_community: 0.3.20
langsmith: 0.3.18
langchain_aws: 0.2.16
langchain_text_splitters: 0.3.7

Optional packages not installed

langserve

Other Dependencies

aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
boto3: 1.37.13
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.7: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.21: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy: 2.2.4
numpy<3,>=1.26.2: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: Installed. No version info available.
orjson: 3.10.15
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.6
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
pytest: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0

Metadata

Assignees

No one assigned

Labels

🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
