feat(llm): inline http(s) image URLs as base64 for models that require it (Kimi K2.6) #3219
Draft · juanmichelini wants to merge 1 commit

Conversation
feat(llm): inline http(s) image URLs as base64 for models that require it

Some model APIs reject http(s) `image_url` content blocks and only accept base64 data URLs — notably Moonshot's public Kimi endpoint, which fails SWE-bench Multimodal runs against `moonshot/kimi-k2.6` with "MoonshotException - Invalid request: unsupported image url" for every GitHub user-content URL (see #3155).

Approach:

- Add a typed capability bit `requires_inline_image_data` to `ModelFeatures`, sourced from a narrow substring list (`REQUIRES_INLINE_IMAGE_DATA_MODELS`, currently `moonshot/kimi-k2.6` only). This matches the existing pattern used for `force_string_serializer`, `send_reasoning_content`, etc.
- New `openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls` pass — mirrors the shape of `image_resize.py` — fetches each non-data URL on `ImageContent` and rewrites it to `data:{mime};base64,{...}`. A bounded in-memory LRU cache ensures the same image isn't re-downloaded every conversation turn. Fetch failures fall back to the original URL with a warning.
- Wire into `LLM.format_messages_for_llm` right before the existing resize pass (inline → resize chaining gives free large-image protection for the inlined path).
- Add `LLM.inline_image_urls: bool | None` as an explicit override for proxy/alias deployments that hide the underlying model from the capability substring match — same shape as `force_string_serializer: bool | None`.
- Activate for kimi-k2.6 in `.github/run-eval/resolve_model_config.py` via `inline_image_urls=True` (and add it to `SDK_ONLY_PARAMS` so it is not forwarded to the preflight `litellm.completion` call).

Co-authored-by: openhands <openhands@all-hands.dev>
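A minimal sketch of the capability-bit pattern, assuming a `get_features`-style substring match like the one already used for `force_string_serializer` (names simplified; this is illustrative, not the SDK's exact code):

```python
from dataclasses import dataclass

# Narrow substring list: only models confirmed to reject http(s) image URLs.
REQUIRES_INLINE_IMAGE_DATA_MODELS = ["moonshot/kimi-k2.6"]


@dataclass(frozen=True)
class ModelFeatures:
    requires_inline_image_data: bool


def get_features(model: str) -> ModelFeatures:
    # Substring match so proxy prefixes like "litellm_proxy/..." still hit,
    # while provider-hosted variants (e.g. "bedrock/kimi-k2.6") do not.
    name = model.lower()
    return ModelFeatures(
        requires_inline_image_data=any(
            pattern in name for pattern in REQUIRES_INLINE_IMAGE_DATA_MODELS
        )
    )


print(get_features("litellm_proxy/moonshot/kimi-k2.6").requires_inline_image_data)  # True
print(get_features("bedrock/kimi-k2.6").requires_inline_image_data)  # False
```

The substring (rather than provider-name) match is what lets `litellm_proxy/moonshot/kimi-k2.6` activate the bit while Bedrock/Fireworks-hosted Kimi stays untouched.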
Contributor
Python API breakage checks — ✅ PASSED

Contributor
REST API breakage checks (OpenAPI) — ✅ PASSED

Contributor
Coverage Report
Why
`swebenchmultimodal` evaluations against `litellm_proxy/moonshot/kimi-k2.6` fail 100% of conversations with `MoonshotException - Invalid request: unsupported image url`.

Moonshot's public Kimi API explicitly does not support URL-formatted images — only base64 `data:` URLs or `ms://<file_id>` references (see "Use the Kimi Vision Model"). LiteLLM's docs claim it downloads URLs → base64 when the upstream doesn't support URLs, but `litellm/llms/moonshot/chat/transformation.py` does not wire `convert_url_to_base64` (unlike Gemini, Bedrock Anthropic, Vertex, Azure-AI, OpenAI), so URLs are forwarded straight to `api.moonshot.ai` and rejected. Full investigation: #3155 (comment).

This PR adds the missing piece in the SDK so the public Kimi endpoint behaves the same as the private one.
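For context, a sketch of what changes on the wire. The `image_url` content block goes from an http(s) URL (rejected by the public endpoint) to a base64 data URL (accepted); the shapes below follow the OpenAI-style chat format that LiteLLM forwards, with placeholder URL and bytes:

```python
import base64

# What the public Kimi endpoint rejects: a plain http(s) image_url block.
rejected = {
    "type": "image_url",
    "image_url": {"url": "https://user-images.githubusercontent.com/example.png"},
}

# What it accepts: the same block with the image bytes inlined as a data URL.
png_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes for illustration
accepted = {
    "type": "image_url",
    "image_url": {
        "url": "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    },
}

print(accepted["image_url"]["url"])
```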
Summary
- `ModelFeatures.requires_inline_image_data: bool` capability bit, populated by a narrow substring list `REQUIRES_INLINE_IMAGE_DATA_MODELS` that currently contains only `"moonshot/kimi-k2.6"`. Same pattern as `force_string_serializer`, `send_reasoning_content`, etc. Provider-name matching is deliberately not used because (a) proxies/aliases erase the provider name and (b) the same provider can host the same model behind a URL-tolerant upstream (Bedrock/Azure-AI/Fireworks Kimi do not have this restriction).
- `openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls` pass that mirrors the shape of `image_resize.py`. For each `ImageContent`, any entry that isn't already `data:` is fetched and rewritten as `data:{mime};base64,{...}`. Bounded in-memory LRU cache (64 MB by default, configurable via `OH_INLINE_IMAGE_CACHE_BYTES`) so the same image isn't re-downloaded every conversation turn. Per-image size cap (`OH_INLINE_IMAGE_MAX_MB`, default 20 MB) and 30 s fetch timeout. Fetch failures fall back to the original URL with a warning — we never silently drop images.
- `LLM.inline_image_urls: bool | None` explicit override (same `bool | None` shape as `force_string_serializer`) for proxy/alias deployments that hide the underlying model from the capability substring match.
- Wired into `LLM.format_messages_for_llm` right before the existing resize pass. Inline → resize chaining means the existing `data:image/…`-only resize util now also protects oversized Moonshot images for free.
- Activated for `kimi-k2.6` (and only that model) in `.github/run-eval/resolve_model_config.py` via `"inline_image_urls": True`. Added `"inline_image_urls"` to `SDK_ONLY_PARAMS` so it is not forwarded to the preflight `litellm.completion` call.

Issue Number
Fixes #3155.
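The fetch-and-rewrite step described in the summary can be sketched as follows. This is a simplified illustration, not the SDK's actual implementation: cache, size cap, and timeout are omitted, and the fetch is injected as a callable so the example runs without network:

```python
import base64
from typing import Callable


def inline_image_url(url: str, fetch: Callable[[str], tuple[bytes, str]]) -> str:
    """Rewrite an http(s) image URL to a data: URL; pass data: URLs through."""
    if url.startswith("data:"):
        return url  # already inlined — no network call
    try:
        body, mime = fetch(url)
    except Exception:
        return url  # fetch failure: fall back to the original URL, never drop
    return f"data:{mime};base64,{base64.b64encode(body).decode('ascii')}"


# Fake fetcher standing in for an HTTP client, so the sketch is self-contained.
def fake_fetch(url: str) -> tuple[bytes, str]:
    return b"\x89PNG\r\n\x1a\n", "image/png"


print(inline_image_url("https://example.com/a.png", fake_fetch))
# → data:image/png;base64,iVBORw0KGgo=
print(inline_image_url("data:image/png;base64,AAAA", fake_fetch))
# → data:image/png;base64,AAAA (unchanged)
```

The fall-back-to-original-URL branch is what preserves the "we never silently drop images" guarantee; at worst the upstream sees the same URL it would have seen without this pass.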
How to Test
Unit tests (in this repo):
```
uv run pytest tests/sdk/llm/test_image_inline.py \
  tests/sdk/llm/test_model_features.py \
  tests/sdk/llm/test_llm_image_resizing.py \
  tests/sdk/llm/test_llm.py \
  tests/cross/test_resolve_model_config.py -q
```

Result on this branch: 255 passed.
Full LLM suite still green:

```
uv run pytest tests/sdk/llm -q  # => 741 passed
```

The new test file `tests/sdk/llm/test_image_inline.py` covers:

- http(s) URL → `data:image/png;base64,…` end-to-end.
- `data:` URLs pass through unchanged with no network call.
- `get_features("moonshot/kimi-k2.6").requires_inline_image_data is True`, plus negative cases for sibling Kimi models and Bedrock/Fireworks-hosted Kimi.
- `LLM.inline_image_urls=True` opts in even on unrelated models.
- `LLM(model="litellm_proxy/moonshot/kimi-k2.6")` auto-inlines without override.
- `LLM.inline_image_urls=False` opts out even when the capability default would opt in (verified by asserting `httpx.Client` is never constructed).

End-to-end (still TODO before merging):
A full `swebenchmultimodal` rerun against `litellm_proxy/moonshot/kimi-k2.6` is the canonical real-world test — the failure mode reproduced in #3155 is exactly the upstream API rejecting `image_url.url` http(s) values, so unit tests can only confirm we are now sending base64 instead. Suggested rollout: trigger one small `eval_limit=10` `swebenchmultimodal` run via the existing eval-trigger workflow with `MODEL_IDS=kimi-k2.6` and confirm the new run has 0 `LLMBadRequestError: MoonshotException - Invalid request: unsupported image url` failures (the public run linked in the issue had 10/10).

Video/Screenshots
N/A — backend-only change, no UI.
Type
Notes
- The substring list is deliberately narrow (only `moonshot/kimi-k2.6` today). Reviewers should expand it only after confirming a specific upstream rejects URL images in production. The same model behind Bedrock/Azure-AI/Fireworks does not need it.
- Defaults are backward compatible: `requires_inline_image_data=False`, `LLM.inline_image_urls=None`. Old serialised configs deserialise unchanged.
- An upstream LiteLLM fix (wiring `convert_url_to_base64` into the Moonshot transformer, mirroring `gemini/chat/transformation.py`) would benefit every LiteLLM consumer and let us remove this pass. That is independent of this PR.

This PR was created by an AI agent (OpenHands) on behalf of @juanmichelini.
@juanmichelini can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm

Pull (multi-arch manifest)

```
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:04c9fe3-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The base tag (e.g. `04c9fe3-python`) is a multi-arch manifest supporting both amd64 and arm64
- Arch-specific tags (e.g. `04c9fe3-python-amd64`) are also available if needed