
feat(llm): inline http(s) image URLs as base64 for models that require it (Kimi K2.6) #3219

Draft
juanmichelini wants to merge 1 commit into main from openhands/inline-image-urls-moonshot


@juanmichelini commented May 12, 2026

  • A human has tested these changes.

Why

swebenchmultimodal evaluations against litellm_proxy/moonshot/kimi-k2.6 fail 100% of conversations with:

litellm.BadRequestError: MoonshotException - Invalid request: unsupported image url: https://user-images.githubusercontent.com/...

Moonshot's public Kimi API explicitly does not support URL-formatted images; it accepts only base64 data: URLs or ms://<file_id> references (see Moonshot's "Use the Kimi Vision Model" guide):

URL-formatted images: Not supported, currently only supports base64-encoded image content and images/videos uploaded via file ID
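To make the difference concrete, here is a minimal sketch of the two content-block shapes, assuming the OpenAI-compatible `image_url` format that LiteLLM forwards (the `image_url.url` field is the value Moonshot rejects). The `to_data_url` helper, the example.com URL, and the PNG-signature bytes are illustrative placeholders, not code from this PR.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data: URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Rejected by Moonshot's public endpoint: a plain http(s) URL.
rejected_block = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/screenshot.png"},
}

# Accepted: the same kind of block, with the image bytes inlined as base64.
# The 8-byte PNG signature stands in for a real image here.
accepted_block = {
    "type": "image_url",
    "image_url": {"url": to_data_url(b"\x89PNG\r\n\x1a\n")},
}
```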

LiteLLM's docs claim it downloads URLs → base64 when the upstream doesn't support URLs, but litellm/llms/moonshot/chat/transformation.py does not wire convert_url_to_base64 (unlike Gemini, Bedrock Anthropic, Vertex, Azure-AI, OpenAI), so URLs are forwarded straight to api.moonshot.ai and rejected. Full investigation: #3155 (comment).

This PR adds the missing piece in the SDK so the public Kimi endpoint behaves the same as the private one.

Summary

  • ModelFeatures.requires_inline_image_data: bool capability bit, populated by a narrow substring list REQUIRES_INLINE_IMAGE_DATA_MODELS that currently contains only "moonshot/kimi-k2.6". Same pattern as force_string_serializer, send_reasoning_content, etc. Provider-name matching is deliberately not used because (a) proxies/aliases erase the provider name and (b) the same provider can host the same model behind a URL-tolerant upstream (Bedrock/Azure-AI/Fireworks Kimi do not have this restriction).
  • New openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls pass that mirrors the shape of image_resize.py. For each ImageContent, any entry that isn't already data: is fetched and rewritten as data:{mime};base64,{...}. Bounded in-memory LRU cache (64 MB by default, configurable via OH_INLINE_IMAGE_CACHE_BYTES) so the same image isn't re-downloaded every conversation turn. Per-image size cap (OH_INLINE_IMAGE_MAX_MB, default 20 MB) and 30 s fetch timeout. Fetch failures fall back to the original URL with a warning — we never silently drop images.
  • LLM.inline_image_urls: bool | None explicit override (same bool | None shape as force_string_serializer) for proxy/alias deployments that hide the underlying model from the capability substring match.
  • Wired into LLM.format_messages_for_llm right before the existing resize pass. Inline → resize chaining means the existing data:image/…-only resize util now also protects oversized Moonshot images for free.
  • Activated for kimi-k2.6 (and only that model) in .github/run-eval/resolve_model_config.py via "inline_image_urls": True. Added "inline_image_urls" to SDK_ONLY_PARAMS so it is not forwarded to the preflight litellm.completion call.
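The per-image size cap and fetch timeout described in the summary can be sketched as follows. This is a stdlib approximation under stated assumptions: it uses `urllib.request` as a stand-in for the `httpx` client the PR actually uses, it reads `OH_INLINE_IMAGE_MAX_MB` as a number of megabytes, and the `read_capped`, `fetch_image`, and `ImageTooLarge` names are hypothetical.

```python
import os
import urllib.request
from typing import Callable

# Assumption: OH_INLINE_IMAGE_MAX_MB is a (possibly fractional) number of MB.
MAX_IMAGE_BYTES = int(float(os.environ.get("OH_INLINE_IMAGE_MAX_MB", "20")) * 1024 * 1024)


class ImageTooLarge(ValueError):
    """Raised when a downloaded image exceeds the per-image cap."""


def read_capped(read: Callable[[int], bytes], max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read chunks from `read` until EOF, failing fast once `max_bytes` is exceeded."""
    buf = bytearray()
    while True:
        chunk = read(chunk_size)
        if not chunk:  # an empty chunk signals end of stream
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ImageTooLarge(f"image exceeds {max_bytes} bytes")


def fetch_image(url: str, timeout: float = 30.0, max_bytes: int = MAX_IMAGE_BYTES) -> tuple[str, bytes]:
    """Download an http(s) image and return (mime_type, raw_bytes)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"refusing to fetch non-http(s) URL: {url}")
    # urlopen raises HTTPError for 4xx/5xx responses, so errors surface as exceptions.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        mime = resp.headers.get_content_type()  # falls back to "text/plain" if absent
        return mime, read_capped(resp.read, max_bytes)
```

Reading in chunks and checking the running total means an oversized image is rejected as soon as it crosses the cap, rather than after the whole body has been buffered in memory.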

Issue Number

Fixes #3155.

How to Test

Unit tests (in this repo):

uv run pytest tests/sdk/llm/test_image_inline.py \
              tests/sdk/llm/test_model_features.py \
              tests/sdk/llm/test_llm_image_resizing.py \
              tests/sdk/llm/test_llm.py \
              tests/cross/test_resolve_model_config.py -q

Result on this branch: 255 passed.

Full LLM suite still green:

uv run pytest tests/sdk/llm -q
# => 741 passed

The new test file tests/sdk/llm/test_image_inline.py covers:

  • No-op fast path when the capability is off or vision is disabled.
  • HTTP URL is rewritten to data:image/png;base64,… end-to-end.
  • data: URLs pass through unchanged with no network call.
  • Fetch failure falls back to original URL.
  • In-memory cache deduplicates repeated URLs.
  • get_features("moonshot/kimi-k2.6").requires_inline_image_data is True, plus negative cases for sibling Kimi models and Bedrock/Fireworks-hosted Kimi.
  • LLM.inline_image_urls=True opts in even on unrelated models.
  • LLM(model="litellm_proxy/moonshot/kimi-k2.6") auto-inlines without override.
  • LLM.inline_image_urls=False opts out even when the capability default would opt in (verified by asserting httpx.Client is never constructed).
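The core behaviours these tests exercise (data: passthrough with no network call, http(s) rewrite to base64, and fallback to the original URL on fetch failure) can be illustrated with a simplified sketch. It operates on plain URL strings rather than the SDK's `ImageContent` objects, and the `inline_urls` function and injected `fetch` callable are hypothetical; injecting the fetcher is what lets a test avoid real network access.

```python
import base64
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical fetcher signature: url -> (mime_type, raw_bytes); raises on failure.
Fetcher = Callable[[str], tuple[str, bytes]]


def inline_urls(urls: list[str], fetch: Fetcher) -> list[str]:
    """Rewrite non-data URLs to base64 data: URLs, keeping originals on failure."""
    out: list[str] = []
    for url in urls:
        if url.startswith("data:"):
            out.append(url)  # already inline: pass through without fetching
            continue
        try:
            mime, body = fetch(url)
        except Exception as exc:
            # Never drop the image: keep the original URL and warn.
            logger.warning("Could not inline %s (%s); keeping original URL", url, exc)
            out.append(url)
            continue
        encoded = base64.b64encode(body).decode("ascii")
        out.append(f"data:{mime};base64,{encoded}")
    return out
```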

End-to-end (still TODO before merging):

A full swebenchmultimodal rerun against litellm_proxy/moonshot/kimi-k2.6 is the canonical real-world test — the failure mode reproduced in #3155 is exactly the upstream API rejecting image_url.url http(s) values, so unit tests can only confirm we are now sending base64 instead. Suggested rollout: trigger one small eval_limit=10 swebenchmultimodal run via the existing eval-trigger workflow with MODEL_IDS=kimi-k2.6 and confirm the new run has 0 LLMBadRequestError: MoonshotException - Invalid request: unsupported image url failures (the public run linked in the issue had 10/10).

Video/Screenshots

N/A — backend-only change, no UI.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • The capability list is intentionally narrow (only moonshot/kimi-k2.6 today). Reviewers should expand it only after confirming a specific upstream rejects URL images in production. The same model behind Bedrock/Azure-AI/Fireworks does not need it.
  • Defaults preserve every existing model's behaviour: requires_inline_image_data=False, LLM.inline_image_urls=None. Old serialised configs deserialise unchanged.
  • A parallel upstream fix in BerriAI/litellm (adding convert_url_to_base64 to the Moonshot transformer, mirroring gemini/chat/transformation.py) would benefit every LiteLLM consumer and let us remove this pass. That is independent of this PR.

This PR was created by an AI agent (OpenHands) on behalf of @juanmichelini.



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image
java      amd64, arm64    eclipse-temurin:17-jdk
python    amd64, arm64    nikolaik/python-nodejs:python3.13-nodejs22-slim
golang    amd64, arm64    golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:04c9fe3-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-04c9fe3-python \
  ghcr.io/openhands/agent-server:04c9fe3-python

All tags pushed for this build

ghcr.io/openhands/agent-server:04c9fe3-golang-amd64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-golang-amd64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-golang-amd64
ghcr.io/openhands/agent-server:04c9fe3-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:04c9fe3-golang-arm64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-golang-arm64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-golang-arm64
ghcr.io/openhands/agent-server:04c9fe3-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:04c9fe3-java-amd64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-java-amd64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-java-amd64
ghcr.io/openhands/agent-server:04c9fe3-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:04c9fe3-java-arm64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-java-arm64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-java-arm64
ghcr.io/openhands/agent-server:04c9fe3-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:04c9fe3-python-amd64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-python-amd64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-python-amd64
ghcr.io/openhands/agent-server:04c9fe3-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:04c9fe3-python-arm64
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-python-arm64
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-python-arm64
ghcr.io/openhands/agent-server:04c9fe3-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:04c9fe3-golang
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-golang
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-golang
ghcr.io/openhands/agent-server:04c9fe3-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:04c9fe3-java
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-java
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-java
ghcr.io/openhands/agent-server:04c9fe3-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:04c9fe3-python
ghcr.io/openhands/agent-server:04c9fe35a783c6909302760b13707c03bafcde20-python
ghcr.io/openhands/agent-server:openhands-inline-image-urls-moonshot-python
ghcr.io/openhands/agent-server:04c9fe3-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 04c9fe3-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 04c9fe3-python-amd64) are also available if needed

…e it

Some model APIs reject http(s) image_url content blocks and only accept
base64 data URLs (notably Moonshot's public Kimi endpoint, which fails
SWE-bench Multimodal runs against moonshot/kimi-k2.6 with
'MoonshotException - Invalid request: unsupported image url' for every
GitHub user-content URL — see #3155).

Approach:
- Add a typed capability bit `requires_inline_image_data` to
  `ModelFeatures`, sourced from a narrow substring list
  (`REQUIRES_INLINE_IMAGE_DATA_MODELS`, currently `moonshot/kimi-k2.6`
  only). This matches the existing pattern used for
  `force_string_serializer`, `send_reasoning_content`, etc.
- New `openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls`
  pass — mirrors the shape of `image_resize.py` — fetches each non-data
  URL on `ImageContent` and rewrites it to `data:{mime};base64,{...}`.
  Bounded in-memory LRU cache so the same image isn't re-downloaded
  every conversation turn. Fetch failures fall back to the original URL
  with a warning.
- Wire into `LLM.format_messages_for_llm` right before the existing
  resize pass (inline → resize chaining gives free large-image
  protection for the inlined path).
- Add `LLM.inline_image_urls: bool | None` as an explicit override for
  proxy/alias deployments that hide the underlying model from the
  capability substring match — same shape as
  `force_string_serializer: bool | None`.
- Activate for kimi-k2.6 in `.github/run-eval/resolve_model_config.py`
  via `inline_image_urls=True` (and add it to `SDK_ONLY_PARAMS` so
  it is not forwarded to the preflight `litellm.completion` call).

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@github-actions

Coverage

Coverage Report

File                                     Stmts   Miss  Cover  Missing
openhands-sdk/openhands/sdk/llm
   llm.py                                  545     88    83%  489, 513, 556, 813, 922, 924–925, 953, 999, 1010–1012, 1016, 1022–1025, 1027–1034, 1042–1044, 1054–1056, 1059–1060, 1064, 1067–1068, 1070–1071, 1073, 1312–1313, 1538–1539, 1548, 1573, 1575–1580, 1582–1599, 1602–1606, 1608–1609, 1615–1624, 1681, 1683
openhands-sdk/openhands/sdk/llm/utils
   image_inline.py                         105     13    87%  79, 83, 87–88, 142, 171–172, 181–182, 194, 198–200
   model_features.py                        60      1    98%  35
TOTAL                                    26500   7638    71%


Development

Successfully merging this pull request may close these issues.

kimi k-2.6 public endpoint gives error on swebenchmultimodal even but kimi k-2.6 private endpoint works
