feat(llm): inline http(s) image URLs as base64 for models that require it (Kimi K2.6) #3219
Draft · juanmichelini wants to merge 1 commit

Conversation
feat(llm): inline http(s) image URLs as base64 for models that require it

Some model APIs reject http(s) `image_url` content blocks and only accept base64 data URLs — notably Moonshot's public Kimi endpoint, which fails SWE-bench Multimodal runs against `moonshot/kimi-k2.6` with "MoonshotException - Invalid request: unsupported image url" for every GitHub user-content URL (see #3155).

Approach:

- Add a typed capability bit `requires_inline_image_data` to `ModelFeatures`, sourced from a narrow substring list (`REQUIRES_INLINE_IMAGE_DATA_MODELS`, currently `moonshot/kimi-k2.6` only). This matches the existing pattern used for `force_string_serializer`, `send_reasoning_content`, etc.
- New `openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls` pass — mirrors the shape of `image_resize.py` — fetches each non-data URL on `ImageContent` and rewrites it to `data:{mime};base64,{...}`. A bounded in-memory LRU cache ensures the same image isn't re-downloaded every conversation turn. Fetch failures fall back to the original URL with a warning.
- Wire into `LLM.format_messages_for_llm` right before the existing resize pass (inline → resize chaining gives free large-image protection for the inlined path).
- Add `LLM.inline_image_urls: bool | None` as an explicit override for proxy/alias deployments that hide the underlying model from the capability substring match — same shape as `force_string_serializer: bool | None`.
- Activate for kimi-k2.6 in `.github/run-eval/resolve_model_config.py` via `inline_image_urls=True` (and add it to `SDK_ONLY_PARAMS` so it is not forwarded to the preflight `litellm.completion` call).

Co-authored-by: openhands <openhands@all-hands.dev>
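A minimal sketch of the capability-bit pattern, assuming a `get_features`-style substring match like the one already used for `force_string_serializer` (names simplified; this is illustrative, not the SDK's exact code):

```python
from dataclasses import dataclass

# Narrow substring list: only models confirmed to reject http(s) image URLs.
REQUIRES_INLINE_IMAGE_DATA_MODELS = ["moonshot/kimi-k2.6"]


@dataclass(frozen=True)
class ModelFeatures:
    requires_inline_image_data: bool


def get_features(model: str) -> ModelFeatures:
    # Substring match so proxy prefixes like "litellm_proxy/..." still hit,
    # while provider-hosted variants (e.g. "bedrock/kimi-k2.6") do not.
    name = model.lower()
    return ModelFeatures(
        requires_inline_image_data=any(
            pattern in name for pattern in REQUIRES_INLINE_IMAGE_DATA_MODELS
        )
    )


print(get_features("litellm_proxy/moonshot/kimi-k2.6").requires_inline_image_data)  # True
print(get_features("bedrock/kimi-k2.6").requires_inline_image_data)  # False
```

The substring (rather than provider-name) match is what lets `litellm_proxy/moonshot/kimi-k2.6` activate the bit while Bedrock/Fireworks-hosted Kimi stays untouched.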
Contributor
Python API breakage checks — ✅ PASSED

Contributor
REST API breakage checks (OpenAPI) — ✅ PASSED

Contributor
Coverage Report
Why
`swebenchmultimodal` evaluations against `litellm_proxy/moonshot/kimi-k2.6` fail 100% of conversations with `MoonshotException - Invalid request: unsupported image url`.

Moonshot's public Kimi API explicitly does not support URL-formatted images — only base64 `data:` URLs or `ms://<file_id>` references (see "Use the Kimi Vision Model"). LiteLLM's docs claim it downloads URLs → base64 when the upstream doesn't support URLs, but `litellm/llms/moonshot/chat/transformation.py` does not wire `convert_url_to_base64` (unlike Gemini, Bedrock Anthropic, Vertex, Azure-AI, OpenAI), so URLs are forwarded straight to `api.moonshot.ai` and rejected. Full investigation: #3155 (comment).

This PR adds the missing piece in the SDK so the public Kimi endpoint behaves the same as the private one.
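For context, a sketch of what changes on the wire. The `image_url` content block goes from an http(s) URL (rejected by the public endpoint) to a base64 data URL (accepted); the shapes below follow the OpenAI-style chat format that LiteLLM forwards, with placeholder URL and bytes:

```python
import base64

# What the public Kimi endpoint rejects: a plain http(s) image_url block.
rejected = {
    "type": "image_url",
    "image_url": {"url": "https://user-images.githubusercontent.com/example.png"},
}

# What it accepts: the same block with the image bytes inlined as a data URL.
png_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes for illustration
accepted = {
    "type": "image_url",
    "image_url": {
        "url": "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    },
}

print(accepted["image_url"]["url"])
```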
Summary
- `ModelFeatures.requires_inline_image_data: bool` capability bit, populated by a narrow substring list `REQUIRES_INLINE_IMAGE_DATA_MODELS` that currently contains only `"moonshot/kimi-k2.6"`. Same pattern as `force_string_serializer`, `send_reasoning_content`, etc. Provider-name matching is deliberately not used because (a) proxies/aliases erase the provider name and (b) the same provider can host the same model behind a URL-tolerant upstream (Bedrock/Azure-AI/Fireworks Kimi do not have this restriction).
- `openhands.sdk.llm.utils.image_inline.maybe_inline_image_urls` pass that mirrors the shape of `image_resize.py`. For each `ImageContent`, any entry that isn't already `data:` is fetched and rewritten as `data:{mime};base64,{...}`. Bounded in-memory LRU cache (64 MB by default, configurable via `OH_INLINE_IMAGE_CACHE_BYTES`) so the same image isn't re-downloaded every conversation turn. Per-image size cap (`OH_INLINE_IMAGE_MAX_MB`, default 20 MB) and 30 s fetch timeout. Fetch failures fall back to the original URL with a warning — we never silently drop images.
- `LLM.inline_image_urls: bool | None` explicit override (same `bool | None` shape as `force_string_serializer`) for proxy/alias deployments that hide the underlying model from the capability substring match.
- Wired into `LLM.format_messages_for_llm` right before the existing resize pass. Inline → resize chaining means the existing `data:image/…`-only resize util now also protects oversized Moonshot images for free.
- Activated for `kimi-k2.6` (and only that model) in `.github/run-eval/resolve_model_config.py` via `"inline_image_urls": True`. Added `"inline_image_urls"` to `SDK_ONLY_PARAMS` so it is not forwarded to the preflight `litellm.completion` call.

Issue Number
Fixes #3155.
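The fetch-and-rewrite step described in the summary can be sketched as follows. This is a simplified illustration, not the SDK's actual implementation: cache, size cap, and timeout are omitted, and the fetch is injected as a callable so the example runs without network:

```python
import base64
from typing import Callable


def inline_image_url(url: str, fetch: Callable[[str], tuple[bytes, str]]) -> str:
    """Rewrite an http(s) image URL to a data: URL; pass data: URLs through."""
    if url.startswith("data:"):
        return url  # already inlined — no network call
    try:
        body, mime = fetch(url)
    except Exception:
        return url  # fetch failure: fall back to the original URL, never drop
    return f"data:{mime};base64,{base64.b64encode(body).decode('ascii')}"


# Fake fetcher standing in for an HTTP client, so the sketch is self-contained.
def fake_fetch(url: str) -> tuple[bytes, str]:
    return b"\x89PNG\r\n\x1a\n", "image/png"


print(inline_image_url("https://example.com/a.png", fake_fetch))
# → data:image/png;base64,iVBORw0KGgo=
print(inline_image_url("data:image/png;base64,AAAA", fake_fetch))
# → data:image/png;base64,AAAA (unchanged)
```

The fall-back-to-original-URL branch is what preserves the "we never silently drop images" guarantee; at worst the upstream sees the same URL it would have seen without this pass.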
How to Test
Unit tests (in this repo):
```
uv run pytest tests/sdk/llm/test_image_inline.py \
  tests/sdk/llm/test_model_features.py \
  tests/sdk/llm/test_llm_image_resizing.py \
  tests/sdk/llm/test_llm.py \
  tests/cross/test_resolve_model_config.py -q
```

Result on this branch: 255 passed.
Full LLM suite still green:

```
uv run pytest tests/sdk/llm -q  # => 741 passed
```

The new test file `tests/sdk/llm/test_image_inline.py` covers:

- http(s) URL → `data:image/png;base64,…` end-to-end.
- `data:` URLs pass through unchanged with no network call.
- `get_features("moonshot/kimi-k2.6").requires_inline_image_data is True`, plus negative cases for sibling Kimi models and Bedrock/Fireworks-hosted Kimi.
- `LLM.inline_image_urls=True` opts in even on unrelated models.
- `LLM(model="litellm_proxy/moonshot/kimi-k2.6")` auto-inlines without override.
- `LLM.inline_image_urls=False` opts out even when the capability default would opt in (verified by asserting `httpx.Client` is never constructed).

End-to-end (still TODO before merging):
A full `swebenchmultimodal` rerun against `litellm_proxy/moonshot/kimi-k2.6` is the canonical real-world test — the failure mode reproduced in #3155 is exactly the upstream API rejecting `image_url.url` http(s) values, so unit tests can only confirm we are now sending base64 instead. Suggested rollout: trigger one small `eval_limit=10` `swebenchmultimodal` run via the existing eval-trigger workflow with `MODEL_IDS=kimi-k2.6` and confirm the new run has 0 `LLMBadRequestError: MoonshotException - Invalid request: unsupported image url` failures (the public run linked in the issue had 10/10).

Video/Screenshots
N/A — backend-only change, no UI.
Type
Notes
- The substring list is deliberately narrow (only `moonshot/kimi-k2.6` today). Reviewers should expand it only after confirming a specific upstream rejects URL images in production. The same model behind Bedrock/Azure-AI/Fireworks does not need it.
- Defaults are backward compatible: `requires_inline_image_data=False`, `LLM.inline_image_urls=None`. Old serialised configs deserialise unchanged.
- An upstream LiteLLM fix (wiring `convert_url_to_base64` into the Moonshot transformer, mirroring `gemini/chat/transformation.py`) would benefit every LiteLLM consumer and let us remove this pass. That is independent of this PR.

This PR was created by an AI agent (OpenHands) on behalf of @juanmichelini.
@juanmichelini can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm

Pull (multi-arch manifest)

```
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:04c9fe3-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The base tag (e.g. `04c9fe3-python`) is a multi-arch manifest supporting both amd64 and arm64
- Arch-specific tags (e.g. `04c9fe3-python-amd64`) are also available if needed