fix(sdk): resize Anthropic many-image inputs#2552
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands pls merge from main, resolve all conflicts. Then do /codereview-roasted /github-pr-review |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable — Works, but the structure needs improvement
Linus's Three Questions:
- Is this solving a real problem? — Yes. Anthropic's many-image limit is a real production failure.
- Is there a simpler way? — Yes. This is ~80 lines of image manipulation code jammed into a 1500-line god-class. Extract it.
- What will this break? — Adding
pillowas a hard runtime dependency to the core SDK is the biggest concern. Every user now pays for PIL whether they use images or not.
VERDICT:
❌ Needs rework — The fix is directionally correct, but the dependency strategy and code placement need redesign before merging.
KEY INSIGHT:
The core problem is treating PIL as a hard SDK dependency and stuffing image-processing plumbing into the LLM class, when this should be a lazy-loaded utility module.
|
OpenHands encountered an error: Request timeout after 30 seconds to https://ypvqnvwqmvbumzax.prod-runtime.all-hands.dev/api/conversations/9650ebc3-b326-46ce-88ee-b683ce63e259/ask_agent See the conversation for more information. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands do /codereview-roasted /github-pr-review |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
xingyaoww
left a comment
There was a problem hiding this comment.
🟡 Acceptable — Core logic is correct and the tests are solid, but there are design issues worth addressing before merge.
Linus-Style Analysis:
The fundamental idea is sound: intercept oversized base64 images before they hit Anthropic's API limits. The Anthropic docs research is correct (20-image threshold, 2000px vs 8000px caps), and the resize logic itself is clean.
But the plumbing around it — threading a PIL module as Any through three functions, silent in-place mutation — adds unnecessary complexity that a simpler design would eliminate entirely.
This review was generated by an AI agent (OpenHands).
SummaryThe request was to perform a Checklist
Key Findings Posted
No extraneous changes were made — this was purely a review action with no code modifications. |
|
[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww @Zheng-Lu @openhands-ai[bot], could you please take a look when you have a chance? |
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
[Automatic Post]: It has been a while since there was any activity on this PR. @Zheng-Lu, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww, could you please take a look when you have a chance? |
|
@OpenHands in the existing example file that tests for vision support, can you add another send_message that sends image inputs that would properly exercise the stuff we've introduced in this PR |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Add a second send_message in 17_image_input.py that sends 21 oversized base64 images (2500×100 px each), triggering the Anthropic many-image threshold (>20 images → 2000 px cap). This exercises the image resizing path introduced in this PR end-to-end. Co-authored-by: openhands <openhands@all-hands.dev>
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
The wait-on-check-action was using github.event.pull_request.head.ref (branch name) to look up check runs. For fork PRs, the branch name doesn't exist in the upstream repo, causing a 422 error: 'No commit found for SHA: fix/2467-image-downscale' Switch to github.event.pull_request.head.sha which always resolves correctly regardless of whether the PR is from a fork. Co-authored-by: openhands <openhands@all-hands.dev>
The Build & Push workflow doesn't run for fork PRs, so the wait-on-check-action was timing out and failing with 'The requested check was never run against this ref'. Set fail-on-no-checks: false so the wait step is a no-op when the check doesn't exist (fork PRs) while still waiting properly when it does (non-fork PRs). Co-authored-by: openhands <openhands@all-hands.dev>
Fork PRs have restricted GITHUB_TOKEN permissions and cannot write comments on the upstream repo's PR. The update_comment function was calling exit 1 on failure, which aborted the entire test run. Change to emit warnings and continue when comment creation fails, clearing API_URL to skip subsequent comment attempts. Co-authored-by: openhands <openhands@all-hands.dev>
SummaryThe user asked me to fix the CI failure at Was the request addressed?
Are the changes concise?Yes — all three changes are to Remaining limitationThe 48/53 example test failures after the infrastructure fixes are due to GitHub Actions not exposing repository secrets ( |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @Zheng-Lu, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
@OpenHands please merge from main and resolve conflicts |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
…lict Keep main's litellm>=1.83.7 and PR's pillow>=12.1.1 dependency. Co-authored-by: openhands <openhands@all-hands.dev>
SummaryThe request to merge from Checklist
ConcisenessThe changes are minimal — only the merge commit resolving one conflict. No extraneous changes were made. |
xingyaoww
left a comment
There was a problem hiding this comment.
Thanks! Sorry for taking so long :(
Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Xingyao Wang <xingyaoww@gmail.com>
#2467
Summary
Reproduces and fixes the Anthropic many-image failure by resizing oversized base64 images during LLM message formatting.
What Changed
LLM.format_messages_for_llmpillowas a runtime dependency for in-memory image resizingValidation
pytest tests/sdk/llm/test_llm_image_resizing.pyPassedpytest tests/sdk/llm/test_llm_image_resizing.py tests/sdk/llm/test_vision_support.pyPassedruff check openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.pyPassedpyright openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.pyPassedProof
Now the multiple-images request with at least one image > 2000px doesn't throw the error litellm.BadRequestError
