fix: detect image MIME type via magic bytes and add count budget for tool results #4940

Open
mossgowild wants to merge 1 commit into microsoft:main from mossgowild:moss/fix-tool-result-image
@mossgowild mossgowild commented Apr 2, 2026

Problem

Two issues when tool results contain images (introduced by terminal image extraction in vscode#301560):

  1. Wrong MIME type for Anthropic: VS Code core infers mimeType from the file extension, which can disagree with the actual content (e.g. a file named .png containing JPEG data). When such an image is sent to Anthropic, the API rejects it with "declared media_type doesn't match actual image content".

  2. Too many images for Gemini: when terminal output or other tool results reference many screenshots/images, all of them are injected into the prompt with no limit, causing Gemini to fail with a max_prompt_images exceeded error.

Related: microsoft/vscode#307431

Fix

Magic bytes detection: detectImageMimeType() inspects the first bytes of image data (JPEG FF D8 FF, PNG 89 50 4E 47, GIF, WebP, BMP) and returns the correct MIME type. Used in imageDataPartToTSX() instead of trusting part.mimeType.
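For readers unfamiliar with the technique, here is a minimal sketch of magic-byte detection. It is not the code from this PR: the `startsWith` helper, the `ImageMimeType` type, and the choice to return `undefined` for unknown or too-short data are assumptions made for illustration. The signatures themselves are the standard ones for each format.

```typescript
// Illustrative sketch only; the PR's actual detectImageMimeType() may differ
// in signature and in how it handles unknown formats.
type ImageMimeType = 'image/jpeg' | 'image/png' | 'image/gif' | 'image/webp' | 'image/bmp';

// Hypothetical helper: true if `data` contains `signature` at `offset`.
function startsWith(data: Uint8Array, signature: readonly number[], offset = 0): boolean {
	if (data.length < offset + signature.length) {
		return false;
	}
	return signature.every((byte, i) => data[offset + i] === byte);
}

function detectImageMimeType(data: Uint8Array): ImageMimeType | undefined {
	if (startsWith(data, [0xff, 0xd8, 0xff])) {
		return 'image/jpeg';
	}
	if (startsWith(data, [0x89, 0x50, 0x4e, 0x47])) {
		return 'image/png';
	}
	// "GIF8" covers both GIF87a and GIF89a.
	if (startsWith(data, [0x47, 0x49, 0x46, 0x38])) {
		return 'image/gif';
	}
	// WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
	if (startsWith(data, [0x52, 0x49, 0x46, 0x46]) && startsWith(data, [0x57, 0x45, 0x42, 0x50], 8)) {
		return 'image/webp';
	}
	// "BM" is only two bytes, so it is checked last.
	if (startsWith(data, [0x42, 0x4d])) {
		return 'image/bmp';
	}
	// Assumption: unknown or too-short data yields undefined, leaving the
	// caller free to fall back to the declared MIME type.
	return undefined;
}
```

With this shape, a caller could write something like `detectImageMimeType(bytes) ?? part.mimeType`, so the declared type is only used when detection fails.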

Image count budget: PrimitiveToolResult now tracks imageCountBudgetLeft, initialized to endpoint.maxPromptImages / 2 (reserving budget for user-attached images). Once exhausted, additional images from tool results are silently skipped.
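The budget logic can be sketched roughly as follows. This is a simplified stand-in, not the PrimitiveToolResult code: `EndpointLike`, `ImageBudget`, `tryConsume`, and `selectImages` are hypothetical names, and both the rounding down of the halved limit and the "no limit when maxPromptImages is undefined" behavior are assumptions.

```typescript
// Illustrative sketch only; names and edge-case behavior are assumptions.
interface EndpointLike {
	maxPromptImages?: number;
}

class ImageBudget {
	private left: number;

	constructor(endpoint: EndpointLike | undefined) {
		const max = endpoint?.maxPromptImages;
		// Half of the limit goes to tool results; the rest stays available
		// for user-attached images. No declared limit means no cap here.
		this.left = max === undefined ? Number.POSITIVE_INFINITY : Math.floor(max / 2);
	}

	// Returns true and spends one unit if an image may still be included.
	tryConsume(): boolean {
		if (this.left <= 0) {
			return false;
		}
		this.left--;
		return true;
	}
}

// Keeps images in order until the budget runs out; the rest are skipped.
function selectImages<T>(images: readonly T[], budget: ImageBudget): T[] {
	return images.filter(() => budget.tryConsume());
}
```

Because the budget is an object with state, the same instance can be shared across several image parts of one tool result, and later parts see whatever earlier parts left over.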

Also adds null-safety for this.endpoint access in a few places.

Testing

Unit tests for detectImageMimeType covering all supported formats, edge cases (short data, empty data, unknown format), and the specific bug scenario (JPEG content with PNG extension).

…tool results

- Add detectImageMimeType() using magic bytes (JPEG/PNG/GIF/WebP/BMP) to
  correctly identify image format regardless of declared file extension.
  Fixes Anthropic API error where declared media_type doesn't match actual
  image content (e.g. file named .png but contains JPEG data).

- Use detected MIME type in imageDataPartToTSX() instead of relying on
  part.mimeType which is inferred from file extension by VS Code core.

- Add imageCountBudgetLeft to PrimitiveToolResult to cap the number of
  images injected per tool result at half of endpoint.maxPromptImages,
  reserving budget for user-attached images. Fixes Gemini error when
  terminal output references more images than the model supports.

- Add unit tests for detectImageMimeType covering all supported formats
  and edge cases.