Conversation
Add --image flag to the bench tool to enable benchmarking VLM models served via Lemonade Server. When an image file path or URL is provided, the benchmark sends multimodal prompts (image + text) using the OpenAI chat completions format. The -p flag controls the text portion of the prompt while image tokens are handled by the server. - Add _prepare_image_url() to ServerAdapter for base64/URL image handling - Add image parameter to ServerAdapter.generate() with multimodal payload - Add --image CLI argument to ServerBench parser - Add VLM-specific prompt prefix for image description benchmarks - Override parse() in ServerBench to extract image before prompt processing - Existing text-only LLM benchmarking is completely unaffected
Replace the --image-detail flag (low/high/auto) with --image-size which accepts either WIDTHxHEIGHT (e.g. 1024x800) for exact dimensions or a single integer (e.g. 384) to cap the longest side while preserving aspect ratio. The image is resized client-side using Pillow before base64 encoding, which reduces visual token count to fit within the model's context window.
Document the --image and --image-size flags for Vision-Language Model benchmarking, with examples showing exact dimensions and aspect-ratio- preserving resize modes.
…or imports inside function
formatting the line
ramkrishna2910
left a comment
There was a problem hiding this comment.
Feature works, added a few comments.
|
@praveen-iyer Great addition! Can you also extend the |
Thanks for the idea. I'm hoping to address this as part of a new PR as we need this PR to be merged soon for an engagement! |

Summary
Add support for benchmarking Vision-Language Models (VLMs) served via Lemonade Server, alongside the existing text-only LLM benchmarking.
Changes
--imageflag: Provide an image file path or URL to thebenchtool. Each benchmark iteration sends a multimodal prompt (image + text) using the OpenAI chat completions format.--image-sizeflag: Resize the image client-side before sending to control visual token count. AcceptsWIDTHxHEIGHT(e.g.1024x800) for exact dimensions or a single integer (e.g.384) to cap the longest side while preserving aspect ratio.--imageis used, the synthetic prompt prefix is replaced with one that encourages detailed image description for longer output.Example Output