Proposal
I'm currently conducting a benchmarking experiment involving a range of vision-language models (VLMs), including both open-source models like Qwen and paid services such as Gemini and GPT-4V.
Pezzo seems like a great tool for managing prompts, but at the moment there is no clear way to use it for testing and comparing prompts across these models, especially when the prompts include visual input.
It would be extremely helpful to have support for:
- Managing and sending prompts to various VLMs, including those with visual input
- Connecting to these providers via their APIs (Gemini, GPT-4V, Qwen-VL, etc.)
- Storing and comparing results to streamline benchmarking workflows
This kind of functionality would make Pezzo an invaluable part of research and evaluation pipelines for multimodal models.
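To make the request more concrete, below is a minimal sketch of the kind of abstraction this could be built on. The interfaces and names are purely illustrative and are not part of the Pezzo SDK; each provider adapter would wrap the corresponding vendor API, and the collected results could then be stored and compared side by side.

```ts
// Hypothetical sketch only: these types are not part of the Pezzo SDK.
// They illustrate one way multimodal prompt benchmarking could be modeled.

interface VlmRequest {
  prompt: string;
  imageUrls: string[];
}

interface VlmResult {
  provider: string;
  output: string;
  latencyMs: number;
}

// Each provider (Gemini, GPT-4V, Qwen-VL, ...) would implement this
// interface by calling its own API under the hood.
interface VlmProvider {
  name: string;
  complete(request: VlmRequest): Promise<VlmResult>;
}

// Run the same prompt against every configured provider and collect
// the outputs side by side for comparison.
async function benchmark(
  providers: VlmProvider[],
  request: VlmRequest,
): Promise<VlmResult[]> {
  return Promise.all(providers.map((p) => p.complete(request)));
}
```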
Use-Case
No response
Is this a feature you are interested in implementing yourself?
Maybe