A pi extension that provides a dedicated analyze_image tool for vision-based image analysis using configurable vision models.
Many models don't have vision capabilities (e.g., GLM text models). This extension provides an analyze_image tool that agents can call on-demand to analyze images using separate vision-capable models like glm-4.6v or Claude Sonnet.
- Dedicated analyze_image tool: Agents can call this tool on demand when they need to understand image content
- Configurable vision models: Define multiple vision provider/model combinations in settings
- Automatic model selection: Uses the configured default or first available model
- Image preview for users: Shows image in TUI during analysis (display only, not stored in session)
- Session-efficient: Returns text-only results to avoid flooding session with base64 image data
- Image metadata: Displays path, size, and MIME type
- Manual analysis command: /analyze-image <path> for interactive image analysis
- Runtime model switching: /vision-model to view and switch between configured models
Install globally from git:
pi install git:github.com/kifirkin/pi-vision
Or install for a specific project (writes to .pi/settings.json):
pi install -l git:github.com/kifirkin/pi-vision
To try it without installing:
pi -e git:github.com/kifirkin/pi-vision
You must configure vision models before using this extension. Add configuration to your settings file:
{
  "visionModels": [
    { "provider": "zai", "model": "glm-4.6v" },
    { "provider": "anthropic", "model": "claude-sonnet-4-5" },
    { "provider": "openai", "model": "gpt-4o" }
  ],
  "visionModel": "zai/glm-4.6v"
}
Project settings override global settings:
{
  "visionModels": [
    { "provider": "zai", "model": "glm-4.6v" }
  ]
}
| Setting | Type | Required | Description |
|---|---|---|---|
| visionModels | Array | Yes | List of {provider, model} objects for vision analysis |
| visionModel | String | No | Default model to use, format: "provider/model". If not set, uses first model in list. |
| maxImageSizeMB | Number | No | Warn when images exceed this size in MB (default: 5). Analysis still proceeds, but may be slower. |
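The default-model rule in the table can be sketched as follows. This is a minimal illustration, not the extension's actual source: the `VisionModelRef` and `VisionSettings` types and the `resolveVisionModel` function are hypothetical names, and falling back to the first entry when `visionModel` does not match any configured entry is an assumption.

```typescript
// Hypothetical types mirroring the settings keys documented above.
interface VisionModelRef {
  provider: string;
  model: string;
}

interface VisionSettings {
  visionModels: VisionModelRef[];
  visionModel?: string; // expected format: "provider/model"
}

// Pick the configured default if it matches an entry in visionModels,
// otherwise the first entry. Returns undefined when no models are configured.
function resolveVisionModel(settings: VisionSettings): VisionModelRef | undefined {
  const { visionModels, visionModel } = settings;
  if (visionModels.length === 0) return undefined;

  if (visionModel) {
    // Split on the first "/" only, so a model name containing "/" stays intact.
    const slash = visionModel.indexOf("/");
    if (slash > 0) {
      const provider = visionModel.slice(0, slash);
      const model = visionModel.slice(slash + 1);
      const match = visionModels.find(
        (m) => m.provider === provider && m.model === model
      );
      if (match) return match;
    }
  }

  // Assumption: an unset or unmatched default falls back to the first entry.
  return visionModels[0];
}
```

With the global settings shown earlier, this sketch would select zai/glm-4.6v; removing the visionModel key would still select it, because it is the first entry in the list.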
Vision models automatically resize images, but large files affect performance:
| Size | Recommendation |
|---|---|
| < 1MB | ✅ Optimal - fast analysis |
| 1-5MB | ✅ Good - standard for screenshots |
| 5-10MB | |
| > 10MB | |
Tips for large images:
- The extension warns when images exceed maxImageSizeMB (default: 5MB)
- Vision models typically resize to ~1024px or ~2000px on the longest side anyway
- For 4K screenshots, consider cropping to the relevant area
- PNG screenshots of photographic content can often be converted to JPEG to reduce file size (not recommended for diagrams or code)
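The size warning described above could look roughly like this. It is a sketch under stated assumptions: `imageSizeWarning` is a hypothetical helper, the message wording is invented for illustration, and treating 1MB as 1024 * 1024 bytes is an assumption about how the extension measures size.

```typescript
// Assumption: 1 MB is counted as 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Returns a warning message when the image exceeds the limit, otherwise null.
// The default of 5 matches the documented maxImageSizeMB default.
// The check only warns; the caller is expected to continue with the analysis.
function imageSizeWarning(sizeBytes: number, maxImageSizeMB: number = 5): string | null {
  const sizeMB = sizeBytes / BYTES_PER_MB;
  if (sizeMB <= maxImageSizeMB) return null;
  return `Image is ${sizeMB.toFixed(1)}MB, above the ${maxImageSizeMB}MB limit; analysis will proceed but may be slower.`;
}
```

Returning a message instead of throwing mirrors the documented behavior that analysis still proceeds for oversized images.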
The LLM can call the analyze_image tool when needed:
analyze_image({"path": "./screenshot.png"})
The tool will:
- Check if the file is a supported image format
- Show progress with image metadata (path, size, MIME type)
- Display image preview in TUI for user (during analysis only)
- Use the configured vision model to analyze the image
- Return text-only analysis result (stored in session)
- Include image path in tool details for traceability
Important design decision for token efficiency:
- During analysis: User sees image preview in TUI via onUpdate
- After analysis: Session stores only text analysis + image path reference
- Image data: NOT stored in session history to avoid token flooding (~4800 tokens per image)
This ensures:
- ✅ Users can see images during analysis
- ✅ Session has trace of what images were analyzed (via path in details)
- ✅ No base64 image data in session = no token cost for subsequent LLM calls
- ✅ Compaction not impacted by large image payloads
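The text-only storage described above can be sketched as a small builder function. The `StoredToolResult` shape and the `buildStoredResult` name are hypothetical and only illustrate the idea; the field names inside `details` follow the tool details documented in this README, but the exact result type used by pi is not shown here.

```typescript
// Field names follow the documented tool details.
interface AnalysisDetails {
  path: string;
  visionModel: { provider: string; model: string };
  imageAnalyzed: boolean;
}

// Hypothetical shape for what ends up in the session: text plus details.
interface StoredToolResult {
  text: string;
  details: AnalysisDetails;
}

// Build the session entry from the analysis text and a path reference only.
// The image bytes are deliberately not a parameter, so no base64 data can
// end up in the stored result.
function buildStoredResult(
  path: string,
  analysis: string,
  visionModel: { provider: string; model: string }
): StoredToolResult {
  return {
    text: `[Image analyzed: ${path}]\n${analysis}`,
    details: { path, visionModel, imageAnalyzed: true },
  };
}
```

Keeping the image data out of the function signature makes the design decision structural: later LLM calls only ever see the text and the path.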
Example stored in session:
[Image analyzed: /path/to/screenshot.png]
{comprehensive analysis text...}
With tool details:
{
  "path": "/path/to/screenshot.png",
  "visionModel": { "provider": "zai", "model": "glm-4.6v" },
  "imageAnalyzed": true
}
Manually analyze an image:
/analyze-image ./screenshot.png
View or change the active vision model:
/vision-model # Show active and available models
/vision-model zai/glm-4.6v # Switch to specific model
Supported image formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
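A format check over the extensions listed above might be sketched like this. The `SUPPORTED_IMAGE_TYPES` table and `mimeTypeForPath` helper are hypothetical names, and detecting the type from the file extension alone is an assumption; the extension may inspect file contents instead. The MIME types are the standard ones for these formats.

```typescript
// Extension-to-MIME lookup for the formats listed above.
const SUPPORTED_IMAGE_TYPES: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Returns the MIME type for a supported image path, or undefined otherwise.
// Assumption: the type is derived from the file extension only.
function mimeTypeForPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return undefined;
  // Compare case-insensitively so "PHOTO.JPG" is accepted as well.
  const ext = path.slice(dot).toLowerCase();
  return SUPPORTED_IMAGE_TYPES[ext];
}
```

A caller could treat an undefined result as "unsupported format" and stop before contacting the vision model.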
MIT License - see LICENSE file