pi-vision

A pi extension that provides a dedicated analyze_image tool for vision-based image analysis using configurable vision models.

Why?

Many models don't have vision capabilities (e.g., GLM text models). This extension provides an analyze_image tool that agents can call on-demand to analyze images using separate vision-capable models like glm-4.6v or Claude Sonnet.

Features

Dedicated analyze_image tool: Agents can call this tool on demand when they need to understand image content
Configurable vision models: Define multiple vision provider/model combinations in settings
Automatic model selection: Uses the configured default or first available model
Image preview for users: Shows image in TUI during analysis (display only, not stored in session)
Session-efficient: Returns text-only results to avoid flooding session with base64 image data
Image metadata: Displays path, size, and MIME type
Manual analysis command: /analyze-image <path> for interactive image analysis
Runtime model switching: /vision-model to view and switch between configured models

Installation

Install globally from git:

pi install git:github.com/kifirkin/pi-vision

Or install for a specific project (writes to .pi/settings.json):

pi install -l git:github.com/kifirkin/pi-vision

To try it without installing:

pi -e git:github.com/kifirkin/pi-vision

Configuration (Required)

You must configure vision models before using this extension. Add configuration to your settings file:

Global settings (`~/.pi/agent/settings.json`):

{
  "visionModels": [
    { "provider": "zai", "model": "glm-4.6v" },
    { "provider": "anthropic", "model": "claude-sonnet-4-5" },
    { "provider": "openai", "model": "gpt-4o" }
  ],
  "visionModel": "zai/glm-4.6v"
}

Project settings (`.pi/settings.json`):

Project settings override global settings:

{
  "visionModels": [
    { "provider": "zai", "model": "glm-4.6v" }
  ]
}

Configuration options:

| Setting | Type | Required | Description |
|---|---|---|---|
| `visionModels` | Array | Yes | List of `{provider, model}` objects for vision analysis |
| `visionModel` | String | No | Default model to use, format: `"provider/model"`. If not set, uses the first model in the list. |
| `maxImageSizeMB` | Number | No | Warn when images exceed this size in MB (default: 5). Analysis still proceeds, but may be slower. |
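
The selection rule described above (use `visionModel` if set, otherwise the first entry in `visionModels`) can be sketched roughly as follows. This is an illustrative sketch, not the extension's actual source: the type names and the `mergeSettings` and `resolveVisionModel` helpers are hypothetical, and it assumes that project settings replace global settings key by key and that an unmatched `visionModel` falls back to the first configured model.

```typescript
// Hypothetical types; the field names mirror the settings keys documented above.
interface VisionModelRef {
  provider: string;
  model: string;
}

interface VisionSettings {
  visionModels?: VisionModelRef[];
  visionModel?: string; // format: "provider/model"
  maxImageSizeMB?: number;
}

// Assumption: project settings override global settings key by key (shallow merge).
function mergeSettings(global: VisionSettings, project: VisionSettings): VisionSettings {
  return { ...global, ...project };
}

// Pick the configured default, or fall back to the first model in the list.
function resolveVisionModel(settings: VisionSettings): VisionModelRef {
  const models = settings.visionModels ?? [];
  if (models.length === 0) {
    throw new Error("No vision models configured: add visionModels to settings.json");
  }
  const wanted = settings.visionModel;
  if (wanted) {
    // Split on the first "/" only, so model names containing "/" stay intact.
    const slash = wanted.indexOf("/");
    if (slash > 0) {
      const provider = wanted.slice(0, slash);
      const model = wanted.slice(slash + 1);
      const match = models.find((m) => m.provider === provider && m.model === model);
      if (match) return match;
    }
  }
  // Assumption: an unset or unmatched default falls back to the first entry.
  return models[0];
}
```

With the global example above, `resolveVisionModel` would return the `zai` / `glm-4.6v` entry; with a project file that lists a single model and no `visionModel`, it would return that single entry.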

Image Size Considerations

Vision models typically resize images automatically, but large files still slow down analysis:

| Size | Recommendation |
|---|---|
| < 1 MB | ✅ Optimal - fast analysis |
| 1-5 MB | ✅ Good - standard for screenshots |
| 5-10 MB | ⚠️ Slow - consider resizing first |
| > 10 MB | ⚠️ Very slow - strongly recommend resizing |

Tips for large images:

The extension warns when images exceed maxImageSizeMB (default: 5MB)
Vision models typically resize to ~1024px or ~2000px on longest side anyway
For 4K screenshots, consider cropping to the relevant area
PNG images with photographic content can often be converted to JPEG to reduce file size (avoid this for diagrams or code, where JPEG compression blurs sharp edges and text)
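
To make the thresholds above concrete, here is a minimal sketch of a size check. The function names are hypothetical and not taken from the extension. It assumes 1 MB means 1024 × 1024 bytes and that a file exactly at a boundary falls into the smaller bucket; the real implementation may differ on both points.

```typescript
type SizeAdvice = "optimal" | "good" | "slow" | "very-slow";

// Assumption: 1 MB = 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Map a file size to the recommendation buckets from the table above.
function classifyImageSize(bytes: number): SizeAdvice {
  const mb = bytes / BYTES_PER_MB;
  if (mb < 1) return "optimal";
  if (mb <= 5) return "good";
  if (mb <= 10) return "slow";
  return "very-slow";
}

// Return a warning message when the image exceeds maxImageSizeMB, else undefined.
// Analysis is not blocked; the caller only surfaces the message.
function sizeWarning(bytes: number, maxImageSizeMB = 5): string | undefined {
  const mb = bytes / BYTES_PER_MB;
  if (mb <= maxImageSizeMB) return undefined;
  return `Image is ${mb.toFixed(1)} MB, above the ${maxImageSizeMB} MB threshold; analysis may be slow.`;
}
```

A caller could obtain `bytes` from a file stat and print the warning before starting analysis, which matches the documented behavior of warning without aborting.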

Usage

As an Agent Tool

The LLM can call the analyze_image tool when needed:

analyze_image({"path": "./screenshot.png"})

The tool will:

Check if the file is a supported image format
Show progress with image metadata (path, size, MIME type)
Display image preview in TUI for user (during analysis only)
Use the configured vision model to analyze the image
Return text-only analysis result (stored in session)
Include image path in tool details for traceability

Session Behavior

Important design decision for token efficiency:

During analysis: User sees image preview in TUI via onUpdate
After analysis: Session stores only text analysis + image path reference
Image data: NOT stored in session history to avoid token flooding (~4800 tokens per image)

This ensures:

✅ Users can see images during analysis
✅ Session has trace of what images were analyzed (via path in details)
✅ No base64 image data in session = no token cost for subsequent LLM calls
✅ Compaction not impacted by large image payloads

Example stored in session:

[Image analyzed: /path/to/screenshot.png]

{comprehensive analysis text...}

With tool details:

{
  "path": "/path/to/screenshot.png",
  "visionModel": { "provider": "zai", "model": "glm-4.6v" },
  "imageAnalyzed": true
}
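
The text-only design can be illustrated with a small sketch that assembles the stored result from the analysis text and the image path. The `AnalyzeImageResult` shape and the `buildSessionResult` helper are assumptions made for illustration; they are not pi's actual tool-result API. Only the `details` fields and the `[Image analyzed: ...]` prefix come from the examples above.

```typescript
interface VisionModelRef {
  provider: string;
  model: string;
}

// Hypothetical result shape: text content for the session plus traceability details.
interface AnalyzeImageResult {
  content: { type: "text"; text: string }[];
  details: { path: string; visionModel: VisionModelRef; imageAnalyzed: boolean };
}

// Build the stored result. The image bytes are never passed in,
// so no base64 data can end up in the session history.
function buildSessionResult(
  path: string,
  analysis: string,
  visionModel: VisionModelRef,
): AnalyzeImageResult {
  return {
    content: [{ type: "text", text: `[Image analyzed: ${path}]\n\n${analysis}` }],
    details: { path, visionModel, imageAnalyzed: true },
  };
}
```

Keeping the image out of the function's inputs is the design choice being illustrated: the preview is a display concern, while the session only ever sees the path and the text.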

Commands

`/analyze-image <path>`

Manually analyze an image:

/analyze-image ./screenshot.png

`/vision-model [provider/model]`

View or change the active vision model:

/vision-model                    # Show active and available models
/vision-model zai/glm-4.6v      # Switch to specific model

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)
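
As a rough illustration of the format check, the list above maps to standard MIME types as shown below. The `mimeTypeForPath` helper is hypothetical, and the sketch assumes the format is detected from the file extension alone; the extension may instead inspect file contents.

```typescript
// Extensions from the list above mapped to their standard MIME types.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Return the MIME type for a supported image path, or undefined otherwise.
// Assumption: detection is by file extension, compared case-insensitively.
function mimeTypeForPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  return MIME_BY_EXTENSION[path.slice(dot).toLowerCase()];
}
```

A path such as `./screenshot.png` resolves to `image/png`, while an unsupported or extensionless path yields `undefined` and can be rejected before any model call.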

License

MIT License - see LICENSE file
