Merged
58 changes: 29 additions & 29 deletions .devcontainer/devcontainer.json
@@ -1,29 +1,29 @@
{
"name": "ai-agent-dev",
"build": {
"dockerfile": "Dockerfile"
},

// This is where your repo will be mounted inside the container
"remoteUser": "vscode",
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",

"customizations": {
"vscode": {
"settings": {
"python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
"python.envFile": "${workspaceFolder}/.env"
},
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance",
"tamasfe.even-better-toml"
]
}
},

"forwardPorts": [7860],

// Install project in editable mode after the container is built
"postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e . && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
}
{
"name": "ai-agent-dev",
"build": {
"dockerfile": "Dockerfile"
},
// This is where your repo will be mounted inside the container
"remoteUser": "vscode",
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
"customizations": {
"vscode": {
"settings": {
"python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
"python.envFile": "${workspaceFolder}/.env"
},
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance",
"tamasfe.even-better-toml"
]
}
},
"forwardPorts": [7860],
// Install project in editable mode after the container is built
"postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e . && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
}
36 changes: 18 additions & 18 deletions .env.dist
@@ -1,19 +1,19 @@
OPENAI_API_KEY=sk-xxxx
GITHUB_TOKEN=ghp_xxxx
# Optional model overrides (defaults work):
OPENAI_MODEL=gpt-4o

# Software catalog
SOFTWARE_CATALOG=path/to/your/catalog.jsonl

# Pipeline configuration
TOP_K=8 # Number of candidates to retrieve
NUM_CHOICES=3 # Number of tools to recommend
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)

# Logging configuration
LOGLEVEL_CONSOLE=WARNING
LOGLEVEL_FILE=INFO
FILE_LOG=1
LOG_DIR=logs
OPENAI_API_KEY=sk-xxxx
GITHUB_TOKEN=ghp_xxxx
# Optional model overrides (defaults work):
OPENAI_MODEL=gpt-4o
# Software catalog
SOFTWARE_CATALOG=path/to/your/catalog.jsonl
# Pipeline configuration
TOP_K=8 # Number of candidates to retrieve
NUM_CHOICES=3 # Number of tools to recommend
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
# Logging configuration
LOGLEVEL_CONSOLE=WARNING
LOGLEVEL_FILE=INFO
FILE_LOG=1
LOG_DIR=logs
LOG_PROMPTS=0 # write selector prompt snapshots
258 changes: 129 additions & 129 deletions .github/copilot-instructions.md
@@ -1,130 +1,130 @@
# AI Agent — Copilot Instructions

This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links.

## Architecture Overview

The system follows a two-stage pipeline:

1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates.

2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores.

### Key Components

- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection.
- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval.
- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates.
- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus.
- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`).

## Data Flow Patterns

### Input Processing
- Files validated via `utils/file_validator.py` (size limits, format checks)
- Images converted to PNG previews for VLM via `utils/previews.py`
- Metadata extracted preserving original format info (critical for format compatibility matching)
- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`)

### Retrieval Query Construction
```python
# Clean task text + format tokens from uploaded files
query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM"
```

### VLM Selection Input
The VLM receives:
- **Text**: User task + candidate table + original file metadata
- **Image**: PNG preview (converted from any format)
- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility)

## Critical Patterns

### Error Handling
- **Graceful degradation**: If image conversion fails, continue text-only
- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults
- **File validation**: Early validation prevents downstream errors

### Control Tags System
Users can control behavior via tags in their queries:
- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results
- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate)
- `[REFINE]` - Force clarification turn for alternatives

### Conversation Flow
- **Complete**: Normal success with tool recommendations
- **Needs Clarification**: VLM asks followup questions when task is ambiguous
- **Terminal No-Tool**: No suitable tools found with explanation

## Development Workflows

### Running the App
```bash
# Install with pip using pyproject.toml
pip install -e ".[dev]"

# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path
ai_agent ui # Launches Gradio on port 7860
```

### Testing
- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json`
- Uses test doubles for VLM calls to avoid API costs
- Run with: `pytest tests/`

### Change Documentation
- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format
- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security
- Update CHANGELOG.md for ALL user-facing changes before merging PRs
- Format: `### Added\n- New feature description` under version heading
- Version entries: `## [x.y.z] - YYYY-MM-DD`

### Environment Management
- **uv**: Fast Python package manager used in `tools/image/Dockerfile`
- Creates isolated `.venv` environments for reproducible builds
- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds

### Logging & Debugging
- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/`
- File logs in `logs/app_YYYYMMDD.log` with structured JSON events
- Console/file log levels configurable via `.env`

## Project Conventions

### Schema Patterns
- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility
- **Enum-based** conversation states and tool reasons for type safety
- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators

### Catalog Integration
- Software catalog in JSONL format following schema.org SoftwareSourceCode structure
- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos
- **Supporting data**: Format compatibility info used for matching

### Module Boundaries
- `api/`: Pipeline orchestration, no UI dependencies
- `generator/`: Pure VLM logic, no retrieval dependencies
- `retriever/`: Pure vector search, no generation dependencies
- `utils/`: Shared utilities, no business logic
- `ui/`: Gradio interface only

### Configuration
- Environment-based config via `.env` (API keys, model names, catalog paths)
- Sensible defaults for all settings
- No hardcoded paths or credentials

## Medical Imaging Context

This tool specializes in medical/scientific imaging:
- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy
- **Formats**: DICOM, NIfTI, TIFF stacks, standard images
- **Dimensions**: 2D images, 3D volumes, 4D timeseries
- **Tasks**: Segmentation, registration, analysis, visualization

The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred.

## Security Notes
- Only makes external calls to OpenAI VLM API (with user image preview)
- Never uploads user data to third-party tool demos
- Returns links only; user chooses whether to visit demos
# AI Agent — Copilot Instructions
This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links.
## Architecture Overview
The system follows a two-stage pipeline:
1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates.
2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores.
### Key Components
- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection.
- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval.
- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates.
- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus.
- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`).
## Data Flow Patterns
### Input Processing
- Files validated via `utils/file_validator.py` (size limits, format checks)
- Images converted to PNG previews for VLM via `utils/previews.py`
- Metadata extracted preserving original format info (critical for format compatibility matching)
- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`)
### Retrieval Query Construction
```python
# Clean task text + format tokens from uploaded files
query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM"
```
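The one-line f-string above emits a single `format:` prefix, while the Input Processing example shows one `format:` token per detected file type. A minimal sketch of one way to produce per-format tokens follows. The `EXT_TO_FORMAT` mapping and the `format_tokens` and `build_query` helpers are hypothetical names used for illustration, not code from the repository.

```python
from pathlib import Path

# Hypothetical extension-to-format map; the real project may use other names.
EXT_TO_FORMAT = {
    ".dcm": "DICOM",
    ".nii": "NIfTI",
    ".nii.gz": "NIfTI",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
}


def format_tokens(filenames):
    """Return de-duplicated `format:X` tokens, preserving first-seen order."""
    tokens = []
    for name in filenames:
        lower = Path(name).name.lower()
        # Try longer suffixes first so ".nii.gz" is matched as a whole.
        for ext in sorted(EXT_TO_FORMAT, key=len, reverse=True):
            if lower.endswith(ext):
                token = f"format:{EXT_TO_FORMAT[ext]}"
                if token not in tokens:
                    tokens.append(token)
                break
    return tokens


def build_query(clean_task, filenames):
    """Join the cleaned task text with one token per detected format."""
    return " ".join([clean_task.strip(), *format_tokens(filenames)])
```

Files with unrecognized extensions contribute no token in this sketch, so the query falls back to the task text alone.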
### VLM Selection Input
The VLM receives:
- **Text**: User task + candidate table + original file metadata
- **Image**: PNG preview (converted from any format)
- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility)
## Critical Patterns
### Error Handling
- **Graceful degradation**: If image conversion fails, continue text-only
- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults
- **File validation**: Early validation prevents downstream errors
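The degradation rules above could be sketched roughly as below. This is an illustration under stated assumptions: `safe_metadata`, `safe_preview`, and the `DEFAULT_META` keys are hypothetical, and the extractor and converter callables stand in for whatever `utils/image_meta.py` and `utils/previews.py` actually expose.

```python
# Hypothetical default metadata; the real field names are not shown in this document.
DEFAULT_META = {"extension": "", "dimensions": None, "modality": None}


def safe_metadata(path, extractor):
    """Run `extractor(path)` and fall back to defaults on any failure."""
    meta = dict(DEFAULT_META)
    try:
        meta.update(extractor(path) or {})
    except Exception as exc:  # broad on purpose: a bad file must not abort the pipeline
        meta["error"] = str(exc)
    return meta


def safe_preview(path, converter):
    """Return a preview, or None so the caller can continue text-only."""
    try:
        return converter(path)
    except Exception:
        return None
```

A caller would treat a `None` preview as the signal to send the selection request without an image.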
### Control Tags System
Users can control behavior via tags in their queries:
- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results
- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate)
- `[REFINE]` - Force clarification turn for alternatives
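A minimal sketch of how such tags might be parsed out of a query is shown here. It is not the contents of `utils/tags.py`: the `ControlTags` dataclass and `parse_control_tags` function are hypothetical, and the sketch assumes tags are written in upper case exactly as listed above.

```python
import re
from dataclasses import dataclass, field

# Matches [EXCLUDE:a|b], [NO_RERANK], and [REFINE]; upper case is an assumption.
_TAG_RE = re.compile(r"\[(EXCLUDE|NO_RERANK|REFINE)(?::([^\]]*))?\]")


@dataclass
class ControlTags:
    exclude: list = field(default_factory=list)
    no_rerank: bool = False
    refine: bool = False


def parse_control_tags(query):
    """Return (query with tags removed, parsed ControlTags)."""
    tags = ControlTags()
    for name, arg in _TAG_RE.findall(query):
        if name == "EXCLUDE":
            tags.exclude += [t.strip() for t in arg.split("|") if t.strip()]
        elif name == "NO_RERANK":
            tags.no_rerank = True
        else:
            tags.refine = True
    # Collapse the whitespace left behind by removed tags.
    clean = " ".join(_TAG_RE.sub(" ", query).split())
    return clean, tags
```

Stripping the tags before retrieval keeps them out of the embedding query, which is presumably why the retrieval snippet refers to a "clean" task.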
### Conversation Flow
- **Complete**: Normal success with tool recommendations
- **Needs Clarification**: VLM asks followup questions when task is ambiguous
- **Terminal No-Tool**: No suitable tools found with explanation
## Development Workflows
### Running the App
```bash
# Install with pip using pyproject.toml
pip install -e ".[dev]"
# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path
ai_agent ui # Launches Gradio on port 7860
```
### Testing
- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json`
- Uses test doubles for VLM calls to avoid API costs
- Run with: `pytest tests/`
### Change Documentation
- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format
- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security
- Update CHANGELOG.md for ALL user-facing changes before merging PRs
- Format: `### Added\n- New feature description` under version heading
- Version entries: `## [x.y.z] - YYYY-MM-DD`
### Environment Management
- **uv**: Fast Python package manager used in `tools/image/Dockerfile`
- Creates isolated `.venv` environments for reproducible builds
- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds
### Logging & Debugging
- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/`
- File logs in `logs/app_YYYYMMDD.log` with structured JSON events
- Console/file log levels configurable via `.env`
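For illustration, a structured JSON log line and the dated file name could be produced with the standard library as sketched below. The `JsonFormatter` class, the `log_file_name` helper, and the JSON field names (`ts`, `level`, `logger`, `event`) are assumptions, not the project's actual logging setup.

```python
import json
import logging
from datetime import date


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps(
            {
                "ts": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "event": record.getMessage(),
            }
        )


def log_file_name(day=None):
    """Build a name matching the app_YYYYMMDD.log pattern."""
    day = day or date.today()
    return f"app_{day:%Y%m%d}.log"
```

Attaching `JsonFormatter()` to a `logging.FileHandler` pointed at `LOG_DIR` would give one JSON event per line.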
## Project Conventions
### Schema Patterns
- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility
- **Enum-based** conversation states and tool reasons for type safety
- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators
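As a rough illustration of these conventions using only the standard library (the real models in `generator/schema.py` use Pydantic and are not reproduced here), the conversation states and two normalizers might look like this. The enum member values, the alias table, and the function names are hypothetical.

```python
from enum import Enum


class ConversationStatus(str, Enum):
    """States named in the Conversation Flow section; string values are assumed."""

    COMPLETE = "complete"
    NEEDS_CLARIFICATION = "needs_clarification"
    TERMINAL_NO_TOOL = "terminal_no_tool"


# Hypothetical alias table mapping free-text modalities to short codes.
_MODALITY_ALIASES = {
    "ct": "CT",
    "mri": "MRI",
    "mr": "MRI",
    "xr": "XR",
    "xray": "XR",
    "x-ray": "XR",
}


def normalize_modality(value):
    """Map known aliases to a canonical code; pass unknown values through."""
    text = value.strip()
    return _MODALITY_ALIASES.get(text.lower(), text)


def normalize_dimension(value):
    """Accept 3, "3", "3d", or " 3D " and return "3D"; reject anything else."""
    text = str(value).strip().upper()
    if text.isdigit():
        text += "D"
    if text not in {"2D", "3D", "4D"}:
        raise ValueError(f"unsupported dimension: {value!r}")
    return text
```

In a Pydantic model, functions like these would typically be called from field validators.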
### Catalog Integration
- Software catalog in JSONL format following schema.org SoftwareSourceCode structure
- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos
- **Supporting data**: Format compatibility info used for matching
### Module Boundaries
- `api/`: Pipeline orchestration, no UI dependencies
- `generator/`: Pure VLM logic, no retrieval dependencies
- `retriever/`: Pure vector search, no generation dependencies
- `utils/`: Shared utilities, no business logic
- `ui/`: Gradio interface only
### Configuration
- Environment-based config via `.env` (API keys, model names, catalog paths)
- Sensible defaults for all settings
- No hardcoded paths or credentials
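A minimal sketch of environment-based settings with defaults is given below. The variable names come from `.env.dist`; the `Settings` dataclass, the `load_settings` function, and the choice to reuse the `.env.dist` values as defaults are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_model: str
    top_k: int
    num_choices: int
    use_agent: bool
    log_prompts: bool


def _flag(env, name, default):
    """Interpret 1/true/yes (case-insensitive) as True."""
    return env.get(name, default).strip().lower() in {"1", "true", "yes"}


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        top_k=int(env.get("TOP_K", "8")),
        num_choices=int(env.get("NUM_CHOICES", "3")),
        use_agent=_flag(env, "USE_AGENT", "1"),
        log_prompts=_flag(env, "LOG_PROMPTS", "0"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the process environment.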
## Medical Imaging Context
This tool specializes in medical/scientific imaging:
- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy
- **Formats**: DICOM, NIfTI, TIFF stacks, standard images
- **Dimensions**: 2D images, 3D volumes, 4D timeseries
- **Tasks**: Segmentation, registration, analysis, visualization
The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred.
## Security Notes
- Only makes external calls to OpenAI VLM API (with user image preview)
- Never uploads user data to third-party tool demos
- Returns links only; user chooses whether to visit demos
- Prompt logging is optional and local-only