Commit 17148e6

Merge pull request #7 from Imaging-Plaza/feature-ui
Feature UI
2 parents 22000fe + 0000052 commit 17148e6

36 files changed

Lines changed: 5059 additions & 4966 deletions

.devcontainer/devcontainer.json

Lines changed: 29 additions & 29 deletions
@@ -1,29 +1,29 @@
1-
{
2-
"name": "ai-agent-dev",
3-
"build": {
4-
"dockerfile": "Dockerfile"
5-
},
6-
7-
// This is where your repo will be mounted inside the container
8-
"remoteUser": "vscode",
9-
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
10-
11-
"customizations": {
12-
"vscode": {
13-
"settings": {
14-
"python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
15-
"python.envFile": "${workspaceFolder}/.env"
16-
},
17-
"extensions": [
18-
"ms-python.python",
19-
"ms-python.vscode-pylance",
20-
"tamasfe.even-better-toml"
21-
]
22-
}
23-
},
24-
25-
"forwardPorts": [7860],
26-
27-
// Install project in editable mode after the container is built
28-
"postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e . && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
29-
}
1+
{
2+
"name": "ai-agent-dev",
3+
"build": {
4+
"dockerfile": "Dockerfile"
5+
},
6+
7+
// This is where your repo will be mounted inside the container
8+
"remoteUser": "vscode",
9+
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
10+
11+
"customizations": {
12+
"vscode": {
13+
"settings": {
14+
"python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
15+
"python.envFile": "${workspaceFolder}/.env"
16+
},
17+
"extensions": [
18+
"ms-python.python",
19+
"ms-python.vscode-pylance",
20+
"tamasfe.even-better-toml"
21+
]
22+
}
23+
},
24+
25+
"forwardPorts": [7860],
26+
27+
// Install project in editable mode after the container is built
28+
"postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e . && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
29+
}
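
The `devcontainer.json` above contains `//` comments, which a strict JSON parser rejects. The following is a minimal sketch, not part of this commit, of one way to read such a file from Python with only the standard library. The helper name `strip_line_comments` and the abbreviated `SAMPLE` excerpt are illustrative assumptions. Block comments and trailing commas are not handled.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals (hypothetical helper)."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip the comment up to, but not including, the newline.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Abbreviated excerpt of the file shown in the diff above.
SAMPLE = """{
  "name": "ai-agent-dev",
  // This is where your repo will be mounted inside the container
  "remoteUser": "vscode",
  "forwardPorts": [7860]
}"""

config = json.loads(strip_line_comments(SAMPLE))
print(config["forwardPorts"])
```

Tracking whether the scanner is inside a string keeps values such as URLs, which contain `//`, intact.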

.env.dist

Lines changed: 18 additions & 18 deletions
@@ -1,19 +1,19 @@
1-
OPENAI_API_KEY=sk-xxxx
2-
GITHUB_TOKEN=ghp_xxxx
3-
# Optional model overrides (defaults work):
4-
OPENAI_MODEL=gpt-4o
5-
6-
# Software catalog
7-
SOFTWARE_CATALOG=path/to/your/catalog.jsonl
8-
9-
# Pipeline configuration
10-
TOP_K=8 # Number of candidates to retrieve
11-
NUM_CHOICES=3 # Number of tools to recommend
12-
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
13-
14-
# Logging configuration
15-
LOGLEVEL_CONSOLE=WARNING
16-
LOGLEVEL_FILE=INFO
17-
FILE_LOG=1
18-
LOG_DIR=logs
1+
OPENAI_API_KEY=sk-xxxx
2+
GITHUB_TOKEN=ghp_xxxx
3+
# Optional model overrides (defaults work):
4+
OPENAI_MODEL=gpt-4o
5+
6+
# Software catalog
7+
SOFTWARE_CATALOG=path/to/your/catalog.jsonl
8+
9+
# Pipeline configuration
10+
TOP_K=8 # Number of candidates to retrieve
11+
NUM_CHOICES=3 # Number of tools to recommend
12+
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
13+
14+
# Logging configuration
15+
LOGLEVEL_CONSOLE=WARNING
16+
LOGLEVEL_FILE=INFO
17+
FILE_LOG=1
18+
LOG_DIR=logs
19 19
LOG_PROMPTS=0 # write selector prompt snapshots
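
Several values in `.env.dist` carry inline `#` comments, so a reader of this file has to strip them before converting types. The sketch below shows one way to do that with the standard library. It is an assumption for illustration only: the project's actual loader does not appear in this diff, `parse_env` is a hypothetical helper, and quoted values are not handled.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines, full-line and inline comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Treat " #" as the start of an inline comment (assumes unquoted values).
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


# A few lines copied from the .env.dist shown above.
SAMPLE = """\
# Pipeline configuration
TOP_K=8 # Number of candidates to retrieve
NUM_CHOICES=3 # Number of tools to recommend
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
LOGLEVEL_CONSOLE=WARNING
LOG_PROMPTS=0 # write selector prompt snapshots
"""

env = parse_env(SAMPLE)
top_k = int(env.get("TOP_K", "8"))
num_choices = int(env.get("NUM_CHOICES", "3"))
use_agent = env.get("USE_AGENT", "1") == "1"
print(top_k, num_choices, use_agent)
```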

.github/copilot-instructions.md

Lines changed: 129 additions & 129 deletions
@@ -1,130 +1,130 @@
1-
# AI Agent — Copilot Instructions
2-
3-
This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links.
4-
5-
## Architecture Overview
6-
7-
The system follows a two-stage pipeline:
8-
9-
1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates.
10-
11-
2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores.
12-
13-
### Key Components
14-
15-
- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection.
16-
- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval.
17-
- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates.
18-
- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus.
19-
- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`).
20-
21-
## Data Flow Patterns
22-
23-
### Input Processing
24-
- Files validated via `utils/file_validator.py` (size limits, format checks)
25-
- Images converted to PNG previews for VLM via `utils/previews.py`
26-
- Metadata extracted preserving original format info (critical for format compatibility matching)
27-
- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`)
28-
29-
### Retrieval Query Construction
30-
```python
31-
# Clean task text + format tokens from uploaded files
32-
query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM"
33-
```
34-
35-
### VLM Selection Input
36-
The VLM receives:
37-
- **Text**: User task + candidate table + original file metadata
38-
- **Image**: PNG preview (converted from any format)
39-
- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility)
40-
41-
## Critical Patterns
42-
43-
### Error Handling
44-
- **Graceful degradation**: If image conversion fails, continue text-only
45-
- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults
46-
- **File validation**: Early validation prevents downstream errors
47-
48-
### Control Tags System
49-
Users can control behavior via tags in their queries:
50-
- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results
51-
- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate)
52-
- `[REFINE]` - Force clarification turn for alternatives
53-
54-
### Conversation Flow
55-
- **Complete**: Normal success with tool recommendations
56-
- **Needs Clarification**: VLM asks followup questions when task is ambiguous
57-
- **Terminal No-Tool**: No suitable tools found with explanation
58-
59-
## Development Workflows
60-
61-
### Running the App
62-
```bash
63-
# Install with pip using pyproject.toml
64-
pip install -e ".[dev]"
65-
66-
# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path
67-
ai_agent ui # Launches Gradio on port 7860
68-
```
69-
70-
### Testing
71-
- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json`
72-
- Uses test doubles for VLM calls to avoid API costs
73-
- Run with: `pytest tests/`
74-
75-
### Change Documentation
76-
- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format
77-
- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security
78-
- Update CHANGELOG.md for ALL user-facing changes before merging PRs
79-
- Format: `### Added\n- New feature description` under version heading
80-
- Version entries: `## [x.y.z] - YYYY-MM-DD`
81-
82-
### Environment Management
83-
- **uv**: Fast Python package manager used in `tools/image/Dockerfile`
84-
- Creates isolated `.venv` environments for reproducible builds
85-
- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds
86-
87-
### Logging & Debugging
88-
- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/`
89-
- File logs in `logs/app_YYYYMMDD.log` with structured JSON events
90-
- Console/file log levels configurable via `.env`
91-
92-
## Project Conventions
93-
94-
### Schema Patterns
95-
- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility
96-
- **Enum-based** conversation states and tool reasons for type safety
97-
- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators
98-
99-
### Catalog Integration
100-
- Software catalog in JSONL format following schema.org SoftwareSourceCode structure
101-
- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos
102-
- **Supporting data**: Format compatibility info used for matching
103-
104-
### Module Boundaries
105-
- `api/`: Pipeline orchestration, no UI dependencies
106-
- `generator/`: Pure VLM logic, no retrieval dependencies
107-
- `retriever/`: Pure vector search, no generation dependencies
108-
- `utils/`: Shared utilities, no business logic
109-
- `ui/`: Gradio interface only
110-
111-
### Configuration
112-
- Environment-based config via `.env` (API keys, model names, catalog paths)
113-
- Sensible defaults for all settings
114-
- No hardcoded paths or credentials
115-
116-
## Medical Imaging Context
117-
118-
This tool specializes in medical/scientific imaging:
119-
- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy
120-
- **Formats**: DICOM, NIfTI, TIFF stacks, standard images
121-
- **Dimensions**: 2D images, 3D volumes, 4D timeseries
122-
- **Tasks**: Segmentation, registration, analysis, visualization
123-
124-
The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred.
125-
126-
## Security Notes
127-
- Only makes external calls to OpenAI VLM API (with user image preview)
128-
- Never uploads user data to third-party tool demos
129-
- Returns links only; user chooses whether to visit demos
1+
# AI Agent — Copilot Instructions
2+
3+
This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links.
4+
5+
## Architecture Overview
6+
7+
The system follows a two-stage pipeline:
8+
9+
1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates.
10+
11+
2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores.
12+
13+
### Key Components
14+
15+
- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection.
16+
- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval.
17+
- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates.
18+
- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus.
19+
- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`).
20+
21+
## Data Flow Patterns
22+
23+
### Input Processing
24+
- Files validated via `utils/file_validator.py` (size limits, format checks)
25+
- Images converted to PNG previews for VLM via `utils/previews.py`
26+
- Metadata extracted preserving original format info (critical for format compatibility matching)
27+
- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`)
28+
29+
### Retrieval Query Construction
30+
```python
31+
# Clean task text + format tokens from uploaded files
32+
query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM"
33+
```
34+
35+
### VLM Selection Input
36+
The VLM receives:
37+
- **Text**: User task + candidate table + original file metadata
38+
- **Image**: PNG preview (converted from any format)
39+
- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility)
40+
41+
## Critical Patterns
42+
43+
### Error Handling
44+
- **Graceful degradation**: If image conversion fails, continue text-only
45+
- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults
46+
- **File validation**: Early validation prevents downstream errors
47+
48+
### Control Tags System
49+
Users can control behavior via tags in their queries:
50+
- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results
51+
- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate)
52+
- `[REFINE]` - Force clarification turn for alternatives
53+
54+
### Conversation Flow
55+
- **Complete**: Normal success with tool recommendations
56+
- **Needs Clarification**: VLM asks followup questions when task is ambiguous
57+
- **Terminal No-Tool**: No suitable tools found with explanation
58+
59+
## Development Workflows
60+
61+
### Running the App
62+
```bash
63+
# Install with pip using pyproject.toml
64+
pip install -e ".[dev]"
65+
66+
# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path
67+
ai_agent ui # Launches Gradio on port 7860
68+
```
69+
70+
### Testing
71+
- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json`
72+
- Uses test doubles for VLM calls to avoid API costs
73+
- Run with: `pytest tests/`
74+
75+
### Change Documentation
76+
- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format
77+
- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security
78+
- Update CHANGELOG.md for ALL user-facing changes before merging PRs
79+
- Format: `### Added\n- New feature description` under version heading
80+
- Version entries: `## [x.y.z] - YYYY-MM-DD`
81+
82+
### Environment Management
83+
- **uv**: Fast Python package manager used in `tools/image/Dockerfile`
84+
- Creates isolated `.venv` environments for reproducible builds
85+
- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds
86+
87+
### Logging & Debugging
88+
- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/`
89+
- File logs in `logs/app_YYYYMMDD.log` with structured JSON events
90+
- Console/file log levels configurable via `.env`
91+
92+
## Project Conventions
93+
94+
### Schema Patterns
95+
- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility
96+
- **Enum-based** conversation states and tool reasons for type safety
97+
- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators
98+
99+
### Catalog Integration
100+
- Software catalog in JSONL format following schema.org SoftwareSourceCode structure
101+
- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos
102+
- **Supporting data**: Format compatibility info used for matching
103+
104+
### Module Boundaries
105+
- `api/`: Pipeline orchestration, no UI dependencies
106+
- `generator/`: Pure VLM logic, no retrieval dependencies
107+
- `retriever/`: Pure vector search, no generation dependencies
108+
- `utils/`: Shared utilities, no business logic
109+
- `ui/`: Gradio interface only
110+
111+
### Configuration
112+
- Environment-based config via `.env` (API keys, model names, catalog paths)
113+
- Sensible defaults for all settings
114+
- No hardcoded paths or credentials
115+
116+
## Medical Imaging Context
117+
118+
This tool specializes in medical/scientific imaging:
119+
- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy
120+
- **Formats**: DICOM, NIfTI, TIFF stacks, standard images
121+
- **Dimensions**: 2D images, 3D volumes, 4D timeseries
122+
- **Tasks**: Segmentation, registration, analysis, visualization
123+
124+
The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred.
125+
126+
## Security Notes
127+
- Only makes external calls to OpenAI VLM API (with user image preview)
128+
- Never uploads user data to third-party tool demos
129+
- Returns links only; user chooses whether to visit demos
130 130
- Prompt logging is optional and local-only
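
The control tags and the retrieval query format described in `.github/copilot-instructions.md` can be sketched as follows. This is a hedged illustration and not the code in `utils/tags.py`, which this diff does not show. The function names `parse_control_tags` and `build_query`, the regular expressions, and the case-sensitive matching are all assumptions.

```python
import re

# Assumed tag syntax, taken from the instructions: [EXCLUDE:a|b], [NO_RERANK], [REFINE]
EXCLUDE_RE = re.compile(r"\[EXCLUDE:([^\]]*)\]")
FLAG_RE = re.compile(r"\[(NO_RERANK|REFINE)\]")


def parse_control_tags(task: str) -> tuple[str, list[str], set[str]]:
    """Split a raw task string into clean text, excluded tool names, and flags."""
    excluded: list[str] = []
    for match in EXCLUDE_RE.finditer(task):
        excluded.extend(name.strip() for name in match.group(1).split("|") if name.strip())
    flags = {match.group(1) for match in FLAG_RE.finditer(task)}
    # Remove all tags, then collapse the whitespace they leave behind.
    clean = FLAG_RE.sub("", EXCLUDE_RE.sub("", task))
    clean = " ".join(clean.split())
    return clean, excluded, flags


def build_query(clean_task: str, formats: list[str]) -> str:
    """Append one format:<NAME> token per uploaded file format."""
    tokens = " ".join(f"format:{fmt}" for fmt in formats)
    return f"{clean_task} {tokens}".strip()


clean, excluded, flags = parse_control_tags(
    "segment lungs [EXCLUDE:toolA|toolB] [NO_RERANK]"
)
query = build_query(clean, ["DICOM"])
print(query)     # segment lungs format:DICOM
print(excluded)  # ['toolA', 'toolB']
```

Emitting one `format:` token per format matches the `format:DICOM format:NIfTI` example in the instructions when several file types are uploaded.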
