|
1 | | -# AI Agent — Copilot Instructions |
2 | | - |
3 | | -This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links. |
4 | | - |
5 | | -## Architecture Overview |
6 | | - |
7 | | -The system follows a two-stage pipeline: |
8 | | - |
9 | | -1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates. |
10 | | - |
11 | | -2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores. |
12 | | - |
13 | | -### Key Components |
14 | | - |
15 | | -- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection. |
16 | | -- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval. |
17 | | -- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates. |
18 | | -- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus. |
19 | | -- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`). |
20 | | - |
21 | | -## Data Flow Patterns |
22 | | - |
23 | | -### Input Processing |
24 | | -- Files validated via `utils/file_validator.py` (size limits, format checks) |
25 | | -- Images converted to PNG previews for VLM via `utils/previews.py` |
26 | | -- Metadata extracted preserving original format info (critical for format compatibility matching) |
27 | | -- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`) |
28 | | - |
29 | | -### Retrieval Query Construction |
30 | | -```python |
31 | | -# Clean task text + format tokens from uploaded files |
32 | | -query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM" |
33 | | -``` |
34 | | - |
35 | | -### VLM Selection Input |
36 | | -The VLM receives: |
37 | | -- **Text**: User task + candidate table + original file metadata |
38 | | -- **Image**: PNG preview (converted from any format) |
39 | | -- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility) |
40 | | - |
41 | | -## Critical Patterns |
42 | | - |
43 | | -### Error Handling |
44 | | -- **Graceful degradation**: If image conversion fails, continue text-only |
45 | | -- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults |
46 | | -- **File validation**: Early validation prevents downstream errors |
47 | | - |
48 | | -### Control Tags System |
49 | | -Users can control behavior via tags in their queries: |
50 | | -- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results |
51 | | -- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate) |
52 | | -- `[REFINE]` - Force clarification turn for alternatives |
53 | | - |
54 | | -### Conversation Flow |
55 | | -- **Complete**: Normal success with tool recommendations |
56 | | -- **Needs Clarification**: VLM asks followup questions when task is ambiguous |
57 | | -- **Terminal No-Tool**: No suitable tools found with explanation |
58 | | - |
59 | | -## Development Workflows |
60 | | - |
61 | | -### Running the App |
62 | | -```bash |
63 | | -# Install with pip using pyproject.toml |
64 | | -pip install -e ".[dev]" |
65 | | - |
66 | | -# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path |
67 | | -ai_agent ui # Launches Gradio on port 7860 |
68 | | -``` |
69 | | - |
70 | | -### Testing |
71 | | -- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json` |
72 | | -- Uses test doubles for VLM calls to avoid API costs |
73 | | -- Run with: `pytest tests/` |
74 | | - |
75 | | -### Change Documentation |
76 | | -- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format |
77 | | -- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security |
78 | | -- Update CHANGELOG.md for ALL user-facing changes before merging PRs |
79 | | -- Format: `### Added\n- New feature description` under version heading |
80 | | -- Version entries: `## [x.y.z] - YYYY-MM-DD` |
81 | | - |
82 | | -### Environment Management |
83 | | -- **uv**: Fast Python package manager used in `tools/image/Dockerfile` |
84 | | -- Creates isolated `.venv` environments for reproducible builds |
85 | | -- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds |
86 | | - |
87 | | -### Logging & Debugging |
88 | | -- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/` |
89 | | -- File logs in `logs/app_YYYYMMDD.log` with structured JSON events |
90 | | -- Console/file log levels configurable via `.env` |
91 | | - |
92 | | -## Project Conventions |
93 | | - |
94 | | -### Schema Patterns |
95 | | -- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility |
96 | | -- **Enum-based** conversation states and tool reasons for type safety |
97 | | -- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators |
98 | | - |
99 | | -### Catalog Integration |
100 | | -- Software catalog in JSONL format following schema.org SoftwareSourceCode structure |
101 | | -- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos |
102 | | -- **Supporting data**: Format compatibility info used for matching |
103 | | - |
104 | | -### Module Boundaries |
105 | | -- `api/`: Pipeline orchestration, no UI dependencies |
106 | | -- `generator/`: Pure VLM logic, no retrieval dependencies |
107 | | -- `retriever/`: Pure vector search, no generation dependencies |
108 | | -- `utils/`: Shared utilities, no business logic |
109 | | -- `ui/`: Gradio interface only |
110 | | - |
111 | | -### Configuration |
112 | | -- Environment-based config via `.env` (API keys, model names, catalog paths) |
113 | | -- Sensible defaults for all settings |
114 | | -- No hardcoded paths or credentials |
115 | | - |
116 | | -## Medical Imaging Context |
117 | | - |
118 | | -This tool specializes in medical/scientific imaging: |
119 | | -- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy |
120 | | -- **Formats**: DICOM, NIfTI, TIFF stacks, standard images |
121 | | -- **Dimensions**: 2D images, 3D volumes, 4D timeseries |
122 | | -- **Tasks**: Segmentation, registration, analysis, visualization |
123 | | - |
124 | | -The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred. |
125 | | - |
126 | | -## Security Notes |
127 | | -- Only makes external calls to OpenAI VLM API (with user image preview) |
128 | | -- Never uploads user data to third-party tool demos |
129 | | -- Returns links only; user chooses whether to visit demos |
| 1 | +# AI Agent — Copilot Instructions |
| 2 | + |
| 3 | +This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links. |
| 4 | + |
| 5 | +## Architecture Overview |
| 6 | + |
| 7 | +The system follows a two-stage pipeline: |
| 8 | + |
| 9 | +1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates. |
| 10 | + |
| 11 | +2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o/mini) that sees the image + candidates + metadata and returns ranked recommendations with accuracy scores. |
| 12 | + |
| 13 | +### Key Components |
| 14 | + |
| 15 | +- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection. |
| 16 | +- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval. |
| 17 | +- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates. |
| 18 | +- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, TIFF stacks with medical imaging focus. |
| 19 | +- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`). |
| 20 | + |
| 21 | +## Data Flow Patterns |
| 22 | + |
| 23 | +### Input Processing |
| 24 | +- Files validated via `utils/file_validator.py` (size limits, format checks) |
| 25 | +- Images converted to PNG previews for VLM via `utils/previews.py` |
| 26 | +- Metadata extracted preserving original format info (critical for format compatibility matching) |
| 27 | +- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`) |
| 28 | + |
| 29 | +### Retrieval Query Construction |
| 30 | +```python |
| 31 | +# Clean task text + format tokens from uploaded files |
| 32 | +query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM" |
| 33 | +``` |
| 34 | + |
| 35 | +### VLM Selection Input |
| 36 | +The VLM receives: |
| 37 | +- **Text**: User task + candidate table + original file metadata |
| 38 | +- **Image**: PNG preview (converted from any format) |
| 39 | +- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility) |
| 40 | + |
| 41 | +## Critical Patterns |
| 42 | + |
| 43 | +### Error Handling |
| 44 | +- **Graceful degradation**: If image conversion fails, continue text-only |
| 45 | +- **Robust metadata**: All metadata extraction wrapped in try/catch with sensible defaults |
| 46 | +- **File validation**: Early validation prevents downstream errors |
| 47 | + |
| 48 | +### Control Tags System |
| 49 | +Users can control behavior via tags in their queries: |
| 50 | +- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results |
| 51 | +- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate) |
| 52 | +- `[REFINE]` - Force clarification turn for alternatives |
| 53 | + |
| 54 | +### Conversation Flow |
| 55 | +- **Complete**: Normal success with tool recommendations |
| 56 | +- **Needs Clarification**: VLM asks followup questions when task is ambiguous |
| 57 | +- **Terminal No-Tool**: No suitable tools found with explanation |
| 58 | + |
| 59 | +## Development Workflows |
| 60 | + |
| 61 | +### Running the App |
| 62 | +```bash |
| 63 | +# Install with pip using pyproject.toml |
| 64 | +pip install -e ".[dev]" |
| 65 | + |
| 66 | +# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path |
| 67 | +ai_agent ui # Launches Gradio on port 7860 |
| 68 | +``` |
| 69 | + |
| 70 | +### Testing |
| 71 | +- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json` |
| 72 | +- Uses test doubles for VLM calls to avoid API costs |
| 73 | +- Run with: `pytest tests/` |
| 74 | + |
| 75 | +### Change Documentation |
| 76 | +- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format |
| 77 | +- Use semantic versioning with sections: Added, Changed, Deprecated, Removed, Fixed, Security |
| 78 | +- Update CHANGELOG.md for ALL user-facing changes before merging PRs |
| 79 | +- Format: `### Added\n- New feature description` under version heading |
| 80 | +- Version entries: `## [x.y.z] - YYYY-MM-DD` |
| 81 | + |
| 82 | +### Environment Management |
| 83 | +- **uv**: Fast Python package manager used in `tools/image/Dockerfile` |
| 84 | +- Creates isolated `.venv` environments for reproducible builds |
| 85 | +- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds |
| 86 | + |
| 87 | +### Logging & Debugging |
| 88 | +- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/` |
| 89 | +- File logs in `logs/app_YYYYMMDD.log` with structured JSON events |
| 90 | +- Console/file log levels configurable via `.env` |
| 91 | + |
| 92 | +## Project Conventions |
| 93 | + |
| 94 | +### Schema Patterns |
| 95 | +- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility |
| 96 | +- **Enum-based** conversation states and tool reasons for type safety |
| 97 | +- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators |
| 98 | + |
| 99 | +### Catalog Integration |
| 100 | +- Software catalog in JSONL format following schema.org SoftwareSourceCode structure |
| 101 | +- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos |
| 102 | +- **Supporting data**: Format compatibility info used for matching |
| 103 | + |
| 104 | +### Module Boundaries |
| 105 | +- `api/`: Pipeline orchestration, no UI dependencies |
| 106 | +- `generator/`: Pure VLM logic, no retrieval dependencies |
| 107 | +- `retriever/`: Pure vector search, no generation dependencies |
| 108 | +- `utils/`: Shared utilities, no business logic |
| 109 | +- `ui/`: Gradio interface only |
| 110 | + |
| 111 | +### Configuration |
| 112 | +- Environment-based config via `.env` (API keys, model names, catalog paths) |
| 113 | +- Sensible defaults for all settings |
| 114 | +- No hardcoded paths or credentials |
| 115 | + |
| 116 | +## Medical Imaging Context |
| 117 | + |
| 118 | +This tool specializes in medical/scientific imaging: |
| 119 | +- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy |
| 120 | +- **Formats**: DICOM, NIfTI, TIFF stacks, standard images |
| 121 | +- **Dimensions**: 2D images, 3D volumes, 4D timeseries |
| 122 | +- **Tasks**: Segmentation, registration, analysis, visualization |
| 123 | + |
| 124 | +The VLM selection considers format compatibility as a primary factor - tools supporting the user's input format are strongly preferred. |
| 125 | + |
| 126 | +## Security Notes |
| 127 | +- Only makes external calls to OpenAI VLM API (with user image preview) |
| 128 | +- Never uploads user data to third-party tool demos |
| 129 | +- Returns links only; user chooses whether to visit demos |
130 | 130 | - Prompt logging is optional and local-only |
0 commit comments