| 1 | +# AI Agent — Copilot Instructions |
| 2 | + |
| 3 | +This is a **RAG + VLM imaging tool recommender** that helps users find the right imaging software for their images and tasks. Users drop an image, describe their task, and get ranked software recommendations with demo links. |
| 4 | + |
| 5 | +## Architecture Overview |
| 6 | + |
| 7 | +The system follows a two-stage pipeline: |
| 8 | + |
| 9 | +1. **Retrieval Stage** (`retriever/`, `api/pipeline.py`): Fast text search using BGE-M3 embeddings + CrossEncoder reranker. No LLM calls. Returns top-K candidates. |
| 10 | + |
| 11 | +2. **Selection Stage** (`generator/`): Single VLM call (OpenAI GPT-4o or GPT-4o mini) that sees the image, the candidates, and the file metadata, and returns ranked recommendations with accuracy scores. |
| 12 | + |
| 13 | +### Key Components |
| 14 | + |
| 15 | +- **`api/pipeline.RAGImagingPipeline`**: Main orchestrator. Handles file validation, metadata extraction, retrieval, and VLM selection. |
| 16 | +- **`retriever/embedders.py`**: FAISS vector index with BGE-M3 + CrossEncoder reranker for candidate retrieval. |
| 17 | +- **`generator/generator.VLMToolSelector`**: Vision-language model that selects best tools from candidates. |
| 18 | +- **`utils/image_meta.py`**: Robust metadata extraction for DICOM, NIfTI, and TIFF stacks, with a medical imaging focus. |
| 19 | +- **`utils/tags.py`**: Control tags system for query refinement (`[EXCLUDE:tool1|tool2]`, `[NO_RERANK]`, `[REFINE]`). |
| 20 | + |
| 21 | +## Data Flow Patterns |
| 22 | + |
| 23 | +### Input Processing |
| 24 | +- Files validated via `utils/file_validator.py` (size limits, format checks) |
| 25 | +- Images converted to PNG previews for VLM via `utils/previews.py` |
| 26 | +- Metadata extracted preserving original format info (critical for format compatibility matching) |
| 27 | +- Format tokens added to retrieval query (e.g. `format:DICOM format:NIfTI`) |
| 28 | + |
| 29 | +### Retrieval Query Construction |
| 30 | +```python |
| 31 | +# Clean task text + format tokens from uploaded files |
| 32 | +query = f"{clean_task} format:{ext_tokens}" # e.g. "segment lungs format:DICOM" |
| 33 | +``` |
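The one-line f-string above shows the single-format case. When several files with different formats are uploaded, the query needs one `format:` token per format, as in the `format:DICOM format:NIfTI` example. A minimal sketch of that step, assuming the extensions arrive as a list of strings; `build_retrieval_query` and its parameters are hypothetical names, not the pipeline's actual API:

```python
def build_retrieval_query(clean_task: str, extensions: list[str]) -> str:
    """Append one `format:<EXT>` token per distinct uploaded file format.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    seen: list[str] = []
    for ext in extensions:
        token = ext.strip().lstrip(".")  # tolerate ".dcm"-style input
        if token and token not in seen:  # de-duplicate, keep first-seen order
            seen.append(token)
    format_tokens = " ".join(f"format:{t}" for t in seen)
    # strip() drops the trailing space when there are no format tokens
    return f"{clean_task.strip()} {format_tokens}".strip()
```

Keeping the tokens in first-seen order makes the query deterministic for a given upload, which helps when comparing retrieval results across runs.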
| 34 | + |
| 35 | +### VLM Selection Input |
| 36 | +The VLM receives: |
| 37 | +- **Text**: User task + candidate table + original file metadata |
| 38 | +- **Image**: PNG preview (converted from any format) |
| 39 | +- **Metadata**: Original extension, dimensions, file info (crucial for IO compatibility) |
| 40 | + |
| 41 | +## Critical Patterns |
| 42 | + |
| 43 | +### Error Handling |
| 44 | +- **Graceful degradation**: If image conversion fails, continue text-only |
| 45 | +- **Robust metadata**: All metadata extraction is wrapped in `try`/`except` with sensible defaults |
| 46 | +- **File validation**: Early validation prevents downstream errors |
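The graceful-degradation rule can be sketched as a small wrapper that returns `None` instead of raising, so the caller can carry on text-only. This is an illustration under assumptions: `safe_preview` and the injected `convert` callable are hypothetical and do not reflect the actual interface of `utils/previews.py`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_preview(path: str, convert: Callable[[str], bytes]) -> Optional[bytes]:
    """Return PNG bytes, or None so the caller can continue text-only.

    `convert` stands in for whatever function produces the PNG preview.
    """
    try:
        return convert(path)
    except Exception as exc:  # broad on purpose: any failure means "no image"
        logger.warning("Preview conversion failed for %s: %s", path, exc)
        return None
```

A caller would then attach the image to the VLM request only when the result is not `None`.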
| 47 | + |
| 48 | +### Control Tags System |
| 49 | +Users can control behavior via tags in their queries: |
| 50 | +- `[EXCLUDE:toolname1|toolname2]` - Exclude specific tools from results |
| 51 | +- `[NO_RERANK]` - Skip CrossEncoder reranker (faster, less accurate) |
| 52 | +- `[REFINE]` - Force clarification turn for alternatives |
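One way to parse these tags is with two regular expressions: one for the `[EXCLUDE:...]` list and one for the flag tags. The sketch below is an assumption about how such parsing could work, not the contents of `utils/tags.py`; `ControlTags` and `parse_control_tags` are hypothetical names, and case-insensitive matching is a choice made here for illustration.

```python
import re
from dataclasses import dataclass, field

_EXCLUDE_RE = re.compile(r"\[EXCLUDE:([^\]]*)\]", re.IGNORECASE)
_FLAG_RE = re.compile(r"\[(NO_RERANK|REFINE)\]", re.IGNORECASE)


@dataclass
class ControlTags:
    excluded: list[str] = field(default_factory=list)
    no_rerank: bool = False
    refine: bool = False


def parse_control_tags(query: str) -> tuple[str, ControlTags]:
    """Split a raw query into (clean task text, parsed control tags)."""
    tags = ControlTags()
    for match in _EXCLUDE_RE.finditer(query):
        # Tool names inside one EXCLUDE tag are separated by "|"
        tags.excluded += [n.strip() for n in match.group(1).split("|") if n.strip()]
    flags = {m.group(1).upper() for m in _FLAG_RE.finditer(query)}
    tags.no_rerank = "NO_RERANK" in flags
    tags.refine = "REFINE" in flags
    # Remove all tags, then collapse the whitespace they leave behind
    clean = _FLAG_RE.sub("", _EXCLUDE_RE.sub("", query))
    return " ".join(clean.split()), tags
```

Returning the cleaned text separately keeps the tags out of the retrieval query, so they cannot influence the embedding search.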
| 53 | + |
| 54 | +### Conversation Flow |
| 55 | +- **Complete**: Normal success with tool recommendations |
| 56 | +- **Needs Clarification**: VLM asks follow-up questions when the task is ambiguous |
| 57 | +- **Terminal No-Tool**: No suitable tools found with explanation |
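These three outcomes map naturally onto an enum. The sketch below is illustrative only: the class name, member names, and string values are assumptions and may not match the definitions in `generator/schema.py`.

```python
from enum import Enum


class ConversationStatus(str, Enum):
    """Hypothetical enum mirroring the three outcomes listed above."""

    COMPLETE = "complete"
    NEEDS_CLARIFICATION = "needs_clarification"
    TERMINAL_NO_TOOL = "terminal_no_tool"


def is_final(status: ConversationStatus) -> bool:
    """Only a clarification request expects another user turn."""
    return status is not ConversationStatus.NEEDS_CLARIFICATION
```

Subclassing `str` lets the status serialize directly into JSON responses while still being compared as an enum member in code.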
| 58 | + |
| 59 | +## Development Workflows |
| 60 | + |
| 61 | +### Running the App |
| 62 | +```bash |
| 63 | +# Install with pip using pyproject.toml |
| 64 | +pip install -e ".[dev]" |
| 65 | + |
| 66 | +# Configure .env with OPENAI_API_KEY and SOFTWARE_CATALOG path |
| 67 | +ai_agent ui # Launches Gradio on port 7860 |
| 68 | +``` |
| 69 | + |
| 70 | +### Testing |
| 71 | +- **`tests/full_test.py`**: End-to-end pipeline tests driven by `tests/data/test_data.json` |
| 72 | +- Uses test doubles for VLM calls to avoid API costs |
| 73 | +- Run with: `pytest tests/` |
| 74 | + |
| 75 | +### Change Documentation |
| 76 | +- **`CHANGELOG.md`**: Follow [Keep a Changelog](https://keepachangelog.com/) format |
| 77 | +- Use semantic versioning for version numbers, and group entries under the sections Added, Changed, Deprecated, Removed, Fixed, Security |
| 78 | +- Update CHANGELOG.md for ALL user-facing changes before merging PRs |
| 79 | +- Format: `### Added\n- New feature description` under version heading |
| 80 | +- Version entries: `## [x.y.z] - YYYY-MM-DD` |
| 81 | + |
| 82 | +### Environment Management |
| 83 | +- **uv**: Fast Python package manager used in `tools/image/Dockerfile` |
| 84 | +- Creates isolated `.venv` environments for reproducible builds |
| 85 | +- Dockerfile uses `uv venv && uv pip install -e .` pattern for container builds |
| 86 | + |
| 87 | +### Logging & Debugging |
| 88 | +- Set `LOG_PROMPTS=1` to save VLM prompts + images to `logs/` |
| 89 | +- File logs in `logs/app_YYYYMMDD.log` with structured JSON events |
| 90 | +- Console/file log levels configurable via `.env` |
| 91 | + |
| 92 | +## Project Conventions |
| 93 | + |
| 94 | +### Schema Patterns |
| 95 | +- **Pydantic models** in `generator/schema.py` with robust field validation and aliasing for catalog compatibility |
| 96 | +- **Enum-based** conversation states and tool reasons for type safety |
| 97 | +- **Field normalization**: Dimensions (2D/3D/4D), modalities (CT/MRI/XR), file formats via validators |
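As an illustration of the field normalization idea, the function below maps loosely formatted dimension values onto the `2D`/`3D`/`4D` labels. It is a plain-Python sketch under assumptions, written without Pydantic so it stays self-contained; `normalize_dimension` is a hypothetical name, and the real validators in `generator/schema.py` may accept different inputs.

```python
_ALLOWED_DIMENSIONS = {"2D", "3D", "4D"}


def normalize_dimension(value) -> str:
    """Map inputs such as 3, "3d", or " 3D " onto the label "3D"."""
    if isinstance(value, int):
        label = f"{value}D"
    else:
        label = str(value).strip().upper()
    if label not in _ALLOWED_DIMENSIONS:
        raise ValueError(f"Unsupported dimension: {value!r}")
    return label
```

In a Pydantic model, logic like this would typically live in a field validator so that catalog entries are normalized at load time.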
| 98 | + |
| 99 | +### Catalog Integration |
| 100 | +- Software catalog in JSONL format following schema.org SoftwareSourceCode structure |
| 101 | +- **Runnable examples**: Links to HuggingFace Spaces, notebooks, web demos |
| 102 | +- **Supporting data**: Format compatibility info used for matching |
| 103 | + |
| 104 | +### Module Boundaries |
| 105 | +- `api/`: Pipeline orchestration, no UI dependencies |
| 106 | +- `generator/`: Pure VLM logic, no retrieval dependencies |
| 107 | +- `retriever/`: Pure vector search, no generation dependencies |
| 108 | +- `utils/`: Shared utilities, no business logic |
| 109 | +- `ui/`: Gradio interface only |
| 110 | + |
| 111 | +### Configuration |
| 112 | +- Environment-based config via `.env` (API keys, model names, catalog paths) |
| 113 | +- Sensible defaults for all settings |
| 114 | +- No hardcoded paths or credentials |
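A minimal sketch of environment-based configuration with defaults follows. It assumes the `.env` values have already been loaded into the process environment. The variable names `OPENAI_API_KEY`, `SOFTWARE_CATALOG`, and `LOG_PROMPTS` come from this document, while the `Settings` class, the `load_settings` function, and the `catalog.jsonl` default are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    software_catalog: str
    log_prompts: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to defaults."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),  # no default: never hardcode keys
        software_catalog=env.get("SOFTWARE_CATALOG", "catalog.jsonl"),  # assumed default
        log_prompts=env.get("LOG_PROMPTS", "0") == "1",
    )
```

Passing the mapping as a parameter makes the loader easy to test with a plain dictionary instead of mutating `os.environ`.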
| 115 | + |
| 116 | +## Medical Imaging Context |
| 117 | + |
| 118 | +This tool specializes in medical/scientific imaging: |
| 119 | +- **Modalities**: CT, MRI, X-ray, Ultrasound, PET, SPECT, Microscopy |
| 120 | +- **Formats**: DICOM, NIfTI, TIFF stacks, standard images |
| 121 | +- **Dimensions**: 2D images, 3D volumes, 4D timeseries |
| 122 | +- **Tasks**: Segmentation, registration, analysis, visualization |
| 123 | + |
| 124 | +The VLM selection treats format compatibility as a primary factor: tools that support the user's input format are strongly preferred. |
| 125 | + |
| 126 | +## Security Notes |
| 127 | +- The only external calls go to the OpenAI VLM API, which receives the user's image preview |
| 128 | +- Never uploads user data to third-party tool demos |
| 129 | +- Returns links only; user chooses whether to visit demos |
| 130 | +- Prompt logging is optional and local-only |