Skip to content
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .env.dist
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ GITHUB_TOKEN=ghp_xxxx

# Additional API keys for alternative models (if using EPFL or other providers)
EPFL_API_KEY=sk-xxxx
EPFL_API_KEY_EMBEDDER=sk-xxxx

# Software catalog
SOFTWARE_CATALOG=path/to/your/catalog.jsonl
Expand All @@ -12,6 +13,8 @@ SOFTWARE_CATALOG=path/to/your/catalog.jsonl
TOP_K=8 # Number of candidates to retrieve
NUM_CHOICES=3 # Number of tools to recommend
USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
AGENT_OUTPUT_RETRIES=3 # Structured output validation retries
EMBED_CATALOG_ON_START=1 # Pre-embed catalog at startup if FAISS is empty

# Logging configuration
LOGLEVEL_CONSOLE=WARNING
Expand Down
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ All notable changes to this project will be documented in this file.
- **Agent run_example tool**: Removed autonomous tool execution capability from agent. Agent now only recommends tools - all execution requires explicit user approval via approval buttons. This enforces consistent security/UX model where users maintain full control over tool execution. The underlying `gradio_space_tool.py` remains for UI-initiated demo execution.

### Added
- **Startup catalog pre-embedding**: Pipeline now pre-embeds the software catalog at application startup when FAISS is empty, so user requests only require query embedding + FAISS search. Controlled by `EMBED_CATALOG_ON_START` (default `1`).
- **Project and maintainer documentation expansion**:
- Added [AGENTS.md](AGENTS.md) with repository-wide agent workflow guidance, dev-container-first defaults, and documentation maintenance rules.
- Added [docs/guide.md](docs/guide.md) as a detailed contributor map covering module responsibilities, Python/package defaults, command baseline, known inconsistencies, and improvement guidelines.
Expand Down Expand Up @@ -54,8 +55,26 @@ All notable changes to this project will be documented in this file.
- **MCP Tools Subpackage** (`agent/tools/mcp/`): Organized separation of registered imaging tools (MCP protocol) from agent utilities. Base models, registry, and imaging tools (e.g., lungs_segmentation) now in dedicated subpackage for clarity.
- **Base Tool Models** (`agent/tools/mcp/base.py`): Standard Pydantic schemas for tool consistency
- **Tool Registration**: Lungs segmentation tool self-registers with complete field mappings
- **In-process metadata cache** (`utils/image_meta.py`): `summarize_image_metadata` now caches results keyed by `(resolved_path, mtime_ns, size_bytes)`. Eliminates redundant file reads across the three call sites per request. Saves ~290 ms per warm request.
- **Pre-computed metadata forwarded to `run_agent`** (`ui/handlers.py`): The `image_metadata` string computed by `_build_preview_for_vlm` is now forwarded directly to `run_agent`, skipping the second `summarize_image_metadata` call that previously occurred unconditionally inside the agent entry point.
- **Dynamic `Agent` instance caching** (`agent/agent.py`): Runtime `Agent` objects created for custom model/endpoint combinations are stored in a module-level dict keyed by `(model, base_url, api_key_env, num_choices)`. Subsequent requests with the same non-default configuration reuse the cached instance.

### Changed
- **Preview image simplification and size cap**: VLM preview generation no longer writes metadata text overlays onto preview images. Previews are now downscaled with aspect ratio preservation using a configurable maximum side length (`PREVIEW_MAX_SIDE_PX`, default `500`) to keep large images lightweight.
- **Batched-only repository verification in agent loop**: Agent prompt and runtime tool registration now use `repo_info_batch(urls)` as the single repository verification path (including one-item lists for single repos), removing mixed single-vs-batch behavior during recommendation runs.

### Removed
- **Legacy single-repo agent adapter**: Removed the unused `repo_info` agent adapter function to avoid confusion; agent recommendation runs now verify repositories exclusively through `repo_info_batch`.
- **Fast mode controls**: Removed fast mode from runtime, prompts, and UI settings to keep behavior deterministic and avoid dual execution paths.
- **Repo summary performance optimization**: `repo_info` now uses an in-memory TTL cache (configurable via `REPO_INFO_CACHE_TTL_SECONDS`, default 3600s) and in-flight request deduplication for identical repository URLs. This avoids repeated DeepWiki/repocards fetches during iterative agent runs and parallel tool calls.
- **Parallel repository verification tool**: Added `repo_info_batch(urls)` tool to fetch multiple GitHub repository summaries concurrently, reducing end-to-end latency when verifying several finalists.
- **Latency observability**: Agent now logs per-tool durations and a request-level latency summary (`total_ms`, metadata time, model execution time, and aggregated tool timing) to make bottlenecks directly visible in runtime logs.
- **Startup sync freshness skip**: Added optional local-freshness short-circuit in catalog sync to avoid repeated remote SPARQL fetches on quick restarts when local catalog + FAISS artifacts are present. Controlled by `SYNC_SKIP_IF_FRESH_SECONDS` (disabled by default) and `SYNC_FORCE=1` to bypass.
- **Preview generation cache**: Added in-memory TTL cache for generated VLM previews keyed by file fingerprints (path/mtime/size), reducing repeated 3D orthogonal composite generation for identical inputs. Controlled by `PREVIEW_CACHE_TTL_SECONDS` (default 1800) and `PREVIEW_CACHE_MAX_ENTRIES` (default 64).
- **Port fallback pre-selection**: UI launch now pre-checks for the first available port in the fallback range before calling Gradio launch, reducing repeated bind-failure retries when the preferred port is busy.
- **Config-driven retrieval backends**: Added `retrieval.embedder` and `retrieval.reranker` blocks in `config.yaml` so embedder/reranker setup can be configured without `.env`-only wiring. Supports `backend: remote|local` with simple fields (`model_name`, `base_url`, `api_key_env`, `timeout_s`, optional `device`). Environment variables remain as fallback.
Comment thread
qchapp marked this conversation as resolved.
- **Remote embedder integration**: Retrieval embeddings now call the EPFL OpenAI-compatible endpoint by default (`https://inference-rcp.epfl.ch/v1`) using model `Qwen/Qwen3-Embedding-8B` and key from `EPFL_API_KEY_EMBEDDER`.
Comment thread
qchapp marked this conversation as resolved.
- **Remote reranker integration**: Retrieval reranking now calls a remote endpoint instead of loading a local CrossEncoder model. Default settings target EPFL (`https://inference-rcp.epfl.ch/v1`) with model `BAAI/bge-reranker-v2-m3`, using `EPFL_API_KEY_EMBEDDER` for authentication.
- **Documentation cleanup**: Removed stale references to `[NO_RERANK]` and `[REFINE]` control-tag behavior from user/architecture docs, and updated structure/instruction docs to current agent layout (`agent/tools/`, `agent/utils.AgentState`), active CLI usage (`ai_agent chat`), and current testing guidance.
- CLI now supports `ai_agent chat`
- **DeepWiki MCP integration**: Repository info tool now uses DeepWiki MCP server (https://mcp.deepwiki.com/sse) as primary source for GitHub repository documentation. DeepWiki provides fast, pre-indexed documentation access without API rate limits.
Expand Down Expand Up @@ -101,6 +120,10 @@ All notable changes to this project will be documented in this file.
- CLI no more supports `ai_agent ui` command

### Fixed
- **Startup refresh regression**: Fixed CLI background refresh unpacking after UI function signature changes (`ValueError: too many values to unpack`) and delayed first auto-refresh cycle to avoid duplicate immediate catalog sync right after startup sync.
- **Structured output validation robustness**: Reduced `Exceeded maximum retries ... for output validation` failures by increasing agent output retries for custom endpoint runs and making ToolSelection parsing more tolerant to common formatting drift (status/reason/rank/accuracy coercion).
- **Retrieval query drift guardrails**: Added sanitization for LLM-generated retrieval queries to strip repository-oriented terms (e.g., `github`, `repository`, `official`) and avoid tool-name-only drift. Repeated `search_tools` attempts are now rerouted as alternative searches instead of failing with quota errors.
- **FAISS rebuild on embedder changes**: When the catalog content is unchanged but the embedding model/dimension changes, sync now detects incompatible/missing FAISS artifacts and rebuilds the index instead of keeping stale artifacts. This prevents empty retrieval results after embedder migrations.
- **Pydantic Forward Reference**: Reordered class definitions in `schema.py` so `Conversation` and `ConversationStatus` are defined before `ToolSelection` to prevent "class-not-fully-defined" errors.
- **Conversation Context**: Agent now properly maintains conversation history, enabling natural understanding of follow-up requests like "show me alternatives".
- **Clear Button**: Disabled during processing to prevent race conditions with ongoing requests.
Expand Down
29 changes: 29 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,13 +62,16 @@ GITHUB_TOKEN=ghp_xxxx

# Optional: Alternative model providers (EPFL, etc.)
EPFL_API_KEY=sk-xxxx
EPFL_API_KEY_EMBEDDER=sk-xxxx

# Software catalog path
SOFTWARE_CATALOG=dataset/catalog.jsonl

# Pipeline configuration
TOP_K=8 # Number of candidates to retrieve
NUM_CHOICES=3 # Number of tools to recommend
AGENT_OUTPUT_RETRIES=3 # Structured output validation retries
EMBED_CATALOG_ON_START=1 # Pre-embed catalog if FAISS is empty

# Logging configuration
LOGLEVEL_CONSOLE=WARNING
Expand Down Expand Up @@ -119,6 +122,29 @@ available_models:
base_url: null
provider: "OpenAI"
api_key_env: "OPENAI_API_KEY"

retrieval:
embedder:
backend: "remote" # "remote" or "local"
model_name: "Qwen/Qwen3-Embedding-8B"
base_url: "https://inference-rcp.epfl.ch/v1"
api_key_env: "EPFL_API_KEY_EMBEDDER"
timeout_s: 20
# local example:
# backend: "local"
# model_name: "BAAI/bge-m3"
# device: "cpu" # optional

reranker:
backend: "remote" # "remote" or "local"
model_name: "BAAI/bge-reranker-v2-m3"
base_url: "https://inference-rcp.epfl.ch/v1"
api_key_env: "EPFL_API_KEY_EMBEDDER"
timeout_s: 20
# local example:
# backend: "local"
# model_name: "BAAI/bge-reranker-v2-m3"
# device: "cpu" # optional
```

### Running the App
Expand Down Expand Up @@ -252,10 +278,13 @@ User Input (Image + Text Query)
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `OPENAI_API_KEY` | OpenAI API key | - | ✅ |
| `EPFL_API_KEY_EMBEDDER` | API key for remote embedder and reranker endpoints | - | ✅ (when `retrieval.embedder.backend: remote` and/or `retrieval.reranker.backend: remote`) |
| `GITHUB_TOKEN` | GitHub token for repo info | - | ❌ |
| `SOFTWARE_CATALOG` | Path to catalog JSONL | `dataset/catalog.jsonl` | ✅ |
| `TOP_K` | Retrieval candidates count | `8` | ❌ |
| `NUM_CHOICES` | Tools to recommend | `3` | ❌ |
| `AGENT_OUTPUT_RETRIES` | Structured output validation retries | `3` | ❌ |
| `EMBED_CATALOG_ON_START` | Pre-embed catalog on startup when FAISS is empty | `1` | ❌ |
| `LOGLEVEL_CONSOLE` | Console log level | `WARNING` | ❌ |
| `LOGLEVEL_FILE` | File log level | `INFO` | ❌ |
| `FILE_LOG` | Enable file logging | `1` | ❌ |
Expand Down
24 changes: 23 additions & 1 deletion config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,4 +39,26 @@ available_models:
name: "openai/gpt-oss-120b"
base_url: "https://inference-rcp.epfl.ch/v1"
provider: "EPFL"
api_key_env: "EPFL_API_KEY"
api_key_env: "EPFL_API_KEY"

# Retrieval stack (embedder + reranker)
retrieval:
embedder:
backend: "remote" # "remote" or "local"
model_name: "Qwen/Qwen3-Embedding-8B"
base_url: "https://inference-rcp.epfl.ch/v1"
api_key_env: "EPFL_API_KEY_EMBEDDER"
timeout_s: 20
# local example:
# backend: "local"
# model_name: "BAAI/bge-m3"

reranker:
backend: "remote" # "remote" or "local"
model_name: "BAAI/bge-reranker-v2-m3"
base_url: "https://inference-rcp.epfl.ch/v1"
api_key_env: "EPFL_API_KEY_EMBEDDER"
timeout_s: 20
# local example:
Comment thread
qchapp marked this conversation as resolved.
# backend: "local"
# model_name: "BAAI/bge-reranker-v2-m3"
Loading
Loading