Commit 98af8ec

Merge pull request #27 from Imaging-Plaza/feature/speeding
Feature/speeding
2 parents 1631455 + ddc595a commit 98af8ec

36 files changed

Lines changed: 1883 additions & 596 deletions

.env.dist

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,7 @@ GITHUB_TOKEN=ghp_xxxx
 
 # Additional API keys for alternative models (if using EPFL or other providers)
 EPFL_API_KEY=sk-xxxx
+EPFL_API_KEY_EMBEDDER=sk-xxxx
 
 # Software catalog
 SOFTWARE_CATALOG=path/to/your/catalog.jsonl
@@ -12,6 +13,8 @@ SOFTWARE_CATALOG=path/to/your/catalog.jsonl
 TOP_K=8 # Number of candidates to retrieve
 NUM_CHOICES=3 # Number of tools to recommend
 USE_AGENT=1 # Use pydantic-ai agent (1) or standard pipeline (0)
+AGENT_OUTPUT_RETRIES=3 # Structured output validation retries
+EMBED_CATALOG_ON_START=1 # Pre-embed catalog at startup if FAISS is empty
 
 # Logging configuration
 LOGLEVEL_CONSOLE=WARNING
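
Review note: the two new settings could be read at startup roughly as follows. This is a minimal sketch, not the project's actual loader. `env_int` and `env_flag` are hypothetical helpers, and the set of accepted truthy spellings is an assumption; only the variable names and their defaults (`3` and `1`) come from the diff.

```python
import os


def env_int(name: str, default: int, env=os.environ) -> int:
    """Read an integer setting, falling back to the default on missing or bad input."""
    raw = env.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        return default


def env_flag(name: str, default: bool, env=os.environ) -> bool:
    """Read a 0/1-style flag; the accepted spellings are an assumption."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}


# Defaults mirror the values shown in .env.dist.
retries = env_int("AGENT_OUTPUT_RETRIES", 3)
embed_on_start = env_flag("EMBED_CATALOG_ON_START", True)
```

Falling back to the default on unparsable input keeps a typo in `.env` from crashing startup; a stricter loader might prefer to raise instead.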

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -8,6 +8,7 @@ All notable changes to this project will be documented in this file.
 - **Agent run_example tool**: Removed autonomous tool execution capability from agent. Agent now only recommends tools - all execution requires explicit user approval via approval buttons. This enforces consistent security/UX model where users maintain full control over tool execution. The underlying `gradio_space_tool.py` remains for UI-initiated demo execution.
 
 ### Added
+- **Startup catalog pre-embedding**: Pipeline now pre-embeds the software catalog at application startup when FAISS is empty, so user requests only require query embedding + FAISS search. Controlled by `EMBED_CATALOG_ON_START` (default `1`).
 - **Project and maintainer documentation expansion**:
   - Added [AGENTS.md](AGENTS.md) with repository-wide agent workflow guidance, dev-container-first defaults, and documentation maintenance rules.
   - Added [docs/guide.md](docs/guide.md) as a detailed contributor map covering module responsibilities, Python/package defaults, command baseline, known inconsistencies, and improvement guidelines.
@@ -54,8 +55,26 @@ All notable changes to this project will be documented in this file.
 - **MCP Tools Subpackage** (`agent/tools/mcp/`): Organized separation of registered imaging tools (MCP protocol) from agent utilities. Base models, registry, and imaging tools (e.g., lungs_segmentation) now in dedicated subpackage for clarity.
 - **Base Tool Models** (`agent/tools/mcp/base.py`): Standard Pydantic schemas for tool consistency
 - **Tool Registration**: Lungs segmentation tool self-registers with complete field mappings
+- **In-process metadata cache** (`utils/image_meta.py`): `summarize_image_metadata` now caches results keyed by `(resolved_path, mtime_ns, size_bytes)`. Eliminates redundant file reads across the three call sites per request. Saves ~290 ms per warm request.
+- **Pre-computed metadata forwarded to `run_agent`** (`ui/handlers.py`): The `image_metadata` string computed by `_build_preview_for_vlm` is now forwarded directly to `run_agent`, skipping the second `summarize_image_metadata` call that previously occurred unconditionally inside the agent entry point.
+- **Dynamic `Agent` instance caching** (`agent/agent.py`): Runtime `Agent` objects created for custom model/endpoint combinations are stored in a module-level dict keyed by `(model, base_url, api_key_env, num_choices)`. Subsequent requests with the same non-default configuration reuse the cached instance.
 
 ### Changed
+- **Preview image simplification and size cap**: VLM preview generation no longer writes metadata text overlays onto preview images. Previews are now downscaled with aspect ratio preservation using a configurable maximum side length (`PREVIEW_MAX_SIDE_PX`, default `500`) to keep large images lightweight.
+- **Batched-only repository verification in agent loop**: Agent prompt and runtime tool registration now use `repo_info_batch(urls)` as the single repository verification path (including one-item lists for single repos), removing mixed single-vs-batch behavior during recommendation runs.
+
+### Removed
+- **Legacy single-repo agent adapter**: Removed the unused `repo_info` agent adapter function to avoid confusion; agent recommendation runs now verify repositories exclusively through `repo_info_batch`.
+- **Fast mode controls**: Removed fast mode from runtime, prompts, and UI settings to keep behavior deterministic and avoid dual execution paths.
+- **Repo summary performance optimization**: `repo_info` now uses an in-memory TTL cache (configurable via `REPO_INFO_CACHE_TTL_SECONDS`, default 3600s) and in-flight request deduplication for identical repository URLs. This avoids repeated DeepWiki/repocards fetches during iterative agent runs and parallel tool calls.
+- **Parallel repository verification tool**: Added `repo_info_batch(urls)` tool to fetch multiple GitHub repository summaries concurrently, reducing end-to-end latency when verifying several finalists.
+- **Latency observability**: Agent now logs per-tool durations and a request-level latency summary (`total_ms`, metadata time, model execution time, and aggregated tool timing) to make bottlenecks directly visible in runtime logs.
+- **Startup sync freshness skip**: Added optional local-freshness short-circuit in catalog sync to avoid repeated remote SPARQL fetches on quick restarts when local catalog + FAISS artifacts are present. Controlled by `SYNC_SKIP_IF_FRESH_SECONDS` (disabled by default) and `SYNC_FORCE=1` to bypass.
+- **Preview generation cache**: Added in-memory TTL cache for generated VLM previews keyed by file fingerprints (path/mtime/size), reducing repeated 3D orthogonal composite generation for identical inputs. Controlled by `PREVIEW_CACHE_TTL_SECONDS` (default 1800) and `PREVIEW_CACHE_MAX_ENTRIES` (default 64).
+- **Port fallback pre-selection**: UI launch now pre-checks for the first available port in the fallback range before calling Gradio launch, reducing repeated bind-failure retries when the preferred port is busy.
+- **Config-driven retrieval backends**: Added `retrieval.embedder` and `retrieval.reranker` blocks in `config.yaml` so embedder/reranker setup can be configured without `.env`-only wiring. Supports `backend: remote|local` with simple fields (`model_name`, `base_url`, `api_key_env`, `timeout_s`, optional `device`). Environment variables remain as fallback.
+- **Remote embedder integration**: Retrieval embeddings now call the EPFL OpenAI-compatible endpoint by default (`https://inference-rcp.epfl.ch/v1`) using model `Qwen/Qwen3-Embedding-8B` and key from `EPFL_API_KEY_EMBEDDER`.
+- **Remote reranker integration**: Retrieval reranking now calls a remote endpoint instead of loading a local CrossEncoder model. Default settings target EPFL (`https://inference-rcp.epfl.ch/v1`) with model `BAAI/bge-reranker-v2-m3`, using `EPFL_API_KEY_EMBEDDER` for authentication.
 - **Documentation cleanup**: Removed stale references to `[NO_RERANK]` and `[REFINE]` control-tag behavior from user/architecture docs, and updated structure/instruction docs to current agent layout (`agent/tools/`, `agent/utils.AgentState`), active CLI usage (`ai_agent chat`), and current testing guidance.
 - CLI now supports `ai_agent chat`
 - **DeepWiki MCP integration**: Repository info tool now uses DeepWiki MCP server (https://mcp.deepwiki.com/sse) as primary source for GitHub repository documentation. DeepWiki provides fast, pre-indexed documentation access without API rate limits.
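
Review note: the metadata cache and the preview cache in this hunk both key results on a file fingerprint. A minimal sketch of that idea follows, assuming a plain module-level dict. `file_fingerprint`, `cached_summary`, and `_CACHE` are hypothetical names, and the `summarize` callable stands in for the real `summarize_image_metadata`, whose signature is not shown in this diff.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Tuple

Fingerprint = Tuple[str, int, int]

# Hypothetical module-level cache; the real one lives in utils/image_meta.py.
_CACHE: Dict[Fingerprint, str] = {}


def file_fingerprint(path: str) -> Fingerprint:
    """Build the (resolved_path, mtime_ns, size_bytes) key named in the changelog."""
    resolved = Path(path).resolve()
    stat = os.stat(resolved)
    return (str(resolved), stat.st_mtime_ns, stat.st_size)


def cached_summary(path: str, summarize: Callable[[str], str]) -> str:
    """Return a cached summary, recomputing only when the fingerprint changes."""
    key = file_fingerprint(path)
    if key not in _CACHE:
        _CACHE[key] = summarize(path)
    return _CACHE[key]
```

Because the key includes the modification time and size, rewriting a file in place produces a new key and a fresh summary. Stale entries are never evicted in this sketch; the preview cache described in the changelog bounds that with a TTL and a maximum entry count.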
@@ -101,6 +120,10 @@ All notable changes to this project will be documented in this file.
 - CLI no more supports `ai_agent ui` command
 
 ### Fixed
+- **Startup refresh regression**: Fixed CLI background refresh unpacking after UI function signature changes (`ValueError: too many values to unpack`) and delayed first auto-refresh cycle to avoid duplicate immediate catalog sync right after startup sync.
+- **Structured output validation robustness**: Reduced `Exceeded maximum retries ... for output validation` failures by increasing agent output retries for custom endpoint runs and making ToolSelection parsing more tolerant to common formatting drift (status/reason/rank/accuracy coercion).
+- **Retrieval query drift guardrails**: Added sanitization for LLM-generated retrieval queries to strip repository-oriented terms (e.g., `github`, `repository`, `official`) and avoid tool-name-only drift. Repeated `search_tools` attempts are now rerouted as alternative searches instead of failing with quota errors.
+- **FAISS rebuild on embedder changes**: When the catalog content is unchanged but the embedding model/dimension changes, sync now detects incompatible/missing FAISS artifacts and rebuilds the index instead of keeping stale artifacts. This prevents empty retrieval results after embedder migrations.
 - **Pydantic Forward Reference**: Reordered class definitions in `schema.py` so `Conversation` and `ConversationStatus` are defined before `ToolSelection` to prevent "class-not-fully-defined" errors.
 - **Conversation Context**: Agent now properly maintains conversation history, enabling natural understanding of follow-up requests like "show me alternatives".
 - **Clear Button**: Disabled during processing to prevent race conditions with ongoing requests.
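
Review note: the query sanitization described under "Retrieval query drift guardrails" might look something like the sketch below. The changelog names only three example terms, so the real term list, the function name `sanitize_query`, and the whole-word matching rule are all assumptions made for illustration.

```python
import re

# Only these three terms are named in the changelog; the real list may differ.
REPO_TERMS = ("github", "repository", "official")

_TERM_RE = re.compile(
    r"\b(?:%s)\b" % "|".join(map(re.escape, REPO_TERMS)),
    re.IGNORECASE,
)


def sanitize_query(query: str) -> str:
    """Drop repository-oriented words and collapse leftover whitespace."""
    stripped = _TERM_RE.sub(" ", query)
    return " ".join(stripped.split())
```

Matching on word boundaries leaves domain terms that merely contain one of these strings untouched, at the cost of missing variants such as plurals.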

README.md

Lines changed: 29 additions & 0 deletions
@@ -62,13 +62,16 @@ GITHUB_TOKEN=ghp_xxxx
 
 # Optional: Alternative model providers (EPFL, etc.)
 EPFL_API_KEY=sk-xxxx
+EPFL_API_KEY_EMBEDDER=sk-xxxx
 
 # Software catalog path
 SOFTWARE_CATALOG=dataset/catalog.jsonl
 
 # Pipeline configuration
 TOP_K=8 # Number of candidates to retrieve
 NUM_CHOICES=3 # Number of tools to recommend
+AGENT_OUTPUT_RETRIES=3 # Structured output validation retries
+EMBED_CATALOG_ON_START=1 # Pre-embed catalog if FAISS is empty
 
 # Logging configuration
 LOGLEVEL_CONSOLE=WARNING
@@ -119,6 +122,29 @@ available_models:
 base_url: null
 provider: "OpenAI"
 api_key_env: "OPENAI_API_KEY"
+
+retrieval:
+  embedder:
+    backend: "remote" # "remote" or "local"
+    model_name: "Qwen/Qwen3-Embedding-8B"
+    base_url: "https://inference-rcp.epfl.ch/v1"
+    api_key_env: "EPFL_API_KEY_EMBEDDER"
+    timeout_s: 20
+    # local example:
+    # backend: "local"
+    # model_name: "BAAI/bge-m3"
+    # device: "cpu" # optional
+
+  reranker:
+    backend: "remote" # "remote" or "local"
+    model_name: "BAAI/bge-reranker-v2-m3"
+    base_url: "https://inference-rcp.epfl.ch/v1"
+    api_key_env: "EPFL_API_KEY_EMBEDDER"
+    timeout_s: 20
+    # local example:
+    # backend: "local"
+    # model_name: "BAAI/bge-reranker-v2-m3"
+    # device: "cpu" # optional
 ```
 
 ### Running the App
@@ -252,10 +278,13 @@ User Input (Image + Text Query)
 | Variable | Description | Default | Required |
 |----------|-------------|---------|----------|
 | `OPENAI_API_KEY` | OpenAI API key | - ||
+| `EPFL_API_KEY_EMBEDDER` | API key for remote embedder and reranker endpoints | - | ✅ (when `retrieval.embedder.backend: remote` and/or `retrieval.reranker.backend: remote`) |
 | `GITHUB_TOKEN` | GitHub token for repo info | - ||
 | `SOFTWARE_CATALOG` | Path to catalog JSONL | `dataset/catalog.jsonl` ||
 | `TOP_K` | Retrieval candidates count | `8` ||
 | `NUM_CHOICES` | Tools to recommend | `3` ||
+| `AGENT_OUTPUT_RETRIES` | Structured output validation retries | `3` ||
+| `EMBED_CATALOG_ON_START` | Pre-embed catalog on startup when FAISS is empty | `1` ||
 | `LOGLEVEL_CONSOLE` | Console log level | `WARNING` ||
 | `LOGLEVEL_FILE` | File log level | `INFO` ||
 | `FILE_LOG` | Enable file logging | `1` ||

config.yaml

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,4 +39,26 @@ available_models:
3939
name: "openai/gpt-oss-120b"
4040
base_url: "https://inference-rcp.epfl.ch/v1"
4141
provider: "EPFL"
42-
api_key_env: "EPFL_API_KEY"
42+
api_key_env: "EPFL_API_KEY"
43+
44+
# Retrieval stack (embedder + reranker)
45+
retrieval:
46+
embedder:
47+
backend: "remote" # "remote" or "local"
48+
model_name: "Qwen/Qwen3-Embedding-8B"
49+
base_url: "https://inference-rcp.epfl.ch/v1"
50+
api_key_env: "EPFL_API_KEY_EMBEDDER"
51+
timeout_s: 20
52+
# local example:
53+
# backend: "local"
54+
# model_name: "BAAI/bge-m3"
55+
56+
reranker:
57+
backend: "remote" # "remote" or "local"
58+
model_name: "BAAI/bge-reranker-v2-m3"
59+
base_url: "https://inference-rcp.epfl.ch/v1"
60+
api_key_env: "EPFL_API_KEY_EMBEDDER"
61+
timeout_s: 20
62+
# local example:
63+
# backend: "local"
64+
# model_name: "BAAI/bge-reranker-v2-m3"
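
Review note: one way a single `retrieval.embedder` or `retrieval.reranker` block could be turned into a validated object is sketched below. This is an illustration under stated assumptions, not the project's loader: `BackendConfig` and `resolve_backend` are hypothetical names, the defaults for a missing `backend` or `timeout_s` are guesses, and the block is passed as a plain dict so the example needs no YAML library. Raising when the key is unset follows the README table, which marks `EPFL_API_KEY_EMBEDDER` as required for remote backends.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    model_name: str
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    timeout_s: float = 20.0
    device: Optional[str] = None


def resolve_backend(block: Mapping[str, object],
                    env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Turn one embedder/reranker block into a validated config object."""
    backend = str(block.get("backend", "remote"))  # default is an assumption
    if backend not in ("remote", "local"):
        raise ValueError(f"unsupported backend: {backend!r}")
    api_key = None
    if backend == "remote":
        # The YAML stores the *name* of the variable, not the secret itself.
        key_name = str(block.get("api_key_env", ""))
        api_key = env.get(key_name)
        if not api_key:
            raise ValueError(f"environment variable {key_name!r} is not set")
    return BackendConfig(
        backend=backend,
        model_name=str(block["model_name"]),
        base_url=block.get("base_url") if backend == "remote" else None,
        api_key=api_key,
        timeout_s=float(block.get("timeout_s", 20)),
        device=block.get("device"),
    )


# Local backends need no key, matching the commented "local example" in the YAML.
local_embedder = resolve_backend(
    {"backend": "local", "model_name": "BAAI/bge-m3", "device": "cpu"}
)
```

Keeping only the variable name in `config.yaml` means the file can be committed without exposing the key itself.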
