Feature/speeding #27
Pull request overview
This PR focuses on speeding up the imaging-tool RAG agent by adding remote embedder/reranker support, multiple in-process caches (agent instances, repo summaries, previews, metadata), and improved observability/configuration so repeated requests are cheaper and easier to debug.
Changes:
- Added config-driven retrieval backends (remote/local embedder + reranker) and optional startup catalog pre-embedding.
- Introduced multiple in-memory caches (preview generation, image metadata, repo info w/ TTL + in-flight dedup, dynamic agent instance reuse) plus latency logging.
- Refactored repo verification toward batched/parallel behavior (repo_info_batch) and tightened query sanitization to reduce retrieval drift.
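As a rough illustration of the "TTL + in-flight dedup" idea mentioned above, here is a minimal sketch. The class name TTLCacheWithDedup, its methods, and the injectable clock are hypothetical and are not taken from the PR; the actual repo info cache may be structured differently.

```python
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable, Dict, Tuple


class TTLCacheWithDedup:
    """Hypothetical helper: caches values for a TTL and shares in-flight fetches."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._values: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self._inflight: Dict[str, Future] = {}           # key -> pending fetch

    def get(self, key: str, fetch: Callable[[str], Any]) -> Any:
        with self._lock:
            hit = self._values.get(key)
            if hit is not None and hit[0] > self._clock():
                return hit[1]  # fresh cache hit
            future = self._inflight.get(key)
            owner = future is None
            if owner:
                # First caller for this key becomes responsible for the fetch.
                future = Future()
                self._inflight[key] = future
        if not owner:
            # Another caller is already fetching: wait for its result.
            return future.result()
        try:
            value = fetch(key)
        except BaseException as exc:
            with self._lock:
                self._inflight.pop(key, None)
            future.set_exception(exc)  # waiters see the same failure
            raise
        with self._lock:
            self._values[key] = (self._clock() + self._ttl, value)
            self._inflight.pop(key, None)
        future.set_result(value)
        return value

    def clear(self) -> None:
        """Drop cached values (useful between tests)."""
        with self._lock:
            self._values.clear()
```

The fetch runs outside the lock so that slow lookups for one key do not block cache hits for other keys.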
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/test_reranker_remote.py | Adds unit tests for remote reranker request/response handling. |
| tests/test_previews_cache.py | Adds tests for preview cache TTL/capacity + resize behavior. |
| tests/test_pipeline_catalog_reader.py | Tests catalog reader accepting JSONL + JSON array. |
| tests/test_gpt4o_vision.py | Moves smoke-test execution under main() / __main__ guard. |
| tests/test_epfl_vision.py | Refactors EPFL vision smoke test into main() returning exit codes. |
| tests/test_embedder_remote.py | Adds unit tests for remote embeddings client behavior and dim probing. |
| tests/test_deepwiki_repo_info.py | Adds tests for repo info cache hit + in-flight dedup + clear helper. |
| src/ai_agent/utils/utils.py | Adds _env_flag() helper for robust boolean env parsing. |
| src/ai_agent/utils/previews.py | Adds preview resizing + TTL cache keyed by file fingerprints; removes annotation overlays. |
| src/ai_agent/utils/image_meta.py | Adds in-process metadata cache keyed by (path, mtime_ns, size). |
| src/ai_agent/utils/image_io.py | Removes unused variables while stacking DICOM frames. |
| src/ai_agent/utils/config.py | Adds get_retrieval_config() for retrieval stack configuration. |
| src/ai_agent/ui/handlers.py | Imports run_agent at module scope; forwards precomputed image_metadata into agent call. |
| src/ai_agent/ui/app.py | Adds ensure_logging_initialized() and pre-checks for port fallback availability. |
| src/ai_agent/retriever/vector_index.py | Switches to absolute imports for embedder/doc types. |
| src/ai_agent/retriever/utils.py | Adds device auto-resolution helper for local ST backends (cuda/mps/cpu). |
| src/ai_agent/retriever/text_embedder.py | Reworks embedder into remote OpenAI-compatible client with optional local backend. |
| src/ai_agent/retriever/reranker.py | Reworks reranker into remote OpenAI-compatible client with optional local backend. |
| src/ai_agent/generator/schema.py | Adds validators/coercions for status/rank/accuracy/reason robustness. |
| src/ai_agent/generator/prompts.py | Updates agent tooling rules prompt to use repo_info_batch and constrain tool usage. |
| src/ai_agent/cli.py | Fixes refresh loop timing/unpacking; ensures logging init in run_chat(). |
| src/ai_agent/catalog/sync.py | Adds startup “skip sync if fresh” and robust FAISS rebuild-on-incompatibility behavior. |
| src/ai_agent/api/pipeline.py | Adds retrieval stack config loading + optional startup catalog embedding when index empty. |
| src/ai_agent/agent/tools/utils.py | Cleans typing import; (context) provides lightweight catalog loading helper used by tools. |
| src/ai_agent/agent/tools/search_tool.py | Adds retrieval query sanitization using catalog tool names. |
| src/ai_agent/agent/tools/search_alternative_tool.py | Adds retrieval query sanitization for alternative searches. |
| src/ai_agent/agent/tools/repo_info_tool.py | Adds TTL cache + in-flight dedup for repo summaries. |
| src/ai_agent/agent/tools/query_utils.py | Adds sanitize_retrieval_query() to remove repo-drift/low-signal tokens. |
| src/ai_agent/agent/tools/mcp/__init__.py | Uses import_module for MCP tool registration. |
| src/ai_agent/agent/tools/deepwiki_tool.py | Makes DeepWiki timeout configurable via env. |
| src/ai_agent/agent/tools/__init__.py | Uses import_module for tool registration imports. |
| src/ai_agent/agent/agent.py | Adds dynamic Agent instance caching, tool timing logs, and repo_info_batch. |
| README.md | Documents new env vars and retrieval config block example. |
| config.yaml | Adds retrieval.embedder / retrieval.reranker configuration blocks. |
| CHANGELOG.md | Documents new performance/config changes (but headings need cleanup). |
| .env.dist | Adds new env vars (EPFL_API_KEY_EMBEDDER, AGENT_OUTPUT_RETRIES, EMBED_CATALOG_ON_START). |
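The table mentions a metadata cache keyed by (path, mtime_ns, size). A minimal sketch of that keying scheme is shown below. The names file_fingerprint, cached_metadata, and _META_CACHE are hypothetical, and the sketch has no eviction or TTL, which the real cache may add.

```python
import os
from typing import Callable, Dict, Tuple

# (absolute path, modification time in ns, size in bytes)
Fingerprint = Tuple[str, int, int]

# Hypothetical module-level cache; unbounded in this sketch.
_META_CACHE: Dict[Fingerprint, dict] = {}


def file_fingerprint(path: str) -> Fingerprint:
    """Build a cache key that changes whenever the file is rewritten."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_mtime_ns, st.st_size)


def cached_metadata(path: str, compute: Callable[[str], dict]) -> dict:
    """Return cached metadata, recomputing only when the fingerprint changes."""
    key = file_fingerprint(path)
    if key not in _META_CACHE:
        _META_CACHE[key] = compute(path)
    return _META_CACHE[key]
```

Because the key includes mtime_ns and size, an edited file naturally misses the cache without any explicit invalidation call. Stale entries for old fingerprints stay in memory until cleared.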
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR improves the imaging-agent stack’s performance and configurability by moving retrieval to configurable local/remote backends, adding multiple in-process caches (previews, image metadata, repo summaries, agent instances), and updating the agent prompt/tooling to use batched repository verification.
Changes:
- Add config-driven remote/local embedder + reranker support (EPFL OpenAI-compatible endpoints) and optional startup catalog pre-embedding.
- Introduce caching for VLM previews, image metadata, repo summaries (TTL + in-flight dedup), and dynamic Agent instances; add latency logging for observability.
- Replace single-repo verification with repo_info_batch, plus related prompt/docs/test updates.
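To make the batched verification concrete, here is a minimal sketch of fetching several repository summaries in parallel. The function name repo_info_batch_sketch, the fetch_one callback, and the error-string convention are assumptions for illustration; the PR's repo_info_batch tool may differ in signature and error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def repo_info_batch_sketch(
    repos: Iterable[str],
    fetch_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Fetch summaries for several repos concurrently (hypothetical helper)."""
    unique = list(dict.fromkeys(repos))  # drop duplicates, keep first-seen order
    if not unique:
        return {}
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(unique))) as pool:
        futures = {name: pool.submit(fetch_one, name) for name in unique}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing repo should not sink the whole batch.
                results[name] = f"error: {exc}"
    return results
```

Single-repo verification then becomes a batch of one, which keeps a single code path.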
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_reranker_remote.py | Unit tests for remote reranker request/response handling. |
| tests/test_previews_cache.py | Unit tests for preview cache behavior (TTL, capacity, missing files). |
| tests/test_pipeline_catalog_reader.py | Tests for catalog reader supporting JSONL + JSON-array formats. |
| tests/test_gpt4o_vision.py | Refactors smoke-test script to avoid import-time side effects. |
| tests/test_epfl_vision.py | Refactors EPFL vision smoke-test into a main() entrypoint. |
| tests/test_embedder_remote.py | Unit tests for remote embedder request/response handling and dim probing. |
| tests/test_deepwiki_repo_info.py | Adds tests for repo info TTL cache and in-flight dedup helpers. |
| src/ai_agent/utils/utils.py | Adds _env_flag helper for robust boolean env parsing. |
| src/ai_agent/utils/previews.py | Adds preview resizing + in-memory preview cache keyed by file fingerprints. |
| src/ai_agent/utils/image_meta.py | Adds in-process metadata caching keyed by (path, mtime_ns, size). |
| src/ai_agent/utils/image_io.py | Removes unused variables in DICOM series stacking path. |
| src/ai_agent/utils/config.py | Adds get_retrieval_config() to expose retrieval settings from config.yaml. |
| src/ai_agent/ui/handlers.py | Passes precomputed image_metadata into run_agent to avoid duplicate work. |
| src/ai_agent/ui/app.py | Adds ensure_logging_initialized() and improves port fallback selection. |
| src/ai_agent/retriever/vector_index.py | Switches to absolute imports for embedder/doc types. |
| src/ai_agent/retriever/utils.py | Adds _resolve_local_device() helper (cuda/mps/cpu selection). |
| src/ai_agent/retriever/text_embedder.py | Implements remote embeddings client + optional local backend under legacy class name. |
| src/ai_agent/retriever/reranker.py | Implements remote reranker client + optional local backend and payload parsing. |
| src/ai_agent/generator/schema.py | Adds coercion validators to tolerate structured-output drift (status/rank/accuracy/reason). |
| src/ai_agent/generator/prompts.py | Updates agent tool rules to use repo_info_batch and one-search-per-run guidance. |
| src/ai_agent/cli.py | Fixes UI function unpacking, ensures logging early, delays first background refresh. |
| src/ai_agent/catalog/sync.py | Adds “skip if fresh” support and rebuilds FAISS when artifacts are incompatible. |
| src/ai_agent/api/pipeline.py | Loads retrieval backends from config; adds startup catalog embedding and catalog reader. |
| src/ai_agent/agent/tools/utils.py | Caches catalog tool names for query sanitization support. |
| src/ai_agent/agent/tools/search_tool.py | Sanitizes retrieval queries using known tool names before retrieval. |
| src/ai_agent/agent/tools/search_alternative_tool.py | Sanitizes alternative queries similarly to reduce repo/tool-name drift. |
| src/ai_agent/agent/tools/repo_info_tool.py | Adds repo-summary TTL cache and in-flight deduplication. |
| src/ai_agent/agent/tools/query_utils.py | Adds sanitize_retrieval_query() to strip repo-drift and low-signal terms. |
| src/ai_agent/agent/tools/mcp/__init__.py | Switches MCP tool registration to dynamic import. |
| src/ai_agent/agent/tools/deepwiki_tool.py | Makes DeepWiki timeout configurable via env var. |
| src/ai_agent/agent/tools/__init__.py | Switches tool registration to dynamic imports. |
| src/ai_agent/agent/agent.py | Adds agent instance caching, latency logging, and repo_info_batch tool adapter. |
| README.md | Documents new env vars and retrieval configuration blocks. |
| config.yaml | Adds retrieval.embedder / retrieval.reranker blocks for config-driven retrieval. |
| CHANGELOG.md | Documents caching, retrieval backend changes, and startup embedding behavior. |
| .env.dist | Adds new env var placeholders and knobs (embedder key, output retries, startup embed). |
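The catalog reader described in the table accepts both JSONL and JSON-array input. A minimal sketch of that dual-format parsing follows. The function name read_catalog_text is hypothetical, and detecting the array form by a leading "[" is an assumption of this sketch rather than the PR's confirmed approach.

```python
import json
from typing import Any, Dict, List


def read_catalog_text(text: str) -> List[Dict[str, Any]]:
    """Parse catalog records from either a JSON array or JSONL (hypothetical helper)."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        # Whole document is one JSON array of records.
        data = json.loads(stripped)
        return [item for item in data if isinstance(item, dict)]
    # Otherwise treat each non-empty line as one JSON object.
    records: List[Dict[str, Any]] = []
    for line in stripped.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records
```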
This pull request introduces significant improvements to the agent's retrieval, caching, and configuration systems, with a focus on performance, observability, and flexibility. The most notable changes include the addition of a pre-embedding step for the software catalog, remote embedding/reranking support, dynamic agent instance caching, and a new parallel repository verification tool. Configuration files and documentation have been updated to reflect these enhancements.
Retrieval and Catalog Embedding Enhancements
- Added an optional pre-embedding step for the software catalog, controlled by the EMBED_CATALOG_ON_START environment variable. This reduces latency on user requests by ensuring embeddings are ready at startup.
- Added remote embedding/reranking support, with the embedder API key supplied through the EPFL_API_KEY_EMBEDDER variable. Retrieval stack settings can now be specified in config.yaml under the retrieval block.
Agent and Tool Performance Improvements
- Added dynamic caching of Agent instances keyed by model and endpoint configuration, reducing overhead for repeated requests with the same settings.
Repository Verification Refactor
- Replaced the repo_info tool with a new repo_info_batch tool that fetches multiple repository summaries in parallel, improving efficiency and simplifying code paths. Single-repo verification now also uses the batch tool for consistency.
Configuration and Documentation Updates
- Updated .env.dist, README.md, and CHANGELOG.md to document new environment variables, configuration options, and the retrieval stack structure.
Bug Fixes and Robustness
These changes collectively make the agent more robust, configurable, and performant, especially in environments using remote retrieval components or handling high request volumes.
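For readers unfamiliar with instance caching keyed by configuration, the sketch below shows one way it could look. AgentKey, AgentCache, and the factory callback are hypothetical names chosen for illustration; the fields that make up the real cache key are not specified in this description beyond "model and endpoint configuration".

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class AgentKey:
    """Hypothetical cache key: frozen so it is hashable."""
    model: str
    base_url: str


class AgentCache:
    """Reuses one agent object per distinct (model, endpoint) configuration."""

    def __init__(self, factory: Callable[[AgentKey], Any]):
        self._factory = factory
        self._lock = threading.Lock()
        self._agents: Dict[AgentKey, Any] = {}

    def get(self, key: AgentKey) -> Any:
        with self._lock:
            agent = self._agents.get(key)
            if agent is None:
                # Build once; later requests with the same settings reuse it.
                agent = self._factory(key)
                self._agents[key] = agent
            return agent
```

Holding the lock during construction keeps the sketch simple and guarantees one instance per key, at the cost of serializing first-time builds.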