Commit edf6c0a

update to the documentation
1 parent 47fd106 commit edf6c0a

10 files changed

Lines changed: 677 additions & 471 deletions

.github/workflows/deploy_docs.yml

Lines changed: 2 additions & 1 deletion
@@ -4,6 +4,7 @@ on:
   push:
     branches:
       - main
+      - feature/docs
 
 permissions:
   contents: read
@@ -46,7 +47,7 @@ jobs:
           path: ./site
 
   deploy:
-    if: github.ref == 'refs/heads/main'
+    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/feature/docs'
     environment:
       name: github-pages
       url: ${{ steps.deployment.outputs.page_url }}

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -2,6 +2,20 @@
 
 All notable changes to this project will be documented in this file.
 
+## [Unreleased]
+
+### Changed
+
+- **Documentation sync**: Updated all documentation files to match current implementation exactly
+- `docs/architecture/retrieval.md`: Fixed embedder model (Qwen3-Embedding-8B via remote EPFL endpoint, not local BGE-M3); fixed reranker (BAAI/bge-reranker-v2-m3 remote)
+- `docs/architecture/agent.md`: Fixed default model (`openai/gpt-oss-120b` EPFL, not `gpt-4o-mini`); fixed tool names (`repo_info_batch`, not `repo_info`); updated tool caps and agent setup code
+- `docs/architecture/catalog.md`: Added full GraphDB SPARQL sync flow, mermaid diagram, all required environment variables
+- `docs/reference/environment.md`: Complete rewrite — comprehensive table of 27+ env vars including EPFL keys, GRAPHDB_* vars, SYNC_* vars, EMBED_CATALOG_ON_START, AGENT_CACHE_MAX, AGENT_OUTPUT_RETRIES, IMAGE_META_CACHE_MAX, DEBUG
+- `docs/getting-started/configuration.md`: Fixed default model and config.yaml example to match actual defaults; added EPFL API key docs and GraphDB sync vars; added local retrieval instructions
+- `docs/development/structure.md`: Added `core/` module, `queries/` directory, updated `agent/tools/` listing with all tool files and mcp/ subdir; fixed generator schema models; improved retriever pipeline description
+- `docs/reference/cli.md`: Updated `ai_agent sync` to describe actual SPARQL/GraphDB mechanism and required env vars
+- `docs/index.md`: Updated retrieval stack description to Qwen3-Embedding-8B + BGE-M3
+
 ## [1.0.0]
 
 ### 🚀 Major Features

docs/architecture/agent.md

Lines changed: 70 additions & 83 deletions
@@ -45,21 +45,28 @@ graph TB
 
 ### Agent Definition
 
+The agent model is determined by `config.yaml` (`agent_model` section). When a `base_url` is provided, `OpenAIChatModel` (chat/completions API) is used; otherwise `OpenAIResponsesModel` (Responses API) is used:
+
 ```python
 from pydantic_ai import Agent
-from pydantic_ai.models.openai import OpenAIResponsesModel
+from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIChatModel
 from pydantic_ai.providers.openai import OpenAIProvider
 from ai_agent.generator.prompts import get_agent_system_prompt
-from ai_agent.generator.schema import ToolSelection
 from ai_agent.agent.utils import AgentState
 
-provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
-openai_model = OpenAIResponsesModel(model_name="gpt-4o-mini", provider=provider)
+# Custom endpoint (e.g. EPFL) → OpenAIChatModel
+provider = OpenAIProvider(base_url="https://inference-rcp.epfl.ch/v1", api_key=api_key)
+openai_model = OpenAIChatModel(model_name="openai/gpt-oss-120b", provider=provider)
+
+# Default OpenAI endpoint → OpenAIResponsesModel
+# provider = OpenAIProvider(api_key=api_key)
+# openai_model = OpenAIResponsesModel(model_name="gpt-4o-mini", provider=provider)
 
 agent = Agent(
     model=openai_model,
     system_prompt=get_agent_system_prompt(num_choices=3),
     deps_type=AgentState,
+    output_retries=3,  # configurable via AGENT_OUTPUT_RETRIES
 )
 ```
 
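The endpoint rule described in this hunk (a `base_url` selects the chat/completions model, no `base_url` selects the Responses model) can be sketched as a small pure function. This is only an illustration under stated assumptions: the config keys `name` and `base_url`, the `ModelChoice` type, and the `choose_model` helper are hypothetical, and the sketch returns the class name as a string rather than importing `pydantic_ai`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelChoice:
    """Hypothetical result type: which model class to build and with what."""
    model_class: str          # name of the pydantic_ai class to instantiate
    model_name: str
    base_url: Optional[str]


def choose_model(agent_model_cfg: dict) -> ModelChoice:
    """Pick the model class from an `agent_model`-style config mapping.

    The keys "name" and "base_url" are assumptions made for this sketch.
    """
    # Treat a missing, empty, or whitespace-only base_url as "not provided".
    base_url = (agent_model_cfg.get("base_url") or "").strip() or None
    model_name = agent_model_cfg["name"]
    if base_url is not None:
        # Custom endpoint (e.g. EPFL): chat/completions API
        return ModelChoice("OpenAIChatModel", model_name, base_url)
    # Default OpenAI endpoint: Responses API
    return ModelChoice("OpenAIResponsesModel", model_name, None)
```

A caller would then construct the provider and the chosen model class from the returned fields; that step is left out so the sketch stays independent of the installed `pydantic_ai` version.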
@@ -68,7 +75,11 @@ agent = Agent(
 - `model`: VLM model to use (configurable via `config.yaml`)
 - `system_prompt`: Agent role, scoring rules, and output format (from `generator/prompts.py`)
 - `deps_type`: `AgentState` — tracks tool calls, quotas, and session overrides
-- `output_type`: `ToolSelection` — passed to `agent.run_sync()` to enforce structured JSON output (not set on the `Agent` constructor)
+- `output_retries`: Number of times the agent retries if output validation fails (env `AGENT_OUTPUT_RETRIES`, default `3`)
+- `output_type`: `ToolSelection` — passed to `agent.run_sync()` to enforce structured JSON output
+
+!!! note "Agent cache"
+    A bounded LRU cache of agent instances (max size `AGENT_CACHE_MAX`, default 16) avoids rebuilding the provider/model objects on every request when the same custom endpoint + model combination is used repeatedly.
 
 ### Conversation State
 
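The "Agent cache" note added in this hunk describes a bounded LRU cache keyed by endpoint and model. A minimal sketch of that idea follows. The `BoundedLRU` class and its `get_or_create` method are hypothetical names, not the project's implementation; only the `AGENT_CACHE_MAX` variable and its default of 16 come from the diff.

```python
import os
from collections import OrderedDict
from typing import Any, Callable, Hashable

# Default of 16 taken from the note above; the parsing is an assumption.
AGENT_CACHE_MAX = int(os.getenv("AGENT_CACHE_MAX", "16"))


class BoundedLRU:
    """Least-recently-used cache that evicts the oldest entry when full."""

    def __init__(self, max_size: int = AGENT_CACHE_MAX) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        if key in self._items:
            # Cache hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = factory()  # e.g. build provider + model + Agent
        self._items[key] = value
        if len(self._items) > self._max_size:
            # Drop the least recently used entry (front of the ordering).
            self._items.popitem(last=False)
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)
```

Keying on a tuple such as `(base_url, model_name)` would match the "same custom endpoint + model combination" wording in the note; the exact key used by the project is not shown in the diff.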
@@ -98,111 +109,87 @@ class AgentState(BaseModel):
 
 ## Agent Tools
 
-Tools extend agent capabilities beyond chat:
+The agent has three registered tools, each with a hard per-run call limit enforced by the `@limit_tool_calls` decorator:
+
+| Tool | Cap | Description |
+|------|-----|-------------|
+| `search_tools` | 1 | Initial semantic search — called exactly once per run |
+| `search_alternative` | 3 | Alternative query formulation for broader/different search |
+| `repo_info_batch` | 4 | Batch GitHub repository summary lookup |
+
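The per-run caps in the table above are attributed to a `@limit_tool_calls` decorator whose body the diff does not show. One plausible shape is sketched here, assuming the call counts live in a dict on the run's dependencies and that exceeding the cap raises an error. `FakeState`, `FakeContext`, and `ToolCapExceeded` are hypothetical stand-ins, and the real decorator may instead return a message to the model.

```python
import asyncio
import functools
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict


class ToolCapExceeded(RuntimeError):
    """Hypothetical error raised when a tool is called past its cap."""


@dataclass
class FakeState:
    # Stand-in for AgentState: per-run call counters keyed by tool name.
    calls: Dict[str, int] = field(default_factory=dict)


@dataclass
class FakeContext:
    # Stand-in for RunContext[AgentState]; only `.deps` is used here.
    deps: FakeState


def limit_tool_calls(name: str, cap: int) -> Callable:
    """Allow at most `cap` calls of the wrapped async tool per run state."""

    def decorator(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(fn)
        async def wrapper(ctx: FakeContext, *args: Any, **kwargs: Any) -> Any:
            used = ctx.deps.calls.get(name, 0)
            if used >= cap:
                raise ToolCapExceeded(f"{name} already called {used}/{cap} times")
            ctx.deps.calls[name] = used + 1
            return await fn(ctx, *args, **kwargs)

        return wrapper

    return decorator


@limit_tool_calls("search_tools", cap=1)
async def search_tools(ctx: FakeContext, query: str) -> list:
    # Placeholder body; the real tool performs retrieval and reranking.
    return [{"query": query}]
```

Because the counter is stored on the per-run state rather than on the function, a fresh state object resets every cap for the next run.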
+### search_tools
+
+Initial tool retrieval — called **exactly once** per agent run:
+
+```python
+@agent.tool(retries=2, prepare=cap_prepare)
+@limit_tool_calls("search_tools", cap=1)
+async def search_tools(
+    ctx: RunContext[AgentState],
+    query: str,
+    excluded: List[str] | None = None,
+    top_k: int = 12,
+) -> List[dict]:
+    """Initial semantic search with automatic reranking."""
+    ...
+```
+
+Automatically injects `excluded_tools`, `image_paths`, and `original_formats` from `AgentState`, so the LLM does not reason about file paths.
 
 ### search_alternative
 
-Request alternative search with different query formulation:
+Try a different query formulation (up to 3 times per run):
 
 ```python
 @agent.tool(retries=2, prepare=cap_prepare)
 @limit_tool_calls("search_alternative", cap=3)
 async def search_alternative(
-    ctx: RunContext[AgentState],
+    ctx: RunContext[AgentState],
     alternative_query: str,
     excluded: List[str] | None = None,
     top_k: int = 12,
 ) -> List[dict]:
-    """Search for tools using an alternative query formulation."""
-
-    inp = SearchAlternativeInput(
-        alternative_query=alternative_query,
-        excluded=excluded or [],
-        top_k=top_k,
-        original_formats=ctx.deps.original_formats,
-        image_paths=ctx.deps.image_paths,
-    )
-    out = tool_search_alternative(inp)
-    return [c.model_dump(mode="python") for c in out.candidates]
+    """Search with an alternative query formulation (includes automatic reranking)."""
+    ...
 ```
 
-**Usage**:
-
-- Agent invokes when user asks for alternatives
-- Up to 3 calls per conversation
-- Formulates semantically different queries
-
 **Example**:
 ```
 User: Show me alternatives
-Agent: [Calls search_alternative with "pulmonary segmentation CT"]
+Agent: [Calls search_alternative with "pulmonary airway segmentation CT volume"]
 ```
 
-### repo_info
+### repo_info_batch
 
-Fetch GitHub repository details:
+Fetch GitHub repository summaries for **multiple repositories in parallel** (up to 4 calls per run):
 
 ```python
 @agent.tool(retries=2, prepare=cap_prepare)
-@limit_tool_calls("repo_info", cap=12)
-async def repo_info(ctx: RunContext[AgentState], url: str, tool_name: str | None = None) -> dict:
-    """Fetch a short summary of a GitHub repository."""
-
-    # Normalize to canonical GitHub URL
-    norm_url = coerce_github_url_or_none(url)
-
-    # Call tool_repo_summary (tries DeepWiki MCP first, falls back to repocards)
-    out = await tool_repo_summary(RepoSummaryInput(url=norm_url, tool_name=tool_name))
-    return out.model_dump(mode="python")
+@limit_tool_calls("repo_info_batch", cap=4)
+async def repo_info_batch(
+    ctx: RunContext[AgentState],
+    urls: List[str],
+) -> List[dict]:
+    """Fetch repository summaries for multiple repositories in parallel."""
+    ...
 ```
 
-**Data sources**:
-
-1. **DeepWiki MCP**: Pre-indexed, fast, no rate limits
-2. **Repocards**: Direct fetch, fallback for new repos
+- Accepts a list of GitHub URLs; non-GitHub URLs are skipped with a `NON_GITHUB_URL` reason
+- Deduplicates URLs before fetching
+- Uses `asyncio.gather()` for parallel fetching
+- Falls back gracefully if a single repo fetch fails
 
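The four behaviours listed for `repo_info_batch` (skip non-GitHub URLs, deduplicate, fetch in parallel, tolerate single failures) can be sketched as follows. This is an illustration, not the project's code: `fetch_summary` is a hypothetical stand-in that performs no network access, the result dictionaries and the `FETCH_FAILED` reason are invented for the example, and only `NON_GITHUB_URL` and the use of `asyncio.gather()` come from the diff.

```python
import asyncio
from typing import Dict, List
from urllib.parse import urlparse


def is_github_url(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host in {"github.com", "www.github.com"}


async def fetch_summary(url: str) -> Dict:
    # Hypothetical stand-in for the real DeepWiki/Repocards lookup.
    if url.endswith("/broken"):
        raise RuntimeError("simulated fetch failure")
    return {"url": url, "ok": True}


async def repo_info_batch(urls: List[str]) -> List[Dict]:
    unique = list(dict.fromkeys(urls))  # deduplicate, keep first-seen order
    github = [u for u in unique if is_github_url(u)]
    # return_exceptions=True keeps one failing repo from cancelling the rest.
    fetched = await asyncio.gather(
        *(fetch_summary(u) for u in github), return_exceptions=True
    )
    by_url = dict(zip(github, fetched))

    results: List[Dict] = []
    for url in unique:
        if url not in by_url:
            results.append({"url": url, "ok": False, "reason": "NON_GITHUB_URL"})
        elif isinstance(by_url[url], BaseException):
            # "FETCH_FAILED" is a made-up reason code for this sketch.
            results.append({"url": url, "ok": False, "reason": "FETCH_FAILED"})
        else:
            results.append(by_url[url])
    return results
```

Returning one entry per unique input URL, in input order, lets the caller line results up with the candidates it asked about even when some lookups were skipped or failed.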
-**Returns**:
+**Data sources** (tried in order):
 
-- Repository description
-- Stars, language, topics
-- Last update date
-- License information
+1. **DeepWiki MCP**: Pre-indexed repository documentation — fast, no rate limits
+2. **Repocards**: Direct library-based fetch — fallback for repos not yet in DeepWiki
 
 **Example**:
 ```
-User: Tell me about TotalSegmentator
-Agent: [Calls repo_info("https://github.com/wasserth/TotalSegmentator")]
-
-TotalSegmentator is an automated multi-organ segmentation tool...
-⭐ 1.2k stars | Python | Apache-2.0 license
-Topics: segmentation, medical-imaging, deep-learning
+Agent: [Calls repo_info_batch(["https://github.com/wasserth/TotalSegmentator",
+                               "https://github.com/MIC-DKFZ/nnUNet"])]
 ```
 
-### run_example
-
-Execute Gradio Space demos (optional, experimental):
-
-```python
-@agent.tool(retries=0, prepare=cap_prepare)
-@limit_tool_calls("run_example", cap=1)
-async def run_example(
-    ctx: RunContext[AgentState],
-    tool_name: str,
-    endpoint_url: str | None = None,
-    extra_text: str | None = None,
-) -> dict:
-    """Run an example / demo for a given tool via its Gradio space."""
-
-    out = tool_run_example(RunExampleInput(
-        tool_name=tool_name,
-        endpoint_url=endpoint_url,
-        extra_text=extra_text,
-    ))
-    return out.model_dump(mode="python")
-```
-
-**Status**: Partially implemented, limited to specific demo formats.
-
 ## Selection and Ranking
 
 The PydanticAI agent performs tool selection and ranking directly as part of its LLM reasoning step. There is no separate `VLMToolSelector` class — the agent's system prompt (defined in `generator/prompts.py`) encodes the scoring rules, and the `ToolSelection` Pydantic schema (defined in `generator/schema.py`) enforces structured output.
@@ -212,7 +199,7 @@ The PydanticAI agent performs tool selection and ranking directly as part of its
 The agent system prompt is assembled by `get_agent_system_prompt()` in `generator/prompts.py` and covers:
 
 - **Image analysis**: Instructions to analyze the attached preview image and reference visual observations in explanations
-- **Tool call sequence**: When to call `search_tools`, `search_alternative`, `repo_info`, and `run_example`
+- **Tool call sequence**: When to call `search_tools`, `search_alternative`, `repo_info_batch`
 - **Scoring rules**: Accuracy (0–100) = Task match (40) + Format compatibility (30) + Features (30)
 - **Output format**: Single JSON object matching the `ToolSelection` schema
 
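The scoring rule quoted in this hunk is a plain sum of three capped components. In the project the rubric lives in the system prompt and is applied by the model, so the sketch below only illustrates the arithmetic. The `ScoreParts` type and its field names are hypothetical.

```python
from dataclasses import dataclass

# Component maxima from the documented rubric: 40 + 30 + 30 = 100.
MAX_TASK_MATCH = 40
MAX_FORMAT_COMPATIBILITY = 30
MAX_FEATURES = 30


@dataclass(frozen=True)
class ScoreParts:
    task_match: int
    format_compatibility: int
    features: int

    def accuracy(self) -> int:
        """Return the 0-100 accuracy score, rejecting out-of-range parts."""
        checks = (
            ("task_match", self.task_match, MAX_TASK_MATCH),
            ("format_compatibility", self.format_compatibility, MAX_FORMAT_COMPATIBILITY),
            ("features", self.features, MAX_FEATURES),
        )
        for label, value, limit in checks:
            if not 0 <= value <= limit:
                raise ValueError(f"{label} must be between 0 and {limit}, got {value}")
        return self.task_match + self.format_compatibility + self.features
```

For instance, a tool that matches the task well (35) and accepts the input format (30) but offers few extra features (10) would score 75 under this rubric.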
@@ -227,7 +214,7 @@ system_prompt = get_agent_system_prompt(num_choices=3)
 
 #### Step 1: Tool Calls (Retrieval)
 
-The agent calls `search_tools` once (and optionally `search_alternative` up to 3 times) to retrieve candidate tools from the vector index:
+The agent calls `search_tools` **exactly once** (and optionally `search_alternative` up to 3 times) to retrieve candidate tools from the vector index:
 
 ```
 Agent → search_tools(query="segment lungs", top_k=12)
@@ -236,11 +223,11 @@ Agent → search_tools(query="segment lungs", top_k=12)
 
 #### Step 2: Verification
 
-For each finalist the agent plans to recommend, it calls `repo_info` to fetch up-to-date GitHub metadata:
+For finalists the agent plans to recommend, it calls `repo_info_batch` in a **single batch call**:
 
 ```
-Agent → repo_info(url="https://github.com/wasserth/TotalSegmentator")
-← {stars: 1200, language: "Python", topics: [...], description: "..."}
+Agent → repo_info_batch(urls=["https://github.com/wasserth/TotalSegmentator", ...])
+[{stars: 1200, language: "Python", topics: [...], description: "..."}, ...]
 ```
 
 #### Step 3: Structured Output
