Commit 2c6e8b1

feat(v2 runtime): V2_LLM_MODEL_OVERRIDE to swap the model across all profiles
When the default `openai/gpt-oss-120b` on the shared EPFL inference endpoint goes degraded (200 OK + empty body, observed 2026-05-19/20), operators previously had to edit `src/v1/llm/model_config.py` to repoint all 18 analysis profiles at a working model and ship a patch. Add a single env override read in `load_model_config()` that rewrites the `model` field on every loaded profile when set, leaving the JSON per-profile env overrides (`LLM_ANALYSIS_MODELS`, etc.) untouched. Documented in `.env.example` with concrete fallback candidates that were verified working on the endpoint at degradation time (`Qwen/Qwen3-30B-A3B-Instruct-2507`, `mistralai/Ministral-3-8B-Instruct-2512`).
1 parent 2ff9b6f commit 2c6e8b1
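
The degraded state described above (200 OK with an empty body) is easy to miss when checking status codes alone. The following is a minimal sketch of how an operator might pick a value for `V2_LLM_MODEL_OVERRIDE`; it is not part of this commit. The `probe` callable and both function names are hypothetical, and the caller is assumed to supply whatever HTTP client actually queries the endpoint.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical probe signature: model name -> (HTTP status code, response body).
Probe = Callable[[str], Tuple[int, str]]


def is_degraded(status: int, body: str) -> bool:
    """Treat any non-200 status, or a 200 with an empty body, as degraded."""
    return status != 200 or not body.strip()


def first_healthy(models: Iterable[str], probe: Probe) -> Optional[str]:
    """Return the first model whose probe response is not degraded, else None."""
    for model in models:
        status, body = probe(model)
        if not is_degraded(status, body):
            return model
    return None
```

The result would then be exported as `V2_LLM_MODEL_OVERRIDE` before starting the service; the commit itself leaves that choice to the operator.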

2 files changed

Lines changed: 97 additions & 2 deletions

.env.example

Lines changed: 87 additions & 0 deletions
@@ -75,6 +75,38 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # `excluded_entities` with reason "critic pruning".
 # V2_APPLY_CRITIC_PRUNING=false

+# Maximum number of LLM agents that may run concurrently inside any fan-out
+# stage (person/org/article/membership/contribution). Higher values speed up
+# heavy repos (renku-class, ~80 contributors) but push more parallel load
+# onto the model endpoint. Default: 6.
+# V2_MAX_CONCURRENT_AGENTS=6
+
+# Per-link veracity stage (Selenium fetch + LLM verdict on every external
+# URL the pipeline collects). Off by default — it is the slowest stage and
+# rarely changes outputs; enable only when auditing link rot or veracity.
+# V2_LINK_VERACITY_ENABLED=false
+
+# Hybrid refiner stage: after the rule-based pipeline runs, the LLM agent
+# pool refines/repairs each entity (canonical IDs, missing fields). Default
+# `true` for `agent_runtime=llm` and ignored for `rule_based`.
+# V2_HYBRID_REFINER_ENABLED=true
+
+# Top-N bookend contributors to materialise in context_gather (first/last
+# committers by date). Larger values widen the scout brief but inflate the
+# token bill. Default: 50.
+# V2_CONTRIBUTOR_BOOKENDS_TOP_N=50
+
+# Override the LLM model used by every v2 agent profile (rewrites the
+# `model` field across every `MODEL_CONFIGS[*]` entry at load time).
+# Useful when the default model on the shared inference endpoint is
+# degraded and you want a temporary fallback without editing
+# `src/v1/llm/model_config.py`. Unset / empty = use the defaults
+# (currently `openai/gpt-oss-120b`).
+# Examples for inference-rcp.epfl.ch when gpt-oss is down:
+# V2_LLM_MODEL_OVERRIDE=Qwen/Qwen3-30B-A3B-Instruct-2507
+# V2_LLM_MODEL_OVERRIDE=mistralai/Ministral-3-8B-Instruct-2512
+# V2_LLM_MODEL_OVERRIDE=
+
 # Scout-mode upgrade for the `context_summary` LLM stage. When `true`,
 # the summary agent gets the broad RAG-search toolkit (orcid, ror,
 # infoscience, openalex, zenodo, ethz_research_collection, huggingface,
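
The `V2_MAX_CONCURRENT_AGENTS` comment in this hunk describes a cap on parallel agents per fan-out stage. A minimal sketch of how such a cap could be enforced with `asyncio.Semaphore` follows. The helper names `max_concurrent_agents` and `run_fan_out` are hypothetical, and the fallback to 6 on unparsable input is an assumption; the real pipeline code is not shown in this commit.

```python
import asyncio
import os
from typing import Awaitable, Callable, List


def max_concurrent_agents(default: int = 6) -> int:
    """Read V2_MAX_CONCURRENT_AGENTS; fall back to the default on empty or bad input."""
    raw = os.getenv("V2_MAX_CONCURRENT_AGENTS", "").strip()
    try:
        value = int(raw) if raw else default
    except ValueError:
        value = default
    return max(1, value)  # never allow a limit below one agent


async def run_fan_out(jobs: List[Callable[[], Awaitable[str]]]) -> List[str]:
    """Run every job, with at most `max_concurrent_agents()` in flight at once."""
    limit = asyncio.Semaphore(max_concurrent_agents())

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        async with limit:
            return await job()

    # gather preserves the input order of results.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

A higher limit shortens wall-clock time for large repositories at the cost of more simultaneous requests to the model endpoint, which is the trade-off the comment describes.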
@@ -121,6 +153,12 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # `logs/v2_queries/` (relative to the server's working directory).
 # V2_QUERY_LOG_DIR=logs/v2_queries

+# Log level for the v2 skill subprocesses (selenium_fetch and the other
+# search_* skills under src/v2/skills/). Independent from LOG_LEVEL so the
+# noisy skill output can stay quiet while the main FastAPI app runs at INFO.
+# Default: WARNING.
+# V2_SKILL_LOG_LEVEL=WARNING
+
 # ---------------------------------------------------------------------------
 # src/index/* (Infoscience + OpenAlex RAG indexing)
 # ---------------------------------------------------------------------------
@@ -134,6 +172,10 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # `config/index/infoscience.yaml`. RCP and Infoscience auth tokens are
 # read from `RCP_TOKEN` / `INFOSCIENCE_TOKEN` above.

+# Set to `false` to disable the V2 Infoscience RAG agent tool. Default is
+# on; the tool degrades gracefully when the Qdrant collection is missing.
+# V2_INFOSCIENCE_RAG_ENABLED=true
+
 # ---------------------------------------------------------------------------
 # src/index/openalex — OpenAlex ingestion + RAG over EPFL/Switzerland
 # ---------------------------------------------------------------------------
@@ -157,6 +199,10 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # INDEX_OPENALEX_SCOPE_ROR=https://ror.org/02s376052
 # INDEX_OPENALEX_SCOPE_COUNTRY=ch

+# Set to `false` to disable the V2 OpenAlex RAG agent tool. Default is on;
+# the tool degrades gracefully when the Qdrant collection is missing.
+# V2_OPENALEX_RAG_ENABLED=true
+
 # ---------------------------------------------------------------------------
 # src/index/huggingface — HuggingFace ingestion + RAG over EPFL/Switzerland
 # ---------------------------------------------------------------------------
@@ -173,6 +219,10 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # Optional: override the active scope at runtime (epfl | switzerland).
 # INDEX_HUGGINGFACE_SCOPE=epfl

+# Set to `false` to disable the V2 HuggingFace RAG agent tool. Default is on;
+# the tool degrades gracefully when the Qdrant collection is missing.
+# V2_HUGGINGFACE_RAG_ENABLED=true
+
 # ---------------------------------------------------------------------------
 # src/index/zenodo — Zenodo ingestion + RAG over EPFL/Switzerland
 # ---------------------------------------------------------------------------
@@ -189,6 +239,10 @@ OPENROUTER_API_KEY=your-openrouter-key-here
 # Optional: override the active scope at runtime (epfl | switzerland).
 # INDEX_ZENODO_SCOPE=epfl

+# Set to `false` to disable the V2 Zenodo RAG agent tool. Default is on;
+# the tool degrades gracefully when the Qdrant collection is missing.
+# V2_ZENODO_RAG_ENABLED=true
+

 # ---------------------------------------------------------------------------
 # SWISSUbase index module (`src/index/swissubase/`)
@@ -335,5 +389,38 @@ EPFL_GRAPH_PASSWORD=
 # V2_CONCEPT_TAGGING_OPENALEX_RELATED_ENABLED=false


+# ---------------------------------------------------------------------------
+# Pure-API RAG tools (no local index — direct upstream API calls per query)
+# ---------------------------------------------------------------------------
+# These tools wrap public REST APIs as LLM agent tools. They don't ingest
+# anything locally; each agent invocation issues a fresh HTTP call. All
+# default to on; flip to `false` to remove the tool from the agent's
+# toolkit (useful for offline testing or when upstream is flaky).
+
+# ORCID public-API search (people lookup by name / affiliation / ORCID iD).
+# Requires ORCID_CLIENT_ID / ORCID_CLIENT_SECRET below for authenticated
+# access; without them the tool runs against the public anonymous bucket.
+# V2_ORCID_RAG_ENABLED=true
+
+# ROR (Research Organization Registry) search — organization disambiguation
+# by name / acronym / country. Credential-free public API.
+# V2_ROR_RAG_ENABLED=true
+
+# SNSF (Swiss National Science Foundation) grant + person search via the
+# public Data Portal API. Credential-free.
+# V2_SNSF_RAG_ENABLED=true
+
+# ETHZ Research Collection (DSpace) tool. Uses ETHZ_RESEARCH_COLLECTION_TOKEN
+# when set; otherwise falls back to public endpoints.
+# V2_ETHZ_RESEARCH_COLLECTION_RAG_ENABLED=true
+
+# Federated RAG router — single agent-facing tool that fan-outs to every
+# enabled RAG provider above (Infoscience + OpenAlex + Zenodo + HuggingFace +
+# OAM + ORCID + ROR + SNSF + ETHZ-RC + GitHub + RenkuLab + SwissUbase) and
+# merges hits. Off by default for individual-tool clarity; turn on for
+# scout-mode brainstorm runs.
+# V2_FEDERATED_RAG_ENABLED=true
+
+
 ORCID_CLIENT_ID=
 ORCID_CLIENT_SECRET=
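
The `V2_*_RAG_ENABLED` toggles above decide which tools end up in the agent's toolkit. A minimal sketch of how such flags might be resolved follows; the helpers `env_flag` and `enabled_tools` are hypothetical. The accepted false-like spellings are an assumption, since the comments only name `false`. The section header says every tool defaults to on, while the federated router's own comment says it is off by default; the sketch follows the latter.

```python
import os
from typing import Dict, List

# Assumption: spellings treated as "off". The .env.example comments only mention `false`.
_FALSE_VALUES = {"0", "false", "no", "off"}

# Defaults as described in the comments: provider tools on, federated router off.
TOOL_FLAGS: Dict[str, bool] = {
    "V2_ORCID_RAG_ENABLED": True,
    "V2_ROR_RAG_ENABLED": True,
    "V2_SNSF_RAG_ENABLED": True,
    "V2_ETHZ_RESEARCH_COLLECTION_RAG_ENABLED": True,
    "V2_FEDERATED_RAG_ENABLED": False,
}


def env_flag(name: str, default: bool) -> bool:
    """Resolve a boolean env var; unset or empty values fall back to the default."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() not in _FALSE_VALUES


def enabled_tools() -> List[str]:
    """Return the flag names whose tools should be added to the toolkit."""
    return [name for name, default in TOOL_FLAGS.items() if env_flag(name, default)]
```

Treating an empty value as "use the default" mirrors the `Unset / empty = use the defaults` convention that the same file documents for `V2_LLM_MODEL_OVERRIDE`.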

src/v1/llm/model_config.py

Lines changed: 10 additions & 2 deletions
@@ -508,8 +508,16 @@ def load_model_config(analysis_type: str) -> List[Dict[str, Any]]:
             logger.error(f"Invalid JSON in {env_var}: {e}")
             logger.info(f"Falling back to default configuration for {analysis_type}")

-    # Return default configuration
-    return MODEL_CONFIGS.get(analysis_type, [])
+    configs = list(MODEL_CONFIGS.get(analysis_type, []))
+
+    # Swap the `model` field across every profile when an override env var is
+    # set. Useful when the default model on a shared inference endpoint is
+    # degraded and a temporary fallback is needed without editing this file.
+    override = os.getenv("V2_LLM_MODEL_OVERRIDE")
+    if override and configs:
+        configs = [{**c, "model": override} for c in configs]
+
+    return configs


 def create_pydantic_ai_model(config: Dict[str, Any]):
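
The behaviour of the new override can be checked in isolation. The sketch below reproduces the added lines against a stand-in `MODEL_CONFIGS` table. The `"repository"` key, the `provider` and `temperature` fields, and the function name `apply_model_override` are illustrative assumptions, not the contents of the real `model_config.py`.

```python
import os
from typing import Any, Dict, List

# Hypothetical stand-in for the real MODEL_CONFIGS table; keys and fields are illustrative.
MODEL_CONFIGS: Dict[str, List[Dict[str, Any]]] = {
    "repository": [
        {"provider": "openai-compatible", "model": "openai/gpt-oss-120b", "temperature": 0.2},
        {"provider": "openai-compatible", "model": "openai/gpt-oss-120b", "temperature": 0.0},
    ],
}


def apply_model_override(analysis_type: str) -> List[Dict[str, Any]]:
    """Return the default profiles, with `model` swapped when the override is set."""
    configs = list(MODEL_CONFIGS.get(analysis_type, []))
    override = os.getenv("V2_LLM_MODEL_OVERRIDE")
    if override and configs:
        # {**c, ...} builds a new dict per profile, so MODEL_CONFIGS is not mutated.
        configs = [{**c, "model": override} for c in configs]
    return configs
```

Because an empty string is falsy, `V2_LLM_MODEL_OVERRIDE=` behaves the same as leaving the variable unset. A whitespace-only value is truthy and would be passed through as the model name.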
