
Commit 9cd6d4f

refactor(config): refactor config schema for better readability and env-only LLM credentials
- Adopt canonical config layout:
  - run.turns.{mode,exact,min,max,distribution,mean}
  - run.data.seeding.{question,topics.*}
  - run.distributed.{enabled,backend,spawn,ray.*}
  - tools.web_search.{enabled,serper_num_results,serper_timeout}
  - tools.retrieval.{top_k,chunking,index,embeddings,reranker}
- Enforce env-only agent credentials and forbid YAML API-key fields:
  - llm.api_key, llm.api_key_env
  - llm.agents.{user,assistant,judge}.api_key/api_key_env
  - legacy stage-role equivalents with API-key fields
- Resolve agent credentials only through mapping vars:
  - LLM_USER_API_KEY_ENV
  - LLM_ASSISTANT_API_KEY_ENV
  - LLM_JUDGE_API_KEY_ENV
  and hard-fail when the mapping var or the mapped secret var is missing/empty.
- Rename mapping vars from QA/KB/JUDGE to USER/ASSISTANT/JUDGE.
- Keep legacy env mapping aliases and legacy config paths with deprecation warnings:
  - top-level retrieval -> tools.retrieval
  - flat tools.web_search_enabled/serper_* -> tools.web_search.*
  - legacy non-LLM keys (turns, seeding/topics, distributed/ray, retrieval/models)
- Update runtime consumers (loader, runner, retrieval, distributed bootstrap/provisioning) to canonical paths.
- Refresh config/docs/examples (.env.example, README, config reference, architecture, distributed and retrieval docs).
- Add/refresh tests:
  - config llm validation
  - schema normalization/migration (including tools.retrieval and tools.web_search)
  - env loader behavior
  - llm credential resolution
  - distributed provisioning alignment

BREAKING CHANGE: YAML API-key fields are no longer accepted. Agent credentials must come from environment mapping vars and provider secret env vars.
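For orientation, here is a minimal sketch of the canonical layout described above. The key paths mirror the commit message; every value shown is an illustrative placeholder, not a repository default.

```yaml
# Illustrative sketch only; values are placeholders, not project defaults.
run:
  turns:
    mode: range              # exact | range
    min: 2
    max: 8
    distribution: poisson    # uniform | poisson | exponential
    mean: 4
  data:
    seeding:
      question: "seed question text"
      topics:
        enabled: true
        path: data/seed_topics.json
        variant: default
        probability: 0.5
  distributed:
    enabled: false
    backend: ray
tools:
  web_search:
    enabled: false
    serper_num_results: 5
    serper_timeout: 10
  retrieval:
    top_k: 5
    chunking: { chunk_size: 512, chunk_overlap: 64 }
    index: { persist_dir: knowledge_index, rebuild: false }
    embeddings: { model: sentence-transformers/all-MiniLM-L6-v2 }
llm:
  agents:
    user: { model: gpt-4o-mini }
    assistant: { model: gpt-4o-mini }
    judge: { model: gpt-4o-mini }
# Note: no api_key / api_key_env fields anywhere in YAML; credentials resolve only
# through LLM_USER_API_KEY_ENV, LLM_ASSISTANT_API_KEY_ENV, LLM_JUDGE_API_KEY_ENV.
```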
1 parent 377457b commit 9cd6d4f

21 files changed: +1893 / -461 lines

.env.example

Lines changed: 8 additions & 56 deletions
@@ -1,62 +1,14 @@
-# --- Secrets ---
-# Required for OpenAI-compatible chat completions
-OPENAI_API_KEY="..."
+# Keep this file strictly for provider secrets and per-agent key-mapping vars.
+# Provider secret names are flexible (common patterns: *_API_KEY or *_TOKEN).
 
-# Optional: only needed if web_search is enabled and invoked
+OPENAI_API_KEY="..."
 SERPER_API_KEY="..."
-
-# Other model providers (DeepSeek, Gemini, etc.)
 DEEPSEEK_API_KEY="..."
 GEMINI_API_KEY="..."
-# ANTHROPIC_API_KEY="..."
-
-# Optional by default. Required for `dlgforge push` / auto-push to Hugging Face Hub
-# and for gated/private Hugging Face models.
+ANTHROPIC_API_KEY="..."
 HF_TOKEN="..."
 
-# --- Configuration ---
-
-JUDGE_GRANULARITY="turn" # turn or conversation
-
-# Recommended fallback model used when llm.model is empty
-# OPENAI_MODEL="gpt-5.2"
-
-# Base URL for OpenAI endpoint. Defaults to "https://api.openai.com/v1" if not set.
-# Remove/comment OPENAI_BASE_URL from .env for mixed-provider runs.
-# OPENAI_BASE_URL="https://api.openai.com/v1"
-
-# OpenAI-compatible custom endpoint (vLLM, gateway, etc.), e.g. if you have a local vLLM instance running, you can set it to
-# OPENAI_BASE_URL="http://localhost:8000/v1"
-
-# OpenAI-compatible custom endpoint (LM Studio), you can set it to
-# OPENAI_BASE_URL="http://localhost:1234/v1"
-
-# Optional global override used by all agents
-# LLM_MODEL="gpt-5.2"
-
-# Optional per-agent overrides (preferred for mixed providers)
-LLM_QA_GENERATOR_MODEL="gpt-5.2"
-LLM_KB_RESPONDER_MODEL="gemini-3-flash-preview"
-LLM_QA_JUDGE_MODEL="gemini-3-flash-preview"
-
-LLM_QA_GENERATOR_API_KEY_ENV=OPENAI_API_KEY
-LLM_KB_RESPONDER_API_KEY_ENV=GEMINI_API_KEY
-LLM_QA_JUDGE_API_KEY_ENV=GEMINI_API_KEY
-
-# Optional retrieval backend overrides
-# KB_EMBEDDING_BACKEND=sentence_transformers
-# KB_EMBEDDING_MODEL_KWARGS_JSON={}
-# KB_EMBEDDING_TOKENIZER_KWARGS_JSON={}
-# KB_EMBEDDING_ENCODE_KWARGS_JSON={}
-
-# Optional run language list (comma-separated)
-# TARGET_LANGUAGES=morocco,fr,en
-
-# Optional seed topics variant override (e.g. msa, morocco, english)
-# SEED_TOPICS_VARIANT=
-
-# HUGGINGFACE_HUB_TOKEN=
-# HF_PUSH_GENERATE_STATS=true
-# HF_PUSH_STATS_FILE=dataset_stats.json
-# HF_PUSH_GENERATE_PLOTS=true
-# HF_PUSH_PLOTS_DIR=plots
+# Required API-key mapping variables (agent credentials are environment-only).
+LLM_USER_API_KEY_ENV=OPENAI_API_KEY
+LLM_ASSISTANT_API_KEY_ENV=GEMINI_API_KEY
+LLM_JUDGE_API_KEY_ENV=GEMINI_API_KEY
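To make the mapping-variable contract concrete, here is a small sketch of how resolution is described to behave; the exported values are placeholders:

```bash
# Each LLM_<ROLE>_API_KEY_ENV names the environment variable that holds that role's secret.
export GEMINI_API_KEY="..."                      # provider secret; the name pattern is flexible
export LLM_ASSISTANT_API_KEY_ENV=GEMINI_API_KEY  # assistant key comes from $GEMINI_API_KEY
export LLM_JUDGE_API_KEY_ENV=GEMINI_API_KEY
export LLM_USER_API_KEY_ENV=OPENAI_API_KEY       # fails hard if OPENAI_API_KEY is unset or empty
```

Per the commit message, the run hard-fails when a mapping variable is missing or when the variable it points to is unset or empty; the legacy QA/KB/JUDGE aliases still resolve, with deprecation warnings.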

README.md

Lines changed: 90 additions & 67 deletions
@@ -57,7 +57,7 @@ The pipeline runs up to three logical stages per turn:
 3. `qa_judge`: evaluates quality/grounding (configurable granularity).
 
 It supports:
-- fixed turns (`run.n_turns`) or sampled turns per conversation (`min_turns/max_turns` + distribution)
+- fixed turns (`run.turns.mode: exact` + `run.turns.exact`) or sampled turns per conversation (`run.turns.mode: range` + `run.turns.min/max/distribution`)
 - batched concurrent generation (`run.batch_size`)
 - language loops (`run.target_languages`) with `total_samples` generated per language
 - deterministic exact-normalized question dedup across the full run
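As a quick illustration of the reworked first bullet, the two `run.turns` shapes side by side; the numbers are placeholders, not defaults:

```yaml
# Fixed turn count per conversation
run:
  turns:
    mode: exact
    exact: 4
---
# Sampled turn count per conversation
run:
  turns:
    mode: range
    min: 2
    max: 8
    distribution: poisson   # uniform | poisson | exponential
    mean: 4                 # used by poisson/exponential
```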
@@ -70,7 +70,7 @@ source .venv/bin/activate
 uv pip install -e .
 ```
 
-If you want managed local/cluster vLLM autostart mode (`llm.backend: vllm_managed`, Linux GPU nodes):
+If you want managed local/cluster vLLM autostart mode (`llm.mode: vllm_managed`, Linux GPU nodes):
 ```bash
 python -m pip install -e ".[vllm]"
 ```
@@ -81,12 +81,18 @@ cp .env.example .env
 ```
 
 Minimum required:
-- model source from one of:
-  - `llm.agents.<agent>.model`
-  - `llm.model`
-  - `LLM_MODEL`
-  - `OPENAI_MODEL`
-- provider credential env vars for whichever models you use (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`)
+- model per active role:
+  - `llm.agents.user.model`
+  - `llm.agents.assistant.model`
+  - `llm.agents.judge.model` (when `judge.mode: online`)
+- provider credential env vars (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`)
+  - provider secret names are flexible; common patterns like `*_API_KEY` and `*_TOKEN` are supported
+- required agent credential mapping vars:
+  - `LLM_USER_API_KEY_ENV`
+  - `LLM_ASSISTANT_API_KEY_ENV`
+  - `LLM_JUDGE_API_KEY_ENV`
+  - legacy aliases (`LLM_QA_GENERATOR_API_KEY_ENV`, `LLM_KB_RESPONDER_API_KEY_ENV`, `LLM_QA_JUDGE_API_KEY_ENV`) are still accepted with deprecation warnings
+- do not place `api_key` or `api_key_env` in YAML; credentials are environment-only
 - `HF_TOKEN` is optional for generation with the default embedding model (`sentence-transformers/all-MiniLM-L6-v2`); it is needed for gated/private HF models and `dlgforge push`
 
 ### 2.3 Prepare knowledge directory
@@ -97,7 +103,7 @@ mkdir -p knowledge
 # add your source docs under knowledge/, for example: knowledge/product_faq.md
 ```
 
-By default, retrieval embeddings/chunks are cached under `knowledge_index/` (`retrieval.persist_dir` in `config.yaml`).
+By default, retrieval embeddings/chunks are cached under `knowledge_index/` (`tools.retrieval.index.persist_dir` in `config.yaml`).
 
 ### 2.4 Run generation
 ```bash
@@ -118,9 +124,9 @@ dlgforge run config.yaml
 - `logs/run.log`, `logs/llm.log`, `logs/judge.log`
 
 ### 2.6 Recommended mode by setup
-- macOS laptop + LM Studio: `run.distributed.enabled: false` + `llm.backend: openai` + `llm.base_url: http://127.0.0.1:1234/v1`
-- macOS laptop + distributed orchestrator: `run.distributed.enabled: true` + `llm.backend: vllm_attach` + Postgres DSN
-- Linux GPU nodes and self-managed cluster: `run.distributed.enabled: true` + `llm.backend: vllm_managed`
+- macOS laptop + LM Studio: `run.distributed.enabled: false` + `llm.mode: api` + `llm.agents.*.base_url: http://127.0.0.1:1234/v1`
+- macOS laptop + distributed orchestrator: `run.distributed.enabled: true` + `llm.mode: vllm_attach` + Postgres DSN
+- Linux GPU nodes and self-managed cluster: `run.distributed.enabled: true` + `llm.mode: vllm_managed`
 
 ### 2.7 Run modes (copy-paste)
 #### A) LM Studio local, non-distributed (recommended on macOS)
@@ -129,18 +135,19 @@ run:
   distributed:
     enabled: false
 llm:
-  backend: openai
-  base_url: http://127.0.0.1:1234/v1
-  api_key: EMPTY
+  mode: api
   agents:
-    qa_generator:
+    user:
       model: openai/gpt-oss-20b
-    kb_responder:
+      base_url: http://127.0.0.1:1234/v1
+    assistant:
       model: openai/gpt-oss-20b
-    qa_judge:
+      base_url: http://127.0.0.1:1234/v1
+    judge:
       model: openai/gpt-oss-20b
+      base_url: http://127.0.0.1:1234/v1
 ```
-For non-OpenAI `llm.base_url` values (for example LM Studio / vLLM), dlgforge auto-uses LiteLLM openai-compatible passthrough so namespaced model IDs like `openai/gpt-oss-20b` are forwarded unchanged.
+Make sure to specify the correct base URLs for each agent via `llm.agents.<role>.base_url` (refer to https://docs.litellm.ai/docs/providers for base_url and api_key formats for different providers). dlgforge auto-uses LiteLLM openai-compatible passthrough so namespaced model IDs like `openai/gpt-oss-20b` are forwarded unchanged.
 ```bash
 dlgforge run config.yaml
 ```
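One practical note for this local setup: because agent credentials are now environment-only and resolution hard-fails on missing or empty variables, the three mapping variables still need to point at a defined, non-empty variable even when the local endpoint ignores the key. A hedged sketch; whether a placeholder value satisfies your local endpoint is an assumption to verify:

```bash
# Assumption: LM Studio does not check the key, so a placeholder secret is enough,
# but the mapping vars must still resolve to a non-empty variable or the run fails at load.
export LOCAL_API_KEY="placeholder"
export LLM_USER_API_KEY_ENV=LOCAL_API_KEY
export LLM_ASSISTANT_API_KEY_ENV=LOCAL_API_KEY
export LLM_JUDGE_API_KEY_ENV=LOCAL_API_KEY
dlgforge run config.yaml
```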
@@ -151,7 +158,7 @@ run:
   distributed:
     enabled: false
 llm:
-  backend: openai
+  mode: api
 ```
 ```bash
 dlgforge run config.yaml
@@ -166,7 +173,7 @@ ray:
   address: auto
   auto_start_local: true
 llm:
-  backend: openai
+  mode: api
 ```
 ```bash
 export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
@@ -181,7 +188,7 @@ run:
   distributed:
     enabled: true
 llm:
-  backend: vllm_attach
+  mode: vllm_attach
   routing:
     endpoints:
       - name: gpu-node-1
@@ -199,7 +206,7 @@ run:
   distributed:
     enabled: true
 llm:
-  backend: vllm_managed
+  mode: vllm_managed
   vllm:
     model: Qwen/Qwen2.5-7B-Instruct
     served_model_name: qwen
@@ -235,10 +242,9 @@ Recommended per-agent setup:
 - vLLM (OpenAI-compatible endpoint):
   - `provider: openai` (or `hosted_vllm`)
   - `base_url: http://localhost:8000/v1`
-  - `api_key: EMPTY` (if your endpoint does not require auth)
 
-Be careful with global env/config fallbacks:
-- `OPENAI_BASE_URL` can affect agents that do not set an agent-level `base_url`.
+Be careful with env/config precedence:
+- credentials are env-only via `LLM_USER_API_KEY_ENV`, `LLM_ASSISTANT_API_KEY_ENV`, `LLM_JUDGE_API_KEY_ENV`.
 - `LLM_<ROLE>_*` env vars override `config.yaml` values for that specific role.
 - for mixed providers, prefer setting provider/model/base_url explicitly per agent in `llm.agents.<role>`.

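To ground the precedence notes above, a minimal mixed-provider sketch with provider/model/base_url set explicitly per agent; the model names, provider ids, and URLs are illustrative, so check the LiteLLM provider docs for the exact identifiers:

```yaml
llm:
  mode: api
  agents:
    user:
      provider: openai
      model: gpt-4o-mini                  # illustrative
    assistant:
      provider: openai                    # OpenAI-compatible local vLLM endpoint
      model: openai/gpt-oss-20b
      base_url: http://localhost:8000/v1
    judge:
      provider: gemini                    # illustrative provider id
      model: gemini-3-flash-preview
```

This pairs with matching mapping variables in `.env`, for example `LLM_USER_API_KEY_ENV=OPENAI_API_KEY` and `LLM_JUDGE_API_KEY_ENV=GEMINI_API_KEY`.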
@@ -292,42 +298,60 @@ Important:
 ## 4) Configuration guide
 ### 4.1 `run`
 Core run controls:
-- `run.n_turns`: fixed turns when no range sampling is used.
 - `run.batch_size`: number of conversations advanced concurrently.
 - `run.total_samples`: number of conversations to persist per language.
 - `run.target_languages`: list of languages.
 - `run.run_id`: optional explicit run id.
 - `run.resume_run_id`: resume checkpoint.
 
-Turn count sampling:
-- `run.min_turns`
-- `run.max_turns`
-- `run.turn_count_distribution`: `uniform`, `poisson`, or `exponential`
-- `run.turn_count_mean`: mean for `poisson`/`exponential`
+Turn count:
+- `run.turns.mode`: `exact` or `range`
+- `run.turns.exact`: used when `mode: exact`
+- `run.turns.min`, `run.turns.max`: used when `mode: range`
+- `run.turns.distribution`: `uniform`, `poisson`, or `exponential`
+- `run.turns.mean`: mean for `poisson`/`exponential`
+
+Data shaping:
+- `run.data.seeding.question`
+- `run.data.seeding.topics.path`
+- `run.data.seeding.topics.enabled`
+- `run.data.seeding.topics.variant`
+- `run.data.seeding.topics.probability`
 
 Behavior:
-- sampled turns are clamped to `[min_turns, max_turns]`
+- sampled turns are clamped to `[run.turns.min, run.turns.max]`
 - each conversation samples independently
 
 ### 4.2 `llm`
 LiteLLM-routed settings:
-- `llm.provider`
-- `llm.base_url`
-- `llm.api_key` / env
-- per-agent overrides under `llm.agents`
-- `llm.backend`: `openai`, `vllm_attach`, or `vllm_managed`
+- `llm.mode`: `api`, `vllm_attach`, or `vllm_managed`
+- per-agent settings under `llm.agents.<role>`:
+  - `provider`, `model`, `base_url`
+  - optional sampling params (`temperature`, `max_tokens`, `top_p`, `timeout`, `max_retries`, `extra`)
+- credentials are environment-only via:
+  - `LLM_USER_API_KEY_ENV`
+  - `LLM_ASSISTANT_API_KEY_ENV`
+  - `LLM_JUDGE_API_KEY_ENV`
 - `llm.routing.*`: multi-endpoint routing (used by attach/managed vLLM modes)
-- `llm.vllm.*`: managed vLLM replica settings when `llm.backend: vllm_managed`
+- `llm.vllm.*`: managed vLLM replica settings when `llm.mode: vllm_managed`
 
 Agents:
-- `qa_generator`
-- `kb_responder`
-- `qa_judge`
-
-### 4.3 `retrieval`
-KB search defaults:
-- `retrieval.default_k`
-- chunking and index options
+- `user`
+- `assistant`
+- `judge`
+
+### 4.3 `tools`
+Tool settings:
+- `tools.web_search.enabled`
+- `tools.web_search.serper_num_results`
+- `tools.web_search.serper_timeout`
+- `tools.retrieval.top_k`
+- `tools.retrieval.chunking.chunk_size`
+- `tools.retrieval.chunking.chunk_overlap`
+- `tools.retrieval.index.persist_dir`
+- `tools.retrieval.index.rebuild`
+- `tools.retrieval.embeddings.*`
+- `tools.retrieval.reranker.*`
 
 ### 4.4 `coverage`
 Dedup and coverage behavior:
@@ -367,23 +391,22 @@ Enable one-command distributed launch from the same CLI entrypoint:
 run:
   distributed:
     enabled: true
-    executor: ray
+    backend: ray
    spawn:
      coordinator: true
      workers: true
-
-ray:
-  address: "auto"
-  auto_start_local: true
-  namespace: "dlgforge"
+    ray:
+      address: "auto"
+      auto_start_local: true
+      namespace: "dlgforge"
 
 store:
   backend: postgres
   postgres:
     dsn: "${DLGFORGE_POSTGRES_DSN}"
 
 llm:
-  backend: vllm_attach # openai | vllm_attach | vllm_managed
+  mode: vllm_attach # api | vllm_attach | vllm_managed
   routing:
     strategy: weighted_least_inflight
     endpoints:
@@ -397,10 +420,10 @@ llm:
 
 Behavior:
 - `dlgforge run config.yaml` bootstraps coordinator + workers automatically when `run.distributed.enabled: true`
-- Ray init tries `ray.address` first; when `ray.address: auto` has no running cluster and `ray.auto_start_local: true`, it falls back to a local Ray runtime
-- `llm.backend: openai` uses hosted API (no vLLM provisioning)
-- `llm.backend: vllm_attach` validates configured vLLM endpoints before run
-- `llm.backend: vllm_managed` starts/stops vLLM servers on Ray GPU actors
+- Ray init tries `run.distributed.ray.address` first; when `run.distributed.ray.address: auto` has no running cluster and `run.distributed.ray.auto_start_local: true`, it falls back to a local Ray runtime
+- `llm.mode: api` uses hosted API (no vLLM provisioning)
+- `llm.mode: vllm_attach` validates configured vLLM endpoints before run
+- `llm.mode: vllm_managed` starts/stops vLLM servers on Ray GPU actors
 - current execution path runs generation from the coordinator actor while worker replicas are provisioned for lifecycle orchestration hooks
 
 Bootstrap sequence:
@@ -409,8 +432,8 @@ flowchart TD
     A["dlgforge run config.yaml"] --> B["RunBootstrap"]
     B --> C["Initialize Ray"]
     C --> D["Validate Postgres DSN and ping"]
-    D --> E{"llm.backend"}
-    E -->|openai| F["No vLLM provisioning"]
+    D --> E{"llm.mode"}
+    E -->|api| F["No vLLM provisioning"]
     E -->|vllm_attach| G["Validate configured /v1/models endpoints"]
     E -->|vllm_managed| H["Spawn vLLM server actors and wait healthy"]
     F --> I["Spawn coordinator actor"]
@@ -437,10 +460,10 @@ Mode matrix:
 ```mermaid
 flowchart TD
     S{"run.distributed.enabled"}
-    S -->|false| L["Local mode: openai only, no Ray/Postgres requirement"]
+    S -->|false| L["Local mode: api, no Ray/Postgres requirement"]
     S -->|true| D["Distributed mode: Ray plus Postgres required"]
-    D --> B{"llm.backend"}
-    B -->|openai| BO["Hosted OpenAI path"]
+    D --> B{"llm.mode"}
+    B -->|api| BO["Hosted API path"]
     B -->|vllm_attach| BA["Use user-provided vLLM endpoints"]
     B -->|vllm_managed| BM["Auto-start vLLM on Ray GPU workers (vllm extra required)"]
 ```
@@ -611,7 +634,7 @@ dlgforge seeds-migrate config.yaml --source-file seed_topics.json --dest-file da
 Check:
 - `judge.enabled: true`
 - `judge.mode: online`
-- `llm.agents.qa_judge.model` resolves
+- `llm.agents.judge.model` resolves
 - API key present
 
 ### 11.2 No judged output in conversation mode
@@ -640,18 +663,18 @@ Check:
 
 ### 11.6 Embedding/index mismatch
 If retrieval errors appear after model/backend changes:
-- set `retrieval.rebuild_index: true` for one run, then back to `false`
+- set `tools.retrieval.index.rebuild: true` for one run, then back to `false`
 - or remove `knowledge_index/` and regenerate
 
 ### 11.7 Missing `knowledge/` or empty knowledge base
 If preflight fails with `Missing knowledge directory` or `No supported knowledge files found`:
 - create `knowledge/` at the repository root
 - add at least one `.txt`, `.md`, or `.pdf` file under `knowledge/`
-- rerun generation (optionally set `retrieval.rebuild_index: true` for one run after major document changes)
+- rerun generation (optionally set `tools.retrieval.index.rebuild: true` for one run after major document changes)
 
 ### 11.8 Hugging Face auth error while loading embeddings
 If you see model download/auth errors:
-- keep the default `models.embedding_model: sentence-transformers/all-MiniLM-L6-v2` (no HF token required in standard setups)
+- keep the default `tools.retrieval.embeddings.model: sentence-transformers/all-MiniLM-L6-v2` (no HF token required in standard setups)
 - if you switch to a gated/private HF embedding model, set `HF_TOKEN` (or `HUGGINGFACE_HUB_TOKEN`) in `.env`
 
 ---
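For reference, the canonical `tools.retrieval` paths that the troubleshooting items above point at, sketched with illustrative values:

```yaml
tools:
  retrieval:
    index:
      persist_dir: knowledge_index   # cache location referenced in section 2.3
      rebuild: true                  # enable for one run after model/document changes, then set back to false
    embeddings:
      model: sentence-transformers/all-MiniLM-L6-v2   # default; gated/private models require HF_TOKEN
```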
