- provider credential env vars (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`)
- provider secret names are flexible; common patterns like `*_API_KEY` and `*_TOKEN` are supported
- required agent credential mapping vars (see the export sketch after this list):
  - `LLM_USER_API_KEY_ENV`
  - `LLM_ASSISTANT_API_KEY_ENV`
  - `LLM_JUDGE_API_KEY_ENV`
- legacy aliases (`LLM_QA_GENERATOR_API_KEY_ENV`, `LLM_KB_RESPONDER_API_KEY_ENV`, `LLM_QA_JUDGE_API_KEY_ENV`) are still accepted with deprecation warnings
- do not place `api_key` or `api_key_env` in YAML; credentials are environment-only
- `HF_TOKEN` is optional for generation with the default embedding model (`sentence-transformers/all-MiniLM-L6-v2`); it is needed for gated/private HF models and `dlgforge push`
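For concreteness, a minimal shell sketch of how these pieces fit together, assuming each `LLM_*_API_KEY_ENV` var names the provider env var that role reads its key from (the placeholder values and the per-role provider choices are illustrative, not defaults):

```bash
# provider secrets (any supported pattern such as *_API_KEY or *_TOKEN)
export OPENAI_API_KEY="sk-..."            # placeholder value
export ANTHROPIC_API_KEY="sk-ant-..."     # placeholder value

# map each agent role to the env var holding its credential
export LLM_USER_API_KEY_ENV=OPENAI_API_KEY
export LLM_ASSISTANT_API_KEY_ENV=OPENAI_API_KEY
export LLM_JUDGE_API_KEY_ENV=ANTHROPIC_API_KEY

# optional: only needed for gated/private HF models and `dlgforge push`
export HF_TOKEN="hf_..."
```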
### 2.3 Prepare knowledge directory
```bash
mkdir -p knowledge
# add your source docs under knowledge/, for example: knowledge/product_faq.md
```

By default, retrieval embeddings/chunks are cached under `knowledge_index/` (`tools.retrieval.index.persist_dir` in `config.yaml`).
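For reference, a minimal sketch of the corresponding `config.yaml` fragment, assuming the nesting implied by the dotted path above (the value shown is simply the documented default):

```yaml
tools:
  retrieval:
    index:
      persist_dir: knowledge_index   # default cache dir for retrieval embeddings/chunks
```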
- Linux GPU nodes and self-managed cluster: `run.distributed.enabled: true` + `llm.mode: vllm_managed`
### 2.7 Run modes (copy-paste)
#### A) LM Studio local, non-distributed (recommended on macOS)
```yaml
run:
  distributed:
    enabled: false
llm:
  mode: api
  agents:
    user:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
    assistant:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
    judge:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
```
Make sure to specify the correct base URL for each agent via `llm.agents.<role>.base_url` (refer to https://docs.litellm.ai/docs/providers for `base_url` and `api_key` formats for different providers). dlgforge auto-uses the LiteLLM OpenAI-compatible passthrough, so namespaced model IDs like `openai/gpt-oss-20b` are forwarded unchanged.
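As an illustration of per-agent endpoints (the second port, the judge model ID, and the comments are placeholder assumptions, not project defaults), the judge could be pointed at a different OpenAI-compatible server:

```yaml
llm:
  mode: api
  agents:
    user:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1   # LM Studio
    assistant:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1   # LM Studio
    judge:
      model: openai/qwen2.5-14b-instruct   # placeholder model ID
      base_url: http://127.0.0.1:8000/v1   # e.g. a separate vLLM or other OpenAI-compatible server
```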
```yaml
llm:
  mode: vllm_attach   # api | vllm_attach | vllm_managed
  routing:
    strategy: weighted_least_inflight
    endpoints:
      # ...
```
Behavior:
- `dlgforge run config.yaml` bootstraps coordinator + workers automatically when `run.distributed.enabled: true`
- Ray init tries `run.distributed.ray.address` first; when `run.distributed.ray.address: auto` has no running cluster and `run.distributed.ray.auto_start_local: true`, it falls back to a local Ray runtime (see the config sketch after this list)
- `llm.mode: api` uses hosted API (no vLLM provisioning)
- `llm.mode: vllm_attach` validates configured vLLM endpoints before run
- `llm.mode: vllm_managed` starts/stops vLLM servers on Ray GPU actors
- current execution path runs generation from the coordinator actor while worker replicas are provisioned for lifecycle orchestration hooks
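Putting those keys together, a minimal `config.yaml` sketch for the managed-vLLM path, assuming the nesting implied by the dotted keys above (values other than the keys themselves are illustrative):

```yaml
run:
  distributed:
    enabled: true
    ray:
      address: auto          # use a running cluster if one is reachable
      auto_start_local: true # otherwise fall back to a local Ray runtime
llm:
  mode: vllm_managed         # api | vllm_attach | vllm_managed
```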
Bootstrap sequence:
```mermaid
flowchart TD
A["dlgforge run config.yaml"] --> B["RunBootstrap"]
B --> C["Initialize Ray"]
C --> D["Validate Postgres DSN and ping"]
D --> E{"llm.mode"}
E -->|api| F["No vLLM provisioning"]
E -->|vllm_attach| G["Validate configured /v1/models endpoints"]
E -->|vllm_managed| H["Spawn vLLM server actors and wait healthy"]
F --> I["Spawn coordinator actor"]
```

Mode matrix:
```mermaid
flowchart TD
S{"run.distributed.enabled"}
S -->|false| L["Local mode: api, no Ray/Postgres requirement"]
S -->|true| D["Distributed mode: Ray plus Postgres required"]
D --> B{"llm.mode"}
B -->|api| BO["Hosted API path"]
B -->|vllm_attach| BA["Use user-provided vLLM endpoints"]
B -->|vllm_managed| BM["Auto-start vLLM on Ray GPU workers (vllm extra required)"]
```