Lightweight synthetic multi-turn dialogue generation with a LiteLLM-routed LLM stack.
dlgforge generates grounded user-assistant conversations with:
- async batched generation
- deterministic dedup of generated user questions
- optional online judging during generation
- resumable run state
- export-ready JSONL artifacts
- optional one-command distributed bootstrap (Ray + Postgres + vLLM backends)
- 1) What this project does
- 2) Quick start (5 minutes)
- 3) How generation works
- 4) Configuration guide
- 5) Judge modes and budget control
- 6) Async batch + dedup semantics
- 7) Persona sampling behavior
- 8) Outputs and inspection
- 9) Resume and run state
- 10) CLI commands
- 11) Troubleshooting playbook
The pipeline runs up to three logical stages per turn:
- `qa_generator`: produces the next user message.
- `kb_responder`: answers using KB retrieval (and an optional web search tool).
- `qa_judge`: evaluates quality/grounding (configurable granularity).
It supports:
- fixed turns (`run.turns.mode: exact` + `run.turns.exact`) or sampled turns per conversation (`run.turns.mode: range` + `run.turns.min`/`max`/`distribution`)
- batched concurrent generation (`run.batch_size`)
- language loops (`run.target_languages`) with `total_samples` generated per language
- deterministic exact-normalized question dedup across the full run
uv venv
source .venv/bin/activate
uv pip install -e .
If you want managed local/cluster vLLM autostart mode (`llm.mode: vllm_managed`, Linux GPU nodes):
python -m pip install -e ".[vllm]"
Then copy the environment template:
cp .env.example .env
Minimum required:
- model per active role: `llm.agents.user.model`, `llm.agents.assistant.model`, `llm.agents.judge.model` (when `judge.mode: online`)
- provider credential env vars (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`)
  - provider secret names are flexible; common patterns like `*_API_KEY` and `*_TOKEN` are supported
- required agent credential mapping vars: `LLM_USER_API_KEY_ENV`, `LLM_ASSISTANT_API_KEY_ENV`, `LLM_JUDGE_API_KEY_ENV`
  - legacy aliases (`LLM_QA_GENERATOR_API_KEY_ENV`, `LLM_KB_RESPONDER_API_KEY_ENV`, `LLM_QA_JUDGE_API_KEY_ENV`) are still accepted with deprecation warnings
- do not place `api_key` or `api_key_env` in YAML; credentials are environment-only
- `HF_TOKEN` is optional for generation with the default embedding model (`sentence-transformers/all-MiniLM-L6-v2`); it is needed for gated/private HF models and `dlgforge push`
dlgforge run requires a knowledge/ folder at the project root with at least one supported file (.txt, .md, or .pdf).
mkdir -p knowledge
# add your source docs under knowledge/, for example: knowledge/product_faq.md
By default, retrieval embeddings/chunks are cached under `knowledge_index/` (`tools.retrieval.index.persist_dir` in `config.yaml`).
uv run env PYTHONPATH=src python -m dlgforge run config.yaml
If the package is installed in the environment (`uv pip install -e .`), these are equivalent:
uv run dlgforge run config.yaml
dlgforge run config.yaml
Expected outputs:
- `outputs/conversations/*.json`
- `outputs/conversations_sharegpt.jsonl`
- `outputs/turns.jsonl`
- `outputs/run_state/*.json`
- `logs/run.log`, `logs/llm.log`, `logs/judge.log`
- macOS laptop + LM Studio: `run.distributed.enabled: false` + `llm.mode: api` + `llm.agents.*.base_url: http://127.0.0.1:1234/v1`
- macOS laptop + distributed orchestrator: `run.distributed.enabled: true` + `llm.mode: vllm_attach` + Postgres DSN
- Linux GPU nodes and self-managed cluster: `run.distributed.enabled: true` + `llm.mode: vllm_managed`
A) Local mode with LM Studio (OpenAI-compatible local server):
run:
distributed:
enabled: false
llm:
mode: api
agents:
user:
model: openai/gpt-oss-20b
base_url: http://127.0.0.1:1234/v1
assistant:
model: openai/gpt-oss-20b
base_url: http://127.0.0.1:1234/v1
judge:
model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
Make sure to specify the correct base URL for each agent via `llm.agents.<role>.base_url` (refer to https://docs.litellm.ai/docs/providers for `base_url` and `api_key` formats for different providers). dlgforge uses LiteLLM's OpenAI-compatible passthrough, so namespaced model IDs like `openai/gpt-oss-20b` are forwarded unchanged.
dlgforge run config.yaml
B) Local mode with a hosted API provider:
run:
  distributed:
    enabled: false
llm:
  mode: api
dlgforge run config.yaml
C) Distributed mode with a hosted API provider:
run:
  distributed:
    enabled: true
    ray:
      address: auto
      auto_start_local: true
llm:
  mode: api
export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml
If no Ray cluster is running, `ray.auto_start_local: true` lets dlgforge start a local Ray runtime automatically.
D) Distributed mode attaching to existing vLLM endpoints:
run:
  distributed:
    enabled: true
llm:
  mode: vllm_attach
  routing:
    endpoints:
      - name: gpu-node-1
        base_url: http://10.0.0.11:8000/v1
        api_key: EMPTY
export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml
E) Distributed mode with managed vLLM replicas:
run:
  distributed:
    enabled: true
llm:
  mode: vllm_managed
  vllm:
    model: Qwen/Qwen2.5-7B-Instruct
    served_model_name: qwen
python -m pip install -e ".[vllm]"
export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml
Notes:
- managed mode is Linux-oriented (the `vllm` extra is Linux-only in this project).
- on macOS, use LM Studio local mode (A) or distributed attach mode (D).
- if using managed mode, align your `llm.agents.*.model` values with `llm.vllm.served_model_name` (see the sketch below).
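A minimal managed-mode alignment sketch (illustrative; the exact model-ID prefix your routing/provider setup expects may differ):
llm:
  mode: vllm_managed
  vllm:
    model: Qwen/Qwen2.5-7B-Instruct
    served_model_name: qwen
  agents:
    user:
      model: qwen        # matches llm.vllm.served_model_name
    assistant:
      model: qwen
    judge:
      model: qwen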
When mixing providers (for example OpenAI + Gemini + LM Studio), define provider/model/base URL per agent and follow LiteLLM provider conventions:
- LiteLLM providers reference: https://docs.litellm.ai/docs/providers
Recommended per-agent setup:
- OpenAI API:
  - `provider: openai`
  - `model: gpt-5.2` (or another OpenAI model)
  - `base_url: https://api.openai.com/v1`
- Gemini (Google AI Studio):
  - `provider: gemini`
  - `model: gemini/gemini-2.0-flash` (or another Gemini model)
  - leave `base_url` empty unless you intentionally set a Gemini-specific proxy/base URL
- LM Studio (OpenAI-compatible local server):
  - `provider: openai` (or `lm_studio`)
  - `model: openai/gpt-oss-20b` (or your served model name format)
  - `base_url: http://localhost:1234/v1`
- vLLM (OpenAI-compatible endpoint):
  - `provider: openai` (or `hosted_vllm`)
  - `base_url: http://localhost:8000/v1`
Be careful with env/config precedence:
- credentials are env-only via `LLM_USER_API_KEY_ENV`, `LLM_ASSISTANT_API_KEY_ENV`, `LLM_JUDGE_API_KEY_ENV`.
- `LLM_<ROLE>_*` env vars override `config.yaml` values for that specific role.
- for mixed providers, prefer setting provider/model/base_url explicitly per agent in `llm.agents.<role>` (see the example below).
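Putting this together, a mixed-provider `llm.agents` block could look like the following sketch (model names and URLs are illustrative; API keys stay out of YAML and are mapped via `LLM_USER_API_KEY_ENV`, `LLM_ASSISTANT_API_KEY_ENV`, and `LLM_JUDGE_API_KEY_ENV`):
llm:
  mode: api
  agents:
    user:                      # hosted OpenAI model generates user questions
      provider: openai
      model: gpt-5.2
      base_url: https://api.openai.com/v1
    assistant:                 # Gemini model answers; base_url left unset
      provider: gemini
      model: gemini/gemini-2.0-flash
    judge:                     # local LM Studio server judges
      provider: openai
      model: openai/gpt-oss-20b
      base_url: http://localhost:1234/v1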
Distributed modes (C, D, E) require Postgres.
Start a local Postgres with Docker:
docker run -d \
--name dlgforge-postgres \
-e POSTGRES_USER=dlgforge \
-e POSTGRES_PASSWORD=dlgforge \
-e POSTGRES_DB=dlgforge \
-p 5432:5432 \
  postgres:16
Set DSN:
export DLGFORGE_POSTGRES_DSN='postgresql://dlgforge:dlgforge@127.0.0.1:5432/dlgforge'
Health check:
docker exec dlgforge-postgres pg_isready -U dlgforge -d dlgforge
Reuse an existing container:
docker start dlgforge-postgres
Stop when done:
docker stop dlgforge-postgres
At runtime:
- Load config + env overrides.
- Build base inputs and runtime settings.
- For each target language:
  - run one or more waves until `total_samples` for that language is reached
  - each wave runs `batch_size` conversations (or remaining count)
- Persist artifacts and optional HF auto-push.
Important:
- `total_samples` is per language, not global.
- if `target_languages` has 5 values and `total_samples` = 200, the target is 1000 conversations total (see the example below).
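For example, the following targets 5 x 200 = 1000 conversations overall (language values are illustrative):
run:
  total_samples: 200        # per language
  target_languages:         # 5 entries -> 1000 conversations total
    - en
    - de
    - fr
    - es
    - it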
Core run controls:
- `run.batch_size`: number of conversations advanced concurrently.
- `run.total_samples`: number of conversations to persist per language.
- `run.target_languages`: list of languages.
- `run.run_id`: optional explicit run id.
- `run.resume_run_id`: resume checkpoint.
Turn count:
- `run.turns.mode`: `exact` or `range`
- `run.turns.exact`: used when `mode: exact`
- `run.turns.min`, `run.turns.max`: used when `mode: range`
- `run.turns.distribution`: `uniform`, `poisson`, or `exponential`
- `run.turns.mean`: mean for `poisson`/`exponential`
Data shaping:
- `run.data.seeding.question`
- `run.data.seeding.topics.path`
- `run.data.seeding.topics.enabled`
- `run.data.seeding.topics.variant`
- `run.data.seeding.topics.probability` (example below)
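An illustrative seeding block using the keys above (paths and values are examples only; `data/seeds/topics.yaml` matches the destination used by `dlgforge seeds-migrate` later in this README):
run:
  data:
    seeding:
      question: "How do refunds work?"     # illustrative seed question
      topics:
        enabled: true
        path: data/seeds/topics.yaml
        probability: 0.5                   # illustrative value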
Behavior:
- sampled turns are clamped to `[run.turns.min, run.turns.max]`
- each conversation samples independently (example below)
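For example, a range-mode turn configuration (illustrative values):
run:
  turns:
    mode: range
    min: 2
    max: 6
    distribution: poisson
    mean: 3          # used for poisson/exponential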
LiteLLM-routed settings:
- `llm.mode`: `api`, `vllm_attach`, or `vllm_managed`
- per-agent settings under `llm.agents.<role>`: `provider`, `model`, `base_url`
  - optional sampling params (`temperature`, `max_tokens`, `top_p`, `timeout`, `max_retries`, `extra`); example below
- credentials are environment-only via:
  - `LLM_USER_API_KEY_ENV`
  - `LLM_ASSISTANT_API_KEY_ENV`
  - `LLM_JUDGE_API_KEY_ENV`
- `llm.routing.*`: multi-endpoint routing (used by attach/managed vLLM modes)
- `llm.vllm.*`: managed vLLM replica settings when `llm.mode: vllm_managed`
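For example, optional sampling parameters sit alongside provider/model/base_url (values are illustrative):
llm:
  agents:
    assistant:
      provider: openai
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
      temperature: 0.7
      top_p: 0.9
      max_tokens: 1024
      timeout: 120        # illustrative
      max_retries: 2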
Agents:
- `user`
- `assistant`
- `judge`
Tool settings:
- `tools.web_search.enabled`
- `tools.web_search.serper_num_results`
- `tools.web_search.serper_timeout`
- `tools.retrieval.top_k`
- `tools.retrieval.chunking.chunk_size`
- `tools.retrieval.chunking.chunk_overlap`
- `tools.retrieval.index.persist_dir`
- `tools.retrieval.index.rebuild` (example below)
- `tools.retrieval.embeddings.*`
- `tools.retrieval.reranker.*`
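An illustrative `tools` block using the keys above (values are examples, not project defaults):
tools:
  web_search:
    enabled: false
    serper_num_results: 5
    serper_timeout: 10
  retrieval:
    top_k: 5
    chunking:
      chunk_size: 800
      chunk_overlap: 100
    index:
      persist_dir: knowledge_index
      rebuild: false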
Dedup and coverage behavior:
- `coverage.question_dedup_retries`
- coverage balancing parameters
Persona controls:
- `personas.enabled`
- `personas.path`
Current recommended path:
personas:
enabled: true
  path: src/dlgforge/prompts/personas.yaml
If path is missing/unreadable, built-in fallback personas are used.
Judge controls:
- `judge.enabled`
- `judge.mode`: `online` or `offline`
- `judge.granularity`: `turn` or `conversation`
- `judge.reasons`: allowed labels
Output layout and export:
- `saving.output_dir`
- `saving.output_columns.*` (renamable JSONL columns)
- `saving.hf_push.*`
Enable one-command distributed launch from the same CLI entrypoint:
run:
distributed:
enabled: true
backend: ray
spawn:
coordinator: true
workers: true
ray:
address: "auto"
auto_start_local: true
namespace: "dlgforge"
store:
backend: postgres
postgres:
dsn: "${DLGFORGE_POSTGRES_DSN}"
llm:
mode: vllm_attach # api | vllm_attach | vllm_managed
routing:
strategy: weighted_least_inflight
endpoints:
- name: gpu-node-1
base_url: http://10.0.0.11:8000/v1
api_key: EMPTY
- name: gpu-node-2
base_url: http://10.0.0.12:8000/v1
        api_key: EMPTY
Behavior:
- `dlgforge run config.yaml` bootstraps coordinator + workers automatically when `run.distributed.enabled: true`
- Ray init tries `run.distributed.ray.address` first; when `run.distributed.ray.address: auto` has no running cluster and `run.distributed.ray.auto_start_local: true`, it falls back to a local Ray runtime
- `llm.mode: api` uses the hosted API (no vLLM provisioning)
- `llm.mode: vllm_attach` validates configured vLLM endpoints before the run
- `llm.mode: vllm_managed` starts/stops vLLM servers on Ray GPU actors
- the current execution path runs generation from the coordinator actor while worker replicas are provisioned for lifecycle orchestration hooks
Bootstrap sequence:
flowchart TD
A["dlgforge run config.yaml"] --> B["RunBootstrap"]
B --> C["Initialize Ray"]
C --> D["Validate Postgres DSN and ping"]
D --> E{"llm.mode"}
E -->|api| F["No vLLM provisioning"]
E -->|vllm_attach| G["Validate configured /v1/models endpoints"]
E -->|vllm_managed| H["Spawn vLLM server actors and wait healthy"]
F --> I["Spawn coordinator actor"]
G --> I
H --> I
B --> J["Spawn worker actors"]
I --> K["Coordinator executes generation run"]
Current dispatch/execution model:
flowchart LR
U["User: dlgforge run"] --> B["RunBootstrap"]
B --> C["Coordinator actor"]
B --> W["Worker actors (provisioned)"]
C --> P["Existing generation loop (turn logic)"]
P --> R["Endpoint routing"]
R --> O["OpenAI API"]
R --> VA["Attached vLLM endpoints"]
R --> VM["Managed vLLM endpoints"]
Mode matrix:
flowchart TD
S{"run.distributed.enabled"}
S -->|false| L["Local mode: api, no Ray/Postgres requirement"]
S -->|true| D["Distributed mode: Ray plus Postgres required"]
D --> B{"llm.mode"}
B -->|api| BO["Hosted API path"]
B -->|vllm_attach| BA["Use user-provided vLLM endpoints"]
B -->|vllm_managed| BM["Auto-start vLLM on Ray GPU workers (vllm extra required)"]
Useful HF export options:
- `saving.hf_push.source_file`: use `conversations_sharegpt_judged.jsonl` to include the judge column.
- `saving.hf_push.generate_stats`: writes dataset stats JSON during export.
- `saving.hf_push.stats_file`: stats JSON filename (default `dataset_stats.json`).
- `saving.hf_push.generate_plots`: writes SVG distribution plots during export.
- `saving.hf_push.plots_dir`: plot output folder inside the export dir (example below).
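For example (illustrative values, using only the keys documented above):
saving:
  hf_push:
    source_file: conversations_sharegpt_judged.jsonl   # include the judge column
    generate_stats: true
    stats_file: dataset_stats.json
    generate_plots: true
    plots_dir: plots                                   # illustrative folder name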
Two orthogonal controls:
- when to run the judge
  - `judge.mode: online` -> judge integrated in `dlgforge run`
  - `judge.mode: offline` -> no judge during generation
- how often to run the judge
  - `judge.granularity: turn` -> judge every turn
  - `judge.granularity: conversation` -> judge once per conversation
judge:
enabled: true
mode: online
granularity: conversation
reasons:
- irrelevant
- incorrect
- hallucinated
- weak_grounding
- vague
- incomplete
- unsafe
    - other
`turn` granularity:
- pros: fine-grained labels and diagnostics
- cons: the most judge-token-expensive option
`conversation` granularity:
- pros: 1 judge call per conversation, cheaper
- cons: less localized feedback per turn
For `judge.mode: online` + `granularity: turn`:
- run `dlgforge run config.yaml`
- check `logs/judge.log` for `[judge-online] ...`
- check conversation turns contain `qa_judge`
- check `outputs/conversations_sharegpt_judged.jsonl` grows
For `judge.mode: online` + `granularity: conversation`:
- run `dlgforge run config.yaml`
- check `logs/judge.log` for `[judge-online-conversation] ...`
- check the conversation payload has `conversation_judge`
- check the judged export's `judge.conversation` is populated
- conversations advance independently in slots
- slot ordering is deterministic for acceptance/commit
Question dedup uses normalized exact match:
- lowercase
- collapsed whitespace
Applied across:
- duplicates within the same batch attempt
- duplicates already accepted in prior attempts/batches
- rejected duplicate slots are regenerated only for missing slots
- retries capped by `coverage.question_dedup_retries` (see the example below)
- on exhaustion: slot marked dropped (`drop_reason=dedup_exhausted`)
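The retry budget lives under the coverage block, for example (value illustrative):
coverage:
  question_dedup_retries: 3   # regeneration attempts before a duplicate slot is dropped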
Personas are sampled per conversation (not per run):
- each conversation gets one user persona and one assistant persona
- sampling is uniform-cycle based across available personas
- this avoids overusing a small subset when generating many samples
In batched mode:
- each slot has its own persona assignment
- assignment is persisted in run_state and reused on resume
Main files under outputs/:
- `synthetic_qa.jsonl`: one record per conversation
- `coverage_ledger.jsonl`: dedup/coverage memory
- `turns.jsonl`: flattened per-turn rows
- `conversations_sharegpt.jsonl`: ShareGPT export
- `conversations_sharegpt_judged.jsonl`: judged ShareGPT export
- `conversations/<conversation_id>.json`: rich conversation artifact
- `run_state/<run_id>.json`: checkpoint state
Judge fields:
- per-turn mode: `turns[].qa_judge`
- conversation mode: top-level `conversation_judge`
- judged ShareGPT column (`judge`) includes: `per_turn`, `avg_score`, `conversation`
Useful commands:
# runtime
tail -f logs/run.log
# judge logs
tail -f logs/judge.log
# latest conversation files
ls -lt outputs/conversations | head
# inspect conversation-level judge
rg "conversation_judge" outputs/conversations/*.json
# inspect turn-level judge fields
rg "judge_score|judge_reasons|conversation_judge_score" outputs/turns.jsonlResume from checkpoint:
run:
resume_run_id: "<existing_run_id>"Batched resume rules:
- `run.batch_size` must match the saved run_state batch size
- slot states (`active`/`completed`/`dropped`) are restored
- per-slot persona inputs are restored
- dedup memory is restored from ledger and existing accepted turns
Run generation:
uv run env PYTHONPATH=src python -m dlgforge run config.yaml
If the package is installed in the active environment:
dlgforge run config.yaml
Judge-only pass on existing conversations:
dlgforge judge config.yaml
Push/export:
dlgforge push config.yaml
dlgforge push config.yaml --no-push
Seed migration:
dlgforge seeds-migrate config.yaml --source-file seed_topics.json --dest-file data/seeds/topics.yaml --overwrite
If the online judge does not run, check:
- `judge.enabled: true`
- `judge.mode: online`
- `llm.agents.judge.model` resolves
- API key present
If conversation-level judge output is missing, check:
- `judge.granularity: conversation`
- `logs/judge.log` has `[judge-online-conversation]`
- conversation files contain `conversation_judge`
If turn-level judge output is missing, check:
- `judge.granularity: turn`
- `logs/judge.log` has `[judge-online]`
- turn payloads contain `qa_judge`
If personas do not seem to be applied, check:
- persona file path exists and is readable
- list has enough personas
- `personas.enabled: true`
If generation is slow or slots are being dropped, check:
- dedup pressure and retry budget
- model latency in `logs/llm.log`
- dropped slots in run_state (`drop_reason`)
If retrieval errors appear after model/backend changes:
- set `tools.retrieval.index.rebuild: true` for one run, then back to `false`
- or remove `knowledge_index/` and regenerate
If preflight fails with "Missing knowledge directory" or "No supported knowledge files found":
- create `knowledge/` at the repository root
- add at least one `.txt`, `.md`, or `.pdf` file under `knowledge/`
- rerun generation (optionally set `tools.retrieval.index.rebuild: true` for one run after major document changes)
If you see model download/auth errors:
- keep the default `tools.retrieval.embeddings.model: sentence-transformers/all-MiniLM-L6-v2` (no HF token required in standard setups)
- if you switch to a gated/private HF embedding model, set `HF_TOKEN` (or `HUGGINGFACE_HUB_TOKEN`) in `.env`
If you need a strict production profile, keep these defaults:
- `judge.mode: online`
- `judge.granularity: conversation` (budget-friendly)
- `batch_size` tuned to provider throughput
- `question_dedup_retries` >= 3 (see the sketch below)
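Expressed as config, the profile looks roughly like this (batch size and retry budget are illustrative; tune them to your provider):
judge:
  enabled: true
  mode: online
  granularity: conversation
run:
  batch_size: 8               # illustrative; tune to provider throughput
coverage:
  question_dedup_retries: 3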

