
dlgforge

Lightweight synthetic multi-turn dialogue generation with a LiteLLM-routed LLM stack.


dlgforge generates grounded user-assistant conversations with:

  • async batched generation
  • deterministic dedup of generated user questions
  • optional online judging during generation
  • resumable run state
  • export-ready JSONL artifacts
  • optional one-command distributed bootstrap (Ray + Postgres + vLLM backends)

dlgforge architecture diagram


1) What this project does

The pipeline runs up to three logical stages per turn:

  1. qa_generator: produces the next user message.
  2. kb_responder: answers using KB retrieval (and optional web search tool).
  3. qa_judge: evaluates quality/grounding (configurable granularity).

It supports:

  • fixed turns (run.turns.mode: exact + run.turns.exact) or sampled turns per conversation (run.turns.mode: range + run.turns.min/max/distribution)
  • batched concurrent generation (run.batch_size)
  • language loops (run.target_languages) with total_samples generated per language
  • deterministic exact-normalized question dedup across the full run
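For orientation, here is a minimal Python sketch of how one conversation advances through the stages listed above. The helper names are hypothetical stand-ins, not the actual dlgforge API:

import asyncio

# Hypothetical stand-ins for the three stages; the real dlgforge
# internals are not exposed under these names.
async def qa_generator(history): return f"user question #{len(history) + 1}"
async def kb_responder(question, history): return f"grounded answer to: {question}"
async def qa_judge(question, answer): return {"score": 1.0, "reasons": []}

async def run_conversation(num_turns: int, judge_per_turn: bool):
    history = []
    for _ in range(num_turns):
        question = await qa_generator(history)           # stage 1: next user message
        answer = await kb_responder(question, history)   # stage 2: KB-grounded answer
        turn = {"user": question, "assistant": answer}
        if judge_per_turn:
            turn["qa_judge"] = await qa_judge(question, answer)  # stage 3 (optional)
        history.append(turn)
    return history

print(asyncio.run(run_conversation(num_turns=3, judge_per_turn=True)))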

2) Quick start (5 minutes)

2.1 Create environment and install

uv venv
source .venv/bin/activate
uv pip install -e .

If you want managed local/cluster vLLM autostart mode (llm.mode: vllm_managed, Linux GPU nodes):

python -m pip install -e ".[vllm]"

2.2 Configure env vars

cp .env.example .env

Minimum required:

  • model per active role:
    • llm.agents.user.model
    • llm.agents.assistant.model
    • llm.agents.judge.model (when judge.mode: online)
  • provider credential env vars (for example OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY)
    • provider secret names are flexible; common patterns like *_API_KEY and *_TOKEN are supported
  • required agent credential mapping vars:
    • LLM_USER_API_KEY_ENV
    • LLM_ASSISTANT_API_KEY_ENV
    • LLM_JUDGE_API_KEY_ENV
    • legacy aliases (LLM_QA_GENERATOR_API_KEY_ENV, LLM_KB_RESPONDER_API_KEY_ENV, LLM_QA_JUDGE_API_KEY_ENV) are still accepted with deprecation warnings
  • do not place api_key or api_key_env in YAML; credentials are environment-only
  • HF_TOKEN is optional for generation with the default embedding model (sentence-transformers/all-MiniLM-L6-v2); it is needed for gated/private HF models and dlgforge push
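The *_API_KEY_ENV variables are a level of indirection: each one names the environment variable that holds the actual secret. Conceptually, resolution looks like this (a sketch, not the exact dlgforge code):

import os

# LLM_USER_API_KEY_ENV names the variable that holds the secret.
os.environ["OPENAI_API_KEY"] = "sk-..."                 # provider secret
os.environ["LLM_USER_API_KEY_ENV"] = "OPENAI_API_KEY"   # mapping var

def resolve_api_key(role: str) -> str | None:
    pointer = os.getenv(f"LLM_{role.upper()}_API_KEY_ENV")
    return os.getenv(pointer) if pointer else None

print(resolve_api_key("user"))  # -> "sk-..."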

2.3 Prepare knowledge directory

dlgforge run requires a knowledge/ folder at the project root with at least one supported file (.txt, .md, or .pdf).

mkdir -p knowledge
# add your source docs under knowledge/, for example: knowledge/product_faq.md

By default, retrieval embeddings/chunks are cached under knowledge_index/ (tools.retrieval.index.persist_dir in config.yaml).

2.4 Run generation

uv run env PYTHONPATH=src python -m dlgforge run config.yaml

If the package is installed in the environment (uv pip install -e .), these are equivalent:

uv run dlgforge run config.yaml
dlgforge run config.yaml

2.5 Verify first outputs

  • outputs/conversations/*.json
  • outputs/conversations_sharegpt.jsonl
  • outputs/turns.jsonl
  • outputs/run_state/*.json
  • logs/run.log, logs/llm.log, logs/judge.log
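To eyeball the first exported record from Python (field names depend on saving.output_columns, so treat them as defaults):

import json

# Print the first exported conversation, truncated for readability.
with open("outputs/conversations_sharegpt.jsonl", encoding="utf-8") as f:
    record = json.loads(next(f))
print(json.dumps(record, indent=2, ensure_ascii=False)[:2000])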

2.6 Recommended mode by setup

  • macOS laptop + LM Studio: run.distributed.enabled: false + llm.mode: api + llm.agents.*.base_url: http://127.0.0.1:1234/v1
  • macOS laptop + distributed orchestrator: run.distributed.enabled: true + llm.mode: vllm_attach + Postgres DSN
  • Linux GPU nodes and self-managed cluster: run.distributed.enabled: true + llm.mode: vllm_managed

2.7 Run modes (copy-paste)

A) LM Studio local, non-distributed (recommended on macOS)

run:
  distributed:
    enabled: false
llm:
  mode: api
  agents:
    user:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
    assistant:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1
    judge:
      model: openai/gpt-oss-20b
      base_url: http://127.0.0.1:1234/v1

Make sure to set the correct base URL for each agent via llm.agents.<role>.base_url (see https://docs.litellm.ai/docs/providers for base_url and api_key formats per provider). dlgforge automatically uses LiteLLM's OpenAI-compatible passthrough, so namespaced model IDs such as openai/gpt-oss-20b are forwarded unchanged.

dlgforge run config.yaml
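To smoke-test the LM Studio endpoint before a full run, a direct LiteLLM call works as a standalone check (not part of dlgforge; LM Studio ignores the API key, so any placeholder works):

import litellm

# One-off check that the local OpenAI-compatible server answers.
response = litellm.completion(
    model="openai/gpt-oss-20b",
    api_base="http://127.0.0.1:1234/v1",
    api_key="lm-studio",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)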

B) OpenAI, non-distributed (no Ray/Postgres required)

run:
  distributed:
    enabled: false
llm:
  mode: api

dlgforge run config.yaml

C) OpenAI, one-command distributed (Ray + Postgres)

run:
  distributed:
    enabled: true
    ray:
      address: auto
      auto_start_local: true
llm:
  mode: api

export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml

If no Ray cluster is running, run.distributed.ray.auto_start_local: true lets dlgforge start a local Ray runtime automatically.
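The fallback behaves roughly like this Python sequence (a sketch; dlgforge's actual bootstrap code may differ):

import ray

# Try to join an existing cluster first; if address="auto" finds none,
# Ray raises ConnectionError and we start a local runtime instead,
# mirroring run.distributed.ray.auto_start_local: true.
try:
    ray.init(address="auto", namespace="dlgforge")
except ConnectionError:
    ray.init(namespace="dlgforge")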

D) vLLM attach (you already started vLLM endpoints)

run:
  distributed:
    enabled: true
llm:
  mode: vllm_attach
  routing:
    endpoints:
      - name: gpu-node-1
        base_url: http://10.0.0.11:8000/v1
        api_key: EMPTY

export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml

E) vLLM managed autostart (dlgforge starts/stops vLLM)

run:
  distributed:
    enabled: true
llm:
  mode: vllm_managed
  vllm:
    model: Qwen/Qwen2.5-7B-Instruct
    served_model_name: qwen

python -m pip install -e ".[vllm]"
export DLGFORGE_POSTGRES_DSN='postgresql://USER:PASS@HOST:5432/DB'
dlgforge run config.yaml

Notes:

  • managed mode is Linux-oriented (vllm extra is Linux-only in this project).
  • on macOS, use LM Studio local mode (A) or distributed attach mode (D).
  • if using managed mode, align your llm.agents.*.model values with llm.vllm.served_model_name.

2.8 Mixed providers by agent

When mixing providers (for example OpenAI + Gemini + LM Studio), define provider/model/base URL per agent and follow LiteLLM provider conventions:

Recommended per-agent setup:

  • OpenAI API:
    • provider: openai
    • model: gpt-5.2 (or another OpenAI model)
    • base_url: https://api.openai.com/v1
  • Gemini (Google AI Studio):
    • provider: gemini
    • model: gemini/gemini-2.0-flash (or another Gemini model)
    • leave base_url empty unless you intentionally set a Gemini-specific proxy/base URL
  • LM Studio (OpenAI-compatible local server):
    • provider: openai (or lm_studio)
    • model: openai/gpt-oss-20b (or your served model name format)
    • base_url: http://localhost:1234/v1
  • vLLM (OpenAI-compatible endpoint):
    • provider: openai (or hosted_vllm)
    • base_url: http://localhost:8000/v1

Be careful with env/config precedence:

  • credentials are env-only via LLM_USER_API_KEY_ENV, LLM_ASSISTANT_API_KEY_ENV, LLM_JUDGE_API_KEY_ENV.
  • LLM_<ROLE>_* env vars override config.yaml values for that specific role.
  • for mixed providers, prefer setting provider/model/base_url explicitly per agent in llm.agents.<role>.
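A sketch of that precedence rule in Python (illustrative; the exact override variable spelling, e.g. LLM_ASSISTANT_BASE_URL, is an assumption rather than a confirmed dlgforge identifier):

import os

def effective_setting(role: str, key: str, config: dict):
    # A role-scoped LLM_<ROLE>_* env var beats the config.yaml value.
    override = os.getenv(f"LLM_{role.upper()}_{key.upper()}")
    if override is not None:
        return override
    return config["llm"]["agents"][role].get(key)

config = {"llm": {"agents": {"assistant": {"base_url": "http://localhost:1234/v1"}}}}
os.environ["LLM_ASSISTANT_BASE_URL"] = "http://10.0.0.11:8000/v1"
print(effective_setting("assistant", "base_url", config))  # env value wins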

2.9 Start Postgres quickly (required for distributed modes)

Distributed modes (C, D, E) require Postgres.

Start a local Postgres with Docker:

docker run -d \
  --name dlgforge-postgres \
  -e POSTGRES_USER=dlgforge \
  -e POSTGRES_PASSWORD=dlgforge \
  -e POSTGRES_DB=dlgforge \
  -p 5432:5432 \
  postgres:16

Set DSN:

export DLGFORGE_POSTGRES_DSN='postgresql://dlgforge:dlgforge@127.0.0.1:5432/dlgforge'

Health check:

docker exec dlgforge-postgres pg_isready -U dlgforge -d dlgforge
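Alternatively, ping the DSN from Python (psycopg2 is used here purely as an example client; it is not a dlgforge dependency):

import os
import psycopg2

# Connect once and run a trivial query to confirm the DSN is reachable.
dsn = os.environ["DLGFORGE_POSTGRES_DSN"]
with psycopg2.connect(dsn) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("postgres ok:", cur.fetchone())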

Reuse existing container:

docker start dlgforge-postgres

Stop when done:

docker stop dlgforge-postgres

3) How generation works

At runtime:

  1. Load config + env overrides.
  2. Build base inputs and runtime settings.
  3. For each target language:
    • run one or more waves until total_samples for that language is reached
    • each wave runs batch_size conversations (or remaining count)
  4. Persist artifacts and optional HF auto-push.

Important:

  • total_samples is per language, not global.
  • if target_languages has 5 values and total_samples=200, target is 1000 conversations total.

4) Configuration guide

4.1 run

Core run controls:

  • run.batch_size: number of conversations advanced concurrently.
  • run.total_samples: number of conversations to persist per language.
  • run.target_languages: list of languages.
  • run.run_id: optional explicit run id.
  • run.resume_run_id: resume checkpoint.

Turn count:

  • run.turns.mode: exact or range
  • run.turns.exact: used when mode: exact
  • run.turns.min, run.turns.max: used when mode: range
  • run.turns.distribution: uniform, poisson, or exponential
  • run.turns.mean: mean for poisson/exponential

Data shaping:

  • run.data.seeding.question
  • run.data.seeding.topics.path
  • run.data.seeding.topics.enabled
  • run.data.seeding.topics.variant
  • run.data.seeding.topics.probability

Behavior:

  • sampled turns are clamped to [run.turns.min, run.turns.max]
  • each conversation samples independently
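A Python sketch of the range-mode semantics described above (illustrative; the in-package sampler may differ, and the poisson branch below is only an approximation for the sketch):

import random

def sample_turns(mode: str, exact: int = 4, lo: int = 2, hi: int = 8,
                 distribution: str = "uniform", mean: float = 4.0) -> int:
    # mode: exact returns the fixed value; mode: range samples, then clamps.
    if mode == "exact":
        return exact
    if distribution == "uniform":
        n = random.randint(lo, hi)
    elif distribution == "poisson":
        # Normal approximation used here to keep the sketch stdlib-only.
        n = round(random.gauss(mean, mean ** 0.5))
    else:  # exponential
        n = round(random.expovariate(1.0 / mean))
    return max(lo, min(hi, n))  # clamped to [run.turns.min, run.turns.max]

print([sample_turns("range", distribution="exponential") for _ in range(5)])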

4.2 llm

LiteLLM-routed settings:

  • llm.mode: api, vllm_attach, or vllm_managed
  • per-agent settings under llm.agents.<role>:
    • provider, model, base_url
    • optional sampling params (temperature, max_tokens, top_p, timeout, max_retries, extra)
  • credentials are environment-only via:
    • LLM_USER_API_KEY_ENV
    • LLM_ASSISTANT_API_KEY_ENV
    • LLM_JUDGE_API_KEY_ENV
  • llm.routing.*: multi-endpoint routing (used by attach/managed vLLM modes)
  • llm.vllm.*: managed vLLM replica settings when llm.mode: vllm_managed

Agents:

  • user
  • assistant
  • judge

4.3 tools

Tool settings:

  • tools.web_search.enabled
  • tools.web_search.serper_num_results
  • tools.web_search.serper_timeout
  • tools.retrieval.top_k
  • tools.retrieval.chunking.chunk_size
  • tools.retrieval.chunking.chunk_overlap
  • tools.retrieval.index.persist_dir
  • tools.retrieval.index.rebuild
  • tools.retrieval.embeddings.*
  • tools.retrieval.reranker.*

4.4 coverage

Dedup and coverage behavior:

  • coverage.question_dedup_retries
  • coverage balancing parameters

4.5 personas

Persona controls:

  • personas.enabled
  • personas.path

Current recommended path:

personas:
  enabled: true
  path: src/dlgforge/prompts/personas.yaml

If path is missing/unreadable, built-in fallback personas are used.

4.6 judge

Judge controls:

  • judge.enabled
  • judge.mode: online or offline
  • judge.granularity: turn or conversation
  • judge.reasons: allowed labels

4.7 saving

Output layout and export:

  • saving.output_dir
  • saving.output_columns.* (renamable JSONL columns)
  • saving.hf_push.*

4.8 Distributed one-command runtime

Enable one-command distributed launch from the same CLI entrypoint:

run:
  distributed:
    enabled: true
    backend: ray
    spawn:
      coordinator: true
      workers: true
    ray:
      address: "auto"
      auto_start_local: true
      namespace: "dlgforge"

store:
  backend: postgres
  postgres:
    dsn: "${DLGFORGE_POSTGRES_DSN}"

llm:
  mode: vllm_attach  # api | vllm_attach | vllm_managed
  routing:
    strategy: weighted_least_inflight
    endpoints:
      - name: gpu-node-1
        base_url: http://10.0.0.11:8000/v1
        api_key: EMPTY
      - name: gpu-node-2
        base_url: http://10.0.0.12:8000/v1
        api_key: EMPTY

Behavior:

  • dlgforge run config.yaml bootstraps coordinator + workers automatically when run.distributed.enabled: true
  • Ray init tries run.distributed.ray.address first; when run.distributed.ray.address: auto has no running cluster and run.distributed.ray.auto_start_local: true, it falls back to a local Ray runtime
  • llm.mode: api uses hosted API (no vLLM provisioning)
  • llm.mode: vllm_attach validates configured vLLM endpoints before run
  • llm.mode: vllm_managed starts/stops vLLM servers on Ray GPU actors
  • the current execution path runs generation from the coordinator actor; worker replicas are provisioned for lifecycle/orchestration hooks

Bootstrap sequence:

flowchart TD
  A["dlgforge run config.yaml"] --> B["RunBootstrap"]
  B --> C["Initialize Ray"]
  C --> D["Validate Postgres DSN and ping"]
  D --> E{"llm.mode"}
  E -->|api| F["No vLLM provisioning"]
  E -->|vllm_attach| G["Validate configured /v1/models endpoints"]
  E -->|vllm_managed| H["Spawn vLLM server actors and wait healthy"]
  F --> I["Spawn coordinator actor"]
  G --> I
  H --> I
  B --> J["Spawn worker actors"]
  I --> K["Coordinator executes generation run"]

Current dispatch/execution model:

flowchart LR
  U["User: dlgforge run"] --> B["RunBootstrap"]
  B --> C["Coordinator actor"]
  B --> W["Worker actors (provisioned)"]
  C --> P["Existing generation loop (turn logic)"]
  P --> R["Endpoint routing"]
  R --> O["OpenAI API"]
  R --> VA["Attached vLLM endpoints"]
  R --> VM["Managed vLLM endpoints"]

Mode matrix:

flowchart TD
  S{"run.distributed.enabled"}
  S -->|false| L["Local mode: api, no Ray/Postgres requirement"]
  S -->|true| D["Distributed mode: Ray plus Postgres required"]
  D --> B{"llm.mode"}
  B -->|api| BO["Hosted API path"]
  B -->|vllm_attach| BA["Use user-provided vLLM endpoints"]
  B -->|vllm_managed| BM["Auto-start vLLM on Ray GPU workers (vllm extra required)"]

Useful HF export options:

  • saving.hf_push.source_file: use conversations_sharegpt_judged.jsonl to include judge column.
  • saving.hf_push.generate_stats: writes dataset stats JSON during export.
  • saving.hf_push.stats_file: stats JSON filename (default dataset_stats.json).
  • saving.hf_push.generate_plots: writes SVG distribution plots during export.
  • saving.hf_push.plots_dir: plot output folder inside export dir.

5) Judge modes and budget control

Two orthogonal controls:

  1. when to run judge
  • judge.mode: online -> judge integrated in dlgforge run
  • judge.mode: offline -> no judge during generation
  2. how often to run judge
  • judge.granularity: turn -> judge every turn
  • judge.granularity: conversation -> judge once per conversation

5.1 Recommended config for lower budget

judge:
  enabled: true
  mode: online
  granularity: conversation
  reasons:
    - irrelevant
    - incorrect
    - hallucinated
    - weak_grounding
    - vague
    - incomplete
    - unsafe
    - other

5.2 Tradeoff summary

  • turn granularity:
    • pros: fine-grained labels and diagnostics
    • cons: most judge-token expensive
  • conversation granularity:
    • pros: 1 judge call per conversation, cheaper
    • cons: less localized feedback per turn

5.3 Verification checklist

For judge.mode: online + granularity: turn:

  1. run dlgforge run config.yaml
  2. check logs/judge.log for [judge-online] ...
  3. check conversation turns contain qa_judge
  4. check outputs/conversations_sharegpt_judged.jsonl grows

For judge.mode: online + granularity: conversation:

  1. run dlgforge run config.yaml
  2. check logs/judge.log for [judge-online-conversation] ...
  3. check conversation payload has conversation_judge
  4. check judged export judge.conversation is populated

6) Async batch + dedup semantics

6.1 Concurrency

  • conversations advance independently in slots
  • slot ordering is deterministic for acceptance/commit

6.2 Dedup scope

Question dedup uses normalized exact match:

  • lowercase
  • collapsed whitespace

Applied across:

  • duplicates within the same batch attempt
  • duplicates already accepted in prior attempts/batches
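A Python sketch of the normalized exact-match key (illustrative):

import re

def dedup_key(question: str) -> str:
    # Lowercase, then collapse runs of whitespace to single spaces.
    return re.sub(r"\s+", " ", question.lower()).strip()

seen = set()
for q in ["What is dlgforge?", "what   is  dlgforge?", "How do I install it?"]:
    key = dedup_key(q)
    print("duplicate" if key in seen else "accepted", repr(key))
    seen.add(key)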

6.3 Retry and drop policy

  • when a duplicate is rejected, only that slot is regenerated; accepted slots are untouched
  • retries capped by coverage.question_dedup_retries
  • on exhaustion: slot marked dropped (drop_reason=dedup_exhausted)

7) Persona sampling behavior

Personas are sampled per conversation (not per run):

  • each conversation gets one user persona and one assistant persona
  • sampling follows a uniform cycle across the available personas (see the sketch below)
  • this avoids overusing a small subset when generating many samples
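A Python sketch of uniform-cycle sampling (illustrative): deal every persona once per shuffled cycle before any repeats:

import random
from itertools import islice

def persona_cycle(personas):
    # Each cycle uses every persona exactly once, in random order,
    # so no persona is overused while others go unseen.
    while True:
        deck = personas[:]
        random.shuffle(deck)
        yield from deck

personas = ["curious_newcomer", "skeptical_expert", "busy_manager"]
print(list(islice(persona_cycle(personas), 7)))  # 7 conversations' worth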

In batched mode:

  • each slot has its own persona assignment
  • assignment is persisted in run_state and reused on resume

8) Outputs and inspection

Main files under outputs/:

  • synthetic_qa.jsonl: one record per conversation
  • coverage_ledger.jsonl: dedup/coverage memory
  • turns.jsonl: flattened per-turn rows
  • conversations_sharegpt.jsonl: ShareGPT export
  • conversations_sharegpt_judged.jsonl: judged ShareGPT export
  • conversations/<conversation_id>.json: rich conversation artifact
  • run_state/<run_id>.json: checkpoint state

Judge fields:

  • per-turn mode: turns[].qa_judge
  • conversation mode: top-level conversation_judge
  • judged ShareGPT column (judge) includes:
    • per_turn
    • avg_score
    • conversation

Useful commands:

# runtime
tail -f logs/run.log

# judge logs
tail -f logs/judge.log

# latest conversation files
ls -lt outputs/conversations | head

# inspect conversation-level judge
rg "conversation_judge" outputs/conversations/*.json

# inspect turn-level judge fields
rg "judge_score|judge_reasons|conversation_judge_score" outputs/turns.jsonl

9) Resume and run state

Resume from checkpoint:

run:
  resume_run_id: "<existing_run_id>"

Batched resume rules:

  • run.batch_size must match saved run_state batch size
  • slot states (active/completed/dropped) are restored
  • per-slot persona inputs are restored
  • dedup memory is restored from ledger and existing accepted turns

10) CLI commands

Run generation:

uv run env PYTHONPATH=src python -m dlgforge run config.yaml

If the package is installed in the active environment:

dlgforge run config.yaml

Judge-only pass on existing conversations:

dlgforge judge config.yaml

Push/export:

dlgforge push config.yaml
dlgforge push config.yaml --no-push

Seed migration:

dlgforge seeds-migrate config.yaml --source-file seed_topics.json --dest-file data/seeds/topics.yaml --overwrite

11) Troubleshooting playbook

11.1 Judge not running

Check:

  • judge.enabled: true
  • judge.mode: online
  • llm.agents.judge.model resolves
  • API key present

11.2 No judged output in conversation mode

Check:

  • judge.granularity: conversation
  • logs/judge.log has [judge-online-conversation]
  • conversation files contain conversation_judge

11.3 No judged output in turn mode

Check:

  • judge.granularity: turn
  • logs/judge.log has [judge-online]
  • turn payloads contain qa_judge

11.4 Same persona repeated too often

Check:

  • persona file path exists and is readable
  • list has enough personas
  • personas.enabled: true

11.5 Batch run appears stalled

Check:

  • dedup pressure and retry budget
  • model latency in logs/llm.log
  • dropped slots in run_state (drop_reason)

11.6 Embedding/index mismatch

If retrieval errors appear after model/backend changes:

  • set tools.retrieval.index.rebuild: true for one run, then back to false
  • or remove knowledge_index/ and regenerate

11.7 Missing knowledge/ or empty knowledge base

If preflight fails with Missing knowledge directory or No supported knowledge files found:

  • create knowledge/ at the repository root
  • add at least one .txt, .md, or .pdf file under knowledge/
  • rerun generation (optionally set tools.retrieval.index.rebuild: true for one run after major document changes)

11.8 Hugging Face auth error while loading embeddings

If you see model download/auth errors:

  • keep the default tools.retrieval.embeddings.model: sentence-transformers/all-MiniLM-L6-v2 (no HF token required in standard setups)
  • if you switch to a gated/private HF embedding model, set HF_TOKEN (or HUGGINGFACE_HUB_TOKEN) in .env

If you need a strict production profile, keep these defaults:

  • judge.mode: online
  • judge.granularity: conversation (budget-friendly)
  • batch_size tuned to provider throughput
  • question_dedup_retries >= 3
