ForgeRAG is configured via a single YAML file. YAML is the single source of truth — there is no runtime config editing via UI. Edit the file and restart to apply.
Config Resolution Order
1. --config <path> CLI argument
2. $FORGERAG_CONFIG environment variable
3. ./forgerag.yaml in the working directory
4. Auto-generated skeleton (written on first boot if no yaml exists; needs at minimum your LLM + embedding provider credentials filled in before queries will succeed)
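The lookup order above can be sketched as a small function. This is an illustration only, not ForgeRAG's actual loader: the function name `resolve_config_path` and its parameters are hypothetical, and returning `None` stands in for "write the auto-generated skeleton".

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_path: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
) -> Optional[Path]:
    """Return the first config path found, or None if a skeleton is needed.

    Hypothetical helper that mirrors the documented resolution order.
    """
    # 1. An explicit --config argument wins.
    if cli_path:
        return Path(cli_path)
    # 2. Then the FORGERAG_CONFIG environment variable.
    env_path = env.get("FORGERAG_CONFIG")
    if env_path:
        return Path(env_path)
    # 3. Then ./forgerag.yaml in the working directory, if it exists.
    local = cwd / "forgerag.yaml"
    if local.is_file():
        return local
    # 4. Nothing found: the caller would generate the skeleton file.
    return None
```

Passing the environment and working directory in as arguments (rather than reading `os.environ` and `Path.cwd()` directly) keeps the sketch easy to test.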
Per-request overrides (not the same as config editing)
A subset of retrieval knobs can be overridden per query via QueryOverrides in the /api/v1/query request body — handy for A/B testing and debug without mutating the global config. See api-reference.md for the field list. These overrides never mutate YAML or the database; they apply only to the single request.
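A minimal sketch of building such a request body follows. The `overrides` key and the `rerank` / `kg_path` flags follow the example given elsewhere in this document; the `query` field name and the helper `build_query_body` are assumptions for illustration, so check api-reference.md for the real field list.

```python
import json
from typing import Any, Dict


def build_query_body(query: str, **overrides: Any) -> str:
    """Serialize a request body for /api/v1/query with per-request overrides."""
    # NOTE: the "query" field name is an assumption, not taken from the API reference.
    body: Dict[str, Any] = {"query": query}
    if overrides:
        # Overrides apply to this single request only; the YAML config is untouched.
        body["overrides"] = overrides
    return json.dumps(body)


# Disable reranking and the KG path for one debugging request.
payload = build_query_body("What changed in v0.2.0?", rerank=False, kg_path=False)
```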
DB as a one-way backup mirror
On startup, ForgeRAG writes the resolved cfg into the settings table as a read-only snapshot. GET /api/v1/settings returns this snapshot for admin tooling. The runtime never reads back — components always consult the in-memory cfg loaded from YAML. Any drift between DB and YAML is resolved in YAML's favour on the next boot. (A legacy llm_providers table also exists for migration compatibility but is unused since v0.2.0 dropped the provider_id indirection.)
Changing configuration — the only way
Edit forgerag.yaml (or myconfig.yaml via --config) and restart the backend. This applies to every setting: infrastructure (persistence/storage/graph backends), LLM providers, retrieval parameters, prompts, everything.
Sections
parser
Controls document parsing, chunking, and tree building.
parser.backend
Single explicit choice — no fallback chain. Pick one of:
| Value | Description |
| --- | --- |
| pymupdf (default) | Fast, no extra dependencies. |
| mineru | Layout-aware (tables / formulas / multi-column). Pulls GBs of model weights on first run. |
| mineru-vlm | Vision-language MinerU. Best for scanned / handwritten / very complex layouts. Heaviest. |
parser.mineru
Sub-config for MinerU; only used when parser.backend is mineru or mineru-vlm. The pipeline auto-derives mineru.backend from the top-level choice.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| device | string | "cuda" | Compute device: cuda or cpu |
| lang | string | "ch" | Primary OCR language |
| formula_enable | bool | true | Enable formula detection |
| table_enable | bool | true | Enable table detection |
| parse_method | string | "auto" | Parse method: auto, txt, ocr |
| server_url | string | null | Remote VLM server URL (leave blank for local inference, only meaningful with mineru-vlm) |
parser.chunker
Controls how blocks are packed into chunks.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| target_tokens | int | 600 | Target token count per chunk (greedy packing) |
| max_tokens | int | 1000 | Hard ceiling; chunks exceeding this are split |
| min_tokens | int | 100 | Trailing chunks below this merge into previous |
| tokenizer | string | "char_approx" | Token counting method (CJK-aware character approximation) |
| isolate_tables | bool | true | Tables become single-block chunks |
| isolate_figures | bool | true | Figures become single-block chunks |
| isolate_formulas | bool | false | Formulas become single-block chunks |
| overlap_blocks | int | 0 | Number of blocks to overlap between adjacent chunks |
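To make the interaction of target_tokens, max_tokens, and min_tokens concrete, here is a rough sketch of greedy packing over per-block token counts. It is not ForgeRAG's chunker: the function `pack_blocks` is hypothetical, it ignores the isolate_* and overlap_blocks settings, and it does not split a single block that is already larger than max_tokens.

```python
from typing import List


def pack_blocks(
    block_tokens: List[int],
    target_tokens: int = 600,
    max_tokens: int = 1000,
    min_tokens: int = 100,
) -> List[List[int]]:
    """Greedily pack block token counts into chunks (illustrative only)."""
    chunks: List[List[int]] = []
    current: List[int] = []
    for tokens in block_tokens:
        # Close the current chunk if this block would push it past the ceiling.
        if current and sum(current) + tokens > max_tokens:
            chunks.append(current)
            current = []
        current.append(tokens)
        # Close the chunk as soon as it reaches the target size.
        if sum(current) >= target_tokens:
            chunks.append(current)
            current = []
    if current:
        # A short trailing chunk is merged into the previous chunk.
        if chunks and sum(current) < min_tokens:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks
```

For example, `pack_blocks([600, 50])` yields one chunk, because the 50-token tail is below min_tokens and is folded into its predecessor.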
parser.tree_builder
Controls how document hierarchy is built. When llm_enabled is true, an LLM groups pages into logical sections and generates titles and per-node summaries in a single call. TOC and heading signals are passed as hints, but the LLM makes all structural decisions. When llm_enabled is false, a flat fallback is used and tree navigation is not available during retrieval.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| llm_enabled | bool | false | Use LLM to build document tree with summaries (page-group strategy) |
| llm_model | string | null | Model for tree building (defaults to generator model) |
| page_group_size | int | 5 | Pages per group before LLM merge |
| max_tokens_per_node | int | 8000 | Subdivide leaf nodes exceeding this token count |
| group_llm_max_chars | int | 40000 | Max chars per LLM batch call |
| min_coverage | float | 0.80 | Minimum page coverage for quality scoring |
| min_nodes | int | 3 | Minimum node count for non-trivial tree |
| max_reasonable_depth | int | 6 | Maximum tree depth |
| target_leaf_pages | float | 7.0 | Target pages per leaf node |
| summary_max_workers | int | 4 | Parallel workers for batch summary generation |
parser.normalizer
Post-processing rules applied after parsing.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| strip_header_footer | bool | true | Remove repeated page headers/footers |
| merge_cross_page_paragraphs | bool | true | Merge paragraphs split across page breaks |
| bind_captions | bool | true | Associate captions with their figures/tables |
| resolve_references | bool | true | Resolve "see Figure N" / "see Table N" cross-references |
parser (top-level)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| ingest_max_workers | int | 10 | Maximum concurrent document ingestion workers |
storage
Blob storage for uploaded files and generated assets (converted PDFs, figure images).
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "local" | Storage backend: local, s3, oss |
storage.local
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| root | string | "./storage/blobs" | Directory for blob storage |
storage.s3
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bucket | string | — | S3 bucket name |
| prefix | string | "" | Key prefix |
| region | string | null | AWS region |
| endpoint_url | string | null | Custom endpoint (for MinIO, etc.) |
| access_key_env | string | "AWS_ACCESS_KEY_ID" | Env var for access key |
| secret_key_env | string | "AWS_SECRET_ACCESS_KEY" | Env var for secret key |
storage.oss
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bucket | string | — | OSS bucket name |
| endpoint | string | — | OSS endpoint |
| prefix | string | "" | Key prefix |
| access_key_env | string | "OSS_ACCESS_KEY_ID" | Env var for access key |
| secret_key_env | string | "OSS_ACCESS_KEY_SECRET" | Env var for secret key |
files
File upload constraints.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| max_bytes | int | 209715200 | Maximum upload size (200 MiB) |
persistence
Database backends for relational data and vector embeddings.
Always-on subsystems. query_understanding, rerank, kg_extraction, and kg_path no longer have enabled toggles in v0.2.0; they always run when the relevant infrastructure is configured (e.g. kg_* runs whenever a graph store is set). To opt out per query, pass QueryOverrides in the API request body (e.g. {"overrides": {"rerank": false, "kg_path": false}}). To opt out of the whole KG layer, omit the graph: config block.
retrieval.query_understanding
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "openai/gpt-4o-mini" | LLM model |
| api_key / api_key_env / api_base | string | null | Inline credentials (skip to inherit answer-LLM creds) |
| max_expansions | int | 3 | Maximum query expansions |
| timeout | float | 10.0 | Timeout in seconds |
| system_prompt | string | null | Custom system prompt |
| user_prompt_template | string | null | Custom user prompt template |
retrieval.bm25
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable BM25 path |
| k1 | float | 1.5 | Term frequency saturation |
| b | float | 0.75 | Document length normalization |
| top_k | int | 30 | Number of results |
| doc_prefilter_top_k | int | 10 | Document pre-filter for tree path |
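For readers unfamiliar with k1 and b, the sketch below shows where they enter the standard Okapi BM25 score for a single term. It uses a common smoothed-IDF variant; whether ForgeRAG's BM25 path uses exactly this formulation is an assumption, and the function name `bm25_term_score` is hypothetical.

```python
import math


def bm25_term_score(
    term_freq: int,
    doc_len: int,
    avg_doc_len: float,
    n_docs: int,
    doc_freq: int,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Okapi BM25 contribution of one query term to one document."""
    # Smoothed IDF (the "+1" inside the log keeps the value non-negative).
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly documents longer than average are penalised.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score.
    return idf * term_freq * (k1 + 1.0) / (term_freq + k1 * length_norm)
```

Raising k1 lets repeated terms keep contributing for longer before the score saturates; setting b to 0 removes length normalization entirely.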
retrieval.vector
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable vector similarity path |
| top_k | int | 30 | Number of results |
| default_filter | dict | null | Metadata filter (e.g., {"content_type": "text"}) |
retrieval.tree_path
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable tree navigation path |
| llm_nav_enabled | bool | true | Use LLM for tree navigation |
| top_k | int | 30 | Number of chunks from tree path |
retrieval.tree_path.tree_nav
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "openai/gpt-4o-mini" | LLM model for tree navigation |
| temperature | float | 0.0 | LLM temperature |
| max_tokens | int | 1024 | LLM max tokens |
| timeout | float | 30.0 | Timeout in seconds |
| max_nodes | int | 8 | Maximum nodes the LLM can select |
| max_workers | int | 5 | Parallel LLM calls |
| target_chunks | int | 30 | Early-stop threshold |
retrieval.kg_extraction
Runs whenever a graph store is configured. To skip the KG layer entirely, omit the graph: block in yaml. Relation descriptions and entity names are always embedded — both downstream paths (EntityDisambiguation, KGPath.relation_weight semantic search) silently degrade without them, so the toggles were dropped.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "openai/gpt-4o-mini" | LLM model |
| api_key / api_key_env / api_base | string | null | Inline credentials |
| max_workers | int | 5 | Parallel extraction workers |
| timeout | float | 120.0 | Timeout per chunk |
| merge_description_threshold | int | 6 | Fragment count that triggers LLM description consolidation |
| merge_description_max_chars | int | 2000 | Char length that triggers LLM description consolidation |
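The two merge_description_* settings can be read as a pair of triggers for consolidation. The sketch below assumes that either trigger alone is sufficient, that the comparison is inclusive, and that the character length is the combined length of all fragments; none of these details are stated in this document, and the helper `needs_consolidation` is hypothetical.

```python
from typing import List


def needs_consolidation(
    fragments: List[str],
    merge_description_threshold: int = 6,
    merge_description_max_chars: int = 2000,
) -> bool:
    """Decide whether accumulated description fragments should be merged by an LLM."""
    # Assumption: reaching the fragment count triggers consolidation.
    too_many = len(fragments) >= merge_description_threshold
    # Assumption: the character budget applies to the fragments combined.
    too_long = sum(len(f) for f in fragments) >= merge_description_max_chars
    return too_many or too_long
```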
retrieval.kg_path
Participates in retrieval whenever a graph store is configured. Per-query opt-out: QueryOverrides.kg_path = false.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "openai/gpt-4o-mini" | LLM model for entity extraction |
| api_key / api_key_env / api_base | string | null | Inline credentials |
| top_k | int | 30 | Number of chunks from KG path |
| max_hops | int | 1 | Hop depth in graph traversal (1 is the safe default; 2-hop on hub entities can explode) |
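The warning about 2-hop traversal on hub entities is easy to see on a toy graph. The following is a generic hop-limited breadth-first search, not ForgeRAG's graph-store query; the function `neighbours_within` and the example graph are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def neighbours_within(
    graph: Dict[str, List[str]],
    seeds: List[str],
    max_hops: int = 1,
) -> Set[str]:
    """Collect every entity reachable from the seeds in at most max_hops edges."""
    seen: Set[str] = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


# Toy graph: "a" links to "hub", and "hub" links to 100 other entities.
graph = {"a": ["hub"], "hub": [f"e{i}" for i in range(100)]}
one_hop = neighbours_within(graph, ["a"], max_hops=1)  # {"a", "hub"}
two_hop = neighbours_within(graph, ["a"], max_hops=2)  # 102 entities
```

One extra hop through a single well-connected entity grows the candidate set from 2 to 102 entities here, which is why 1 is described as the safe default.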
Backend: passthrough (no-op) / rerank_api (dedicated cross-encoder via litellm.rerank()) / llm_as_reranker (chat LLM as judge)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| on_failure | string | "strict" | "strict" raises RerankerError on failure; "passthrough" falls back to RRF order silently |
| model | string | "openai/gpt-4o-mini" | For llm_as_reranker use a chat model. For rerank_api use a litellm-rerank-compatible prefix: infinity/, cohere/, jina_ai/, voyage/, together_ai/. SiliconFlow's BGE rerank works as infinity/BAAI/bge-reranker-v2-m3 + api_base=https://api.siliconflow.cn/v1. |
| api_key / api_key_env / api_base | string | null | Inline credentials |
| top_k | int | 10 | Results after reranking |
| timeout | float | 30.0 | Timeout in seconds |
| snippet_chars | int | 500 | Per-candidate snippet budget (only for llm_as_reranker) |
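The difference between the two on_failure modes can be sketched as a thin wrapper around any reranker callable. This is an illustration under assumptions: the wrapper `rerank_with_policy` is hypothetical, the local `RerankerError` class is a stand-in for the error named above, and the incoming candidate order is taken to be the fused (RRF) order.

```python
from typing import Callable, List, Sequence


class RerankerError(RuntimeError):
    """Stand-in for the error named in the on_failure description."""


def rerank_with_policy(
    candidates: Sequence[str],
    reranker: Callable[[Sequence[str]], List[str]],
    on_failure: str = "strict",
    top_k: int = 10,
) -> List[str]:
    """Apply a reranker, honouring the on_failure policy (illustrative only)."""
    try:
        ranked = reranker(candidates)
    except Exception as exc:
        if on_failure == "passthrough":
            # Keep the incoming order unchanged and carry on silently.
            ranked = list(candidates)
        else:
            # "strict": surface the failure to the caller.
            raise RerankerError(str(exc)) from exc
    return ranked[:top_k]
```

With "strict", a flaky reranker fails the query loudly; with "passthrough", the query still returns results but their order may be worse without any visible signal.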
Production: Restrict allow_origins to your domain.
cache
Cache paths for BM25 index and embedding cache.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bm25_persistence | string | "./storage/bm25_index.pkl" | BM25 index file path |
| embedding_cache | string | "" | Embedding cache path (empty = disabled) |
Environment Variables
Credentials should never be stored in forgerag.yaml. Use environment variables instead:
| Variable | Used by | Description |
| --- | --- | --- |
| OPENAI_API_KEY | Embedder, Generator, Tree Nav | OpenAI API key |
| FORGERAG_CONFIG | main.py | Config file path |
| FORGERAG_HOST | main.py | Server bind address |
| FORGERAG_PORT | main.py | Server bind port |
| POSTGRES_PASSWORD | Relational store | PostgreSQL password |
| MYSQL_PASSWORD | Relational store | MySQL password |
| NEO4J_PASSWORD | Graph store | Neo4j password |
| AWS_ACCESS_KEY_ID | S3 blob store | AWS access key |
| AWS_SECRET_ACCESS_KEY | S3 blob store | AWS secret key |
| OSS_ACCESS_KEY_ID | OSS blob store | Alibaba OSS access key |
| OSS_ACCESS_KEY_SECRET | OSS blob store | Alibaba OSS secret key |
The api_key_env / password_env pattern throughout the config refers to environment variable names, not literal values. For example, api_key_env: OPENAI_API_KEY tells ForgeRAG to read the key from $OPENAI_API_KEY.
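A minimal sketch of this indirection follows, assuming a config section has already been loaded into a dict. The helper `resolve_secret` is hypothetical, and giving an inline value precedence over the `*_env` name is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_secret(
    section: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
    key: str = "api_key",
) -> Optional[str]:
    """Resolve a credential from an inline value or a *_env indirection."""
    # Assumption: an inline value, if present, takes precedence.
    inline = section.get(key)
    if inline:
        return inline
    env_name = section.get(f"{key}_env")
    if env_name:
        # The config value is the NAME of the variable, not the secret itself.
        return env.get(env_name)
    return None
```

For example, `resolve_secret({"api_key_env": "OPENAI_API_KEY"})` looks up `$OPENAI_API_KEY`, and passing `key="password"` handles the `password_env` form the same way.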
Example: Minimal Config
YAML is the single source of truth: model, api_key, and api_base are inlined directly under each subsystem (no provider_id indirection). The retrieval subsystems inherit from answering.generator if you don't override them.
Run python scripts/setup.py to generate this interactively — the wizard also walks through every retrieval subsystem (query_understanding / rerank / kg_extraction / kg_path / tree_path.nav) so you can override the model per-subsystem (e.g. cheap-and-fast for kg_extraction, strong for tree_path.nav). Subsystems left without overrides reuse the answer-LLM credentials.