Description
The only ways to set the extraction LLM model are:
- Set EXTRACTION_MODEL env var before the config singleton lazily initializes
- Directly set GraphRAGConfig.extraction_llm (undocumented, reaches into internals)
For multi-tenant pipelines where each tenant uses a different model, neither approach is clean. Setting the env var as a side effect before calling extract_and_build() is fragile: it only works if the singleton has not been initialized yet. Mutating global state between tenants would also only work if _extraction_llm were reset once it has been set. It isn't, so switching models between tenants is actually broken.
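To illustrate the failure mode, here is a minimal sketch of a lazily initialized config that caches the env var on first access. `_LazyConfig` and `default-model` are hypothetical stand-ins written for this issue, not the actual GraphRAGConfig code. They only assume the behaviour described above: the value is read once and never reset.

```python
import os


class _LazyConfig:
    """Hypothetical stand-in for a lazily initialized config singleton."""

    def __init__(self):
        # Resolved on first access, then cached for the life of the process.
        self._extraction_llm = None

    @property
    def extraction_llm(self):
        if self._extraction_llm is None:
            # Assumed behaviour: the env var is only consulted once.
            self._extraction_llm = os.environ.get("EXTRACTION_MODEL", "default-model")
        return self._extraction_llm


config = _LazyConfig()

# Tenant A: the env var is set before the first access, so it is picked up.
os.environ["EXTRACTION_MODEL"] = "model-a"
first = config.extraction_llm

# Tenant B: the env var changes, but the cached value is returned unchanged.
os.environ["EXTRACTION_MODEL"] = "model-b"
second = config.extraction_llm

print(first, second)
```

In this sketch the second tenant silently runs with the first tenant's model, which is the breakage reported here.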
Suggestion:
Accept extraction_llm (model ID string or LLM instance) as a parameter on ExtractionConfig or IndexingConfig, and have it take precedence over the env var / default:
    extraction_config = ExtractionConfig(
        extraction_llm='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        preferred_entity_classifications=classifications,
    )
This keeps configuration explicit and co-located, avoids global state mutation, and supports multi-tenant use cases naturally.
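One possible shape for the precedence rule, as a rough sketch only. `ExtractionConfigSketch`, `resolve_extraction_llm`, and `DEFAULT_MODEL` are hypothetical names invented for illustration and do not exist in the library. The only assumption is the order proposed above: an explicit parameter first, then the env var, then the default.

```python
import os
from dataclasses import dataclass
from typing import Any, Optional

# Placeholder default, not the library's real default model ID.
DEFAULT_MODEL = "default-model"


@dataclass
class ExtractionConfigSketch:
    """Hypothetical config carrying a per-call model ID string or LLM instance."""

    extraction_llm: Optional[Any] = None


def resolve_extraction_llm(config: ExtractionConfigSketch) -> Any:
    """Return the explicit value if given, else the env var, else the default."""
    if config.extraction_llm is not None:
        return config.extraction_llm
    # Read the environment at call time instead of caching it globally.
    return os.environ.get("EXTRACTION_MODEL", DEFAULT_MODEL)


# Each tenant passes its own config, so no global state is mutated.
tenant_a = ExtractionConfigSketch(extraction_llm="model-a")
tenant_b = ExtractionConfigSketch(extraction_llm="model-b")
print(resolve_extraction_llm(tenant_a), resolve_extraction_llm(tenant_b))
```

Resolving per call rather than through the singleton is what lets two tenants with different models coexist in one process.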
Current workaround:
    os.environ['EXTRACTION_MODEL'] = tenant.model