[FEATURE] Allow passing extraction model via ExtractionConfig instead of env var #130

@voidwisp

Description

The only ways to set the extraction LLM model are:

  1. Set EXTRACTION_MODEL env var before the config singleton lazily initializes
  2. Directly set GraphRAGConfig.extraction_llm (undocumented, reaches into internals)

For multi-tenant pipelines where each tenant uses a different model, neither approach is clean. Setting env vars as a side effect before calling extract_and_build() is fragile: it relies on the singleton not having been initialized yet. Mutating global state between tenants also only works if _extraction_llm gets reset, and it doesn't, so switching models between tenants is actually broken.
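To illustrate the failure mode, here is a minimal sketch of a lazily initialized config that caches the env var on first access. `_LazyConfig`, the `'default-model'` fallback, and the tenant model names are hypothetical stand-ins, not the library's actual code; the sketch only assumes the caching behaviour described above.

```python
import os


class _LazyConfig:
    """Hypothetical stand-in for a lazily initialized config singleton."""

    def __init__(self):
        self._extraction_llm = None

    @property
    def extraction_llm(self):
        # The env var is read only on first access; the result is then cached.
        if self._extraction_llm is None:
            self._extraction_llm = os.environ.get('EXTRACTION_MODEL', 'default-model')
        return self._extraction_llm


config = _LazyConfig()

# Tenant A: the env var is set before first access, so it is picked up.
os.environ['EXTRACTION_MODEL'] = 'model-for-tenant-a'
first = config.extraction_llm

# Tenant B: the env var changes, but the cached value is never reset.
os.environ['EXTRACTION_MODEL'] = 'model-for-tenant-b'
second = config.extraction_llm

print(first, second)
```

Under these assumptions, both tenants end up with tenant A's model, because nothing clears the cached value between runs.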

Suggestion:

Accept extraction_llm (model ID string or LLM instance) as a parameter on ExtractionConfig or IndexingConfig, and have it take precedence over the env var / default:

extraction_config = ExtractionConfig(
    extraction_llm='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    preferred_entity_classifications=classifications,
)

This keeps configuration explicit and co-located, avoids global state mutation, and supports multi-tenant use cases naturally.
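A rough sketch of the precedence I have in mind (explicit config value, then env var, then default). `ExtractionConfigSketch`, `resolve_extraction_llm`, and `DEFAULT_EXTRACTION_MODEL` are hypothetical names used only for illustration; the real ExtractionConfig has other fields and the actual default may differ.

```python
import os
from dataclasses import dataclass
from typing import Any, Optional

# Placeholder default; the library's real default model is not assumed here.
DEFAULT_EXTRACTION_MODEL = 'default-model'


@dataclass
class ExtractionConfigSketch:
    """Hypothetical, reduced version of ExtractionConfig for illustration."""

    # Model ID string or LLM instance; None means "not set explicitly".
    extraction_llm: Optional[Any] = None


def resolve_extraction_llm(config: ExtractionConfigSketch) -> Any:
    """Return the explicit value if given, else the env var, else the default."""
    if config.extraction_llm is not None:
        return config.extraction_llm
    return os.environ.get('EXTRACTION_MODEL', DEFAULT_EXTRACTION_MODEL)


# Each tenant carries its own config, so no global state is touched.
tenant_a = ExtractionConfigSketch(extraction_llm='model-for-tenant-a')
tenant_b = ExtractionConfigSketch(extraction_llm='model-for-tenant-b')
llm_a = resolve_extraction_llm(tenant_a)
llm_b = resolve_extraction_llm(tenant_b)
```

Resolving per call rather than caching on the singleton is what would let two tenants use different models in the same process.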

Current workaround:

os.environ['EXTRACTION_MODEL'] = tenant.model
