Skip to content

Feat/add litellm provider#260

Open
RheagalFire wants to merge 3 commits into
hitsz-ids:mainfrom
RheagalFire:feat/add-litellm-provider
Open

Feat/add litellm provider#260
RheagalFire wants to merge 3 commits into
hitsz-ids:mainfrom
RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire

@RheagalFire RheagalFire commented May 19, 2026

Copy link
Copy Markdown

Description

Adds SingleTableLiteLLMModel as a new LLM-based synthetic data generation model alongside SingleTableGPTModel. Uses LiteLLM to route to 100+ LLM providers through a single class.

Files changed:

  • sdgx/models/LLM/single_table/litellm.py -- New SingleTableLiteLLMModel(LLMBaseModel) following the exact same pattern as gpt.py: ask_llm(), fit() (raw data + metadata), sample() with batch querying, same prompt templates. Uses litellm.completion() with drop_params=True for cross-provider kwarg compatibility.
  • pyproject.toml -- Added litellm>=1.80.0,<1.87 under [project.optional-dependencies].litellm.

Motivation and Context

The existing SingleTableGPTModel is hardcoded to OpenAI's API. Users who want to generate synthetic data using Anthropic, Google, Groq, local models via Ollama, or any other provider have no path today. Related: issue #259 requests adding MiniMax as a provider -- LiteLLM covers MiniMax and 100+ other providers in one integration.

LiteLLM provides a unified interface where users switch providers by changing the model string (e.g. anthropic/claude-sonnet-4-6, groq/llama-3.3-70b-versatile) without changing code. API keys are read from provider-specific environment variables automatically.

How has this been tested?

Testing environment: macOS (Apple Silicon), Python 3.12, litellm 1.85.0

Lint:

$ ruff check sdgx/models/LLM/single_table/litellm.py
All checks passed!
$ ruff format --check sdgx/models/LLM/single_table/litellm.py
1 file already formatted

Live e2e against Anthropic:

from sdgx.models.LLM.single_table.litellm import SingleTableLiteLLMModel
m = SingleTableLiteLLMModel()
m.model = 'anthropic/claude-sonnet-4-6'
m.use_raw_data = True
result = m.ask_llm('What is 2+2? Answer with just the number.')
# Response: 4

Existing test suite: 10 pre-existing collection errors on both main and this branch (unchanged). No regressions introduced.

Impact on other areas: Additive only. SingleTableGPTModel and LLMBaseModel are untouched. The new file is a sibling to gpt.py with no shared mutable state.

Types of changes

  • Maintenance (no change in code, maintain the project's CI, docs, etc.)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@RheagalFire

Copy link
Copy Markdown
Author

cc @MooooCat @Wh1isper

@RheagalFire

Copy link
Copy Markdown
Author

@Wh1isper do you have any update on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant