This guide explains how to set up and add LLM providers in Gofannon. Provider configurations determine which models are available and how they interact with the system.
- Overview
- Why LiteLLM?
- Configuration Structure
- Provider Configuration Files
- Adding a New Provider
- Parameter Types and Features
- Examples
- LiteLLM Mapping Reference
Gofannon uses a centralized provider configuration system that abstracts LLM provider implementations through LiteLLM. All provider configurations are defined in:
webapp/packages/api/user-service/config/
├── provider_config.py # Main provider registry
├── openai/__init__.py # OpenAI models configuration
├── anthropic/__init__.py # Anthropic/Claude models configuration
├── gemini/__init__.py # Google Gemini models configuration
└── [provider]/__init__.py # Additional provider configurations
The LLM service that consumes these configurations is located at:
webapp/packages/api/user-service/services/llm_service.py
Gofannon supports two ways to configure API keys:
Environment variables are set by administrators and used as a fallback for all users:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
PERPLEXITYAI_API_KEY=pplx-...
Each user can also configure their own API keys through the Profile → API Keys page. User-specific keys take precedence over environment variables.
See API Key Management for detailed documentation.
When making an LLM API call, the key is resolved in this order of priority:
- User's stored API key (if configured in profile)
- Environment variable (system-wide fallback)
- No key available (provider unavailable)
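The lookup order above can be sketched as a small helper. This is a minimal illustration, not the code in llm_service.py: `resolve_api_key` and the `user_keys` mapping are hypothetical names, and only the `api_key_env_var` field comes from the provider configuration described in this guide.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider: str,
    user_keys: Mapping[str, str],
    provider_config: Mapping[str, dict],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the API key to use for `provider`, or None if no key is available."""
    # 1. A key stored in the user's profile takes precedence.
    user_key = user_keys.get(provider)
    if user_key:
        return user_key
    # 2. Otherwise fall back to the system-wide environment variable, if one is declared.
    env_var = provider_config.get(provider, {}).get("api_key_env_var")
    if env_var:
        return environ.get(env_var) or None
    # 3. No key available.
    return None
```

A provider that declares no `api_key_env_var` (for example a local deployment) would need separate handling; the sketch covers only the key lookup itself.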
Gofannon relies on LiteLLM to abstract multiple LLM providers and manage their dependencies. This architectural decision has important implications:
- Unified Interface: Single API interface for all providers (OpenAI, Anthropic, Google, etc.)
- Dependency Management: LiteLLM handles provider-specific SDKs and their dependencies
- Consistency: Standardized request/response formats across providers
- Reduced Maintenance: Updates to provider SDKs are managed by LiteLLM
Do not use provider-specific SDKs directly. While this keeps our codebase simpler, it creates a brief lag between when a provider releases a new feature and when we can use it (we must wait for LiteLLM to add support). However, we've decided this tradeoff is acceptable given the significant maintenance and consistency benefits.
When implementing provider features, always reference LiteLLM's documentation to understand:
- How provider-specific options map to LiteLLM parameters
- Which features are currently supported
- Provider-specific limitations or quirks
Each provider in provider_config.py follows this structure:
PROVIDER_CONFIG = {
"provider_name": {
"api_key_env_var": "PROVIDER_API_KEY", # Optional: environment variable for API key
"models": {
"model-name": {
"api_style": "responses", # Optional: "responses" for OpenAI's special APIs
"returns_thoughts": True, # Whether model returns reasoning/thoughts
"parameters": {
# Model-specific parameters (see below)
},
"built_in_tools": [
# Provider-specific built-in tools (see below)
]
}
}
}
}
- api_key_env_var: Environment variable name for the provider's API key
- models: Dictionary of model configurations keyed by model name
- api_style: Special handling for certain APIs (e.g., OpenAI's "responses" API for o1/reasoning models)
- returns_thoughts: Boolean indicating if the model returns reasoning traces or internal thoughts
- parameters: Model-specific parameters with validation rules
- built_in_tools: Provider-specific tools (web search, code execution, etc.)
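To make the field layout concrete, the following sketch reads the default value of every parameter for one model. The `default_parameters` helper is hypothetical and the configuration entry is abbreviated for illustration; only the field names come from the structure described above.

```python
from typing import Any, Dict

# Abbreviated entry in the shape described above (illustrative values only).
PROVIDER_CONFIG: Dict[str, Dict[str, Any]] = {
    "openai": {
        "api_key_env_var": "OPENAI_API_KEY",
        "models": {
            "gpt-5.2": {
                "api_style": "responses",
                "returns_thoughts": True,
                "parameters": {
                    "reasoning_effort": {
                        "type": "choice",
                        "default": "disable",
                        "choices": ["disable", "low", "medium", "high"],
                    },
                },
                "built_in_tools": [],
            }
        },
    }
}


def default_parameters(config: Dict[str, Dict[str, Any]], provider: str, model: str) -> Dict[str, Any]:
    """Collect each parameter's default value for one model (hypothetical helper)."""
    model_cfg = config[provider]["models"][model]
    return {
        name: spec.get("default")
        for name, spec in model_cfg.get("parameters", {}).items()
    }


print(default_parameters(PROVIDER_CONFIG, "openai", "gpt-5.2"))
# prints {'reasoning_effort': 'disable'}
```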
Location: webapp/packages/api/user-service/config/openai/__init__.py
Key features:
- API Style: OpenAI's newer models (o1, gpt-5 series) use the "responses" API style
- Reasoning Effort: GPT-5 and o-series models support the reasoning_effort parameter
- Built-in Tools: Many models have built-in web search capabilities
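How the API style interacts with tools and reasoning effort can be sketched as a single predicate. It mirrors the routing condition this guide quotes from llm_service.py:82-86; the function name `uses_responses_api` and its signature are assumptions made for illustration.

```python
from typing import Optional, Sequence


def uses_responses_api(
    api_style: Optional[str],
    tools: Optional[Sequence[dict]],
    reasoning_effort: str = "disable",
) -> bool:
    """Decide whether a call would go to litellm.aresponses() rather than litellm.acompletion()."""
    # The responses API is only chosen when the model declares api_style "responses"
    # AND the request actually needs it (built-in tools or a non-disabled reasoning effort).
    return api_style == "responses" and (bool(tools) or reasoning_effort != "disable")
```

Under this condition, a model configured with "api_style": "responses" still goes through the standard completion path when no tools are selected and reasoning effort is "disable".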
Example configuration:
"gpt-5.2": {
"api_style": "responses",
"returns_thoughts": True,
"parameters": {
"reasoning_effort": {
"type": "choice",
"default": "disable",
"choices": ["disable", "low", "medium", "high"],
"description": "Reasoning Effort: Effort level for reasoning during generation"
},
},
"built_in_tools": [
{
"id": "web_search",
"description": "Performs a web search.",
"tool_config": {"type": "web_search", "search_context_size": "medium"}
},
]
}
LiteLLM Mapping:
- The api_style: "responses" setting maps to LiteLLM's litellm.aresponses() function (see llm_service.py:87-127)
- Standard models use litellm.acompletion() (see llm_service.py:220-240)
- Model string format: "openai/model-name" (see llm_service.py:53)
Location: webapp/packages/api/user-service/config/anthropic/__init__.py
Key features:
- Mutually Exclusive Parameters: Claude 4.x models cannot have both temperature and top_p set simultaneously
- Max Tokens: Different models have different token limits
Example configuration:
"claude-opus-4-5-20251101": {
"returns_thoughts": False,
"parameters": {
"temperature": {
"type": "float",
"default": 1.0,
"min": 0.0,
"max": 1.0,
"description": "Randomness (0=focused, 1=creative)"
},
"top_p": {
"type": "float",
"default": 0.9,
"min": 0.0,
"max": 1.0,
"description": "Nucleus sampling (0.1=conservative, 0.95=diverse)",
"mutually_exclusive_with": ["temperature"]
},
"max_tokens": {
"type": "integer",
"default": 8192,
"min": 1,
"max": 16384,
"description": "Maximum tokens in response"
},
}
}
LiteLLM Mapping:
- Model string format: "anthropic/model-name" (see llm_service.py:53)
- Anthropic's block-based content format is handled in llm_service.py:250-261
- The mutually_exclusive_with constraint is enforced in the frontend, so only one of the two parameters is passed through to LiteLLM
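The block-based content handling mentioned above can be illustrated with a small standalone function. This is a sketch rather than the service's implementation: `split_content_blocks` is a hypothetical name, and the block types ("thought", "tool_use", "text") are the ones that appear in the llm_service.py excerpt quoted in this guide.

```python
from typing import Any, Dict, List


def split_content_blocks(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Group block-based message content by block type."""
    thoughts = [block for block in content if block.get("type") == "thought"]
    tool_uses = [block for block in content if block.get("type") == "tool_use"]
    # Text blocks are concatenated in order to form the visible reply.
    text = "".join(block.get("text", "") for block in content if block.get("type") == "text")
    return {"text": text, "thoughts": thoughts, "tool_use": tool_uses}
```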
Location: webapp/packages/api/user-service/config/gemini/__init__.py
Key features:
- Built-in Tools: Google Search, URL context, code execution
- Reasoning Effort: Similar to OpenAI's reasoning models
Example configuration:
"gemini-2.5-pro": {
"parameters": {
"temperature": {
"type": "float",
"default": 1.0,
"min": 0.0,
"max": 2.0,
"description": "Temperature - Controls the randomness of the output."
},
"reasoning_effort": {
"type": "choice",
"default": "disable",
"choices": ["disable", "low", "medium", "high"],
"description": "Reasoning Effort: Effort level for reasoning during generation"
},
},
"built_in_tools": [
{
"id": "google_search",
"description": "Performs a Google search.",
"tool_config": {"google_search": {}}
},
{
"id": "code_execution",
"description": "Executes code snippets in a secure environment.",
"tool_config": {"codeExecution": {}}
}
]
}
LiteLLM Mapping:
- Model string format: "gemini/model-name" (see llm_service.py:53)
- Built-in tools are passed through LiteLLM's tools parameter
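As an illustration of how built-in tool entries might become the tools argument, the sketch below picks the tool_config of each enabled tool. The `select_tool_configs` helper and the idea of an "enabled ids" list are assumptions; the entry shape (id, description, tool_config) follows the example configuration above.

```python
from typing import Any, Dict, Iterable, List


def select_tool_configs(model_config: Dict[str, Any], enabled_ids: Iterable[str]) -> List[Dict[str, Any]]:
    """Return the tool_config of every built-in tool whose id is enabled (hypothetical helper)."""
    wanted = set(enabled_ids)
    return [
        tool["tool_config"]
        for tool in model_config.get("built_in_tools", [])
        if tool["id"] in wanted
    ]


gemini_model = {
    "built_in_tools": [
        {"id": "google_search", "description": "Performs a Google search.", "tool_config": {"google_search": {}}},
        {"id": "code_execution", "description": "Executes code snippets.", "tool_config": {"codeExecution": {}}},
    ]
}

kwargs: Dict[str, Any] = {"model": "gemini/gemini-2.5-pro"}
tools = select_tool_configs(gemini_model, ["google_search"])
if tools:  # only add the key when at least one tool is enabled
    kwargs["tools"] = tools
```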
Location: webapp/packages/api/user-service/config/provider_config.py:17-77
Example for local models:
"ollama": {
"models": {
"llama2": {
"parameters": {
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 1.0,
"description": "Controls randomness"
},
"num_predict": {
"type": "integer",
"default": 512,
"min": 1,
"max": 2048,
"description": "Maximum tokens to generate"
},
}
}
}
}
LiteLLM Mapping:
- No API key required (local deployment)
- Model string format: "ollama/model-name"
Follow these steps to add a new LLM provider:
Before adding a provider, check LiteLLM's supported providers documentation:
- Verify the provider is supported
- Note the required authentication method
- Identify any provider-specific parameters
- Check for special features (built-in tools, reasoning, etc.)
Create a new file: webapp/packages/api/user-service/config/[provider_name]/__init__.py
# [Provider Name] models configuration
# Updated [Date]
models = {
"model-name": {
"returns_thoughts": False, # or True if model supports reasoning
"parameters": {
# Define parameters with validation rules
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 1.0,
"description": "Controls randomness in generation"
},
# Add more parameters as needed
},
"built_in_tools": [] # Add provider-specific tools if available
}
}
Edit webapp/packages/api/user-service/config/provider_config.py:
from .provider_name import models as provider_name_models
PROVIDER_CONFIG = {
# ... existing providers ...
"provider_name": {
"api_key_env_var": "PROVIDER_NAME_API_KEY", # if API key is required
"models": provider_name_models,
},
}
Add the API key to your environment or .env file:
PROVIDER_NAME_API_KEY=your-api-key-here
Create a test to verify the provider works correctly. The LLM service will automatically:
- Format the model string as "provider_name/model-name"
- Pass it to LiteLLM's acompletion() or aresponses() function
- Handle the response according to the configuration
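Before calling the real API, a new provider's models dict can be checked offline. The sketch below is one possible sanity check, not an existing Gofannon utility: `check_models` and `KNOWN_TYPES` are hypothetical, and treating "float", "integer", and "choice" as the complete set of parameter types is an assumption based on the examples in this guide.

```python
from typing import Any, Dict, List

# Parameter types that appear in this guide; treating the set as complete is an assumption.
KNOWN_TYPES = {"float", "integer", "choice"}


def check_models(models: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in a provider's models dict."""
    problems: List[str] = []
    for model_name, model_cfg in models.items():
        for param_name, spec in model_cfg.get("parameters", {}).items():
            where = f"{model_name}.{param_name}"
            kind = spec.get("type")
            default = spec.get("default")
            if kind not in KNOWN_TYPES:
                problems.append(f"{where}: unknown type {kind!r}")
            elif kind == "choice":
                if default not in spec.get("choices", []):
                    problems.append(f"{where}: default is not one of the choices")
            elif default is not None:
                # Numeric parameter: the default must sit inside the declared bounds.
                low, high = spec.get("min"), spec.get("max")
                if (low is not None and default < low) or (high is not None and default > high):
                    problems.append(f"{where}: default is outside [min, max]")
    return problems
```

A test for a new provider could assert that `check_models(models)` returns an empty list, then follow up with a live call using a real API key.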
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 2.0,
"description": "Controls randomness"
}

"max_tokens": {
"type": "integer",
"default": 4096,
"min": 1,
"max": 16384,
"description": "Maximum tokens in response"
}

"reasoning_effort": {
"type": "choice",
"default": "medium",
"choices": ["low", "medium", "high"],
"description": "Effort level for reasoning"
}
The mutually_exclusive_with field prevents two parameters from being used simultaneously (like temperature and top_p):
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 2.0,
"mutually_exclusive_with": ["top_p"]
}
Implementation: The LLM service filters out None values before passing parameters to LiteLLM (see llm_service.py:58-59):
# Filter out None values from parameters (e.g., top_p with default None)
filtered_params = {k: v for k, v in parameters.items() if v is not None}
Built-in tools are provider-specific tools that don't require custom implementation:
"built_in_tools": [
{
"id": "web_search",
"description": "Performs a web search.",
"tool_config": {"type": "web_search", "search_context_size": "medium"}
}
]
LiteLLM Mapping: Built-in tools are passed through the tools parameter in llm_service.py:69-70:
if tools:
    kwargs["tools"] = tools
For providers with multiple API endpoints (like OpenAI's responses API):
"api_style": "responses" # Uses litellm.aresponses() instead of acompletion()
Implementation: The service checks this flag and routes to the appropriate LiteLLM function (see llm_service.py:82-86):
use_responses_api = (
    api_style == "responses" and
    (tools or reasoning_effort != 'disable')
)
# config/cohere/__init__.py
models = {
"command": {
"returns_thoughts": False,
"parameters": {
"temperature": {
"type": "float",
"default": 0.75,
"min": 0.0,
"max": 1.0,
"description": "Controls randomness"
},
"max_tokens": {
"type": "integer",
"default": 4096,
"min": 1,
"max": 4096,
"description": "Maximum tokens in response"
}
},
"built_in_tools": []
}
}
# In provider_config.py
from .cohere import models as cohere_models
PROVIDER_CONFIG = {
"cohere": {
"api_key_env_var": "COHERE_API_KEY",
"models": cohere_models
}
}
# config/mistral/__init__.py
models = {
"mistral-large": {
"returns_thoughts": True,
"parameters": {
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 1.0,
"description": "Controls randomness"
},
"reasoning_effort": {
"type": "choice",
"default": "disable",
"choices": ["disable", "low", "medium", "high"],
"description": "Reasoning effort level"
}
},
"built_in_tools": []
}
}
# In provider_config.py
PROVIDER_CONFIG = {
"vllm": {
# No api_key_env_var needed for local deployment
"models": {
"llama-3-70b": {
"returns_thoughts": False,
"parameters": {
"temperature": {
"type": "float",
"default": 0.7,
"min": 0.0,
"max": 1.0,
"description": "Controls randomness"
}
},
"built_in_tools": []
}
}
}
}
This section explains how Gofannon's configuration maps to LiteLLM function calls.
Gofannon constructs model strings in the format "provider/model":
# From llm_service.py:53
model_string = f"{provider}/{model}"
Examples:
- "openai/gpt-4o"
- "anthropic/claude-opus-4-5-20251101"
- "gemini/gemini-2.5-pro"
For most models (see llm_service.py:220-240):
kwargs = {
"model": model_string, # "provider/model"
"messages": messages, # Standard messages array
**filtered_params, # temperature, max_tokens, etc.
}
if reasoning_effort != 'disable':
    kwargs['reasoning_effort'] = reasoning_effort
response = await litellm.acompletion(**kwargs)
For OpenAI's responses API (see llm_service.py:87-127):
# System prompts become 'instructions'
kwargs["instructions"] = "\n\n".join(system_prompts)
# Last user message becomes 'input'
response_obj = await litellm.aresponses(input=input_text, **kwargs)
# Poll for completion
response_status = await litellm.aget_responses(response_id=response_obj.id)
Parameters with None values are filtered out (see llm_service.py:58-59):
filtered_params = {k: v for k, v in parameters.items() if v is not None}
This implements mutual exclusivity without explicit validation.
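The interplay between mutual exclusivity and the None filter can be shown end to end. In this sketch the exclusivity step stands in for what the frontend is described as doing; `apply_mutual_exclusivity` and `filter_none` are hypothetical names, and only the mutually_exclusive_with field and the filtering expression come from this guide.

```python
from typing import Any, Dict


def apply_mutual_exclusivity(values: Dict[str, Any], specs: Dict[str, Dict[str, Any]], chosen: str) -> Dict[str, Any]:
    """Set every parameter that conflicts with `chosen` to None (stand-in for the frontend)."""
    result = dict(values)  # copy so the caller's dict is not mutated
    for other in specs.get(chosen, {}).get("mutually_exclusive_with", []):
        result[other] = None
    return result


def filter_none(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Drop parameters whose value is None, as the service does before calling LiteLLM."""
    return {k: v for k, v in parameters.items() if v is not None}


specs = {
    "temperature": {"type": "float", "default": 1.0},
    "top_p": {"type": "float", "default": 0.9, "mutually_exclusive_with": ["temperature"]},
    "max_tokens": {"type": "integer", "default": 8192},
}
values = {"temperature": 1.0, "top_p": 0.9, "max_tokens": 8192}

# The user opts into top_p, so temperature is cleared and then filtered out.
print(filter_none(apply_mutual_exclusivity(values, specs, "top_p")))
# prints {'top_p': 0.9, 'max_tokens': 8192}
```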
Tools are passed directly if provided (see llm_service.py:69-70):
if tools:
    kwargs["tools"] = tools
Different response formats are handled:
Standard responses (see llm_service.py:239-263):
message = response.choices[0].message
content = message.content if isinstance(message.content, str) else ""
# Extract thoughts (reasoning, tool calls, etc.)
if message.tool_calls:
    thoughts_payload['tool_calls'] = [tc.model_dump() for tc in message.tool_calls]
if hasattr(message, 'reasoning_content') and message.reasoning_content:
    thoughts_payload['reasoning_content'] = message.reasoning_content
Anthropic block-based responses (see llm_service.py:250-261):
if isinstance(message.content, list): # Anthropic's block-based content
    content_blocks = message.content  # assumed binding; not shown in the original excerpt
    thought_blocks = [block for block in content_blocks if block.get("type") == "thought"]
    tool_use_blocks = [block for block in content_blocks if block.get("type") == "tool_use"]
    text_blocks = [block.get("text", "") for block in content_blocks if block.get("type") == "text"]
- LiteLLM Documentation
- LiteLLM Supported Providers
- LiteLLM API Reference
- Gofannon LLM Service Implementation
- Check LiteLLM Support: Verify the provider is supported by LiteLLM
- Verify API Key:
- Check if the user has configured a personal API key in their profile
- Ensure the environment variable is set correctly (fallback)
- Check Model Name: Verify the model name matches LiteLLM's expected format
- Review LiteLLM Logs: Check services/litellm_logger.py for error messages
User-specific keys not working:
- Verify the key is saved in the user's profile (Profile → API Keys)
- Check the provider status shows "Configured"
- Test the key directly with the provider's API
Environment variable not working:
- Ensure the environment variable name matches api_key_env_var in provider_config.py
- Restart the application after setting environment variables
- Check for typos or extra whitespace
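The environment variable checks above can be partly automated. The following sketch is a hypothetical diagnostic (`find_key_problems` is not part of Gofannon) that walks a provider configuration and reports variables that are unset or padded with whitespace.

```python
import os
from typing import Dict, Mapping


def find_key_problems(
    provider_config: Mapping[str, dict],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Map each provider with a key problem to a short description of the problem."""
    problems: Dict[str, str] = {}
    for provider, cfg in provider_config.items():
        env_var = cfg.get("api_key_env_var")
        if not env_var:
            continue  # providers without a declared variable (e.g. local deployments) are skipped
        value = environ.get(env_var)
        if not value:
            problems[provider] = f"{env_var} is not set"
        elif value != value.strip():
            problems[provider] = f"{env_var} has leading or trailing whitespace"
    return problems
```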
- Mutually Exclusive Parameters: Ensure only one parameter from a mutually exclusive group is set
- Range Validation: Check that numeric values are within min/max bounds
- Type Mismatches: Verify parameter types match the configuration (float vs int)
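The three error classes above correspond to checks that can be written against a single parameter spec. This is a minimal sketch under the assumption that specs use the "type", "min", "max", and "choices" fields shown in this guide; `validate_value` is a hypothetical helper, not the validation Gofannon actually performs.

```python
from typing import Any, Dict


def validate_value(name: str, value: Any, spec: Dict[str, Any]) -> Any:
    """Validate one parameter value against its spec and return the normalized value."""
    kind = spec["type"]
    if kind == "choice":
        if value not in spec["choices"]:
            raise ValueError(f"{name}: {value!r} is not one of {spec['choices']}")
        return value
    if kind == "integer":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name}: expected an integer, got {type(value).__name__}")
    elif kind == "float":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name}: expected a number, got {type(value).__name__}")
        value = float(value)
    else:
        raise ValueError(f"{name}: unknown parameter type {kind!r}")
    low, high = spec.get("min"), spec.get("max")
    if (low is not None and value < low) or (high is not None and value > high):
        raise ValueError(f"{name}: {value} is outside [{low}, {high}]")
    return value
```

For example, validating 2.5 against a temperature spec with max 2.0 raises ValueError, while passing 1.5 for an integer max_tokens raises TypeError.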
If a provider releases a new feature that isn't working:
- Check if LiteLLM has added support for the feature
- Review LiteLLM's changelog
- Consider updating the LiteLLM dependency
- Temporarily use the provider's SDK directly (not recommended for production)
When adding new providers or models:
- Follow the existing configuration patterns
- Add comprehensive parameter descriptions
- Document any provider-specific quirks
- Reference LiteLLM documentation for parameter mappings
- Add example usage in this documentation
- Test thoroughly with the provider's actual API
Last Updated: January 2026
Maintainer: AI Alliance Gofannon Team