Commit dd324bd

Merge pull request #67 from donvito/feature/new-provider-llmgateway
Feature/new provider llmgateway

2 parents 4fc133a + b14faaf commit dd324bd

File tree

12 files changed (+217, -38 lines)

README.md

Lines changed: 17 additions & 17 deletions

@@ -2,7 +2,7 @@
 
 AIBackends is an API server that you can use to integrate AI into your applications. You can run it locally or self-host it.
 
-The project supports running open source models locally with Ollama and LM Studio. It also supports OpenRouter, OpenAI and Anthropic.
+The project supports running open models locally with Ollama, LM Studio or LlamaCpp. It also supports LLM Gateway, OpenRouter, OpenAI, Anthropic and Google AI Studio, Baseten providers.
 
 ## Why AI Backends?
 
@@ -46,16 +46,22 @@ More to come...check swagger docs for updated endpoints.
 
 ## Supported LLM Providers
 
+### Local Providers
 | Provider | Description | Status |
 |----------|-------------|--------|
 | [Ollama](https://ollama.ai/) | Local models (self-hosted) | Available |
 | [LM Studio](https://lmstudio.ai/) | Local models via OpenAI-compatible API (self-hosted) | Available |
+| [LlamaCpp](https://github.com/ggml-org/llama.cpp) | Local models via llama.cpp server (self-hosted) | Available |
+
+### Cloud Providers
+| Provider | Description | Status |
+|----------|-------------|--------|
+| [LLM Gateway](https://dub.sh/try-llmgw) | **Recommended** - Unified API for multiple LLM providers with free models | Available |
 | [OpenAI](https://openai.com/) | GPT models | Available |
 | [Anthropic](https://www.anthropic.com/) | Claude models | Available |
 | [OpenRouter](https://openrouter.ai/) | Open source and private models | Available |
 | [Vercel AI Gateway](https://vercel.com/ai-gateway) | Open source and private models | Available |
-| [LlamaCpp](https://github.com/ggml-org/llama.cpp) | Local models via llama.cpp server (self-hosted) | Available |
-| [Google Gemini](https://ai.google.dev/) | Gemini models via OpenAI-compatible interface | Available |
+| [Google AI Studio](https://ai.google.dev/) | Gemini models via OpenAI-compatible interface | Available |
 | [Baseten](https://baseten.co/) | Cloud-hosted ML models with OpenAI-compatible API | Available |
 
 
@@ -180,27 +186,21 @@ OPENROUTER_API_KEY=your-openrouter-api-key
 # Baseten Configuration
 BASETEN_API_KEY=your-baseten-api-key
 BASETEN_BASE_URL=https://inference.baseten.co/v1
-```
-
-### Google Gemini Setup
 
-To use Google Gemini models:
+# LLM Gateway Configuration (Recommended)
+LLM_GATEWAY_API_KEY=your-llm-gateway-api-key
+```
 
-1. Get your API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
-2. Set `GOOGLE_AI_API_KEY` in your `.env` file
-3. Optionally configure `GEMINI_MODEL` (defaults to `gemini-2.5-flash-lite`)
+### LLM Gateway Setup (Recommended for Cloud Providers)
 
-Available Gemini models:
-- `gemini-2.5-flash-lite` (default)
-- `gemini-2.5-flash`
-- `gemini-2.5-pro`
-- `gemini-pro-vision`
+[LLM Gateway](https://dub.sh/try-llmgw) provides a unified API to access multiple LLM providers with a single API key. It includes several free models to get started.
 
-**Note**: The Gemini provider uses Google's OpenAI-compatible interface to maintain compatibility with AI SDK v4.
+1. Sign up at [LLM Gateway](https://dub.sh/try-llmgw)
+2. Get your API key from the dashboard
+3. Set `LLM_GATEWAY_API_KEY` in your `.env` file
 
 **Important:** Make sure to add `.env` to your `.gitignore` file to avoid committing sensitive information to version control.
 
-
 ## Tech Stack
 
 - Hono for the API server
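Collecting the LLM Gateway setup steps from the diff above, a minimal `.env` might look like this (a sketch; the values are placeholders, and only `LLM_GATEWAY_API_KEY` is required since the other keys have defaults in `src/config/services.ts`):

```shell
# LLM Gateway (recommended) — only the API key is mandatory
LLM_GATEWAY_API_KEY=your-llm-gateway-api-key

# Optional overrides; defaults shown match src/config/services.ts
LLM_GATEWAY_BASE_URL=https://api.llmgateway.io/v1
LLM_GATEWAY_MODEL=gpt-oss-20b-free
LLM_GATEWAY_TIMEOUT=30000
```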

src/config/models.json

Lines changed: 24 additions & 15 deletions

@@ -26,6 +26,15 @@
       { "name": "nvidia_nvidia-nemotron-nano-9b-v2", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "NVIDIA Nemotron Nano 9B v2; compact high-performance model." }
     ]
   },
+  "llmgateway": {
+    "enabled": true,
+    "models": [
+      { "name": "gpt-oss-20b-free", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Free GPT OSS 20B model via LLM Gateway." },
+      { "name": "kimi-k2-0905-free", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Free Kimi K2 model via LLM Gateway." },
+      { "name": "deepseek-r1t2-chimera-free", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "DeepSeek via LLM Gateway." },
+      { "name": "gpt-4o-mini", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "GPT 4o mini via LLM Gateway." }
+    ]
+  },
   "openai": {
     "enabled": true,
     "models": [
@@ -37,6 +46,20 @@
       { "name": "gpt-4o", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "emailReply", "translate", "meetingNotes", "outline"], "notes": "OpenAI next-gen vision model with Q&A capabilities." }
     ]
   },
+  "anthropic": {
+    "enabled": true,
+    "models": [
+      { "name": "claude-3-haiku-20240307", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Anthropic Claude 3 Haiku; fast and cost-effective with Q&A support." }
+    ]
+  },
+  "google": {
+    "enabled": true,
+    "models": [
+      { "name": "gemini-2.5-flash-lite", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Flash Lite with fast processing and vision capabilities." },
+      { "name": "gemini-2.5-flash", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Flash with advanced multimodal capabilities." },
+      { "name": "gemini-2.5-pro", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Pro with enhanced reasoning and vision support." }
+    ]
+  },
   "openrouter": {
     "enabled": true,
     "models": [
@@ -51,12 +74,6 @@
       { "name": "nvidia/nemotron-nano-9b-v2", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "NVIDIA Nemotron Nano 9B v2; compact high-performance model." }
     ]
   },
-  "anthropic": {
-    "enabled": true,
-    "models": [
-      { "name": "claude-3-haiku-20240307", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Anthropic Claude 3 Haiku; fast and cost-effective with Q&A support." }
-    ]
-  },
   "aigateway": {
     "enabled": true,
     "models": [
@@ -67,14 +84,6 @@
 
     ]
   },
-  "google": {
-    "enabled": true,
-    "models": [
-      { "name": "gemini-2.5-flash-lite", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Flash Lite with fast processing and vision capabilities." },
-      { "name": "gemini-2.5-flash", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Flash with advanced multimodal capabilities." },
-      { "name": "gemini-2.5-pro", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "vision", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Google Gemini 2.5 Pro with enhanced reasoning and vision support." }
-    ]
-  },
   "llamacpp": {
     "enabled": true,
     "models": [
@@ -86,7 +95,7 @@
     "models": [
       { "name": "openai/gpt-oss-120b", "capabilities": ["summarize", "web-search", "pdf-summarizer", "pdf-translate", "rewrite", "compose", "planning", "keywords", "sentiment", "askText", "emailReply", "translate", "meetingNotes", "outline"], "notes": "Baseten hosted OpenAI GPT OSS 120B model with comprehensive capabilities." }
     ]
-  }
+  }
   }
 }
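Each provider entry in `models.json` pairs a model name with a capability list. A minimal sketch of how such a config could be consumed (the `findModels` helper is hypothetical, not part of this commit; the data is a trimmed copy of the structure added above):

```typescript
type ModelEntry = { name: string; capabilities: string[]; notes: string };
type ProviderEntry = { enabled: boolean; models: ModelEntry[] };

// Trimmed copy of the shape added to src/config/models.json
const config: Record<string, ProviderEntry> = {
  llmgateway: {
    enabled: true,
    models: [
      { name: 'gpt-oss-20b-free', capabilities: ['summarize', 'translate'], notes: 'Free GPT OSS 20B model via LLM Gateway.' },
      { name: 'gpt-4o-mini', capabilities: ['summarize'], notes: 'GPT 4o mini via LLM Gateway.' },
    ],
  },
};

// Hypothetical helper: names of enabled models that support a capability
function findModels(cfg: Record<string, ProviderEntry>, capability: string): string[] {
  return Object.values(cfg)
    .filter((p) => p.enabled)
    .flatMap((p) => p.models)
    .filter((m) => m.capabilities.includes(capability))
    .map((m) => m.name);
}

console.log(findModels(config, 'translate')); // only gpt-oss-20b-free lists "translate" here
```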

src/config/services.ts

Lines changed: 21 additions & 1 deletion

@@ -64,6 +64,14 @@ export interface BasetenConfig extends ServiceConfig {
   timeout?: number;
 }
 
+export interface LLMGatewayConfig extends ServiceConfig {
+  apiKey: string;
+  baseURL: string;
+  model: string;
+  chatModel: string;
+  timeout?: number;
+}
+
 // OpenAI Configuration
 export const openaiConfig: OpenAIConfig = {
   name: 'OpenAI',
@@ -157,8 +165,20 @@ export const basetenConfig: BasetenConfig = {
   timeout: parseInt(process.env.BASETEN_TIMEOUT || '30000'),
 };
 
+// LLM Gateway Configuration
+export const llmgatewayConfig: LLMGatewayConfig = {
+  name: 'LLMGateway',
+  enabled: !!process.env.LLM_GATEWAY_API_KEY,
+  priority: 10,
+  apiKey: process.env.LLM_GATEWAY_API_KEY || '',
+  baseURL: process.env.LLM_GATEWAY_BASE_URL || 'https://api.llmgateway.io/v1',
+  model: process.env.LLM_GATEWAY_MODEL || 'gpt-oss-20b-free',
+  chatModel: process.env.LLM_GATEWAY_CHAT_MODEL || process.env.LLM_GATEWAY_MODEL || 'gpt-oss-20b-free',
+  timeout: parseInt(process.env.LLM_GATEWAY_TIMEOUT || '30000'),
+};
+
 // Available services
-export const availableServices = [openaiConfig, anthropicConfig, ollamaConfig, openrouterConfig, lmstudioConfig, aigatewayConfig, llamacppConfig, googleConfig, basetenConfig];
+export const availableServices = [openaiConfig, anthropicConfig, ollamaConfig, openrouterConfig, lmstudioConfig, aigatewayConfig, llamacppConfig, googleConfig, basetenConfig, llmgatewayConfig];
 
 // Get the primary service (highest priority enabled service)
 export function getPrimaryService(): ServiceConfig | null {
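`llmgatewayConfig` derives every field from environment variables with layered fallbacks: the provider is enabled only when a key is present, and `chatModel` falls back to `LLM_GATEWAY_MODEL` before the hard default. A standalone sketch of that pattern (not the repo's code; `buildConfig` is illustrative):

```typescript
// Illustrative sketch of the env-fallback pattern used by llmgatewayConfig
function buildConfig(env: Record<string, string | undefined>) {
  return {
    enabled: !!env.LLM_GATEWAY_API_KEY, // provider is on only if a key is set
    apiKey: env.LLM_GATEWAY_API_KEY || '',
    baseURL: env.LLM_GATEWAY_BASE_URL || 'https://api.llmgateway.io/v1',
    model: env.LLM_GATEWAY_MODEL || 'gpt-oss-20b-free',
    // chat model falls back to the general model, then to the default
    chatModel: env.LLM_GATEWAY_CHAT_MODEL || env.LLM_GATEWAY_MODEL || 'gpt-oss-20b-free',
    timeout: parseInt(env.LLM_GATEWAY_TIMEOUT || '30000', 10),
  };
}

const cfg = buildConfig({ LLM_GATEWAY_API_KEY: 'sk-test', LLM_GATEWAY_MODEL: 'kimi-k2-0905-free' });
console.log(cfg.enabled, cfg.chatModel); // true kimi-k2-0905-free
```

Passing a plain object instead of reading `process.env` directly keeps the fallback logic unit-testable.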

src/schemas/v1/syntheticData.ts

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ import { llmRequestSchema } from './llm'
 /**
  * Schema for JSON schema definition that users can provide
  */
-export const jsonSchemaSchema = z.record(z.any()).describe('JSON Schema definition for the synthetic data structure')
+export const jsonSchemaSchema = z.record(z.string(), z.unknown()).describe('JSON Schema definition for the synthetic data structure')
 
 /**
  * Payload sent by the client for synthetic data generation endpoint.
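The change from `z.record(z.any())` to `z.record(z.string(), z.unknown())` makes both key and value schemas explicit (the two-argument form is what newer zod versions expect) and types the values as `unknown` rather than `any`, forcing callers to narrow before use. A zod-free sketch of what the schema accepts:

```typescript
// Zod-free sketch of what z.record(z.string(), z.unknown()) accepts:
// any plain object with string keys; values come back typed `unknown`
function isRecordOfUnknown(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

const candidate: unknown = { type: 'object', properties: { name: { type: 'string' } } };

if (isRecordOfUnknown(candidate)) {
  // With z.any() the value would be usable unchecked; with `unknown`,
  // each value must be narrowed before use:
  const t = candidate.type;
  console.log(typeof t === 'string' ? t : 'not a string'); // object
}
```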

src/services/interfaces.ts

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 import { z } from 'zod';
 
-export type ProviderName = 'openai' | 'anthropic' | 'ollama' | 'openrouter' | 'lmstudio' | 'aigateway' | 'llamacpp' | 'google' | 'baseten';
+export type ProviderName = 'openai' | 'anthropic' | 'ollama' | 'openrouter' | 'lmstudio' | 'aigateway' | 'llamacpp' | 'google' | 'baseten' | 'llmgateway';
 
 export interface AIProvider {
   name: ProviderName;
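Extending a string-literal union like `ProviderName` pays off when it is paired with exhaustive `switch` statements elsewhere: the compiler then flags every site that forgets the new member. A sketch of the pattern (trimmed union; the `describe` function is illustrative, not in the repo):

```typescript
type ProviderName = 'openai' | 'anthropic' | 'llmgateway'; // trimmed for illustration

function describe(p: ProviderName): string {
  switch (p) {
    case 'openai': return 'OpenAI';
    case 'anthropic': return 'Anthropic';
    case 'llmgateway': return 'LLM Gateway';
    default: {
      // If a member is added to ProviderName but not handled above,
      // this assignment to `never` fails to type-check.
      const unhandled: never = p;
      throw new Error(`unhandled provider: ${unhandled}`);
    }
  }
}

console.log(describe('llmgateway')); // LLM Gateway
```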

src/services/llmgateway.ts

Lines changed: 117 additions & 0 deletions

@@ -0,0 +1,117 @@
+import { z } from 'zod';
+import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
+import { generateText, streamText, generateObject } from 'ai';
+import type { AIProvider } from './interfaces';
+import { llmgatewayConfig } from '../config/services';
+
+const normalizedBase = (llmgatewayConfig.baseURL || 'https://api.llmgateway.io/v1').replace(/\/$/, '');
+const LLM_GATEWAY_BASE_URL = normalizedBase;
+
+const llmgateway = createOpenAICompatible({
+  name: 'llmgateway',
+  baseURL: `${LLM_GATEWAY_BASE_URL}`,
+  headers: {
+    'Authorization': `Bearer ${llmgatewayConfig.apiKey}`,
+  },
+});
+
+class LLMGatewayProvider implements AIProvider {
+  name = 'llmgateway' as const;
+
+  async generateChatStructuredResponse(
+    prompt: string,
+    schema: z.ZodType,
+    model: string = llmgatewayConfig.chatModel,
+    temperature: number = 0
+  ): Promise<any> {
+    try {
+      const modelToUse = model || llmgatewayConfig.chatModel;
+
+      // OpenAI-compatible APIs require the word "json" in the prompt when using response_format: json_object
+      // The generateObject function uses json_object format, so we need to ensure "json" is in the prompt
+      const promptWithJson = prompt.toLowerCase().includes('json')
+        ? prompt
+        : `${prompt}\n\nReturn the response as valid JSON.`;
+
+      const result = await generateObject({
+        model: llmgateway(modelToUse),
+        schema,
+        prompt: promptWithJson,
+        temperature,
+      });
+
+      return {
+        object: result.object,
+        finishReason: result.finishReason,
+        usage: {
+          promptTokens: result.usage?.promptTokens || 0,
+          completionTokens: result.usage?.completionTokens || 0,
+          totalTokens: result.usage?.totalTokens || 0,
+        },
+        warnings: result.warnings,
+      };
+    } catch (error) {
+      throw new Error(`LLM Gateway structured response error: ${error}`);
+    }
+  }
+
+  async generateChatTextResponse(
+    prompt: string,
+    model?: string,
+    temperature: number = 0
+  ): Promise<any> {
+    try {
+      const modelToUse = llmgateway(model || llmgatewayConfig.model);
+
+      const result = await generateText({
+        model: modelToUse,
+        prompt,
+        temperature,
+      });
+
+      return result;
+    } catch (error) {
+      console.error('LLM Gateway text response error: ', error);
+      throw new Error(`LLM Gateway text response error: ${error}`);
+    }
+  }
+
+  async generateChatTextStreamResponse(
+    prompt: string,
+    model?: string,
+    temperature: number = 0
+  ): Promise<any> {
+    try {
+      const modelToUse = llmgateway(model || llmgatewayConfig.model);
+
+      const result = await streamText({
+        model: modelToUse,
+        prompt,
+        temperature,
+      });
+
+      return result;
+    } catch (error) {
+      console.error('LLM Gateway streaming response error: ', error);
+      throw new Error(`LLM Gateway streaming response error: ${error}`);
+    }
+  }
+
+  async getAvailableModels(): Promise<string[]> {
+    return [
+      'gpt-oss-20b-free',
+      'glm-4.5-air-free',
+      'llama-3.3-70b-instruct-free',
+      'glm-4.5-flash',
+      'llama-4-maverick-free',
+      'kimi-k2-0905-free',
+      'llama-4-scout-free',
+    ];
+  }
+}
+
+const provider = new LLMGatewayProvider();
+
+export default provider;
+export { LLM_GATEWAY_BASE_URL };
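Two small details in this provider are worth isolating: the base URL is normalized to drop a trailing slash, and the prompt is patched to mention "json" because OpenAI-compatible `response_format: json_object` endpoints reject prompts that never mention JSON. Both as standalone helpers (extracted from the logic above for illustration):

```typescript
// Strip a single trailing slash so later path joins don't produce "//"
function normalizeBase(url: string): string {
  return url.replace(/\/$/, '');
}

// OpenAI-compatible json_object mode requires "json" to appear in the prompt;
// append an instruction only when it is missing (case-insensitive check)
function ensureJsonMention(prompt: string): string {
  return prompt.toLowerCase().includes('json')
    ? prompt
    : `${prompt}\n\nReturn the response as valid JSON.`;
}

console.log(normalizeBase('https://api.llmgateway.io/v1/')); // https://api.llmgateway.io/v1
console.log(ensureJsonMention('Summarize this article.'));
```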

src/services/providers.ts

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 import { z } from 'zod';
 
-export const providersSupported = z.enum(['ollama', 'openai', 'anthropic', 'openrouter', 'lmstudio', 'aigateway', 'llamacpp', 'google', 'baseten']);
+export const providersSupported = z.enum(['ollama', 'openai', 'anthropic', 'openrouter', 'lmstudio', 'aigateway', 'llamacpp', 'google', 'baseten', 'llmgateway']);

src/services/registry.ts

Lines changed: 2 additions & 0 deletions

@@ -9,6 +9,7 @@ import aigatewayProvider from './aigateway';
 import llamacppProvider from './llamacpp';
 import geminiProvider from './google';
 import basetenProvider from './baseten';
+import llmgatewayProvider from './llmgateway';
 
 export class ServiceRegistry {
   private providers = new Map<ProviderName, AIProvider>();
@@ -46,6 +47,7 @@ serviceRegistry.register(aigatewayProvider);
 serviceRegistry.register(llamacppProvider);
 serviceRegistry.register(geminiProvider);
 serviceRegistry.register(basetenProvider);
+serviceRegistry.register(llmgatewayProvider);
 
 // Helper for tests to replace the registry content
 export function replaceRegistryForTests(registry: ServiceRegistry) {
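Because `ServiceRegistry` keeps providers in a `Map` keyed by `ProviderName`, wiring in the new provider is just one import plus one `register` call. A minimal sketch of that shape (trimmed types; the `register`/`get` method names are assumptions based on the usage shown in the diff):

```typescript
type ProviderName = 'openai' | 'llmgateway'; // trimmed for illustration
interface AIProvider { name: ProviderName }

class ServiceRegistry {
  private providers = new Map<ProviderName, AIProvider>();

  register(provider: AIProvider): void {
    // last registration for a given name wins
    this.providers.set(provider.name, provider);
  }

  get(name: ProviderName): AIProvider | undefined {
    return this.providers.get(name);
  }
}

const registry = new ServiceRegistry();
registry.register({ name: 'llmgateway' });
console.log(registry.get('llmgateway')?.name); // llmgateway
```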

src/templates/askTextDemo.html

Lines changed: 1 addition & 0 deletions

@@ -170,6 +170,7 @@ <h2 class="text-lg font-semibold mb-3">Input</h2>
 <div>
   <label class="block text-sm font-medium text-gray-700 mb-1">Provider</label>
   <select id="provider" class="w-full p-2 rounded-lg border border-gray-300 focus:outline-none focus:ring-2 focus:ring-brand-purple">
+    <option value="llmgateway">llmgateway</option>
     <option value="ollama">ollama</option>
     <option value="openai">openai</option>
     <option value="anthropic">anthropic</option>

src/templates/composeDemo.html

Lines changed: 1 addition & 1 deletion

@@ -617,7 +617,7 @@ <h2 class="text-lg font-semibold">Output</h2>
 cancelBtn.addEventListener('click', (e) => { e.preventDefault(); cancel(); });
 loadSampleBtn.addEventListener('click', (e) => { e.preventDefault(); setSample(); });
 
-initModels();
+initModels().then(() => setSample());
 updatePreview();
 </script>
 </body>
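The one-line change in composeDemo.html sequences `setSample()` after `initModels()` resolves, so the sample form is not populated before the async model list has loaded. The pattern in isolation (names reused from the demo; the bodies are stand-ins):

```typescript
// Stand-ins for the demo page's functions: initModels is async,
// setSample must only run once the model list is populated.
async function runDemo(): Promise<string[]> {
  const order: string[] = [];
  const initModels = async () => {
    await Promise.resolve(); // simulate fetching the model list
    order.push('initModels');
  };
  const setSample = () => { order.push('setSample'); };

  // Before: `initModels(); setSample();` could read an empty model list.
  // After: chaining guarantees the ordering.
  await initModels().then(() => setSample());
  return order;
}

runDemo().then((o) => console.log(o.join(' -> '))); // initModels -> setSample
```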
