
Conversation

@Samso9th commented Jun 3, 2025

🚀 Multi-Provider AI Support - Cost-Free & Privacy-First Options

This PR adds support for 5 AI providers, enabling users to run the MCP server completely free or at significantly reduced costs.

💰 Cost Benefits

  • 100% Free: Use Ollama for completely local deployment (no API costs)
  • Free Tier Friendly: Google Gemini offers generous free quotas
  • Cost Reduction: DeepSeek offers very affordable API rates
  • Flexibility: Mix providers (e.g., Ollama for embeddings + free Gemini for completions)

🔒 Privacy Benefits

  • Local-only option with Ollama - no data leaves your machine
  • Self-hosted models for sensitive enterprise use cases
  • Reduced vendor lock-in with multiple provider options

🎯 Supported Providers

| Provider  | Embeddings  | Completions | Cost                | Privacy |
|-----------|-------------|-------------|---------------------|---------|
| Ollama    | ✅ (768d)   | ✅          | FREE                | Local   |
| Gemini    | ✅ (768d)   | ✅          | Free tier available | Cloud   |
| OpenAI    | ✅ (1536d)  | ✅          | Paid                | Cloud   |
| DeepSeek  | ❌          | ✅          | Very cheap          | Cloud   |
| Anthropic | ❌          | ✅          | Paid                | Cloud   |

🛠 Technical Improvements

  • Backward compatible - existing OpenAI configs work unchanged
  • Async architecture - proper async/await throughout
  • Provider abstraction - clean interface for future providers (see the sketch after this list)
  • Flexible embedding dimensions - supports 768, 1024, 1536 dimensions
  • Graceful fallbacks - handles providers without embedding support
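
To make the provider abstraction point concrete, here is a minimal sketch of the kind of interface I mean (class and method names are illustrative, not the exact ones in the PR):

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Common interface each provider implements (illustrative names only)."""

    @abstractmethod
    async def create_embedding(self, text: str) -> list[float] | None:
        """Return an embedding vector, or None when the provider has no
        embedding support, so callers can fall back gracefully."""

    @abstractmethod
    async def create_completion(self, prompt: str) -> str:
        """Return a completion/chat response for the given prompt."""
```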

📋 Configuration Example

# Completely free local setup
AI_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434

# Free tier cloud setup  
AI_PROVIDER=gemini
GEMINI_API_KEY=your_free_gemini_key

# Existing OpenAI setup (unchanged)
AI_PROVIDER=openai  # or omit (default)
OPENAI_API_KEY=your_openai_key

🎯 Target Users

  • Broke Vibe Coders like me 🙂

Testing

  • All providers tested with embeddings and completions
  • Backward compatibility verified
  • Async performance improvements confirmed
  • Error handling and fallbacks working

This change democratizes access to the server's RAG capabilities by removing the requirement for a paid API key while maintaining full feature parity.

@coleam00 (Owner) commented
Thank you for this PR @Samso9th! I would want to implement the different providers differently though, so I'm curious about your thoughts on these points:

  1. I like your documentation and the providers you've chosen, but I don't really want to have a separate code file for each provider. I'd rather just rely on the OpenAI API compatibility that these providers have.

  2. And then I want to have a separation of embedding models and large language models, so that you can choose one for the embeddings and then another for the LLM. This will make it so that we can use OpenRouter, but not just default to zero vectors like you're doing. We could actually use OpenAI for embeddings and OpenRouter for the LLM, for example.

  3. If you could be more clear and maybe elaborate more on the async improvements you did, I would appreciate that!

@Samso9th (Author) commented Jun 19, 2025

I appreciate the feedback, @coleam00. I've completely restructured the implementation to address your points:

OpenAI API Compatibility: I consolidated everything into a single OpenAICompatibleProvider that handles OpenAI, DeepSeek, Ollama, and OpenRouter using the OpenAI client library. I only kept separate files for Gemini and Anthropic since they have different APIs (I really need Gemini). With this, the codebase is significantly smaller.
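
Roughly, the idea looks like this (a simplified sketch with illustrative names and endpoints, not the exact code in the PR):

```python
from openai import AsyncOpenAI

# One class covers every OpenAI-compatible provider just by swapping
# base_url and credentials.
PROVIDER_ENDPOINTS = {
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com",
    "openrouter": "https://openrouter.ai/api/v1",
    "ollama": "http://localhost:11434/v1",
}


class OpenAICompatibleProvider:
    def __init__(self, provider: str, api_key: str = "ollama"):
        # Ollama ignores the key, but the client still requires one.
        self.client = AsyncOpenAI(
            base_url=PROVIDER_ENDPOINTS[provider], api_key=api_key
        )

    async def create_completion(self, model: str, prompt: str) -> str:
        resp = await self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
```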

Embedding/LLM Separation: Implemented exactly what you described with a new ProviderManager class. You can now do EMBEDDING_PROVIDER=openai and LLM_PROVIDER=openrouter to use OpenAI embeddings with OpenRouter completions. No more zero vectors - proper embeddings from one provider, completions from another. Also enables cost optimization like using expensive OpenAI embeddings with cheap DeepSeek completions.
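
In simplified form, the ProviderManager routing looks like this (a sketch, not the exact implementation):

```python
import os


class ProviderManager:
    """Routes embeddings and completions to separately configured providers."""

    def __init__(self, providers: dict):
        # 'providers' maps names like "openai" / "openrouter" to client
        # wrappers exposing async create_embedding() / create_completion().
        self.embedding_provider = providers[os.getenv("EMBEDDING_PROVIDER", "openai")]
        self.llm_provider = providers[os.getenv("LLM_PROVIDER", "openai")]

    async def embed(self, text: str) -> list[float]:
        return await self.embedding_provider.create_embedding(text)

    async def complete(self, prompt: str) -> str:
        return await self.llm_provider.create_completion(prompt)
```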

Async Improvements: Added proper async context management throughout, replaced synchronous HTTP requests with aiohttp sessions, implemented concurrent request handling for embedding batches, and added async-aware error handling with proper resource cleanup. This enables non-blocking I/O and better scalability.
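
As a simplified illustration of the concurrent batching (building on the ProviderManager sketch above; the real code also manages aiohttp sessions and error handling):

```python
import asyncio


async def embed_batch(manager: ProviderManager, texts: list[str]) -> list[list[float]]:
    # Fire all embedding requests concurrently instead of one at a time;
    # non-blocking I/O keeps the event loop free while requests are in flight.
    return await asyncio.gather(*(manager.embed(t) for t in texts))
```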

This maintains full backward compatibility while adding the dual-provider functionality you requested. The architecture is much cleaner now with less code duplication.

You can also check this doc for a general summary.
