ProxyGateLLM — System Architecture

Comprehensive architecture documentation for ProxyGateLLM v6.0.0



Overview

ProxyGateLLM is a multi-provider LLM proxy that aggregates 9 free AI providers into a single, unified API. It exposes OpenAI-compatible and Anthropic-compatible endpoints, automatically routes requests to the best available provider, and handles failover transparently.

Design Principles

| Principle | Implementation |
| --- | --- |
| Zero-config core | 5 of 9 providers work without API keys |
| Transparent failover | Requests automatically retry on next provider |
| API compatibility | Drop-in replacement for OpenAI and Anthropic SDKs |
| Streaming-first | SSE streaming with async iterators and ReadableStreams |
| Health-aware routing | Unhealthy providers are deprioritized automatically |
| Extensible | Add new providers by extending BaseProvider |

System Diagram

                            ┌─────────────────────────┐
                            │     Client / SDK         │
                            │  (OpenAI / Anthropic)    │
                            └───────────┬─────────────┘
                                        │  HTTP
                                        ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       ProxyGateLLM v6.0.0                      │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │ Layer 1: API Gateway (Express 5.x)                             │ │
│  │   /v1/chat/completions  /v1/messages  /chat  /health  /models  │ │
│  └──────────────────────────┬──────────────────────────────────────┘ │
│                              │                                        │
│  ┌──────────────────────────▼──────────────────────────────────────┐ │
│  │ Layer 2: Request Router                                         │ │
│  │   resolveModel()  pickModel()  getTaskType()                    │ │
│  └──────────────────────────┬──────────────────────────────────────┘ │
│                              │                                        │
│  ┌──────────────────────────▼──────────────────────────────────────┐ │
│  │ Layer 3: Provider Manager                                       │ │
│  │   route()  chatWithFailover()  chatStreamWithFailover()         │ │
│  │   round-robin  health tracking  latency scoring                 │ │
│  └──────────────────────────┬──────────────────────────────────────┘ │
│                              │                                        │
│  ┌──────────────────────────▼──────────────────────────────────────┐ │
│  │ Layer 4: Provider Adapters (9 providers)                        │ │
│  │   Puter  Pollinations  DuckDuckGo  OpenRouter  Groq            │ │
│  │   HuggingFace  G4F  Blackbox  Phind                             │ │
│  └──────────────────────────┬──────────────────────────────────────┘ │
│                              │                                        │
│  ┌──────────────────────────▼──────────────────────────────────────┐ │
│  │ Layer 5: Model Sync Service                                     │ │
│  │   Auto-fetch  Caching  Periodic refresh                         │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
    ┌─────────────┐  ┌──────────────┐  ┌──────────────┐
    │  Puter API  │  │ Pollinations │  │  DuckDuckGo  │
    │  OpenRouter │  │    Groq      │  │ HuggingFace  │
    │  G4F/Py     │  │  Blackbox    │  │    Phind     │
    └─────────────┘  └──────────────┘  └──────────────┘

5-Layer Architecture

Layer 1: API Gateway

File: index.js Framework: Express 5.x

The API Gateway handles all inbound HTTP traffic, request validation, response formatting, and CORS.

Responsibilities

| Responsibility | Implementation |
| --- | --- |
| HTTP routing | Express route definitions |
| Request parsing | express.json() with 10MB limit |
| Input validation | validateChatRequest, validateMessagesRequest middleware |
| Message sanitization | sanitizeMessages() strips null bytes |
| Rate limiting | Per-IP sliding window (rateLimiter middleware) |
| API key auth | Optional API_KEY_AUTH middleware |
| CORS | Configurable Access-Control-Allow-Origin |
| Response formatting | OpenAI and Anthropic response shapes |
| SSE streaming | sendSSE(), sendSSEDone() helpers |
| Error handling | Centralized error handler + safeErrorResponse() |

Endpoint → Layer Mapping

POST /v1/chat/completions  →  Layer 2 (resolveModel)  →  Layer 3 (chatWithFailover)
POST /v1/messages          →  Layer 2 (resolveModel)  →  Layer 3 (chatWithFailover)
POST /chat                 →  Layer 2 (pickModel)     →  Layer 3 (chatWithFailover)
GET  /health               →  Direct response
GET  /status               →  Layer 3 + Layer 5 stats
GET  /models               →  Provider Registry (getAllModels)
GET  /providers            →  Provider Registry (getStats)
GET  /providers/:name/health → Layer 4 (checkHealth)

Request Processing Pipeline

Request → CORS → Rate Limit → API Key Auth → Body Parse → Validate → Sanitize → Route → Respond
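
The rate-limit stage of this pipeline can be pictured with a minimal sketch of a per-IP sliding-window middleware. The factory name `createRateLimiter`, the injectable `now` clock, and the error body are assumptions made for illustration; only the per-IP sliding window and the defaults (60000 ms, 100 requests) come from this document.

```javascript
// Sketch only: a per-IP sliding-window limiter in Express middleware shape.
// createRateLimiter, the `now` option, and the 429 body are illustrative assumptions.
function createRateLimiter({ windowMs = 60000, maxRequests = 100, now = Date.now } = {}) {
  const hits = new Map(); // ip -> timestamps of requests still inside the window

  return function rateLimiter(req, res, next) {
    const t = now();
    // Keep only the timestamps that are still inside the sliding window.
    const recent = (hits.get(req.ip) || []).filter((ts) => t - ts < windowMs);

    if (recent.length >= maxRequests) {
      hits.set(req.ip, recent);
      res.status(429).json({ error: { message: 'Rate limit exceeded' } });
      return;
    }

    recent.push(t);
    hits.set(req.ip, recent);
    next();
  };
}
```

Because the window slides, a client that was throttled regains capacity gradually as its oldest requests age out, rather than all at once at a fixed boundary.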

Layer 2: Request Router

File: router.js

The Request Router resolves model aliases to canonical IDs and, when no model is specified, auto-selects one by matching keywords in the message content.

Functions

| Function | Purpose |
| --- | --- |
| resolveModel(rawModel) | Resolves model aliases to canonical IDs; returns null for "auto" |
| pickModel(messages) | Auto-selects model based on task type detection |
| getTaskType(messages) | Classifies message content into task categories |

Model Resolution Flow

Input: rawModel
  │
  ├─ null / undefined / "auto"  →  returns null (triggers pickModel)
  │
  ├─ Known alias (e.g., "gpt4")  →  looks up MODEL_ALIASES → returns canonical ID
  │
  └─ Unknown string  →  returns as-is (passed to provider matching)
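
The three branches above fit in a few lines. This is a sketch under assumptions: the alias table holds only the `gpt4` entry used as an example in this document, and the exact-match lookup (no trimming or case folding) is a guess at the real behavior in router.js.

```javascript
// Sketch only. The real alias table lives in config/providers.js; this one
// contains just the example alias mentioned in this document.
const MODEL_ALIASES = new Map([['gpt4', 'gpt-4o']]);

function resolveModel(rawModel) {
  // null, undefined, and "auto" all mean "let pickModel() decide".
  if (rawModel == null || rawModel === 'auto') return null;
  // Known alias -> canonical ID; anything else passes through unchanged.
  return MODEL_ALIASES.get(rawModel) ?? rawModel;
}
```

A caller would typically write `const model = resolveModel(body.model) ?? pickModel(messages);`, where `body` is a hypothetical parsed request object.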

Auto-Routing Decision Tree

pickModel(messages)
  │
  ├─ Empty content  →  "deepseek-chat"
  │
  ├─ BUILDING keywords detected?
  │   (code, implement, function, debug, fix, refactor, sql, api, deploy, docker, ...)
  │   └─ YES  →  "claude-opus-4-5-latest"
  │
  ├─ PLANNING keywords detected?
  │   (plan, design, strategy, analyze, compare, roadmap, architect, ...)
  │   └─ YES  →  "deepseek-chat"
  │
  ├─ REASONING keywords detected?
  │   (reason, solve, explain, how does, why is, step by step, calculate, ...)
  │   └─ YES  →  "gpt-4o"
  │
  ├─ Short query or question?
  │   (text.length < 100 OR contains "?")
  │   └─ YES  →  "gpt-4o-mini"
  │
  └─ DEFAULT  →  "deepseek-chat"

Task Type Classification

| Task Type | Keywords | Auto-Selected Model |
| --- | --- | --- |
| code | code, implement, function, class, debug, fix, refactor, deploy, build | claude-opus-4-5-latest |
| planning | plan, design, strategy, analyze, roadmap, architect | deepseek-chat |
| reasoning | reason, solve, explain, calculate, prove | gpt-4o |
| fast | Short text (<100 chars) or contains ? | gpt-4o-mini |
| general | Default fallback | deepseek-chat |
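
The classification can be sketched as an ordered keyword scan. The keyword lists below are abbreviated from the table, and two details are assumptions rather than documented behavior: all message contents are concatenated before matching, and keywords match at the start of a word (so "explanation" does not trigger "plan").

```javascript
// Sketch only: abbreviated keyword lists, checked in the documented order.
const TASK_KEYWORDS = [
  ['code', ['code', 'implement', 'function', 'debug', 'fix', 'refactor', 'deploy']],
  ['planning', ['plan', 'design', 'strategy', 'analyze', 'roadmap', 'architect']],
  ['reasoning', ['reason', 'solve', 'explain', 'calculate', 'prove']],
].map(([task, words]) => [task, new RegExp('\\b(?:' + words.join('|') + ')')]);

const MODEL_FOR_TASK = {
  code: 'claude-opus-4-5-latest',
  planning: 'deepseek-chat',
  reasoning: 'gpt-4o',
  fast: 'gpt-4o-mini',
  general: 'deepseek-chat',
};

function getTaskType(messages) {
  // Assumption: every string content is considered, not just the last message.
  const text = messages
    .map((m) => (typeof m.content === 'string' ? m.content : ''))
    .join(' ')
    .toLowerCase()
    .trim();
  if (!text) return 'general';
  for (const [task, pattern] of TASK_KEYWORDS) {
    if (pattern.test(text)) return task;
  }
  if (text.length < 100 || text.includes('?')) return 'fast';
  return 'general';
}

function pickModel(messages) {
  return MODEL_FOR_TASK[getTaskType(messages)];
}
```

Order matters here: a message that contains both a code keyword and a planning keyword is classified as code, because the code list is checked first.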

Layer 3: Provider Manager

File: utils/provider-manager.js

The Provider Manager is the brain of the routing system. It selects providers, manages health states, implements round-robin load balancing, and handles failover.

Core Methods

| Method | Purpose |
| --- | --- |
| start() | Runs initial health checks, starts periodic health check interval |
| stop() | Clears health check interval |
| route(modelId, messages) | Find best provider for a model |
| chatWithFailover(modelId, messages) | Execute chat with automatic retry on failure |
| chatStreamWithFailover(modelId, messages) | Execute streaming chat with retry |
| runHealthChecks() | Check health of all enabled providers |
| _roundRobin(modelId, providers) | Round-robin selection among providers |

Provider Selection Algorithm

route(modelId, messages)
  │
  ├─ Get all providers that support this model
  │
  ├─ Filter out "down" providers
  │
  ├─ Sort by:
  │   1. Health status: healthy > unknown > degraded > down
  │   2. Provider priority: P1 > P2 > P3
  │   3. Average latency: lower is better
  │
  ├─ If multiple top-priority healthy providers:
  │   └─ Round-robin among them
  │
  ├─ If no healthy providers remain:
  │   └─ Round-robin among the degraded providers
  │
  └─ Last resort:
      └─ Return first provider regardless of health
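
The filter-and-sort step can be expressed as a single comparator chain. This sketch assumes `priority` is stored as a number (1 for P1, 2 for P2, 3 for P3) and `avgLatency` in milliseconds; the helper name `rankProviders` is hypothetical.

```javascript
// Sketch only: health rank first, then priority tier, then average latency.
const HEALTH_RANK = { healthy: 0, unknown: 1, degraded: 2, down: 3 };

function rankProviders(providers) {
  return providers
    .filter((p) => p.healthStatus !== 'down') // "down" providers are excluded up front
    .sort((a, b) =>
      HEALTH_RANK[a.healthStatus] - HEALTH_RANK[b.healthStatus] ||
      a.priority - b.priority ||
      a.avgLatency - b.avgLatency);
}
```

Each `||` falls through to the next criterion only when the previous difference is zero, which is what makes latency a tie-breaker rather than a primary signal.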

Health Check Flow

runHealthChecks()
  │
  ├─ For each enabled provider (parallel):
  │   │
  │   ├─ Race: provider.checkHealth() vs 10s timeout
  │   │
  │   ├─ Success → status = "healthy", record latency
  │   │
  │   └─ Failure → status degrades:
  │       healthy → degraded
  │       degraded → down
  │       unknown → stays unknown
  │
  └─ Log: "Health check: 7/9 providers healthy"
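
A sketch of this flow, assuming `checkHealth()` resolves on success and rejects on failure (the actual contract is not stated here). The `withTimeout` helper and the standalone function form are illustrative; in the project this logic is a method on the Provider Manager.

```javascript
// Sketch only: race each check against a timeout and run all checks in parallel.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Failure transitions from the flow above; "unknown" stays "unknown".
const NEXT_ON_FAILURE = { healthy: 'degraded', degraded: 'down', down: 'down', unknown: 'unknown' };

async function runHealthChecks(providers, timeoutMs = 10000) {
  await Promise.allSettled(providers.map(async (p) => {
    const started = Date.now();
    try {
      await withTimeout(p.checkHealth(), timeoutMs);
      p.healthStatus = 'healthy';
      p.avgLatency = Date.now() - started; // simplification: last latency, not a running average
    } catch {
      p.healthStatus = NEXT_ON_FAILURE[p.healthStatus];
    }
    p.lastHealthCheck = Date.now();
  }));
  const healthy = providers.filter((p) => p.healthStatus === 'healthy').length;
  return `Health check: ${healthy}/${providers.length} providers healthy`;
}
```

Clearing the timer in `finally` matters in a long-running server: without it, every successful check would leave a pending 10-second timer behind.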

Failover Execution

chatWithFailover(modelId, messages)
  │
  ├─ Get all providers for this model
  │
  ├─ Sort: healthy first, then by priority
  │
  ├─ For each provider (in order):
  │   │
  │   ├─ Try: provider.chat(messages, {model})
  │   │   ├─ Success → record latency, mark healthy, return result
  │   │   └─ Failure → record error, degrade health, try next
  │   │       healthy → degraded
  │   │       degraded → down
  │   │
  │   └─ Continue to next provider
  │
  └─ All failed → throw lastError
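
The loop above can be sketched as follows. The function takes the candidate providers as an argument for self-containment (the real method looks them up itself), and leaving an `unknown` provider's status unchanged on failure is an assumption, since the flow only lists the healthy and degraded transitions.

```javascript
// Sketch only: try providers in order and return the first successful result.
async function chatWithFailover(providers, modelId, messages) {
  const rank = { healthy: 0, unknown: 1, degraded: 2, down: 3 };
  const ordered = [...providers].sort(
    (a, b) => rank[a.healthStatus] - rank[b.healthStatus] || a.priority - b.priority);

  let lastError = new Error(`No providers available for model: ${modelId}`);
  for (const provider of ordered) {
    const started = Date.now();
    try {
      const result = await provider.chat(messages, { model: modelId });
      provider.healthStatus = 'healthy';
      return { result, provider: provider.name, latency: Date.now() - started };
    } catch (err) {
      lastError = err;
      // healthy -> degraded, degraded -> down; other states are left as they are.
      const next = { healthy: 'degraded', degraded: 'down' }[provider.healthStatus];
      if (next) provider.healthStatus = next;
    }
  }
  throw lastError;
}
```

Note that a single failed request is enough to demote a provider, so a transient error on a healthy P1 provider pushes traffic to the next candidate until a later success or health check restores it.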

Layer 4: Provider Adapters

File: providers/

The gateway ships nine provider adapters, all of which extend the BaseProvider abstract class. Each adapter implements the provider-specific communication protocol.

BaseProvider Abstract Class

File: providers/base.js

class BaseProvider {
  // Core properties
  name, displayName, priority, enabled, models, baseUrl, timeout
  healthStatus, lastHealthCheck, requestCount, errorCount, avgLatency

  // Abstract methods (must be implemented)
  async chat(messages, options = {})        // Non-streaming chat
  async chatStream(messages, options = {})  // Streaming chat
  async checkHealth()                       // Health check

  // Optional override
  async fetchModels()                       // Auto-fetch models from API

  // Provided utilities
  supportsModel(modelId)                    // Check if model is supported
  getModel(modelId)                         // Get model config
  recordRequest(latency, success)           // Track metrics
  getStats()                                // Return provider statistics
  formatOpenAIResponse(content, model)      // Format as OpenAI response
  formatAnthropicResponse(content, model)   // Format as Anthropic response
}

Provider Registry

File: providers/index.js

The ProviderRegistry is a singleton that manages all provider instances:

| Method | Purpose |
| --- | --- |
| init() | Instantiate all providers, check DISABLE_* env vars |
| getProvider(name) | Get a specific provider by name |
| getEnabledProviders() | Get all enabled providers |
| getProvidersByPriority() | Get enabled providers sorted by priority |
| getProvidersForModel(modelId) | Get providers that support a model |
| getAllModels() | Get deduplicated model list with provider mapping |
| getStats() | Aggregate statistics |

Adapter Overview

| Provider | Class | Priority | Auth Required | Streaming | Protocol |
| --- | --- | --- | --- | --- | --- |
| Puter.js | PuterProvider | P1 | Optional | Native SDK | SDK (puter.ai.chat) |
| Pollinations | PollinationsProvider | P1 | None | SSE via fetch | HTTP POST |
| DuckDuckGo | DuckDuckGoProvider | P1 | None | SSE (VQD token) | HTTP POST + VQD |
| OpenRouter | OpenRouterProvider | P1 | Optional | SSE via fetch | OpenAI API |
| Groq | GroqProvider | P2 | Required | SSE via fetch | OpenAI API |
| HuggingFace | HuggingFaceProvider | P2 | Required | Simulated* | HF Inference API |
| G4F | G4FProvider | P2 | None | Simulated* | Python subprocess |
| Blackbox | BlackboxProvider | P3 | None | Custom stream | HTTP POST |
| Phind | PhindProvider | P3 | None | Simulated* | HTTP POST |

*Simulated streaming = non-streaming request wrapped in an async generator that yields a single chunk.


Layer 5: Model Sync Service

File: utils/model-sync.js

The Model Sync Service automatically fetches and updates model lists from providers that expose a models API (e.g., OpenRouter, Pollinations, Groq).

How It Works

ModelSyncService
  │
  ├─ start()
  │   ├─ Initial sync: syncModels()
  │   └─ Set interval (default: 1 hour)
  │
  ├─ syncModels()
  │   │
  │   ├─ For each enabled provider:
  │   │   ├─ Race: provider.fetchModels() vs 15s timeout
  │   │   ├─ If new models returned → update provider.models
  │   │   └─ Log: "OpenRouter: 0 → 45 models (+45 new)"
  │   │
  │   └─ Record sync history (last 10 syncs)
  │
  ├─ getStats()
  │   └─ Returns lastSync, interval, history
  │
  └─ stop()
      └─ Clear interval
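
A compact sketch of the service outlined above. It assumes each model is an object with an `id` field and that providers expose `displayName`, `models`, and `fetchModels()`; the constructor options and the per-sync `lines` record are illustrative, not the actual shape used in utils/model-sync.js.

```javascript
// Sketch only: periodic model sync with a per-provider timeout and bounded history.
class ModelSyncService {
  constructor(providers, { intervalMs = 3600000, fetchTimeoutMs = 15000 } = {}) {
    this.providers = providers;
    this.intervalMs = intervalMs;
    this.fetchTimeoutMs = fetchTimeoutMs;
    this.history = [];
    this.lastSync = null;
    this.timer = null;
  }

  start() {
    this.syncModels(); // initial sync, then repeat on the interval
    this.timer = setInterval(() => this.syncModels(), this.intervalMs);
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }

  async syncModels() {
    const lines = [];
    for (const provider of this.providers) {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('fetchModels timed out')), this.fetchTimeoutMs);
      });
      try {
        const models = await Promise.race([provider.fetchModels(), timeout]);
        if (Array.isArray(models) && models.length > 0) {
          const known = new Set(provider.models.map((m) => m.id));
          const added = models.filter((m) => !known.has(m.id)).length;
          lines.push(`${provider.displayName}: ${provider.models.length} → ${models.length} models (+${added} new)`);
          provider.models = models;
        }
      } catch (err) {
        lines.push(`${provider.displayName}: sync failed (${err.message})`);
      } finally {
        clearTimeout(timer);
      }
    }
    this.lastSync = new Date().toISOString();
    this.history.push({ at: this.lastSync, lines });
    if (this.history.length > 10) this.history.shift(); // keep the last 10 syncs
    return lines;
  }

  getStats() {
    return { lastSync: this.lastSync, interval: this.intervalMs, history: [...this.history] };
  }
}
```

A failed or empty fetch leaves `provider.models` untouched, so a temporary outage of a provider's models endpoint does not wipe the previously known list.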

Providers with Auto-Fetch

| Provider | fetchModels() | Source |
| --- | --- | --- |
| OpenRouter | Yes | GET https://openrouter.ai/api/v1/models (free models only) |
| Pollinations | Yes | GET https://text.pollinations.ai/models |
| Groq | Yes | GET https://api.groq.com/openai/v1/models |
| Others | No (returns static config) | config/providers.js |

Data Flow

Non-Streaming Chat Request

Client
  │
  │  POST /v1/chat/completions
  │  {model: "gpt-4o", messages: [...]}
  │
  ▼
API Gateway (index.js)
  │
  ├─ validateChatRequest() ──── 400 if invalid
  ├─ sanitizeMessages() ─────── strip null bytes
  │
  ├─ resolveModel("gpt-4o") ─── "gpt-4o" (no alias)
  │
  ▼
Provider Manager (provider-manager.js)
  │
  ├─ getProvidersForModel("gpt-4o")
  │   → [PuterProvider (P1, healthy), G4FProvider (P2, healthy)]
  │
  ├─ Sort by health + priority + latency
  │   → [PuterProvider, G4FProvider]
  │
  ├─ Try PuterProvider.chat(messages, {model: "gpt-4o"})
  │   ├─ Success → {result, provider: "puter", latency: 2340}
  │   └─ Failure → degrade to "degraded", try next
  │
  ▼
API Gateway (index.js)
  │
  ├─ Format as OpenAI response:
  │   {id, object: "chat.completion", model, choices, usage, _meta}
  │
  ▼
Client ← JSON response

Streaming Chat Request

Client
  │
  │  POST /v1/chat/completions
  │  {model: "gpt-4o", messages: [...], stream: true}
  │
  ▼
API Gateway (index.js)
  │
  ├─ Set SSE headers (Content-Type, Cache-Control, Connection)
  │
  ▼
Provider Manager (provider-manager.js)
  │
  ├─ chatStreamWithFailover("gpt-4o", messages)
  │   → PuterProvider.chatStream() → AsyncIterable
  │
  ▼
API Gateway (index.js) — Stream Loop
  │
  ├─ For await each chunk from providerStream:
  │   │
  │   ├─ If AsyncIterable (Puter, DuckDuckGo, Blackbox generators):
  │   │   └─ sendSSE(res, chunk)  →  "data: {json}\n\n"
  │   │
  │   ├─ If ReadableStream (Pollinations, OpenRouter, Groq fetch bodies):
  │   │   ├─ Read chunks via getReader()
  │   │   ├─ Parse SSE lines from buffer
  │   │   ├─ Extract content from delta
  │   │   └─ sendSSE(res, wrappedChunk)  →  "data: {json}\n\n"
  │   │
  │   └─ Continue until stream ends
  │
  ├─ sendSSEDone(res)  →  "data: [DONE]\n\n"
  └─ res.end()

Client ← SSE stream

Anthropic-Compatible Request

Client
  │
  │  POST /v1/messages
  │  {model: "claude-opus-4-5-latest", system: "...", messages: [...]}
  │
  ▼
API Gateway (index.js)
  │
  ├─ validateMessagesRequest() ──── 400 if invalid
  │
  ├─ Merge system + messages into unified array
  │
  ├─ resolveModel("claude-opus-4-5-latest") → "claude-opus-4-5-latest"
  │
  ▼
Provider Manager → Provider.chat(messages, {format: "anthropic"})
  │
  ▼
API Gateway
  │
  ├─ If result.type === "message" → pass through (native Anthropic format)
  ├─ Otherwise → format as Anthropic response:
  │   {id: "msg_...", type: "message", role: "assistant",
  │    content: [{type: "text", text: "..."}], model, stop_reason, usage, _meta}
  │
  ▼
Client ← Anthropic-format JSON
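
The two gateway-side conversions in this flow can be sketched as below. Several details are assumptions: the system prompt is treated as a plain string, the `msg_` ID suffix is generated from the clock, `usage` is filled with placeholder zeros, and the `_meta` field is omitted because its contents are not described here. The function names are hypothetical.

```javascript
// Sketch only: request-side merge and response-side formatting for /v1/messages.
function toUnifiedMessages({ system, messages }) {
  const unified = [];
  // Anthropic carries the system prompt outside `messages`; fold it back in.
  if (typeof system === 'string' && system.length > 0) {
    unified.push({ role: 'system', content: system });
  }
  return unified.concat(messages);
}

function toAnthropicResponse(result, model) {
  // Providers that already return an Anthropic-shaped message pass through as-is.
  if (result && result.type === 'message') return result;
  const text = result?.choices?.[0]?.message?.content ?? '';
  return {
    id: `msg_${Date.now().toString(36)}`, // assumption: any unique suffix will do
    type: 'message',
    role: 'assistant',
    content: [{ type: 'text', text }],
    model,
    stop_reason: 'end_turn',
    usage: { input_tokens: 0, output_tokens: 0 }, // placeholder values
  };
}
```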

Streaming Anthropic Request

Same flow as OpenAI streaming, but chunks are reformatted:

OpenAI chunk:
  {choices: [{delta: {content: "Hello"}}]}

  ↓ Converted to ↓

Anthropic SSE event:
  event: content_block_delta
  data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

Stream termination:

event: message_stop
data: {"type":"message_stop"}
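
The per-chunk conversion is small enough to show in full. The function names are hypothetical; the event names and payload shapes are the ones listed above. Chunks without text content (for example, role-only deltas) are skipped, which is an assumption about how the gateway treats them.

```javascript
// Sketch only: turn one OpenAI-style chunk into an Anthropic SSE frame.
function toAnthropicEvent(openaiChunk) {
  const text = openaiChunk?.choices?.[0]?.delta?.content;
  if (!text) return null; // nothing to forward for this chunk
  const data = { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text } };
  return `event: content_block_delta\ndata: ${JSON.stringify(data)}\n\n`;
}

// Frame that terminates the stream.
function anthropicStopEvent() {
  return 'event: message_stop\ndata: {"type":"message_stop"}\n\n';
}
```

Each frame ends with a blank line, which is what tells an SSE client that the event is complete.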

Provider Adapter Pattern

All providers follow the same interface defined by BaseProvider. This enables polymorphic routing — the Provider Manager can call any provider without knowing its implementation details.

Class Hierarchy

BaseProvider (abstract)
  │
  ├─ PuterProvider         ─── Puter.js SDK (puter.ai.chat)
  ├─ PollinationsProvider  ─── HTTP POST + SSE streaming
  ├─ DuckDuckGoProvider    ─── HTTP POST + VQD token + SSE
  ├─ OpenRouterProvider    ─── OpenAI API compatible
  ├─ GroqProvider          ─── OpenAI API compatible
  ├─ HuggingFaceProvider   ─── HF Inference API (single-turn)
  ├─ G4FProvider           ─── Python subprocess (g4f library)
  ├─ BlackboxProvider      ─── HTTP POST + custom response format
  └─ PhindProvider         ─── HTTP POST + SSE/JSON-line response

Adapter Implementation Patterns

Pattern 1: SDK-Based (Puter)

class PuterProvider extends BaseProvider {
  async chat(messages, options) {
    const response = await this.puter.ai.chat(messages, {model, stream: false});
    return this.formatOpenAIResponse(content, model);
  }

  async chatStream(messages, options) {
    return await this.puter.ai.chat(messages, {model, stream: true});
    // Returns AsyncIterable directly from SDK
  }
}

Pattern 2: OpenAI-Compatible HTTP (OpenRouter, Groq)

class OpenRouterProvider extends BaseProvider {
  async chat(messages, options = {}) {
    const response = await fetch(`${this.baseUrl}/chat/completions`, { ... });
    const data = await response.json();
    return data; // Already in OpenAI format
  }

  async chatStream(messages, options = {}) {
    // `stream: true` belongs in the JSON request body, not in the fetch() options
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: { ... },
      body: JSON.stringify({model: options.model, messages, stream: true}),
    });
    return response.body; // Returns ReadableStream
  }
}

Pattern 3: Custom HTTP + SSE Transform (DuckDuckGo, Blackbox)

class DuckDuckGoProvider extends BaseProvider {
  async chat(messages, options) {
    // Custom protocol with VQD token
    const vqd = await this._getVQD();
    const response = await fetch(`${baseUrl}/duckchat/v1/chat`, { headers: {'x-vqd-4': vqd} });
    const content = await this._collectSSE(response);
    return this.formatOpenAIResponse(content, model);
  }

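  async *chatStream(messages, options) {
    // Async generator that transforms provider SSE into OpenAI chunks
    const vqd = await this._getVQD();
    const response = await fetch(`${baseUrl}/duckchat/v1/chat`, { headers: {'x-vqd-4': vqd} });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const {done, value} = await reader.read();
      if (done) break;
      const content = decoder.decode(value, {stream: true}); // SSE line parsing elided
      yield {object: "chat.completion.chunk", choices: [{delta: {content}}]};
    }
  }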
}

Pattern 4: Subprocess (G4F)

class G4FProvider extends BaseProvider {
  async chat(messages, options) {
    const script = this._buildScript(messages, model);
    const {stdout} = await execFileAsync('python3', ['-c', script]);
    return this.formatOpenAIResponse(stdout, model);
  }

  async chatStream(messages, options) {
    // No native streaming — wrap non-streaming result
    const result = await this.chat(messages, options);
    async function* singleChunk() { yield result; }
    return singleChunk();
  }
}

Streaming Abstraction

The gateway handles two streaming return types:

| Return Type | Providers | Handling |
| --- | --- | --- |
| AsyncIterable | Puter, DuckDuckGo, Blackbox, Phind, HuggingFace, G4F | for await (const chunk of stream) |
| ReadableStream | Pollinations, OpenRouter, Groq | reader = stream.getReader() + line parsing |

The API Gateway normalizes both types into consistent OpenAI SSE format.
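
One way to picture that normalization is a single async generator that accepts either return type. `normalizeStream` and `parseSSELine` are hypothetical names; `sendSSE` and `sendSSEDone` are written to match the output formats quoted earlier, not copied from index.js. The sketch relies on the `ReadableStream`, `TextDecoder`, and `TextEncoder` globals available in Node.js 18 and later.

```javascript
// Sketch only: yield OpenAI-style chunks from either stream type.
async function* normalizeStream(stream) {
  // Check getReader first: in recent Node.js versions a ReadableStream is also async-iterable.
  if (typeof stream.getReader !== 'function') {
    yield* stream; // already an iterable of chunk objects
    return;
  }
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep a trailing partial line for the next read
      for (const line of lines) {
        const chunk = parseSSELine(line);
        if (chunk) yield chunk;
      }
    }
    const last = parseSSELine(buffer); // stream ended without a final newline
    if (last) yield last;
  } finally {
    reader.releaseLock();
  }
}

function parseSSELine(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith('data:')) return null;
  const payload = trimmed.slice(5).trim();
  if (!payload || payload === '[DONE]') return null;
  try { return JSON.parse(payload); } catch { return null; }
}

function sendSSE(res, chunk) { res.write(`data: ${JSON.stringify(chunk)}\n\n`); }
function sendSSEDone(res) { res.write('data: [DONE]\n\n'); }
```

The buffer handling is the important part: network reads can split an SSE line anywhere, so only complete lines are parsed and the remainder is carried over.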


Health Check System

Health States

| State | Description | Routing Behavior |
| --- | --- | --- |
| healthy | Provider responds correctly | Normal priority |
| unknown | Not yet checked or just started | Ranked below healthy but above degraded |
| degraded | Recent failure(s) | Deprioritized, still used as fallback |
| down | Multiple consecutive failures | Excluded from routing (last resort only) |

Health Check Methods by Provider

| Provider | Method | What It Checks |
| --- | --- | --- |
| Puter.js | puter.ai.chat('ping', {model: 'gpt-4o-mini'}) | SDK responds to chat |
| Pollinations | POST / with ping message | API responds with 200 |
| DuckDuckGo | _getVQD() token fetch | VQD token obtainable |
| OpenRouter | GET /models | API reachable |
| Groq | GET /models with auth | API key valid + reachable |
| HuggingFace | POST /models/{model} with "ping" | API key valid + model available |
| G4F | Check Python + g4f availability | Dependencies installed |
| Blackbox | POST /api/chat with minimal request | API responds with 200 |
| Phind | HEAD / on base URL | Site reachable |

Health Degradation

Health status degrades progressively on failures:

healthy  ──(1 failure)──►  degraded  ──(2nd failure)──►  down

And recovers on success:

down  ──(success)──►  healthy
degraded  ──(success)──►  healthy

Periodic Health Checks

  • Interval: Configurable via HEALTH_CHECK_INTERVAL_MS (default: 60 seconds)
  • Timeout: 10 seconds per provider check
  • Parallelism: All providers checked in parallel via Promise.allSettled
  • Logging: "Health check: 7/9 providers healthy"

Failover Strategy

Failover Flow

Request arrives for model "gpt-4o"
  │
  ├─ Providers: [Puter(P1, healthy), G4F(P2, healthy)]
  │
  ├─ Try 1: Puter.chat()
  │   ├─ Success → return result
  │   └─ Error → Puter.healthStatus = "degraded"
  │              log: "puter failed for gpt-4o: timeout, trying next..."
  │
  ├─ Try 2: G4F.chat()
  │   ├─ Success → return result
  │   └─ Error → G4F.healthStatus degrades
  │              log: "g4f failed for gpt-4o: error, trying next..."
  │
  └─ All failed → throw "All providers failed for model: gpt-4o"

Streaming Failover

For streaming requests, failover works at the connection establishment phase only:

chatStreamWithFailover(modelId, messages)
  │
  ├─ Try provider.chatStream() — if connection fails, try next
  │
  └─ Once stream is established, errors are sent as SSE events
      (cannot failover mid-stream)

Provider Priority Tiers

| Priority | Providers | Rationale |
| --- | --- | --- |
| P1 (highest) | Puter, Pollinations, DuckDuckGo, OpenRouter | Free, no API key required, reliable |
| P2 | Groq, HuggingFace, G4F | Requires API key or Python dependency |
| P3 (lowest) | Blackbox, Phind | Reverse-engineered APIs, less stable |

Round-Robin Algorithm

When multiple providers at the same priority level support the same model and are healthy, the round-robin algorithm distributes requests evenly.

Implementation

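_roundRobin(modelId, providers) {
  if (providers.length === 0) return undefined; // nothing to rotate through
  const key = modelId;                           // rotation state is tracked per model ID
  const current = this.roundRobinIndex.get(key) || 0;
  const selected = current % providers.length;   // wrap around the candidate list
  this.roundRobinIndex.set(key, selected + 1);   // next call starts one position later
  return providers[selected];
}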

Example

For model gpt-4o-mini supported by Puter (P1), Pollinations (P1), and DuckDuckGo (P1):

| Request | Round-Robin Index | Selected Provider |
| --- | --- | --- |
| 1st | 0 % 3 = 0 | Puter |
| 2nd | 1 % 3 = 1 | Pollinations |
| 3rd | 2 % 3 = 2 | DuckDuckGo |
| 4th | 3 % 3 = 0 | Puter |
| 5th | 1 % 3 = 1 | Pollinations |
The round-robin state is kept per model ID in a Map, so each model rotates through its own providers independently of traffic for other models.


Model Resolution Pipeline

The complete pipeline from user request to provider selection:

User Request: model = "gpt4"
  │
  ▼
Step 1: Alias Resolution (resolveModel)
  "gpt4" → MODEL_ALIASES["gpt4"] → "gpt-4o"
  │
  ▼
Step 2: Provider Matching (getProvidersForModel)
  "gpt-4o" → [PuterProvider, G4FProvider]
  │
  ▼
Step 3: Health Filtering
  Filter out "down" providers
  │
  ▼
Step 4: Priority Sorting
  Sort by: health > priority > latency
  [Puter(P1, healthy), G4F(P2, healthy)]
  │
  ▼
Step 5: Round-Robin (if multiple P1 healthy providers)
  Select next provider in rotation
  │
  ▼
Step 6: Execute with Failover
  Try selected provider → fallback to next on failure

Configuration System

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| PORT | 3333 | Server listening port |
| CORS_ORIGIN | * | Allowed CORS origin(s) |
| API_KEY | (none) | Optional API key for authentication |
| NODE_ENV | development | Environment mode (sanitizes errors in production) |
| RATELIMIT_WINDOW_MS | 60000 | Rate limit window in milliseconds |
| RATELIMIT_MAX_REQUESTS | 100 | Max requests per window per IP |
| HEALTH_CHECK_INTERVAL_MS | 60000 | Provider health check interval |
| MODEL_SYNC_INTERVAL_MS | 3600000 | Model sync interval (1 hour) |
| PUTER_AUTH_TOKEN | (none) | Puter.js authentication token (optional) |
| GROQ_API_KEY | (none) | Groq API key (required for Groq) |
| HUGGINGFACE_API_KEY | (none) | HuggingFace API key (required for HF) |
| OPENROUTER_API_KEY | (none) | OpenRouter API key (optional, enables paid models) |

Provider Disable Flags

Each provider can be individually disabled:

| Variable | Effect |
| --- | --- |
| DISABLE_PUTER=true | Disable Puter.js provider |
| DISABLE_POLLINATIONS=true | Disable Pollinations provider |
| DISABLE_DUCKDUCKGO=true | Disable DuckDuckGo provider |
| DISABLE_OPENROUTER=true | Disable OpenRouter provider |
| DISABLE_GROQ=true | Disable Groq provider |
| DISABLE_HUGGINGFACE=true | Disable HuggingFace provider |
| DISABLE_G4F=true | Disable G4F provider |
| DISABLE_BLACKBOX=true | Disable Blackbox provider |
| DISABLE_PHIND=true | Disable Phind provider |
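
A sketch of how a registry might apply these flags at startup. The provider name list and the `enabledProviders` helper are illustrative, and treating only the exact string `true` as "disabled" is an assumption; the real check in providers/index.js may be more lenient.

```javascript
// Sketch only: derive the enabled provider list from DISABLE_* variables.
const PROVIDER_NAMES = [
  'puter', 'pollinations', 'duckduckgo', 'openrouter', 'groq',
  'huggingface', 'g4f', 'blackbox', 'phind',
];

function enabledProviders(env = process.env) {
  // Assumption: a provider is disabled only when its flag is exactly "true".
  return PROVIDER_NAMES.filter((name) => env[`DISABLE_${name.toUpperCase()}`] !== 'true');
}
```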

Example .env

PORT=3333
NODE_ENV=production
API_KEY=my-secret-gateway-key

# Provider keys (optional)
PUTER_AUTH_TOKEN=puter_token_here
GROQ_API_KEY=gsk_abc123...
HUGGINGFACE_API_KEY=hf_xyz789...
OPENROUTER_API_KEY=sk-or-...

# Rate limiting
RATELIMIT_WINDOW_MS=60000
RATELIMIT_MAX_REQUESTS=100

# Health & sync
HEALTH_CHECK_INTERVAL_MS=60000
MODEL_SYNC_INTERVAL_MS=3600000

# Disable unstable providers
DISABLE_BLACKBOX=true
DISABLE_PHIND=true

# CORS
CORS_ORIGIN=https://myapp.example.com
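
For reference, the non-secret settings can be read with their documented defaults in one place. The `loadConfig` function and the shape of the returned object are hypothetical; only the variable names and default values come from the tables above.

```javascript
// Sketch only: read environment variables, falling back to the documented defaults.
function loadConfig(env = process.env) {
  const int = (name, fallback) => {
    const n = Number.parseInt(env[name], 10);
    return Number.isFinite(n) ? n : fallback; // missing or malformed -> default
  };
  return {
    port: int('PORT', 3333),
    corsOrigin: env.CORS_ORIGIN || '*',
    apiKey: env.API_KEY || null,
    nodeEnv: env.NODE_ENV || 'development',
    rateLimit: {
      windowMs: int('RATELIMIT_WINDOW_MS', 60000),
      maxRequests: int('RATELIMIT_MAX_REQUESTS', 100),
    },
    healthCheckIntervalMs: int('HEALTH_CHECK_INTERVAL_MS', 60000),
    modelSyncIntervalMs: int('MODEL_SYNC_INTERVAL_MS', 3600000),
  };
}
```

Parsing the numeric values once at startup avoids subtle bugs from comparing strings, since every environment variable arrives as a string.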

Project Structure

ProxyGateLLM/
├── index.js                    # Layer 1: Express server, endpoints
├── router.js                   # Layer 2: Model resolution, auto-routing
├── middleware.js                # Rate limiting, validation, auth
├── globals.js                  # Browser API polyfills for Node.js
│
├── config/
│   └── providers.js            # Provider config, model definitions, aliases
│
├── providers/
│   ├── base.js                 # BaseProvider abstract class
│   ├── index.js                # ProviderRegistry singleton
│   ├── puter.js                # Puter.js SDK adapter
│   ├── pollinations.js         # Pollinations AI adapter
│   ├── duckduckgo.js           # DuckDuckGo AI adapter
│   ├── openrouter.js           # OpenRouter adapter
│   ├── groq.js                 # Groq adapter
│   ├── huggingface.js          # HuggingFace adapter
│   ├── g4f.js                  # G4F/FreeGPT adapter
│   ├── blackbox.js             # Blackbox AI adapter
│   └── phind.js                # Phind adapter
│
├── utils/
│   ├── provider-manager.js     # Layer 3: Routing, failover, health
│   └── model-sync.js           # Layer 5: Model auto-fetch service
│
├── dashboard/
│   ├── index.html              # PWA dashboard
│   └── manifest.json           # PWA manifest
│
├── agent/
│   └── index.js                # AI agent module
│
├── polyfills.js                # Additional polyfills
├── preload.js                  # Module preload
├── client.js                   # Puter.js client wrapper
├── cli.mjs                     # CLI interface
│
├── package.json                # Dependencies & scripts
├── .env                        # Environment configuration
├── setup.sh                    # Installation script
├── start.sh                    # Startup script
└── puter-proxy.service         # systemd service file

Deployment

Development

npm install
npm run dev     # node --watch index.js (auto-restart on changes)

Production

npm install
npm start       # node index.js

Systemd (Linux)

sudo cp puter-proxy.service /etc/systemd/system/
sudo systemctl enable puter-proxy
sudo systemctl start puter-proxy

# View logs
sudo journalctl -u puter-proxy -f

# Restart
sudo systemctl restart puter-proxy

Process Management

The gateway handles graceful shutdown:

SIGTERM / SIGINT
  │
  ├─ Stop Provider Manager (clear health check interval)
  ├─ Stop Model Sync Service (clear sync interval)
  └─ process.exit(0)
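
The shutdown sequence can be sketched as a small registration helper. `registerShutdown` is a hypothetical name, and the process object is passed in as a parameter so the sketch can be exercised without terminating the real process.

```javascript
// Sketch only: stop background timers, then exit, on SIGTERM or SIGINT.
function registerShutdown(proc, providerManager, modelSync) {
  let shuttingDown = false;
  const shutdown = () => {
    if (shuttingDown) return; // ignore a second signal while shutting down
    shuttingDown = true;
    providerManager.stop(); // clears the health check interval
    modelSync.stop();       // clears the model sync interval
    proc.exit(0);
  };
  proc.on('SIGTERM', shutdown);
  proc.on('SIGINT', shutdown);
  return shutdown;
}
```

In the gateway this would be called once at startup as `registerShutdown(process, providerManager, modelSync)`, where the last two arguments stand for the Layer 3 and Layer 5 service instances.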