A lightweight, efficient proxy service that provides free and unlimited access to DeepInfra's AI models through their OpenAI-compatible API.
- 🆓 Free & Unlimited - Access DeepInfra models without rate limits or costs
- 🔄 Auto-rotating proxies - Uses a pool of public proxies that automatically refreshes
- 🛡️ Optional API key authentication - Secure your instance when needed
- 📊 Interactive Swagger UI - Easy-to-use API documentation
- 🔍 Model availability checks - Only exposes models that are actually accessible
- ⚡ Streaming support - Full support for streaming responses
- 🔄 OpenAI-compatible API - Drop-in replacement for OpenAI API clients
- 📋 OpenAI-compatible /v1/models endpoint - Standard models listing endpoint
- 🏷️ Model metadata - Enhanced model information with type categorization
- Go 1.20 or higher
- Docker (optional, for containerized deployment)
```bash
# Pull the Docker image from GitHub Container Registry
docker pull ghcr.io/metimol/deepinfra-wrapper:latest

# Run the container
docker run -p 8080:8080 ghcr.io/metimol/deepinfra-wrapper:latest
```

```bash
# Build the Docker image
docker build -t deepinfra-proxy .
# Run the container
docker run -p 8080:8080 deepinfra-proxy
```

```bash
# Download dependencies
go mod download
# Build the application
go build -o deepinfra-proxy .
# Run the application
./deepinfra-proxy
```

You can enable API key authentication by setting the `API_KEY` environment variable:

```bash
# With Docker
docker run -p 8080:8080 -e API_KEY=your-secret-key deepinfra-proxy
# Without Docker
API_KEY=your-secret-key ./deepinfra-proxy
```

When API key authentication is enabled, clients must include the API key in the `Authorization` header:

```
Authorization: Bearer your-secret-key
```
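For example, with Go's standard `net/http` client (a sketch; the host, model name, and key below are placeholders for your own deployment):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{"model": "meta-llama/Llama-2-70b-chat-hf", "messages": [{"role": "user", "content": "Hello"}]}`)

	req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer your-secret-key") // must match the API_KEY you configured

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```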
POST /v1/chat/completions
Example request:
```json
{
  "model": "meta-llama/Llama-2-70b-chat-hf",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke about programming"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
```

GET /v1/models
Returns a list of all available models in OpenAI-compatible format. This endpoint follows the official OpenAI API specification and works with any OpenAI-compatible client.
Example response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/Llama-2-70b-chat-hf",
      "object": "model",
      "created": 1677610602,
      "owned_by": "deepinfra"
    },
    {
      "id": "mistralai/Mixtral-8x7B-Instruct-v0.1",
      "object": "model",
      "created": 1677610602,
      "owned_by": "deepinfra"
    }
  ]
}
```

GET /models
Returns a simple array of model names. This endpoint is maintained for backward compatibility.
Example response:
```json
[
  "meta-llama/Llama-2-70b-chat-hf",
  "mistralai/Mixtral-8x7B-Instruct-v0.1"
]
```

GET /docs
Interactive Swagger UI documentation for exploring and testing the API endpoints.
GET /openapi.json
OpenAPI specification document that can be imported into API tools.
| Variable | Description | Default |
|---|---|---|
| `API_KEY` | Secret key for API authentication | None (authentication disabled) |
| `PORT` | Port to run the server on | `8080` |
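For reference, here is a minimal sketch of how this kind of configuration is typically read in Go. The `getenvDefault` helper is illustrative, not the project's actual code:

```go
package main

import (
	"fmt"
	"os"
)

// getenvDefault returns the value of an environment variable,
// or a fallback when it is unset. Illustrative helper only.
func getenvDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	port := getenvDefault("PORT", "8080")
	apiKey := os.Getenv("API_KEY") // empty string means authentication is disabled

	fmt.Println("port:", port)
	fmt.Println("auth enabled:", apiKey != "")
}
```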
- The proxy fetches and maintains a list of working public proxies
- It regularly checks which DeepInfra models are accessible and caches this list
- When a request comes in, it routes the request through one of the working proxies to DeepInfra
- If a proxy fails, it's automatically removed from the rotation
- New proxies are regularly added to the pool to ensure reliability (a simplified sketch of this rotation logic appears below)
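The rotation logic can be pictured with a sketch like the following. The `ProxyPool` type and its methods are hypothetical, shown only to illustrate the pick/remove/add cycle described above:

```go
package proxypool

import (
	"errors"
	"math/rand"
	"sync"
)

// ProxyPool is a hypothetical rotating pool of proxy URLs.
type ProxyPool struct {
	mu      sync.Mutex
	proxies []string
}

// Pick returns a random working proxy from the pool.
func (p *ProxyPool) Pick() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.proxies) == 0 {
		return "", errors.New("no working proxies")
	}
	return p.proxies[rand.Intn(len(p.proxies))], nil
}

// Remove drops a proxy that failed a request, taking it out of rotation.
func (p *ProxyPool) Remove(bad string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i, proxy := range p.proxies {
		if proxy == bad {
			p.proxies = append(p.proxies[:i], p.proxies[i+1:]...)
			return
		}
	}
}

// Add appends freshly fetched proxies to the pool.
func (p *ProxyPool) Add(fresh ...string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.proxies = append(p.proxies, fresh...)
}
```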
This service is fully compatible with the OpenAI API specification. You can use any OpenAI-compatible client library or tool by simply changing the base URL to point to your DeepInfra Wrapper instance.
- `POST /v1/chat/completions` - Chat completions (matches OpenAI API)
- `GET /v1/models` - List available models (matches OpenAI API format)
- ✅ Chat completions
- ✅ Streaming responses (see the sketch below this list)
- ✅ Model listing
- ✅ Temperature and max_tokens parameters
- ✅ Message history and conversation context
- ✅ System messages
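Since the endpoint follows the OpenAI convention, streamed responses arrive as server-sent `data:` lines terminated by `data: [DONE]`. A minimal consumption sketch with Go's standard library, assuming the server runs on localhost:8080 with no API key set:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	body := []byte(`{
		"model": "meta-llama/Llama-2-70b-chat-hf",
		"messages": [{"role": "user", "content": "Tell me a joke"}],
		"stream": true
	}`)

	resp, err := http.Post(
		"http://localhost:8080/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Each event arrives as a "data: {...}" line; "data: [DONE]" ends the stream.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		fmt.Println(payload) // raw JSON chunk; parse "choices[0].delta.content" as needed
	}
}
```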
The service automatically categorizes models by type (a rough name-matching sketch follows this list):
- Text Generation: LLaMA, GPT, Claude, Mistral, DeepSeek, Qwen models
- Audio: Whisper models for speech recognition
- Image: Stable Diffusion, SDXL models for image generation
- Embedding: Text embedding models
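One plausible way to implement this kind of name-based categorization is a simple substring match on the model ID. This is purely illustrative; the service's actual matching rules may differ:

```go
package models

import "strings"

// TypeOf guesses a model's category from its identifier.
// Purely illustrative; the service's real rules may differ.
func TypeOf(id string) string {
	lower := strings.ToLower(id)
	switch {
	case strings.Contains(lower, "whisper"):
		return "audio"
	case strings.Contains(lower, "stable-diffusion"), strings.Contains(lower, "sdxl"):
		return "image"
	case strings.Contains(lower, "embed"):
		return "embedding"
	default:
		return "text-generation" // LLaMA, Mistral, DeepSeek, Qwen, etc.
	}
}
```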
```bash
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you today?"
      }
    ]
  }'
```

```bash
# OpenAI-compatible format
curl "http://localhost:8080/v1/models"
# Legacy format
curl "http://localhost:8080/models"from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1/",
api_key="your-api-key" # Only needed if API_KEY is set
)
# List available models
models = client.models.list()
print("Available models:")
for model in models.data:
print(f"- {model.id} (owned by {model.owned_by})")
# Chat completion
response = client.chat.completions.create(
model="meta-llama/Llama-2-70b-chat-hf",
messages=[
{"role": "user", "content": "What's the capital of France?"}
]
)
print(response.choices[0].message.content)import { OpenAI } from "openai";
const openai = new OpenAI({
  baseURL: "http://localhost:8080/v1/",
  apiKey: "your-api-key", // Only needed if API_KEY is set
});

async function main() {
  const response = await openai.chat.completions.create({
    model: "meta-llama/Llama-2-70b-chat-hf",
    messages: [
      { role: "user", content: "Explain quantum computing in simple terms" }
    ],
  });
  console.log(response.choices[0].message.content);
}

main();
```

- The service depends on the availability of public proxies
- Response times may vary based on proxy performance
- Some models might become temporarily unavailable (a simple client-side retry, sketched below, can smooth over transient failures)
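If transient failures matter for your use case, a small client-side retry loop helps. A hedged sketch, where the attempt count and linear backoff are arbitrary choices rather than project behavior:

```go
package client

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// postWithRetry retries a request a few times with a short backoff,
// which helps when an upstream proxy in the pool happens to fail.
func postWithRetry(url string, body []byte, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(body))
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("server returned %s", resp.Status)
		}
		time.Sleep(time.Duration(i+1) * time.Second) // linear backoff
	}
	return nil, lastErr
}
```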
Contributions are welcome! Please feel free to submit a Pull Request.