Welcome to your comprehensive guide for selecting the perfect Large Language Model (LLM) or API for your projects! Whether you're looking for free options, self-hostable solutions, or powerful paid APIs, we've got you covered. Let's dive in! π»π
These APIs offer free access to powerful LLMs with some limitations on usage volume or features.
| Name | Description | Link | Limitations |
|---|---|---|---|
| OpenRouter | Access to multiple LLMs including DeepHermes 3 and Llama 3 | OpenRouter | 20 requests/minute, 200 requests/day |
| DeepSeek R1 | Family of models with various sizes and distillations | DeepSeek | Rate limits vary by model |
| Dolphin 3.0 Mistral 24B | Implementation of the Mistral model | Dolphin | Rate limits apply |
| Gemini 2.0 Flash Lite Preview | Google's experimental AI model | Google AI Studio | Data usage restrictions outside UK/CH/EEA/EU |
| Mistral (La Plateforme) | Free tier requires opting into data training | Mistral | Requires phone number verification |
| HuggingFace Serverless Inference | Inference for various open models | HuggingFace | Limited to models smaller than 10GB |
| Cerebras | Offers Llama 3.1 8B model | Cerebras | Free tier restricted to 8K context |
| Together AI | Platform for developing and deploying AI models | Together AI | Rate limits apply |
Run these models locally on consumer-grade hardware like the RTX 4090.
| Model Name | Description | Strengths | Parameter Size | License | Quantization Support | Memory Requirements | Link |
|---|---|---|---|---|---|---|---|
| LLaMA 2 | Meta's open-source language model | Strong general-purpose performance, multilingual capabilities | 7B, 13B, 70B | Custom (non-commercial) | Yes | 7B: 10GB VRAM, 13B: 20GB VRAM | Meta LLaMA 2 |
| Mistral | Efficient architecture with strong performance | High efficiency, good reasoning capabilities | 7B, 70B | Apache 2.0 | Yes | 7B: 10GB VRAM, 70B: 40GB VRAM | Mistral AI |
| Falcon | Open-source model from Technology Innovation Institute | Strong coding and reasoning abilities | 7B, 40B | Apache 2.0 | Yes | 7B: 10GB VRAM, 40B: 35GB VRAM | Falcon LLM |
| Qwen | Alibaba Cloud's open-source model | Strong multilingual support and reasoning | 7B, 14B | Apache 2.0 | Yes | 7B: 10GB VRAM, 14B: 20GB VRAM | Qwen AI |
| OpenAssistant | Community-developed conversational model | Strong dialogue capabilities | 7B, 12B | Apache 2.0 | Yes | 7B: 10GB VRAM, 12B: 25GB VRAM | OpenAssistant |
| RedPajama | Open-source model based on LLaMA architecture | Good performance with efficient training | 3B, 7B | Apache 2.0 | Yes | 3B: 8GB VRAM, 7B: 15GB VRAM | RedPajama |
| Phi-4 | Microsoft's open-source model | Strong reasoning and mathematical abilities | 14B | MIT | Yes | 20GB VRAM | Microsoft Phi |
| GPT4All | Collection of models optimized for local deployment | Easy to deploy, good performance on consumer hardware | 3B, 7B | Varies by model | Yes | 3B: 8GB VRAM, 7B: 15GB VRAM | GPT4All |
| h2oGPT | H2O.ai's open-source model with RAG capabilities | Strong for document understanding and retrieval | 7B, 13B | Apache 2.0 | Yes | 7B: 10GB VRAM, 13B: 20GB VRAM | h2oGPT |
These APIs offer powerful LLM capabilities with flexible pricing models.
| Model | Provider | Input Price ($/million tokens) | Output Price ($/million tokens) | Context Length | Max Output Tokens | Link |
|---|---|---|---|---|---|---|
| gpt-4.5 | OpenAI | 75.00 | 150.00 | 128k | 16,384 | OpenAI |
| gpt-4o | OpenAI | 2.50 | 10.00 | 128k | 16,384 | OpenAI |
| o1-2024-12-17 | OpenAI | 15.00 | 60.00 | 200k | 100,000 | OpenAI |
| o3-mini-high | OpenAI | 1.10 | 4.40 | 200k | 100,000 | OpenAI |
| Claude 3.7 Sonnet | Anthropic | 3.00 | 15.00 | 200k | 128,000 | Anthropic |
| DeepSeek-R1 | DeepSeek | 0.55 | 2.19 | 64k | 8,000 | DeepSeek |
| DeepSeek-V3 | DeepSeek | 0.27 | 1.10 | 64k | 8,000 | DeepSeek |
| Gemini-2.0-Flash-001 | 0.10 | 0.40 | 1,000k | 8,192 | Google AI | |
| Qwen2.5-Max | Alibaba | 1.60 | 6.40 | 32k | 8,192 | Qwen AI |
| Qwen-Plus-0125 | Alibaba | 0.40 | 1.20 | 131k | 8,192 | Qwen AI |
- Budget: Determine if you need a free solution or if you're willing to pay for enhanced capabilities.
- Use Case: Match the model's strengths to your specific needs (e.g., coding, multilingual support, reasoning).
- Hardware: If self-hosting, ensure your hardware meets the model's memory requirements.
- Scalability: Consider if you need to scale usage up or down based on demand.
- Integration: Check if the API or model works with your existing systems and workflows.
- OpenLLM: Simplifies running open-source LLMs with API endpoints
- llama.cpp: Optimized C/C++ implementation for running models on CPU/GPU
- vLLM: High-performance inference engine for LLMs
- Text Generation Inference (TGI): Fast inference solution from Hugging Face
If you know of other free LLM APIs or self-hostable models that should be included in this list, please submit a pull request with the relevant information.
This repository is licensed under the MIT License.