Compare the costs of running LLM inference on local hardware vs cloud GPU rental vs API providers.
A decision-making tool for compute purchases. Demand for AI compute is growing faster than supply, and these markets are becoming more complex. The goal is to help developers make informed decisions using real benchmark data rather than vibes and vendor marketing. See VISION.md for more on where this is heading.
- Local Hardware: Mac Studio M3 Ultra (96GB–512GB), NVIDIA DGX Spark
- Cloud GPU Rental: RunPod, Denvr, Lambda, GCP, AWS (H100s)
- API Providers: Groq, Together.ai, Fireworks, DeepInfra, OpenAI, Moonshot
- Models by Developer:
  - OpenAI: gpt-oss-20b, gpt-oss-120b
  - Meta: Llama 3.1 8B, 70B, 405B
  - DeepSeek: DeepSeek Coder 33B, DeepSeek V3
  - Alibaba: Qwen2.5 7B, 32B, 72B
  - Moonshot: Kimi K2
  - Defog: SQLCoder 7B, 34B, 70B
- Calculations: Daily/monthly costs, payoff period for hardware investment
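
The payoff period comes down to simple arithmetic: the hardware price divided by what you save each day by not paying for rental or API usage. A minimal sketch, assuming hypothetical function names (`monthlyCost`, `payoffDays`) and a flat 30-day month. This is not the code in `src/lib/calculations.js`, and the $20/day and $1/day figures in the usage line are illustrative only.

```javascript
// Hypothetical helpers; the names and the 30-day month are assumptions.

// Monthly cost from a daily cost, using a flat 30-day month.
function monthlyCost(dailyCostUsd) {
  return dailyCostUsd * 30;
}

// Days until a hardware purchase pays for itself.
//   hardwarePriceUsd:    upfront price of the machine
//   dailyAlternativeUsd: what the same workload costs per day via rental or API
//   dailyRunningUsd:     daily cost of running the hardware (e.g. electricity)
function payoffDays(hardwarePriceUsd, dailyAlternativeUsd, dailyRunningUsd = 0) {
  const dailySavings = dailyAlternativeUsd - dailyRunningUsd;
  // With no savings, the purchase never pays off.
  if (dailySavings <= 0) return Infinity;
  return hardwarePriceUsd / dailySavings;
}

// Illustrative: a $3,999 machine replacing $20/day of API spend,
// with $1/day in electricity.
console.log(payoffDays(3999, 20, 1).toFixed(1)); // 210.5
```

Returning `Infinity` when there are no savings keeps the function pure and lets the caller decide how to display a purchase that never breaks even.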
# Copy config template
cp astro.config.example.mjs astro.config.mjs
# Install dependencies
npm install
# Run development server
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview

- Fork/clone this repository
- Update `astro.config.mjs`:
  - Change `site` to your GitHub Pages URL
  - Change `base` to your repository name
- Enable GitHub Pages in repository settings:
  - Go to Settings → Pages
  - Source: GitHub Actions
- Push to the `main` branch; deployment is automatic
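
For reference, `site` and `base` are standard Astro config options. A minimal fragment, assuming a placeholder GitHub username and repository name (both hypothetical). Only the two fields to change are shown, so keep the rest of the settings from `astro.config.example.mjs` as they are.

```javascript
// astro.config.mjs (fragment; only the two fields to change are shown)
import { defineConfig } from "astro/config";

export default defineConfig({
  // Placeholder: your GitHub Pages origin
  site: "https://your-username.github.io",
  // Placeholder: your repository name, with a leading slash
  base: "/your-repo-name",
});
```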
All benchmarks use Q4_K_M quantization (unless noted) at batch size 1 and report decode speed, the figure that matters most for interactive use.
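
Decode speed at batch size 1 sets a ceiling on how many tokens a single stream can generate in a day. A rough sketch, assuming a hypothetical helper name and a `utilization` parameter that is not part of the original tool. The 30 tokens/s in the usage line is a made-up value, not a benchmark result.

```javascript
// Hypothetical helper: upper bound on tokens generated per day by one
// batch-size-1 stream at a given decode speed.
const SECONDS_PER_DAY = 24 * 60 * 60;

function maxTokensPerDay(decodeTokensPerSecond, utilization = 1) {
  // utilization is the fraction of the day the stream is actually generating.
  if (utilization < 0 || utilization > 1) {
    throw new RangeError("utilization must be between 0 and 1");
  }
  return decodeTokensPerSecond * SECONDS_PER_DAY * utilization;
}

// Made-up decode speed of 30 tokens/s, busy 25% of the day.
console.log(maxTokensPerDay(30, 0.25)); // 648000
```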
| Source | Description |
|---|---|
| llama.cpp M-series Discussion | Apple Silicon benchmarks |
| llama.cpp DGX Spark Discussion | NVIDIA DGX Spark benchmarks |
| dlewis.io H100 Evaluation | Llama 3.3 70B on H100 vs A100 |
| VALDI H100 Docs | Llama 3.1 inference testing |
| Hardware Corner DeepSeek | DeepSeek V3 on Mac Studio |
| MacRumors DeepSeek R1 | DeepSeek R1 on M3 Ultra |
| NVIDIA gpt-oss Blog | gpt-oss acceleration |
| OpenAI gpt-oss Intro | gpt-oss model specs |
| Moonshot Kimi K2 | Kimi K2 specifications |
- RunPod: $1.99/hr
- Denvr: $2.10/hr
- Lambda: $2.99/hr
- GCP: $3.00/hr
- AWS: $3.90/hr
- Mac Studio M3 Ultra: Apple.ca CAD pricing converted at 0.72 USD/CAD
- NVIDIA DGX Spark: $3,999 USD
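
To compare a rental price against per-token API pricing, it helps to convert it into a cost per million generated tokens. A minimal sketch, assuming the cloud prices above are per GPU-hour and using a hypothetical helper name. The 50 tokens/s throughput in the usage line is an illustrative placeholder, not a measured figure.

```javascript
// Hypothetical helper: USD per one million generated tokens when renting
// a GPU by the hour at a sustained decode throughput.
function rentalCostPerMillionTokens(hourlyRateUsd, tokensPerSecond) {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  const tokensPerHour = tokensPerSecond * 3600;
  return (hourlyRateUsd / tokensPerHour) * 1_000_000;
}

// RunPod's $1.99 rate at an assumed 50 tokens/s.
console.log(rentalCostPerMillionTokens(1.99, 50).toFixed(2)); // 11.06
```

This assumes the GPU is busy for the whole billed hour; idle time raises the effective cost per token.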
- Astro — Static site generator
- React — Interactive calculator component
- Tailwind CSS — Styling
src/
├── components/
│ └── PayoffCalculator.jsx # Main interactive component
├── data/
│ ├── models.json # Model specs by developer
│ ├── hardware.json # Hardware pricing
│ ├── cloud-providers.json # GPU rental pricing
│ └── api-providers.json # API pricing by model
├── lib/
│ └── calculations.js # Pure calculation functions
└── pages/
└── index.astro # Main page
Corrections and updates welcome! The data in this tool will get stale as prices change and new hardware ships. Please open an issue or PR if you spot outdated information.
MIT