Automated deployment script to run your private, self-hosted LLM inference server on Akamai Cloud GPU instances. Pre-configured with OpenAI's gpt-oss-120b (a 120B-parameter open-source model), the most intelligent American open-weights model. Get vLLM and Open-WebUI up and running in minutes with a single command.
gpt-oss-120b is OpenAI's flagship open-source model with 116.8B total parameters (5.1B active per token via its MoE architecture). It achieves an Intelligence Index score of 58 on the Artificial Analysis benchmark, placing it in the top tier of open-weights models available.
Hardware requirements: This deployment uses 4x RTX 4000 Ada GPUs (20GB VRAM each, 80GB total) to accommodate the model size (~69GB total, ~17GB per GPU with tensor parallelism) and support the full 128K token context length with FP8 KV cache.
Key advantages:
- State-of-the-art performance: Achieves 90% on MMLU, 90% on MMLU-Pro, 80.9% on GPQA (PhD-level science), and 97.9% on AIME 2025 math benchmarks
- Near o4-mini parity: Matches or exceeds OpenAI o4-mini on competition coding (Codeforces), general problem solving, and tool calling
- Efficient architecture: MoE design with only 5.1B active parameters per token enables high throughput despite large total parameter count
- Production-ready: Released under Apache 2.0 license, instruction-tuned for reliable, high-quality responses out of the box
- Multi-GPU optimized: Tensor parallelism across 4x RTX 4000 Ada GPUs for optimal inference performance
Check out these other quickstart repositories:
| Model | Parameters | Description | Repository |
|---|---|---|---|
| GPT-OSS-120B | 120B | OpenAI's flagship open-source model (this repo) | ai-quickstart-gpt-oss-120b |
| GPT-OSS-20B | 20B | Compact open-source GPT model | ai-quickstart-gpt-oss-20b |
| Qwen3-14B-FP8 | 14B | Qwen3 with FP8 quantization | ai-quickstart-qwen3-14b-fp8 |
| NVIDIA Nemotron Nano 9B v2 | 9B | NVIDIA's efficient Nemotron model | ai-quickstart-nvidia-nemotron-nano-9b-v2 |
Just run this single command:
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-gpt-oss-120b/main/deploy.sh | bash
That's it! The script will download required files and guide you through the interactive deployment process.
- Fully Automated Deployment: handles instance creation with real-time progress tracking
- Basic AI Stack: vLLM for LLM inference with pre-loaded model and Open-WebUI for chat interface
- Cross-Platform Support: Works on macOS and Windows (Git Bash/WSL)
- Ubuntu 24.04 LTS with NVIDIA drivers
- Docker & NVIDIA Container Toolkit
- Systemd service for automatic startup on reboot
- Active Linode account with GPU access enabled
- Required: bash, curl, ssh, jq
- Note: jq will be auto-installed if missing
No installation required - just run:
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-gpt-oss-120b/main/deploy.sh | bash
Or download the script and run it locally:
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-gpt-oss-120b/main/deploy.sh
bash deploy.sh
If you prefer to inspect or customize the scripts:
git clone https://github.com/linode/ai-quickstart-gpt-oss-120b
cd ai-quickstart-gpt-oss-120b
./deploy.sh
Note: If you'd like to add more services, check out the Docker Compose template file:
vi setup/docker-compose.yml
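As an illustration, an extra container can be added under the existing `services:` key of that file. The Watchtower service below is purely an example and not part of the shipped stack; the existing service names and settings in the template may differ:

```yaml
services:
  # Example add-on service: auto-updates running containers when new images appear
  watchtower:
    image: containrrr/watchtower
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```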
The script will ask you to:
- Choose a region (e.g., us-east, eu-west)
- Select GPU instance type (see the listing sketch after this list)
- Provide instance label
- Select or generate SSH keys
- Confirm deployment
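If you want to browse the available options before the script prompts you, the Linode CLI can list them. A minimal sketch, assuming `linode-cli` is installed and configured locally (the deploy script itself only needs bash, curl, ssh, and jq):

```bash
# List available regions (e.g. us-east, eu-west)
linode-cli regions list

# List instance plans; GPU plans include the RTX 4000 Ada types used by this quickstart
linode-cli linodes types
```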
The script automatically:
- Creates the GPU instance in your Linode account
- Monitors cloud-init installation progress
- Waits for Open-WebUI health check
- Waits for vLLM model loading (a manual polling sketch follows this list)
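If you ever need to check readiness yourself (for example after a reboot), you can poll the same endpoints the script waits on. A minimal sketch, run on the instance, assuming the default ports shown in the troubleshooting section (8080 for Open-WebUI, 8000 for vLLM):

```bash
# Wait for Open-WebUI to answer its health check
until curl -fsS http://localhost:8080/health >/dev/null; do
  echo "Waiting for Open-WebUI..."; sleep 10
done

# Wait for vLLM to finish loading the model and list it on the models endpoint
until curl -fsS http://localhost:8000/v1/models >/dev/null; do
  echo "Waiting for vLLM model load..."; sleep 10
done
echo "All services are ready."
```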
Once complete, you'll see:
Setup Complete!
Your AI LLM instance is now running!
Access URLs:
Open-WebUI: https://<ip-label>.ip.linodeusercontent.com
Access Credentials:
SSH: ssh -i /path/to/your/key root@<instance-ip>
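Once you can SSH in, you can also query the OpenAI-compatible vLLM API directly. A minimal sketch, assuming the default port 8000 used in the troubleshooting section; the served model name below is an assumption, so confirm it first with `curl http://localhost:8000/v1/models`:

```bash
# Ask the deployed model a question via the OpenAI-compatible chat endpoint
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```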
# Bootstrap script executed by cloud-init (installs drivers, Docker, downloads setup files)
/opt/ai-quickstart-gpt-oss-120b/bootstrap.sh
# Setup script that runs after containers start (waits for services to be ready)
/opt/ai-quickstart-gpt-oss-120b/setup.sh
# Docker compose file called by systemctl at startup
/opt/ai-quickstart-gpt-oss-120b/docker-compose.yml
# Caddy reverse proxy configuration
/opt/ai-quickstart-gpt-oss-120b/Caddyfile
# Systemd service definitions
/etc/systemd/system/ai-quickstart-gpt-oss-120b.service # Main stack service
/etc/systemd/system/ai-quickstart-gpt-oss-120b-setup.service # Setup service (runs once)
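For orientation, the main stack service is essentially a thin systemd wrapper around `docker compose`. The unit below is only an illustrative sketch of that pattern; the real file is generated during bootstrap and may differ:

```ini
[Unit]
Description=ai-quickstart-gpt-oss-120b stack
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/ai-quickstart-gpt-oss-120b
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
```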
To delete a deployed instance:
# Remote execution
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-gpt-oss-120b/main/delete.sh | bash -s -- <instance_id>
# Or download script and run
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-gpt-oss-120b/main/delete.sh
bash delete.sh <instance_id>
The script will show instance details and ask for confirmation before deletion.
ai-quickstart-gpt-oss-120b/
├── deploy.sh                  # Main deployment script
├── delete.sh                  # Instance deletion script
├── script/
│   └── quickstart_tools.sh    # Shared functions (API, OAuth, utilities)
├── setup/
│   ├── docker-compose.yml     # Docker Compose configuration
│   ├── Caddyfile              # Caddy reverse proxy configuration
│   └── setup.sh               # Setup script (waits for services to be ready)
└── template/
    ├── cloud-init.yaml        # Cloud-init configuration
    └── bootstrap.sh           # Post-boot installation script (installs drivers, Docker)
- Configure Cloud Firewall (Recommended)
  - Create a Linode Cloud Firewall (a minimal API sketch follows this list)
  - Restrict access to ports 80/443 by source IP
  - Allow SSH (port 22) from trusted IPs only
- SSH Security
  - SSH key authentication required
  - Root password provided for emergency console access only
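For the firewall recommendation above, one way to create and attach a Cloud Firewall is via the Linode API v4 networking endpoint. This is a hedged sketch only: the token, source CIDR, and instance ID are placeholders you must replace, and you should verify the rule schema against the current API docs:

```bash
# Creates a firewall that drops inbound traffic except HTTP/HTTPS and SSH from a
# trusted CIDR, and attaches it to the deployed instance (placeholder ID below).
curl -s -X POST https://api.linode.com/v4/networking/firewalls \
  -H "Authorization: Bearer $LINODE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "label": "ai-quickstart-fw",
        "rules": {
          "inbound_policy": "DROP",
          "outbound_policy": "ACCEPT",
          "inbound": [
            {"label": "allow-web", "action": "ACCEPT", "protocol": "TCP",
             "ports": "80,443", "addresses": {"ipv4": ["203.0.113.0/24"]}},
            {"label": "allow-ssh", "action": "ACCEPT", "protocol": "TCP",
             "ports": "22", "addresses": {"ipv4": ["203.0.113.0/24"]}}
          ]
        },
        "devices": {"linodes": [12345678]}
      }'
```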
The deployed vLLM instance runs with the following configuration (an illustrative launch command follows the table):
| Specification | Value |
|---|---|
| GPU Memory Utilization | 95% |
| Max Context Length | 131,072 tokens |
| KV Cache Type | FP8 |
| KV Cache Size | ~139K tokens |
| Available KV Cache Memory | 1.19 GiB |
| Model Memory Usage | ~69 GiB total (~17 GiB per GPU) |
| Max Concurrent Requests | 2 (full context) |
| Tensor Parallel Size | 4 GPUs |
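As a rough sketch, these settings correspond to vLLM launch flags along the following lines. The actual invocation lives in the deployed docker-compose.yml, and the model identifier shown here is an assumption:

```bash
vllm serve openai/gpt-oss-120b \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 131072 \
  --kv-cache-dtype fp8
```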
Benchmarked using `vllm bench serve` with a random dataset (a reproduction sketch follows the table):
| Metric | 128 input / 128 output | 512 input / 256 output |
|---|---|---|
| Mean TTFT | 84ms | 232ms |
| P99 TTFT | 106ms | 633ms |
| Output Throughput | 48.8 tok/s | 21.3 tok/s |
| Peak Throughput | 78 tok/s | 78 tok/s |
| Mean ITL | 27ms | 29ms |
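A hedged sketch of how the 128 input / 128 output column could be reproduced against the running server; the flag names are assumed from vLLM's benchmarking CLI and may vary by version:

```bash
# Random-dataset serving benchmark, 128 input / 128 output tokens per request
vllm bench serve \
  --model openai/gpt-oss-120b \
  --dataset-name random \
  --random-input-len 128 \
  --random-output-len 128 \
  --num-prompts 200
```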
# SSH into your instance
ssh -i /path/to/your/key root@<instance-ip>
# Check container status
docker ps -a
# Check Docker containers log
cd /opt/ai-quickstart-gpt-oss-120b && docker compose logs -f
# Check systemd service status
systemctl status ai-quickstart-gpt-oss-120b.service
# View systemd service logs
journalctl -u ai-quickstart-gpt-oss-120b.service -n 100
# Check cloud-init logs
tail -n 100 -f /var/log/cloud-init-output.log
# Restart all services
systemctl restart ai-quickstart-gpt-oss-120b.service
# Check NVIDIA GPU status
nvidia-smi
# Check vLLM loaded models
curl http://localhost:8000/v1/models
# Check Open-WebUI health
curl http://localhost:8080/health
# Check vLLM container logs
docker logs vllm
Successful deployments register anonymous statistics (project name, region, instance type) to help improve the service. To opt out, remove the deployment_complete call from deploy.sh.
Issues and pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the Apache License 2.0.
