Complete reference for using the AICR API Server.
The AICR API Server provides HTTP REST access to recipe generation and bundle creation for GPU-accelerated infrastructure. Use the API for programmatic access to configuration recommendations and deployment artifacts.
┌──────────────┐      ┌──────────────┐
│ GET /recipe  │─────▶│    Recipe    │
└──────────────┘      └──────────────┘
                             │
                             ▼
┌──────────────┐      ┌──────────────┐
│ POST /bundle │─────▶│ bundles.zip  │
└──────────────┘      └──────────────┘
API vs CLI:
- Use the API for remote recipe generation and bundle creation
- Use the CLI for local operations, snapshot capture, and ConfigMap integration
| Feature | API | CLI |
|---|---|---|
| Recipe generation | ✅ GET /v1/recipe | ✅ aicr recipe |
| Bundle creation | ✅ POST /v1/bundle | ✅ aicr bundle |
| Snapshot capture | ❌ Use CLI | ✅ aicr snapshot |
| ConfigMap I/O | ❌ Use CLI | ✅ cm:// URIs |
| Agent deployment | ❌ Use CLI | ✅ aicr snapshot |
Local development (example):
http://localhost:8080
Start the local server:
docker pull ghcr.io/nvidia/aicrd:latest
docker run -p 8080:8080 ghcr.io/nvidia/aicrd:latest

Generate an optimized configuration recipe for your environment:
# GET: Basic recipe for H100 on EKS (query parameters)
curl "http://localhost:8080/v1/recipe?accelerator=h100&service=eks"
# GET: Training workload on Ubuntu
curl "http://localhost:8080/v1/recipe?accelerator=h100&service=eks&intent=training&os=ubuntu"
# POST: Recipe from criteria file (YAML body)
curl -X POST "http://localhost:8080/v1/recipe" \
  -H "Content-Type: application/x-yaml" \
  -d 'kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-config
spec:
  service: eks
  accelerator: h100
  intent: training'
# Save recipe to file
curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks" -o recipe.json

Create deployment bundles from a recipe:
# Pipe recipe directly to bundle endpoint
curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks" | \
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator" \
-H "Content-Type: application/json" -d @- -o bundles.zip
# Extract the bundles
unzip bundles.zip -d ./bundles

Service information and available routes.

curl "http://localhost:8080/"

Response:
{
"service": "aicrd",
"version": "v0.7.6",
"routes": ["/v1/recipe", "/v1/bundle"]
}

Generate an optimized configuration recipe based on environment parameters.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `service` | string | any | K8s service: eks, gke, aks, oke, any |
| `accelerator` | string | any | GPU type: h100, gb200, a100, l40, any |
| `gpu` | string | any | Alias for accelerator |
| `intent` | string | any | Workload: training, inference, any |
| `os` | string | any | Node OS: ubuntu, rhel, cos, amazonlinux, any |
| `platform` | string | any | Platform/framework: kubeflow, any |
| `nodes` | integer | 0 | GPU node count (0 = any) |
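For scripting, the parameters above can be assembled into a request URL. A minimal Python sketch (the `recipe_url` helper is illustrative, not part of any SDK); it drops unset criteria and the defaults `any` and `0`:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # local server address from the quick start

def recipe_url(base=BASE_URL, **criteria):
    """Build a GET /v1/recipe URL, omitting unset/default criteria."""
    params = {k: v for k, v in criteria.items() if v not in (None, "", "any", 0)}
    query = urlencode(sorted(params.items()))
    return f"{base}/v1/recipe" + (f"?{query}" if query else "")

print(recipe_url(accelerator="h100", service="eks", nodes=8))
```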
Examples:
# Minimal request
curl "http://localhost:8080/v1/recipe"
# Specify accelerator
curl "http://localhost:8080/v1/recipe?accelerator=h100"
# Full specification
curl "http://localhost:8080/v1/recipe?service=eks&accelerator=h100&intent=training&os=ubuntu&nodes=8"
# Using gpu alias
curl "http://localhost:8080/v1/recipe?gpu=gb200&service=gke"
# Pretty print with jq
curl -s "http://localhost:8080/v1/recipe?accelerator=h100" | jq '.'

Generate an optimized configuration recipe from a criteria file body. This endpoint provides an alternative to query parameters, accepting a Kubernetes-style RecipeCriteria resource in the request body.
Content Types:
- `application/json` - JSON format
- `application/x-yaml` - YAML format
Request Body:
The request body must be a RecipeCriteria resource:
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: gb200
  os: ubuntu
  intent: training
  platform: kubeflow
  nodes: 8

Examples:
# POST with YAML body
curl -X POST "http://localhost:8080/v1/recipe" \
  -H "Content-Type: application/x-yaml" \
  -d 'kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: training-config
spec:
  service: eks
  accelerator: h100
  intent: training'
# POST with JSON body
curl -X POST "http://localhost:8080/v1/recipe" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "RecipeCriteria",
    "apiVersion": "aicr.nvidia.com/v1alpha1",
    "metadata": {"name": "training-config"},
    "spec": {
      "service": "eks",
      "accelerator": "h100",
      "intent": "training"
    }
  }'
# POST with criteria file
curl -X POST "http://localhost:8080/v1/recipe" \
  -H "Content-Type: application/x-yaml" \
-d @criteria.yaml
# Pretty print response
curl -s -X POST "http://localhost:8080/v1/recipe" \
  -H "Content-Type: application/json" \
  -d '{"kind":"RecipeCriteria","apiVersion":"aicr.nvidia.com/v1alpha1","spec":{"service":"eks","accelerator":"h100"}}' \
  | jq '.'

Response:
Same as GET /v1/recipe - returns a recipe JSON response.
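Downstream tooling usually needs only the component list and driver constraint from the response. A sketch over a trimmed sample (field names taken from the Response example in this section; the sample values are illustrative):

```python
import json

# Trimmed sample of a Recipe response (shape per the Response example in this section)
raw = """{
  "kind": "Recipe",
  "componentRefs": [
    {"name": "network-operator", "version": "v25.4.0", "order": 2},
    {"name": "gpu-operator", "version": "v25.3.3", "order": 1}
  ],
  "constraints": {"driver": {"version": "580.82.07", "cudaVersion": "13.1"}}
}"""

recipe = json.loads(raw)
# Install components in their declared order
components = sorted(recipe["componentRefs"], key=lambda c: c["order"])
names = [c["name"] for c in components]
driver_version = recipe["constraints"]["driver"]["version"]
print(names, driver_version)
```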
Error Responses:
- `400 Bad Request` - Invalid criteria format, missing required fields, or invalid enum values
- `405 Method Not Allowed` - Only GET and POST are supported
Response:
{
"apiVersion": "aicr.nvidia.com/v1alpha1",
"kind": "Recipe",
"metadata": {
"version": "v1.0.0",
"created": "2026-01-11T10:30:00Z",
"appliedOverlays": [
"base",
"eks",
"eks-training",
"gb200-eks-training"
]
},
"criteria": {
"service": "eks",
"accelerator": "gb200",
"intent": "training",
"os": "any",
"platform": "any"
},
"componentRefs": [
{
"name": "gpu-operator",
"version": "v25.3.3",
"order": 1,
"repository": "https://helm.ngc.nvidia.com/nvidia"
},
{
"name": "network-operator",
"version": "v25.4.0",
"order": 2,
"repository": "https://helm.ngc.nvidia.com/nvidia"
}
],
"constraints": {
"driver": {
"version": "580.82.07",
"cudaVersion": "13.1"
}
}
}

Generate deployment bundles from a recipe.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `bundlers` | string | (all) | Comma-delimited list of bundler types to execute |
| `set` | string[] | (none) | Value overrides (format: `bundler:path.to.field=value`). Repeat for multiple. |
| `system-node-selector` | string[] | (none) | Node selectors for system components (format: `key=value`). Repeat for multiple. |
| `system-node-toleration` | string[] | (none) | Tolerations for system components (format: `key=value:effect`). Repeat for multiple. |
| `accelerated-node-selector` | string[] | (none) | Node selectors for GPU nodes (format: `key=value`). Repeat for multiple. |
| `accelerated-node-toleration` | string[] | (none) | Tolerations for GPU nodes (format: `key=value:effect`). Repeat for multiple. |
| `nodes` | int | 0 | Estimated number of GPU nodes (0 = unset). Written to Helm value paths declared in the registry under `nodeScheduling.nodeCountPaths`. |
| `deployer` | string | helm | Deployment method: `helm` or `argocd` |
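The `set` and node-scheduling parameters are repeatable, which maps to repeated query keys. A sketch of URL construction (the `bundle_url` helper and the override value are illustrative, not part of any SDK):

```python
from urllib.parse import urlencode

def bundle_url(base="http://localhost:8080", bundlers=(), set_values=(), deployer=None):
    """Build a POST /v1/bundle URL; repeatable params become repeated query keys."""
    pairs = []
    if bundlers:
        pairs.append(("bundlers", ",".join(bundlers)))
    for override in set_values:  # format: bundler:path.to.field=value
        pairs.append(("set", override))
    if deployer:
        pairs.append(("deployer", deployer))
    return f"{base}/v1/bundle" + (f"?{urlencode(pairs)}" if pairs else "")

print(bundle_url(bundlers=["gpu-operator"], set_values=["gpu-operator:gds.enabled=true"]))
```

Note that `urlencode` percent-encodes `:` and `=` inside values; the server decodes them on receipt.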
Request Body:
The request body is the recipe (RecipeResult) directly. No wrapper object needed.
Supported Bundlers:
Bundler names correspond to component names in recipes/registry.yaml. Any component registered there can be passed as a bundler. Current components:
| Bundler | Description |
|---|---|
| `gpu-operator` | NVIDIA GPU Operator — driver and runtime lifecycle |
| `network-operator` | NVIDIA Network Operator — RDMA, SR-IOV, host networking |
| `aws-efa` | AWS Elastic Fabric Adapter device plugin (EKS) |
| `cert-manager` | TLS certificate management |
| `skyhook-operator` | OS-level node tuning and kernel configuration |
| `skyhook-customizations` | Environment-specific node tuning profiles |
| `nvsentinel` | GPU health monitoring and automated remediation |
| `nvidia-dra-driver-gpu` | Dynamic Resource Allocation driver for GPUs |
| `kube-prometheus-stack` | Prometheus, Grafana, Alertmanager monitoring stack |
| `prometheus-adapter` | Custom metrics for HPA scaling |
| `aws-ebs-csi-driver` | Amazon EBS CSI driver (EKS) |
| `k8s-ephemeral-storage-metrics` | Ephemeral storage usage metrics |
| `kai-scheduler` | DRA-aware gang scheduler with topology-aware placement |
| `dynamo-crds` | NVIDIA Dynamo inference serving CRDs |
| `dynamo-platform` | NVIDIA Dynamo inference serving platform |
| `kgateway-crds` | Kubernetes Gateway API CRDs |
| `kgateway` | Kubernetes Gateway API implementation |
| `kubeflow-trainer` | Kubeflow Training Operator for distributed training |
Examples:
# Basic: pipe recipe to bundle (GPU Operator only)
curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks" | \
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator" \
-H "Content-Type: application/json" -d @- -o bundles.zip
# Advanced: with value overrides and ArgoCD deployer
curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks" | \
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator&deployer=argocd&repo=https://github.com/my-org/my-gitops-repo.git&set=gpu-operator:gds.enabled=true" \
-H "Content-Type: application/json" -d @- -o bundles.zip
# With node scheduling for system and GPU nodes
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator&system-node-selector=nodeGroup=system&system-node-toleration=dedicated=system:NoSchedule&accelerated-node-selector=nvidia.com/gpu.present=true&accelerated-node-toleration=nvidia.com/gpu=present:NoSchedule" \
-H "Content-Type: application/json" \
-d @recipe.json \
-o bundles.zip
# Generate GPU Operator bundle from saved recipe
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator" \
-H "Content-Type: application/json" \
-d @recipe.json \
-o bundles.zip
# Generate all available bundles (no bundlers param)
curl -X POST "http://localhost:8080/v1/bundle" \
-H "Content-Type: application/json" \
-d '{
"apiVersion": "aicr.nvidia.com/v1alpha1",
"kind": "Recipe",
"componentRefs": [
{"name": "gpu-operator", "version": "v25.3.3", "type": "helm"},
{"name": "network-operator", "version": "v25.4.0", "type": "helm"}
]
}' \
-o bundles.zip
# Generate multiple specific bundles
curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator,network-operator" \
-H "Content-Type: application/json" \
-d '{
"apiVersion": "aicr.nvidia.com/v1alpha1",
"kind": "Recipe",
"componentRefs": [
{"name": "gpu-operator", "version": "v25.3.3", "type": "helm"},
{"name": "network-operator", "version": "v25.4.0", "type": "helm"}
]
}' \
  -o bundles.zip

Response Headers:
| Header | Description | Example |
|---|---|---|
| `Content-Type` | Always `application/zip` | application/zip |
| `Content-Disposition` | Download filename | attachment; filename="bundles.zip" |
| `X-Bundle-Files` | Total files in archive | 10 |
| `X-Bundle-Size` | Uncompressed size (bytes) | 45678 |
| `X-Bundle-Duration` | Generation time | 1.234s |
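Clients that stream the zip to disk can take the download name from `Content-Disposition`. A minimal sketch that handles only the simple quoted form shown above (the helper name is illustrative):

```python
import re

def bundle_filename(content_disposition, default="bundles.zip"):
    """Extract the filename from a Content-Disposition header (simple quoted form only)."""
    match = re.search(r'filename="([^"]+)"', content_disposition or "")
    return match.group(1) if match else default

print(bundle_filename('attachment; filename="bundles.zip"'))
```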
Bundle Structure:
bundles.zip
├── gpu-operator/
│ ├── values.yaml # Helm chart values
│ ├── manifests/
│ │ ├── clusterpolicy.yaml # ClusterPolicy CR
│ │ └── dcgm-exporter.yaml # DCGM Exporter config
│ ├── scripts/
│ │ ├── install.sh # Installation script
│ │ └── uninstall.sh # Cleanup script
│ ├── README.md # Deployment instructions
│ └── checksums.txt # SHA256 checksums
└── network-operator/
    └── ...
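The verification that `sha256sum -c` performs can also be done in-process. A sketch assuming the conventional `<sha256>  <filename>` line format in `checksums.txt` (the helper is illustrative):

```python
import hashlib
from pathlib import Path

def verify_checksums(bundle_dir):
    """Return the names of files whose SHA256 does not match checksums.txt."""
    bundle_dir = Path(bundle_dir)
    failures = []
    for line in (bundle_dir / "checksums.txt").read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        actual = hashlib.sha256((bundle_dir / name).read_bytes()).hexdigest()
        if actual != expected:
            failures.append(name)
    return failures  # empty list means every file verified
```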
Service health check (liveness probe).
curl "http://localhost:8080/health"

Response:
{
"status": "healthy",
"timestamp": "2026-01-11T10:30:00Z"
}

Service readiness check (readiness probe).

curl "http://localhost:8080/ready"

Response:
{
"status": "ready",
"timestamp": "2026-01-11T10:30:00Z"
}

Prometheus metrics endpoint.

curl "http://localhost:8080/metrics"

Key Metrics:
| Metric | Type | Description |
|---|---|---|
| `aicr_http_requests_total` | counter | Total HTTP requests by method, path, status |
| `aicr_http_request_duration_seconds` | histogram | Request latency distribution |
| `aicr_http_requests_in_flight` | gauge | Current concurrent requests |
| `aicr_rate_limit_rejects_total` | counter | Rate limit rejections |
Fetch a recipe and generate bundles in one workflow:
#!/bin/bash
# Step 1: Get recipe for H100 on EKS for training
echo "Fetching recipe..."
curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks&intent=training" \
-o recipe.json
# Display recipe summary
echo "Recipe components:"
jq -r '.componentRefs[] | " - \(.name): \(.version)"' recipe.json
# Step 2: Generate bundles from recipe (pipe directly)
echo "Generating bundles..."
curl -s -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator" \
-H "Content-Type: application/json" \
-d @recipe.json \
-o bundles.zip
# Alternative: one-liner without intermediate file
# curl -s "http://localhost:8080/v1/recipe?accelerator=h100&service=eks" | \
# curl -X POST "http://localhost:8080/v1/bundle?bundlers=gpu-operator" \
# -H "Content-Type: application/json" -d @- -o bundles.zip
# Step 3: Extract and verify
echo "Extracting bundles..."
unzip -q bundles.zip -d ./deployment
# Verify checksums
echo "Verifying checksums..."
cd deployment/gpu-operator
sha256sum -c checksums.txt
# Step 4: Deploy (example)
echo "Bundle ready for deployment:"
ls -la

Error Response Format:
{
"code": "ERROR_CODE",
"message": "Human-readable error description",
"details": { ... },
"requestId": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-01-11T10:30:00Z",
"retryable": true
}

Error Codes:
| Code | HTTP Status | Description | Retryable |
|---|---|---|---|
| `INVALID_REQUEST` | 400 | Invalid query parameters, request body, or disallowed criteria value | No |
| `METHOD_NOT_ALLOWED` | 405 | Wrong HTTP method | No |
| `NO_MATCHING_RULE` | 404 | No configuration found | No |
| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests | Yes |
| `INTERNAL_ERROR` | 500 | Server error | Yes |
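Clients can key retry behavior off the `retryable` field in the error body instead of hard-coding status codes. A sketch (helper names and backoff values are illustrative):

```python
def should_retry(error, attempt, max_attempts=3):
    """Retry only errors the server marks retryable, up to max_attempts."""
    return attempt < max_attempts and bool(error.get("retryable"))

def backoff_seconds(attempt, base=0.5, cap=8.0):
    """Exponential backoff: 0.5s, 1s, 2s, ... capped at cap seconds."""
    return min(cap, base * (2 ** attempt))
```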
Handling Rate Limits:
# Check rate limit headers
curl -I "http://localhost:8080/v1/recipe?accelerator=h100"
# Response headers:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 95
# X-RateLimit-Reset: 1736589000

When rate limited (HTTP 429), use the Retry-After header:
# Retry with backoff
response=$(curl -s -w "%{http_code}" "http://localhost:8080/v1/recipe?accelerator=h100")
if [ "${response: -3}" = "429" ]; then
retry_after=$(curl -sI "http://localhost:8080/v1/recipe" | grep -i "Retry-After" | awk '{print $2}')
echo "Rate limited. Retrying after ${retry_after}s..."
sleep "$retry_after"
fi

- Limit: 100 requests per second per IP
- Burst: 200 requests
- Headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`
- 429 Response: Includes `Retry-After` header
The API server can be configured to restrict which criteria values are allowed. This enables operators to limit the API to specific accelerators, services, intents, or OS types.
Allowlists are configured via environment variables when starting the server:
| Environment Variable | Description | Example |
|---|---|---|
| `AICR_ALLOWED_ACCELERATORS` | Comma-separated list of allowed GPU types | h100,l40 |
| `AICR_ALLOWED_SERVICES` | Comma-separated list of allowed K8s services | eks,gke |
| `AICR_ALLOWED_INTENTS` | Comma-separated list of allowed workload intents | training |
| `AICR_ALLOWED_OS` | Comma-separated list of allowed OS types | ubuntu,rhel |
Behavior:
- If an environment variable is not set, all values for that criteria are allowed
- If an environment variable is set, only the specified values are permitted
- The `any` value is always allowed regardless of allowlist configuration
- Allowlists apply to both `/v1/recipe` and `/v1/bundle` endpoints
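The behavior above amounts to a simple membership check. A sketch mirroring it in Python (not the server's actual implementation; the helper name is illustrative):

```python
import os

def is_allowed(env_var, value, env=None):
    """Unset allowlist -> everything allowed; 'any' always passes; else membership check."""
    env = os.environ if env is None else env
    raw = env.get(env_var)
    if value == "any" or raw is None:
        return True
    return value in {v.strip() for v in raw.split(",")}
```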
# Start server allowing only H100 and L40 GPUs on EKS
docker run -p 8080:8080 \
-e AICR_ALLOWED_ACCELERATORS=h100,l40 \
-e AICR_ALLOWED_SERVICES=eks \
  ghcr.io/nvidia/aicrd:latest

When a disallowed criteria value is requested:

curl "http://localhost:8080/v1/recipe?accelerator=gb200&service=eks"

Response (HTTP 400):
{
"code": "INVALID_REQUEST",
"message": "accelerator type not allowed",
"details": {
"requested": "gb200",
"allowed": ["h100", "l40"]
},
"requestId": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-01-27T10:30:00Z",
"retryable": false
}

The CLI (aicr) is not affected by allowlists. Allowlists only apply to the API server, allowing operators to restrict API access while maintaining full CLI functionality for administrative tasks.
import requests
import zipfile
import io
BASE_URL = "http://localhost:8080"
# Get recipe
params = {
"accelerator": "h100",
"service": "eks",
"intent": "training",
"os": "ubuntu"
}
resp = requests.get(f"{BASE_URL}/v1/recipe", params=params)
resp.raise_for_status()
recipe = resp.json()
print(f"Recipe has {len(recipe['componentRefs'])} components")
# Generate bundles — recipe is the request body, bundlers are query params
resp = requests.post(
f"{BASE_URL}/v1/bundle",
params={"bundlers": "gpu-operator"},
json=recipe,
)
resp.raise_for_status()
# Extract zip
with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    zf.extractall("./deployment")
    print(f"Extracted {len(zf.namelist())} files")

package main
import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

func main() {
    baseURL := "http://localhost:8080"

    // Get recipe
    params := url.Values{}
    params.Add("accelerator", "h100")
    params.Add("service", "eks")
    resp, err := http.Get(baseURL + "/v1/recipe?" + params.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var recipe map[string]interface{}
    json.NewDecoder(resp.Body).Decode(&recipe)
    fmt.Printf("Got recipe with %d components\n",
        len(recipe["componentRefs"].([]interface{})))
}

const BASE_URL = "http://localhost:8080";
async function main() {
    // Get recipe
    const params = new URLSearchParams({
        accelerator: "h100",
        service: "eks",
        intent: "training"
    });
    const recipeResp = await fetch(`${BASE_URL}/v1/recipe?${params}`);
    const recipe = await recipeResp.json();
    console.log(`Recipe has ${recipe.componentRefs.length} components`);

    // Generate bundles — recipe is the request body, bundlers are query params
    const bundleResp = await fetch(`${BASE_URL}/v1/bundle?bundlers=gpu-operator`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(recipe),
    });

    // Save zip
    const buffer = await bundleResp.arrayBuffer();
    require("fs").writeFileSync("bundles.zip", Buffer.from(buffer));
    console.log("Bundles saved to bundles.zip");
}

main();

#!/bin/bash
# Generate recipes for multiple environments
environments=(
"os=ubuntu&accelerator=h100&service=eks"
"os=ubuntu&accelerator=gb200&service=gke"
"os=rhel&accelerator=a100&service=aks"
)
for env in "${environments[@]}"; do
echo "Fetching recipe for: $env"
curl -s "http://localhost:8080/v1/recipe?${env}" \
| jq -r '.componentRefs[] | "\(.name): \(.version)"'
echo ""
done

The full OpenAPI 3.1 specification is available at: api/aicr/v1/server.yaml
Generate client SDKs:
# Download spec
curl https://raw.githubusercontent.com/NVIDIA/aicr/main/api/aicr/v1/server.yaml \
-o openapi.yaml
# Generate Python client
openapi-generator-cli generate -i openapi.yaml -g python -o ./python-client
# Generate Go client
openapi-generator-cli generate -i openapi.yaml -g go -o ./go-client
# Generate TypeScript client
openapi-generator-cli generate -i openapi.yaml -g typescript-fetch -o ./ts-client

"Invalid accelerator type" error:
# Use valid values: h100, gb200, a100, l40, any
curl "http://localhost:8080/v1/recipe?accelerator=h100"

"Recipe is required" error:
# Ensure recipe is in request body
curl -X POST "http://localhost:8080/v1/bundle" \
  -H "Content-Type: application/json" \
  -d @recipe.json  # body is the recipe itself (no wrapper object, must not be empty)

Empty zip file:
# Check recipe has componentRefs
curl -s "http://localhost:8080/v1/recipe?accelerator=h100" | jq '.componentRefs'

Connection refused (local):
# Start local server first
make server

- CLI Reference - Command-line interface
- Agent Deployment - Kubernetes agent for snapshot capture
- Installation Guide - Setup instructions
- Data Flow - Understanding recipe data architecture
- Automation Guide - CI/CD integration patterns
- Kubernetes Deployment - Self-hosted API server deployment