`aicrd` provides HTTP REST API access to AICR configuration recipe generation and bundle creation capabilities.
The API server exposes Steps 2 and 4 of the AICR workflow – recipe generation and bundle creation – over HTTP REST. It is a production-ready service built on Go's net/http, with middleware for rate limiting, metrics, request tracking, and graceful shutdown.
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Snapshot │─────▶│ Recipe │─────▶│ Validate │─────▶│ Bundle │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
 CLI/Agent only          API Server            CLI only           API Server
API Server Capabilities:
- Recipe generation (Step 2) via the `GET /v1/recipe` endpoint
- Bundle creation (Step 4) via the `POST /v1/bundle` endpoint
- Query mode only – generates recipes from environment parameters
- Health and metrics endpoints for Kubernetes deployment
- Production-ready HTTP server with middleware stack
- Supply chain security with SLSA Build Level 3 attestations
API Server Limitations:
- No snapshot capture – use the CLI (`aicr snapshot`) or the Kubernetes Agent
- No snapshot mode – cannot analyze captured snapshots (query mode only)
- No validation – use the CLI (`aicr validate`) to check constraints against snapshots
- No ConfigMap integration – the API server doesn't read/write ConfigMaps
API Server Configuration:
- Criteria allowlists – restrict allowed values for accelerator, service, intent, and OS via environment variables
- Value overrides – supported via `?set=bundler:path=value` query parameters on `/v1/bundle`
- Node scheduling – supported via `?system-node-selector` and `?accelerated-node-selector` query parameters
For the complete workflow, use the CLI, which supports:
- All four steps: snapshot → recipe → validate → bundle
- ConfigMap I/O: `cm://namespace/name` URIs
- Agent deployment: Kubernetes Job with RBAC
- E2E testing: Chainsaw tests in `tests/chainsaw/cli/`
```mermaid
flowchart TD
    A["aicrd<br/>cmd/aicrd/main.go"] --> B["pkg/api/server.go<br/>Serve()"]
    B --> B1["• Initialize logging<br/>• Create recipe.Builder<br/>• Create bundler.DefaultBundler<br/>• Setup routes: /v1/recipe, /v1/bundle<br/>• Create server with middleware<br/>• Graceful shutdown"]
    B1 --> C["pkg/server/server.go<br/>HTTP Server Infrastructure"]
    C --> C1["Server Config:<br/>Port: 8080, Rate: 100 req/s<br/>Timeouts, Max Header: 64KB"]
    C --> C2["Middleware Chain:<br/>1. Metrics<br/>2. Request ID<br/>3. Panic Recovery<br/>4. Rate Limiting<br/>5. Logging<br/>6. Handler"]
    C --> C3["Routes:<br/>/health, /ready, /metrics<br/>/v1/recipe, /v1/bundle"]
    C3 --> D["Application Handlers"]
    D --> D1["recipe.Builder.HandleRecipes<br/>GET /v1/recipe"]
    D --> D2["bundler.DefaultBundler.HandleBundles<br/>POST /v1/bundle"]
    D1 --> D1a["1. Method validation (GET)<br/>2. Parse query params<br/>3. Build query<br/>4. Builder.Build(ctx, query)<br/>5. Return JSON response"]
    D2 --> D2a["1. Method validation (POST)<br/>2. Parse JSON body<br/>3. Validate recipe<br/>4. Generate bundles<br/>5. Return ZIP response"]
```
```mermaid
flowchart TD
    A["HTTP Client<br/>GET /v1/recipe?os=ubuntu&gpu=gb200"] --> M1
    M1["1. Metrics Middleware<br/>• Start timer<br/>• Increment in-flight counter<br/>• Wrap response writer<br/>• Record duration & count"] --> M2
    M2["2. Request ID Middleware<br/>• Extract X-Request-Id<br/>• Generate UUID if missing<br/>• Store in context<br/>• Add to response header"] --> M3
    M3["3. Panic Recovery<br/>• Wrap in defer/recover<br/>• Log errors<br/>• Return 500 on panic"] --> M4
    M4["4. Rate Limit Middleware<br/>• Check rateLimiter.Allow()<br/>• Return 429 if exceeded<br/>• Add rate limit headers"] --> M5
    M5["5. Logging Middleware<br/>• Log request start<br/>• Capture status<br/>• Log completion"] --> H
    H["6. Application Handler<br/>recipe.Builder.HandleRecipes"] --> H1
    H1["A. Method Validation<br/>(GET only)"] --> H2
    H2["B. Parse Query Parameters<br/>service, accelerator, intent, os, nodes"] --> H3
    H3["C. Format Validation<br/>• Validate enums<br/>• Parse values<br/>• Return 400 on error"] --> H3a
    H3a["D. Allowlist Validation<br/>• Check against configured allowlists<br/>• Return 400 if disallowed"] --> H4
    H4["E. Build Recipe<br/>• Builder.BuildFromCriteria(ctx, criteria)<br/>• Load store (cached)<br/>• Apply matching overlays"] --> H5
    H5["F. Respond<br/>• Set Cache-Control<br/>• Serialize to JSON<br/>• Return 200 OK"] --> Z
    Z[JSON Response]
```
Minimal entry point:

```go
package main

import (
	"log"

	"github.com/NVIDIA/aicr/pkg/api"
)

func main() {
	if err := api.Serve(); err != nil {
		log.Fatal(err)
	}
}
```

Responsibilities:
- Initialize structured logging
- Parse criteria allowlists from environment variables
- Create recipe builder with allowlist configuration
- Create bundle handler with allowlist configuration
- Setup HTTP routes
- Configure server with middleware
- Handle graceful shutdown
Key Features:
- Version info injection via ldflags: `version`, `commit`, `date`
- Routes: `/v1/recipe` → recipe handler, `/v1/bundle` → bundle handler
- Criteria allowlists parsed from `AICR_ALLOWED_*` environment variables
- Server configured with production defaults
- Graceful shutdown on SIGINT/SIGTERM
Initialization Flow:

```go
func Serve() error {
	// 1. Setup logging
	logging.SetDefaultStructuredLogger(name, version)

	// 2. Parse allowlists from environment
	allowLists, err := recipe.ParseAllowListsFromEnv()
	if err != nil {
		return fmt.Errorf("failed to parse allowlists: %w", err)
	}

	// 3. Create recipe handler with allowlists
	rb := recipe.NewBuilder(
		recipe.WithVersion(version),
		recipe.WithAllowLists(allowLists),
	)

	// 4. Create bundle handler with allowlists
	bb, err := bundler.New(
		bundler.WithAllowLists(allowLists),
	)

	// 5. Setup routes and start server
	// ...
}
```

Production-ready HTTP server implementation with 10 files:
server.go (217 lines)
- Server struct with config, HTTP server, rate limiter, ready state
- Functional options pattern for configuration
- Graceful shutdown using `signal.NotifyContext` and `errgroup`
- Default root handler listing available routes
config.go (72 lines)
- Configuration struct with sensible defaults
- Environment variable support (PORT)
- Timeout configuration (read, write, idle, shutdown)
- Rate limiting parameters
middleware.go (123 lines)
- Middleware chain builder
- Request ID middleware (UUID generation/validation)
- Rate limiting middleware (token bucket)
- Panic recovery middleware
- Logging middleware (structured logs)
health.go (60 lines)
- `/health` - Liveness probe (always returns 200)
- `/ready` - Readiness probe (returns 503 when not ready)
- JSON response with status and timestamp
errors.go (49 lines)
- Standardized error response structure
- Error codes (RATE_LIMIT_EXCEEDED, INTERNAL_ERROR, etc.)
- WriteError helper with request ID tracking
metrics.go (90 lines)
- Prometheus metrics:
  - `aicr_http_requests_total` - Counter by method, path, status
  - `aicr_http_request_duration_seconds` - Histogram by method, path
  - `aicr_http_requests_in_flight` - Gauge
  - `aicr_rate_limit_rejects_total` - Counter
  - `aicr_panic_recoveries_total` - Counter
context.go (8 lines)
- Context key type for request ID storage
doc.go (200 lines)
- Comprehensive package documentation
- Usage examples
- API endpoint descriptions
- Error handling documentation
- Deployment examples
```mermaid
flowchart TD
    A[HTTP Request] --> B[Metrics Start]
    B --> C[Request ID<br/>Generation/Validation]
    C --> D[Panic Recovery Setup]
    D --> E[Rate Limit Check]
    E --> F[Logging]
    F --> G[Application Handler]
    G --> H[Logging Complete]
    H --> I[Panic Recovery Cleanup]
    I --> J[Request ID in Response]
    J --> K[Metrics Complete]
    K --> L[HTTP Response]
```
HTTP handler for recipe generation endpoint. Supports both GET (query parameters) and POST (criteria body) methods.
```go
func (b *Builder) HandleRecipes(w http.ResponseWriter, r *http.Request) {
	var criteria *Criteria
	var err error

	// 1. Route based on HTTP method
	switch r.Method {
	case http.MethodGet:
		// 2a. Parse query parameters for GET
		criteria, err = ParseCriteriaFromRequest(r)
	case http.MethodPost:
		// 2b. Parse request body for POST (JSON or YAML)
		defer r.Body.Close()
		criteria, err = ParseCriteriaFromBody(r.Body, r.Header.Get("Content-Type"))
	default:
		// Reject other methods with 405
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// 3. Validate criteria format (400 with error details on failure)
	if err := criteria.Validate(); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// 4. Validate against allowlists, if configured
	//    (400 with allowed values in error details on failure)
	if b.AllowLists != nil {
		if err := b.AllowLists.ValidateCriteria(criteria); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
	}

	// 5. Build recipe (500 on failure)
	recipe, err := b.BuildFromCriteria(r.Context(), criteria)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// 6. Set cache headers
	w.Header().Set("Cache-Control", "public, max-age=600")

	// 7. Respond with JSON
	serializer.RespondJSON(w, http.StatusOK, recipe)
}
```

(Error paths are shown with `http.Error` for brevity; the server's `errors.go` helper writes the standardized JSON error envelope instead.)

POST requests accept a RecipeCriteria resource (Kubernetes-style):
```yaml
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: gb200
  os: ubuntu
  intent: training
```

Supported content types:
- `application/json` - JSON format
- `application/x-yaml` - YAML format
#### Query Parameter Parsing
| Parameter | Type | Validation | Example |
|-----------|------|------------|--------|
| `service` | ServiceType | Enum: eks, gke, aks, oke, any | `service=eks` |
| `accelerator` | AcceleratorType | Enum: h100, gb200, a100, l40, any | `accelerator=h100` |
| `gpu` | AcceleratorType | Alias for accelerator | `gpu=h100` |
| `intent` | IntentType | Enum: training, inference, any | `intent=training` |
| `os` | OSType | Enum: ubuntu, rhel, cos, amazonlinux, any | `os=ubuntu` |
| `nodes` | int | >= 0 | `nodes=8` |
### Recipe Builder: `pkg/recipe/builder.go`
Shared with CLI - same logic as described in CLI architecture.
## API Endpoints
### Recipe Generation
**Endpoints**:
- `GET /v1/recipe` - Generate recipe from query parameters
- `POST /v1/recipe` - Generate recipe from criteria body
#### GET Method
**Query Parameters**:
- `service` - Kubernetes service type (eks, gke, aks, oke)
- `accelerator` - GPU/accelerator type (h100, gb200, a100, l40)
- `gpu` - Alias for accelerator (backwards compatibility)
- `intent` - Workload intent (training, inference)
- `os` - Operating system family (ubuntu, rhel, cos, amazonlinux)
- `nodes` - Number of GPU nodes (0 = any/unspecified)
#### POST Method
**Content Types**: `application/json`, `application/x-yaml`
**Request Body**: `RecipeCriteria` resource with kind, apiVersion, metadata, and spec fields.
```yaml
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: h100
  intent: training
```

Response: 200 OK
```json
{
  "apiVersion": "aicr.nvidia.com/v1alpha1",
  "kind": "Recipe",
  "metadata": {
    "version": "v1.0.0",
    "created": "2025-12-25T12:00:00Z",
    "appliedOverlays": [
      "base",
      "eks",
      "eks-training",
      "gb200-eks-training"
    ]
  },
  "criteria": {
    "service": "eks",
    "accelerator": "gb200",
    "intent": "training",
    "os": "any"
  },
  "componentRefs": [
    {
      "name": "gpu-operator",
      "version": "v25.3.3",
      "order": 1
    }
  ],
  "constraints": {
    "driver": {
      "version": "580.82.07"
    }
  }
}
```

Error Response: 400 Bad Request
```json
{
  "code": "INVALID_REQUEST",
  "message": "invalid gpu type: must be one of h100, gb200, a100, l40, ALL",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": false
}
```

Rate Limited: 429 Too Many Requests
```json
{
  "code": "RATE_LIMIT_EXCEEDED",
  "message": "Rate limit exceeded",
  "details": {
    "limit": 100,
    "burst": 200
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": true
}
```

Headers:
- `X-Request-Id` - Unique request identifier
- `X-RateLimit-Limit` - Total requests allowed per second
- `X-RateLimit-Remaining` - Requests remaining in current window
- `X-RateLimit-Reset` - Unix timestamp when window resets
- `Cache-Control` - Caching policy (public, max-age=300)
Endpoint: `GET /health`

Response: 200 OK

```json
{
  "status": "healthy",
  "timestamp": "2025-12-25T12:00:00Z"
}
```

Endpoint: `GET /ready`

Response: 200 OK (ready) or 503 Service Unavailable (not ready)

```json
{
  "status": "ready",
  "timestamp": "2025-12-25T12:00:00Z"
}
```

Endpoint: `GET /metrics`
Response: Prometheus text format

```
# HELP aicr_http_requests_total Total number of HTTP requests
# TYPE aicr_http_requests_total counter
aicr_http_requests_total{method="GET",path="/v1/recipe",status="200"} 1234

# HELP aicr_http_request_duration_seconds HTTP request latency in seconds
# TYPE aicr_http_request_duration_seconds histogram
aicr_http_request_duration_seconds_bucket{method="GET",path="/v1/recipe",le="0.005"} 1000
aicr_http_request_duration_seconds_sum{method="GET",path="/v1/recipe"} 12.34
aicr_http_request_duration_seconds_count{method="GET",path="/v1/recipe"} 1234

# HELP aicr_http_requests_in_flight Current number of HTTP requests being processed
# TYPE aicr_http_requests_in_flight gauge
aicr_http_requests_in_flight 5

# HELP aicr_rate_limit_rejects_total Total number of requests rejected due to rate limiting
# TYPE aicr_rate_limit_rejects_total counter
aicr_rate_limit_rejects_total 42

# HELP aicr_panic_recoveries_total Total number of panics recovered in HTTP handlers
# TYPE aicr_panic_recoveries_total counter
aicr_panic_recoveries_total 0
```
Endpoint: `GET /`

Response: 200 OK

```json
{
  "service": "aicrd",
  "version": "v1.0.0",
  "routes": [
    "/v1/recipe"
  ]
}
```

```bash
# Basic recipe request
curl "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Full specification
curl "http://localhost:8080/v1/recipe?os=ubuntu&service=eks&accelerator=gb200&intent=training&nodes=8"

# With request ID
curl -H "X-Request-Id: 550e8400-e29b-41d4-a716-446655440000" \
  "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Health check
curl http://localhost:8080/health

# Readiness check
curl http://localhost:8080/ready

# Metrics
curl http://localhost:8080/metrics
```

Note: This section describes the demonstration deployment of the `aicrd` API server for testing and development purposes only. It is not a production service. Users should self-host the `aicrd` API server in their own infrastructure for production use. See the Kubernetes Deployment section below for deployment guidance.
The demo API server is deployed to Google Cloud Run as an example of how to deploy aicrd:
Demo Configuration:
- Platform: Google Cloud Run (fully managed serverless)
- Authentication: Public access (for demo purposes)
- Auto-scaling: 0-100 instances based on load
- Region: `us-west1`
CI/CD Pipeline (on-tag.yaml):
```mermaid
flowchart LR
    A["Git Tag<br/>v0.8.12"] --> B["GitHub Actions"]
    B --> C["Go CI<br/>(Test + Lint)"]
    C --> D["Build Image<br/>(ko + goreleaser)"]
    D --> E["Generate SBOM<br/>(Syft)"]
    E --> F["Sign Attestations<br/>(Cosign keyless)"]
    F --> G["Push to GHCR<br/>ghcr.io/nvidia/aicrd"]
    G --> H["Demo Deploy<br/>(example)"]
    H --> I["Health Check<br/>Verification"]
```
Supply Chain Security:
- SLSA Build Level 3 compliance
- Signed SBOMs in SPDX format
- Attestations logged in Rekor transparency log
- Verification: `gh attestation verify oci://ghcr.io/nvidia/aicrd:TAG --owner nvidia`
Demo Monitoring:
- Health endpoint: `/health`
- Readiness endpoint: `/ready`
- Prometheus metrics: `/metrics`
- Request tracing with `X-Request-Id` headers
Scaling Behavior (demo):
- Min instances: 0 (scales to zero when idle)
- Max instances: 100 (automatic scaling)
- Cold start: 2-3 seconds
- Request timeout: 30 seconds
- Concurrency: 80 requests per instance
Cloud Run Benefits (for reference):
- Zero operational overhead
- Automatic HTTPS with managed certificates
- Built-in DDoS protection
- Pay-per-use pricing (scales to zero)
- Global load balancing
Go Client:

```go
import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

func getRecipe(os, gpu string) (*Recipe, error) {
	baseURL := "http://localhost:8080/v1/recipe"

	params := url.Values{}
	params.Add("os", os)
	params.Add("gpu", gpu)

	resp, err := http.Get(baseURL + "?" + params.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}

	var recipe Recipe
	if err := json.NewDecoder(resp.Body).Decode(&recipe); err != nil {
		return nil, err
	}
	return &recipe, nil
}
```

Python Client:
```python
import requests

def get_recipe(os, gpu):
    url = "http://localhost:8080/v1/recipe"
    params = {"os": os, "gpu": gpu}
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()

# Usage
recipe = get_recipe("ubuntu", "h100")
print(f"Recipe has {len(recipe['componentRefs'])} components")
```

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicrd
  template:
    metadata:
      labels:
        app: aicrd
    spec:
      containers:
        - name: server
          image: ghcr.io/nvidia/aicrd:v1.0.0
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: PORT
              value: "8080"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    app: aicrd
  ports:
    - port: 80
      targetPort: http
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    matchLabels:
      app: aicrd
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicrd
  namespace: aicr-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - api.aicr.nvidia.com
      secretName: aicr-api-tls
  rules:
    - host: api.aicr.nvidia.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aicrd
                port:
                  number: 80
```

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: aicr_http_requests_in_flight
        target:
          type: AverageValue
          averageValue: "50"
```

- Rate Limit: 100 requests/second per instance (configurable)
- Burst: 200 requests (configurable)
- Target Latency: p50 <10ms, p99 <50ms
- Max Concurrent: Limited by rate limiter
- CPU: ~50m idle, ~200m at 100 req/s
- Memory: ~100MB baseline, ~200MB at peak
- Disk: None (stateless, embedded recipe data)
- Horizontal: Fully stateless, linear scaling
- Vertical: Recipe store cached in memory (sync.Once)
- Load Balancing: Round-robin or least-connections
- Recipe Store: Loaded once per process, cached globally
- Client-Side: 5-minute cache via Cache-Control header
- CDN: Recommended for public-facing deployments
All errors follow a consistent JSON structure:

```
{
  "code": "ERROR_CODE",
  "message": "Human-readable error message",
  "details": {"key": "value"},
  "requestId": "uuid",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": true/false
}
```

| Code | HTTP Status | Description | Retryable |
|---|---|---|---|
| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests | Yes |
| `INVALID_REQUEST` | 400 | Invalid parameters or disallowed criteria value | No |
| `METHOD_NOT_ALLOWED` | 405 | Wrong HTTP method | No |
| `INTERNAL_ERROR` | 500 | Server error | Yes |
| `SERVICE_UNAVAILABLE` | 503 | Not ready | Yes |
Allowlist Validation Error Example:

When a request uses a criteria value not in the configured allowlist:

```json
{
  "code": "INVALID_REQUEST",
  "message": "accelerator type not allowed",
  "details": {
    "requested": "gb200",
    "allowed": ["h100", "l40"]
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2026-01-27T12:00:00Z",
  "retryable": false
}
```

- Validation Errors: Return 400 with specific error message
- Rate Limiting: Return 429 with Retry-After header
- Panics: Recover, log, return 500
- Context Cancellation: Return early, cleanup resources
- Resource Exhaustion: Rate limiting prevents this
Rate Limiting:
- Token bucket algorithm prevents abuse
- Per-instance limit (shared across all clients)
- Configurable limits and burst
Header Attacks:
- 64KB header size limit
- 5-second header read timeout
- Prevents slowloris attacks
Resource Exhaustion:
- Request timeouts (read, write, idle)
- In-flight request limits
- Graceful shutdown prevents connection drops
Input Validation:
- Strict enum validation
- Version string parsing with bounds
- UUID validation for request IDs
TLS:
- Use reverse proxy (nginx, Envoy) for TLS termination
- Or add TLS support to server (future enhancement)
Authentication:
- Add API key middleware (future enhancement)
- Or use service mesh mTLS (Istio, Linkerd)
Authorization:
- Currently none (public API)
- Could add rate limits per API key
Monitoring:
- Prometheus metrics for observability
- Request ID tracking for distributed tracing
- Structured logging for debugging
Request Metrics:
- `aicr_http_requests_total` - Total requests by method, path, status
- `aicr_http_request_duration_seconds` - Request latency histogram
- `aicr_http_requests_in_flight` - Current active requests

Error Metrics:
- `aicr_rate_limit_rejects_total` - Rate limit rejections
- `aicr_panic_recoveries_total` - Panic recoveries
Example queries:

```promql
# Request rate
rate(aicr_http_requests_total[5m])

# Error rate
rate(aicr_http_requests_total{status=~"5.."}[5m])

# Latency percentiles
histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m]))

# Rate limit rejections
rate(aicr_rate_limit_rejects_total[5m])
```
```yaml
groups:
  - name: aicrd
    rules:
      - alert: HighErrorRate
        expr: rate(aicr_http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        annotations:
          summary: High error rate on aicrd
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m])) > 0.1
        for: 5m
        annotations:
          summary: High latency on aicrd
      - alert: HighRateLimitRejects
        expr: rate(aicr_rate_limit_rejects_total[5m]) > 10
        for: 5m
        annotations:
          summary: High rate limit rejections
```

Request ID tracking enables correlation:
- Client sends request with `X-Request-Id` header
- Server logs all operations with request ID
- Response includes the same `X-Request-Id`
- Client can correlate logs across services
Future: OpenTelemetry integration for full tracing
Unit tests:
- Handler validation logic
- Middleware functionality
- Error response formatting
- Query parsing

Integration tests:
- Full HTTP request/response cycle
- Rate limiting behavior
- Graceful shutdown
- Health/ready endpoints

Load tests:
- Sustained load at rate limit
- Burst handling
- Latency under load
- Memory stability
```go
func TestRecipeHandler(t *testing.T) {
	// Create test server
	builder := recipe.NewBuilder()
	handler := builder.HandleRecipes

	// Create test request
	req := httptest.NewRequest(
		"GET",
		"/v1/recipe?os=ubuntu&gpu=h100",
		nil,
	)
	w := httptest.NewRecorder()

	// Execute handler
	handler(w, req)

	// Verify response
	assert.Equal(t, http.StatusOK, w.Code)

	var resp recipe.Recipe
	err := json.Unmarshal(w.Body.Bytes(), &resp)
	assert.NoError(t, err)
	assert.Equal(t, "ubuntu", resp.Request.Os)
}
```

External dependencies:
- `net/http` - Standard HTTP server
- `golang.org/x/time/rate` - Rate limiting
- `golang.org/x/sync/errgroup` - Concurrent error handling
- `github.com/prometheus/client_golang` - Prometheus metrics
- `github.com/google/uuid` - UUID generation
- `gopkg.in/yaml.v3` - Recipe store parsing
- `log/slog` - Structured logging

Internal packages:
- `pkg/recipe` - Recipe building logic
- `pkg/measurement` - Data model
- `pkg/version` - Semantic versioning
- `pkg/serializer` - JSON response formatting
- `pkg/logging` - Logging configuration
Production builds are automated through GitHub Actions workflows. When a semantic version tag is pushed (e.g., v0.8.12), the on-tag.yaml workflow:
- Validates code with Go CI (tests + linting)
- Builds multi-platform binaries and container images with GoReleaser and ko
- Generates SBOMs (SPDX for binaries and for containers)
- Attests images with SLSA v1.0 provenance and SBOM attestations
- Deploys to Google Cloud Run with Workload Identity Federation
Supply Chain Security:
- SLSA Build Level 3 compliance
- Cosign keyless signing with Fulcio + Rekor
- GitHub Attestation API for provenance
- Multi-platform builds: darwin/linux × amd64/arm64
Verify Release Artifacts:

```bash
# Get latest release tag
export TAG=$(curl -s https://api.github.com/repos/NVIDIA/aicr/releases/latest | jq -r '.tag_name')

# Verify attestations
gh attestation verify oci://ghcr.io/nvidia/aicrd:${TAG} --owner nvidia
```

For detailed CI/CD architecture, see ../CONTRIBUTING.md#github-actions--cicd and README.md.
For local development and testing:

```makefile
VERSION ?= $(shell git describe --tags --always --dirty)
COMMIT  ?= $(shell git rev-parse --short HEAD)
DATE    ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ)

LDFLAGS := -X github.com/NVIDIA/aicr/pkg/api.version=$(VERSION)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.commit=$(COMMIT)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.date=$(DATE)

go build -ldflags="$(LDFLAGS)" -o bin/aicrd ./cmd/aicrd
```

Production images are built with ko (automated in CI/CD). For local development:
```dockerfile
FROM golang:1.26-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -ldflags="-X github.com/NVIDIA/aicr/pkg/api.version=v1.0.0" \
    -o /bin/aicrd ./cmd/aicrd

FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/aicrd /usr/local/bin/
EXPOSE 8080
ENTRYPOINT ["aicrd"]
```

Note: Production images use a distroless base (gcr.io/distroless/static) for minimal attack surface.
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Server port |
| `AICR_ALLOWED_ACCELERATORS` | (none) | Comma-separated list of allowed GPU types (e.g., `h100,l40`). If not set, all types allowed. |
| `AICR_ALLOWED_SERVICES` | (none) | Comma-separated list of allowed K8s services (e.g., `eks,gke`). If not set, all services allowed. |
| `AICR_ALLOWED_INTENTS` | (none) | Comma-separated list of allowed intents (e.g., `training`). If not set, all intents allowed. |
| `AICR_ALLOWED_OS` | (none) | Comma-separated list of allowed OS types (e.g., `ubuntu,rhel`). If not set, all OS types allowed. |
Criteria Allowlists:
When allowlist environment variables are configured, the API server validates incoming requests against the allowed values. This enables operators to restrict the API to specific configurations.
```bash
# Start server with restricted accelerators
export AICR_ALLOWED_ACCELERATORS=h100,l40
export AICR_ALLOWED_SERVICES=eks,gke
./aicrd

# Server logs on startup:
# INFO criteria allowlists configured accelerators=2 services=2 intents=0 os_types=0
# DEBUG criteria allowlists loaded accelerators=["h100","l40"] services=["eks","gke"] intents=[] os_types=[]
```

Validation behavior:
- Requests with disallowed values return HTTP 400 with error details
- The `any` value is always allowed regardless of allowlist
- Both `/v1/recipe` and `/v1/bundle` endpoints enforce allowlists
- CLI (`aicr`) is not affected by allowlists
- Authentication & Authorization

  Rationale: Protect API from unauthorized access, enable usage tracking
  Implementation: API key middleware with HMAC-SHA256 verification
  Example:

  ```go
  func APIKeyMiddleware(validKeys map[string]string) func(http.Handler) http.Handler {
      return func(next http.Handler) http.Handler {
          return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
              key := r.Header.Get("X-API-Key")
              if _, ok := validKeys[key]; !ok {
                  http.Error(w, "Invalid API key", http.StatusUnauthorized)
                  return
              }
              next.ServeHTTP(w, r)
          })
      }
  }
  ```

  Reference: HTTP Authentication
- CORS Support

  Use Case: Enable browser-based clients (web dashboards)
  Implementation: `rs/cors` middleware with configurable origins
  Configuration:

  ```go
  c := cors.New(cors.Options{
      AllowedOrigins:   []string{"https://dashboard.example.com"},
      AllowedMethods:   []string{"GET", "POST", "OPTIONS"},
      AllowedHeaders:   []string{"Content-Type", "X-API-Key"},
      AllowCredentials: true,
      MaxAge:           86400, // 24 hours
  })
  handler := c.Handler(mux)
  ```

  Reference: CORS Specification
- Response Compression

  Benefit: Reduce bandwidth by 70-80% for JSON responses
  Implementation: `gziphandler` middleware with quality threshold

  ```go
  import "github.com/NYTimes/gziphandler"

  handler := gziphandler.GzipHandler(mux) // Only compresses responses > 1KB
  ```

  Trade-off: CPU usage (+5-10%) vs bandwidth savings
  Reference: gziphandler
Native TLS Support
Rationale: Eliminate need for reverse proxy in simple deployments
Implementation:http.ListenAndServeTLSwith Let's Encrypt integrationimport "golang.org/x/crypto/acme/autocert" m := &autocert.Manager{ Prompt: autocert.AcceptTOS, Cache: autocert.DirCache("/var/cache/aicr"), HostPolicy: autocert.HostWhitelist("api.example.com"), } srv := &http.Server{ Addr: ":https", TLSConfig: m.TLSConfig(), Handler: handler, } srv.ListenAndServeTLS("", "")
Reference: autocert Package
- API Versioning

  Use Case: Support /v2 API with breaking changes while maintaining /v1
  Pattern: URL-based versioning with version-specific handlers

  ```go
  v1 := http.NewServeMux()
  v1.HandleFunc("/recipe", handleRecipeV1)

  v2 := http.NewServeMux()
  v2.HandleFunc("/recipe", handleRecipeV2)

  mux := http.NewServeMux()
  mux.Handle("/v1/", http.StripPrefix("/v1", v1))
  mux.Handle("/v2/", http.StripPrefix("/v2", v2))
  ```

  Reference: API Versioning Best Practices
- OpenTelemetry Integration

  Use Case: Distributed tracing across services
  Implementation: OTLP exporter with automatic instrumentation

  ```go
  import (
      "go.opentelemetry.io/otel"
      "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
      "go.opentelemetry.io/otel/sdk/trace"
  )

  func initTracer() (*trace.TracerProvider, error) {
      exporter, err := otlptracehttp.New(context.Background(),
          otlptracehttp.WithEndpoint("otel-collector:4318"),
          otlptracehttp.WithInsecure(),
      )
      if err != nil {
          return nil, err
      }
      tp := trace.NewTracerProvider(
          trace.WithBatcher(exporter),
          trace.WithResource(/* service name */),
      )
      otel.SetTracerProvider(tp)
      return tp, nil
  }
  ```

  Reference: OpenTelemetry Go
- Recipe Caching

  Benefit: 95%+ cache hit rate for repeated queries
  Implementation: Redis with TTL, fallback to recipe builder

  ```go
  import "github.com/redis/go-redis/v9"

  func getRecipe(ctx context.Context, key string) (*recipe.Recipe, error) {
      // Try cache first
      cached, err := rdb.Get(ctx, key).Result()
      if err == nil {
          var r recipe.Recipe
          json.Unmarshal([]byte(cached), &r)
          return &r, nil
      }

      // Cache miss - build recipe
      r, err := builder.BuildRecipe(ctx, params)
      if err != nil {
          return nil, err
      }

      // Cache with 1 hour TTL
      data, _ := json.Marshal(r)
      rdb.Set(ctx, key, data, time.Hour)
      return r, nil
  }
  ```

  Reference: go-redis
- GraphQL API

  Rationale: Enable clients to request only needed fields
  Implementation: `graphql-go` with recipe schema

  ```graphql
  type Query {
      recipe(
          os: String!
          osVersion: String
          gpu: String!
          service: String
      ): Recipe
  }

  type Recipe {
      request: RequestInfo!
      measurements: [Measurement!]!
      context: RecipeContext
  }
  ```

  Trade-off: Added complexity vs flexible querying
  Reference: GraphQL Go
- gRPC Support

  Benefit: 5-10x better performance, smaller payloads
  Implementation: Protobuf definition with streaming support

  ```protobuf
  service RecipeService {
      rpc GetRecipe(RecipeRequest) returns (Recipe);
      rpc StreamRecipes(stream RecipeRequest) returns (stream Recipe);
      rpc GetSnapshot(SnapshotRequest) returns (Snapshot);
  }

  message RecipeRequest {
      string os = 1;
      string os_version = 2;
      string gpu = 3;
      string service = 4;
  }
  ```

  Deployment: Run HTTP/2 and gRPC on same port with `cmux`
  Reference: gRPC Go
Multi-Tenancy
Use Case: SaaS deployment with per-customer isolation
Implementation: Tenant ID from API key, separate rate limitstype TenantRateLimiter struct { limiters map[string]*rate.Limiter mu sync.RWMutex } func (t *TenantRateLimiter) Allow(tenantID string) bool { t.mu.RLock() limiter, exists := t.limiters[tenantID] t.mu.RUnlock() if !exists { t.mu.Lock() limiter = rate.NewLimiter(rate.Limit(100), 200) // Per-tenant t.limiters[tenantID] = limiter t.mu.Unlock() } return limiter.Allow() }
Database: Separate recipe stores per tenant
- Admin API

  Use Case: Runtime configuration updates without restart
  Endpoints:
  - `POST /admin/config/rate-limit` - Update rate limits
  - `POST /admin/config/log-level` - Change log verbosity
  - `GET /admin/debug/pprof` - CPU/memory profiling
  - `POST /admin/cache/flush` - Clear recipe cache

  Security: Separate admin API key with IP allowlist
- Feature Flags

  Rationale: A/B testing, gradual rollouts, instant rollback
  Implementation: LaunchDarkly or custom flag service

  ```go
  import "github.com/launchdarkly/go-server-sdk/v7"

  func handleRecipe(w http.ResponseWriter, r *http.Request) {
      user := ldclient.NewUser(getUserID(r))

      // Check feature flag
      if client.BoolVariation("use-optimized-builder", user, false) {
          // Use new optimized recipe builder
          recipe = optimizedBuilder.Build(params)
      } else {
          // Fall back to stable builder
          recipe = stableBuilder.Build(params)
      }
  }
  ```

  Reference: LaunchDarkly Go SDK
Use Case: Auto-scale API servers based on request rate
Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: aicrd
namespace: aicr
spec:
replicas: 3 # Initial replicas
selector:
matchLabels:
app: aicrd
template:
metadata:
labels:
app: aicrd
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: aicrd
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: api-server
image: ghcr.io/nvidia/aicrd:latest # Or use specific tag like v0.8.12
ports:
- name: http
containerPort: 8080
protocol: TCP
- name: metrics
containerPort: 9090
protocol: TCP
env:
- name: PORT
value: "8080"
- name: LOG_LEVEL
value: "info"
# Criteria allowlists (optional - omit to allow all values)
- name: AICR_ALLOWED_ACCELERATORS
value: "h100,l40,a100"
- name: AICR_ALLOWED_SERVICES
value: "eks,gke,aks"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 2
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: recipes
mountPath: /etc/aicr/recipes
readOnly: true
- name: tmp
mountPath: /tmp
volumes:
- name: recipes
configMap:
name: aicr-recipes
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: aicrd
namespace: aicr
spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
selector:
app: aicrd
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: aicrd-hpa
namespace: aicr
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: aicrd
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100 # Double pods
periodSeconds: 60
- type: Pods
value: 4 # Add 4 pods
periodSeconds: 60
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300 # 5 min cooldown
policies:
- type: Percent
value: 50 # Remove 50% of pods
periodSeconds: 60Ingress with TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: aicrd
namespace: aicr
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/limit-rps: "20"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.aicr.example.com
secretName: aicr-api-tls
rules:
- host: api.aicr.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: aicrd
port:
name: httpUse Case: Zero-trust security with automatic mTLS encryption
Istio VirtualService:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: aicrd
namespace: aicr
spec:
hosts:
- aicrd.aicr.svc.cluster.local
- api.aicr.example.com
gateways:
- aicr-gateway
http:
- match:
- uri:
prefix: /v1/recipe
route:
- destination:
host: aicrd
port:
number: 80
timeout: 10s
retries:
attempts: 3
perTryTimeout: 3s
retryOn: 5xx,reset,connect-failure
headers:
response:
add:
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Strict-Transport-Security: max-age=31536000
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: aicrd
namespace: aicr
spec:
host: aicrd
trafficPolicy:
tls:
mode: ISTIO_MUTUAL # mTLS between services
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
http2MaxRequests: 100
maxRequestsPerConnection: 2
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: aicrd
namespace: aicr
spec:
selector:
matchLabels:
app: aicrd
mtls:
mode: STRICT # Require mTLS
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: aicrd
namespace: aicr
spec:
selector:
matchLabels:
app: aicrd
action: ALLOW
rules:
- from:
- source:
namespaces: ["aicr", "monitoring"]
to:
- operation:
methods: ["GET", "POST"]
paths: ["/v1/*", "/health", "/metrics"]Use Case: Bare-metal deployment with HAProxy
HAProxy Configuration:
global
log /dev/log local0
maxconn 4096
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5s
timeout client 30s
timeout server 30s
retries 3
option redispatch
frontend aicr_api_frontend
bind *:443 ssl crt /etc/ssl/certs/aicr-api.pem
bind *:80
redirect scheme https if !{ ssl_fc }
# Rate limiting
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
# Security headers
http-response set-header Strict-Transport-Security "max-age=31536000"
http-response set-header X-Content-Type-Options "nosniff"
default_backend aicr_api_backend
backend aicr_api_backend
balance roundrobin
option httpchk GET /health
http-check expect status 200
server api1 10.0.1.10:8080 check inter 10s fall 3 rise 2 maxconn 100
server api2 10.0.1.11:8080 check inter 10s fall 3 rise 2 maxconn 100
server api3 10.0.1.12:8080 check inter 10s fall 3 rise 2 maxconn 100Use Case: Zero-downtime updates with instant rollback
Kubernetes Service Switching:
#!/bin/bash
# Blue-green deployment script
set -euo pipefail
NAMESPACE=aicr
APP=aicrd
NEW_VERSION=$1
# Deploy green version
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${APP}-green
namespace: ${NAMESPACE}
spec:
replicas: 3
selector:
matchLabels:
app: ${APP}
version: green
template:
metadata:
labels:
app: ${APP}
version: green
spec:
containers:
- name: api-server
image: ghcr.io/nvidia/${APP}:${NEW_VERSION}
# ... same spec as blue ...
EOF
# Wait for green to be ready
kubectl rollout status deployment/${APP}-green -n ${NAMESPACE}
# Run smoke tests
GREEN_IP=$(kubectl get svc ${APP}-green -n ${NAMESPACE} -o jsonpath='{.spec.clusterIP}')
curl -f http://${GREEN_IP}/health || (echo "Health check failed" && exit 1)
curl -f "http://${GREEN_IP}/v1/recipe?os=ubuntu&gpu=h100" || (echo "Recipe test failed" && exit 1)
# Switch service to green
kubectl patch service ${APP} -n ${NAMESPACE} -p '{"spec":{"selector":{"version":"green"}}}'
echo "Switched to green (${NEW_VERSION})"
echo "Monitor for 10 minutes, then delete blue deployment"
echo "Rollback: kubectl patch service ${APP} -n ${NAMESPACE} -p '{\"spec\":{\"selector\":{\"version\":\"blue\"}}}'"
# Optional: Auto-delete blue after monitoring period
# sleep 600
# kubectl delete deployment ${APP}-blue -n ${NAMESPACE}Use Case: Prevent cascading failures when recipe store is slow
Implementation:
import "github.com/sony/gobreaker"
var (
recipeStoreBreaker *gobreaker.CircuitBreaker
)
func init() {
settings := gobreaker.Settings{
Name: "RecipeStore",
MaxRequests: 3, // Half-open state allows 3 requests
Interval: 60 * time.Second, // Reset counts every 60s
Timeout: 30 * time.Second, // Stay open for 30s
ReadyToTrip: func(counts gobreaker.Counts) bool {
failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 10 && failureRatio >= 0.6
},
OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
log.Info("Circuit breaker state changed",
"name", name,
"from", from,
"to", to,
)
},
}
recipeStoreBreaker = gobreaker.NewCircuitBreaker(settings)
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
result, err := recipeStoreBreaker.Execute(func() (interface{}, error) {
return buildRecipe(r.Context(), params)
})
if err != nil {
if errors.Is(err, gobreaker.ErrOpenState) {
http.Error(w, "Service temporarily unavailable", http.StatusServiceUnavailable)
return
}
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
recipe := result.(*recipe.Recipe)
json.NewEncoder(w).Encode(recipe)
}Reference: gobreaker
Use Case: Isolate resources for different endpoints
Implementation:
import "golang.org/x/sync/semaphore"
var (
// Separate semaphores for different endpoints
recipeSem = semaphore.NewWeighted(100) // 100 concurrent recipe requests
snapshotSem = semaphore.NewWeighted(10) // 10 concurrent snapshot requests
)
func handleRecipeWithBulkhead(w http.ResponseWriter, r *http.Request) {
// Acquire from recipe bulkhead
if !recipeSem.TryAcquire(1) {
http.Error(w, "Too many requests", http.StatusTooManyRequests)
return
}
defer recipeSem.Release(1)
// Process request
handleRecipe(w, r)
}
func handleSnapshotWithBulkhead(w http.ResponseWriter, r *http.Request) {
// Acquire from snapshot bulkhead (more expensive operation)
if !snapshotSem.TryAcquire(1) {
http.Error(w, "Too many requests", http.StatusTooManyRequests)
return
}
defer snapshotSem.Release(1)
handleSnapshot(w, r)
}Benefit: Recipe slowness doesn't affect snapshot endpoint
Use Case: Resilient calls to external APIs (recipe store, etc.)
Implementation:
import "github.com/cenkalti/backoff/v4"
func fetchRecipeWithRetry(ctx context.Context, key string) (*recipe.Recipe, error) {
var r *recipe.Recipe
operation := func() error {
var err error
r, err = recipeStore.Get(ctx, key)
// Don't retry on 404
if errors.Is(err, ErrNotFound) {
return backoff.Permanent(err)
}
return err
}
// Exponential backoff: 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s
bo := backoff.NewExponentialBackOff()
bo.InitialInterval = 100 * time.Millisecond
bo.MaxInterval = 5 * time.Second
bo.MaxElapsedTime = 30 * time.Second
err := backoff.Retry(operation, backoff.WithContext(bo, ctx))
return r, err
}Reference: backoff
Use Case: Serve stale/cached data when primary source fails
Implementation:
var (
recipeCacheTTL = 1 * time.Hour
recipeCache = sync.Map{}
)
type cachedRecipe struct {
recipe *recipe.Recipe
timestamp time.Time
}
func handleRecipeWithFallback(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Try primary source
recipe, err := buildRecipe(r.Context(), params)
if err == nil {
// Cache successful response
recipeCache.Store(key, cachedRecipe{
recipe: recipe,
timestamp: time.Now(),
})
json.NewEncoder(w).Encode(recipe)
return
}
// Primary failed - try cache
if cached, ok := recipeCache.Load(key); ok {
cr := cached.(cachedRecipe)
age := time.Since(cr.timestamp)
log.Warn("Serving stale recipe",
"key", key,
"age", age,
"error", err,
)
w.Header().Set("X-Cache", "stale")
w.Header().Set("X-Cache-Age", age.String())
json.NewEncoder(w).Encode(cr.recipe)
return
}
// No cache available
http.Error(w, "Service unavailable", http.StatusServiceUnavailable)
}HTTP Client with Keep-Alive:
var httpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
DisableCompression: false,
ForceAttemptHTTP2: true,
},
Timeout: 10 * time.Second,
}
// Reuse client for all outbound requests
resp, err := httpClient.Get("https://recipe-store.example.com/recipes")In-Memory Cache with TTL:
import "github.com/patrickmn/go-cache"
var (
responseCache = cache.New(5*time.Minute, 10*time.Minute)
)
func handleRecipeWithCache(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Check cache
if cached, found := responseCache.Get(key); found {
w.Header().Set("X-Cache", "hit")
w.Header().Set("Content-Type", "application/json")
w.Write(cached.([]byte))
return
}
// Cache miss - build recipe
recipe, err := buildRecipe(r.Context(), params)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
// Serialize and cache
data, _ := json.Marshal(recipe)
responseCache.Set(key, data, cache.DefaultExpiration)
w.Header().Set("X-Cache", "miss")
w.Header().Set("Content-Type", "application/json")
w.Write(data)
}Deduplicate Concurrent Identical Requests:
import "golang.org/x/sync/singleflight"
var requestGroup singleflight.Group
func handleRecipeWithCoalescing(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Deduplicate requests with same key
result, err, shared := requestGroup.Do(key, func() (interface{}, error) {
return buildRecipe(r.Context(), params)
})
if shared {
w.Header().Set("X-Request-Coalesced", "true")
}
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
json.NewEncoder(w).Encode(result)
}Benefit: 10 concurrent identical requests = 1 recipe build
# Enable pprof endpoint
import _ "net/http/pprof"
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
# Capture heap profile
curl http://localhost:6060/debug/pprof/heap > heap.prof
# Analyze
go tool pprof heap.prof
(pprof) top10
(pprof) list buildRecipe
# Check for memory leaks
# Compare two profiles taken 5 minutes apart
go tool pprof -base heap1.prof heap2.prof
(pprof) top10 # Shows allocations between profilesimport "golang.org/x/time/rate"
type ipRateLimiter struct {
limiters map[string]*rate.Limiter
mu sync.RWMutex
rate rate.Limit
burst int
}
func newIPRateLimiter(r rate.Limit, b int) *ipRateLimiter {
return &ipRateLimiter{
limiters: make(map[string]*rate.Limiter),
rate: r,
burst: b,
}
}
func (i *ipRateLimiter) getLimiter(ip string) *rate.Limiter {
i.mu.RLock()
limiter, exists := i.limiters[ip]
i.mu.RUnlock()
if !exists {
i.mu.Lock()
limiter = rate.NewLimiter(i.rate, i.burst)
i.limiters[ip] = limiter
// Cleanup old limiters (simple implementation)
if len(i.limiters) > 10000 {
i.limiters = make(map[string]*rate.Limiter)
}
i.mu.Unlock()
}
return limiter
}
func (i *ipRateLimiter) middleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ip := getClientIP(r)
limiter := i.getLimiter(ip)
if !limiter.Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
func getClientIP(r *http.Request) string {
// Check X-Forwarded-For header (behind proxy)
xff := r.Header.Get("X-Forwarded-For")
if xff != "" {
ips := strings.Split(xff, ",")
return strings.TrimSpace(ips[0])
}
// Fall back to RemoteAddr
ip, _, _ := net.SplitHostPort(r.RemoteAddr)
return ip
}import "github.com/go-playground/validator/v10"
var validate = validator.New()
type RecipeRequest struct {
OS string `validate:"required,oneof=ubuntu rhel cos"`
OSVersion string `validate:"omitempty,semver"`
GPU string `validate:"required,oneof=h100 gb200 a100 l40"`
Service string `validate:"omitempty,oneof=eks gke aks self-managed"`
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
req := RecipeRequest{
OS: r.URL.Query().Get("os"),
OSVersion: r.URL.Query().Get("osv"),
GPU: r.URL.Query().Get("gpu"),
Service: r.URL.Query().Get("service"),
}
if err := validate.Struct(req); err != nil {
validationErrors := err.(validator.ValidationErrors)
http.Error(w, validationErrors.Error(), http.StatusBadRequest)
return
}
// Proceed with validated input
}func securityHeadersMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// HSTS
w.Header().Set("Strict-Transport-Security",
"max-age=31536000; includeSubDomains; preload")
// Prevent MIME sniffing
w.Header().Set("X-Content-Type-Options", "nosniff")
// Prevent clickjacking
w.Header().Set("X-Frame-Options", "DENY")
// XSS protection
w.Header().Set("X-XSS-Protection", "1; mode=block")
// CSP
w.Header().Set("Content-Security-Policy",
"default-src 'none'; script-src 'self'; connect-src 'self'; img-src 'self'; style-src 'self';")
// Referrer policy
w.Header().Set("Referrer-Policy", "strict-origin-when-cross-origin")
next.ServeHTTP(w, r)
})
}import "github.com/prometheus/client_golang/prometheus"
var (
recipeBuildDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "aicr_recipe_build_duration_seconds",
Help: "Time to build recipe",
Buckets: prometheus.ExponentialBuckets(0.001, 2, 12), // 1ms to 4s
},
[]string{"os", "gpu", "service"},
)
recipeCacheHits = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "aicr_recipe_cache_hits_total",
Help: "Number of recipe cache hits",
},
[]string{"cache_type"},
)
activeConnections = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "aicr_active_connections",
Help: "Number of active HTTP connections",
},
)
)
func init() {
prometheus.MustRegister(
recipeBuildDuration,
recipeCacheHits,
activeConnections,
)
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
start := time.Now()
defer func() {
duration := time.Since(start).Seconds()
recipeBuildDuration.WithLabelValues(
params.OS,
params.GPU,
params.Service,
).Observe(duration)
}()
// Check cache
if cached, found := cache.Get(key); found {
recipeCacheHits.WithLabelValues("memory").Inc()
// ...
}
// Build recipe
// ...
}import "log/slog"
func handleRecipe(w http.ResponseWriter, r *http.Request) {
// Create logger with request context
logger := slog.With(
"request_id", r.Header.Get("X-Request-ID"),
"remote_addr", r.RemoteAddr,
"user_agent", r.UserAgent(),
)
logger.Info("Handling recipe request",
"os", params.OS,
"gpu", params.GPU,
)
recipe, err := buildRecipe(r.Context(), params)
if err != nil {
logger.Error("Failed to build recipe",
"error", err,
"params", params,
)
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
logger.Info("Recipe built successfully",
"measurement_count", len(recipe.Measurements),
"duration_ms", time.Since(start).Milliseconds(),
)
json.NewEncoder(w).Encode(recipe)
}import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
)
func handleRecipe(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
tracer := otel.Tracer("aicrd")
ctx, span := tracer.Start(ctx, "handleRecipe",
trace.WithAttributes(
attribute.String("os", params.OS),
attribute.String("gpu", params.GPU),
),
)
defer span.End()
// Propagate context to child operations
recipe, err := buildRecipeWithTrace(ctx, params)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
span.SetAttributes(
attribute.Int("measurement_count", len(recipe.Measurements)),
)
json.NewEncoder(w).Encode(recipe)
}
func buildRecipeWithTrace(ctx context.Context, params Params) (*recipe.Recipe, error) {
tracer := otel.Tracer("aicrd")
ctx, span := tracer.Start(ctx, "buildRecipe")
defer span.End()
// Build recipe with traced context
return builder.Build(ctx, params)
}- net/http Package - Go standard HTTP library
- golang.org/x/time/rate - Token bucket rate limiter
- errgroup - Concurrent error handling
- context Package - Request cancellation and deadlines
- slog Package - Structured logging
- Kubernetes Patterns - Deployment, scaling, networking
- Twelve-Factor App - Cloud-native application principles
- Google SRE Book - Site reliability engineering
- Release Engineering - Deployment best practices
- HTTP/2 in Go - HTTP/2 server push
- RESTful API Design - Google Cloud API design guide
- OpenAPI Specification - API documentation standard
- API Versioning - Version management strategies
- Prometheus Go Client - Metrics collection
- OpenTelemetry Go - Distributed tracing
- Grafana Dashboards - Metrics visualization
- Jaeger Tracing - Distributed tracing backend
- OWASP API Security - API security risks
- HTTP Security Headers - Security header reference
- Rate Limiting Strategies - Google Cloud guide
- mTLS in Kubernetes - Istio mutual TLS
- Go Performance Tips - Optimization techniques
- pprof Profiler - CPU and memory profiling
- High Performance Go - Dave Cheney's workshop
- Go Memory Model - Concurrency guarantees
- Circuit Breaker Pattern - Failure isolation
- Retry with Backoff - Resilient retries
- Chaos Engineering - Resilience testing principles
- SLOs and Error Budgets - Reliability targets