@nlamirault
Collaborator

@nlamirault nlamirault commented Nov 6, 2025

ai-k8s

Summary by CodeRabbit

  • New Features
    • Introduced AI gateway infrastructure with support for multiple LLM providers (OpenAI, Anthropic, Bedrock, Gemini)
    • Added observability dashboards for gateway and operations monitoring
    • Enabled distributed tracing and logging for enhanced workload tracking
    • Implemented traffic policies for AI workload protection
    • Added Homelab environment support for AI deployments

Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Nicolas Lamirault <[email protected]>
@nlamirault nlamirault self-assigned this Nov 6, 2025
@nlamirault nlamirault added priority/high After critical issues are fixed, these should be dealt with before any further issues status/in_progress This issue or PR is being worked on, and has someone assigned kind/feature Categorizes issue or PR as related to a new feature area/kubernetes Kubernetes lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Nov 6, 2025
@coderabbitai

coderabbitai bot commented Nov 6, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces comprehensive AI workload support through new Helm charts for kgateway (Gateway API implementation with AI extensions), kagent (MCP server), and ai application. Adds LLM provider backends (Anthropic, Bedrock, Gemini, OpenAI) with credential management. Updates networking to replace Traefik with kgateway, adds Grafana monitoring dashboards, and configures OpenTelemetry tracing integration.

Changes

| Cohort | File(s) | Change Summary |
|---|---|---|
| AI Application Chart | `gitops/argocd/apps/ai/Chart.yaml`, `gitops/argocd/apps/ai/values.yaml`, `gitops/argocd/apps/ai/values-aws-staging.yaml`, `gitops/argocd/apps/ai/values-talos-homelab.yaml` | Adds new AI application Helm chart with ArgoCD ApplicationSet configurations for AWS staging and Talos homelab environments, including k8sgpt, ollama, and kagent deployments. |
| Gateway API / KGateway Core | `gitops/argocd/charts/gateway-api/kgateway/Chart.yaml`, `gitops/argocd/charts/gateway-api/kgateway/values.yaml`, `gitops/argocd/charts/gateway-api/kgateway/values-talos-homelab.yaml`, `gitops/argocd/charts/gateway-api/kgateway/README.md` | Introduces KGateway Helm chart with dependencies, gateway core configuration, resource limits for Talos homelab, and chart documentation. |
| KGateway Gateway & Core Resources | `gitops/argocd/charts/gateway-api/kgateway/templates/gateway.yaml`, `gitops/argocd/charts/gateway-api/kgateway/templates/gatewayparameters.yaml`, `gitops/argocd/charts/gateway-api/kgateway/templates/agent-gateway-configmap.yaml` | Adds Kubernetes Gateway resource, GatewayParameters custom resource with AI extension configuration, and agent gateway ConfigMap with OpenTelemetry tracing setup. |
| LLM Provider Backends | `gitops/argocd/charts/gateway-api/kgateway/templates/*{anthropic,bedrock,gemini,openai}-backend.yaml` | Adds conditional Backend resources for four LLM providers (Anthropic, Bedrock, Gemini, OpenAI) with AI type spec and authentication configuration. |
| LLM Provider Credentials | `gitops/argocd/charts/gateway-api/kgateway/templates/*{anthropic,bedrock,gemini,openai}-credentials.yaml` | Adds ExternalSecret templates for each LLM provider to fetch credentials from akeyless ClusterSecretStore with 1-hour refresh intervals. |
| LLM Provider Routes | `gitops/argocd/charts/gateway-api/kgateway/templates/*{anthropic,bedrock,gemini,openai}-httproute.yaml` | Adds HTTPRoute resources routing path prefixes to respective LLM provider backends with retry (3 attempts) and 20-second timeout policies. |
| KGateway Observability | `gitops/argocd/charts/gateway-api/kgateway/templates/logging-httplistenerpolicy.yaml`, `gitops/argocd/charts/gateway-api/kgateway/templates/tracing-httplistenerpolicy.yaml`, `gitops/argocd/charts/gateway-api/kgateway/templates/configmap-dashboards.yaml`, `gitops/argocd/charts/gateway-api/kgateway/templates/trafficpolicy.yaml`, `gitops/argocd/charts/gateway-api/kgateway/dashboards/networking/*` | Adds HTTPListenerPolicy templates for logging and tracing, ConfigMap dashboard generator, TrafficPolicy for prompt guard, and Grafana dashboards (kgateway-envoy, kgateway-operations). |
| MCP Kagent Chart | `gitops/argocd/charts/mcp/kagent/Chart.yaml`, `gitops/argocd/charts/mcp/kagent/README.md`, `gitops/argocd/charts/mcp/kagent/values.yaml`, `gitops/argocd/charts/mcp/kagent/values-talos-homelab.yaml` | Introduces Kagent (MCP server) Helm chart with comprehensive values for database, providers (Anthropic, Bedrock, Gemini, OpenAI), agents (k8s, kgateway, istio, promql, observability, etc.), and OpenTelemetry tracing configuration. |
| Kagent Credentials & Resources | `gitops/argocd/charts/mcp/kagent/templates/*{anthropic,bedrock,gemini,openai}-credentials.yaml`, `gitops/argocd/charts/mcp/kagent/templates/github-credentials.yaml`, `gitops/argocd/charts/mcp/kagent/templates/github-mcp.yaml` | Adds ExternalSecret templates for LLM provider credentials and GitHub PAT, plus MCPServer custom resource for GitHub MCP server deployment. |
| CRD Dependencies | `gitops/argocd/charts/crds/crds/Chart.yaml` | Adds kgateway-crds (v0.0.1) and kagent-crds (0.7.4) dependencies to the CRD umbrella chart. |
| OpenTelemetry Integration | `gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/logging-referencegrant.yaml`, `gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/tracing-referencegrant.yaml`, `gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml` | Adds ReferenceGrant templates for OpenTelemetry logging and tracing access control, and configures gateway.opentelemetry values for service discovery. |
| Networking Updates | `gitops/argocd/apps/networking/values-aws-staging.yaml`, `gitops/argocd/apps/networking/values-talos-homelab.yaml` | Updates targetRevision to HEAD for multiple apps, adds kgateway application, reassigns envoy-gateway to gateway-api namespace, and replaces traefik with kgateway in Talos homelab. |
| Stack Configuration | `gitops/argocd/stacks/values-talos-homelab-ai.yaml` | Adds new Helm values file for Talos homelab AI stack with portefaix-talos-homelab project and Slack notifications. |
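
For illustration, the retry and timeout behaviour described for the LLM provider routes could be expressed on a Gateway API HTTPRoute roughly as below. This is a sketch under assumptions, not the template from this PR: the resource names, namespace, parent Gateway name, and the `gateway.kgateway.dev` backend group are hypothetical or assumed, and the `retry` field belongs to the Gateway API experimental channel, so the PR may well configure retries through a kgateway policy instead.

```yaml
# Sketch only: names, namespace, and path are hypothetical, not taken from the PR.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: anthropic              # hypothetical route name
  namespace: gateway-api       # assumed namespace
spec:
  parentRefs:
    - name: ai-gateway         # hypothetical Gateway name
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /anthropic  # path prefix per provider
      timeouts:
        request: 20s           # 20-second timeout mentioned in the summary
      retry:
        attempts: 3            # 3 attempts mentioned in the summary (experimental field)
      backendRefs:
        - group: gateway.kgateway.dev  # assumed API group of the kgateway Backend kind
          kind: Backend
          name: anthropic              # hypothetical Backend name
```

The other three providers would presumably differ only in the path prefix and the backend name, which is why the path and backend reference are the fields worth comparing across the four route templates.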

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Gateway as KGateway<br/>(HTTP Listener)
    participant HTTPRoute as HTTPRoute<br/>(/anthropic)
    participant Backend as Backend<br/>(AI Type)
    participant LLM as Anthropic<br/>LLM Provider
    participant Telemetry as OpenTelemetry<br/>Collector

    Client->>Gateway: HTTP Request
    Gateway->>HTTPRoute: Route /{anthropic-name}
    HTTPRoute->>Backend: Forward to Backend
    Backend->>LLM: Call LLM API<br/>(with auth token)
    LLM-->>Backend: LLM Response
    Backend-->>HTTPRoute: Response
    HTTPRoute-->>Client: HTTP 200 + Result
    
    rect rgb(240, 248, 255)
    Gateway->>Telemetry: Send traces<br/>(OTLP gRPC)
    end
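
The trace export step in the diagram crosses a namespace boundary whenever the collector runs outside the gateway's namespace, which is presumably why the PR adds ReferenceGrant templates to the opentelemetry-collector chart. A minimal sketch of such a grant follows; the namespaces, the Service name, and the `gateway.kgateway.dev` group for HTTPListenerPolicy are assumptions, not values read from the PR.

```yaml
# Sketch only: namespaces and names are hypothetical.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: tracing                        # hypothetical
  namespace: opentelemetry             # assumed namespace of the collector Service
spec:
  from:
    - group: gateway.kgateway.dev      # assumed group of the referencing policy
      kind: HTTPListenerPolicy
      namespace: gateway-api           # assumed namespace of the gateway policies
  to:
    - group: ""                        # core API group, i.e. Service
      kind: Service
      name: opentelemetry-collector    # hypothetical Service name
```

A ReferenceGrant is created in the namespace of the resource being referenced (here the collector), not in the namespace of the resource that refers to it.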

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • High file count and heterogeneity: Multiple new Helm charts (kgateway, kagent) with diverse template patterns across LLM provider backends, credentials, routes, observability, and MCP server resources. Each template pair (backend/credentials/httproute) follows a pattern but requires verification of configuration correctness.
  • LLM provider implementations: Four backend/credential/route template triplets (4 providers × 3 templates = 12 files) with subtle configuration differences per provider (model, region, apiVersion, auth fields) requiring careful review for accuracy.
  • Infrastructure complexity: New CRD dependencies, OpenTelemetry ReferenceGrant integrations, Grafana dashboard JSON, and conditional template logic for feature flags and namespaces.
  • Potential issues to verify:
    • LLM provider credential mappings to ExternalSecret keys (e.g., ANTHROPIC_API_KEY vs Authorization mapping consistency across templates)
    • HTTPRoute path prefix patterns and backend service references correctness
    • Grafana dashboard JSON panel queries and datasource templating
    • OpenTelemetry trace endpoint configuration and service discovery
    • CRD version compatibility and dependency ordering
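
To make the first check in the list above concrete, here is a sketch of an ExternalSecret in which the key written into the Kubernetes Secret (`secretKey`) is distinct from the key read from the external store (`remoteRef.key`). The 1-hour refresh interval and the akeyless ClusterSecretStore come from the summary; the resource names, the store's name, the remote path, and the choice of `Authorization` as the target key are assumptions for illustration. The `apiVersion` also depends on the installed External Secrets Operator release.

```yaml
# Sketch only: secret names and the remote key path are hypothetical.
apiVersion: external-secrets.io/v1beta1   # newer operator releases serve external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: anthropic-credentials             # hypothetical
spec:
  refreshInterval: 1h                     # 1-hour refresh from the summary
  secretStoreRef:
    kind: ClusterSecretStore
    name: akeyless                        # assumed store name
  target:
    name: anthropic-credentials           # Secret the Backend would reference
  data:
    - secretKey: Authorization            # key the consumer reads (assumption)
      remoteRef:
        key: /hypothetical/path/ANTHROPIC_API_KEY   # placeholder remote path
```

Whatever key the Backend's auth configuration expects has to match `secretKey` here, so a mismatch between `ANTHROPIC_API_KEY` and `Authorization` across templates would surface as an authentication failure at request time rather than at deploy time.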

Possibly related PRs

Suggested labels

dependency/argo, dependency/helm

Poem

🐰 A rabbit hops through gateways new,
LLMs whisper wisdom true,
With traces glowing, dashboards bright,
AI agents dance through the night!
From Anthropic to Bedrock's call,
This infrastructure captures all. 🌟

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd6e69 and 5f8e45e.

📒 Files selected for processing (46)
  • gitops/argocd/apps/ai/Chart.yaml (1 hunks)
  • gitops/argocd/apps/ai/values-aws-staging.yaml (1 hunks)
  • gitops/argocd/apps/ai/values-talos-homelab.yaml (1 hunks)
  • gitops/argocd/apps/ai/values.yaml (1 hunks)
  • gitops/argocd/apps/networking/values-aws-staging.yaml (1 hunks)
  • gitops/argocd/apps/networking/values-talos-homelab.yaml (1 hunks)
  • gitops/argocd/charts/crds/crds/Chart.yaml (2 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/Chart.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/README.md (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/dashboards/networking/kgateway-envoy.json (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/dashboards/networking/kgateway-operations.json (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/agent-gateway-configmap.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/anthropic-backend.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/anthropic-credentials.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/anthropic-httproute.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/bedrock-backend.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/bedrock-credentials.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/bedrock-httproute.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/configmap-dashboards.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/gateway.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/gatewayparameters.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/gemini-backend.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/gemini-credentials.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/gemini-httproute.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/logging-httplistenerpolicy.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/openai-backend.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/openai-credentials.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/openai-httproute.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/templates/tracing-httplistenerpolicy.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/trafficpolicy.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/values-talos-homelab.yaml (1 hunks)
  • gitops/argocd/charts/gateway-api/kgateway/values.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/Chart.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/README.md (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/anthropic-credentials.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/bedrock-credentials.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/gemini-credentials.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/github-credentials.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/github-mcp.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/templates/openai-credentials.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/values-talos-homelab.yaml (1 hunks)
  • gitops/argocd/charts/mcp/kagent/values.yaml (1 hunks)
  • gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/logging-referencegrant.yaml (1 hunks)
  • gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/tracing-referencegrant.yaml (1 hunks)
  • gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (1 hunks)
  • gitops/argocd/stacks/values-talos-homelab-ai.yaml (1 hunks)

@github-actions github-actions bot added the size/xl Size XL label Nov 6, 2025
Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Nicolas Lamirault <[email protected]>
@nlamirault nlamirault changed the title AI Gateway AI on Kubernetes Nov 6, 2025
@nlamirault nlamirault marked this pull request as ready for review November 6, 2025 13:48
@nlamirault nlamirault merged commit 378cdcf into master Nov 6, 2025
10 checks passed
@nlamirault nlamirault deleted the feat/k8s-ai branch November 6, 2025 13:48
@dosubot dosubot bot added cloud/aws Cloud Provider / Amazon AWS cloud/homelab Cloud Provider / Homelab dependency/argo Dependency Argo priority/medium This issue or PR may be useful, and needs some attention labels Nov 6, 2025
Labels

  • area/kubernetes: Kubernetes
  • cloud/aws: Cloud Provider / Amazon AWS
  • cloud/homelab: Cloud Provider / Homelab
  • dependency/argo: Dependency Argo
  • kind/feature: Categorizes issue or PR as related to a new feature
  • lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
  • priority/high: After critical issues are fixed, these should be dealt with before any further issues
  • priority/medium: This issue or PR may be useful, and needs some attention
  • size/xl: Size XL
  • size/XXL
  • status/in_progress: This issue or PR is being worked on, and has someone assigned

2 participants