How to Deploy with Helm

This guide provides step-by-step instructions for deploying the Smart Traffic Intersection Agent application using Helm.

Prerequisites

Before you begin, ensure that you have the following prerequisites:

  • A Kubernetes cluster set up and running.
  • The cluster must support dynamic provisioning of Persistent Volumes (PV). Refer to the Kubernetes Dynamic Provisioning Guide for more details.
  • kubectl installed on your system (see the Installation Guide) and configured to access the Kubernetes cluster.
  • Helm installed on your system (see the Installation Guide).
  • A running Smart Intersection deployment (provides MQTT broker, camera pipelines, and scene analytics). See Step 4 below.
  • The SceneScape CA certificate file (scenescape-ca.pem) for TLS connections to the MQTT broker (created during the Smart Intersection installation).
  • (Optional) A Hugging Face API token if the VLM model requires authentication.
  • Storage Requirement: The VLM model cache PVC requests 20 GiB by default, and the traffic agent data PVC requests 1 GiB. Ensure the cluster has sufficient storage available.
  • (Optional — GPU inference) To run VLM inference on an Intel GPU:
    • An Intel integrated, Arc, or Data Center GPU must be available on at least one worker node.
    • The Intel GPU device plugin for Kubernetes must be installed so that GPU resources (e.g., gpu.intel.com/i915 or gpu.intel.com/xe) are advertised to the scheduler. Verify by running:
      kubectl describe node <gpu-node> | grep gpu.intel.com
    • The /dev/dri/renderD* device must be accessible inside containers. By default, the Helm chart adds the most common render group GIDs (44, 109, 992) to the pod's supplementalGroups; see Troubleshooting if your distribution uses a different GID.
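Before installing, a quick preflight check can catch a missing CLI or a cluster without a default StorageClass (which dynamic provisioning relies on when no storage class is set). The sketch below is illustrative only: check_prereqs and has_default_storageclass are hypothetical helper names, and the second one assumes kubectl marks the default class with "(default)" in kubectl get storageclass output.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of the chart).

# Returns 0 if every command name passed as an argument is on PATH;
# otherwise prints each missing one to stderr and returns 1.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Reads `kubectl get storageclass` output on stdin and succeeds if any
# StorageClass is marked "(default)". PVCs with an empty storageClass
# value fall back to that class.
has_default_storageclass() {
  grep -q '(default)'
}

# Usage:
#   check_prereqs kubectl helm &&
#     kubectl get storageclass | has_default_storageclass ||
#     echo "preflight failed" >&2
```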

Steps to Deploy with Helm

The following steps walk through deploying the Smart Traffic Intersection Agent application using Helm. You can install from source code or pull the chart from a registry.

Steps 1 to 3 differ depending on whether you pull the Helm chart from a registry or install it from source.

Option 1: Install from a Registry

Step 1: Pull the Chart

Use the following command to pull the Helm chart:

helm pull oci://registry-1.docker.io/intel/smart-traffic-intersection-agent --version 1.0.0-helm

Step 2: Extract the .tgz File

After pulling the chart, extract the .tgz file:

tar -xvf smart-traffic-intersection-agent-1.0.0-helm.tgz

Navigate to the extracted directory:

cd smart-traffic-intersection-agent
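Steps 1 and 2 can also be combined: helm pull accepts --untar, which unpacks the chart into a directory named after the chart in the current directory. The wrapper below is a sketch; pull_chart and the DRY_RUN switch are hypothetical conveniences, and the chart reference and version are copied from Step 1.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: pull and extract the chart in one command.
# Set DRY_RUN=1 to print the command instead of running it.
CHART_REF="oci://registry-1.docker.io/intel/smart-traffic-intersection-agent"
CHART_VERSION="1.0.0-helm"

pull_chart() {
  local cmd=(helm pull "$CHART_REF" --version "$CHART_VERSION" --untar)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   pull_chart && cd smart-traffic-intersection-agent
```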

Step 3: Configure the values.yaml File

Edit the values.yaml file to set the required configuration values. Refer to the values.yaml Reference section below.


Option 2: Install from Source

Step 1: Clone the Repository

Clone the repository containing the Helm chart:

# Clone the release branch
git clone https://github.com/open-edge-platform/edge-ai-suites.git -b release-2026.0.0

Step 2: Change to the Chart Directory

Navigate to the chart directory:

cd edge-ai-suites/metro-ai-suite/smart-traffic-intersection-agent/chart

Step 3: Configure the values.yaml File

Edit the values.yaml file in the chart directory to set the required configuration values. Refer to the values.yaml Reference section below.
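Instead of editing the chart's values.yaml in place, you can keep your settings in a separate override file and pass it with -f, which leaves the chart defaults untouched when you pull a newer chart. The helper below is a hypothetical sketch that writes only the intersection and MQTT keys from the values reference; extend it with any other keys you need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a minimal values override file.
# Arguments: output path, intersection name, latitude, longitude,
# and optionally the MQTT host (defaults to the chart default).
write_overrides() {
  local out="$1" name="$2" lat="$3" lon="$4"
  local mqtt_host="${5:-smart-intersection-broker}"
  cat > "$out" <<EOF
trafficAgent:
  intersection:
    name: "${name}"
    latitude: "${lat}"
    longitude: "${lon}"
  mqtt:
    host: "${mqtt_host}"
EOF
}

# Usage:
#   write_overrides values-override.yaml intersection_main_st 37.7749 -122.4194
#   helm install stia . -n smart-intersection -f values-override.yaml
```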


Common Steps After Configuration

Step 4: Deploy Smart Intersection

The Smart Traffic Intersection Agent depends on a running Smart Intersection deployment, which includes SceneScape. That deployment provides the MQTT broker, camera pipelines, and scene analytics that the Traffic Agent consumes.

Follow the Smart Intersection Helm Deployment Guide to deploy it. Once all Smart Intersection pods are running and the MQTT broker is reachable, proceed to the next step.
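Before moving on, you can confirm that the pieces the Traffic Agent relies on exist. This sketch assumes the default Smart Intersection namespace (smart-intersection) and the default broker service and CA secret names from the values reference; check_smart_intersection and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check of the Smart Intersection dependencies.
# Assumes the default namespace, broker service, and CA secret names.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

check_smart_intersection() {
  local ns="${1:-smart-intersection}"
  run kubectl get pods -n "$ns" &&
    run kubectl get svc smart-intersection-broker -n "$ns" &&
    run kubectl get secret smart-intersection-broker-rootcert -n "$ns"
}

# Usage: review the pod list; the service or secret lookup fails with
# NotFound if the names in your deployment differ.
#   check_smart_intersection
```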

Step 5: Configure GPU Support (Optional)

By default, the chart deploys VLM inference on an Intel GPU. To change or verify the GPU configuration, edit the following values in values.yaml:

| Key | Description | Default |
|---|---|---|
| vlmServing.gpu.enabled | Enable Intel GPU for VLM inference. When true, VLM_DEVICE is automatically set to GPU and workers are forced to 1. | true |
| vlmServing.gpu.resourceName | Kubernetes GPU resource name exposed by the Intel device plugin. Use gpu.intel.com/i915 for integrated/Arc GPUs, gpu.intel.com/xe for Data Center GPU Flex/Max. | gpu.intel.com/i915 |
| vlmServing.gpu.resourceLimit | Number of GPU devices to request | 1 |
| vlmServing.gpu.renderGroupIds | List of render group GIDs for /dev/dri access. Defaults cover all common distros. | [44, 109, 992] |
| vlmServing.nodeSelector | Pin VLM pod to nodes with GPUs (e.g., intel.feature.node.kubernetes.io/gpu: "true") | {} |

Identify your cluster's GPU resource key by running:

kubectl describe node <gpu-node> | grep gpu.intel.com
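To script that lookup, the sketch below extracts the first gpu.intel.com/i915 or gpu.intel.com/xe key from the describe output so you can pass it to vlmServing.gpu.resourceName. It assumes the device plugin advertises one of those two resource names, as listed in the table above; detect_gpu_resource is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl describe node` output on stdin and prints
# the first gpu.intel.com/i915 or gpu.intel.com/xe resource key it finds.
# Returns 1 when neither resource is advertised.
detect_gpu_resource() {
  local key
  key="$(grep -oE 'gpu\.intel\.com/(i915|xe)' | head -n 1)"
  [ -n "$key" ] || return 1
  echo "$key"
}

# Usage:
#   key="$(kubectl describe node <gpu-node> | detect_gpu_resource)" &&
#     helm install stia . -n <your-namespace> --set vlmServing.gpu.resourceName="$key"
```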

To deploy on CPU instead, set:

helm install stia . -n <your-namespace> --create-namespace \
  --set vlmServing.gpu.enabled=false

Note: The OV_CONFIG environment variable is automatically set based on the device. When GPU is enabled, CPU-only options like INFERENCE_NUM_THREADS are excluded to avoid runtime errors.
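The device-dependent settings described above can be summarized as a small decision function. This is an illustrative approximation of the documented behavior, not the chart's actual template; the JSON strings are the ovConfigCpu and ovConfigGpu defaults from the values reference, and resolve_vlm_env is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustrative approximation of the documented device selection; this is NOT
# the chart's template. JSON strings are the chart's ovConfigCpu/ovConfigGpu defaults.
OV_CONFIG_CPU='{"PERFORMANCE_HINT": "LATENCY", "INFERENCE_NUM_THREADS": 32}'
OV_CONFIG_GPU='{"PERFORMANCE_HINT": "LATENCY", "CACHE_DIR": "/app/ov-model/gpu-cache"}'

# Arguments: gpu.enabled (true/false), env.device, env.workers.
# Prints the resulting VLM_DEVICE, WORKERS, and OV_CONFIG values.
resolve_vlm_env() {
  local gpu_enabled="$1" device="${2:-CPU}" workers="${3:-1}"
  if [ "$gpu_enabled" = "true" ]; then
    # GPU mode: device and worker count are forced; CPU-only options are excluded.
    printf 'VLM_DEVICE=GPU\nWORKERS=1\nOV_CONFIG=%s\n' "$OV_CONFIG_GPU"
  else
    # GPU disabled: env.device and env.workers are used as configured.
    printf 'VLM_DEVICE=%s\nWORKERS=%s\nOV_CONFIG=%s\n' "$device" "$workers" "$OV_CONFIG_CPU"
  fi
}
```

To compare with a running pod, kubectl exec <vlm-pod-name> -n <your-namespace> -- env | grep -E 'VLM_DEVICE|OV_CONFIG' shows the values the container actually received.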

Step 6: Deploy the Helm Chart

Deploy the Smart Traffic Intersection Agent Helm chart:

helm install stia . -n <your-namespace> --create-namespace

Note: Use the same namespace as the Smart Intersection application. The default namespace for Smart Intersection is smart-intersection.

Note: The VLM OpenVINO Serving pod downloads and converts the model on first startup. This may take several minutes depending on network speed and model size. To avoid re-downloading the model on every install cycle, keep vlmServing.persistence.keepOnUninstall set to true (the default), which tells Helm to retain the model cache PVC on uninstall.
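If you expect to re-run the deployment (for example, after editing values), helm upgrade --install is an idempotent alternative to helm install: it installs the release if it does not exist and upgrades it otherwise. The wrapper below is a sketch; deploy_stia and DRY_RUN are hypothetical, and the namespace defaults to smart-intersection per the note above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: install or upgrade the "stia" release.
# Arguments: namespace (default smart-intersection), then any extra helm flags.
# Set DRY_RUN=1 to print the command instead of running it.
deploy_stia() {
  local ns="${1:-smart-intersection}"
  [ "$#" -gt 0 ] && shift
  local cmd=(helm upgrade --install stia . -n "$ns" --create-namespace
             --set vlmServing.persistence.keepOnUninstall=true "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (run from the chart directory):
#   deploy_stia smart-intersection -f values-override.yaml
```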

Step 7: Verify the Deployment

Check the status of the deployed resources to ensure everything is running correctly:

kubectl get pods -n <your-namespace>
kubectl get services -n <your-namespace>

You should see two pods:

| Pod | Description |
|---|---|
| stia-traffic-agent-* | The traffic intersection agent (backend + Gradio UI) |
| stia-vlm-openvino-serving-* | The VLM inference server |

Wait until both pods show Running and READY 1/1:

kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance=stia -n <your-namespace> --timeout=600s
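In CI scripts you may prefer a yes/no check over a blocking wait. The sketch below reads kubectl get pods --no-headers output and succeeds only when every pod is Running with all containers ready; all_pods_ready is a hypothetical helper, and it assumes the default column order (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical readiness check over `kubectl get pods --no-headers` output (stdin).
# Succeeds only if at least one pod is listed and every pod has STATUS Running
# and READY x/x (all containers ready).
all_pods_ready() {
  awk '
    { n++; split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad++ }
    END { exit (n == 0 || bad > 0) }
  '
}

# Usage:
#   kubectl get pods -n <your-namespace> -l app.kubernetes.io/instance=stia --no-headers |
#     all_pods_ready && echo "all stia pods ready"
```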

Step 8: Access the Application

Using NodePort (default)

The chart deploys services as NodePort by default. Retrieve the allocated ports and a node IP:

# Get the NodePort values
kubectl get svc stia-traffic-agent -n <your-namespace>

# Get the node IP
kubectl get nodes -o wide
# Use the INTERNAL-IP of any node

Then open your browser at:

http://<node-ip>:<backend-node-port>   # Backend API (default NodePort: 30881)
http://<node-ip>:<ui-node-port>         # Gradio UI   (default NodePort: 30860)

Note: If you are behind a corporate proxy, make sure the node IPs are included in your no_proxy / browser proxy exceptions.
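Rather than reading the ports from the table output, you can query them with kubectl's JSONPath output and assemble the URLs. The sketch assumes the service exposes ports 8081 and 7860 (the chart defaults) and that the first node's InternalIP is reachable from your browser; stia_urls is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the access URLs for a node IP and the two NodePorts.
# Port defaults are the chart's NodePort defaults (30881 backend, 30860 UI).
stia_urls() {
  local ip="$1" backend="${2:-30881}" ui="${3:-30860}"
  echo "Backend API: http://${ip}:${backend}/docs"
  echo "Gradio UI:   http://${ip}:${ui}"
}

# Usage:
#   ns=<your-namespace>
#   ip="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
#   backend="$(kubectl get svc stia-traffic-agent -n "$ns" -o jsonpath='{.spec.ports[?(@.port==8081)].nodePort}')"
#   ui="$(kubectl get svc stia-traffic-agent -n "$ns" -o jsonpath='{.spec.ports[?(@.port==7860)].nodePort}')"
#   stia_urls "$ip" "$backend" "$ui"
```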

Using Port-Forward (ClusterIP)

If you set trafficAgent.service.type to ClusterIP in values.yaml, forward the ports to your machine:

# Traffic Agent Backend API
kubectl port-forward svc/stia-traffic-agent 8081:8081 -n <your-namespace> &

# Traffic Agent Gradio UI
kubectl port-forward svc/stia-traffic-agent 7860:7860 -n <your-namespace> &

Then open your browser at:

  • Backend API: http://127.0.0.1:8081/docs
  • Gradio UI: http://127.0.0.1:7860
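kubectl port-forward accepts several port pairs, so both ports can be forwarded by a single foreground process instead of two background jobs, which is easier to stop. The wrapper is a sketch; forward_stia and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: forwards backend (8081) and UI (7860) with one process.
# Runs in the foreground; press Ctrl+C to stop. DRY_RUN=1 prints the command.
forward_stia() {
  local ns="$1"
  local cmd=(kubectl port-forward svc/stia-traffic-agent 8081:8081 7860:7860 -n "$ns")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   forward_stia <your-namespace>
```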

Step 9: Uninstall the Helm Chart

To uninstall the deployed Helm chart:

helm uninstall stia -n <your-namespace>

Note: When vlmServing.persistence.keepOnUninstall is true (the default), the VLM model cache PVC is retained after uninstall to avoid re-downloading the model. This is recommended during development and testing. To fully clean up all PVCs:

kubectl get pvc -n <your-namespace>
kubectl delete pvc <pvc-name> -n <your-namespace>

To have Helm delete the PVC automatically on uninstall, set vlmServing.persistence.keepOnUninstall=false before deploying.
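To find what is left behind after uninstall, kubectl get pvc -o name lists claims as persistentvolumeclaim/<name>, which can be passed straight back to kubectl delete. The filter below assumes the chart prefixes its PVC names with the release name (stia-); that prefix is an assumption, so review the list before deleting anything. stia_pvcs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keeps lines of `kubectl get pvc -o name` output (stdin)
# whose claim name starts with "stia-". The prefix is an ASSUMPTION about how
# the chart names its PVCs; verify with `kubectl get pvc` first.
stia_pvcs() {
  grep '^persistentvolumeclaim/stia-' || true
}

# Usage:
#   ns=<your-namespace>
#   pvcs="$(kubectl get pvc -n "$ns" -o name | stia_pvcs)"
#   echo "$pvcs"   # review before deleting
#   # $pvcs is intentionally unquoted so each name becomes a separate argument.
#   [ -n "$pvcs" ] && kubectl delete -n "$ns" $pvcs
```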


values.yaml Reference

Global Settings

| Key | Description | Default |
|---|---|---|
| global.proxy.httpProxy | HTTP proxy URL | "" |
| global.proxy.httpsProxy | HTTPS proxy URL | "" |
| global.proxy.noProxy | Comma-separated no-proxy list | "" |

Traffic Agent Settings

| Key | Description | Default |
|---|---|---|
| trafficAgent.image.repository | Traffic agent container image repository | intel/smart-traffic-intersection-agent |
| trafficAgent.image.tag | Image tag | 1.0.0 |
| trafficAgent.service.type | Kubernetes service type (NodePort or ClusterIP) | NodePort |
| trafficAgent.service.backendPort | Backend API port | 8081 |
| trafficAgent.service.backendNodePort | NodePort for backend API (only used when type is NodePort) | 30881 |
| trafficAgent.service.uiPort | Gradio UI port | 7860 |
| trafficAgent.service.uiNodePort | NodePort for Gradio UI (only used when type is NodePort) | 30860 |
| trafficAgent.intersection.name | Unique intersection identifier | intersection_1 |
| trafficAgent.intersection.latitude | Intersection latitude | 37.51358 |
| trafficAgent.intersection.longitude | Intersection longitude | -122.25591 |
| trafficAgent.env.logLevel | Application log level | INFO |
| trafficAgent.env.refreshInterval | Dashboard refresh interval (seconds) | 15 |
| trafficAgent.env.weatherMock | Use mock weather data (true/false) | false |
| trafficAgent.env.vlmTimeoutSeconds | Timeout for VLM inference requests (seconds) | 600 |
| trafficAgent.mqtt.host | MQTT broker hostname (SceneScape K8s service name) | smart-intersection-broker |
| trafficAgent.mqtt.port | MQTT broker port | 1883 |
| trafficAgent.traffic.highDensityThreshold | Object count for high-density classification | 10 |
| trafficAgent.traffic.moderateDensityThreshold | Object count for moderate-density classification | "" |
| trafficAgent.traffic.bufferDuration | Traffic analysis buffer window | "" |
| trafficAgent.persistence.enabled | Enable persistent storage for agent data | true |
| trafficAgent.persistence.size | PVC size for agent data | 1Gi |
| trafficAgent.persistence.storageClass | Storage class (empty = cluster default) | "" |

VLM OpenVINO Serving Settings

Key Description Default
vlmServing.image.repository VLM serving container image repository intel/vlm-openvino-serving
vlmServing.image.tag Image tag 1.3.2
vlmServing.service.type Kubernetes service type (NodePort or ClusterIP) NodePort
vlmServing.service.port VLM HTTP API port 8000
vlmServing.service.nodePort NodePort for VLM API (only used when type is NodePort) 30800
vlmServing.env.modelName Hugging Face model identifier microsoft/Phi-3.5-vision-instruct
vlmServing.env.compressionWeightFormat Model weight format (int4, int8, fp16) int4
vlmServing.env.device OpenVINO inference device when GPU is disabled (CPU or GPU). Ignored when vlmServing.gpu.enabled=true (auto-set to GPU). CPU
vlmServing.env.maxCompletionTokens Max tokens per completion 1500
vlmServing.env.workers Number of serving workers. Forced to 1 when GPU is enabled. 1
vlmServing.env.logLevel VLM serving log level info
vlmServing.env.openvinoLogLevel OpenVINO runtime log level 1
vlmServing.env.accessLogFile Access log file path (/dev/null to suppress) /dev/null
vlmServing.env.seed Random seed for reproducible inference 42
vlmServing.env.ovConfigCpu OpenVINO config JSON for CPU mode (supports INFERENCE_NUM_THREADS) {"PERFORMANCE_HINT": "LATENCY", "INFERENCE_NUM_THREADS": 32}
vlmServing.env.ovConfigGpu OpenVINO config JSON for GPU mode (includes GPU model cache) {"PERFORMANCE_HINT": "LATENCY", "CACHE_DIR": "/app/ov-model/gpu-cache"}
vlmServing.huggingfaceToken Hugging Face API token (stored as a Secret) ""
vlmServing.gpu.enabled Enable Intel GPU for VLM inference. Auto-sets VLM_DEVICE=GPU and WORKERS=1. true
vlmServing.gpu.resourceName Kubernetes GPU resource name exposed by the Intel device plugin (gpu.intel.com/i915 or gpu.intel.com/xe) gpu.intel.com/i915
vlmServing.gpu.resourceLimit Number of GPU devices to request 1
vlmServing.gpu.renderGroupIds List of GIDs for the render group added to supplementalGroups for /dev/dri access. All common distro values are included by default (44, 109, 992). [44, 109, 992]
vlmServing.nodeSelector Pin VLM pod to GPU nodes (e.g., intel.feature.node.kubernetes.io/gpu: "true") {}
vlmServing.persistence.enabled Enable persistent storage for model cache true
vlmServing.persistence.size PVC size for model cache 20Gi
vlmServing.persistence.storageClass Storage class (empty = cluster default) ""
vlmServing.persistence.keepOnUninstall Retain PVC on helm uninstall to avoid re-downloading the model true

TLS / Secrets Settings

| Key | Description | Default |
|---|---|---|
| tls.caCert | PEM-encoded CA certificate for the MQTT broker (base64-encoded in the Secret) | "" |
| tls.caCertSecretName | Name of an existing Secret containing the CA cert (overrides tls.caCert) | smart-intersection-broker-rootcert |
| tls.caCertKey | Key name inside the external secret (required when caCertSecretName is set) | root-cert |
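To supply the CA certificate from the scenescape-ca.pem file listed in the prerequisites rather than from an existing secret, Helm's --set-file flag can load the file contents into tls.caCert. Because tls.caCertSecretName overrides tls.caCert, the sketch clears it; this assumes an empty secret name makes the chart fall back to tls.caCert. install_with_ca, is_pem_cert, and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: install with the CA loaded from a PEM file.
# ASSUMPTION: an empty tls.caCertSecretName makes the chart use tls.caCert.
# Set DRY_RUN=1 to print the helm command instead of running it.

# Succeeds if the file exists and contains a PEM certificate header.
is_pem_cert() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}

install_with_ca() {
  local ns="$1" ca_file="$2"
  if ! is_pem_cert "$ca_file"; then
    echo "not a PEM certificate: $ca_file" >&2
    return 1
  fi
  # --set-file stores the file's contents (not its path) in tls.caCert.
  local cmd=(helm install stia . -n "$ns" --create-namespace
             --set-file tls.caCert="$ca_file" --set tls.caCertSecretName="")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (run from the chart directory):
#   install_with_ca smart-intersection ./scenescape-ca.pem
```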

Example: Minimal Deployment

# values-override.yaml
global:
  proxy:
    httpProxy: "http://proxy.example.com:8080"
    httpsProxy: "http://proxy.example.com:8080"
    noProxy: "localhost,127.0.0.1,10.0.0.0/8,.example.com"

trafficAgent:
  intersection:
    name: "intersection_main_st"
    latitude: "37.7749"
    longitude: "-122.4194"
  mqtt:
    host: "smart-intersection-broker"

tls:
  caCert: |
    -----BEGIN CERTIFICATE-----
    MIIDxTCCA...
    -----END CERTIFICATE-----

helm install stia . -n smart-intersection -f values-override.yaml --create-namespace

Example: GPU Deployment

To deploy VLM inference on an Intel GPU (the default), ensure vlmServing.gpu.enabled is true and the GPU resource name matches your cluster:

# values-gpu-override.yaml
vlmServing:
  gpu:
    enabled: true
    # Use "gpu.intel.com/i915" for integrated / Arc A-series
    # Use "gpu.intel.com/xe" for Data Center GPU Flex / Max
    resourceName: "gpu.intel.com/i915"
    resourceLimit: 1
    # Common render group GIDs, included by default to cover most distributions
    renderGroupIds:
      - 44
      - 109
      - 992
  # Optional: pin to GPU nodes
  nodeSelector:
    intel.feature.node.kubernetes.io/gpu: "true"
  persistence:
    keepOnUninstall: true

helm install stia . -n smart-intersection -f values-override.yaml -f values-gpu-override.yaml --create-namespace
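If none of the default GIDs match your GPU node, getent group render on that node prints a line such as render:x:<gid>:, and the GID can be turned into the JSON list that helm --set-json expects. The sketch assumes you can run getent on the node (for example, over SSH); render_gid_json is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a `getent group render` line (stdin), such as
# "render:x:110:", into a JSON list like [110] for --set-json.
# Returns 1 if no render group line is found.
render_gid_json() {
  local gid
  gid="$(awk -F: '$1 == "render" { print $3; exit }')"
  [ -n "$gid" ] || return 1
  printf '[%s]\n' "$gid"
}

# Usage (getent must run on the GPU node, e.g. over SSH):
#   ids="$(ssh <gpu-node> getent group render | render_gid_json)" &&
#     helm upgrade --install stia . -n smart-intersection -f values-override.yaml \
#       --set-json "vlmServing.gpu.renderGroupIds=$ids"
```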

Example: CPU-Only Deployment

To run VLM inference on CPU:

helm install stia . -n smart-intersection -f values-override.yaml \
  --set vlmServing.gpu.enabled=false \
  --create-namespace
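To confirm that the CPU override reached the VLM container, you can inspect its environment for VLM_DEVICE. The sketch assumes the Deployment is named stia-vlm-openvino-serving (inferred from the pod name prefix in Step 7); vlm_device is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the VLM_DEVICE value from `env` output (stdin).
# Prints nothing if the variable is not set.
vlm_device() {
  sed -n 's/^VLM_DEVICE=//p' | head -n 1
}

# Usage (Deployment name inferred from the pod name prefix; an assumption):
#   kubectl exec deploy/stia-vlm-openvino-serving -n smart-intersection -- env | vlm_device
#   # With vlmServing.gpu.enabled=false this should print the vlmServing.env.device value (CPU by default).
```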

Verification

  • Ensure that all pods are running and the services are accessible.

  • Access the Gradio UI and verify that it shows the traffic intersection dashboard.

  • Check the backend API at /docs for the interactive Swagger documentation.

  • Verify that the traffic agent is receiving MQTT messages from SceneScape by checking the logs:

    kubectl logs -l app=stia-traffic-agent -n <your-namespace> -f

Troubleshooting

  • If you encounter any issues during the deployment process, check the Kubernetes logs for errors:

    kubectl logs <pod-name> -n <your-namespace>
  • VLM pod stuck in CrashLoopBackOff: The model download may have failed. Check the logs and verify the proxy settings (global.proxy.httpProxy / global.proxy.httpsProxy) and vlmServing.huggingfaceToken if the model requires authentication.

  • VLM model download stuck or not progressing: Verify that proxy environment variables are correctly set inside the pod. A common cause is a mismatch between values.yaml key names and the template references (e.g., http_proxy vs httpProxy). Check with:

    kubectl exec <vlm-pod-name> -n <your-namespace> -- env | grep -i proxy
  • Option not found: INFERENCE_NUM_THREADS error on GPU: This occurs when the OV_CONFIG contains CPU-only options while running on GPU. Ensure vlmServing.env.ovConfigGpu does not include INFERENCE_NUM_THREADS. The chart automatically selects the correct config (ovConfigCpu or ovConfigGpu) based on vlmServing.gpu.enabled.

  • GPU not detected / VLM pod Pending: Verify the Intel GPU device plugin is installed and the GPU resource is available:

    kubectl describe node <gpu-node> | grep gpu.intel.com

    If no GPU resource is listed, install the Intel GPU device plugin for Kubernetes. Also verify that vlmServing.gpu.resourceName matches the resource key reported by the device plugin (gpu.intel.com/i915 for integrated/Arc, gpu.intel.com/xe for Data Center GPUs).

  • GPU permission denied (/dev/dri access): The chart includes all common render group GIDs (44, 109, 992) by default. If your distro uses a different GID, find it with getent group render on the node and override:

    helm install stia . -n <your-namespace> --set-json 'vlmServing.gpu.renderGroupIds=[<your-gid>]'
  • Traffic agent cannot connect to MQTT broker: Verify that the SceneScape deployment is reachable from the cluster, the trafficAgent.mqtt.host value is correct, and the CA certificate is provided via tls.caCert or tls.caCertSecretName.

  • PVC not cleaned up after uninstall: When vlmServing.persistence.keepOnUninstall is true (the default), the model cache PVC is intentionally retained. To reclaim storage, delete it manually:

    # List the PVCs present in the given namespace
    kubectl get pvc -n <your-namespace>
    
    # Delete the required PVC from the namespace
    kubectl delete pvc <pvc-name> -n <your-namespace>
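When reporting an issue, it helps to capture the same diagnostics in one place. The sketch below writes each command's output to a numbered file; it assumes the chart's pods carry the app.kubernetes.io/instance=stia label used in Step 7, and diag_commands and collect_diagnostics are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostics bundle for the "stia" release.

# Prints one kubectl command per line for namespace $1.
diag_commands() {
  local ns="$1"
  cat <<EOF
kubectl get pods,svc,pvc -n $ns -o wide
kubectl describe pods -n $ns -l app.kubernetes.io/instance=stia
kubectl get events -n $ns --sort-by=.lastTimestamp
kubectl logs -n $ns -l app.kubernetes.io/instance=stia --all-containers --prefix --tail=-1
EOF
}

# Runs each command and saves its stdout and stderr to <out-dir>/<n>.txt,
# preceded by the command itself.
collect_diagnostics() {
  local ns="$1" out="${2:-stia-diagnostics}" i=0 cmd
  mkdir -p "$out"
  while IFS= read -r cmd; do
    i=$((i + 1))
    { echo "\$ $cmd"; bash -c "$cmd" </dev/null 2>&1; } > "$out/$i.txt"
  done < <(diag_commands "$ns")
}

# Usage:
#   collect_diagnostics <your-namespace> && ls stia-diagnostics
```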

Related Links