diff --git a/config/charts/inferencepool/Chart.yaml b/config/charts/inferencepool/Chart.yaml
index f98153c50..b412e37ce 100644
--- a/config/charts/inferencepool/Chart.yaml
+++ b/config/charts/inferencepool/Chart.yaml
@@ -7,3 +7,9 @@
 type: application
 version: 0.0.0
 appVersion: "0.0.0"
+
+dependencies:
+  - name: jaeger
+    version: "2.11.0"
+    repository: "https://jaegertracing.github.io/helm-charts"
+    condition: jaeger.enabled
diff --git a/config/charts/inferencepool/README.md b/config/charts/inferencepool/README.md
index 9fbbfb9bf..366da4e36 100644
--- a/config/charts/inferencepool/README.md
+++ b/config/charts/inferencepool/README.md
@@ -237,6 +237,93 @@ inferenceExtension:
 
 Make sure that the `otelExporterEndpoint` points to your OpenTelemetry collector endpoint. Current only the `parentbased_traceidratio` sampler is supported. You can adjust the base sampling ratio using the `samplerArg` (e.g., 0.1 means 10% of traces will be sampled).
+
+#### Jaeger Tracing Backend
+
+GAIE provides an opt-in Jaeger all-in-one deployment as a sub-chart for easy trace collection and visualization. This is particularly useful for development, testing, and understanding how inference requests are processed (filtered, scored) and forwarded to vLLM models.
+
+**Quick Start with Jaeger:**
+
+To install the InferencePool with Jaeger tracing enabled:
+
+```bash
+# Update Helm dependencies to fetch Jaeger chart
+helm dependency update ./config/charts/inferencepool
+
+# Install with Jaeger enabled
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set inferenceExtension.tracing.enabled=true \
+  --set jaeger.enabled=true
+```
+
+Or using a `values.yaml` file:
+
+```yaml
+inferenceExtension:
+  tracing:
+    enabled: true
+    sampling:
+      sampler: "parentbased_traceidratio"
+      samplerArg: "1.0"  # 100% sampling for development
+
+jaeger:
+  enabled: true
+```
+
+Then install:
+
+```bash
+helm dependency update ./config/charts/inferencepool
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
+```
+
+**Accessing Jaeger UI:**
+
+Once deployed, you can access the Jaeger UI to visualize traces:
+
+```bash
+# Port-forward to access Jaeger UI
+kubectl port-forward svc/vllm-llama3-8b-instruct-jaeger-query 16686:16686
+
+# Open browser to http://localhost:16686
+```
+
+In the Jaeger UI, you can:
+
+- Search for traces by service name (`gateway-api-inference-extension`)
+- View detailed span information showing filter and scorer execution
+- Analyze request routing decisions and latency
+- Understand the complete inference request flow
+
+**Configuration Options:**
+
+The Jaeger sub-chart supports the following configuration:
+
+| **Parameter Name**                        | **Description**                                                | **Default**                |
+|-------------------------------------------|----------------------------------------------------------------|----------------------------|
+| `jaeger.enabled`                          | Enable Jaeger all-in-one deployment                            | `false`                    |
+| `jaeger.allInOne.enabled`                 | Enable all-in-one deployment mode                              | `true`                     |
+| `jaeger.allInOne.image.repository`        | Jaeger all-in-one image repository                             | `jaegertracing/all-in-one` |
+| `jaeger.allInOne.image.tag`               | Jaeger image tag                                               | `1.62`                     |
+| `jaeger.allInOne.resources.limits`        | Resource limits for Jaeger pod                                 | `cpu: 500m, memory: 512Mi` |
+| `jaeger.allInOne.resources.requests`      | Resource requests for Jaeger pod                               | `cpu: 100m, memory: 128Mi` |
+| `jaeger.query.service.type`               | Jaeger UI service type                                         | `ClusterIP`                |
+| `jaeger.query.service.port`               | Jaeger UI port                                                 | `16686`                    |
+| `jaeger.collector.service.otlp.grpc.port` | OTLP gRPC collector port                                       | `4317`                     |
+| `jaeger.storage.type`                     | Storage backend type (memory, elasticsearch, cassandra, etc.)  | `memory`                   |
+
+**Important Notes:**
+
+1. **Development vs Production**: The all-in-one deployment uses in-memory storage and is suitable for development and testing. For production use, consider:
+   - Using a persistent storage backend (Elasticsearch, Cassandra, etc.)
+   - Deploying Jaeger components separately for better scalability
+   - Refer to [Jaeger Production Deployment](https://www.jaegertracing.io/docs/latest/deployment/) for best practices
+
+2. **Automatic Configuration**: When `jaeger.enabled=true`, the OTLP exporter endpoint is automatically configured to point to the Jaeger collector. You don't need to manually set `inferenceExtension.tracing.otelExporterEndpoint`.
+
+3. **Sampling Rate**: For development, you may want to set `samplerArg: "1.0"` to capture all traces. For production, use a lower value like `"0.1"` (10%) to reduce overhead.
+
+4. **Resource Requirements**: Adjust the resource limits based on your trace volume and cluster capacity.
+
 ## Notes
 
 This chart will only deploy an InferencePool and its corresponding EndpointPicker extension. Before install the chart, please make sure that the inference extension CRDs are installed in the cluster. For more details, please refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
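The sampling guidance in note 3 can be sanity-checked with quick arithmetic; a minimal sketch (the trace counts are illustrative, and `parentbased_traceidratio` keeps roughly this fraction of root traces, not an exact count):

```bash
# Roughly how many of 1000 root traces survive a given samplerArg ratio
for arg in 1.0 0.1; do
  awk -v n=1000 -v r="$arg" \
    'BEGIN { printf "samplerArg=%s -> ~%d of %d traces\n", r, n * r, n }'
done
# samplerArg=1.0 -> ~1000 of 1000 traces
# samplerArg=0.1 -> ~100 of 1000 traces
```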
diff --git a/config/charts/inferencepool/templates/epp-deployment.yaml b/config/charts/inferencepool/templates/epp-deployment.yaml
index 10eb2907a..73566b8d0 100644
--- a/config/charts/inferencepool/templates/epp-deployment.yaml
+++ b/config/charts/inferencepool/templates/epp-deployment.yaml
@@ -114,7 +114,11 @@ spec:
         - name: OTEL_SERVICE_NAME
           value: "gateway-api-inference-extension"
         - name: OTEL_EXPORTER_OTLP_ENDPOINT
+          {{- if .Values.jaeger.enabled }}
+          value: "http://{{ .Release.Name }}-jaeger-collector:4317"
+          {{- else }}
           value: {{ .Values.inferenceExtension.tracing.otelExporterEndpoint | quote }}
+          {{- end }}
         - name: OTEL_TRACES_EXPORTER
           value: "otlp"
         - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
diff --git a/config/charts/inferencepool/values.yaml b/config/charts/inferencepool/values.yaml
index 8b3385ab1..49b1688f1 100644
--- a/config/charts/inferencepool/values.yaml
+++ b/config/charts/inferencepool/values.yaml
@@ -58,6 +58,8 @@ inferenceExtension:
     enabled: false
   tracing:
     enabled: false
+    # When jaeger.enabled is true, this will automatically point to the Jaeger collector
+    # Otherwise, you can specify your own OpenTelemetry collector endpoint
    otelExporterEndpoint: "http://localhost:4317"
     sampling:
       sampler: "parentbased_traceidratio"
@@ -94,4 +96,43 @@ istio:
   trafficPolicy: {}
   # connectionPool:
   #   http:
-  #     maxRequestsPerConnection: 256000
\ No newline at end of file
+  #     maxRequestsPerConnection: 256000
+
+# Jaeger tracing backend configuration
+# When enabled, deploys Jaeger all-in-one for trace collection and visualization
+jaeger:
+  enabled: false
+  # Use the all-in-one deployment mode for simplicity
+  # For production, consider using a more robust deployment with separate components
+  allInOne:
+    enabled: true
+    image:
+      repository: jaegertracing/all-in-one
+      tag: "1.62"
+      pullPolicy: IfNotPresent
+    resources:
+      limits:
+        cpu: 500m
+        memory: 512Mi
+      requests:
+        cpu: 100m
+        memory: 128Mi
+  # Expose Jaeger UI service
+  query:
+    service:
+      type: ClusterIP
+      port: 16686
+  # Collector configuration for OTLP
+  collector:
+    service:
+      otlp:
+        grpc:
+          port: 4317
+        http:
+          port: 4318
+  # Storage configuration - use in-memory for simplicity
+  storage:
+    type: memory
+  # Agent configuration
+  agent:
+    enabled: false
\ No newline at end of file
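The template change above derives the collector address from the Helm release name rather than from `otelExporterEndpoint`. A minimal sketch of the endpoint string that results (assuming the release name used in the README's examples; the port matches the chart's `jaeger.collector.service.otlp.grpc.port` default):

```bash
# The epp-deployment template renders the OTLP endpoint as
# http://<release-name>-jaeger-collector:<otlp-grpc-port>
RELEASE_NAME=vllm-llama3-8b-instruct
OTLP_GRPC_PORT=4317
echo "http://${RELEASE_NAME}-jaeger-collector:${OTLP_GRPC_PORT}"
# http://vllm-llama3-8b-instruct-jaeger-collector:4317
```

This is the value you should see in the EPP pod's `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable when `jaeger.enabled=true`.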