Commit 7886d5c

docs: Add opentelemetry docs (feast-dev#5048)
1 parent 9265cfc commit 7886d5c

3 files changed: +154 -0 lines changed


docs/SUMMARY.md

+1
@@ -37,6 +37,7 @@
 * [Batch Materialization Engine](getting-started/components/batch-materialization-engine.md)
 * [Provider](getting-started/components/provider.md)
 * [Authorization Manager](getting-started/components/authz_manager.md)
+* [OpenTelemetry Integration](getting-started/components/open-telemetry.md)
 * [Third party integrations](getting-started/third-party-integrations.md)
 * [FAQ](getting-started/faq.md)

docs/getting-started/components/README.md

+4
@@ -27,3 +27,7 @@
 {% content-ref url="authz_manager.md" %}
 [authz_manager.md](authz_manager.md)
 {% endcontent-ref %}
+
+{% content-ref url="open-telemetry.md" %}
+[open-telemetry.md](open-telemetry.md)
+{% endcontent-ref %}
docs/getting-started/components/open-telemetry.md

+149
@@ -0,0 +1,149 @@
# OpenTelemetry Integration

The OpenTelemetry integration in Feast provides monitoring and observability for your feature serving infrastructure. It lets you collect metrics, traces, and logs from your Feast deployment.

## Motivation

Monitoring and observability are critical for production machine learning systems. The OpenTelemetry integration addresses these needs through:

1. **Performance Monitoring:** Tracking CPU and memory usage of feature servers
2. **Operational Insights:** Collecting metrics to understand system behavior and performance
3. **Troubleshooting:** Enabling effective debugging through distributed tracing
4. **Resource Optimization:** Monitoring resource utilization to optimize deployments
5. **Production Readiness:** Providing the observability that production deployments need

## Architecture

The OpenTelemetry integration in Feast consists of several components working together:

- **OpenTelemetry Collector:** Receives, processes, and exports telemetry data
- **Prometheus Integration:** Enables metrics collection and monitoring
- **Instrumentation:** Automatic Python instrumentation for tracking metrics
- **Exporters:** Components that send telemetry data to monitoring systems
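
As a rough illustration of how the collector ties these pieces together, the sketch below declares an `OpenTelemetryCollector` resource for the OpenTelemetry Operator that receives OTLP data, batches it, exposes metrics through a Prometheus exporter, and prints traces and logs with the `debug` exporter. It is a sketch under stated assumptions, not the manifest Feast ships: it assumes the operator's `opentelemetry.io/v1beta1` API, a collector image that includes the `prometheus` exporter, and a hypothetical exporter port named `prometheus` on `8889`. The collector is named after `.Values.service.name`, so the Service the operator creates is `<name>-collector`, which matches the collector endpoint used by the Helm templates in this guide.

```yaml
# Sketch only: an OpenTelemetryCollector managed by the OpenTelemetry Operator.
# Assumes the v1beta1 API and a collector image that ships the prometheus exporter.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  # The operator creates a Service named "<name>-collector" for this resource.
  name: {{ .Values.service.name }}
spec:
  mode: deployment
  ports:
    # Hypothetical port name/number, exposed on the Service for the exporter below.
    - name: prometheus
      port: 8889
  config:
    receivers:
      otlp:                 # accepts telemetry from the instrumented feature server
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch: {}             # groups telemetry before it is exported
    exporters:
      prometheus:           # serves collected metrics for Prometheus to scrape
        endpoint: 0.0.0.0:8889
      debug: {}             # writes traces and logs to the collector's stdout
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
```

Each pipeline wires receivers to exporters per signal type, so metrics can go to Prometheus while traces and logs are sent elsewhere; swapping `debug` for a real tracing or logging backend only changes the `exporters` lists.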

## Key Features

1. **Automated Instrumentation:** Python auto-instrumentation for comprehensive metric collection
2. **Metric Collection:** Track key performance indicators, including:
   - Memory usage
   - CPU utilization
   - Request latencies
   - Feature retrieval statistics
3. **Flexible Configuration:** Customizable metric collection and export settings
4. **Kubernetes Integration:** Native support for Kubernetes deployments
5. **Prometheus Compatibility:** Integration with Prometheus for metrics visualization

## Setup and Configuration

To add monitoring to the Feast Feature Server, follow these steps:

### 1. Deploy Prometheus Operator

Follow the [Prometheus Operator documentation](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md) to install the operator.

### 2. Deploy OpenTelemetry Operator

To install the OpenTelemetry Operator, which requires `cert-manager`:

1. Install `cert-manager`.
2. Verify that the `cert-manager` pods are running.
3. Apply the OpenTelemetry Operator manifest:

```bash
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```

For additional installation steps, refer to the [OpenTelemetry Operator documentation](https://github.com/open-telemetry/opentelemetry-operator).

### 3. Configure OpenTelemetry Collector

Add the OpenTelemetry Collector configuration under the `metrics` section of your `values.yaml` file:

```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

### 4. Add Instrumentation Configuration

In your `deployment.yaml`, add the following annotation to the pod template metadata:

```yaml
template:
  metadata:
    annotations:
      instrumentation.opentelemetry.io/inject-python: "true"
```

Then add these environment variables to the feature server container:

```yaml
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
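
The annotation and the environment variables above belong in different parts of the Deployment. The excerpt below is a hypothetical `deployment.yaml` template showing where each fragment goes, with both wrapped in the `metrics.enabled` check. The container name, the `app` label, and the `image.repository`/`image.tag` values keys are placeholders assumed for illustration, not names taken from the Feast chart.

```yaml
# Hypothetical deployment.yaml excerpt; container name, labels, and image keys are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.service.name }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.service.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.service.name }}
      {{- if .Values.metrics.enabled }}
      annotations:
        # Asks the OpenTelemetry Operator to inject Python auto-instrumentation.
        instrumentation.opentelemetry.io/inject-python: "true"
      {{- end }}
    spec:
      containers:
        - name: feature-server   # placeholder container name
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"   # placeholder values keys
          {{- if .Values.metrics.enabled }}
          env:
            # Points the OTLP exporter at the collector Service.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
            - name: OTEL_EXPORTER_OTLP_INSECURE
              value: "true"
          {{- end }}
```

The `{{-` form trims the preceding newline and indentation, so the rendered manifest stays valid YAML whether `metrics.enabled` is `true` or `false`.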

### 5. Add Metric Checks

Wrap every metrics-related manifest and deployment fragment in a `{{ if .Values.metrics.enabled }}` / `{{ end }}` block so that it is rendered only when metrics are enabled. For example, the `Instrumentation` resource:

```yaml
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    # 4318 is the OTLP/HTTP port, matching the otlp_proto_http exporters below.
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```

### 6. Add Required Manifests

Add the following components to your chart:

- Instrumentation
- OpenTelemetryCollector
- ServiceMonitors
- Prometheus Instance
- RBAC rules
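
The Instrumentation manifest is shown in step 5. The ServiceMonitor, Prometheus instance, and RBAC rules can be sketched as follows. This is a minimal example under assumptions, not the Feast chart's own manifests: it assumes the collector Service exposes its Prometheus exporter on a port named `prometheus`, that the operator labels that Service with `app.kubernetes.io/name: <name>-collector`, and that everything is installed in the release namespace. The resource names and the `release` label linking the ServiceMonitor to the Prometheus instance are hypothetical. Confirm the real Service labels and port names with `kubectl get svc --show-labels` before relying on the selector.

```yaml
{{ if .Values.metrics.enabled }}
# ServiceMonitor: asks the Prometheus Operator to scrape the collector Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Values.service.name }}-collector
  labels:
    release: {{ .Release.Name }}   # hypothetical label matched by the Prometheus instance below
spec:
  selector:
    matchLabels:
      # Assumed operator-set label; verify it on the collector Service.
      app.kubernetes.io/name: {{ .Values.service.name }}-collector
  endpoints:
    - port: prometheus             # assumed name of the exporter port on the Service
      interval: 30s
---
# Prometheus instance that selects ServiceMonitors carrying the release label.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: {{ .Values.service.name }}-prometheus
spec:
  serviceAccountName: {{ .Values.service.name }}-prometheus
  serviceMonitorSelector:
    matchLabels:
      release: {{ .Release.Name }}
---
# RBAC: lets Prometheus discover and scrape its targets.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Values.service.name }}-prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Values.service.name }}-prometheus
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ .Values.service.name }}-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ .Values.service.name }}-prometheus
subjects:
  - kind: ServiceAccount
    name: {{ .Values.service.name }}-prometheus
    namespace: {{ .Release.Namespace }}
{{ end }}
```

The shared `release` label is what connects the pieces: the Prometheus instance only picks up ServiceMonitors whose labels match its `serviceMonitorSelector`, so a mismatch there silently yields no scrape targets.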

### 7. Deploy Feast

Deploy Feast with metrics enabled, replacing the empty `feature_store_yaml_base64` value with your base64-encoded `feature_store.yaml`:

```bash
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```

## Usage

To enable OpenTelemetry monitoring in your Feast deployment:

1. Set `metrics.enabled=true` in your Helm values.
2. Configure the OpenTelemetry Collector endpoint.
3. Deploy with the required annotations and environment variables.

Example configuration:

```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317"
```

## Monitoring

Once configured, you can monitor various metrics, including:

- `feast_feature_server_memory_usage`: Memory utilization of the feature server
- `feast_feature_server_cpu_usage`: CPU usage statistics
- Additional custom metrics based on your configuration

These metrics can be visualized using Prometheus and other compatible monitoring tools.
