Status	Active
Owner	HyperFleet Adapter Team
Last Updated	2026-02-24

HyperFleet Adapter Metrics - MVP

Overview
CloudEvent Data Structure
Metrics Format
Required Metrics (MVP)
Implementation Guidelines
Metrics Endpoint
Dashboard Queries (PromQL)
Alerting Rules (Examples)
Baseline Metrics (Expected Values)
Post-MVP Improvements
Implementation Checklist
References

Overview

This document defines the minimum set of metrics that every HyperFleet adapter must expose for observability. These metrics establish a measurable baseline and help identify areas for post-MVP improvement.

Related Documentation:

HyperFleet Metrics Standard - Cross-component metrics conventions
Adapter Framework Design - Framework architecture
adapter-observability-config-template.yaml - Observability configuration template
Adapter Deployment Guide - Deployment and operations

CloudEvent Data Structure

The adapter processes CloudEvents with the following structure:

specversion: "1.0"
type: "com.hyperfleet.nodepool.reconcile.v1"
source: "sentinel"
id: "00000000-0000-0000-0000-000000000000"
time: "2025-10-23T12:00:00Z"
datacontenttype: "application/json"
data:
  id: "22222222-2222-2222-2222-222222222222"
  kind: "NodePool"  # or "Cluster"
  href: "https://api.hyperfleet.com/v1/clusters/111.../nodepools/222..."
  generation: 5
  owner_references:
    id: "11111111-1111-1111-1111-111111111111"
    kind: "Cluster"
    href: "https://api.hyperfleet.com/v1/clusters/111..."

Key Fields for Metrics:

data.kind - Used as resource_kind label in metrics (e.g., "Cluster", "NodePool")
data.id - Resource identifier (not used in metrics to avoid high cardinality)
data.generation - Resource generation (not used in metrics to avoid high cardinality)
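
As a minimal sketch of how the key fields above might be read, the following decodes a JSON `data` payload and returns only the value intended for the `resource_kind` label. The `EventData` type and `ResourceKindLabel` function are illustrative assumptions, not the adapter framework's actual types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// EventData mirrors a subset of the CloudEvent data payload shown above.
// The Go type and its field names are assumptions made for illustration.
type EventData struct {
	ID         string `json:"id"`
	Kind       string `json:"kind"`
	Href       string `json:"href"`
	Generation int64  `json:"generation"`
}

// ResourceKindLabel decodes the payload and returns the value to use for the
// resource_kind label. ID and Generation are decoded but never returned,
// because they would make the label set high-cardinality.
func ResourceKindLabel(payload []byte) (string, error) {
	var d EventData
	if err := json.Unmarshal(payload, &d); err != nil {
		return "", fmt.Errorf("decode event data: %w", err)
	}
	if d.Kind == "" {
		return "", errors.New("event data has no kind")
	}
	return d.Kind, nil
}

func main() {
	payload := []byte(`{"id":"abc","kind":"NodePool","generation":5}`)
	kind, err := ResourceKindLabel(payload)
	fmt.Println(kind, err) // NodePool <nil>
}
```

An adapter might additionally map unexpected kinds to a fixed value such as "unknown" so that malformed events cannot create new label values; that choice is left open here.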

Metrics Format

Standard: Prometheus format (OpenMetrics compatible)
Endpoint: /metrics
Port: 9090
Protocol: HTTP

Required Labels: All metrics MUST include component and version labels as defined in the Metrics Standard.

For complete health and readiness endpoint standards, see Health Endpoints Specification.

Required Metrics (MVP)

1. Event Processing Metrics

`hyperfleet_adapter_events_processed_total`

Type: Counter
Purpose: Total number of CloudEvents processed by the adapter

Labels:

adapter_name - Name of the adapter (e.g., "validation", "dns")
resource_kind - Kind of resource being processed from event.data.kind (e.g., "Cluster", "NodePool")
status - Processing outcome: success, error, skipped

Example:

hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 1523
hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="error"} 12
hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="skipped"} 89
hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="NodePool",status="success"} 342

Usage:

Track overall event throughput
Identify error rates
Measure skip frequency (preconditions not met)

`hyperfleet_adapter_event_processing_duration_seconds`

Type: Histogram
Purpose: Time taken to process a CloudEvent (end-to-end)

Labels:

adapter_name - Name of the adapter
resource_kind - Kind of resource being processed from event.data.kind
status - Processing outcome: success, error, skipped

Buckets: 0.1, 0.5, 1, 2, 5, 10, 30, 60, 120 (seconds)

Example:

hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="0.5"} 0
hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="1"} 5
hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="5"} 142
hyperfleet_adapter_event_processing_duration_seconds_sum{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 456.78
hyperfleet_adapter_event_processing_duration_seconds_count{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 150

Usage:

Identify slow event processing
Track p50, p95, p99 latencies
Detect performance degradation

2. Resource Management Metrics

`hyperfleet_adapter_resources_created_total`

Type: Counter
Purpose: Total number of Kubernetes resources created by the adapter

Labels:

adapter_name - Name of the adapter
resource_type - Kubernetes resource kind (e.g., "Job", "Deployment", "ConfigMap")
status - Creation outcome: success, error

Example:

hyperfleet_adapter_resources_created_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_type="Job",status="success"} 45
hyperfleet_adapter_resources_created_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_type="ConfigMap",status="success"} 45
hyperfleet_adapter_resources_created_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_type="Job",status="error"} 2

Usage:

Track resource creation activity
Identify resource creation failures

`hyperfleet_adapter_resources_deleted_total`

Type: Counter
Purpose: Total number of Kubernetes resources deleted by the adapter

Labels:

adapter_name - Name of the adapter
resource_type - Kubernetes resource kind
status - Deletion outcome: success, error

Example:

hyperfleet_adapter_resources_deleted_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_type="Job",status="success"} 23
hyperfleet_adapter_resources_deleted_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_type="Job",status="error"} 1

Usage:

Track cleanup operations
Identify deletion failures

3. API Call Metrics

`hyperfleet_adapter_api_requests_total`

Type: Counter
Purpose: Total number of API calls made by the adapter

Labels:

adapter_name - Name of the adapter
api - API being called: hyperfleet, kubernetes, external
method - HTTP method: GET, POST, PATCH, DELETE
endpoint - API endpoint (sanitized, no IDs): e.g., /clusters/{id}, /statuses
status_code - HTTP status code: 200, 404, 500, etc.

Example:

hyperfleet_adapter_api_requests_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="GET",endpoint="/clusters/{id}",status_code="200"} 1523
hyperfleet_adapter_api_requests_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="POST",endpoint="/statuses",status_code="200"} 1487
hyperfleet_adapter_api_requests_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="kubernetes",method="POST",endpoint="/namespaces/{ns}/jobs",status_code="201"} 1432
hyperfleet_adapter_api_requests_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="kubernetes",method="GET",endpoint="/namespaces/{ns}/jobs/{name}",status_code="200"} 2145

Usage:

Track API call volume
Identify failed API calls
Monitor API usage patterns

`hyperfleet_adapter_api_request_duration_seconds`

Type: Histogram
Purpose: Time taken for API requests

Labels:

adapter_name - Name of the adapter
api - API being called
method - HTTP method
endpoint - API endpoint (sanitized)

Buckets: 0.01, 0.05, 0.1, 0.5, 1, 2, 5 (seconds)

Example:

hyperfleet_adapter_api_request_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="GET",endpoint="/clusters/{id}",le="0.1"} 1200
hyperfleet_adapter_api_request_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="GET",endpoint="/clusters/{id}",le="0.5"} 1500
hyperfleet_adapter_api_request_duration_seconds_sum{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="GET",endpoint="/clusters/{id}"} 156.78
hyperfleet_adapter_api_request_duration_seconds_count{component="adapter-validation",version="v1.0.0",adapter_name="validation",api="hyperfleet",method="GET",endpoint="/clusters/{id}"} 1523

Usage:

Identify slow API calls
Track API latency percentiles
Detect API performance issues

4. Precondition Metrics

`hyperfleet_adapter_preconditions_evaluated_total`

Type: Counter
Purpose: Total number of precondition evaluations

Labels:

adapter_name - Name of the adapter
precondition_name - Name of the precondition from config (e.g., "clusterStatus", "validationAvailable")
result - Evaluation result: pass, fail, error

Example:

hyperfleet_adapter_preconditions_evaluated_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",precondition_name="clusterStatus",result="pass"} 1523
hyperfleet_adapter_preconditions_evaluated_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",precondition_name="validationAvailable",result="fail"} 89
hyperfleet_adapter_preconditions_evaluated_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",precondition_name="quotaStatus",result="error"} 3

Usage:

Track precondition success/failure rates
Identify problematic preconditions
Monitor dependency health

5. Status Reporting Metrics

`hyperfleet_adapter_status_reports_total`

Type: Counter
Purpose: Total number of status reports sent to HyperFleet API

Labels:

adapter_name - Name of the adapter
status - Report outcome: success, error
applied - Applied condition value: true, false
available - Available condition value: true, false

Example:

hyperfleet_adapter_status_reports_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",status="success",applied="true",available="true"} 834
hyperfleet_adapter_status_reports_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",status="success",applied="true",available="false"} 612
hyperfleet_adapter_status_reports_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",status="success",applied="false",available="false"} 89
hyperfleet_adapter_status_reports_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",status="error",applied="false",available="false"} 7

Usage:

Track status reporting success rate
Monitor condition distribution
Identify reporting failures

6. Error Metrics

`hyperfleet_adapter_errors_total`

Type: Counter
Purpose: Total number of errors encountered by the adapter

Labels:

adapter_name - Name of the adapter
error_type - Error category: api_error, k8s_error, config_error, precondition_error, processing_error
error_component - Internal component where error occurred: event_processor, precondition_evaluator, resource_manager, status_reporter

Example:

hyperfleet_adapter_errors_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",error_type="api_error",error_component="precondition_evaluator"} 12
hyperfleet_adapter_errors_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",error_type="k8s_error",error_component="resource_manager"} 5
hyperfleet_adapter_errors_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",error_type="processing_error",error_component="event_processor"} 3

Usage:

Track overall error rates
Identify error patterns
Monitor adapter health

7. Workload Monitoring Metrics

`hyperfleet_adapter_workload_status_total`

Type: Counter
Purpose: Total number of workload status checks performed

Labels:

adapter_name - Name of the adapter
workload_type - Type of workload: Job, Deployment, StatefulSet
status - Workload status: running, succeeded, failed, unknown

Example:

hyperfleet_adapter_workload_status_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",workload_type="Job",status="running"} 412
hyperfleet_adapter_workload_status_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",workload_type="Job",status="succeeded"} 834
hyperfleet_adapter_workload_status_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",workload_type="Job",status="failed"} 23

Usage:

Track workload success/failure rates
Monitor workload execution patterns
Identify workload issues

8. Health Metrics

`hyperfleet_adapter_last_processed_timestamp_seconds`

Type: Gauge
Purpose: Unix timestamp of the last successfully processed event. Used as a "Dead Man's Switch" to detect if the adapter has silently stopped processing events.

Labels:

adapter_name - Name of the adapter

Example:

hyperfleet_adapter_last_processed_timestamp_seconds{component="adapter-validation",version="v1.0.0",adapter_name="validation"} 1698057600

Usage:

Detect broken broker connections
Identify "zombie" adapters that are running but not processing
Alert if timestamp is too old (e.g., > 5 minutes)

Implementation Guidelines

1. Metric Naming Convention

Follow Prometheus naming best practices and HyperFleet standards:

Use hyperfleet_adapter_ prefix for all adapter metrics (see Metrics Standard)
Use snake_case for metric names
Use descriptive names that indicate what is being measured
Use consistent label names across metrics

2. Label Best Practices

DO:

Use labels for dimensions that need to be filtered/aggregated
Keep label cardinality low (avoid unique IDs like cluster IDs)
Use consistent label names across metrics
Sanitize endpoint paths (replace IDs with {id}, {name}, etc.)

DON'T:

Don't use high-cardinality labels (e.g., timestamp, user ID, cluster ID)
Don't include sensitive information in labels
Don't use labels for data that changes frequently

Example of Sanitized Endpoints:

✅ Good: /clusters/{id}
❌ Bad:  /clusters/cls-abc123

✅ Good: /namespaces/{ns}/jobs/{name}
❌ Bad:  /namespaces/cluster-cls-123/jobs/validation-job-gen5
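
One possible way to sanitize endpoints is to match the request path against a fixed table of route templates and fall back to a single catch-all value. The sketch below covers only the endpoints that appear in this document; the `routes` table, the `SanitizeEndpoint` name, and the "other" fallback are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// route pairs a path pattern with the low-cardinality template reported as
// the endpoint label.
type route struct {
	pattern  *regexp.Regexp
	template string
}

// routes lists the endpoints used as examples in this document.
var routes = []route{
	{regexp.MustCompile(`^/clusters/[^/]+$`), "/clusters/{id}"},
	{regexp.MustCompile(`^/statuses$`), "/statuses"},
	{regexp.MustCompile(`^/namespaces/[^/]+/jobs$`), "/namespaces/{ns}/jobs"},
	{regexp.MustCompile(`^/namespaces/[^/]+/jobs/[^/]+$`), "/namespaces/{ns}/jobs/{name}"},
}

// SanitizeEndpoint returns the template for a known path. Unknown paths map
// to "other" so that unexpected URLs cannot create new label values.
func SanitizeEndpoint(path string) string {
	for _, r := range routes {
		if r.pattern.MatchString(path) {
			return r.template
		}
	}
	return "other"
}

func main() {
	fmt.Println(SanitizeEndpoint("/clusters/cls-abc123"))
	fmt.Println(SanitizeEndpoint("/namespaces/cluster-cls-123/jobs/validation-job-gen5"))
	fmt.Println(SanitizeEndpoint("/something/else"))
}
```

An allowlist is more verbose than a generic "replace anything that looks like an ID" rule, but it guarantees a fixed upper bound on the number of `endpoint` values.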

3. Metric Collection Points

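// Example: Instrument event processing.
// Assumes a.metrics holds Prometheus Go client collectors (CounterVec,
// HistogramVec, GaugeVec) that were registered at startup.
func (a *Adapter) ProcessEvent(event CloudEvent) error {
    startTime := time.Now()

    // Process event
    err := a.processEventInternal(event)

    // Derive the status label. This example only distinguishes success and
    // error; an adapter that skips events must also report "skipped".
    status := "success"
    if err != nil {
        status = "error"
    }

    // Metric: hyperfleet_adapter_events_processed_total
    a.metrics.eventsProcessed.WithLabelValues(
        a.config.Name,
        event.Data.Kind, // e.g., "Cluster", "NodePool"
        status,
    ).Inc()

    // Metric: hyperfleet_adapter_event_processing_duration_seconds
    a.metrics.eventDuration.WithLabelValues(
        a.config.Name,
        event.Data.Kind,
        status,
    ).Observe(time.Since(startTime).Seconds())

    // Metric: hyperfleet_adapter_last_processed_timestamp_seconds
    // Updated only on success. lastProcessed is an assumed field name.
    if err == nil {
        a.metrics.lastProcessed.WithLabelValues(a.config.Name).SetToCurrentTime()
    }

    return err
}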

4. Histogram Bucket Configuration

Event Processing Duration:

Buckets: 0.1, 0.5, 1, 2, 5, 10, 30, 60, 120 seconds
Rationale: Events can range from quick skips (< 1s) to long workload monitoring (> 60s)

API Request Duration:

Buckets: 0.01, 0.05, 0.1, 0.5, 1, 2, 5 seconds
Rationale: API calls should be fast, most completing in < 1s

5. Metric Export

Prometheus Format:

# HELP hyperfleet_adapter_events_processed_total Total number of CloudEvents processed by the adapter
# TYPE hyperfleet_adapter_events_processed_total counter
hyperfleet_adapter_events_processed_total{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 1523

# HELP hyperfleet_adapter_event_processing_duration_seconds Time taken to process a CloudEvent
# TYPE hyperfleet_adapter_event_processing_duration_seconds histogram
hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="0.5"} 0
hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="1"} 5
hyperfleet_adapter_event_processing_duration_seconds_bucket{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success",le="+Inf"} 150
hyperfleet_adapter_event_processing_duration_seconds_sum{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 456.78
hyperfleet_adapter_event_processing_duration_seconds_count{component="adapter-validation",version="v1.0.0",adapter_name="validation",resource_kind="Cluster",status="success"} 150
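
For readers unfamiliar with the text format, this sketch renders a single sample line by hand, including the escaping that label values require (backslash, double quote, newline). In practice a client library produces this output; `Label` and `FormatSample` are hypothetical names used only to show the structure of a line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Label is one name/value pair. A slice keeps the label order stable.
type Label struct{ Name, Value string }

// labelEscaper escapes the characters that must be escaped inside label
// values in the Prometheus text format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// FormatSample renders name{l1="v1",...} value as a single line.
func FormatSample(name string, labels []Label, value float64) string {
	var sb strings.Builder
	sb.WriteString(name)
	if len(labels) > 0 {
		sb.WriteByte('{')
		for i, l := range labels {
			if i > 0 {
				sb.WriteByte(',')
			}
			sb.WriteString(l.Name)
			sb.WriteString(`="`)
			sb.WriteString(labelEscaper.Replace(l.Value))
			sb.WriteByte('"')
		}
		sb.WriteByte('}')
	}
	sb.WriteByte(' ')
	sb.WriteString(strconv.FormatFloat(value, 'f', -1, 64))
	return sb.String()
}

func main() {
	labels := []Label{
		{"component", "adapter-validation"},
		{"version", "v1.0.0"},
		{"adapter_name", "validation"},
		{"resource_kind", "Cluster"},
		{"status", "success"},
	}
	fmt.Println(FormatSample("hyperfleet_adapter_events_processed_total", labels, 1523))
}
```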

Metrics Endpoint

Health and Metrics Endpoints

Health Endpoints (Port 8080):

GET /healthz - Liveness probe, returns 200 OK if adapter is alive
GET /readyz - Readiness probe, returns 200 OK if adapter is ready to serve traffic

Metrics Endpoint (Port 9090):

GET /metrics - Returns Prometheus-formatted metrics

Example Service Configuration

apiVersion: v1
kind: Service
metadata:
  name: validation-adapter
  namespace: hyperfleet-system
  labels:
    app: validation-adapter
spec:
  selector:
    app: validation-adapter
  ports:
    - name: health
      port: 8080
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP
  type: ClusterIP

Example ServiceMonitor (Prometheus Operator)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: validation-adapter
  namespace: hyperfleet-system
  labels:
    app: validation-adapter
spec:
  selector:
    matchLabels:
      app: validation-adapter
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

Dashboard Queries (PromQL)

Event Processing Rate

# Events processed per second (by status)
sum by(status) (rate(hyperfleet_adapter_events_processed_total[5m]))

# Success rate percentage
(
  sum(rate(hyperfleet_adapter_events_processed_total{status="success"}[5m]))
  /
  sum(rate(hyperfleet_adapter_events_processed_total[5m]))
) * 100

Event Processing Latency

# p95 event processing time
histogram_quantile(0.95,
  rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])
)

# Average event processing time
rate(hyperfleet_adapter_event_processing_duration_seconds_sum[5m])
/
rate(hyperfleet_adapter_event_processing_duration_seconds_count[5m])

Resource Creation Rate

# Resources created per minute
sum(rate(hyperfleet_adapter_resources_created_total{status="success"}[5m])) * 60

# Resource creation success rate
(
  sum(rate(hyperfleet_adapter_resources_created_total{status="success"}[5m]))
  /
  sum(rate(hyperfleet_adapter_resources_created_total[5m]))
) * 100

API Call Performance

# p99 API latency by endpoint
histogram_quantile(0.99,
  sum by(endpoint, le) (rate(hyperfleet_adapter_api_request_duration_seconds_bucket[5m]))
)

# API error rate by endpoint
sum by(endpoint) (rate(hyperfleet_adapter_api_requests_total{status_code=~"5.."}[5m]))

Precondition Pass Rate

# Precondition pass rate percentage
(
  sum(rate(hyperfleet_adapter_preconditions_evaluated_total{result="pass"}[5m]))
  /
  sum(rate(hyperfleet_adapter_preconditions_evaluated_total[5m]))
) * 100

Error Rate

# Total error rate
sum(rate(hyperfleet_adapter_errors_total[5m]))

# Error rate by adapter deployment (component label)
sum by(component) (rate(hyperfleet_adapter_errors_total[5m]))

# Error rate by internal error source (error_component label)
sum by(error_component) (rate(hyperfleet_adapter_errors_total[5m]))

Alerting Rules (Examples)

Silent Failure (Dead Man's Switch)

- alert: AdapterNotProcessing
  expr: |
    (time() - hyperfleet_adapter_last_processed_timestamp_seconds) > 300
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Adapter {{ $labels.adapter_name }} has stopped processing events"
    description: "Last successful event was processed {{ $value | humanizeDuration }} ago (threshold: 5m)"

High Error Rate

- alert: AdapterHighErrorRate
  expr: |
    (
      sum by(adapter_name) (rate(hyperfleet_adapter_events_processed_total{status="error"}[5m]))
      /
      sum by(adapter_name) (rate(hyperfleet_adapter_events_processed_total[5m]))
    ) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Adapter {{ $labels.adapter_name }} has high error rate"
    description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"

Slow Event Processing

- alert: AdapterSlowEventProcessing
  expr: |
    histogram_quantile(0.95,
      rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])
    ) > 60
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Adapter {{ $labels.adapter_name }} is processing events slowly"
    description: "p95 processing time is {{ $value }}s (threshold: 60s)"

API Errors

- alert: AdapterHighAPIErrorRate
  expr: |
    (
      sum by(adapter_name) (rate(hyperfleet_adapter_api_requests_total{status_code=~"5.."}[5m]))
      /
      sum by(adapter_name) (rate(hyperfleet_adapter_api_requests_total[5m]))
    ) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Adapter {{ $labels.adapter_name }} has high API error rate"
    description: "API error rate is {{ $value | humanizePercentage }} (threshold: 1%)"

Baseline Metrics (Expected Values)

These are rough estimates for baseline metrics to help identify anomalies:

Metric	Expected Range	Notes
Event processing duration (p95)	2-10s	For events with preconditions met
Event processing duration (p95, skipped)	< 1s	For events skipped due to preconditions
API request duration (p95)	100-500ms	For HyperFleet API calls
Kubernetes API duration (p95)	50-200ms	For K8s resource operations
Event success rate	> 95%	Percentage of successfully processed events
Precondition pass rate	60-80%	Many events skipped due to preconditions
Resource creation success rate	> 98%	K8s resource creation should rarely fail
Status report success rate	> 99%	Status reporting should be very reliable

Note: These are initial estimates. Actual baselines should be established during MVP testing.

Post-MVP Improvements

After establishing baselines, consider these additional metrics:

Detailed Workload Metrics:
- Job execution time distribution
- Pod restart counts
- Container failure reasons

CEL Expression Metrics:
- Expression evaluation time
- Expression evaluation errors
- Expression cache hit rate

Message Broker Metrics:
- Message acknowledgment latency
- Message redelivery count
- Queue depth

Memory and CPU Metrics:
- Heap memory usage
- GC pause time
- CPU utilization

Business Metrics:
- Clusters processed by phase
- Adapter availability by cluster
- Generation lag (event generation vs processed generation)

Implementation Checklist

For each adapter, ensure:

All required metrics are implemented
Metrics endpoint is exposed on /metrics
ServiceMonitor is configured for Prometheus scraping
Labels follow naming conventions
Endpoint paths are sanitized (no high-cardinality values)
Histogram buckets are appropriate for the metric
Basic alerting rules are configured
Grafana dashboard is created for the adapter

References

HyperFleet Metrics Standard - Cross-component metrics conventions
Prometheus Best Practices - Metric naming conventions
Prometheus Go Client - Go client library
OpenMetrics Specification - Metrics format specification
