PrometheusRule Alerting Rules

Add a `PrometheusRule` custom resource (an instance of the Prometheus Operator CRD) with baseline alerting rules.

**Alerts:**

```yaml
groups:
- name: wva.rules
  rules:
  - alert: WVAHighErrorRate
    expr: rate(wva_errors_total[5m]) > 0.1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "WVA error rate elevated"

  - alert: WVAOptimizationLoopStalled
    # Note: "== 0" only matches while the series exists; an absent metric will not fire this alert.
    expr: rate(wva_models_processed_total[10m]) == 0
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "WVA optimization loop has stopped processing models"

  - alert: WVAMetricsCollectionFailing
    expr: rate(wva_metrics_collection_errors_total[5m]) > 0.5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "WVA metrics collection failing frequently"

  - alert: WVAGPUResourceExhausted
    expr: wva_available_gpus == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "No GPUs available for WVA scaling"

  - alert: WVAReplicaScalingThrashing
    # Note: rate() is per-second, so this fires above 2 scaling events/s averaged over 10m.
    expr: rate(wva_replica_scaling_total[10m]) > 2
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "WVA scaling decisions changing rapidly (possible thrashing)"
```
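The groups above are written in plain Prometheus rule-file form. As a rough sketch of how they would sit inside a `PrometheusRule` object, the manifest below wraps the first alert under `spec.groups`. The resource name, namespace, and `release` label are assumptions for illustration; the label in particular has to match whatever `ruleSelector` the target Prometheus instance uses.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: wva-alerting-rules      # hypothetical name
  namespace: monitoring         # assumption: namespace watched by the Prometheus Operator
  labels:
    release: prometheus         # assumption: must match the Prometheus ruleSelector
spec:
  groups:
  - name: wva.rules
    rules:
    - alert: WVAHighErrorRate
      expr: rate(wva_errors_total[5m]) > 0.1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "WVA error rate elevated"
    # The remaining alerts from the list above follow the same shape.
```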

**Implementation:**
- Add a `PrometheusRule` manifest in `config/prometheus/`
- Include it in the Helm chart as an optional resource, enabled via a `values.yaml` flag
- Make alert thresholds configurable through Helm values
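One possible shape for the Helm wiring described above is sketched here. The `prometheusRule.*` value keys and the template path are hypothetical, not names taken from the existing chart.

```yaml
# values.yaml (hypothetical keys)
prometheusRule:
  enabled: false                      # rules are opt-in
  thresholds:
    errorRate: 0.1                    # errors/s for WVAHighErrorRate
    metricsCollectionErrorRate: 0.5   # errors/s for WVAMetricsCollectionFailing
    scalingRate: 2                    # events/s for WVAReplicaScalingThrashing
```

```yaml
# templates/prometheusrule.yaml (hypothetical path)
{{- if .Values.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Release.Name }}-wva-rules
spec:
  groups:
  - name: wva.rules
    rules:
    - alert: WVAHighErrorRate
      # Threshold is read from values so it can be overridden per install.
      expr: rate(wva_errors_total[5m]) > {{ .Values.prometheusRule.thresholds.errorRate }}
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "WVA error rate elevated"
{{- end }}
```

With keys like these, the rules could be switched on and tuned at install time, for example with `--set prometheusRule.enabled=true --set prometheusRule.thresholds.errorRate=0.2`.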

**Acceptance Criteria:**
- [ ] `PrometheusRule` resource deploys with both `make deploy` and the Helm chart
- [ ] Alerts fire when their conditions are met (verified manually)
- [ ] Helm chart supports enabling/disabling the rules and overriding thresholds
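To complement the manual check, alert firing could also be exercised offline with a `promtool test rules` unit test. The sketch below assumes the rule groups have been saved as a plain rule file named `wva.rules.yaml` (a hypothetical filename) and that `wva_available_gpus` carries an `accelerator` label, which is an invented label used only to make the example concrete.

```yaml
# wva.rules.test.yaml (hypothetical filename); run with: promtool test rules wva.rules.test.yaml
rule_files:
  - wva.rules.yaml                 # assumption: plain rule file containing the `groups:` block

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Gauge stays at 0 from t=0 through t=10m.
      - series: 'wva_available_gpus{accelerator="example"}'
        values: '0x10'
    alert_rule_test:
      # The rule has `for: 5m`, so the alert should be firing by 6m.
      - eval_time: 6m
        alertname: WVAGPUResourceExhausted
        exp_alerts:
          - exp_labels:
              severity: warning
              accelerator: example
            exp_annotations:
              summary: "No GPUs available for WVA scaling"
```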

PrometheusRule Alerting Rules #919
