Proposal
Use recording rules for postgres_mixin/alerts
Explanation
Evaluating complex Prometheus alerts directly on raw metrics without recording rules can be very resource-intensive, especially in production environments with multiple PostgreSQL instances. This can lead to slow queries in Prometheus itself, delayed alerts, and unnecessary load on the exporter and database.
Benefits of Recording Rules:
- Precompute expensive queries once, store results, and reference them in multiple alerts.
- Reduce Prometheus CPU and memory usage.
- Enable more stable and reliable alerting.
- Simplify complex alert expressions and reduce duplication.
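The deduplication benefit is easiest to see with the connection-usage check, which more than one alert needs (a minimal sketch; metric names are assumed to be those exposed by postgres_exporter):

```yaml
# Without a recording rule, every alert repeats the full expression:
- alert: PostgreSQLHighConnections
  expr: |
    sum by (instance) (pg_stat_activity_count{job="postgres"})
    /
    sum by (instance) (pg_settings_max_connections{job="postgres"}) > 0.8

# With a recording rule, the ratio is computed once per evaluation
# interval and every alert references the precomputed series:
- record: pg:connections:used_ratio
  expr: |
    sum by (instance) (pg_stat_activity_count{job="postgres"})
    /
    sum by (instance) (pg_settings_max_connections{job="postgres"})
- alert: PostgreSQLHighConnections
  expr: pg:connections:used_ratio > 0.8
```

The recorded series is also queryable on dashboards, so Grafana panels and alerts share one evaluation instead of each running the division themselves.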
An example PrometheusRule manifest, with one group of recording rules and one group of alerts that reference them:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: monitoring
  labels:
    app: postgres
spec:
  groups:
    - name: postgres-recording
      interval: 30s
      rules:
        # Transactions per second
        - record: pg:transactions:rate5m
          expr: |
            sum by (instance, datname) (
              rate(pg_stat_database_xact_commit{job="postgres"}[5m]) +
              rate(pg_stat_database_xact_rollback{job="postgres"}[5m])
            )
        # Cache hit ratio
        - record: pg:cache_hit_ratio:5m
          expr: |
            sum by (instance, datname) (
              rate(pg_stat_database_blks_hit{job="postgres"}[5m])
            )
            /
            sum by (instance, datname) (
              rate(pg_stat_database_blks_hit{job="postgres"}[5m]) +
              rate(pg_stat_database_blks_read{job="postgres"}[5m])
            )
        # Connection usage
        - record: pg:connections:used_ratio
          expr: |
            sum by (instance) (pg_stat_activity_count{job="postgres"})
            /
            sum by (instance) (pg_settings_max_connections{job="postgres"})
        # Rollback ratio
        - record: pg:rollback_ratio:5m
          expr: |
            sum by (instance, datname) (
              rate(pg_stat_database_xact_rollback{job="postgres"}[5m])
            )
            /
            sum by (instance, datname) (
              rate(pg_stat_database_xact_commit{job="postgres"}[5m]) +
              rate(pg_stat_database_xact_rollback{job="postgres"}[5m])
            )
        # Deadlocks per minute (5m rate scaled to per-minute)
        - record: pg:deadlocks:rate5m
          expr: |
            sum by (instance, datname) (
              rate(pg_stat_database_deadlocks{job="postgres"}[5m]) * 60
            )
        # Replication lag (seconds)
        - record: pg:replication_lag
          expr: |
            max by (instance) (
              pg_replication_lag{job="postgres"}
            )
        # Share of checkpoints that were timed (scheduled) rather than requested
        - record: pg:checkpoints_timed_ratio:5m
          expr: |
            sum by (instance) (
              rate(pg_stat_bgwriter_checkpoints_timed_total{job="postgres"}[5m])
            )
            /
            sum by (instance) (
              rate(pg_stat_bgwriter_checkpoints_timed_total{job="postgres"}[5m]) +
              rate(pg_stat_bgwriter_checkpoints_req_total{job="postgres"}[5m])
            )
    - name: postgres-alerts
      rules:
        - alert: PostgreSQLDown
          expr: max by (instance) (pg_up{job="postgres"}) != 1
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL is down"
            description: "{{ $labels.instance }} is not responding"
        - alert: PostgreSQLMaxConnectionsReached
          expr: pg:connections:used_ratio > 0.95
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL connection usage is close to max_connections"
            description: "{{ $labels.instance }} is using over 95% of max_connections"
        - alert: PostgreSQLHighConnections
          expr: pg:connections:used_ratio > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High connection usage"
            description: "{{ $labels.instance }} is using over 80% of max_connections"
        - alert: PostgreSQLQPS
          expr: pg:transactions:rate5m > 10000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High transaction rate"
            description: "{{ $labels.instance }} DB {{ $labels.datname }} is processing over 10k transactions/s"
        - alert: PostgreSQLCacheHitRatioLow
          expr: pg:cache_hit_ratio:5m < 0.98
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Low cache hit ratio"
            description: "{{ $labels.instance }} DB {{ $labels.datname }} cache hit ratio is below 98%"
        - alert: PostgresTooManyRollbacks
          expr: pg:rollback_ratio:5m > 0.1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Too many rollbacks"
            description: "{{ $labels.instance }} is rolling back more than 10% of transactions"
        - alert: PostgresHighDeadlocks
          expr: pg:deadlocks:rate5m > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Deadlocks detected"
            description: "{{ $labels.instance }} DB {{ $labels.datname }} exceeds 5 deadlocks per minute"
        - alert: PostgresReplicationLag
          expr: pg:replication_lag > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replication lag"
            description: "{{ $labels.instance }} replication lag exceeds 5 minutes"
        - alert: PostgresExporterErrors
          expr: pg_exporter_last_scrape_error{job="postgres"} > 0
          for: 30m
          labels:
            severity: critical
          annotations:
            summary: "Exporter errors"
            description: "{{ $labels.instance }} postgres_exporter scrapes have been failing for 30 minutes"
        - alert: PostgresTooManyCheckpointsRequested
          expr: pg:checkpoints_timed_ratio:5m < 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Too many checkpoints requested"
            description: "{{ $labels.instance }} fewer than half of checkpoints are timed; checkpoint pressure is high"
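The manifest above is written for the Prometheus Operator. In a plain Prometheus or mixin deployment, the same groups can be shipped as an ordinary rule file and loaded via rule_files — a sketch, with an assumed file path:

```yaml
# prometheus.yml (fragment). evaluation_interval applies to any group
# that does not set its own interval; the recording group above
# overrides it with interval: 30s.
global:
  evaluation_interval: 30s
rule_files:
  - /etc/prometheus/rules/postgres_rules.yml  # hypothetical path
```

Rule files can be validated before deployment with `promtool check rules <file>`, which catches PromQL syntax errors and malformed group definitions.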