Monitoring

Monitor NornicDB health, performance, and security.

Endpoints

Endpoint   Auth Required   Description
/health    No              Basic health check
/status    Yes             Detailed status
/metrics   Yes             Prometheus metrics

Health Check

Basic Health

curl http://localhost:7474/health

Response:

{
  "status": "healthy"
}

Kubernetes Probes

livenessProbe:
  httpGet:
    path: /health
    port: 7474
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health
    port: 7474
  initialDelaySeconds: 5
  periodSeconds: 5

Status Endpoint

Detailed Status

curl http://localhost:7474/status \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "status": "healthy",
  "server": {
    "version": "0.1.4",
    "uptime": "24h15m30s",
    "started_at": "2024-12-01T00:00:00Z"
  },
  "database": {
    "nodes": 150000,
    "edges": 450000,
    "data_size": "2.5GB"
  },
  "embeddings": {
    "enabled": true,
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "pending": 0
  }
}
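
As a rough illustration of consuming this payload, the Python sketch below converts the uptime string to seconds and derives an edges-per-node ratio. It assumes the response has exactly the shape shown above and that uptime uses Go-style hour/minute/second formatting. The helper names are hypothetical and not part of NornicDB.

```python
import json
import re

# Assumption: uptime looks like "24h15m30s" (h/m/s units only).
_UPTIME_RE = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?$")


def uptime_seconds(uptime: str) -> float:
    """Convert a duration such as "24h15m30s" into seconds."""
    match = _UPTIME_RE.match(uptime)
    if not uptime or not match:
        raise ValueError(f"unrecognized uptime format: {uptime!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def summarize_status(payload: str) -> dict:
    """Extract a few headline numbers from a /status response body."""
    status = json.loads(payload)
    db = status["database"]
    return {
        "healthy": status["status"] == "healthy",
        "uptime_s": uptime_seconds(status["server"]["uptime"]),
        "edges_per_node": db["edges"] / db["nodes"] if db["nodes"] else 0.0,
        "embeddings_pending": status["embeddings"]["pending"],
    }
```

Feeding the body returned by the curl command above into summarize_status gives a small dictionary that is easier to log or compare between runs than the raw response.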

Prometheus Metrics

Enable Metrics

Metrics are available at /metrics (requires authentication).

curl http://localhost:7474/metrics \
  -H "Authorization: Bearer $TOKEN"

Available Metrics

# Request metrics
nornicdb_http_requests_total{method="GET",path="/health",status="200"} 1234
nornicdb_http_request_duration_seconds{method="POST",path="/db/nornicdb/tx/commit"} 0.045

# Database metrics
nornicdb_nodes_total 150000
nornicdb_edges_total 450000
nornicdb_storage_bytes 2684354560

# Query metrics
nornicdb_query_duration_seconds{type="cypher"} 0.023
nornicdb_queries_total{type="cypher",status="success"} 5678

# Embedding metrics
nornicdb_embeddings_pending 0
nornicdb_embeddings_processed_total 10000
nornicdb_embedding_duration_seconds 0.15

# Rate limiting
nornicdb_rate_limit_hits_total 42
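
For ad hoc checks without a Prometheus server, the lines above can be read directly. The following is a minimal Python sketch of a parser for the simple sample lines shown here. It is a simplified reader, not a complete implementation of the Prometheus text exposition format, and the function name is hypothetical.

```python
import re

# Matches: name{label="value",...} 123.4 [optional timestamp]
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse metric lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        match = _SAMPLE_RE.match(line)
        if not match:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Passing the body of the /metrics response to parse_metrics yields tuples that can be filtered by name, for example to read nornicdb_embeddings_pending in a script.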

Prometheus Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'nornicdb'
    static_configs:
      - targets: ['localhost:7474']
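    # `bearer_token` is deprecated in current Prometheus releases; use `authorization`
    authorization:
      type: Bearer
      credentials: 'your-auth-token'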
    metrics_path: '/metrics'

Grafana Dashboard

Example Dashboard JSON

{
  "title": "NornicDB",
  "panels": [
    {
      "title": "Request Rate",
      "type": "graph",
      "targets": [
        {
          "expr": "rate(nornicdb_http_requests_total[5m])"
        }
      ]
    },
    {
      "title": "Query Duration",
      "type": "graph",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, sum by (le) (rate(nornicdb_query_duration_seconds_bucket[5m])))"
        }
      ]
    },
    {
      "title": "Node Count",
      "type": "stat",
      "targets": [
        {
          "expr": "nornicdb_nodes_total"
        }
      ]
    }
  ]
}

Alerting

Prometheus Alerts

# alerts.yml
groups:
  - name: nornicdb
    rules:
      - alert: NornicDBDown
        expr: up{job="nornicdb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "NornicDB is down"

      - alert: HighErrorRate
        expr: rate(nornicdb_http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"

      - alert: SlowQueries
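        # histogram_quantile needs bucket series; this assumes the metric is
        # exported as a Prometheus histogram (with a _bucket suffix and le label)
        expr: histogram_quantile(0.95, sum by (le) (rate(nornicdb_query_duration_seconds_bucket[5m]))) > 1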
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow queries detected"

      - alert: RateLimitHits
        expr: rate(nornicdb_rate_limit_hits_total[5m]) > 10
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "Rate limiting active"
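
To make the HighErrorRate threshold concrete: a rate above 0.1 over a 5m window means more than 0.1 matching requests per second on average, which is more than 30 requests in the window for a given label combination. The Python sketch below approximates that arithmetic from two counter readings. It is a simplification of what PromQL's rate() does (no extrapolation, at most one counter reset), and the function names are hypothetical.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Approximate rate(): counter increase divided by the window length.

    Simplification: if the counter went down, assume one reset to zero
    happened inside the window and count only the value since the reset.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    increase = last - first if last >= first else last
    return increase / window_seconds


def error_rate_alert(first: float, last: float,
                     window_seconds: float = 300.0,
                     threshold: float = 0.1) -> bool:
    """True when the per-second 5xx rate exceeds the alert threshold."""
    return per_second_rate(first, last, window_seconds) > threshold
```

In this model a counter moving from 100 to 160 within five minutes gives 0.2 errors per second and crosses the threshold, while 100 to 120 does not. The actual alert additionally requires the condition to hold for the configured `for` duration.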

Logging

Log Levels

# Set log level
export NORNICDB_LOG_LEVEL=info  # debug, info, warn, error

Log Format

{
  "time": "2024-12-01T10:30:00.123Z",
  "level": "info",
  "msg": "Query executed",
  "query_type": "cypher",
  "duration_ms": 23,
  "rows": 100
}
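
Because each log entry is a single JSON object, the log can be summarized with a few lines of code. The Python sketch below aggregates count and mean duration per query_type, assuming one JSON object per line with the field names shown above. The function name is hypothetical.

```python
import json
from collections import defaultdict


def summarize_query_logs(lines) -> dict:
    """Aggregate count and mean duration_ms per query_type from JSON log lines."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output such as startup banners
        if not isinstance(entry, dict) or entry.get("msg") != "Query executed":
            continue
        bucket = totals[entry.get("query_type", "unknown")]
        bucket["count"] += 1
        bucket["total_ms"] += float(entry.get("duration_ms", 0))
    return {
        qtype: {"count": b["count"], "mean_ms": b["total_ms"] / b["count"]}
        for qtype, b in totals.items()
    }
```

The function accepts any iterable of strings, so an open log file or sys.stdin can be passed directly.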

Log Aggregation

# Docker logging
logging:
  driver: "fluentd"
  options:
    fluentd-address: "localhost:24224"
    tag: "nornicdb"

Performance Monitoring

Query Performance

# Enable query logging
nornicdb serve --log-queries

Search Timing Diagnostics

Enable detailed search timing logs when tuning search latency:

# Search-service stage timing (vector/BM25/fusion/candidates)
export NORNICDB_SEARCH_LOG_TIMINGS=true

# HTTP handler timing breakdown (embed_total/search_total/embed_calls)
export NORNICDB_SEARCH_DIAG_TIMINGS=true

You will see two complementary log lines per search:

  • ⏱️ Search timing: stage-level search-service timings (vector_ms, bm25_ms, fusion_ms, candidate counts, fallback).
  • 🔎 Search timing db=...: request-path timings (embed_total, search_total, embed_calls, chunk info).

Field reference (Apple M3 Max, 64GB RAM, Feb 2026):

  • Embedding-query path (best collected):
    • Sequential varied queries: p50 11.28ms, p95 25.84ms
    • Concurrent (8 workers): p50 76.36ms, p95 87.41ms
    • Typical diagnostic pattern: embed_total dominates request time.
  • Fulltext-only path (best collected):
    • Sequential varied queries: p50 0.57ms, p95 2.77ms
    • Diagnostic pattern: embed_calls=0, embed_total=0s, handler internal timing in tens of microseconds.
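
To compare your own measurements with the p50/p95 figures above, collect per-request latencies and compute the same percentiles. The Python sketch below uses the nearest-rank method. This is an assumption, since the document does not state how its percentiles were calculated, and the function names are hypothetical.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms) -> dict:
    """Return p50 and p95 for a list of request latencies in milliseconds."""
    return {"p50": percentile(samples_ms, 50), "p95": percentile(samples_ms, 95)}
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so use the same method when comparing runs.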

Slow Query Log

{
  "level": "warn",
  "msg": "Slow query",
  "query": "MATCH (n)-[r*1..5]->(m) RETURN n, r, m",
  "duration_ms": 1500,
  "threshold_ms": 1000
}

Security Monitoring

Failed Login Alerts

Failed logins are logged and can trigger alerts:

{
  "level": "warn",
  "msg": "Login failed",
  "username": "admin",
  "ip": "192.168.1.100",
  "reason": "invalid_password"
}
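
One way to turn these entries into an alert is to count failures per source IP and flag addresses above a threshold. The Python sketch below assumes one JSON object per log line with the msg and ip fields shown above. The function name and the default threshold of 5 are illustrative choices, not NornicDB settings.

```python
import json
from collections import Counter


def failed_logins_by_ip(lines, threshold: int = 5) -> dict:
    """Return {ip: count} for IPs with at least `threshold` failed logins."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if isinstance(entry, dict) and entry.get("msg") == "Login failed":
            counts[entry.get("ip", "unknown")] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

Running this over a recent slice of the log (for example, the last few minutes) and alerting on a non-empty result gives a basic brute-force detector.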

Audit Log Monitoring

# Monitor audit log for security events
tail -f /var/log/nornicdb/audit.log | jq 'select(.type == "LOGIN_FAILED")'

Health Check Script

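#!/bin/bash
# health-check.sh: exit 0 when /health reports "healthy", 1 otherwise.

# -f fails on HTTP errors; --max-time keeps a hung server from blocking the check.
HEALTH=$(curl -sf --max-time 5 http://localhost:7474/health)
STATUS=$(echo "$HEALTH" | jq -r '.status' 2>/dev/null)

if [ "$STATUS" != "healthy" ]; then
  echo "NornicDB unhealthy: ${HEALTH:-no response}"
  exit 1
fi

echo "NornicDB healthy"
exit 0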

See Also