pgwarden

A deterministic, policy‑enforcing PostgreSQL access proxy with an optional control plane and AI drift‑detection signals


1. Project Overview

pgwarden is a PostgreSQL wire‑protocol proxy that enforces least‑privilege, context‑aware access to sensitive data at the database boundary rather than inside application code.

It is designed for environments where:

  • multiple workloads (apps, developers, AI systems) access the same database
  • PII exposure must be strictly controlled
  • auditability and reproducibility matter
  • policy enforcement must survive infrastructure restarts

pgwarden is shipped as a single product name (“pgwarden”) for Docker and Kubernetes distribution. Internally, it is composed of orthogonal services that can be deployed together or separately:

  1. Data Plane (Proxy / Enforcement Layer)
  2. Control Plane (Policy Definition & Compilation)
  3. Auth Plane (Identity / Session Attestation)
  4. Signal Plane (Audit, Drift & Anomaly Detection – optional; orthogonal)

Deterministic enforcement is always the source of truth. ML‑based systems never gate access.


2. Design Principles

  • Infrastructure‑level enforcement over application‑level conventions
  • Deterministic, auditable behavior for all access decisions
  • Zero trust by default between workloads
  • No query rewriting and no ORM requirements
  • No raw PII stored outside the upstream database
  • Stateless data plane, stateful control plane
  • AI systems treated as first‑class but constrained actors

3. Data Plane: Postgres Access Proxy (Enforcement Layer)

3.1 Responsibilities

The data plane is a PostgreSQL‑compatible wire‑protocol proxy that:

  • Accepts inbound Postgres connections via multiple DSNs
  • Enforces Postgres TLS on ingress by default (sslmode=require compatible); TLS may be explicitly disabled for dev only
  • Optionally enforces mTLS on ingress
  • Maps each DSN to a connection context (e.g. prod app, dev, AI)
  • Authenticates clients (inbound) and terminates authentication at the proxy
  • Connects upstream using proxy‑managed credentials mapped to the DSN context
  • Enforces visibility rules using Postgres roles, schemas, and views
  • Emits structured audit metadata for every session

The proxy does not:

  • parse or rewrite SQL
  • store query contents
  • make probabilistic access decisions

3.2 Context‑Bound DSNs

Each inbound DSN represents a policy surface, not a database.

Examples:

  • app_rw
  • app_ro
  • developer_sanitized
  • ai_inference

Each DSN maps to:

  • a specific Postgres role
  • a constrained schema/view set
  • fixed session defaults
  • optional WardenSense enablement (default OFF)
  • optional join‑leakage allowance (default OFF)

3.2.1 Security posture

In v1, authorization is primarily DSN/context‑based.

  • TLS is required by default; mTLS is optional.
  • Client username/password may still be required for the inbound connection, but privileges are determined by DSN context.

Upstream credential isolation (v1):

  • The proxy terminates inbound auth.
  • The proxy does not embed upstream DB credentials in application repos.
  • The proxy obtains authorization to connect upstream using a short‑lived token issued by the control plane.

This reduces the blast radius of credential leakage from source repos.

3.3 Enforcement Model

Access control is enforced by:

  • role‑based permissions
  • schema isolation
  • curated views for sensitive fields
  • column masking at the view level (policy‑selectable)

Masking strategies (v1):

  • partial reveal (parameterized): reveal N characters, mask the rest with *.
    • parameters may differ by field type (e.g., reveal last 4 digits for phone/account)
    • should support separate handling for letters vs digits
  • default value (static placeholder): e.g., "John Smith", "(555) 555-5555"

No salts/keys are required for these strategies.
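
As an illustration only (the table, schema, and column names below are not defined by this document), a compiled masked view combining both strategies might look like:

-- partial reveal on phone, static default for names; pgw_ prefix per the ownership convention
CREATE SCHEMA IF NOT EXISTS pgw_masked;

CREATE OR REPLACE VIEW pgw_masked.customers AS
SELECT
  id,
  -- default value strategy: replace the real name with a static placeholder
  'John Smith'::text AS full_name,
  -- partial reveal strategy: mask all but the last 4 digits
  repeat('*', greatest(length(phone) - 4, 0)) || right(phone, 4) AS phone
FROM app.customers;

GRANT USAGE ON SCHEMA pgw_masked TO pgw_dev_sanitized;
GRANT SELECT ON pgw_masked.customers TO pgw_dev_sanitized;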

The proxy ensures clients cannot escape their assigned context, even if client credentials leak.

3.4 Credential rotation (v1)

Credential and certificate rotation is manual and declarative in v1:

  • to rotate upstream DB credentials (stored in control plane), update them via admin portal and redeploy/reload as needed
  • to rotate proxy TLS/mTLS materials, reprovision certs and reload

3.5 Upstream leases, pooling, and seamless refresh (v1)

Upstream credentials are never embedded into application repos. Instead, the proxy obtains authorization to create upstream connections via short‑lived leases issued by the control plane.

Lease model (connection-creation only):

  • Leases are used only to authorize creation of upstream connections (not per query).
  • The proxy maintains an upstream pool per DSN.

Defaults (global; per‑DSN override):

  • lease_ttl: 30m
  • lease_refresh_window: 5m
  • lease_refresh_jitter: 20% (randomized)
  • pool.max_upstream_conns: 20
  • pool.min_idle_upstream_conns: 2
  • pool.idle_upstream_timeout: 5m
  • pool.max_conn_lifetime: 30m
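
In the configuration schema of section 10.5, these appear as global defaults with optional per‑DSN overrides, for example:

proxy:
  pooling_defaults:
    max_upstream_conns: 20        # global default
  lease_defaults:
    ttl: "30m"
    refresh_window: "5m"
    refresh_jitter: 0.20

targets:
  - name: "prod"
    dsns:
      - name: "app_rw"
        pooling:
          max_upstream_conns: 50  # per-DSN override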

Seamless refresh:

  • Proxy refreshes leases proactively within the refresh window.
  • Proxy swaps active_lease to the newly fetched lease.
  • Reseed strategy: after swapping, the proxy proactively creates upstream connections to restore min_idle_upstream_conns using the new lease.
  • Pool draining: upstream connections created under the previous lease are marked draining:
    • not used for new sessions
    • closed when idle or when max_conn_lifetime is reached

Failure behavior:

  • If the control plane is unreachable, existing upstream connections continue serving traffic.
  • New upstream connections require a valid lease; if none is available, the proxy fails closed for new upstream creation while maintaining existing sessions.

4. Control Plane: Policy Definition & Compilation

4.1 Purpose

The control plane exists to make database access policies:

  • easy to define
  • hard to bypass
  • reproducible
  • durable across restarts and redeployments

It coexists with existing production database administration.

  • pgwarden is not the sole authority for all production changes.
  • However, pgwarden is authoritative for pgwarden‑managed artifacts (roles/grants/views it creates or owns).

The control plane does not replace SQL and does not expose raw database access to operators.

4.2 Core Responsibilities

  • Manage multiple upstream databases from a single control plane
  • Manage inbound DSNs per database
  • Define context‑aware access policies
  • Compile policies into deterministic Postgres artifacts
  • Securely manage credentials and secrets

4.3 Policy Model (Conceptual)

Policies describe:

  • Who (context / workload)
  • What (schemas, tables, views)
  • How (read/write, masking, row filtering)
  • Where (target database)

Policies are declarative and compiled into:

  • Postgres roles
  • grants/revocations
  • generated views
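
The exact policy schema is finalized during implementation; purely as an illustration of the who/what/how/where shape (field names below are illustrative, aside from the masking keys reused from section 10.5), a policy might look like:

policy:
  name: "developer_sanitized_read"
  target: "prod"                        # where: target database
  context: "developer_sanitized"        # who: DSN / workload context
  access:                               # what + how
    - schema: "app"
      tables: ["customers", "orders"]
      mode: "read"
      masking:
        strategy: "partial_reveal"
        params:
          reveal_last: 4
          mask_char: "*"
      row_filter: "deleted_at IS NULL"  # optional row filtering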

4.4 Compilation Flow

  1. Operator defines or updates a policy
  2. Control plane validates policy constraints
  3. Policy is compiled into deterministic Postgres artifacts
  4. Artifacts are applied idempotently to the target database(s)
  5. Proxy reloads mappings without downtime

Compiler permissions (explicit):

  • May create and manage roles (pgwarden‑scoped)
  • May grant / revoke privileges
  • May drop / replace pgwarden‑managed views

Coexistence rule: pgwarden must avoid clobbering non‑pgwarden objects. In practice, this is achieved by a clear ownership boundary (naming convention + metadata table) and by only mutating objects it owns.

4.5 State & Persistence

  • Policies are stored centrally (e.g., Postgres / etcd / CRD‑like store)
  • Secrets are never logged
  • The data plane can restart safely using the last known good compiled state

4.6 Artifact Ownership & Reconciliation

pgwarden must coexist with normal DB operations while remaining authoritative for pgwarden‑managed objects.

Ownership boundary (recommended):

  • Naming convention for managed objects, e.g. pgw_ prefix for roles/views/schemas
  • Optional metadata catalog table (recommended) to record:
    • target id
    • object name/type
    • desired definition hash
    • last applied timestamp
    • last applied by deployment id
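
A minimal sketch of such a catalog table (names and column types are illustrative):

CREATE TABLE IF NOT EXISTS pgw_meta.managed_objects (
  target_id        text        NOT NULL,
  object_type      text        NOT NULL,   -- role | view | schema | grant
  object_name      text        NOT NULL,
  definition_hash  text        NOT NULL,   -- hash of the desired (compiled) definition
  applied_at       timestamptz NOT NULL,   -- last applied timestamp
  applied_by       text        NOT NULL,   -- last applied by deployment id
  PRIMARY KEY (target_id, object_type, object_name)
);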

Reconcile behavior (desired):

  • If pgwarden‑managed objects are modified manually, reconcile should overwrite back to the declared/compiled state.
  • Reconcile must avoid mutating objects it does not own.

Rationale: overwrite‑back provides the least surprising path to “working state” for managed objects, while still allowing coexistence for everything else.


5. Audit & Observability

5.1 Deterministic Audit Signals

For every session, pgwarden emits structured metadata such as:

  • context / DSN
  • role
  • source identity
  • connection timing
  • statement counts
  • error rates

No query payloads or PII are recorded.

5.2 Compliance & Forensics

Audit data is designed to support:

  • access reviews
  • incident response
  • compliance reporting

6. WardenSense: Activity Drift & Anomaly Detection (Optional, Orthogonal)

WardenSense is pgwarden’s optional ML/heuristics signal service. It is toggle‑able per DSN connection context and is OFF by default.

6.1 Purpose

WardenSense detects unexpected patterns in database access behavior—especially from AI‑driven workloads—without influencing enforcement.

It exists solely to:

  • surface early warnings
  • reduce mean‑time‑to‑investigation

6.2 Deployment Shape

WardenSense runs as its own binary/service with its own database.

  • It consumes pgwarden audit events (via telemetry sink)
  • It stores derived features, baselines, and alert state in its own persistence layer

6.3 Inputs

Signals are derived exclusively from deterministic audit metadata, such as:

  • request/session rate changes
  • time‑of‑day shifts
  • schema/DSN access expansion
  • error pattern changes

6.4 Constraints

  • No raw queries
  • No PII
  • No inline blocking
  • No automated policy mutation

6.5 Outputs

  • Alerts
  • dashboards
  • forensic annotations

6.6 Relationship to Enforcement

WardenSense never gates access. Policy changes are always deterministic and human‑controlled.

All access control remains policy‑driven and deterministic.

6.7 DSN‑Scoped Toggle (Default OFF)

WardenSense is enabled per DSN connection context:

  • default: enabled=false
  • configurable: enabled=true for specific DSNs (e.g., ai_inference)

Idiomatic routing:

  • Proxy emits all audit events to OTLP.
  • Each event includes attributes: dsn_name, context_id, and wardensense_enabled.
  • OTel Collector routes events to WardenSense only when wardensense_enabled=true, while still exporting all events to the control plane event store.

7. Threat Model Summary

Protected against:

  • accidental PII exposure
  • credential reuse across contexts
  • over‑privileged AI workloads
  • lateral access escalation

Not intended to protect against:

  • malicious superusers
  • compromised upstream databases

8. Non‑Goals

  • Query rewriting or SQL linting
  • ORM replacement
  • Data loss prevention inside queries
  • Automated policy learning

9. Deployment Model (High Level)

  • pgwarden proxy deployed as stateless service
  • Control plane deployed as stateful service
  • Optional signal processors consume audit streams

Works with:

  • Kubernetes
  • Nomad
  • VM‑based deployments

9.1 Graceful Shutdown & Stateless Container Semantics

pgwarden components are designed to run as stateless containers and must behave correctly under orchestrator‑driven termination (Docker, Kubernetes).

9.1.1 Signal handling

All pgwarden processes MUST:

  • Explicitly handle SIGTERM and SIGINT
  • Treat receipt of SIGTERM as a graceful shutdown request, not an immediate exit

SIGKILL is assumed to be unrecoverable and is out of scope.

9.1.2 Proxy graceful shutdown semantics

On receipt of SIGTERM, a pgwarden proxy must:

  1. Stop accepting new client connections immediately
  2. Continue serving existing client sessions
  3. Stop creating new upstream connections
  4. Allow all in‑flight queries to complete
  5. Drain upstream connection pools cleanly
  6. Flush buffered audit / telemetry events to the OpenTelemetry Collector (best effort)
  7. Exit only when:
    • all client sessions have closed, or
    • a configurable shutdown timeout is reached

This guarantees:

  • no dropped transactions
  • no partial writes
  • no credential leakage
  • safe rolling restarts

9.1.3 Control plane shutdown semantics

On receipt of SIGTERM, the control plane must:

  • Stop accepting new admin or API requests
  • Allow in‑flight requests to complete
  • Persist any in‑progress policy deployment or reconciliation state
  • Exit cleanly

9.1.4 WardenSense shutdown semantics

On receipt of SIGTERM, WardenSense must:

  • Stop ingesting new events
  • Flush in‑memory buffers (if any)
  • Exit without blocking proxy or control plane availability

9.1.5 Orchestrator alignment

pgwarden should expose configurable shutdown timeouts to align with:

  • Kubernetes terminationGracePeriodSeconds
  • Docker stop timeout behavior

Defaults should be conservative and safe (e.g. 30–60 seconds).

No pgwarden component may rely on local disk state for correctness.

9.2 Operational Semantics (Authoritative)

This section defines non-negotiable runtime behavior for pgwarden components. These rules exist to prevent orchestrator-induced outages and data loss.

9.2.1 Last-known-good (LKG) configuration (Option A – selected)

The pgwarden proxy MUST persist a last-known-good (LKG) runtime configuration locally and treat it as a cache, not a source of truth.

Characteristics:

  • LKG is written only after successful config validation
  • LKG is immutable until replaced by a newer valid config
  • LKG is never mutated incrementally

Persistence mechanisms (any one acceptable):

  • Mounted volume (Docker volume / Kubernetes emptyDir or PVC)
  • Serialized snapshot derived from ConfigMap
  • Encrypted local file on disk
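
For Kubernetes, a minimal sketch of an emptyDir‑backed LKG cache in the proxy pod spec (the mount path is illustrative):

containers:
  - name: pgwarden-proxy
    volumeMounts:
      - name: lkg-cache
        mountPath: /var/lib/pgwarden/lkg
volumes:
  - name: lkg-cache
    emptyDir: {}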

Guarantees:

  • Proxy can start and serve traffic without control plane availability
  • Proxy never starts with "zero config"
  • Control plane remains the sole authority for desired state

9.2.2 Startup ordering

Proxy startup rules:

  1. Proxy starts independently of control plane availability
  2. Proxy loads LKG configuration from local persistence
  3. Proxy binds listener sockets and prepares TLS materials
  4. Proxy enters readiness state if LKG is valid
  5. Proxy begins polling control plane opportunistically

Hard rule:

The proxy MUST NOT block startup on control plane connectivity.


9.2.3 Readiness vs liveness semantics

pgwarden exposes explicit endpoints for liveness and readiness probes.

9.2.3.1 Probe endpoints (per component)

Proxy

  • GET /livez — liveness only
  • GET /readyz — readiness only
  • GET /healthz — convenience endpoint (same as /readyz for the proxy)

Control plane

  • GET /livez
  • GET /readyz
  • GET /healthz — convenience endpoint (same as /readyz)

WardenSense

  • GET /livez
  • GET /readyz
  • GET /healthz — convenience endpoint (same as /readyz)

Response contract (all components):

  • 200 OK with a short JSON body (e.g., { "status": "ok" }) when healthy
  • 503 Service Unavailable when not ready (for /readyz) or not live (for /livez)

Critical shutdown rule:

  • On receipt of SIGTERM, /readyz MUST immediately return 503 for the remainder of the process lifetime, even while /livez continues to return 200 during graceful drain.

These endpoints MUST NOT leak secrets, policy contents, or upstream connection strings.
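
In Kubernetes, these endpoints map directly onto probes. A minimal sketch for the proxy container, assuming the health endpoints are served on an illustrative admin port (9090) with illustrative timings:

livenessProbe:
  httpGet:
    path: /livez
    port: 9090
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
  periodSeconds: 5
  failureThreshold: 1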

Liveness probe ("should this process be restarted?")

Liveness MUST return true if:

  • process is running
  • event loop is responsive
  • configuration subsystem is loaded (even LKG)

Liveness MUST NOT depend on:

  • control plane reachability
  • upstream database availability
  • lease availability
  • WardenSense availability

A failing liveness probe indicates only irrecoverable process failure.


Readiness probe ("should this pod receive traffic?")

Readiness MUST return true only if:

  • configuration is loaded and valid (current or LKG)
  • TLS materials are loaded and valid
  • listener socket is bound

Readiness MUST return false when:

  • process receives SIGTERM
  • configuration validation fails
  • TLS materials are invalid or expired

Readiness MUST NOT depend on:

  • control plane reachability
  • WardenSense health

9.2.4 Degraded-mode behavior

When the control plane is unavailable:

  • Existing client sessions continue
  • New inbound client connections are allowed only if upstream pool capacity exists
  • Creation of new upstream connections is blocked if no valid lease exists

This preserves safety while avoiding unnecessary outages.


9.2.5 Rolling updates & shutdown coordination

On receipt of SIGTERM:

  • Readiness flips to false immediately
  • Liveness remains true
  • Lease refresh is disabled
  • No new upstream connections are created
  • Existing sessions drain normally

This aligns with Kubernetes terminationGracePeriodSeconds semantics.
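
A minimal sketch of the corresponding Kubernetes pod setting (the value is illustrative and should exceed the proxy's configured shutdown timeout; see also 9.1.5):

spec:
  terminationGracePeriodSeconds: 60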


9.2.6 Lease refresh race avoidance

During shutdown or pod termination:

  • Lease refresh logic MUST be disabled
  • Active lease remains valid until natural expiry
  • No new leases are requested

This prevents orphaned leases and audit noise.


9.2.7 WardenSense isolation

WardenSense health MUST NOT affect:

  • proxy readiness
  • proxy liveness
  • control plane readiness

ML systems are strictly observational and must never impact availability.


9.2.8 TLS failure semantics

If TLS materials expire or fail validation:

  • Readiness MUST return false
  • Liveness MUST remain true
  • Connections MUST fail closed

This prevents restart loops while enforcing security guarantees.


9.2.9 Control plane restarts

During control plane restarts or rolling deploys:

  • Proxies continue serving using LKG state
  • Config polling backs off with retry
  • Lease issuance failures do not terminate existing sessions

Control plane availability MUST NOT be a prerequisite for steady-state data plane operation.


9.2.10 Allowed failure matrix (v1)

The matrix below defines which failures are tolerated and what behavior is expected.

Control plane down
  • Proxy: serve using LKG; continue existing sessions; allow new inbound only if pool has capacity; block new upstream if no valid lease
  • Control plane: unavailable
  • WardenSense: no effect
  • User-visible impact: possible inability to create new upstream conns once pool is exhausted; otherwise none

WardenSense down
  • Proxy: no gating; continue normal enforcement; continue emitting audit to OTLP
  • Control plane: no effect
  • WardenSense: unavailable
  • User-visible impact: no drift alerts; enforcement unaffected

OTel Collector down
  • Proxy: continue serving; best-effort emit may drop; no backpressure to data path
  • Control plane: continues operating
  • WardenSense: continues operating
  • User-visible impact: loss of portal event visibility/alerts until restored

Upstream DB down
  • Proxy: surface native PG errors; keep proxy live/ready based on config + TLS
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: DB unavailable (as normal)

Lease issuance fails (CP reachable but refusing)
  • Proxy: existing sessions continue; new upstream creation blocked; readiness remains true
  • Control plane: returns error
  • WardenSense: no effect
  • User-visible impact: new connections may fail once pool exhausted

Invalid config published
  • Proxy: keep LKG; emit proxy.config.invalid; continue serving
  • Control plane: record deploy failure; do not override proxies
  • WardenSense: no effect
  • User-visible impact: none (stays on last good)

TLS cert expired/invalid (proxy ingress)
  • Proxy: fail closed; /readyz false; /livez true
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: new inbound connections fail until fixed

Proxy pod termination (rolling update)
  • Proxy: readiness false; drain sessions; no lease refresh; exit before grace timeout
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: zero or minimal disruption if grace period adequate

Network partition proxy↔CP
  • Proxy: serve LKG; continue; retries with backoff
  • Control plane: continues; sees proxy missing
  • WardenSense: no effect
  • User-visible impact: same as control plane down for that proxy

Network partition proxy↔Upstream DB
  • Proxy: surface native PG errors/timeouts; keep liveness true
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: DB connectivity failure (as normal)

Notes:

  • The proxy must never crash-loop due to upstream DB unavailability.
  • Observability pipelines must not introduce backpressure into the data path.

10. Distribution & Deployment (Kubernetes + Docker prioritized)

pgwarden prioritizes:

  • Kubernetes deployments (Helm or Kustomize‑friendly)
  • Docker deployments (including Docker Compose)

The distribution unit is always pgwarden (product name), even though it may include multiple deployable services (proxy, control plane, auth plane, optional WardenSense).


10.1 Control Plane API Surface (Intent)

The control plane needs APIs for:

  • managing databases/targets
  • defining DSNs and their mapped contexts
  • authoring policies
  • compiling/deploying artifacts
  • distributing runtime config to proxies
  • audit plumbing and WardenSense configuration
  • upstream credential leasing for proxies

The control plane should expose a versioned HTTP API (REST or JSON‑over‑HTTP) for operators and automation.

10.1.1 Authentication & Authorization (MVP)

  • Control plane admin endpoints require auth (OIDC for humans)
  • Proxy↔control-plane internal endpoints (config poll, leases) use mTLS

10.1.2 Endpoint Inventory (v1)

The shapes below are intent; exact payloads can be finalized during implementation.

Health & meta

  • GET /v1/healthz
  • GET /v1/version

Targets (multiple upstream DBs)

  • GET /v1/targets
  • POST /v1/targets
  • GET /v1/targets/{targetId}
  • PATCH /v1/targets/{targetId}
  • DELETE /v1/targets/{targetId}

DSNs / Contexts (per target)

  • GET /v1/targets/{targetId}/dsns
  • POST /v1/targets/{targetId}/dsns
  • GET /v1/targets/{targetId}/dsns/{dsnId}
  • PATCH /v1/targets/{targetId}/dsns/{dsnId}
  • DELETE /v1/targets/{targetId}/dsns/{dsnId}

Fields should include at minimum:

  • name (stable DSN name)
  • mapped_role (pgwarden managed role)
  • defaults (session defaults)
  • wardensense_enabled (bool; default false)
  • masking (strategy + parameters)
  • pooling (optional overrides)
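
An illustrative POST /v1/targets/{targetId}/dsns request body using these fields (shapes are intent, not final; the defaults entry shows a hypothetical session setting):

{
  "name": "ai_inference",
  "mapped_role": "pgw_ai_ro",
  "defaults": { "statement_timeout": "30s" },
  "wardensense_enabled": true,
  "masking": {
    "strategy": "default_value",
    "params": { "value": "(555) 555-5555" }
  },
  "pooling": { "max_upstream_conns": 10 }
}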

Policy authoring

  • GET /v1/policies
  • POST /v1/policies
  • GET /v1/policies/{policyId}
  • PATCH /v1/policies/{policyId}
  • DELETE /v1/policies/{policyId}

Policy deploy (MVP: admin-only)

For MVP simplicity, the control plane may run in a single-admin model where any authenticated admin can create and deploy policies.

  • POST /v1/policies/{policyId}/deploy (may implicitly compile)

Compilation & deployment

  • POST /v1/policies/{policyId}/compile (dry run)
  • POST /v1/policies/{policyId}/deploy
  • GET /v1/deployments (history)
  • GET /v1/deployments/{deploymentId}

Artifact ownership & reconciliation

  • GET /v1/targets/{targetId}/artifacts (pgwarden-owned objects)
  • POST /v1/targets/{targetId}/reconcile (idempotent apply)

Proxy fleet management (config distribution)

  • GET /v1/proxies (registered instances)
  • POST /v1/proxies/register (bootstrapping)
  • GET /v1/proxies/{proxyId}/config (rendered runtime config)
    • should support ETag so proxies can poll efficiently
    • proxies use If-None-Match to avoid full downloads

Defaults:

  • poll interval: 5s
  • invalid config: keep last-known-good
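
An illustrative poll exchange when the config is unchanged (headers only; the proxy id and ETag values are placeholders):

GET /v1/proxies/px-7f2a/config HTTP/1.1
If-None-Match: "cfg-v42"

HTTP/1.1 304 Not Modified
ETag: "cfg-v42"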

Control plane credential lease (for upstream connect)

Leases authorize proxies to create upstream connections without embedding DB credentials in application repos.

  • POST /v1/leases (issue lease for {targetId, dsnName, proxyId})
  • POST /v1/leases/redeem (single-step redeem; returns everything needed to connect)

MVP behavior:

  • Redeem returns plaintext upstream connection material over mTLS.
  • Credentials are kept in memory only by the proxy and never logged.
  • Leases are used only for upstream connection creation (not per query).

Redeem response (conceptual):

  • upstream host / port / dbname
  • username / password
  • sslmode (default require)
  • lease expires_at
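
An illustrative redeem response body (field names are intent, not final; values are placeholders):

{
  "upstream": {
    "host": "prod-db.example.com",
    "port": 5432,
    "dbname": "app",
    "sslmode": "require"
  },
  "username": "pgw_lease_app_rw_01",
  "password": "<redacted>",
  "expires_at": "2025-01-01T12:30:00Z"
}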

This model is intentionally simple and portable for v1.

Audit access (metadata only)

  • GET /v1/audit/sessions
  • GET /v1/audit/sessions/{sessionId}
  • GET /v1/audit/metrics (aggregates)

Human-readable event review (portal support)

To support a portal view of “what happened,” the control plane should expose read APIs over an event store that contains:

  • control plane events (policy changes, deploys, reconcile results)
  • proxy audit events (session metadata)
  • WardenSense alerts/events

Endpoints:

  • GET /v1/events (filter by time, target, dsn, severity, type)
  • GET /v1/events/{eventId}

Batteries-included ingestion:

  • POST /v1/events/ingest (private; intended for OTel Collector export)
    • The control plane should authenticate this path (shared secret, mTLS, or internal network only).
    • Payload is structured; no raw SQL and no PII.

WardenSense config & status

  • GET /v1/wardensense/status
  • GET /v1/wardensense/alerts
  • GET /v1/wardensense/alerts/{alertId}

10.2 WardenSense gRPC Surface (Intent)

WardenSense is a separate service; pgwarden needs a stable contract for:

  • shipping audit events or pulling them
  • querying alert state
  • (optionally) receiving model/baseline updates

Two viable patterns:

  • Push: pgwarden (or an audit forwarder) streams events to WardenSense
  • Pull: WardenSense reads from an audit sink (Kafka/NATS/OTLP) and only exposes query APIs

To keep coupling low, prefer Pull for production. However, a simple Push gRPC stream can be useful for early versions.

10.2.1 gRPC methods (v1)

Service: WardenSense

  • rpc IngestAuditEvent(stream AuditEvent) returns (IngestAck) (optional; push mode)
  • rpc GetStatus(StatusRequest) returns (StatusResponse)
  • rpc ListAlerts(ListAlertsRequest) returns (ListAlertsResponse)
  • rpc GetAlert(GetAlertRequest) returns (Alert)

Notes:

  • AuditEvent must never contain raw SQL or PII; only the structured metadata emitted by the proxy.
  • DSN/context fields must be present so WardenSense can honor the per‑DSN enablement toggle.

10.3 Logging & Audit Intent

10.3.1 Event envelope (v1)

All machine-ingested events conform to a common envelope. This keeps portal queries stable while allowing new event types to be added over time.

Required fields:

  • event_id (string; globally unique; ULID recommended)
  • event_time (RFC3339 timestamp)
  • event_type (string)
  • severity (DEBUG|INFO|WARN|ERROR)
  • schema_version (int; start at 1)

Source:

  • plane (proxy|control_plane|wardensense|collector)
  • instance_id (string)
  • version (string)

Correlation:

  • request_id (string, optional)
  • session_id (string, optional)
  • deployment_id (string, optional)

Scope (portal filtering):

  • target_id (string, optional)
  • dsn_name (string, optional)
  • context_id (string, optional)

Attributes:

  • attributes (object; event-type specific; no SQL, no PII)
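
An illustrative db.session.summary event in this envelope (values are placeholders; nesting of the source fields is shown for readability only):

{
  "event_id": "01J9ZK4T8Q2M3N4P5R6S7T8V9W",
  "event_time": "2025-01-01T12:00:00Z",
  "event_type": "db.session.summary",
  "severity": "INFO",
  "schema_version": 1,
  "source": { "plane": "proxy", "instance_id": "proxy-0", "version": "0.1.0" },
  "session_id": "sess-42",
  "target_id": "prod",
  "dsn_name": "ai_inference",
  "context_id": "ai_inference",
  "attributes": { "statement_count": 17, "error_count": 0, "duration_ms": 5400 }
}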

10.3.2 Event types (v1 defaults)

Proxy lifecycle & config

  • proxy.instance.started
  • proxy.instance.heartbeat
  • proxy.config.applied
  • proxy.config.invalid

Connection/session audit

  • db.session.opened
  • db.session.closed
  • db.session.auth.failed
  • db.session.upstream.connect.succeeded
  • db.session.upstream.connect.failed
  • db.session.summary

Control plane governance

  • cp.target.created
  • cp.target.updated
  • cp.dsn.created
  • cp.dsn.updated
  • cp.policy.created
  • cp.policy.updated
  • cp.policy.compiled
  • cp.policy.deployed
  • cp.reconcile.started
  • cp.reconcile.completed
  • cp.reconcile.overwrote.drift

Credential lease flow

  • cp.upstream.lease.issued
  • cp.upstream.lease.redeemed
  • proxy.upstream.lease.refreshed
  • proxy.upstream.lease.refresh_failed

WardenSense

  • ws.pipeline.started
  • ws.ingest.health
  • ws.baseline.updated
  • ws.alert.raised
  • ws.alert.cleared

10.3.3 OTLP representation (MVP)

  • Audit events are represented as OTLP Logs.
  • Components emit OTLP to an OpenTelemetry Collector.

10.3.4 Event store indexing (recommended defaults)

For a Postgres-backed event store:

  • primary key: event_id
  • INDEX events_time_desc (event_time DESC)
  • INDEX events_target_time_desc (target_id, event_time DESC)
  • INDEX events_target_dsn_time_desc (target_id, dsn_name, event_time DESC)
  • INDEX events_type_time_desc (event_type, event_time DESC)
  • partial indexes for correlation IDs:
    • (session_id, event_time DESC) WHERE session_id IS NOT NULL
    • (request_id, event_time DESC) WHERE request_id IS NOT NULL
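
A minimal Postgres sketch consistent with these defaults (column types are assumptions; the envelope's attributes map to a jsonb column):

CREATE TABLE IF NOT EXISTS events (
  event_id       text PRIMARY KEY,
  event_time     timestamptz NOT NULL,
  event_type     text        NOT NULL,
  severity       text        NOT NULL,
  schema_version int         NOT NULL,
  plane          text        NOT NULL,
  instance_id    text,
  request_id     text,
  session_id     text,
  deployment_id  text,
  target_id      text,
  dsn_name       text,
  context_id     text,
  attributes     jsonb       NOT NULL DEFAULT '{}'::jsonb
);

CREATE INDEX IF NOT EXISTS events_time_desc            ON events (event_time DESC);
CREATE INDEX IF NOT EXISTS events_target_time_desc     ON events (target_id, event_time DESC);
CREATE INDEX IF NOT EXISTS events_target_dsn_time_desc ON events (target_id, dsn_name, event_time DESC);
CREATE INDEX IF NOT EXISTS events_type_time_desc       ON events (event_type, event_time DESC);
CREATE INDEX IF NOT EXISTS events_session_time_desc    ON events (session_id, event_time DESC) WHERE session_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS events_request_time_desc    ON events (request_id, event_time DESC) WHERE request_id IS NOT NULL;

-- idempotent ingest (see 10.3.5): INSERT ... ON CONFLICT (event_id) DO NOTHING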

10.3.5 Idempotency

  • event_id must be unique.
  • /v1/events/ingest should use INSERT ... ON CONFLICT DO NOTHING to tolerate retries.

10.3.6 Batteries-included OpenTelemetry Collector

pgwarden ships with a reference OpenTelemetry Collector configuration (not embedded in pgwarden binaries) intended to work out-of-the-box for most users.

Design goals:

  • zero required observability expertise
  • no mandatory external dependencies
  • identical behavior across Docker and Kubernetes

Reference files:

  • examples/otel/collector.yaml
  • examples/docker-compose.yaml (includes collector service)
  • charts/pgwarden/values.yaml (optional collector deployment)

Default behavior (when enabled):

  • receive OTLP logs from pgwarden components
  • export all events to the control plane event store via POST /v1/events/ingest
  • export filtered events (wardensense_enabled=true) to WardenSense

Deployment model:

  • The Collector always runs as a separate container/pod (never embedded in pgwarden binaries).
  • In Docker/Compose, the Collector is started automatically as part of the example deployment.
  • In Kubernetes, the Helm chart exposes otel.enabled to deploy the Collector optionally.

Configuration contract:

  • pgwarden components only require OTEL_EXPORTER_OTLP_ENDPOINT.
  • Advanced users may disable the bundled Collector and point pgwarden at an existing Collector instead.

10.3.7 Defaults and opt-out

  • Default (Compose examples): Collector enabled.
  • Default (Helm values): Collector disabled, opt-in via otel.enabled=true.
  • When disabled, pgwarden continues to emit structured logs to stdout.
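
A sketch of the routing shape for examples/otel/collector.yaml, using the filter processor to honor the per‑DSN toggle (the endpoints, service names, and the exporter used for the control plane ingest path are assumptions; the shipped reference file finalizes these details):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch: {}
  # keep only events tagged for WardenSense (drop everything else on this pipeline)
  filter/wardensense:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["wardensense_enabled"] != true'

exporters:
  # all events -> control plane event store (assumes the ingest path accepts OTLP logs)
  otlphttp/controlplane:
    logs_endpoint: "http://pgwarden-control-plane:8080/v1/events/ingest"
  # filtered events -> WardenSense
  otlp/wardensense:
    endpoint: "wardensense:4317"
    tls:
      insecure: true

service:
  pipelines:
    logs/controlplane:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/controlplane]
    logs/wardensense:
      receivers: [otlp]
      processors: [filter/wardensense, batch]
      exporters: [otlp/wardensense]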

10.4 Suggested K8s Resource Mapping (Conceptual)

  • pgwarden-proxy: Deployment + Service
  • pgwarden-control-plane: Deployment + Service + DB (if needed)
  • pgwarden-auth: Deployment + Service (or external IdP integration)
  • pgwarden-wardensense (optional): Deployment + Service + its own DB

10.5 Configuration Schema (v1)

pgwarden should support a single config file schema (YAML/JSON) that works across Docker and Kubernetes.

Loading order (recommended):

  1. PGWARDEN_CONFIG points to a YAML file
  2. Environment variables override specific fields (optional)

Kubernetes persistence note (secrets/config):

  • When deployed on Kubernetes, config should be provided via ConfigMap and secrets via Secret volumes.
  • These are persisted in the cluster control plane datastore and will remount correctly if pods reschedule to different nodes.
  • Avoid relying on node-local storage or writing secrets into container filesystems at runtime.

10.5.1 Core keys

proxy:
  listen_addr: "0.0.0.0:5432"
  # TLS is ON by default
  tls:
    mode: "required"   # required | mtls | disabled
    cert_file: "/etc/pgwarden/tls/server.crt"
    key_file: "/etc/pgwarden/tls/server.key"
    client_ca_file: "/etc/pgwarden/tls/client-ca.crt"  # required when mode=mtls

  # Global defaults; DSNs may override
  pooling_defaults:
    max_upstream_conns: 20
    min_idle_upstream_conns: 2
    idle_upstream_timeout: "5m"
    max_conn_lifetime: "30m"

  lease_defaults:
    ttl: "30m"
    refresh_window: "5m"
    refresh_jitter: 0.20
    reseed_on_refresh: true
    drain_old_conns: true

control_plane:
  http_listen_addr: "0.0.0.0:8080"  # HTTP only; HTTPS terminated externally

  # Proxy↔control-plane internal APIs (config poll, leases)
  internal_mtls:
    cert_file: "/etc/pgwarden/internal-mtls/client.crt"
    key_file: "/etc/pgwarden/internal-mtls/client.key"
    ca_file: "/etc/pgwarden/internal-mtls/ca.crt"

  auth:
    mode: "oidc"  # oidc | mtls | disabled (dev only)
    oidc:
      issuer_url: "https://issuer.example.com"
      client_id: "pgwarden"
      # client_secret should be injected via env/secret, not plaintext
      redirect_url: "https://pgwarden.example.com/callback"

  rbac:
    roles:
      - name: "policy_admin"
      - name: "policy_reviewer"
      - name: "policy_deployer"

targets:
  # One control plane can manage many databases
  - name: "prod"
    upstream:
      host: "prod-db.example.com"
      port: 5432
      dbname: "app"
      # upstream auth injected via secret
    dsns:
      - name: "app_rw"
        mapped_role: "pgw_app_rw"
        wardensense_enabled: false
        masking:
          strategy: "partial_reveal"  # partial_reveal | default_value
          params:
            reveal_last: 4
            mask_char: "*"
        pooling:
          max_upstream_conns: 50
      - name: "developer_sanitized"
        mapped_role: "pgw_dev_sanitized"
      - name: "ai_inference"
        mapped_role: "pgw_ai_ro"
        wardensense_enabled: true
        masking:
          strategy: "default_value"
          params:
            value: "(555) 555-5555"

signals:
  enabled: false
  # signals service is separate and has its own DB; this section only configures emission
  audit:
    sink: "otlp"   # stdout | otlp

Notes:

  • proxy.tls.mode defaults to required if omitted.
  • When proxy.tls.mode=required, the proxy MUST refuse plaintext connections and clients must connect with sslmode=require (or stronger).
  • When proxy.tls.mode=mtls, client_ca_file is required and the proxy MUST require valid client certs.
  • The control plane HTTP endpoint is intentionally HTTP‑only at the service level; HTTPS is terminated by an ingress/reverse proxy.

10.6 Deployment Examples

These are minimal examples intended as starter templates.

10.6.1 Docker Compose (TLS required default)

services:
  pgwarden:
    image: pgwarden:latest
    ports:
      - "5432:5432"   # Postgres proxy (TLS required)
      - "8080:8080"   # Control plane HTTP (terminate HTTPS upstream if needed)
    environment:
      - PGWARDEN_CONFIG=/etc/pgwarden/config.yaml
      # secrets should be injected via env or docker secrets
    volumes:
      - ./config/config.yaml:/etc/pgwarden/config.yaml:ro
      - ./tls/server.crt:/etc/pgwarden/tls/server.crt:ro
      - ./tls/server.key:/etc/pgwarden/tls/server.key:ro

Client example (app):

  • postgres://...@pgwarden-host:5432/db?sslmode=require

10.6.2 Docker Compose (mTLS)

services:
  pgwarden:
    image: pgwarden:latest
    ports:
      - "5432:5432"
      - "8080:8080"
    environment:
      - PGWARDEN_CONFIG=/etc/pgwarden/config.yaml
    volumes:
      - ./config/config.yaml:/etc/pgwarden/config.yaml:ro
      - ./tls/server.crt:/etc/pgwarden/tls/server.crt:ro
      - ./tls/server.key:/etc/pgwarden/tls/server.key:ro
      - ./tls/client-ca.crt:/etc/pgwarden/tls/client-ca.crt:ro

Client example (mTLS):

  • sslmode=verify-full + client cert/key in the client driver (language dependent)

10.6.3 Kubernetes (proxy terminates Postgres TLS; HTTPS terminated externally)

ConfigMap (non‑secrets):

apiVersion: v1
kind: ConfigMap
metadata:
  name: pgwarden-config
data:
  config.yaml: |
    proxy:
      listen_addr: "0.0.0.0:5432"
      tls:
        mode: "required"
        cert_file: "/etc/pgwarden/tls/server.crt"
        key_file: "/etc/pgwarden/tls/server.key"
    control_plane:
      http_listen_addr: "0.0.0.0:8080"
      auth:
        mode: "oidc"
        oidc:
          issuer_url: "https://issuer.example.com"
          client_id: "pgwarden"
          redirect_url: "https://pgwarden.example.com/callback"

Secret (proxy TLS cert/key):

apiVersion: v1
kind: Secret
metadata:
  name: pgwarden-proxy-tls
type: Opaque
data:
  server.crt: <base64>
  server.key: <base64>
  # for mTLS add client-ca.crt

Deployment (mount config + tls):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgwarden
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgwarden
  template:
    metadata:
      labels:
        app: pgwarden
    spec:
      containers:
        - name: pgwarden
          image: pgwarden:latest
          env:
            - name: PGWARDEN_CONFIG
              value: /etc/pgwarden/config.yaml
          ports:
            - containerPort: 5432
            - containerPort: 8080
          volumeMounts:
            - name: config
              mountPath: /etc/pgwarden
              readOnly: true
            - name: proxy-tls
              mountPath: /etc/pgwarden/tls
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: pgwarden-config
            items:
              - key: config.yaml
                path: config.yaml
        - name: proxy-tls
          secret:
            secretName: pgwarden-proxy-tls

Ingress (control plane HTTPS termination):

  • Terminate HTTPS at Ingress (cert-manager / external CA / certbot), route to service port 8080.

11. Auth Plane: Identity, Admin Access, and Session Attestation

11.1 Purpose

Auth applies to two distinct surfaces:

  1. Control Plane (Admin/API/UI)
  • Provides authentication + authorization for operators
  • Required for any policy creation, approval, or deployment actions
  2. Data Plane Ingress (Client → Proxy connections)
  • Provides transport security and optional client identity guarantees
  • Enforces TLS requirements for inbound connections

11.2 Non‑Goals

  • It does not rewrite SQL
  • It does not bypass the proxy enforcement model
  • It does not introduce probabilistic decision‑making into access enforcement

11.3 Control Plane Auth (Required)

The control plane MUST be protected by authentication and authorization.

Recommended baseline:

  • OIDC for interactive users (admin UI)
  • OIDC client credentials or mTLS for automation (CI/CD, GitOps agents)

Authorization can be enforced via roles such as:

  • policy_admin
  • policy_reviewer
  • policy_deployer

Policy changes MUST be human approved (e.g., reviewer gate) before deployment to production targets.

11.4 Data Plane Ingress TLS (Default: TLS required; Optional: mTLS)

The proxy MUST support a mode where TLS is mandatory on the Postgres ingress. This should be enabled by default.

Supported TLS modes:

  • TLS required (default): server certificate presented; clients must use sslmode=require (or stronger)
  • mTLS required (optional): server + client certificates required
  • Disabled (dev only): explicit opt‑out; not recommended

Enforcement behavior:

  • When TLS is required, the proxy MUST refuse plaintext connections.
  • Configuration should make the secure default easy and the insecure mode explicit.

Kubernetes note:

  • Postgres TLS should be terminated at the pgwarden proxy by default (certs provided via Secret mount).

11.5 Control Plane HTTPS Responsibility Boundary

The pgwarden control plane should expose an HTTP admin endpoint and rely on the deployment environment for HTTPS termination and certificate lifecycle.

Recommended approaches:

  • Kubernetes Ingress + a third‑party CA (e.g., cert‑manager) or certbot‑managed termination
  • Reverse proxy / load balancer TLS termination

pgwarden is not responsible for certificate issuance/renewal for the admin endpoint.

11.6 Deterministic Identity Mapping

When identity is available (e.g., mTLS SAN, JWT claims), the proxy must have a deterministic mapping from authenticated identity to pgwarden context/role.


12. Open Design Questions

Remaining open questions (implementation details):

  • Event envelope + required attributes for /v1/events/ingest and /v1/events filtering
  • Masking implementation location and how compiler generates SQL for partial_reveal and default_value
  • Lease protocol details: request/response shapes, TTL rules, and whether redeem returns creds directly or via a secondary fetch
  • OTel Collector reference configs (Compose + Helm) for batteries-included OTLP routing to event store and WardenSense