pgwarden

A deterministic, policy‑enforcing PostgreSQL access proxy with an optional control plane and AI drift‑detection signals


1. Project Overview

pgwarden is a PostgreSQL wire‑protocol proxy that enforces least‑privilege, context‑aware access to sensitive data at the database boundary rather than inside application code.

It is designed for environments where:

  • multiple workloads (apps, developers, AI systems) access the same database
  • PII exposure must be strictly controlled
  • auditability and reproducibility matter
  • policy enforcement must survive infrastructure restarts

pgwarden is shipped as a single product name (“pgwarden”) for Docker and Kubernetes distribution. Internally, it is composed of orthogonal services that can be deployed together or separately:

  1. Data Plane (Proxy / Enforcement Layer)
  2. Control Plane (Policy Definition & Compilation)
  3. Auth Plane (Identity / Session Attestation)
  4. Signal Plane (Audit, Drift & Anomaly Detection – optional; orthogonal)

Deterministic enforcement is always the source of truth. ML‑based systems never gate access.


2. Design Principles

  • Infrastructure‑level enforcement over application‑level conventions
  • Deterministic, auditable behavior for all access decisions
  • Zero trust by default between workloads
  • No query rewriting and no ORM requirements
  • No raw PII stored outside the upstream database
  • Stateless data plane, stateful control plane
  • AI systems treated as first‑class but constrained actors

3. Data Plane: Postgres Access Proxy (Enforcement Layer)

3.1 Responsibilities

The data plane is a PostgreSQL‑compatible wire‑protocol proxy that:

  • Accepts inbound Postgres connections via multiple DSNs
  • Enforces Postgres TLS on ingress by default (sslmode=require compatible); TLS may be explicitly disabled for dev only
  • Optionally enforces mTLS on ingress
  • Maps each DSN to a connection context (e.g. prod app, dev, AI)
  • Authenticates clients (inbound) and terminates authentication at the proxy
  • Connects upstream using proxy‑managed credentials mapped to the DSN context
  • Enforces visibility rules using Postgres roles, schemas, and views
  • Emits structured audit metadata for every session

The proxy does not:

  • parse or rewrite SQL
  • store query contents
  • make probabilistic access decisions

3.2 Context‑Bound DSNs

Each inbound DSN represents a policy surface, not a database.

Examples:

  • app_rw
  • app_ro
  • developer_sanitized
  • ai_inference

Each DSN maps to:

  • a specific Postgres role
  • a constrained schema/view set
  • fixed session defaults
  • optional WardenSense enablement (default OFF)
  • optional join‑leakage allowance (default OFF)

3.2.1 Security posture

In v1, authorization is primarily DSN/context‑based.

  • TLS is required by default; mTLS is optional.
  • Client username/password may still be required for the inbound connection, but privileges are determined by DSN context.

Upstream credential isolation (v1):

  • The proxy terminates inbound auth.
  • The proxy does not embed upstream DB credentials in application repos.
  • The proxy obtains authorization to connect upstream using a short‑lived token issued by the control plane.

This reduces the blast radius of credential leakage from source repos.

3.3 Enforcement Model

Access control is enforced by:

  • role‑based permissions
  • schema isolation
  • curated views for sensitive fields
  • column masking at the view level (policy‑selectable)

Masking strategies (v1):

  • partial reveal (parameterized): reveal N characters, mask the rest with *.
    • parameters may differ by field type (e.g., reveal last 4 digits for phone/account)
    • should support separate handling for letters vs digits
  • default value (static placeholder): e.g., "John Smith", "(555) 555-5555"

No salts/keys are required for these strategies.
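
As an illustration only (the table, schema, and column names below are not defined by this document), a compiled masked view combining both strategies might look like:

-- partial reveal on phone, static default for names; pgw_ prefix per the ownership convention
CREATE SCHEMA IF NOT EXISTS pgw_masked;

CREATE OR REPLACE VIEW pgw_masked.customers AS
SELECT
  id,
  -- default value strategy: replace the real name with a static placeholder
  'John Smith'::text AS full_name,
  -- partial reveal strategy: mask all but the last 4 digits
  repeat('*', greatest(length(phone) - 4, 0)) || right(phone, 4) AS phone
FROM app.customers;

GRANT USAGE ON SCHEMA pgw_masked TO pgw_dev_sanitized;
GRANT SELECT ON pgw_masked.customers TO pgw_dev_sanitized;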

The proxy ensures clients cannot escape their assigned context, even if client credentials leak.

3.4 Credential rotation (v1)

Credential and certificate rotation is manual and declarative in v1:

  • to rotate upstream DB credentials (stored in control plane), update them via admin portal and redeploy/reload as needed
  • to rotate proxy TLS/mTLS materials, reprovision certs and reload

3.5 Upstream leases, pooling, and seamless refresh (v1)

Upstream credentials are never embedded into application repos. Instead, the proxy obtains authorization to create upstream connections via short‑lived leases issued by the control plane.

Lease model (connection-creation only):

  • Leases are used only to authorize creation of upstream connections (not per query).
  • The proxy maintains an upstream pool per DSN.

Defaults (global; per‑DSN override):

  • lease_ttl: 30m
  • lease_refresh_window: 5m
  • lease_refresh_jitter: 20% (randomized)
  • pool.max_upstream_conns: 20
  • pool.min_idle_upstream_conns: 2
  • pool.idle_upstream_timeout: 5m
  • pool.max_conn_lifetime: 30m
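
In the configuration schema of section 10.5, these appear as global defaults with optional per‑DSN overrides, for example:

proxy:
  pooling_defaults:
    max_upstream_conns: 20        # global default
  lease_defaults:
    ttl: "30m"
    refresh_window: "5m"
    refresh_jitter: 0.20

targets:
  - name: "prod"
    dsns:
      - name: "app_rw"
        pooling:
          max_upstream_conns: 50  # per-DSN override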

Seamless refresh:

  • Proxy refreshes leases proactively within the refresh window.
  • Proxy swaps active_lease to the newly fetched lease.
  • Reseed strategy: after swapping, the proxy proactively creates upstream connections to restore min_idle_upstream_conns using the new lease.
  • Pool draining: upstream connections created under the previous lease are marked draining:
    • not used for new sessions
    • closed when idle or when max_conn_lifetime is reached

Failure behavior:

  • If the control plane is unreachable, existing upstream connections continue serving traffic.
  • New upstream connections require a valid lease; if none is available, the proxy fails closed for new upstream creation while maintaining existing sessions.

4. Control Plane: Policy Definition & Compilation

4.1 Purpose

The control plane exists to make database access policies:

  • easy to define
  • hard to bypass
  • reproducible
  • durable across restarts and redeployments

It coexists with existing production database administration.

  • pgwarden is not the sole authority for all production changes.
  • However, pgwarden is authoritative for pgwarden‑managed artifacts (roles/grants/views it creates or owns).

The control plane does not replace SQL and does not expose raw database access to operators.

4.2 Core Responsibilities

  • Manage multiple upstream databases from a single control plane
  • Manage inbound DSNs per database
  • Define context‑aware access policies
  • Compile policies into deterministic Postgres artifacts
  • Securely manage credentials and secrets

4.3 Policy Model (Conceptual)

Policies describe:

  • Who (context / workload)
  • What (schemas, tables, views)
  • How (read/write, masking, row filtering)
  • Where (target database)

Policies are declarative and compiled into:

  • Postgres roles
  • grants/revocations
  • generated views
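
The exact policy schema is finalized during implementation; purely as an illustration of the who/what/how/where shape (field names below are illustrative, aside from the masking keys reused from section 10.5), a policy might look like:

policy:
  name: "developer_sanitized_read"
  target: "prod"                        # where: target database
  context: "developer_sanitized"        # who: DSN / workload context
  access:                               # what + how
    - schema: "app"
      tables: ["customers", "orders"]
      mode: "read"
      masking:
        strategy: "partial_reveal"
        params:
          reveal_last: 4
          mask_char: "*"
      row_filter: "deleted_at IS NULL"  # optional row filtering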

4.4 Compilation Flow

  1. Operator defines or updates a policy
  2. Control plane validates policy constraints
  3. Policy is compiled into deterministic Postgres artifacts
  4. Artifacts are applied idempotently to the target database(s)
  5. Proxy reloads mappings without downtime

Compiler permissions (explicit):

  • May create and manage roles (pgwarden‑scoped)
  • May grant / revoke privileges
  • May drop / replace pgwarden‑managed views

Coexistence rule: pgwarden must avoid clobbering non‑pgwarden objects. In practice, this is achieved by a clear ownership boundary (naming convention + metadata table) and by only mutating objects it owns.

4.5 State & Persistence

  • Policies are stored centrally (e.g., Postgres / etcd / CRD‑like store)
  • Secrets are never logged
  • The data plane can restart safely using the last known good compiled state

4.6 Artifact Ownership & Reconciliation

pgwarden must coexist with normal DB operations while remaining authoritative for pgwarden‑managed objects.

Ownership boundary (recommended):

  • Naming convention for managed objects, e.g. pgw_ prefix for roles/views/schemas
  • Optional metadata catalog table (recommended) to record:
    • target id
    • object name/type
    • desired definition hash
    • last applied timestamp
    • last applied by deployment id
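
A minimal sketch of such a catalog table (names and column types are illustrative):

CREATE TABLE IF NOT EXISTS pgw_meta.managed_objects (
  target_id        text        NOT NULL,
  object_type      text        NOT NULL,   -- role | view | schema | grant
  object_name      text        NOT NULL,
  definition_hash  text        NOT NULL,   -- hash of the desired (compiled) definition
  applied_at       timestamptz NOT NULL,   -- last applied timestamp
  applied_by       text        NOT NULL,   -- last applied by deployment id
  PRIMARY KEY (target_id, object_type, object_name)
);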

Reconcile behavior (desired):

  • If pgwarden‑managed objects are modified manually, reconcile should overwrite back to the declared/compiled state.
  • Reconcile must avoid mutating objects it does not own.

Rationale: overwrite‑back provides the least surprising path to “working state” for managed objects, while still allowing coexistence for everything else.


5. Audit & Observability

5.1 Deterministic Audit Signals

For every session, pgwarden emits structured metadata such as:

  • context / DSN
  • role
  • source identity
  • connection timing
  • statement counts
  • error rates

No query payloads or PII are recorded.

5.2 Compliance & Forensics

Audit data is designed to support:

  • access reviews
  • incident response
  • compliance reporting

6. WardenSense: Activity Drift & Anomaly Detection (Optional, Orthogonal)

WardenSense is pgwarden’s optional ML/heuristics signal service. It is toggle‑able per DSN connection context and is OFF by default.

6.1 Purpose

WardenSense detects unexpected patterns in database access behavior—especially from AI‑driven workloads—without influencing enforcement.

It exists solely to:

  • surface early warnings
  • reduce mean‑time‑to‑investigation

6.2 Deployment Shape

WardenSense runs as its own binary/service with its own database.

  • It consumes pgwarden audit events (via telemetry sink)
  • It stores derived features, baselines, and alert state in its own persistence layer

6.3 Inputs

Signals are derived exclusively from deterministic audit metadata, such as:

  • request/session rate changes
  • time‑of‑day shifts
  • schema/DSN access expansion
  • error pattern changes

6.4 Constraints

  • No raw queries
  • No PII
  • No inline blocking
  • No automated policy mutation

6.5 Outputs

  • Alerts
  • dashboards
  • forensic annotations

6.6 Relationship to Enforcement

WardenSense never gates access. Policy changes are always deterministic and human‑controlled.

All access control remains policy‑driven and deterministic.

6.7 DSN‑Scoped Toggle (Default OFF)

WardenSense is enabled per DSN connection context:

  • default: enabled=false
  • configurable: enabled=true for specific DSNs (e.g., ai_inference)

Idiomatic routing:

  • Proxy emits all audit events to OTLP.
  • Each event includes attributes: dsn_name, context_id, and wardensense_enabled.
  • OTel Collector routes events to WardenSense only when wardensense_enabled=true, while still exporting all events to the control plane event store.

7. Threat Model Summary

Protected against:

  • accidental PII exposure
  • credential reuse across contexts
  • over‑privileged AI workloads
  • lateral access escalation

Not intended to protect against:

  • malicious superusers
  • compromised upstream databases

8. Non‑Goals

  • Query rewriting or SQL linting
  • ORM replacement
  • Data loss prevention inside queries
  • Automated policy learning

9. Deployment Model (High Level)

  • pgwarden proxy deployed as stateless service
  • Control plane deployed as stateful service
  • Optional signal processors consume audit streams

Works with:

  • Kubernetes
  • Nomad
  • VM‑based deployments

9.1 Graceful Shutdown & Stateless Container Semantics

pgwarden components are designed to run as stateless containers and must behave correctly under orchestrator‑driven termination (Docker, Kubernetes).

9.1.1 Signal handling

All pgwarden processes MUST:

  • Explicitly handle SIGTERM and SIGINT
  • Treat receipt of SIGTERM as a graceful shutdown request, not an immediate exit

SIGKILL is assumed to be unrecoverable and is out of scope.

9.1.2 Proxy graceful shutdown semantics

On receipt of SIGTERM, a pgwarden proxy must:

  1. Stop accepting new client connections immediately
  2. Continue serving existing client sessions
  3. Stop creating new upstream connections
  4. Allow all in‑flight queries to complete
  5. Drain upstream connection pools cleanly
  6. Flush buffered audit / telemetry events to the OpenTelemetry Collector (best effort)
  7. Exit only when:
    • all client sessions have closed, or
    • a configurable shutdown timeout is reached

This guarantees:

  • no dropped transactions
  • no partial writes
  • no credential leakage
  • safe rolling restarts

9.1.3 Control plane shutdown semantics

On receipt of SIGTERM, the control plane must:

  • Stop accepting new admin or API requests
  • Allow in‑flight requests to complete
  • Persist any in‑progress policy deployment or reconciliation state
  • Exit cleanly

9.1.4 WardenSense shutdown semantics

On receipt of SIGTERM, WardenSense must:

  • Stop ingesting new events
  • Flush in‑memory buffers (if any)
  • Exit without blocking proxy or control plane availability

9.1.5 Orchestrator alignment

pgwarden should expose configurable shutdown timeouts to align with:

  • Kubernetes terminationGracePeriodSeconds
  • Docker stop timeout behavior

Defaults should be conservative and safe (e.g. 30–60 seconds).

No pgwarden component may rely on local disk state for correctness.

9.2 Operational Semantics (Authoritative)

This section defines non-negotiable runtime behavior for pgwarden components. These rules exist to prevent orchestrator-induced outages and data loss.

9.2.1 Last-known-good (LKG) configuration (Option A – selected)

The pgwarden proxy MUST persist a last-known-good (LKG) runtime configuration locally and treat it as a cache, not a source of truth.

Characteristics:

  • LKG is written only after successful config validation
  • LKG is immutable until replaced by a newer valid config
  • LKG is never mutated incrementally

Persistence mechanisms (any one acceptable):

  • Mounted volume (Docker volume / Kubernetes emptyDir or PVC)
  • Serialized snapshot derived from ConfigMap
  • Encrypted local file on disk
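
For Kubernetes, a minimal sketch of an emptyDir‑backed LKG cache in the proxy pod spec (the mount path is illustrative):

containers:
  - name: pgwarden-proxy
    volumeMounts:
      - name: lkg-cache
        mountPath: /var/lib/pgwarden/lkg
volumes:
  - name: lkg-cache
    emptyDir: {}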

Guarantees:

  • Proxy can start and serve traffic without control plane availability
  • Proxy never starts with "zero config"
  • Control plane remains the sole authority for desired state

9.2.2 Startup ordering

Proxy startup rules:

  1. Proxy starts independently of control plane availability
  2. Proxy loads LKG configuration from local persistence
  3. Proxy binds listener sockets and prepares TLS materials
  4. Proxy enters readiness state if LKG is valid
  5. Proxy begins polling control plane opportunistically

Hard rule:

The proxy MUST NOT block startup on control plane connectivity.


9.2.3 Readiness vs liveness semantics

pgwarden exposes explicit endpoints for liveness and readiness probes.

9.2.3.1 Probe endpoints (per component)

Proxy

  • GET /livez — liveness only
  • GET /readyz — readiness only
  • GET /healthz — convenience endpoint (same as /readyz for the proxy)

Control plane

  • GET /livez
  • GET /readyz
  • GET /healthz — convenience endpoint (same as /readyz)

WardenSense

  • GET /livez
  • GET /readyz
  • GET /healthz — convenience endpoint (same as /readyz)

Response contract (all components):

  • 200 OK with a short JSON body (e.g., { "status": "ok" }) when healthy
  • 503 Service Unavailable when not ready (for /readyz) or not live (for /livez)

Critical shutdown rule:

  • On receipt of SIGTERM, /readyz MUST immediately return 503 for the remainder of the process lifetime, even while /livez continues to return 200 during graceful drain.

These endpoints MUST NOT leak secrets, policy contents, or upstream connection strings.
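
In Kubernetes, these endpoints map directly onto probes. A minimal sketch for the proxy container, assuming the health endpoints are served on an illustrative admin port (9090) with illustrative timings:

livenessProbe:
  httpGet:
    path: /livez
    port: 9090
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
  periodSeconds: 5
  failureThreshold: 1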

Liveness probe ("should this process be restarted?")

Liveness MUST return true if:

  • process is running
  • event loop is responsive
  • configuration subsystem is loaded (even LKG)

Liveness MUST NOT depend on:

  • control plane reachability
  • upstream database availability
  • lease availability
  • WardenSense availability

A failing liveness probe indicates only irrecoverable process failure.


Readiness probe ("should this pod receive traffic?")

Readiness MUST return true only if:

  • configuration is loaded and valid (current or LKG)
  • TLS materials are loaded and valid
  • listener socket is bound

Readiness MUST return false when:

  • process receives SIGTERM
  • configuration validation fails
  • TLS materials are invalid or expired

Readiness MUST NOT depend on:

  • control plane reachability
  • WardenSense health

9.2.4 Degraded-mode behavior

When the control plane is unavailable:

  • Existing client sessions continue
  • New inbound client connections are allowed only if upstream pool capacity exists
  • Creation of new upstream connections is blocked if no valid lease exists

This preserves safety while avoiding unnecessary outages.


9.2.5 Rolling updates & shutdown coordination

On receipt of SIGTERM:

  • Readiness flips to false immediately
  • Liveness remains true
  • Lease refresh is disabled
  • No new upstream connections are created
  • Existing sessions drain normally

This aligns with Kubernetes terminationGracePeriodSeconds semantics.
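
A minimal sketch of the corresponding Kubernetes pod setting (the value is illustrative and should exceed the proxy's configured shutdown timeout; see also 9.1.5):

spec:
  terminationGracePeriodSeconds: 60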


9.2.6 Lease refresh race avoidance

During shutdown or pod termination:

  • Lease refresh logic MUST be disabled
  • Active lease remains valid until natural expiry
  • No new leases are requested

This prevents orphaned leases and audit noise.


9.2.7 WardenSense isolation

WardenSense health MUST NOT affect:

  • proxy readiness
  • proxy liveness
  • control plane readiness

ML systems are strictly observational and must never impact availability.


9.2.8 TLS failure semantics

If TLS materials expire or fail validation:

  • Readiness MUST return false
  • Liveness MUST remain true
  • Connections MUST fail closed

This prevents restart loops while enforcing security guarantees.


9.2.9 Control plane restarts

During control plane restarts or rolling deploys:

  • Proxies continue serving using LKG state
  • Config polling backs off with retry
  • Lease issuance failures do not terminate existing sessions

Control plane availability MUST NOT be a prerequisite for steady-state data plane operation.


9.2.10 Allowed failure matrix (v1)

The matrix below defines which failures are tolerated and what behavior is expected.

Control plane down
  • Proxy: serve using LKG; continue existing sessions; allow new inbound only if pool has capacity; block new upstream if no valid lease
  • Control plane: unavailable
  • WardenSense: no effect
  • User-visible impact: possible inability to create new upstream conns once pool is exhausted; otherwise none

WardenSense down
  • Proxy: no gating; continue normal enforcement; continue emitting audit to OTLP
  • Control plane: no effect
  • WardenSense: unavailable
  • User-visible impact: no drift alerts; enforcement unaffected

OTel Collector down
  • Proxy: continue serving; best-effort emit may drop; no backpressure to data path
  • Control plane: continues operating
  • WardenSense: continues operating
  • User-visible impact: loss of portal event visibility/alerts until restored

Upstream DB down
  • Proxy: surface native PG errors; keep proxy live/ready based on config + TLS
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: DB unavailable (as normal)

Lease issuance fails (CP reachable but refusing)
  • Proxy: existing sessions continue; new upstream creation blocked; readiness remains true
  • Control plane: returns error
  • WardenSense: no effect
  • User-visible impact: new connections may fail once pool exhausted

Invalid config published
  • Proxy: keep LKG; emit proxy.config.invalid; continue serving
  • Control plane: record deploy failure; do not override proxies
  • WardenSense: no effect
  • User-visible impact: none (stays on last good)

TLS cert expired/invalid (proxy ingress)
  • Proxy: fail closed; /readyz false; /livez true
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: new inbound connections fail until fixed

Proxy pod termination (rolling update)
  • Proxy: readiness false; drain sessions; no lease refresh; exit before grace timeout
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: zero or minimal disruption if grace period adequate

Network partition proxy↔CP
  • Proxy: serve LKG; continue; retries with backoff
  • Control plane: continues; sees proxy missing
  • WardenSense: no effect
  • User-visible impact: same as control plane down for that proxy

Network partition proxy↔Upstream DB
  • Proxy: surface native PG errors/timeouts; keep liveness true
  • Control plane: no effect
  • WardenSense: no effect
  • User-visible impact: DB connectivity failure (as normal)

Notes:

  • The proxy must never crash-loop due to upstream DB unavailability.
  • Observability pipelines must not introduce backpressure into the data path.

10. Distribution & Deployment (Kubernetes + Docker prioritized)

pgwarden prioritizes:

  • Kubernetes deployments (Helm or Kustomize‑friendly)
  • Docker deployments (including Docker Compose)

The distribution unit is always pgwarden (product name), even though it may include multiple deployable services (proxy, control plane, auth plane, optional WardenSense).


10.1 Control Plane API Surface (Intent)

The control plane needs APIs for:

  • managing databases/targets
  • defining DSNs and their mapped contexts
  • authoring policies
  • compiling/deploying artifacts
  • distributing runtime config to proxies
  • audit plumbing and WardenSense configuration
  • upstream credential leasing for proxies

The control plane should expose a versioned HTTP API (REST or JSON‑over‑HTTP) for operators and automation.

10.1.1 Authentication & Authorization (MVP)

  • Control plane admin endpoints require auth (OIDC for humans)
  • Proxy↔control-plane internal endpoints (config poll, leases) use mTLS

10.1.2 Endpoint Inventory (v1)

The shapes below are intent; exact payloads can be finalized during implementation.

Health & meta

  • GET /v1/healthz
  • GET /v1/version

Targets (multiple upstream DBs)

  • GET /v1/targets
  • POST /v1/targets
  • GET /v1/targets/{targetId}
  • PATCH /v1/targets/{targetId}
  • DELETE /v1/targets/{targetId}

DSNs / Contexts (per target)

  • GET /v1/targets/{targetId}/dsns
  • POST /v1/targets/{targetId}/dsns
  • GET /v1/targets/{targetId}/dsns/{dsnId}
  • PATCH /v1/targets/{targetId}/dsns/{dsnId}
  • DELETE /v1/targets/{targetId}/dsns/{dsnId}

Fields should include at minimum:

  • name (stable DSN name)
  • mapped_role (pgwarden managed role)
  • defaults (session defaults)
  • wardensense_enabled (bool; default false)
  • masking (strategy + parameters)
  • pooling (optional overrides)
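
An illustrative POST /v1/targets/{targetId}/dsns request body using these fields (shapes are intent, not final; the defaults entry shows a hypothetical session setting):

{
  "name": "ai_inference",
  "mapped_role": "pgw_ai_ro",
  "defaults": { "statement_timeout": "30s" },
  "wardensense_enabled": true,
  "masking": {
    "strategy": "default_value",
    "params": { "value": "(555) 555-5555" }
  },
  "pooling": { "max_upstream_conns": 10 }
}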

Policy authoring

  • GET /v1/policies
  • POST /v1/policies
  • GET /v1/policies/{policyId}
  • PATCH /v1/policies/{policyId}
  • DELETE /v1/policies/{policyId}

Policy deploy (MVP: admin-only)

For MVP simplicity, the control plane may run in a single-admin model where any authenticated admin can create and deploy policies.

  • POST /v1/policies/{policyId}/deploy (may implicitly compile)

Compilation & deployment

  • POST /v1/policies/{policyId}/compile (dry run)
  • POST /v1/policies/{policyId}/deploy
  • GET /v1/deployments (history)
  • GET /v1/deployments/{deploymentId}

Artifact ownership & reconciliation

  • GET /v1/targets/{targetId}/artifacts (pgwarden-owned objects)
  • POST /v1/targets/{targetId}/reconcile (idempotent apply)

Proxy fleet management (config distribution)

  • GET /v1/proxies (registered instances)
  • POST /v1/proxies/register (bootstrapping)
  • GET /v1/proxies/{proxyId}/config (rendered runtime config)
    • should support ETag so proxies can poll efficiently
    • proxies use If-None-Match to avoid full downloads

Defaults:

  • poll interval: 5s
  • invalid config: keep last-known-good
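
An illustrative poll exchange when the config is unchanged (headers only; the proxy id and ETag values are placeholders):

GET /v1/proxies/px-7f2a/config HTTP/1.1
If-None-Match: "cfg-v42"

HTTP/1.1 304 Not Modified
ETag: "cfg-v42"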

Control plane credential lease (for upstream connect)

Leases authorize proxies to create upstream connections without embedding DB credentials in application repos.

  • POST /v1/leases (issue lease for {targetId, dsnName, proxyId})
  • POST /v1/leases/redeem (single-step redeem; returns everything needed to connect)

MVP behavior:

  • Redeem returns plaintext upstream connection material over mTLS.
  • Credentials are kept in memory only by the proxy and never logged.
  • Leases are used only for upstream connection creation (not per query).

Redeem response (conceptual):

  • upstream host / port / dbname
  • username / password
  • sslmode (default require)
  • lease expires_at
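
An illustrative redeem response body (field names are intent, not final; values are placeholders):

{
  "upstream": {
    "host": "prod-db.example.com",
    "port": 5432,
    "dbname": "app",
    "sslmode": "require"
  },
  "username": "pgw_lease_app_rw_01",
  "password": "<redacted>",
  "expires_at": "2025-01-01T12:30:00Z"
}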

This model is intentionally simple and portable for v1.

Audit access (metadata only)

  • GET /v1/audit/sessions
  • GET /v1/audit/sessions/{sessionId}
  • GET /v1/audit/metrics (aggregates)

Human-readable event review (portal support)

To support a portal view of “what happened,” the control plane should expose read APIs over an event store that contains:

  • control plane events (policy changes, deploys, reconcile results)
  • proxy audit events (session metadata)
  • WardenSense alerts/events

Endpoints:

  • GET /v1/events (filter by time, target, dsn, severity, type)
  • GET /v1/events/{eventId}

Batteries-included ingestion:

  • POST /v1/events/ingest (private; intended for OTel Collector export)
    • The control plane should authenticate this path (shared secret, mTLS, or internal network only).
    • Payload is structured; no raw SQL and no PII.

WardenSense config & status

  • GET /v1/wardensense/status
  • GET /v1/wardensense/alerts
  • GET /v1/wardensense/alerts/{alertId}

10.2 WardenSense gRPC Surface (Intent)

WardenSense is a separate service; pgwarden needs a stable contract for:

  • shipping audit events or pulling them
  • querying alert state
  • (optionally) receiving model/baseline updates

Two viable patterns:

  • Push: pgwarden (or an audit forwarder) streams events to WardenSense
  • Pull: WardenSense reads from an audit sink (Kafka/NATS/OTLP) and only exposes query APIs

To keep coupling low, prefer Pull for production. However, a simple Push gRPC stream can be useful for early versions.

10.2.1 gRPC methods (v1)

Service: WardenSense

  • rpc IngestAuditEvent(stream AuditEvent) returns (IngestAck) (optional; push mode)
  • rpc GetStatus(StatusRequest) returns (StatusResponse)
  • rpc ListAlerts(ListAlertsRequest) returns (ListAlertsResponse)
  • rpc GetAlert(GetAlertRequest) returns (Alert)

Notes:

  • AuditEvent must never contain raw SQL or PII; only the structured metadata emitted by the proxy.
  • DSN/context fields must be present so WardenSense can honor the per‑DSN enablement toggle.

10.3 Logging & Audit Intent

10.3.1 Event envelope (v1)

All machine-ingested events conform to a common envelope. This keeps portal queries stable while allowing new event types to be added over time.

Required fields:

  • event_id (string; globally unique; ULID recommended)
  • event_time (RFC3339 timestamp)
  • event_type (string)
  • severity (DEBUG|INFO|WARN|ERROR)
  • schema_version (int; start at 1)

Source:

  • plane (proxy|control_plane|wardensense|collector)
  • instance_id (string)
  • version (string)

Correlation:

  • request_id (string, optional)
  • session_id (string, optional)
  • deployment_id (string, optional)

Scope (portal filtering):

  • target_id (string, optional)
  • dsn_name (string, optional)
  • context_id (string, optional)

Attributes:

  • attributes (object; event-type specific; no SQL, no PII)
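
An illustrative db.session.summary event in this envelope (values are placeholders; nesting of the source fields is shown for readability only):

{
  "event_id": "01J9ZK4T8Q2M3N4P5R6S7T8V9W",
  "event_time": "2025-01-01T12:00:00Z",
  "event_type": "db.session.summary",
  "severity": "INFO",
  "schema_version": 1,
  "source": { "plane": "proxy", "instance_id": "proxy-0", "version": "0.1.0" },
  "session_id": "sess-42",
  "target_id": "prod",
  "dsn_name": "ai_inference",
  "context_id": "ai_inference",
  "attributes": { "statement_count": 17, "error_count": 0, "duration_ms": 5400 }
}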

10.3.2 Event types (v1 defaults)

Proxy lifecycle & config

  • proxy.instance.started
  • proxy.instance.heartbeat
  • proxy.config.applied
  • proxy.config.invalid

Connection/session audit

  • db.session.opened
  • db.session.closed
  • db.session.auth.failed
  • db.session.upstream.connect.succeeded
  • db.session.upstream.connect.failed
  • db.session.summary

Control plane governance

  • cp.target.created
  • cp.target.updated
  • cp.dsn.created
  • cp.dsn.updated
  • cp.policy.created
  • cp.policy.updated
  • cp.policy.compiled
  • cp.policy.deployed
  • cp.reconcile.started
  • cp.reconcile.completed
  • cp.reconcile.overwrote.drift

Credential lease flow

  • cp.upstream.lease.issued
  • cp.upstream.lease.redeemed
  • proxy.upstream.lease.refreshed
  • proxy.upstream.lease.refresh_failed

WardenSense

  • ws.pipeline.started
  • ws.ingest.health
  • ws.baseline.updated
  • ws.alert.raised
  • ws.alert.cleared

10.3.3 OTLP representation (MVP)

  • Audit events are represented as OTLP Logs.
  • Components emit OTLP to an OpenTelemetry Collector.

10.3.4 Event store indexing (recommended defaults)

For a Postgres-backed event store:

  • primary key: event_id
  • INDEX events_time_desc (event_time DESC)
  • INDEX events_target_time_desc (target_id, event_time DESC)
  • INDEX events_target_dsn_time_desc (target_id, dsn_name, event_time DESC)
  • INDEX events_type_time_desc (event_type, event_time DESC)
  • partial indexes for correlation IDs:
    • (session_id, event_time DESC) WHERE session_id IS NOT NULL
    • (request_id, event_time DESC) WHERE request_id IS NOT NULL
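
A minimal Postgres sketch consistent with these defaults (column types are assumptions; the envelope's attributes map to a jsonb column):

CREATE TABLE IF NOT EXISTS events (
  event_id       text PRIMARY KEY,
  event_time     timestamptz NOT NULL,
  event_type     text        NOT NULL,
  severity       text        NOT NULL,
  schema_version int         NOT NULL,
  plane          text        NOT NULL,
  instance_id    text,
  request_id     text,
  session_id     text,
  deployment_id  text,
  target_id      text,
  dsn_name       text,
  context_id     text,
  attributes     jsonb       NOT NULL DEFAULT '{}'::jsonb
);

CREATE INDEX IF NOT EXISTS events_time_desc            ON events (event_time DESC);
CREATE INDEX IF NOT EXISTS events_target_time_desc     ON events (target_id, event_time DESC);
CREATE INDEX IF NOT EXISTS events_target_dsn_time_desc ON events (target_id, dsn_name, event_time DESC);
CREATE INDEX IF NOT EXISTS events_type_time_desc       ON events (event_type, event_time DESC);
CREATE INDEX IF NOT EXISTS events_session_time_desc    ON events (session_id, event_time DESC) WHERE session_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS events_request_time_desc    ON events (request_id, event_time DESC) WHERE request_id IS NOT NULL;

-- idempotent ingest (see 10.3.5): INSERT ... ON CONFLICT (event_id) DO NOTHING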

10.3.5 Idempotency

  • event_id must be unique.
  • /v1/events/ingest should use INSERT ... ON CONFLICT DO NOTHING to tolerate retries.

10.3.6 Batteries-included OpenTelemetry Collector

pgwarden ships with a reference OpenTelemetry Collector configuration (not embedded in pgwarden binaries) intended to work out-of-the-box for most users.

Design goals:

  • zero required observability expertise
  • no mandatory external dependencies
  • identical behavior across Docker and Kubernetes

Reference files:

  • examples/otel/collector.yaml
  • examples/docker-compose.yaml (includes collector service)
  • charts/pgwarden/values.yaml (optional collector deployment)

Default behavior (when enabled):

  • receive OTLP logs from pgwarden components
  • export all events to the control plane event store via POST /v1/events/ingest
  • export filtered events (wardensense_enabled=true) to WardenSense

Deployment model:

  • The Collector always runs as a separate container/pod (never embedded in pgwarden binaries).
  • In Docker/Compose, the Collector is started automatically as part of the example deployment.
  • In Kubernetes, the Helm chart exposes otel.enabled to deploy the Collector optionally.

Configuration contract:

  • pgwarden components only require OTEL_EXPORTER_OTLP_ENDPOINT.
  • Advanced users may disable the bundled Collector and point pgwarden at an existing Collector instead.

10.3.7 Defaults and opt-out

  • Default (Compose examples): Collector enabled.
  • Default (Helm values): Collector disabled, opt-in via otel.enabled=true.
  • When disabled, pgwarden continues to emit structured logs to stdout.
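
A sketch of the routing shape for examples/otel/collector.yaml, using the filter processor to honor the per‑DSN toggle (the endpoints, service names, and the exporter used for the control plane ingest path are assumptions; the shipped reference file finalizes these details):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch: {}
  # keep only events tagged for WardenSense (drop everything else on this pipeline)
  filter/wardensense:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["wardensense_enabled"] != true'

exporters:
  # all events -> control plane event store (assumes the ingest path accepts OTLP logs)
  otlphttp/controlplane:
    logs_endpoint: "http://pgwarden-control-plane:8080/v1/events/ingest"
  # filtered events -> WardenSense
  otlp/wardensense:
    endpoint: "wardensense:4317"
    tls:
      insecure: true

service:
  pipelines:
    logs/controlplane:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/controlplane]
    logs/wardensense:
      receivers: [otlp]
      processors: [filter/wardensense, batch]
      exporters: [otlp/wardensense]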

10.4 Suggested K8s Resource Mapping (Conceptual)

  • pgwarden-proxy: Deployment + Service
  • pgwarden-control-plane: Deployment + Service + DB (if needed)
  • pgwarden-auth: Deployment + Service (or external IdP integration)
  • pgwarden-wardensense (optional): Deployment + Service + its own DB

10.5 Configuration Schema (v1)

pgwarden should support a single config file schema (YAML/JSON) that works across Docker and Kubernetes.

Loading order (recommended):

  1. PGWARDEN_CONFIG points to a YAML file
  2. Environment variables override specific fields (optional)

Kubernetes persistence note (secrets/config):

  • When deployed on Kubernetes, config should be provided via ConfigMap and secrets via Secret volumes.
  • These are persisted in the cluster control plane datastore and will remount correctly if pods reschedule to different nodes.
  • Avoid relying on node-local storage or writing secrets into container filesystems at runtime.

10.5.1 Core keys

proxy:
  listen_addr: "0.0.0.0:5432"
  # TLS is ON by default
  tls:
    mode: "required"   # required | mtls | disabled
    cert_file: "/etc/pgwarden/tls/server.crt"
    key_file: "/etc/pgwarden/tls/server.key"
    client_ca_file: "/etc/pgwarden/tls/client-ca.crt"  # required when mode=mtls

  # Global defaults; DSNs may override
  pooling_defaults:
    max_upstream_conns: 20
    min_idle_upstream_conns: 2
    idle_upstream_timeout: "5m"
    max_conn_lifetime: "30m"

  lease_defaults:
    ttl: "30m"
    refresh_window: "5m"
    refresh_jitter: 0.20
    reseed_on_refresh: true
    drain_old_conns: true

control_plane:
  http_listen_addr: "0.0.0.0:8080"  # HTTP only; HTTPS terminated externally

  # Proxy↔control-plane internal APIs (config poll, leases)
  internal_mtls:
    cert_file: "/etc/pgwarden/internal-mtls/client.crt"
    key_file: "/etc/pgwarden/internal-mtls/client.key"
    ca_file: "/etc/pgwarden/internal-mtls/ca.crt"

  auth:
    mode: "oidc"  # oidc | mtls | disabled (dev only)
    oidc:
      issuer_url: "https://issuer.example.com"
      client_id: "pgwarden"
      # client_secret should be injected via env/secret, not plaintext
      redirect_url: "https://pgwarden.example.com/callback"

  rbac:
    roles:
      - name: "policy_admin"
      - name: "policy_reviewer"
      - name: "policy_deployer"

targets:
  # One control plane can manage many databases
  - name: "prod"
    upstream:
      host: "prod-db.example.com"
      port: 5432
      dbname: "app"
      # upstream auth injected via secret
    dsns:
      - name: "app_rw"
        mapped_role: "pgw_app_rw"
        wardensense_enabled: false
        masking:
          strategy: "partial_reveal"  # partial_reveal | default_value
          params:
            reveal_last: 4
            mask_char: "*"
        pooling:
          max_upstream_conns: 50
      - name: "developer_sanitized"
        mapped_role: "pgw_dev_sanitized"
      - name: "ai_inference"
        mapped_role: "pgw_ai_ro"
        wardensense_enabled: true
        masking:
          strategy: "default_value"
          params:
            value: "(555) 555-5555"

signals:
  enabled: false
  # signals service is separate and has its own DB; this section only configures emission
  audit:
    sink: "otlp"   # stdout | otlp

Notes:

  • proxy.tls.mode defaults to required if omitted.
  • When proxy.tls.mode=required, the proxy MUST refuse plaintext connections and clients must connect with sslmode=require (or stronger).
  • When proxy.tls.mode=mtls, client_ca_file is required and the proxy MUST require valid client certs.
  • The control plane HTTP endpoint is intentionally HTTP‑only at the service level; HTTPS is terminated by an ingress/reverse proxy.

10.6 Deployment Examples

These are minimal examples intended as starter templates.

10.6.1 Docker Compose (TLS required default)

services:
  pgwarden:
    image: pgwarden:latest
    ports:
      - "5432:5432"   # Postgres proxy (TLS required)
      - "8080:8080"   # Control plane HTTP (terminate HTTPS upstream if needed)
    environment:
      - PGWARDEN_CONFIG=/etc/pgwarden/config.yaml
      # secrets should be injected via env or docker secrets
    volumes:
      - ./config/config.yaml:/etc/pgwarden/config.yaml:ro
      - ./tls/server.crt:/etc/pgwarden/tls/server.crt:ro
      - ./tls/server.key:/etc/pgwarden/tls/server.key:ro

Client example (app):

  • postgres://...@pgwarden-host:5432/db?sslmode=require

10.6.2 Docker Compose (mTLS)

services:
  pgwarden:
    image: pgwarden:latest
    ports:
      - "5432:5432"
      - "8080:8080"
    environment:
      - PGWARDEN_CONFIG=/etc/pgwarden/config.yaml
    volumes:
      - ./config/config.yaml:/etc/pgwarden/config.yaml:ro
      - ./tls/server.crt:/etc/pgwarden/tls/server.crt:ro
      - ./tls/server.key:/etc/pgwarden/tls/server.key:ro
      - ./tls/client-ca.crt:/etc/pgwarden/tls/client-ca.crt:ro

Client example (mTLS):

  • sslmode=verify-full + client cert/key in the client driver (language dependent)

10.6.3 Kubernetes (proxy terminates Postgres TLS; HTTPS terminated externally)

ConfigMap (non‑secrets):

apiVersion: v1
kind: ConfigMap
metadata:
  name: pgwarden-config
data:
  config.yaml: |
    proxy:
      listen_addr: "0.0.0.0:5432"
      tls:
        mode: "required"
        cert_file: "/etc/pgwarden/tls/server.crt"
        key_file: "/etc/pgwarden/tls/server.key"
    control_plane:
      http_listen_addr: "0.0.0.0:8080"
      auth:
        mode: "oidc"
        oidc:
          issuer_url: "https://issuer.example.com"
          client_id: "pgwarden"
          redirect_url: "https://pgwarden.example.com/callback"

Secret (proxy TLS cert/key):

apiVersion: v1
kind: Secret
metadata:
  name: pgwarden-proxy-tls
type: Opaque
data:
  server.crt: <base64>
  server.key: <base64>
  # for mTLS add client-ca.crt

Deployment (mount config + tls):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgwarden
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pgwarden
  template:
    metadata:
      labels:
        app: pgwarden
    spec:
      containers:
        - name: pgwarden
          image: pgwarden:latest
          env:
            - name: PGWARDEN_CONFIG
              value: /etc/pgwarden/config.yaml
          ports:
            - containerPort: 5432
            - containerPort: 8080
          volumeMounts:
            - name: config
              mountPath: /etc/pgwarden
              readOnly: true
            - name: proxy-tls
              mountPath: /etc/pgwarden/tls
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: pgwarden-config
            items:
              - key: config.yaml
                path: config.yaml
        - name: proxy-tls
          secret:
            secretName: pgwarden-proxy-tls

Ingress (control plane HTTPS termination):

  • Terminate HTTPS at Ingress (cert-manager / external CA / certbot), route to service port 8080.

11. Auth Plane: Identity, Admin Access, and Session Attestation

11.1 Purpose

Auth applies to two distinct surfaces:

  1. Control Plane (Admin/API/UI)
  • Provides authentication + authorization for operators
  • Required for any policy creation, approval, or deployment actions
  2. Data Plane Ingress (Client → Proxy connections)
  • Provides transport security and optional client identity guarantees
  • Enforces TLS requirements for inbound connections

11.2 Non‑Goals

  • It does not rewrite SQL
  • It does not bypass the proxy enforcement model
  • It does not introduce probabilistic decision‑making into access enforcement

11.3 Control Plane Auth (Required)

The control plane MUST be protected by authentication and authorization.

Recommended baseline:

  • OIDC for interactive users (admin UI)
  • OIDC client credentials or mTLS for automation (CI/CD, GitOps agents)

Authorization can be enforced via roles such as:

  • policy_admin
  • policy_reviewer
  • policy_deployer

Policy changes MUST be human approved (e.g., reviewer gate) before deployment to production targets.

11.4 Data Plane Ingress TLS (Default: TLS required; Optional: mTLS)

The proxy MUST support a mode where TLS is mandatory on the Postgres ingress. This should be enabled by default.

Supported TLS modes:

  • TLS required (default): server certificate presented; clients must use sslmode=require (or stronger)
  • mTLS required (optional): server + client certificates required
  • Disabled (dev only): explicit opt‑out; not recommended

Enforcement behavior:

  • When TLS is required, the proxy MUST refuse plaintext connections.
  • Configuration should make the secure default easy and the insecure mode explicit.

Kubernetes note:

  • Postgres TLS should be terminated at the pgwarden proxy by default (certs provided via Secret mount).

11.5 Control Plane HTTPS Responsibility Boundary

The pgwarden control plane should expose an HTTP admin endpoint and rely on the deployment environment for HTTPS termination and certificate lifecycle.

Recommended approaches:

  • Kubernetes Ingress + a third‑party CA (e.g., cert‑manager) or certbot‑managed termination
  • Reverse proxy / load balancer TLS termination

pgwarden is not responsible for certificate issuance/renewal for the admin endpoint.

11.6 Deterministic Identity Mapping

When identity is available (e.g., mTLS SAN, JWT claims), the proxy must have a deterministic mapping from authenticated identity to pgwarden context/role.


12. Open Design Questions

Remaining open questions (implementation details):

  • Event envelope + required attributes for /v1/events/ingest and /v1/events filtering
  • Masking implementation location and how compiler generates SQL for partial_reveal and default_value
  • Lease protocol details: request/response shapes, TTL rules, and whether redeem returns creds directly or via a secondary fetch
  • OTel Collector reference configs (Compose + Helm) for batteries-included OTLP routing to event store and WardenSense