Commit 5c6d499

add foundational support for durable storage
1 parent bf9fe11 commit 5c6d499

47 files changed

Lines changed: 4535 additions & 8 deletions

DEVELOPMENT.md

Lines changed: 24 additions & 1 deletion
@@ -23,10 +23,33 @@ make dev-frontend # start Vite dev server (port 5173) with HMR
 make dev-bundle # build UI, serve full bundled experience at port 8001 via uv run
 ```
 
-Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing the frontend calls the backend at `http://localhost:8001` directly via CORS.
+Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing; the frontend calls the backend at `http://localhost:8001` directly via CORS.
 
 `dev-bundle` is useful for testing the bundled UI experience without building a wheel. It copies `ui/dist` into the source tree temporarily and cleans up when the server exits.
 
+### Postgres backend (optional, for `/api/runs`)
+
+The default in-memory backend keeps `make dev-backend` zero-config. To exercise the async run pipeline locally, bring up a Postgres alongside the app:
+
+```bash
+make pg-up # start postgres:17-alpine in a docker container (port 5432, ephemeral via --rm)
+make migrate # apply the agentevals schema
+make dev-backend-pg # pg-up + migrate + serve --dev with backend=postgres wired up
+make pg-down # stop the container; data is discarded with --rm
+```
+
+Override the defaults via `PG_PORT=5433 make pg-up` etc. The `migrate` target is idempotent (a second invocation is a no-op).
+
+Once running, submit a run with:
+
+```bash
+curl -X POST http://localhost:8001/api/runs \
+  -H 'content-type: application/json' \
+  -d '{"spec": {"approach": "trace_replay", "target": {"kind": "inline", "inline": {...}}, "evalConfig": {"metrics": ["tool_trajectory_avg_score"]}}}'
+```
+
+Then poll `GET /api/runs/{runId}` and `GET /api/runs/{runId}/results`. Without `storage.backend=postgres`, the `/api/runs` endpoints return 503 with a hint pointing at the env var.
+
 ### Building
 
 ```bash
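
The submit-then-poll flow described in the DEVELOPMENT.md addition can be sketched in Python. This is a minimal sketch rather than the project's client: only the `GET /api/runs/{runId}` path and the `http://localhost:8001` address come from the diff. The `status` field and the terminal values `succeeded` and `failed` are assumptions, since the response schema is not shown in this commit view. The fetch function is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8001"  # dev backend address from DEVELOPMENT.md

# Assumption: the run document exposes a "status" field with these terminal values.
TERMINAL_STATUSES = {"succeeded", "failed"}


def fetch_run(run_id: str) -> dict:
    """GET /api/runs/{runId} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/runs/{run_id}") as resp:
        return json.load(resp)


def wait_for_run(
    run_id: str,
    fetch: Callable[[str], dict] = fetch_run,
    interval: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Poll until the run reports a terminal status or max_polls is exhausted."""
    for _ in range(max_polls):
        run = fetch(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Bounding the loop with `max_polls` keeps a stuck run from blocking a script indefinitely; the real status names should be checked against the API before relying on this.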

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ COPY src ./src
 
 COPY --from=ui /build/ui/dist ./src/agentevals/_static
 
-RUN uv sync --frozen --no-dev --extra live \
+RUN uv sync --frozen --no-dev --extra live --extra postgres \
     && groupadd --gid 1000 app \
     && useradd --uid 1000 --gid app --home-dir /app --no-log-init app \
     && chown -R app:app /app

Makefile

Lines changed: 32 additions & 1 deletion
@@ -15,7 +15,14 @@ HELM_CHART_DIR ?= charts/agentevals
 HELM_CHART_OCI_URL ?= $(HELM_REPO)/helm
 HELM_CHART_VERSION ?= $(VERSION)
 
-.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-frontend dev-bundle test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish
+.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-backend-pg dev-frontend dev-bundle pg-up pg-down migrate test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish
+
+PG_CONTAINER ?= agentevals-pg
+PG_PORT ?= 5432
+PG_USER ?= agentevals
+PG_PASSWORD ?= agentevals
+PG_DATABASE ?= agentevals
+PG_DSN ?= postgresql://$(PG_USER):$(PG_PASSWORD)@localhost:$(PG_PORT)/$(PG_DATABASE)
 
 build:
 	uv build
@@ -53,6 +60,30 @@ release: clean build-ui
 dev-backend:
 	uv run agentevals serve --dev
 
+pg-up:
+	@if [ -z "$$(docker ps -q -f name=^/$(PG_CONTAINER)$$)" ]; then \
+		docker run -d --rm --name $(PG_CONTAINER) \
+			-e POSTGRES_USER=$(PG_USER) \
+			-e POSTGRES_PASSWORD=$(PG_PASSWORD) \
+			-e POSTGRES_DB=$(PG_DATABASE) \
+			-p $(PG_PORT):5432 postgres:17-alpine; \
+	else \
+		echo "container $(PG_CONTAINER) already running"; \
+	fi
+	@until docker exec $(PG_CONTAINER) pg_isready -U $(PG_USER) >/dev/null 2>&1; do sleep 1; done
+	@echo "Postgres ready at $(PG_DSN)"
+
+pg-down:
+	-docker stop $(PG_CONTAINER)
+
+migrate:
+	AGENTEVALS_DATABASE_URL=$(PG_DSN) uv run agentevals migrate up
+
+dev-backend-pg: pg-up migrate
+	AGENTEVALS_STORAGE_BACKEND=postgres \
+	AGENTEVALS_DATABASE_URL=$(PG_DSN) \
+	uv run agentevals serve --dev
+
 dev-frontend:
 	cd ui && npm run dev
 
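
The `PG_DSN` default in the Makefile is built by plain string interpolation. That works for the default credentials, but an overridden `PG_PASSWORD` containing reserved characters such as `@` or `/` would likely yield a URL that no longer parses as intended. A minimal Python sketch of the same composition with percent-encoding follows; `build_dsn` is a hypothetical helper for illustration and is not part of this commit.

```python
from urllib.parse import quote, urlsplit


def build_dsn(user: str, password: str, port: int, database: str,
              host: str = "localhost") -> str:
    """Compose a postgresql:// URL, percent-encoding the credentials."""
    # safe="" also encodes "/" and "@", which would otherwise split the URL wrongly.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# With the Makefile defaults the result matches the interpolated PG_DSN.
default_dsn = build_dsn("agentevals", "agentevals", 5432, "agentevals")

# A password containing reserved characters still parses back to the right host.
tricky_dsn = build_dsn("agentevals", "p@ss/word", 5433, "agentevals")
parts = urlsplit(tricky_dsn)
```

For local development with the default `agentevals` credentials the interpolated form is sufficient; the encoding only matters once the variables are overridden.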

README.md

Lines changed: 18 additions & 0 deletions
@@ -286,6 +286,24 @@ The source for the chart lives in [`charts/agentevals/`](charts/agentevals/) if
 
 See the [Kubernetes example](examples/kubernetes/README.md) for an end-to-end walkthrough deploying agentevals alongside kagent and an OTel Collector on Kubernetes.
 
+#### Postgres backend (`/api/runs`)
+
+By default the chart deploys agentevals with an in-memory backend; runs and results are not persisted. To enable the async `POST /api/runs` pipeline with durable Postgres-backed state:
+
+```bash
+# Bundled Postgres (dev / evaluation only):
+helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
+  --set storage.backend=postgres \
+  --set database.postgres.bundled.enabled=true
+
+# Or supply an external Postgres DSN:
+helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
+  --set storage.backend=postgres \
+  --set database.postgres.url='postgresql://user:pass@host:5432/dbname'
+```
+
+When `storage.backend=postgres` the app applies any pending schema migrations on startup (advisory-lock protected, safe across replicas) and starts an in-process worker that processes the run queue. Without `storage.backend=postgres` the `/api/runs` endpoints return 503 with a hint pointing at the env var.
+
 ## MCP Server
 
 Exposes evaluation tools to MCP clients. A `.mcp.json` at the project root lets Claude Code pick it up automatically.

charts/agentevals/templates/_helpers.tpl

Lines changed: 33 additions & 0 deletions
@@ -48,10 +48,43 @@ app.kubernetes.io/name: {{ include "agentevals.name" . }}
 app.kubernetes.io/instance: {{ .Release.Name }}
 {{- end }}
 
+{{- /*
+Selector labels scoped to the main app Pod and its Service. Carries the
+``app.kubernetes.io/component: agentevals`` discriminator so the agentevals
+Service does not also match the bundled Postgres Pod (which carries
+``app.kubernetes.io/component: database`` instead).
+*/ -}}
+{{- define "agentevals.app.selectorLabels" -}}
+{{ include "agentevals.selectorLabels" . }}
+app.kubernetes.io/component: agentevals
+{{- end }}
+
 {{- define "agentevals.serviceAccountName" -}}
 {{- if .Values.serviceAccount.create }}
 {{- default (include "agentevals.fullname" .) .Values.serviceAccount.name }}
 {{- else }}
 {{- default "default" .Values.serviceAccount.name }}
 {{- end }}
 {{- end }}
+
+{{/*
+Service name for the bundled Postgres instance.
+*/}}
+{{- define "agentevals.postgresqlServiceName" -}}
+{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
+{{- end -}}
+
+{{/*
+Bundled Postgres image reference (registry/repository/name:tag).
+*/}}
+{{- define "agentevals.postgresql.image" -}}
+{{- $pg := .Values.database.postgres.bundled -}}
+{{- printf "%s/%s/%s:%s" $pg.image.registry $pg.image.repository $pg.image.name $pg.image.tag -}}
+{{- end -}}
+
+{{/*
+Secret name holding POSTGRES_PASSWORD for the bundled Postgres instance.
+*/}}
+{{- define "agentevals.passwordSecretName" -}}
+{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
+{{- end -}}
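
The comment on `agentevals.app.selectorLabels` explains why the extra `component` label is needed. Kubernetes equality-based selectors match any Pod whose labels include every selector pair, so the shared name/instance pair alone would also select the bundled Postgres Pod. A small Python sketch of that matching rule follows; the values `agentevals` and `demo` for the name and instance labels are placeholders, not rendered chart output.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based matching: every selector pair must be present on the Pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Placeholder values standing in for the rendered name / instance labels.
BASE = {"app.kubernetes.io/name": "agentevals", "app.kubernetes.io/instance": "demo"}

app_pod = {**BASE, "app.kubernetes.io/component": "agentevals"}
db_pod = {**BASE, "app.kubernetes.io/component": "database"}

old_selector = BASE  # what agentevals.selectorLabels renders
new_selector = {**BASE, "app.kubernetes.io/component": "agentevals"}  # agentevals.app.selectorLabels

# The old selector picks up both Pods; the scoped one only the app Pod.
old_hits = [selector_matches(old_selector, pod) for pod in (app_pod, db_pod)]
new_hits = [selector_matches(new_selector, pod) for pod in (app_pod, db_pod)]
```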

charts/agentevals/templates/deployment.yaml

Lines changed: 25 additions & 2 deletions
@@ -9,15 +9,15 @@ spec:
   replicas: {{ .Values.replicaCount }}
   selector:
     matchLabels:
-      {{- include "agentevals.selectorLabels" . | nindent 6 }}
+      {{- include "agentevals.app.selectorLabels" . | nindent 6 }}
   template:
     metadata:
       {{- with .Values.podAnnotations }}
       annotations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
       labels:
-        {{- include "agentevals.selectorLabels" . | nindent 8 }}
+        {{- include "agentevals.app.selectorLabels" . | nindent 8 }}
         {{- with .Values.podLabels }}
         {{- toYaml . | nindent 8 }}
         {{- end }}
@@ -65,6 +65,29 @@ spec:
             - name: HOME
               value: "/tmp/agentevals-home"
             {{- end }}
+            {{- if eq .Values.storage.backend "postgres" }}
+            - name: AGENTEVALS_STORAGE_BACKEND
+              value: "postgres"
+            - name: AGENTEVALS_DATABASE_SCHEMA
+              value: {{ .Values.database.postgres.schema | quote }}
+            {{- if .Values.database.postgres.urlFile }}
+            - name: AGENTEVALS_DATABASE_URL_FILE
+              value: {{ .Values.database.postgres.urlFile | quote }}
+            {{- else if .Values.database.postgres.url }}
+            - name: AGENTEVALS_DATABASE_URL
+              value: {{ .Values.database.postgres.url | quote }}
+            {{- else if .Values.database.postgres.bundled.enabled }}
+            - name: POSTGRES_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: {{ include "agentevals.passwordSecretName" . }}
+                  key: POSTGRES_PASSWORD
+            - name: AGENTEVALS_DATABASE_URL
+              value: {{ printf "postgresql://agentevals:$(POSTGRES_PASSWORD)@%s.%s.svc.cluster.local:5432/agentevals?sslmode=disable" (include "agentevals.postgresqlServiceName" .) (include "agentevals.namespace" .) | quote }}
+            {{- else }}
+            {{ fail "storage.backend=postgres requires database.postgres.url, database.postgres.urlFile, or database.postgres.bundled.enabled=true" }}
+            {{- end }}
+            {{- end }}
             {{- with .Values.env }}
             {{- toYaml . | nindent 12 }}
             {{- end }}
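
The `if` / `else if` chain in the deployment template resolves the connection settings in a fixed order: `urlFile` first, then `url`, then the bundled instance, and otherwise rendering fails. A Python sketch of the same precedence may help when reasoning about which value wins if several are set. `resolve_database_env` is a hypothetical name, and the default host is a placeholder for the rendered `<service>.<namespace>.svc.cluster.local` name.

```python
def resolve_database_env(
    url_file: str = "",
    url: str = "",
    bundled_enabled: bool = False,
    bundled_host: str = "agentevals-postgresql.default.svc.cluster.local",  # placeholder
) -> dict[str, str]:
    """Mirror the template's precedence: urlFile > url > bundled > error."""
    if url_file:
        return {"AGENTEVALS_DATABASE_URL_FILE": url_file}
    if url:
        return {"AGENTEVALS_DATABASE_URL": url}
    if bundled_enabled:
        # $(POSTGRES_PASSWORD) stays literal here; Kubernetes expands it from the
        # POSTGRES_PASSWORD env var declared earlier in the same container.
        return {
            "AGENTEVALS_DATABASE_URL": (
                f"postgresql://agentevals:$(POSTGRES_PASSWORD)@{bundled_host}:5432/"
                "agentevals?sslmode=disable"
            )
        }
    raise ValueError(
        "storage.backend=postgres requires database.postgres.url, "
        "database.postgres.urlFile, or database.postgres.bundled.enabled=true"
    )
```

The guards on the bundled Secret and Postgres manifests (`bundled.enabled` and neither `url` nor `urlFile` set) are consistent with this order: an external URL suppresses the bundled resources even when `bundled.enabled=true`.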
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ include "agentevals.passwordSecretName" . }}
+  namespace: {{ include "agentevals.namespace" . }}
+  labels:
+    {{- include "agentevals.labels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+type: Opaque
+data:
+  POSTGRES_PASSWORD: {{ "agentevals" | b64enc | quote }}
+{{- end }}
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
+{{- $pg := .Values.database.postgres.bundled }}
+{{- $fullname := include "agentevals.postgresqlServiceName" . }}
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ $fullname }}
+  namespace: {{ include "agentevals.namespace" . }}
+  labels:
+    {{- include "agentevals.labels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: {{ $fullname }}
+  namespace: {{ include "agentevals.namespace" . }}
+  labels:
+    {{- include "agentevals.labels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+spec:
+  accessModes:
+    - ReadWriteOnce
+  {{- if $pg.storageClassName }}
+  storageClassName: {{ $pg.storageClassName | quote }}
+  {{- end }}
+  resources:
+    requests:
+      storage: {{ $pg.storage | quote }}
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: {{ $fullname }}
+  namespace: {{ include "agentevals.namespace" . }}
+  labels:
+    {{- include "agentevals.labels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      {{- include "agentevals.selectorLabels" . | nindent 6 }}
+      app.kubernetes.io/component: database
+  template:
+    metadata:
+      labels:
+        {{- include "agentevals.selectorLabels" . | nindent 8 }}
+        app.kubernetes.io/component: database
+    spec:
+      {{- with .Values.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      serviceAccountName: {{ $fullname }}
+      securityContext:
+        fsGroup: 999
+        runAsUser: 999
+        runAsGroup: 999
+        runAsNonRoot: true
+      containers:
+        - name: postgresql
+          image: {{ include "agentevals.postgresql.image" . }}
+          imagePullPolicy: {{ $pg.image.pullPolicy }}
+          securityContext:
+            allowPrivilegeEscalation: false
+          ports:
+            - name: postgresql
+              containerPort: 5432
+              protocol: TCP
+          env:
+            - name: POSTGRES_DB
+              value: "agentevals"
+            - name: POSTGRES_USER
+              value: "agentevals"
+            - name: POSTGRES_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: {{ include "agentevals.passwordSecretName" . }}
+                  key: POSTGRES_PASSWORD
+            - name: PGDATA
+              value: /var/lib/postgresql/data/pgdata
+          livenessProbe:
+            exec:
+              command:
+                - pg_isready
+                - -U
+                - agentevals
+                - -d
+                - agentevals
+            initialDelaySeconds: 20
+            periodSeconds: 10
+            timeoutSeconds: 5
+            failureThreshold: 6
+            successThreshold: 1
+          readinessProbe:
+            exec:
+              command:
+                - pg_isready
+                - -U
+                - agentevals
+                - -d
+                - agentevals
+            initialDelaySeconds: 5
+            periodSeconds: 5
+            timeoutSeconds: 3
+            failureThreshold: 3
+            successThreshold: 1
+          {{- with $pg.resources }}
+          resources:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          volumeMounts:
+            - name: data
+              mountPath: /var/lib/postgresql/data
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: {{ $fullname }}
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: {{ $fullname }}
+  namespace: {{ include "agentevals.namespace" . }}
+  labels:
+    {{- include "agentevals.labels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+spec:
+  type: ClusterIP
+  ports:
+    - name: postgresql
+      port: 5432
+      targetPort: postgresql
+      protocol: TCP
+  selector:
+    {{- include "agentevals.selectorLabels" . | nindent 4 }}
+    app.kubernetes.io/component: database
+{{- end }}

charts/agentevals/templates/service.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ spec:
       targetPort: mcp
       protocol: TCP
   selector:
-    {{- include "agentevals.selectorLabels" . | nindent 4 }}
+    {{- include "agentevals.app.selectorLabels" . | nindent 4 }}
