Commit 78f0d68

Merge pull request #135 from agentevals-dev/feature/add-durable-storage
Add foundational support for durable storage
2 parents bf9fe11 + 6b41538 commit 78f0d68

48 files changed

Lines changed: 4622 additions & 8 deletions

DEVELOPMENT.md

Lines changed: 29 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -23,10 +23,38 @@ make dev-frontend # start Vite dev server (port 5173) with HMR
2323
make dev-bundle # build UI, serve full bundled experience at port 8001 via uv run
2424
```
2525

26-
Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing the frontend calls the backend at `http://localhost:8001` directly via CORS.
26+
Standard development uses `dev-backend` + `dev-frontend` in separate terminals. The Vite dev server proxies nothing; the frontend calls the backend at `http://localhost:8001` directly via CORS.
2727

2828
`dev-bundle` is useful for testing the bundled UI experience without building a wheel. It copies `ui/dist` into the source tree temporarily and cleans up when the server exits.
2929

30+
### Postgres backend (optional, for `/api/runs`)
31+
32+
> **Preview.** The schema, the CLI surface, and `/api/runs` shape are still
33+
> stabilizing. Recreate the agentevals schema between minor version upgrades
34+
> until further notice; do not depend on persisted data surviving a
35+
> `git pull` of agentevals itself.
36+
37+
The default in-memory backend keeps `make dev-backend` zero-config. To exercise the async run pipeline locally, bring up a Postgres alongside the app:
38+
39+
```bash
40+
make pg-up # start postgres:18.3-alpine in a docker container (port 5432, ephemeral via --rm)
41+
make migrate # apply the agentevals schema
42+
make dev-backend-pg # pg-up + migrate + serve --dev with backend=postgres wired up
43+
make pg-down # stop the container; data is discarded with --rm
44+
```
45+
46+
Override the defaults via `PG_PORT=5433 make pg-up` etc. The `migrate` target is idempotent (a second invocation is a no-op).
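
The override works because the Makefile composes `PG_DSN` from the `PG_*` variables, each declared with a `?=` default. A minimal Python sketch of the same composition, assuming the defaults shown in the Makefile from this commit; `build_dsn` is a hypothetical helper for illustration and is not part of agentevals:

```python
import os

# Defaults mirror the `?=` assignments in the Makefile from this commit.
DEFAULTS = {
    "PG_USER": "agentevals",
    "PG_PASSWORD": "agentevals",
    "PG_PORT": "5432",
    "PG_DATABASE": "agentevals",
}


def build_dsn(env=None):
    """Compose a Postgres DSN the way the Makefile's PG_DSN does.

    Hypothetical helper: the Makefile hands the finished DSN to the app
    through AGENTEVALS_DATABASE_URL.
    """
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return (
        f"postgresql://{cfg['PG_USER']}:{cfg['PG_PASSWORD']}"
        f"@localhost:{cfg['PG_PORT']}/{cfg['PG_DATABASE']}"
    )
```

Calling `build_dsn({"PG_PORT": "5433"})` changes only the port segment, which is what `PG_PORT=5433 make pg-up` does to `PG_DSN`. Neither the Makefile nor this sketch percent-encodes the password, so a password containing `@` or `/` would need encoding first.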
47+
48+
Once running, submit a run with:
49+
50+
```bash
51+
curl -X POST http://localhost:8001/api/runs \
52+
-H 'content-type: application/json' \
53+
-d '{"spec": {"approach": "trace_replay", "target": {"kind": "inline", "inline": {...}}, "evalConfig": {"metrics": ["tool_trajectory_avg_score"]}}}'
54+
```
55+
56+
Then poll `GET /api/runs/{runId}` and `GET /api/runs/{runId}/results`. Without `storage.backend=postgres`, the `/api/runs` endpoints return 503 with a hint pointing at the env var.
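
The polling step could be scripted roughly as follows. This is a sketch under stated assumptions: the commit does not show the response shape of `GET /api/runs/{runId}`, so the `status` field and the terminal values below are guesses to adjust, and `wait_for_run` is a hypothetical helper rather than an agentevals API.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8001"  # dev-backend address from DEVELOPMENT.md

# ASSUMPTION: the run document exposes a "status" field with these terminal
# values. The commit does not show the response shape; adjust to match it.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def http_get_json(url):
    """GET a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_run(run_id, fetch=http_get_json, interval=1.0, max_polls=60,
                 sleep=time.sleep):
    """Poll GET /api/runs/{runId} until a terminal status or the budget ends."""
    url = f"{BASE_URL}/api/runs/{run_id}"
    for _ in range(max_polls):
        run = fetch(url)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server. Once a run reaches a terminal state, `GET /api/runs/{runId}/results` can be fetched with the same `http_get_json` helper.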
57+
3058
### Building
3159

3260
```bash

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ COPY src ./src
2424

2525
COPY --from=ui /build/ui/dist ./src/agentevals/_static
2626

27-
RUN uv sync --frozen --no-dev --extra live \
27+
RUN uv sync --frozen --no-dev --extra live --extra postgres \
2828
&& groupadd --gid 1000 app \
2929
&& useradd --uid 1000 --gid app --home-dir /app --no-log-init app \
3030
&& chown -R app:app /app

Makefile

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,14 @@ HELM_CHART_DIR ?= charts/agentevals
1515
HELM_CHART_OCI_URL ?= $(HELM_REPO)/helm
1616
HELM_CHART_VERSION ?= $(VERSION)
1717

18-
.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-frontend dev-bundle test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish
18+
.PHONY: build build-bundle build-docker build-ui release clean dev-backend dev-backend-pg dev-frontend dev-bundle pg-up pg-down migrate test test-unit test-integration test-e2e helm-lint helm-template helm-test helm-cleanup helm-package helm-publish
19+
20+
PG_CONTAINER ?= agentevals-pg
21+
PG_PORT ?= 5432
22+
PG_USER ?= agentevals
23+
PG_PASSWORD ?= agentevals
24+
PG_DATABASE ?= agentevals
25+
PG_DSN ?= postgresql://$(PG_USER):$(PG_PASSWORD)@localhost:$(PG_PORT)/$(PG_DATABASE)
1926

2027
build:
2128
uv build
@@ -53,6 +60,30 @@ release: clean build-ui
5360
dev-backend:
5461
uv run agentevals serve --dev
5562

63+
pg-up:
64+
@if [ -z "$$(docker ps -q -f name=^/$(PG_CONTAINER)$$)" ]; then \
65+
docker run -d --rm --name $(PG_CONTAINER) \
66+
-e POSTGRES_USER=$(PG_USER) \
67+
-e POSTGRES_PASSWORD=$(PG_PASSWORD) \
68+
-e POSTGRES_DB=$(PG_DATABASE) \
69+
-p $(PG_PORT):5432 postgres:18.3-alpine; \
70+
else \
71+
echo "container $(PG_CONTAINER) already running"; \
72+
fi
73+
@until docker exec $(PG_CONTAINER) pg_isready -U $(PG_USER) >/dev/null 2>&1; do sleep 1; done
74+
@echo "Postgres ready at $(PG_DSN)"
75+
76+
pg-down:
77+
-docker stop $(PG_CONTAINER)
78+
79+
migrate:
80+
AGENTEVALS_DATABASE_URL=$(PG_DSN) uv run agentevals migrate up
81+
82+
dev-backend-pg: pg-up migrate
83+
AGENTEVALS_STORAGE_BACKEND=postgres \
84+
AGENTEVALS_DATABASE_URL=$(PG_DSN) \
85+
uv run agentevals serve --dev
86+
5687
dev-frontend:
5788
cd ui && npm run dev
5889

README.md

Lines changed: 25 additions & 0 deletions
@@ -286,6 +286,31 @@ The source for the chart lives in [`charts/agentevals/`](charts/agentevals/) if
286286

287287
See the [Kubernetes example](examples/kubernetes/README.md) for an end-to-end walkthrough deploying agentevals alongside kagent and an OTel Collector on Kubernetes.
288288

289+
#### Postgres backend (`/api/runs`)
290+
291+
> **Preview.** Persistent run history backed by Postgres is under active
292+
> development. The `storage.*` and `database.postgres.*` chart values, the
293+
> `/api/runs` HTTP surface, and the database schema may change incompatibly
294+
> in upcoming releases. Operators evaluating this feature should plan to
295+
> recreate the agentevals schema when upgrading between minor versions.
296+
> Default in-memory mode is unaffected.
297+
298+
By default the chart deploys agentevals with an in-memory backend; runs and results are not persisted. To enable the async `POST /api/runs` pipeline with durable Postgres-backed state:
299+
300+
```bash
301+
# Bundled Postgres (dev / evaluation only):
302+
helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
303+
--set storage.backend=postgres \
304+
--set database.postgres.bundled.enabled=true
305+
306+
# Or supply an external Postgres DSN:
307+
helm install agentevals oci://ghcr.io/agentevals-dev/agentevals/helm/agentevals \
308+
--set storage.backend=postgres \
309+
--set database.postgres.url='postgresql://user:pass@host:5432/dbname'
310+
```
311+
312+
When `storage.backend=postgres` the app applies any pending schema migrations on startup (advisory-lock protected, safe across replicas) and starts an in-process worker that processes the run queue. Without `storage.backend=postgres` the `/api/runs` endpoints return 503 with a hint pointing at the env var.
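
"Advisory-lock protected" usually means each replica takes a Postgres session-level advisory lock before touching the schema, so concurrent startups apply migrations one at a time. The sketch below illustrates that general pattern only; it is not the agentevals implementation. The `schema_migrations` table, the lock key derivation, and `apply_pending` are hypothetical, and the `%s` placeholders assume a psycopg-style DB-API connection.

```python
import zlib

# ASSUMPTION: any stable integer that fits in a bigint works as the lock key.
# Deriving it from a name is one convention, not necessarily what agentevals does.
LOCK_KEY = zlib.crc32(b"agentevals-migrations")


def apply_pending(conn, migrations):
    """Apply pending (version, sql) migrations while holding an advisory lock.

    conn is a DB-API 2.0 connection; schema_migrations is a hypothetical
    bookkeeping table with a single `version` column.
    """
    cur = conn.cursor()
    # Blocks until no other session holds the same key, so replicas that start
    # together run their migrations one after another.
    cur.execute("SELECT pg_advisory_lock(%s)", (LOCK_KEY,))
    try:
        # Read the applied set under the lock: another replica may have
        # finished the work while this one was waiting.
        cur.execute("SELECT version FROM schema_migrations")
        done = {row[0] for row in cur.fetchall()}
        ran = []
        for version, sql in migrations:
            if version in done:
                continue
            cur.execute(sql)
            cur.execute(
                "INSERT INTO schema_migrations (version) VALUES (%s)", (version,)
            )
            ran.append(version)
        conn.commit()
        return ran
    except Exception:
        # Session-level advisory locks survive a rollback, so roll back first
        # and release the lock explicitly below.
        conn.rollback()
        raise
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (LOCK_KEY,))
        conn.commit()
```

A replica that acquires the lock second finds every version already recorded and returns an empty list, which is what makes the startup migration safe to run from every Pod.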
313+
289314
## MCP Server
290315

291316
Exposes evaluation tools to MCP clients. A `.mcp.json` at the project root lets Claude Code pick it up automatically.

charts/agentevals/templates/NOTES.txt

Lines changed: 8 additions & 0 deletions
@@ -11,3 +11,11 @@ Get the Service URL:
1111
kubectl --namespace {{ include "agentevals.namespace" . }} port-forward $POD_NAME {{ .Values.service.http.port }}:{{ .Values.service.http.port }}
1212

1313
Health check: GET http://<pod-ip>:{{ .Values.service.http.containerPort }}/api/health
14+
15+
{{- if eq .Values.storage.backend "postgres" }}
16+
17+
NOTE: Postgres-backed storage is a preview feature. The storage.* and
18+
database.postgres.* values, the /api/runs HTTP surface, and the database
19+
schema may change incompatibly in upcoming releases. Recreate the
20+
agentevals schema when upgrading between minor versions.
21+
{{- end }}

charts/agentevals/templates/_helpers.tpl

Lines changed: 33 additions & 0 deletions
@@ -48,10 +48,43 @@ app.kubernetes.io/name: {{ include "agentevals.name" . }}
4848
app.kubernetes.io/instance: {{ .Release.Name }}
4949
{{- end }}
5050

51+
{{- /*
52+
Selector labels scoped to the main app Pod and its Service. Carries the
53+
``app.kubernetes.io/component: agentevals`` discriminator so the agentevals
54+
Service does not also match the bundled Postgres Pod (which carries
55+
``app.kubernetes.io/component: database`` instead).
56+
*/ -}}
57+
{{- define "agentevals.app.selectorLabels" -}}
58+
{{ include "agentevals.selectorLabels" . }}
59+
app.kubernetes.io/component: agentevals
60+
{{- end }}
61+
5162
{{- define "agentevals.serviceAccountName" -}}
5263
{{- if .Values.serviceAccount.create }}
5364
{{- default (include "agentevals.fullname" .) .Values.serviceAccount.name }}
5465
{{- else }}
5566
{{- default "default" .Values.serviceAccount.name }}
5667
{{- end }}
5768
{{- end }}
69+
70+
{{/*
71+
Service name for the bundled Postgres instance.
72+
*/}}
73+
{{- define "agentevals.postgresqlServiceName" -}}
74+
{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
75+
{{- end -}}
76+
77+
{{/*
78+
Bundled Postgres image reference (registry/repository/name:tag).
79+
*/}}
80+
{{- define "agentevals.postgresql.image" -}}
81+
{{- $pg := .Values.database.postgres.bundled -}}
82+
{{- printf "%s/%s/%s:%s" $pg.image.registry $pg.image.repository $pg.image.name $pg.image.tag -}}
83+
{{- end -}}
84+
85+
{{/*
86+
Secret name holding POSTGRES_PASSWORD for the bundled Postgres instance.
87+
*/}}
88+
{{- define "agentevals.passwordSecretName" -}}
89+
{{- printf "%s-postgresql" (include "agentevals.fullname" .) -}}
90+
{{- end -}}

charts/agentevals/templates/deployment.yaml

Lines changed: 25 additions & 2 deletions
@@ -9,15 +9,15 @@ spec:
99
replicas: {{ .Values.replicaCount }}
1010
selector:
1111
matchLabels:
12-
{{- include "agentevals.selectorLabels" . | nindent 6 }}
12+
{{- include "agentevals.app.selectorLabels" . | nindent 6 }}
1313
template:
1414
metadata:
1515
{{- with .Values.podAnnotations }}
1616
annotations:
1717
{{- toYaml . | nindent 8 }}
1818
{{- end }}
1919
labels:
20-
{{- include "agentevals.selectorLabels" . | nindent 8 }}
20+
{{- include "agentevals.app.selectorLabels" . | nindent 8 }}
2121
{{- with .Values.podLabels }}
2222
{{- toYaml . | nindent 8 }}
2323
{{- end }}
@@ -65,6 +65,29 @@ spec:
6565
- name: HOME
6666
value: "/tmp/agentevals-home"
6767
{{- end }}
68+
{{- if eq .Values.storage.backend "postgres" }}
69+
- name: AGENTEVALS_STORAGE_BACKEND
70+
value: "postgres"
71+
- name: AGENTEVALS_DATABASE_SCHEMA
72+
value: {{ .Values.database.postgres.schema | quote }}
73+
{{- if .Values.database.postgres.urlFile }}
74+
- name: AGENTEVALS_DATABASE_URL_FILE
75+
value: {{ .Values.database.postgres.urlFile | quote }}
76+
{{- else if .Values.database.postgres.url }}
77+
- name: AGENTEVALS_DATABASE_URL
78+
value: {{ .Values.database.postgres.url | quote }}
79+
{{- else if .Values.database.postgres.bundled.enabled }}
80+
- name: POSTGRES_PASSWORD
81+
valueFrom:
82+
secretKeyRef:
83+
name: {{ include "agentevals.passwordSecretName" . }}
84+
key: POSTGRES_PASSWORD
85+
- name: AGENTEVALS_DATABASE_URL
86+
value: {{ printf "postgresql://agentevals:$(POSTGRES_PASSWORD)@%s.%s.svc.cluster.local:5432/agentevals?sslmode=disable" (include "agentevals.postgresqlServiceName" .) (include "agentevals.namespace" .) | quote }}
87+
{{- else }}
88+
{{ fail "storage.backend=postgres requires database.postgres.url, database.postgres.urlFile, or database.postgres.bundled.enabled=true" }}
89+
{{- end }}
90+
{{- end }}
6891
{{- with .Values.env }}
6992
{{- toYaml . | nindent 12 }}
7093
{{- end }}
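
The `if` / `else if` chain added to this template gives the connection settings a fixed precedence: `urlFile`, then `url`, then the bundled instance, and otherwise a render failure. A small Python sketch of that resolution order, offered as a reading aid; `resolve_database_env` is a hypothetical function, and the real logic lives only in the Helm template above:

```python
def resolve_database_env(postgres, service_host):
    """Mirror the if / else-if chain in deployment.yaml for storage.backend=postgres.

    postgres      -- dict shaped like the chart's `database.postgres` values
    service_host  -- in-cluster DNS name of the bundled Postgres Service
    """
    if postgres.get("urlFile"):
        return {"AGENTEVALS_DATABASE_URL_FILE": postgres["urlFile"]}
    if postgres.get("url"):
        return {"AGENTEVALS_DATABASE_URL": postgres["url"]}
    if postgres.get("bundled", {}).get("enabled"):
        # $(POSTGRES_PASSWORD) stays literal on purpose: Kubernetes expands it
        # from the env var defined earlier in the same container spec, which
        # the template sources from the bundled Secret.
        return {
            "AGENTEVALS_DATABASE_URL": (
                "postgresql://agentevals:$(POSTGRES_PASSWORD)"
                f"@{service_host}:5432/agentevals?sslmode=disable"
            )
        }
    raise ValueError(
        "storage.backend=postgres requires database.postgres.url, "
        "database.postgres.urlFile, or database.postgres.bundled.enabled=true"
    )
```

The same precedence explains why the bundled Secret and Postgres resources are rendered only when neither `url` nor `urlFile` is set: an explicit DSN always wins over `bundled.enabled=true`.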
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
2+
apiVersion: v1
3+
kind: Secret
4+
metadata:
5+
name: {{ include "agentevals.passwordSecretName" . }}
6+
namespace: {{ include "agentevals.namespace" . }}
7+
labels:
8+
{{- include "agentevals.labels" . | nindent 4 }}
9+
app.kubernetes.io/component: database
10+
type: Opaque
11+
data:
12+
POSTGRES_PASSWORD: {{ "agentevals" | b64enc | quote }}
13+
{{- end }}
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
1+
{{- if and (eq .Values.storage.backend "postgres") .Values.database.postgres.bundled.enabled (not .Values.database.postgres.url) (not .Values.database.postgres.urlFile) }}
2+
{{- $pg := .Values.database.postgres.bundled }}
3+
{{- $fullname := include "agentevals.postgresqlServiceName" . }}
4+
---
5+
apiVersion: v1
6+
kind: ServiceAccount
7+
metadata:
8+
name: {{ $fullname }}
9+
namespace: {{ include "agentevals.namespace" . }}
10+
labels:
11+
{{- include "agentevals.labels" . | nindent 4 }}
12+
app.kubernetes.io/component: database
13+
---
14+
apiVersion: v1
15+
kind: PersistentVolumeClaim
16+
metadata:
17+
name: {{ $fullname }}
18+
namespace: {{ include "agentevals.namespace" . }}
19+
labels:
20+
{{- include "agentevals.labels" . | nindent 4 }}
21+
app.kubernetes.io/component: database
22+
spec:
23+
accessModes:
24+
- ReadWriteOnce
25+
{{- if $pg.storageClassName }}
26+
storageClassName: {{ $pg.storageClassName | quote }}
27+
{{- end }}
28+
resources:
29+
requests:
30+
storage: {{ $pg.storage | quote }}
31+
---
32+
apiVersion: apps/v1
33+
kind: Deployment
34+
metadata:
35+
name: {{ $fullname }}
36+
namespace: {{ include "agentevals.namespace" . }}
37+
labels:
38+
{{- include "agentevals.labels" . | nindent 4 }}
39+
app.kubernetes.io/component: database
40+
spec:
41+
replicas: 1
42+
strategy:
43+
type: Recreate
44+
selector:
45+
matchLabels:
46+
{{- include "agentevals.selectorLabels" . | nindent 6 }}
47+
app.kubernetes.io/component: database
48+
template:
49+
metadata:
50+
labels:
51+
{{- include "agentevals.selectorLabels" . | nindent 8 }}
52+
app.kubernetes.io/component: database
53+
spec:
54+
{{- with .Values.imagePullSecrets }}
55+
imagePullSecrets:
56+
{{- toYaml . | nindent 8 }}
57+
{{- end }}
58+
serviceAccountName: {{ $fullname }}
59+
securityContext:
60+
fsGroup: 999
61+
runAsUser: 999
62+
runAsGroup: 999
63+
runAsNonRoot: true
64+
containers:
65+
- name: postgresql
66+
image: {{ include "agentevals.postgresql.image" . }}
67+
imagePullPolicy: {{ $pg.image.pullPolicy }}
68+
securityContext:
69+
allowPrivilegeEscalation: false
70+
ports:
71+
- name: postgresql
72+
containerPort: 5432
73+
protocol: TCP
74+
env:
75+
- name: POSTGRES_DB
76+
value: "agentevals"
77+
- name: POSTGRES_USER
78+
value: "agentevals"
79+
- name: POSTGRES_PASSWORD
80+
valueFrom:
81+
secretKeyRef:
82+
name: {{ include "agentevals.passwordSecretName" . }}
83+
key: POSTGRES_PASSWORD
84+
- name: PGDATA
85+
value: /var/lib/postgresql/data/pgdata
86+
livenessProbe:
87+
exec:
88+
command:
89+
- pg_isready
90+
- -U
91+
- agentevals
92+
- -d
93+
- agentevals
94+
initialDelaySeconds: 20
95+
periodSeconds: 10
96+
timeoutSeconds: 5
97+
failureThreshold: 6
98+
successThreshold: 1
99+
readinessProbe:
100+
exec:
101+
command:
102+
- pg_isready
103+
- -U
104+
- agentevals
105+
- -d
106+
- agentevals
107+
initialDelaySeconds: 5
108+
periodSeconds: 5
109+
timeoutSeconds: 3
110+
failureThreshold: 3
111+
successThreshold: 1
112+
{{- with $pg.resources }}
113+
resources:
114+
{{- toYaml . | nindent 12 }}
115+
{{- end }}
116+
volumeMounts:
117+
- name: data
118+
mountPath: /var/lib/postgresql/data
119+
volumes:
120+
- name: data
121+
persistentVolumeClaim:
122+
claimName: {{ $fullname }}
123+
---
124+
apiVersion: v1
125+
kind: Service
126+
metadata:
127+
name: {{ $fullname }}
128+
namespace: {{ include "agentevals.namespace" . }}
129+
labels:
130+
{{- include "agentevals.labels" . | nindent 4 }}
131+
app.kubernetes.io/component: database
132+
spec:
133+
type: ClusterIP
134+
ports:
135+
- name: postgresql
136+
port: 5432
137+
targetPort: postgresql
138+
protocol: TCP
139+
selector:
140+
{{- include "agentevals.selectorLabels" . | nindent 4 }}
141+
app.kubernetes.io/component: database
142+
{{- end }}

charts/agentevals/templates/service.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ spec:
2525
targetPort: mcp
2626
protocol: TCP
2727
selector:
28-
{{- include "agentevals.selectorLabels" . | nindent 4 }}
28+
{{- include "agentevals.app.selectorLabels" . | nindent 4 }}
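
The selector change matters because a Kubernetes equality-based selector matches any Pod that carries all of the selector's labels. The sketch below models that rule in Python to show why the previous selector would also have matched the bundled Postgres Pod. The label values are illustrative for a release named `demo`, and `selector_matches` is a hypothetical helper.

```python
def selector_matches(selector, pod_labels):
    """Equality-based selector: every selector label must be present on the Pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Labels as they might render for a release named "demo" (illustrative values).
base = {
    "app.kubernetes.io/name": "agentevals",
    "app.kubernetes.io/instance": "demo",
}
app_pod = {**base, "app.kubernetes.io/component": "agentevals"}
db_pod = {**base, "app.kubernetes.io/component": "database"}

# agentevals.selectorLabels (before) vs agentevals.app.selectorLabels (after).
old_selector = base
new_selector = {**base, "app.kubernetes.io/component": "agentevals"}

print(selector_matches(old_selector, db_pod))  # old selector also hits Postgres
print(selector_matches(new_selector, db_pod))  # component label excludes it
```

With the old selector, the agentevals Service would have routed HTTP traffic to the Postgres Pod as well; the added `app.kubernetes.io/component` label removes that overlap.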
