Evals for LLMs to learn and benchmark their Svelte skills.
This repository is split into two main concepts:
- Evals live in `evals/<name>` and include a prompt plus a runnable project and tests.
- Experiments live in `experiments/<name>.ts` and configure how evals are executed with `@vercel/agent-eval`.
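
As a rough sketch (the eval and experiment names below are made up), the layout looks like this:

```text
evals/
  my-eval/
    PROMPT.md     # the prompt given to the agent
    EVAL.ts       # success criteria
    ...           # runnable Svelte project and tests
experiments/
  basic.ts        # an ExperimentConfig
results/          # written by the CLI after each run
```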
- Install dependencies:

  ```bash
  pnpm install
  ```

- Configure environment variables:

  ```bash
  cp .env.example .env
  ```

  Fill in `AI_GATEWAY_API_KEY` plus either `VERCEL_TOKEN` or `VERCEL_OIDC_TOKEN`.
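
For reference, a filled-in `.env` might look roughly like this (the values are placeholders, and only one of the two Vercel tokens is required):

```bash
AI_GATEWAY_API_KEY=your-gateway-api-key
VERCEL_TOKEN=your-vercel-token
# ...or, instead of VERCEL_TOKEN:
# VERCEL_OIDC_TOKEN=your-oidc-token
```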
Experiments are defined in `experiments/*.ts`. The CLI expects the `evals/` folder to be a sibling of `experiments/`.
```bash
# Run a single experiment by name (experiments/basic.ts)
npx @vercel/agent-eval basic

# Or run by path
npx @vercel/agent-eval experiments/basic.ts

# Run every experiment in the repository
npx @vercel/agent-eval
```

Results are written to `results/<experiment-name>/<timestamp>/`.
The `agent-eval` CLI exposes a playground command that launches `@vercel/agent-eval-playground` under the hood:

```bash
npx @vercel/agent-eval-playground --results-dir ./results --evals-dir ./evals --port 3000
```

Open the URL it prints (default: http://localhost:3000) to browse results.
Use the script in `scripts/add-eval.ts` to scaffold a new eval:

```bash
pnpm run add-eval
```

The script will:

- Create `evals/<your-eval-name>/` from `assets/default-project/`
- Write your prompt to `evals/<your-eval-name>/PROMPT.md`

Afterward, edit `EVAL.ts` and any tests inside the new eval to define success criteria.
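
As a rough illustration, a success criterion can be an ordinary test in the scaffolded project. The sketch below assumes the default project runs its tests with Vitest and `@testing-library/svelte`; the file path, component, and assertion are made up for this example:

```ts
// evals/my-eval/src/lib/Counter.test.ts (hypothetical example)
import { describe, expect, it } from 'vitest';
import { render, screen } from '@testing-library/svelte';
// Counter.svelte is a made-up component the prompt would ask the agent to build.
import Counter from './Counter.svelte';

describe('Counter', () => {
	it('renders the initial count', () => {
		render(Counter);
		// getByText throws if the text is missing, so the eval run fails.
		expect(screen.getByText('Count: 0')).toBeTruthy();
	});
});
```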
- Add a new file in `experiments/` (for example, `experiments/my-experiment.ts`).
- Export an `ExperimentConfig` using the shared helper:

  ```ts
  import { experiment } from '../shared/experiment-base.ts';

  export default experiment({
  	evals: ['my-eval'],
  	runs: 2,
  	editPrompt(prompt) {
  		return `${prompt}\n\nExtra instructions...`;
  	},
  });
  ```

- Run it with:

  ```bash
  npx @vercel/agent-eval my-experiment
  ```