feat: add evals infra #20

hyf0 · 2026-01-29T14:20:39Z

See https://github.com/vuejs-ai/skills/blob/1fe81810af24cce9ce7f100e492d8e0f94932e67/evals/README.md to understand how it works.
Each skill × reference combination is a situation we need to test.
Each situation is tested with 3 model tiers: haiku, sonnet, and opus. Codex models could be added later.
Each situation has 3 evaluations to check whether it works.
Each evaluation is checked in two steps:
- a baseline step that runs without the skills
- a with-skill step that runs with the skills installed

Each step runs twice to ensure stability.
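
To make the size of the test matrix concrete, here is a minimal sketch that enumerates the runs for one situation under the rules described above. All names (`PlannedRun`, `planRuns`, the constants) are hypothetical and are not taken from `evals/lib`; the actual runner may organize this differently.

```typescript
// Illustrative only: these types and names are assumptions, not the real evals API.
type Tier = "haiku" | "sonnet" | "opus";
type Step = "baseline" | "with-skill";

interface PlannedRun {
  situation: string;
  tier: Tier;
  evaluation: number; // 1..3, one per evaluation of the situation
  step: Step;
  attempt: number; // 1..2, each step is repeated for stability
}

const TIERS: Tier[] = ["haiku", "sonnet", "opus"];
const STEPS: Step[] = ["baseline", "with-skill"];
const EVALUATIONS_PER_SITUATION = 3;
const ATTEMPTS_PER_STEP = 2;

// Expand one situation into every (tier, evaluation, step, attempt) combination.
function planRuns(situation: string): PlannedRun[] {
  const runs: PlannedRun[] = [];
  for (const tier of TIERS) {
    for (let evaluation = 1; evaluation <= EVALUATIONS_PER_SITUATION; evaluation++) {
      for (const step of STEPS) {
        for (let attempt = 1; attempt <= ATTEMPTS_PER_STEP; attempt++) {
          runs.push({ situation, tier, evaluation, step, attempt });
        }
      }
    }
  }
  return runs;
}

// 3 tiers * 3 evaluations * 2 steps * 2 attempts = 36 runs per situation
console.log(planRuns("example-situation").length);
```

Under these assumptions, every situation costs 36 model runs, half of them baseline and half with-skill, which is worth keeping in mind when adding new situations.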

Copilot

Pull request overview

This PR adds an evaluation infrastructure to systematically test and validate Vue.js skills. The system runs automated tests comparing model performance with and without skills installed.

Changes:

- Added 27 evaluation scenarios across 9 skill categories
- Each scenario includes a self-contained Vue project with test assertions
- Implemented CLI runner with support for multiple models and tiers

Reviewed changes

Copilot reviewed 266 out of 356 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| evals/package.json | Package configuration for evals workspace |
| evals/lib/types.ts | TypeScript type definitions for eval system |
| evals/lib/runner.ts | Core eval execution engine with Claude Code integration |
| evals/lib/skill-coverage.test.ts | Automated test ensuring all skills have evals |
| evals/suites/skills/*/eval.json | Eval configurations with queries and expected behaviors |
| evals/suites/skills/*/eval.ts | Vitest test files validating generated code patterns |
| evals/suites/skills/*/results.json | Recorded results with timestamps from future dates |
| evals/README.md | Documentation for eval system usage |
| AGENTS.md | Development workflow and evaluation guidelines |
| README.md | Updated validation process documentation |

serkodev · 2026-01-29T17:48:56Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bfa569153

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

evals/lib/cli.ts

evals/lib/runner.ts

serkodev

LGTM. Following the Codex review suggestions is optional.

feat: add evals infra

1fe8181

Copilot AI review requested due to automatic review settings January 29, 2026 14:20

Copilot AI reviewed Jan 29, 2026

serkodev added 2 commits January 30, 2026 01:41

feat: copy template files for eval suites

f7faf91

chore: remove duplicated files in eval suites

2bfa569

chatgpt-codex-connector bot reviewed Jan 29, 2026

evals/lib/cli.ts (resolved)

evals/lib/runner.ts (outdated, resolved)

evals/lib/runner.ts (outdated, resolved)

serkodev approved these changes Jan 29, 2026

Apply comments

8c6dc62


3 participants
