Conversation

@hyf0 (Collaborator) commented Jan 29, 2026

  • See https://github.com/vuejs-ai/skills/blob/1fe81810af24cce9ce7f100e492d8e0f94932e67/evals/README.md for how the eval system works.
  • Each skill × reference combination is a situation we need to test.
  • Each situation is tested with three model tiers: haiku, sonnet, and opus. We could add Codex models later.
  • Each situation has three evaluations that check whether it works.
  • Each evaluation is checked in two steps:
    • a baseline step that does not include the skills
    • a with-skill step that includes the skills
      • Each step runs twice to ensure stability.
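To make the size of that matrix concrete, here is a minimal sketch that enumerates the runs for a single situation. The type names, constants, and the `planRuns` helper are hypothetical and are not taken from `evals/lib/runner.ts`; only the counts (three tiers, three evaluations, two steps, two runs per step) come from the description above.

```typescript
// Hypothetical names throughout; only the counts come from the PR description.
type Tier = "haiku" | "sonnet" | "opus";
type Variant = "baseline" | "with-skill";

interface PlannedRun {
  tier: Tier;
  evaluation: number; // 1-based index of the evaluation within the situation
  variant: Variant;
  attempt: number; // 1-based repeat index, used to check stability
}

const TIERS: Tier[] = ["haiku", "sonnet", "opus"];
const VARIANTS: Variant[] = ["baseline", "with-skill"];
const EVALUATIONS_PER_SITUATION = 3;
const ATTEMPTS_PER_VARIANT = 2;

// Enumerate every run needed to cover one situation.
function planRuns(): PlannedRun[] {
  const runs: PlannedRun[] = [];
  for (const tier of TIERS) {
    for (let evaluation = 1; evaluation <= EVALUATIONS_PER_SITUATION; evaluation++) {
      for (const variant of VARIANTS) {
        for (let attempt = 1; attempt <= ATTEMPTS_PER_VARIANT; attempt++) {
          runs.push({ tier, evaluation, variant, attempt });
        }
      }
    }
  }
  return runs;
}

// 3 tiers * 3 evaluations * 2 variants * 2 attempts = 36 runs per situation.
const runsPerSituation = planRuns().length;
```

Under these assumptions a single situation costs 36 model runs, which helps when estimating how long a full eval pass would take.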

Copilot AI review requested due to automatic review settings January 29, 2026 14:20

Copilot AI left a comment

Pull request overview

This PR adds evaluation infrastructure to systematically test and validate Vue.js skills. The system runs automated tests that compare model performance with and without the skills installed.
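One way to picture that comparison is as a difference in pass rates between the two variants. The sketch below is illustrative only: the `RunResult` shape and the `passRate` and `skillUplift` helpers are assumptions, not the actual `results.json` schema or the runner's API.

```typescript
// Hypothetical result shape; the real results.json schema is not shown in this PR summary.
interface RunResult {
  variant: "baseline" | "with-skill";
  passed: boolean;
}

// Fraction of runs for one variant that passed (0 when there are no runs).
function passRate(results: RunResult[], variant: RunResult["variant"]): number {
  const runs = results.filter((r) => r.variant === variant);
  if (runs.length === 0) return 0;
  return runs.filter((r) => r.passed).length / runs.length;
}

// Positive values mean the skill helped; zero or negative means it did not.
function skillUplift(results: RunResult[]): number {
  return passRate(results, "with-skill") - passRate(results, "baseline");
}

// Made-up sample data, for illustration only.
const sample: RunResult[] = [
  { variant: "baseline", passed: true },
  { variant: "baseline", passed: false },
  { variant: "with-skill", passed: true },
  { variant: "with-skill", passed: true },
];

const uplift = skillUplift(sample); // 1.0 - 0.5 = 0.5
```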

Changes:

  • Added 27 evaluation scenarios across 9 skill categories
  • Each scenario includes a self-contained Vue project with test assertions
  • Implemented CLI runner with support for multiple models and tiers

Reviewed changes

Copilot reviewed 266 out of 356 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `evals/package.json` | Package configuration for evals workspace |
| `evals/lib/types.ts` | TypeScript type definitions for eval system |
| `evals/lib/runner.ts` | Core eval execution engine with Claude Code integration |
| `evals/lib/skill-coverage.test.ts` | Automated test ensuring all skills have evals |
| `evals/suites/skills/*/eval.json` | Eval configurations with queries and expected behaviors |
| `evals/suites/skills/*/eval.ts` | Vitest test files validating generated code patterns |
| `evals/suites/skills/*/results.json` | Recorded results with timestamps from future dates |
| `evals/README.md` | Documentation for eval system usage |
| `AGENTS.md` | Development workflow and evaluation guidelines |
| `README.md` | Updated validation process documentation |

@serkodev
Member

@codex review this PR

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bfa569153

Member

@serkodev serkodev left a comment

LGTM. Following up on the Codex review is optional.