feat: add evals infra #20
base: dev
Pull request overview
This PR adds an evaluation infrastructure to systematically test and validate Vue.js skills. The system runs automated tests comparing model performance with and without skills installed.
Changes:
- Added 27 evaluation scenarios across 9 skill categories
- Each scenario includes a self-contained Vue project with test assertions
- Implemented CLI runner with support for multiple models and tiers
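The with/without comparison described in the overview can be sketched roughly as follows. This is a minimal illustration, not the PR's actual runner: the `EvalResult` shape, the `summarize` helper, and the sample scenario names are assumptions made for the example, and the real types in `evals/lib/types.ts` may look different.

```typescript
// Hypothetical result shape; the real definitions in evals/lib/types.ts may differ.
type Variant = "baseline" | "with-skill";

interface EvalResult {
  scenario: string;
  variant: Variant;
  passed: boolean;
}

interface VariantSummary {
  total: number;
  passed: number;
  passRate: number;
}

// Aggregate pass counts per variant so the two runs can be compared side by side.
function summarize(results: EvalResult[]): Record<Variant, VariantSummary> {
  const summary: Record<Variant, VariantSummary> = {
    "baseline": { total: 0, passed: 0, passRate: 0 },
    "with-skill": { total: 0, passed: 0, passRate: 0 },
  };
  for (const r of results) {
    const s = summary[r.variant];
    s.total += 1;
    if (r.passed) s.passed += 1;
  }
  for (const v of Object.keys(summary) as Variant[]) {
    const s = summary[v];
    // Avoid dividing by zero when a variant has no recorded runs.
    s.passRate = s.total === 0 ? 0 : s.passed / s.total;
  }
  return summary;
}

// Illustrative data only; these are not results from this PR.
const sample: EvalResult[] = [
  { scenario: "scenario-a", variant: "baseline", passed: false },
  { scenario: "scenario-a", variant: "with-skill", passed: true },
  { scenario: "scenario-b", variant: "baseline", passed: true },
  { scenario: "scenario-b", variant: "with-skill", passed: true },
];

const comparison = summarize(sample);
console.log(comparison);
```

A skill would look useful under this sketch when the `with-skill` pass rate exceeds the `baseline` pass rate on the same scenarios.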
Reviewed changes
Copilot reviewed 266 out of 356 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| evals/package.json | Package configuration for evals workspace |
| evals/lib/types.ts | TypeScript type definitions for eval system |
| evals/lib/runner.ts | Core eval execution engine with Claude Code integration |
| evals/lib/skill-coverage.test.ts | Automated test ensuring all skills have evals |
| evals/suites/skills/*/eval.json | Eval configurations with queries and expected behaviors |
| evals/suites/skills/*/eval.ts | Vitest test files validating generated code patterns |
| evals/suites/skills/*/results.json | Recorded results with timestamps from future dates |
| evals/README.md | Documentation for eval system usage |
| AGENTS.md | Development workflow and evaluation guidelines |
| README.md | Updated validation process documentation |
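The coverage check listed for `evals/lib/skill-coverage.test.ts` could be approximated by a helper like the one below. It is only a sketch: the function name `findUncoveredSkills` is hypothetical, and it assumes each eval suite is named after the skill it covers, which the PR description does not confirm.

```typescript
// Hypothetical helper: returns the skills that have no eval suite of the same name.
// Assumes suite names match skill names exactly (an assumption, not confirmed by the PR).
function findUncoveredSkills(
  skillNames: string[],
  evalSuiteNames: string[],
): string[] {
  const covered = new Set(evalSuiteNames);
  return skillNames.filter((name) => !covered.has(name)).sort();
}

// Illustrative names only.
const uncovered = findUncoveredSkills(
  ["skill-a", "skill-b", "skill-c"],
  ["skill-a", "skill-c"],
);
console.log(uncovered);
```

A test built on this idea would fail whenever the returned list is non-empty, so a newly added skill cannot land without at least one eval.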
@codex review this PR
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2bfa569153
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
serkodev left a comment
LGTM. Following the Codex review suggestions is optional.
- `baseline`: does not contain the skills
- `with-skill`: contains the skills