Skip to content

eval: add Gatekeeper benchmark visualization interface#412

Open
Jazzcort wants to merge 1 commit intorhel-lightspeed:mainfrom
Jazzcort:eval-render-app
Open

eval: add Gatekeeper benchmark visualization interface#412
Jazzcort wants to merge 1 commit intorhel-lightspeed:mainfrom
Jazzcort:eval-render-app

Conversation

@Jazzcort
Copy link
Copy Markdown
Contributor

Introduces a new standalone HTML page (eval/gatekeeper/index.html) to visualize and compare Gatekeeper model benchmark results.

Key features include:

  • Dynamic loading of test results from JSON files via a manifest or local directory listing (/data/).
  • A comparison grid mapping test cases against different models.
  • Color-coded cells representing test status (e.g., exact matches, mismatches, and safety status regressions/improvements).
  • Automated score calculation and display for each model.
  • An interactive, detailed popup view to inspect specific test case data (scripts, expected vs. actual results, and metadata) on click.

@Jazzcort Jazzcort requested a review from a team as a code owner April 16, 2026 19:09
@github-actions
Copy link
Copy Markdown

For team members: test commit 8d7f7a9 in internal GitLab

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
unittests 96.46% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Introduces a new standalone HTML page (`eval/gatekeeper/index.html`) to
visualize and compare Gatekeeper model benchmark results.

Key features include:
- Dynamic loading of test results from JSON files via a manifest or
  local directory listing (`/data/`).
- A comparison grid mapping test cases against different models.
- Color-coded cells representing test status (e.g., exact matches,
  mismatches, and safety status regressions/improvements).
- Automated score calculation and display for each model.
- An interactive, detailed popup view to inspect specific test case
  data (scripts, expected vs. actual results, and metadata) on click.
@github-actions
Copy link
Copy Markdown

For team members: test commit 550764d in internal GitLab

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant